Challenges in Querying a Distributed Relational Database

Upload: scalebase

Post on 26-Jan-2015

DESCRIPTION

This short presentation examines some common challenges that can occur when querying a distributed RDBMS.

Challenges & Solutions

TRANSCRIPT

Page 1: Challenges in Querying a Distributed Relational Database

Challenges in Querying a Distributed Relational Database

The Challenges of Querying a Distributed RDBMS

This presentation examines some common challenges that can occur when querying a distributed RDBMS.

- Challenges
- Solution

Challenges

The Challenges of Querying a Distributed RDBMS

A distributed relational database can let your application scale well beyond the limits of a single server. However, a number of challenges can occur when querying a distributed RDBMS.

1. Aggregation
2. Distinctive Values
3. Joins
4. Sub-Queries
5. The “Combination”

1 - The Aggregation Challenge

• Let’s assume that a company stores the HR data of several departments across multiple partitions.

• When requesting the average salary of all employees, all departments must be examined.

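• If each partition calculates its own average salary and those averages are then averaged together, the final result will be inaccurate unless every partition holds the same number of employees. Each partition needs to return its salary total and employee count so that the overall average can be computed from the combined figures.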
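
The difference can be seen in a minimal Python sketch. The salary figures and the function names `partial_aggregate` and `global_average` are made up for illustration, and the code stands in for the general technique rather than for any particular product's implementation.

```python
# Hypothetical data: each inner list holds the salaries stored on one partition.
partitions = [
    [1000, 2000, 3000],  # partition A: 3 employees
    [10000],             # partition B: 1 employee
]

def partial_aggregate(salaries):
    """Runs on each partition: return (sum, count) instead of an average."""
    return sum(salaries), len(salaries)

def global_average(partials):
    """Runs on the coordinator: combine the partial sums and counts."""
    total = sum(s for s, _ in partials)
    count = sum(c for _, c in partials)
    return total / count

partials = [partial_aggregate(p) for p in partitions]
correct = global_average(partials)  # 16000 / 4

# The naive approach: average the per-partition averages.
naive = sum(sum(p) / len(p) for p in partitions) / len(partitions)

print(correct, naive)  # 4000.0 6000.0
```

The naive figure is skewed because partition B's single employee is given the same weight as partition A's three employees.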

2 - The Distinctive Values Challenge

• Data entries, such as age or salary, will often repeat throughout the database.

• The same value can appear on more than one partition, so simply concatenating each partition's distinct values still leaves duplicates, which can skew data analysis and produce false query results.

• When an application requests a list of distinct values, the partial results need to be merged in a way that eliminates repetitions from the final result set.
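
As a rough illustration, the sketch below merges per-partition distinct values on a coordinator. The age values and the names `local_distinct` and `global_distinct` are assumptions made for this example.

```python
# Hypothetical data: ages stored on two partitions, with overlapping values.
partitions = [
    [30, 45, 30, 52],
    [45, 28, 52],
]

def local_distinct(values):
    """Runs on each partition: remove repetitions within that partition."""
    return set(values)

def global_distinct(parts):
    """Runs on the coordinator: merge partial results and de-duplicate again."""
    merged = set()
    for values in parts:
        merged |= local_distinct(values)
    return sorted(merged)

# Naive approach: concatenate the per-partition distinct lists.
naive = [v for values in partitions for v in sorted(set(values))]
distinct_ages = global_distinct(partitions)

print(naive)          # [30, 45, 52, 28, 45, 52]
print(distinct_ages)  # [28, 30, 45, 52]
```

The values 45 and 52 survive twice in the naive result because each partition can only remove the repetitions it can see.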

3 - The Joins Challenge

• Ideally, records that exist in different partitions should be joined after considering all of the query criteria.

The Sharding Conflict: when the records to be joined are situated on different partitions, no single partition holds all of the matching rows, so a join evaluated locally on each partition can miss matches.
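
One common way around this, shown here only as a sketch, is to gather one side of the join from every partition and perform a hash join on a coordinator. The table contents and the function `cross_partition_join` are hypothetical; a real system would also weigh the cost of moving rows between nodes.

```python
# Hypothetical data: employees and departments spread over two partitions.
# Ann's department row (dept_id 1) lives on a different partition than Ann.
employees_by_partition = [
    [{"name": "Ann", "dept_id": 1}, {"name": "Bob", "dept_id": 2}],
    [{"name": "Cy", "dept_id": 1}],
]
departments_by_partition = [
    [{"dept_id": 2, "dept_name": "Sales"}],
    [{"dept_id": 1, "dept_name": "HR"}],
]

def cross_partition_join(left_parts, right_parts, key):
    """Hash join on the coordinator over rows gathered from all partitions."""
    # Build an index over the right-hand table from every partition.
    index = {}
    for part in right_parts:
        for row in part:
            index.setdefault(row[key], []).append(row)
    # Probe the index with every left-hand row from every partition.
    joined = []
    for part in left_parts:
        for row in part:
            for match in index.get(row[key], []):
                joined.append({**row, **match})
    return joined

result = cross_partition_join(
    employees_by_partition, departments_by_partition, "dept_id"
)
for row in result:
    print(row["name"], row["dept_name"])
```

A join run separately on each partition would pair only Bob with Sales and Cy with HR, and would miss Ann.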

4 - The Sub-Queries Challenge

• Often the result of one query is needed to complete another query. This brings dependencies and complexity into the system.

For instance, a query examining all employees with above average salaries requires a sub-query to determine the average salary, considering all partitions. In order to yield correct results, this sub-query has to be processed independently, and before the parent query.
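
The two-step ordering described above might look like the following sketch. The employee data and the function names are invented for illustration.

```python
# Hypothetical data: (name, salary) rows stored on two partitions.
partitions = [
    [("Ann", 1000), ("Bob", 2000), ("Cy", 3000)],
    [("Dee", 10000)],
]

def global_average_salary(parts):
    """Step 1 (the sub-query): average over all partitions."""
    total = sum(salary for part in parts for _, salary in part)
    count = sum(len(part) for part in parts)
    return total / count

def above_average(parts):
    """Step 2 (the parent query): filter every partition with the global value."""
    threshold = global_average_salary(parts)
    return [name for part in parts for name, salary in part if salary > threshold]

print(global_average_salary(partitions))  # 4000.0
print(above_average(partitions))          # ['Dee']
```

If each partition compared salaries against its own local average instead, the first partition would report Cy (3000 against a local average of 2000) and the second would report nobody.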

5 - The “Combination” Challenge

• Any combination of:
  • Aggregation
  • Distinctive Values
  • Joins
  • Sub-Query

For example, trying to get an average of the distinctive values of salary.

To accomplish this, we first need to eliminate repetitions across all partitions and only then aggregate. The two steps cannot be combined into a single pass on each partition.
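
A small sketch of the ordering, using made-up salary data and a hypothetical `average_of_distinct` function:

```python
# Hypothetical data: salaries on two partitions; 2000 appears on both.
partitions = [
    [1000, 2000, 2000],
    [2000, 6000],
]

def average_of_distinct(parts):
    # Step 1: eliminate repetitions across ALL partitions.
    distinct = set()
    for salaries in parts:
        distinct |= set(salaries)
    # Step 2: only now aggregate.
    return sum(distinct) / len(distinct)

# Naive single pass: each partition de-duplicates and aggregates locally,
# then the (sum, count) pairs are combined.
local = [(sum(set(p)), len(set(p))) for p in partitions]
naive = sum(s for s, _ in local) / sum(c for _, c in local)

print(average_of_distinct(partitions))  # 3000.0
print(naive)                            # 2750.0
```

The naive result is wrong because the value 2000 is counted once per partition rather than once overall.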

Solution

Meeting the Challenges

• DBAs need to carefully consider how to arrange data across multiple partitions in a distributed database.

• Distributing the data with intelligence about the application, schema and workloads will help you avoid many conflicts.

• Place data that is used together on the same partition.

• Cross-partition queries will always exist. Considering the nature of the queries and the application is key to creating a functional distributed database.
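
One way to keep related data together, sketched below, is to route every table by the same distribution key. The key choice (`dept_id`), the partition count, and the function `partition_for` are assumptions for this example and do not describe any specific product's policy.

```python
import zlib

NUM_PARTITIONS = 4  # hypothetical cluster size

def partition_for(dept_id):
    """Map a distribution key to a partition using a stable hash."""
    return zlib.crc32(str(dept_id).encode("utf-8")) % NUM_PARTITIONS

# Hypothetical rows from two tables that are usually queried together.
employee = {"name": "Ann", "dept_id": 7}
department = {"dept_id": 7, "dept_name": "HR"}

# Routing both tables by dept_id places related rows on the same partition,
# so a join on dept_id can be answered without crossing partitions.
same_partition = (
    partition_for(employee["dept_id"]) == partition_for(department["dept_id"])
)
print(same_partition)  # True
```

Queries that filter or join on some other column would still have to touch several partitions, which is why the choice of key depends on the application's workload.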

ScaleBase – Your Distributed DDBMS Experts

ScaleBase provides specialized data distribution technology that resolves a broad range of these challenges:

1. ScaleBase Analysis Genie

• Free, SaaS data distribution policy builder
• A guided analysis of the nature of your data, data relationships and the functional use of your data

2. ScaleBase Software

• A distributed MySQL database management system

ScaleBase Analysis Genie, Free, SaaS

• Determines the best way to scale out a single MySQL instance to a distributed relational database

• Creates the best data distribution policy for your specific app by analyzing your schema and queries

• Ensures relational integrity of MySQL with the scalability of a modern distributed database architecture

• Automated or Expert mode: provides you visibility and control over all elements of data distribution policy

ScaleBase Software

ScaleBase is a distributed MySQL database management system. It is optimized for the cloud and deploys in minutes so you can scale out to an unlimited number of users, data and transactions.

Dynamically optimizes workloads and availability by logically distributing data across public, private and geo-distributed clouds.

Contact Us [email protected]

or Download free software

ScaleBase Software: www.scalebase.com/software/