Advanced Concepts in Databases, Unit 2, by Arun Pratap Singh

Upload: arunpratapsingh

Posted on 14-Oct-2015


8878061993, OIST Bhopal, MTech CSE II Semester, RGPV Bhopal


  • PREPARED BY ARUN PRATAP SINGH MTECH 2nd SEMESTER


    QUERY PROCESSING AND OPTIMIZATION INTRODUCTION :

    A database management system manages a large volume of data, which can be retrieved by specifying queries expressed in a high-level query language such as SQL. Whenever a query is submitted to the database system, a number of activities are performed to process it. Query processing includes translating high-level queries into low-level expressions that can be used at the physical level of the file system, optimizing the query, and actually executing it to get the result. Query optimization is the process in which multiple query-execution plans for satisfying a query are examined and the most efficient plan is identified for execution.

    Basic Steps in Query Processing :-

    1. Parsing and translation
    2. Optimization
    3. Evaluation

    Parsing and translation :

    Check the syntax and verify the relations. Translate the query into an equivalent relational algebra expression.

    Optimization :

    Generate an optimal (lowest-cost) evaluation plan for the query.

    Evaluation :

    The query-execution engine takes an (optimal) evaluation plan, executes that plan, and returns the answers to the query.
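    The translation and evaluation steps can be sketched in Python for the simplest case, a single-table SELECT ... WHERE query. This is a minimal illustration under our own assumptions, not a real DBMS: the node classes Scan, Select and Project and the functions translate and evaluate are hypothetical names, the query arrives already parsed (column list, table name, predicate), and the optimization step is left out, so the single translated plan is evaluated as-is.

```python
from dataclasses import dataclass
from typing import Callable, List, Union

Row = dict


@dataclass
class Scan:
    """Leaf of the tree: read every tuple of a base relation."""
    relation: str


@dataclass
class Select:
    """sigma: keep only the tuples that satisfy the predicate."""
    predicate: Callable[[Row], bool]
    child: "Node"


@dataclass
class Project:
    """pi: keep only the listed attributes (duplicates kept, as in SQL)."""
    attributes: List[str]
    child: "Node"


Node = Union[Scan, Select, Project]


def translate(columns, table, predicate):
    """Translation step: SELECT columns FROM table WHERE predicate becomes
    the relational algebra expression pi_columns(sigma_predicate(table))."""
    return Project(columns, Select(predicate, Scan(table)))


def evaluate(node, db):
    """Evaluation step: recursively execute the tree over in-memory tables."""
    if isinstance(node, Scan):
        return list(db[node.relation])
    if isinstance(node, Select):
        return [row for row in evaluate(node.child, db) if node.predicate(row)]
    if isinstance(node, Project):
        return [{a: row[a] for a in node.attributes} for row in evaluate(node.child, db)]
    raise TypeError(f"unknown plan node: {node!r}")


# Hypothetical sample table, named after the Doctors relation used later in these notes.
db = {"Doctors": [{"dname": "Rao", "dgender": "M"}, {"dname": "Sen", "dgender": "F"}]}

# SELECT dname FROM Doctors WHERE dgender = 'M'
plan = translate(["dname"], "Doctors", lambda row: row["dgender"] == "M")
print(evaluate(plan, db))  # [{'dname': 'Rao'}]
```

    The tree shape is what the optimizer later works on: it may rewrite or reorder these nodes, but evaluation stays a walk over the chosen tree.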

    UNIT : II


    QUERY OPTIMIZATION :

    Query optimization is a function of many relational database management systems. The query optimizer attempts to determine the most efficient way to execute a given query by considering the possible query plans.

    Generally, users cannot access the query optimizer directly: once queries are submitted to the database server and parsed by the parser, they are passed to the query optimizer, where optimization occurs. However, some database engines allow users to guide the query optimizer with hints.


    What is Query Optimization?

    Suppose you were given a chance to visit 15 pre-selected cities in Europe, and the only constraint was time. Would you visit the cities in any order, or would you have a plan?

    Plan:

    -> Place the 15 cities in groups based on their proximity to each other.
    -> Start with one group and move on to the next group.

    The important point is that you would visit the cities in a more organized manner, and the time constraint would be dealt with efficiently.

    Query optimization works in a similar way: there can be many different ways to get the answer to a given query, and the result is the same in all of them. The DBMS strives to process the query in the most efficient way (in terms of time) to produce the answer. Cost = time needed to get all answers.

    Starting with System R, most commercial DBMSs use cost-based optimizers. The cost estimates should be accurate and easy to compute. Another important point is that the estimates must be logically consistent, so that the least-cost plan is consistently the one with the lowest estimate.


    Steps in a Cost-based query optimization :

    1. Parsing
    2. Transformation
    3. Implementation
    4. Plan selection based on cost estimates
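    Steps 2 to 4 can be illustrated with join ordering, a classic cost-based decision. The sketch below is a simplification under stated assumptions: the relation Wards, all cardinalities and selectivities are hypothetical, only left-deep join orders are considered, and the cost of a plan is taken to be the sum of its intermediate result sizes rather than any real DBMS cost formula.

```python
from itertools import permutations

# Hypothetical catalog statistics (not from these notes): base-table cardinalities.
cardinality = {"Patients": 10_000, "Doctors": 200, "Wards": 50}

# Hypothetical join selectivities for relation pairs that share a join predicate.
# A pair with no predicate is a Cartesian product (selectivity 1.0).
selectivity = {
    frozenset({"Patients", "Doctors"}): 1 / 200,
    frozenset({"Doctors", "Wards"}): 1 / 50,
}


def plan_cost(order):
    """Estimated cost of a left-deep join order = sum of intermediate result sizes."""
    joined = [order[0]]
    size = float(cardinality[order[0]])
    cost = 0.0
    for rel in order[1:]:
        sel = 1.0
        for prev in joined:  # apply every predicate linking rel to relations already joined
            sel *= selectivity.get(frozenset({prev, rel}), 1.0)
        size = size * cardinality[rel] * sel
        cost += size
        joined.append(rel)
    return cost


def choose_plan(relations):
    """Transformation: generate equivalent join orders.
    Implementation + plan selection: estimate each order and keep the cheapest."""
    return min(permutations(relations), key=plan_cost)


best = choose_plan(list(cardinality))
print(best, plan_cost(best))
```

    With these made-up numbers the cheapest order joins Doctors with Wards first (estimated cost about 10,200), while starting with Patients ⋈ Doctors costs about 20,000 and the orders that begin with a Cartesian product are far worse. Trying every permutation only works for a handful of relations; System R-style optimizers use dynamic programming instead.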

    Query Flow :


    Query Parser : Verify the validity of the SQL statement. Translate the query into an internal structure using relational calculus.

    Query Optimizer : Find the best expression among the various equivalent algebraic expressions. The criterion used is cheapness.

    Code Generator/Interpreter : Make calls to the query processor based on the work done by the optimizer.

    Query Processor : Execute the calls obtained from the code generator.

    Cost-based query Optimization: Algebraic Expressions

    If we had the following query:

        SELECT p.pname, d.dname FROM Patients p, Doctors d WHERE p.doctor = d.dname AND d.dgender = 'M'
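    To make the idea of equivalent algebraic expressions concrete, two of them can be evaluated for this query over a tiny in-memory sample: expression A applies the selection after a Cartesian product, expression B pushes the selection on dgender below the join. The data is invented for illustration, and counting intermediate tuples is a simplified stand-in for cost, not the cost model of any particular DBMS.

```python
# Hypothetical sample data for the Patients and Doctors relations.
patients = [{"pname": f"P{i}", "doctor": "Rao" if i % 2 else "Sen"} for i in range(6)]
doctors = [
    {"dname": "Rao", "dgender": "M"},
    {"dname": "Sen", "dgender": "F"},
    {"dname": "Das", "dgender": "M"},
]


def plan_a():
    """pi_{pname,dname}(sigma_{doctor = dname AND dgender = 'M'}(Patients x Doctors))"""
    product = [(p, d) for p in patients for d in doctors]  # Cartesian product first
    matched = [(p, d) for p, d in product if p["doctor"] == d["dname"] and d["dgender"] == "M"]
    result = sorted((p["pname"], d["dname"]) for p, d in matched)
    return result, [len(product), len(matched)]


def plan_b():
    """pi_{pname,dname}(Patients join_{doctor = dname} sigma_{dgender = 'M'}(Doctors))"""
    male = [d for d in doctors if d["dgender"] == "M"]  # selection pushed below the join
    joined = [(p, d) for p in patients for d in male if p["doctor"] == d["dname"]]
    result = sorted((p["pname"], d["dname"]) for p, d in joined)
    return result, [len(male), len(joined)]


for name, plan in (("A", plan_a), ("B", plan_b)):
    result, sizes = plan()
    print(name, result, "intermediate sizes:", sizes)
```

    Both expressions return the same three (pname, dname) pairs, but A materializes 18 intermediate tuples against at most 3 for B. On real tables that gap grows with the product of the relation sizes, which is why an optimizer compares equivalent expressions instead of executing the query as written.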


    SYNTAX ANALYZER :

    The syntax analyzer takes the query from the user, parses it into tokens, and analyzes the tokens and their order to make sure they comply with the rules of the language grammar. If an error is found in the query submitted by the user, the query is rejected, and an error code together with an explanation of why it was rejected is returned to the user.

    A simple form of the language grammar that could be used to implement SQL statements is given below:

        QUERY = SELECT + FROM + WHERE
        SELECT = SELECT +
        FROM = FROM +
        WHERE = WHERE + VALUE1 OP VALUE2
        VALUE1 = VALUE / COLUMN NAME
        VALUE2 = VALUE / COLUMN NAME
        OP = >, =,
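    A grammar of this shape can be checked by a small hand-written analyzer. This is a sketch under our own assumptions, not the notes' implementation: the column list is taken to be '*' or comma-separated names, the table is a single name, constants are numbers or single-quoted strings, and the operator set is a guess because the OP list above is cut off. The names tokenize, analyze, is_name and is_operand are hypothetical. As the text describes, a rejected query comes back with an explanation of why it was rejected.

```python
import re

# Assumption: the notes' OP list is cut off after ">, =", so this operator set is illustrative.
OPERATORS = {"=", "<>", "<", "<=", ">", ">="}
KEYWORDS = {"SELECT", "FROM", "WHERE"}
# One token: a comparison operator, comma or '*', a quoted string, a number, or a word.
TOKEN_RE = re.compile(r"<=|>=|<>|[=<>,*]|'[^']*'|\d+(?:\.\d+)?|[A-Za-z_][A-Za-z0-9_.]*")


def tokenize(sql):
    """Split the query text into tokens; reject any character the grammar does not know."""
    tokens, pos = [], 0
    while pos < len(sql):
        if sql[pos].isspace():
            pos += 1
            continue
        match = TOKEN_RE.match(sql, pos)
        if match is None:
            raise ValueError(f"unexpected character {sql[pos]!r} at position {pos}")
        tokens.append(match.group())
        pos = match.end()
    return tokens


def is_name(tok):
    """A column or table name: a word that is not a reserved keyword."""
    return re.fullmatch(r"[A-Za-z_][A-Za-z0-9_.]*", tok) is not None and tok.upper() not in KEYWORDS


def is_operand(tok):
    """VALUE1 / VALUE2: a constant (number or quoted string) or a column name."""
    return is_name(tok) or re.fullmatch(r"'[^']*'|\d+(?:\.\d+)?", tok) is not None


def analyze(sql):
    """Check QUERY = SELECT + FROM + WHERE; return (True, "") or (False, explanation)."""
    try:
        tokens = tokenize(sql)
        pos = 0

        def take(check, expected):
            nonlocal pos
            tok = tokens[pos] if pos < len(tokens) else None
            if tok is None or not check(tok):
                found = "end of query" if tok is None else repr(tok)
                raise ValueError(f"expected {expected}, found {found}")
            pos += 1
            return tok

        take(lambda t: t.upper() == "SELECT", "SELECT")
        if pos < len(tokens) and tokens[pos] == "*":
            take(lambda t: t == "*", "'*'")
        else:
            take(is_name, "column name")
            while pos < len(tokens) and tokens[pos] == ",":
                take(lambda t: t == ",", "','")
                take(is_name, "column name")
        take(lambda t: t.upper() == "FROM", "FROM")
        take(is_name, "table name")
        take(lambda t: t.upper() == "WHERE", "WHERE")
        take(is_operand, "value or column name")
        take(lambda t: t in OPERATORS, "comparison operator")
        take(is_operand, "value or column name")
        if pos < len(tokens):
            raise ValueError(f"unexpected {tokens[pos]!r} after the WHERE condition")
    except ValueError as err:
        return False, str(err)
    return True, ""


print(analyze("SELECT pname FROM Patients WHERE doctor = 'Rao'"))  # (True, '')
print(analyze("SELECT pname Patients WHERE doctor = 'Rao'"))       # (False, "expected FROM, found 'Patients'")
```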


    QUERY DECOMPOSITION :

    The aims of query decomposition are (1) to transform a high-level query into a relational algebra query, and (2) to check that the query is syntactically and semantically correct.

    The typical stages of query decomposition are analysis, normalization, semantic analysis, simplification, and query restructuring.

    ANALYSIS :


    NORMALIZATION :


    SEMANTIC ANALYSIS :


    QUERY SIMPLIFIER :


    QUERY RECONSTRUCTION :


    QUERY OPTIMIZATION :


    The main aim of query optimization is to choose the most efficient way of implementing the relational algebra operations at the lowest possible cost. Therefore, the query optimizer should not depend solely on heuristic rules; it should also estimate the cost of executing the different strategies and find the strategy with the minimum cost estimate. The method of optimizing a query by choosing the strategy that results in the minimum cost is called cost-based query optimization. Cost-based query optimization uses formulae that estimate the costs of a number of options and selects the one with the lowest cost, that is, the most efficient to execute. The cost functions used in query optimization are estimates, not exact cost functions, so the optimizer may select a query execution strategy that is not actually the optimal one.

    The cost of an operation depends heavily on its selectivity, that is, the proportion of the input relation(s) that forms the output. In general, different algorithms are suitable for low- and high-selectivity queries. For a query optimizer to choose a suitable algorithm for an operation, an estimate of the cost of executing that algorithm must be provided. The cost of an algorithm depends on the cardinality of its input. To estimate the cost of different query execution strategies, the query tree is viewed as a series of basic operations linked together to perform the query. Each basic operation has an associated cost function whose argument(s) are the cardinality of its input(s). It is also important to know the expected cardinality of an operation's output, since this forms the input to the next operation in the tree. The expected cardinalities are derived from statistical estimates of a query's selectivity, that is, the proportion of tuples satisfying the query.
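    As a rough illustration of why the best algorithm depends on how many tuples qualify, the sketch below compares two ways of evaluating one equality selection: a linear scan and a lookup through a secondary index. The cost formulas are simplified textbook-style assumptions counted in block reads (a scan reads all b_r blocks; an index lookup reads index_height index blocks plus, at worst, one data block per matching tuple), and all names and numbers are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class TableStats:
    n_r: int  # number of tuples in the relation
    b_r: int  # number of disk blocks holding the relation


def linear_scan_cost(stats):
    """Full table scan: every block is read once, however many tuples qualify."""
    return stats.b_r


def secondary_index_cost(stats, selectivity, index_height=3):
    """Secondary (non-clustering) index: walk index_height index blocks, then, in the
    worst case, read one data block per matching tuple."""
    matching = math.ceil(selectivity * stats.n_r)
    return index_height + matching


def choose_access_path(stats, selectivity, index_height=3):
    """Return the cheaper algorithm for the selection and its estimated block reads."""
    scan = linear_scan_cost(stats)
    index = secondary_index_cost(stats, selectivity, index_height)
    return ("index scan", index) if index < scan else ("linear scan", scan)


patients = TableStats(n_r=10_000, b_r=500)   # hypothetical statistics
print(choose_access_path(patients, 0.001))    # ('index scan', 13)
print(choose_access_path(patients, 0.2))      # ('linear scan', 500)
```

    When only 0.1% of the tuples qualify the index wins easily, but at 20% the per-tuple block reads exceed the cost of simply scanning the table, so the optimizer needs the selectivity estimate before it can pick an algorithm.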

    Cost Components of Query Execution :-

    The success of estimating the size and cost of intermediate relational algebra operations depends on the amount and accuracy of the statistical information stored in the DBMS.
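    How stored statistics turn into estimated cardinalities, and how one operation's estimated output becomes the next operation's input, can be sketched with the classic textbook estimates that assume uniformly distributed, independent attribute values: an equality selection returns about n_r / V(A, r) tuples, and an equi-join returns about n_r · n_s / max(V(A, r), V(A, s)). Here n_r is the number of tuples and V(A, r) the number of distinct values of attribute A, both read from the catalog; the class and function names and all numbers are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class RelStats:
    """Catalog statistics for a (base or intermediate) relation r."""
    n: float                                      # n_r: number of tuples
    distinct: dict = field(default_factory=dict)  # V(A, r): distinct values per attribute


def estimate_select_eq(r, attr):
    """sigma_{attr = constant}(r): about n_r / V(attr, r) tuples (uniformity assumption)."""
    out_n = r.n / r.distinct[attr]
    # An attribute cannot have more distinct values than the result has tuples.
    distinct = {a: min(v, out_n) for a, v in r.distinct.items()}
    distinct[attr] = 1  # the selected attribute now has a single value
    return RelStats(out_n, distinct)


def estimate_join(r, s, r_attr, s_attr):
    """r join_{r_attr = s_attr} s: about n_r * n_s / max(V(r_attr, r), V(s_attr, s))."""
    out_n = r.n * s.n / max(r.distinct[r_attr], s.distinct[s_attr])
    distinct = {a: min(v, out_n) for a, v in {**r.distinct, **s.distinct}.items()}
    return RelStats(out_n, distinct)


# Hypothetical catalog entries for the Patients/Doctors example.
patients = RelStats(10_000, {"pname": 10_000, "doctor": 200})
doctors = RelStats(200, {"dname": 200, "dgender": 2})

male_doctors = estimate_select_eq(doctors, "dgender")              # output of the selection...
result = estimate_join(patients, male_doctors, "doctor", "dname")  # ...is the join's input
print(male_doctors.n, result.n)  # 100.0 5000.0
```

    The estimated selection output (100 male doctors) feeds the join estimate (5,000 result tuples). If the stored statistics are stale or the uniformity assumption does not hold, every estimate built on top of them inherits the error, which is why the accuracy of the catalog statistics matters so much.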


    STRUCTURE OF QUERY EVALUATION PLAN :


    PIPELINING AND MATERIALIZATION :


    SOME QUESTIONS

    Q . 1 Explain inter-query parallelism?

    Ans : Inter-query parallelism is a form of parallelism in the evaluation of database queries, in

    which several different queries execute concurrently on multiple processors to improve the

    overall throughput of the system.

    When multiple non-conflicting requests are submitted to a database management system, the system can execute them in parallel to improve overall throughput. This form of parallelism is called inter-query parallelism. Inter-query parallelism is a consequence of the concurrency of user requests. It is orthogonal to intra-query parallelism, in which several processors cooperate to execute a single query faster.

    Inter-query parallelism results from the ability to execute multiple queries at the same time, while intra-query parallelism is achieved by breaking up a single query into a number of subqueries, each of which is executed at a different site, accessing a different part of the distributed database.
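    The contrast between the two forms can be sketched with Python's standard concurrent.futures module. This is a structural illustration under our own assumptions: worker threads stand in for processors or sites, the relation and the counting query are hypothetical, and because of the global interpreter lock in standard CPython, threads here show how the work is divided, not a real CPU speed-up.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical relation; each worker thread stands in for a processor or site.
patients = [{"pname": f"P{i}", "age": i % 90} for i in range(10_000)]


def count_older_than(rows, age):
    """A simple read-only query: SELECT COUNT(*) FROM rows WHERE age > :age."""
    return sum(1 for row in rows if row["age"] > age)


def inter_query(ages, workers=4):
    """Inter-query parallelism: several independent queries run concurrently,
    one per worker, which raises overall throughput."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(count_older_than, patients, age) for age in ages]
        return [f.result() for f in futures]


def intra_query(age, partitions=4):
    """Intra-query parallelism: ONE query is split into subqueries over disjoint
    partitions of the data; the partial counts are then combined."""
    size = -(-len(patients) // partitions)  # ceiling division
    chunks = [patients[i:i + size] for i in range(0, len(patients), size)]
    with ThreadPoolExecutor(max_workers=partitions) as pool:
        return sum(pool.map(count_older_than, chunks, [age] * len(chunks)))


print(inter_query([10, 50, 80]))
print(intra_query(50))
```

    In inter_query each query still runs sequentially and only different queries overlap, while intra_query needs an extra combine step (the final sum) because one query's answer is assembled from partial results.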

    If the user access to the distributed database consisted only of querying (i.e.,

    read-only access), then provision of inter-query and intra-query parallelism would

    imply that as much of the database as possible should be replicated. However, since most database

    accesses are not read-only, the mixing of read and update operations requires the implementation of

    elaborate concurrency control and commit protocols.

    o Queries/transactions execute in parallel with one another.

    o Increases transaction throughput; used primarily to scale up a transaction processing system to

    support a larger number of transactions per second.

    o Easiest form of parallelism to support, particularly in a shared-memory parallel database, because

    even sequential database systems support concurrent processing.


    o More complicated to implement on shared-disk or shared-nothing architectures

    o Locking and logging must be coordinated by passing messages between processors.

    o Data in a local buffer may have been updated at another processor.

    o Cache coherency has to be maintained: reads and writes of data in the buffer must find the latest version of the data.

    Cache Coherency Protocol

    o Example of a cache coherency protocol for shared disk systems:

    o Before reading/writing to a page, the page must be locked in shared/exclusive mode.

    o On locking a page, the page must be read from disk

    o Before unlocking a page, the page must be written to disk if it was modified.

    o More complex protocols with fewer disk reads/writes exist.

    o Cache coherency protocols for shared-nothing systems are similar. Each database page is assigned

    a home processor. Requests to fetch the page or write it to disk are sent to the home processor.
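    The shared-disk protocol listed above can be simulated in a few lines. This single-threaded Python sketch is our own illustration, not a real DBMS component: the disk is a dictionary, LockManager and Processor are hypothetical names, and a conflicting lock request raises an error instead of making the requester wait. It shows why the rule "on locking a page, the page must be read from disk" matters: a processor that re-reads the page on every lock sees another processor's update instead of a stale buffered copy.

```python
class LockManager:
    """Grants page locks in shared ("S") or exclusive ("X") mode. This sketch raises
    on a conflict; a real lock manager would make the requester wait."""

    def __init__(self):
        self.held = {}  # page -> (mode, set of processor names)

    def lock(self, page, mode, who):
        held_mode, owners = self.held.get(page, (None, set()))
        if owners - {who} and (mode == "X" or held_mode == "X"):
            raise RuntimeError(f"{who} must wait: {page} is locked in mode {held_mode}")
        new_mode = "X" if "X" in (mode, held_mode) else "S"
        self.held[page] = (new_mode, owners | {who})

    def unlock(self, page, who):
        mode, owners = self.held[page]
        owners = owners - {who}
        if owners:
            self.held[page] = (mode, owners)
        else:
            del self.held[page]

    def mode_of(self, page, who):
        mode, owners = self.held.get(page, (None, set()))
        return mode if who in owners else None


class Processor:
    """One processor of a shared-disk system, with its own local buffer."""

    def __init__(self, name, disk, locks):
        self.name, self.disk, self.locks = name, disk, locks
        self.buffer, self.dirty = {}, set()

    def acquire(self, page, mode):
        # Lock the page in shared/exclusive mode before reading or writing it.
        self.locks.lock(page, mode, self.name)
        # On locking, read the page from disk, discarding any stale local copy.
        self.buffer[page] = self.disk.get(page)

    def read(self, page):
        if self.locks.mode_of(page, self.name) is None:
            raise RuntimeError("read without a lock")
        return self.buffer[page]

    def write(self, page, value):
        if self.locks.mode_of(page, self.name) != "X":
            raise RuntimeError("write without an exclusive lock")
        self.buffer[page] = value
        self.dirty.add(page)

    def release(self, page):
        # Before unlocking, write the page back to disk if it was modified.
        if page in self.dirty:
            self.disk[page] = self.buffer[page]
            self.dirty.discard(page)
        self.locks.unlock(page, self.name)


disk = {"P1": 100}  # the shared disk: page id -> contents
locks = LockManager()
a, b = Processor("A", disk, locks), Processor("B", disk, locks)

b.acquire("P1", "S")
b.release("P1")            # B now holds a buffered copy of the old value 100
a.acquire("P1", "X")
a.write("P1", 150)
a.release("P1")            # A's update is flushed to disk on unlock
b.acquire("P1", "S")
print(b.read("P1"))        # 150: B re-read the page from disk when it locked it
b.release("P1")
```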

    Q . 2 Discuss cost estimation in query optimization.

    Ans: See the discussion of cost-based query optimization above.