

    NAME - KRUSHITHA.V.P

    ROLL NO. - 520791371

    ASSIGNMENT SET 2

    SUBJECT - MC0077

    ADVANCED DATABASE

    SYSTEM


    August 2010

Master of Computer Application (MCA) Semester 4, MC0077 Advanced Database Systems

    Assignment Set 2

6. Explain the following concepts with respect to Distributed Database Systems: A) Data Replication B) Options for Multi-Master Replication

    A) Data Replication

Replication is the process of copying and maintaining database objects, such

    as tables, in multiple databases that make up a distributed database system.

    Changes applied at one site are captured and stored locally before being

forwarded and applied at each of the remote locations. Advanced Replication is a fully integrated feature of the Oracle server; it is not a separate server.

    Replication uses distributed database technology to share data between

    multiple sites, but a replicated database and a distributed database are not

    the same. In a distributed database, data is available at many locations, but a

    particular table resides at only one location. For example, the employees table

    resides at only the loc1.world database in a distributed database system that

    also includes the loc2.world and loc3.world databases. Replication means

    that the same data is available at multiple locations. For example, the

    employees table is available at loc1.world, loc2.world, and loc3.world. Some

    of the most common reasons for using replication are described as follows:

    Availability

Replication improves the availability of applications because it provides them with alternative data-access options: if one site becomes unavailable, users can continue to query, and possibly update, the data held at the remaining sites.

    Performance

    Replication provides fast, local access to shared data because it balances

    activity over multiple sites. Some users can access one server while other

    users access different servers, thereby reducing the load at all servers. Also,

    users can access data from the replication site that has the lowest access

    cost, which is typically the site that is geographically closest to them.


    Disconnected Computing

    A Materialized View is a complete or partial copy (replica) of a target table

    from a single point in time. Materialized views enable users to work on a

    subset of a database while disconnected from the central database server.

    Later, when a connection is established, users can synchronize (refresh)

    materialized views on demand. When users refresh materialized views, they

    update the central database with all of their changes, and they receive any

    changes that may have happened while they were disconnected.
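As a minimal Oracle-style sketch of this idea (the table, column and site names reuse or extend the examples in this section and are otherwise assumptions; a fast-refreshable view also assumes a materialized view log at the master site):

CREATE MATERIALIZED VIEW emp_mv
  REFRESH FAST                                -- incremental refresh; assumes a materialized view log at the master
  AS SELECT employee_id, last_name, department_id
     FROM   employees@loc1.world;             -- master table accessed over a database link

-- later, after the connection is re-established, synchronize the local replica on demand
BEGIN
  DBMS_MVIEW.REFRESH('emp_mv');
END;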

    Network Load Reduction

Replication can be used to distribute data over multiple regional locations. Then, applications can access various regional servers instead of accessing

    one central server. This configuration can reduce network load dramatically.

    Mass Deployment

Deployment templates make it possible to pre-create a materialized view environment once and then roll it out to a large number of sites, for example to the machines of a mobile sales force, with each deployed site receiving its own local copy of the relevant data.

    B) Options for Multi Master Replication

    Multi Master Replication (also called peer-to-peer or n-way replication)

    enables multiple sites, acting as equal peers, to manage groups of replicated

    database objects. Each site in a multi-master replication environment is a

    master site, and each site communicates with the other master sites.

    Options for Multi-Master Replication

    Asynchronous replication is the most common way to implement multi-

    master replication. However, you have two other options: Synchronous

    Replication and Procedural Replication.

    A Multi-Master replication environment can use either asynchronous or

    synchronous replication to copy data. With asynchronous replication, changes

    made at one master site occur at a later time at all other participating master

    sites. With synchronous replication, changes made at one master site occur

    immediately at all other participating master sites.


    When you use synchronous replication, an update of a table results in the

    immediate replication of the update at all participating master sites. In fact,

    each transaction includes all master sites. Therefore, if one master site

    cannot process a transaction for any reason, then the transaction is rolled

    back at all master sites.

Although you avoid the possibility of conflicts when you use synchronous replication, it requires a very stable environment to operate smoothly. If communication with one master site is not possible because of a network problem, then users cannot complete a transaction involving the replicated data at any master site until the connection is restored.

    Procedural Replication

    Batch processing applications can change large amounts of data within a

    single transaction. In such cases, typical row-level replication might load a

    network with many data changes. To avoid such problems, a batch

processing application operating in a replication environment can use Oracle's Procedural Replication to replicate simple stored procedure calls

    to converge data replicas. Procedural replication replicates only the call to a

    stored procedure that an application uses to update a table. It does not

    replicate the data modifications themselves.

    To use procedural replication, you must replicate the packages that modify

    data in the system to all sites. After replicating a package, you must generate

    a wrapper for the package at each site. When an application calls a

    packaged procedure at the local site to modify data, the wrapper ensures

that the call is ultimately made to the same packaged procedure at all other sites in the replication environment. Procedural replication can occur

    asynchronously or synchronously.

    Conflict Detection and Procedural Replication

When replicating data using procedural replication, the procedures that

    replicate data are responsible for ensuring the integrity of the replicated

    data. That is, you must design such procedures to either avoid or detect

    replication conflicts and to resolve them appropriately. Consequently,

    procedural replication is most typically used when databases are modified

only with large batch operations. In such situations, replication conflicts are unlikely because numerous transactions are not contending for the same

    data.

5. Explain the following concepts in the context of Fuzzy Databases: A) Need for Fuzzy Databases


B) Techniques for implementation of Fuzziness in Databases C) Classification of Data

    A) Need for Fuzzy Databases


As the application of database technology moves outside the realm of a crisp mathematical world to the realm of the real world, the need to handle imprecise information becomes important. A database that can handle imprecise information stores not only raw data but also related information that allows us to interpret the data in a much deeper context; e.g. the query "Which student is young and has sufficiently good grades?" captures the real intention of the user better than a crisp query such as

    SELECT * FROM STUDENT

    WHERE AGE < 19 AND GPA > 3.5

    Such a technology has wide applications in areas such as medical diagnosis,

    employment, investment etc. because in such areas subjective and uncertain

    information is not only common but also very important.
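In a classical SQL database such a fuzzy query can only be approximated. A rough sketch (the membership functions, cut-off values and the Name column are illustrative assumptions; AGE and GPA come from the crisp query above) computes degrees for "young" and "good grades" and ranks students by the fuzzy AND, taken here as the minimum of the two degrees:

SELECT Name, AGE, GPA, young_degree, good_grades_degree
FROM  (SELECT Name, AGE, GPA,
              CASE WHEN AGE <= 18 THEN 1.0          -- fully "young"
                   WHEN AGE >= 25 THEN 0.0          -- not "young" at all
                   ELSE (25 - AGE) / 7.0 END AS young_degree,
              CASE WHEN GPA >= 3.8 THEN 1.0         -- fully "good grades"
                   WHEN GPA <= 3.0 THEN 0.0
                   ELSE (GPA - 3.0) / 0.8 END AS good_grades_degree
       FROM STUDENT) s
ORDER BY LEAST(young_degree, good_grades_degree) DESC;  -- fuzzy AND = minimum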

    B) Techniques for implementation of Fuzziness in Databases

    One of the major concerns in the design and implementation of fuzzy

databases is efficiency, i.e. these systems must be fast enough to make interaction with human users feasible. In general, we have two feasible ways to incorporate fuzziness in databases:

    1. Making fuzzy queries to the classical databases

    2. Adding fuzzy information to the system

    C) Classification of Data

The information data can be classified as follows:

    1. Crisp: There is no vagueness in the information.

    e.g., X = 13

    Temperature = 90

2. Fuzzy: There is vagueness in the information, and this can be further divided into two types:


a. Approximate Value: The information is not totally vague; an approximate value is known and the data lies near that value.

e.g., 10 ≤ X ≤ 15

Temperature ≈ 85

These are considered to have a triangular-shaped possibility distribution, as shown below.

    Possibility Distribution for an approximate value

The parameter d gives the range around the approximate value within which the information lies.
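In symbols, a triangular possibility distribution around an approximate value m with spread d can be sketched (the exact shape used in the source figure is not reproduced in this transcript) as

\pi(x) = \max\left(0,\; 1 - \frac{|x - m|}{d}\right)

so that \pi(m) = 1 and \pi(x) = 0 once x lies farther than d from m; for the example above one might take m = 12.5 and d = 2.5, giving \pi(10) = \pi(15) = 0.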

    b. Linguistic Variable: A linguistic variable is a variable that apart from

    representing a fuzzy number also represents linguistic concepts interpreted

in a particular context. Each linguistic variable is defined in terms of a base variable which either has a physical interpretation (speed, weight, etc.) or is some other numerical variable (salary, absences, GPA, etc.). A linguistic variable is fully characterized by a quintuple (v, T, X, g, m), where:

v is the name of the linguistic variable;

T is the set of linguistic terms that apply to this variable;

X is the universal set of values of the base variable;

g is a grammar for generating the linguistic terms;

m is a semantic rule that assigns to each term t in T its meaning m(t), which is a fuzzy set on X.

The information in this case is totally vague and we associate a fuzzy set with the information. A linguistic term is the name given to the fuzzy set, e.g.,

    X is SMALL


    Temperature is HOT

These are considered to have a trapezoidal-shaped possibility distribution, as

    shown below

Possibility Distribution for a Linguistic Term SMALL for the Linguistic Variable HEIGHT
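A trapezoidal possibility distribution for a linguistic term such as SMALL can be sketched, for assumed break-points a ≤ b ≤ c ≤ e (the actual break-points in the source figure are not reproduced here), as

\pi(x) =
\begin{cases}
0 & x \le a \ \text{or}\ x \ge e \\
(x - a)/(b - a) & a < x < b \\
1 & b \le x \le c \\
(e - x)/(e - c) & c < x < e
\end{cases}

i.e. full membership on the plateau [b, c] and linearly decreasing possibility on either side.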

4. Describe the following Data Mining Functions: A) Classification B) Associations C) Sequential/Temporal Patterns D) Clustering/Segmentation

Data mining methods may be classified by the function they perform or according to the class of application they can be used in.

    A) Classification

    Data Mining tools have to infer a model from the database, and in the case of

    Supervised Learning this requires the user to define one or more classes. The

    database contains one or more attributes that denote the class of a tuple and

    these are known as predicted attributes whereas the remaining attributes are

    called predicting attributes. A combination of values for the predicted

    attributes defines a class.
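As a small illustration (the table, columns, thresholds and class labels below are assumptions, not part of the source), a classification rule that derives a predicted attribute from predicting attributes can be written directly in SQL:

SELECT Id, AGE, GPA,
       CASE WHEN GPA >= 3.5 AND AGE < 25 THEN 'good_candidate'  -- class defined by conditions
            ELSE 'other'                                         -- on the predicting attributes
       END AS predicted_class
FROM   STUDENT;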

When learning classification rules the system has to find the rules that predict the class from the predicting attributes. First, the user has to define conditions for each class; the data mining system then constructs descriptions for the classes.

Once classes are defined, the system should infer the rules that govern the classification; in other words, the system should be able to find the description of each class. The descriptions should refer only to the predicting attributes of the training set, so that positive examples of a class satisfy the description while negative examples do not.


C) Sequential/Temporal Patterns

    Sequential/temporal pattern functions analyze a collection of records over a

    period of time for example to identify trends. Where the identity of a

    customer who made a purchase is known an analysis can be made of the

    collection of related records of the same structure (i.e. consisting of a number

    of items drawn from a given collection of items). The records are related by

the identity of the customer who did the repeated purchases. Such a situation is typical of a direct mail application where, for example, a catalogue merchant

    has the information, for each customer, of the sets of products that the

    customer buys in every purchase order. A sequential pattern function will

    analyze such collections of related records and will detect frequently

    occurring patterns of products bought over time. A sequential pattern

    operator could also be used to discover for example the set of purchases that

    frequently precedes the purchase of a microwave oven.

    Sequential pattern mining functions are quite powerful and can be used to

    detect the set of customers associated with some frequent buying patterns.

    Use of these functions on for example a set of insurance claims can lead to

    the identification of frequently occurring sequences of medical procedures

    applied to patients which can help identify good medical practices as well as

    to potentially detect some medical insurance fraud.

    D) Clustering/Segmentation

    Clustering and Segmentation are the processes of creating a partition so that

    all the members of each set of the partition are similar according to some

    metric. A Cluster is a set of objects grouped together because of their

similarity or proximity. Objects are often decomposed into an exhaustive and/or mutually exclusive set of clusters.

    Clustering according to similarity is a very powerful technique, the key to it

    being to translate some intuitive measure of similarity into a quantitative

    measure. When learning is unsupervised then the system has to discover its

    own classes i.e. the system clusters the data in the database. The system has

    to discover subsets of related objects in the training set and then it has to

    find descriptions that describe each of these subsets.

    There are a number of approaches for forming clusters. One approach is to

    form rules which dictate membership in the same group based on the level of

    similarity between members. Another approach is to build set functions that

    measure some property of partitions as functions of some parameter of the

    partition.

    3. Explain:


    A) Data Dredging B) Data Mining Techniques

    A) Data Dredging

Data Dredging or Data Fishing are terms one may use to criticize someone's data mining efforts when it is felt that the patterns or causal relationships discovered are unfounded. In this case the pattern suffers from overfitting on the training data.

Data Dredging is the scanning of the data for any relationship and then, when one is found, coming up with an interesting explanation for it. The conclusions may be suspect because data sets with large numbers of variables contain, purely by chance, some "interesting" relationships. Fred Schwed said:

    "There have always been a considerable number of people who busy

themselves examining the last thousand numbers which have appeared on a roulette wheel, in search of some repeating pattern. Sadly enough, they have

    usually found it."

    Nevertheless, determining correlations in investment analysis has proven to

    be very profitable for statistical arbitrage operations (such as pairs trading

    strategies), and correlation analysis has shown to be very useful in risk

    management. Indeed, finding correlations in the financial markets, when

    done properly, is not the same as finding false patterns in roulette wheels.

    Some exploratory data work is always required in any applied statistical

    analysis to get a feel for the data, so sometimes the line between good

    statistical practice and data dredging is less than clear. Most data mining

    efforts are focused on developing highly detailed models of some large data

    set. Other researchers have described an alternate method that involves

    finding the minimal differences between elements in a data set, with the goal

    of developing simpler models that represent relevant data.

When data sets contain a large number of variables, the required level of statistical significance should take into account the number of patterns that were tested. For

    example, if we test 100 random patterns, it is expected that one of them will

    be "interesting" with a statistical significance at the 0.01 level.

    Cross Validation is a common approach to evaluating the fitness of a model

    generated via data mining, where the data is divided into a training subset

    and a test subset to respectively build and then test the model. Common

    cross validation techniques include the holdout method, k-fold cross

    validation, and the leave-one-out method.
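Expressed as a formula, k-fold cross validation estimates model quality as the average error over the k held-out folds,

\hat{E}_{cv} = \frac{1}{k} \sum_{i=1}^{k} E_i,

where E_i is the error measured on fold i of a model trained on the remaining k - 1 folds; the holdout method corresponds to a single split, and leave-one-out to k equal to the number of records.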

    B) Data Mining Techniques


    Cluster Analysis

In an unsupervised learning environment the system has to discover its own classes, and one way in which it does this is to cluster the data in the database as shown in the following diagram. The first step is to discover subsets of related objects and then find descriptions (e.g. D1, D2, D3) that describe each of these subsets.

    Clustering and segmentation basically partition the database so that each

    partition or group is similar according to some criteria or metric. Clustering

    according to similarity is a concept which appears in many disciplines. If a

    measure of similarity is available there are a number of techniques for

    forming clusters. Membership of groups can be based on the level of

    similarity between members and from this the rules of membership can be

    defined. Another approach is to build set functions that measure some

    property of partitions i.e. groups or subsets as functions of some parameter

    of the partition. This latter approach achieves what is known as optimal

    partitioning.

    Many data mining applications make use of clustering according to similarity

    for example to segment a client/customer base. Clustering according to

    optimization of set functions is used in data analysis.

    Clustering/segmentation in databases are the processes of separating a dataset into components that reflect a consistent pattern of behavior. Once the

    patterns have been established they can then be used to "deconstruct" data

    into more understandable subsets and also they provide sub-groups of a

    population for further analysis or action which is important when dealing with

    very large databases.


    Induction

A database is a store of information, but more important is the information which can be inferred from it. There are two main inference techniques available, i.e. deduction and induction.

    Deduction is a technique to infer information that is a logical

    consequence of the information in the database

    Induction has been described earlier as the technique to infer

    information that is generalised from the database as in the example

    mentioned above to infer that each employee has a manager. This is

    higher level information or knowledge in that it is a general statement

    about objects in the database. The database is searched for patterns

    or regularities.

    Decision Trees

Decision Trees are a simple knowledge representation; they classify examples into a finite number of classes. The nodes are labelled with attribute names, the edges are labelled with possible values for the attribute, and the leaves are labelled with the different classes. Objects are classified by following a path down the tree, taking the edges corresponding to the values of the attributes in the object.

The objects contain information on the outlook, humidity, etc. Some objects are positive examples, denoted by P, and others are negative, i.e. N. Classification is in this case the construction of a tree structure, which can be used to classify all the objects correctly.

    Decision Tree Structure

    Rule Induction


A Data Mining System has to infer a model from the database; that is, it may define classes such that the database contains one or more attributes that denote the class of a tuple, i.e. the predicted attributes, while the remaining attributes are the predicting attributes. A class can then be defined by conditions on the attributes. When the classes are defined the system should be able to infer the rules that govern classification; in other words, the system should find the description of each class.

Production rules have been widely used to represent knowledge in expert systems, and they have the advantage of being easily interpreted by human experts because of their modularity, i.e. a single rule can be understood in isolation and doesn't need reference to other rules. The propositional-like structure of such rules has been described earlier but can be summed up as if-then rules.

    Neural Networks

    Neural Networks are an approach to computing that involves developing

    mathematical structures with the ability to learn. The methods are the result

    of academic investigations to model nervous system learning. Neural

    Networks have the remarkable ability to derive meaning from complicated or

    imprecise data and can be used to extract patterns and detect trends that

    are too complex to be noticed by either humans or other computer

    techniques.

    Neural Networks have broad applicability to real world business problems and

have already been successfully applied in many industries. Since neural networks are best at identifying patterns or trends in data, they are well

    suited for prediction or forecasting needs including:

Sales Forecasting

Industrial Process Control

    Customer Research

    Data Validation

    Risk Management

    Target Marketing etc.

    The structure of a neural network looks something like the following:


    Structure of a neural network

    On-line Analytical processing

    A major issue in information processing is how to process larger and larger

    databases, containing increasingly complex data, without sacrificing response

    time. The client/server architecture gives organizations the opportunity to

    deploy specialized servers which are optimized for handling specific data

    management problems. Until recently, organizations have tried to target

    Relational Database Management Systems (RDBMSs) for the complete

    spectrum of database applications. It is however apparent that there are

    major categories of database applications which are not suitably serviced by

relational database systems. Oracle, for example, has built a totally new Media Server for handling multimedia applications. Sybase uses an Object

    Oriented DBMS (OODBMS) in its Gain Momentum product which is designed

    to handle complex data such as images and audio. Another category of

    applications is that of On-Line Analytical Processing (OLAP).

OLAP has been characterized, originally by E. F. Codd, in terms of twelve rules:

Multidimensional Conceptual View

Transparency

    Accessibility

    Consistent Reporting Performance

    Client/Server Architecture

    Generic Dimensionality

    Dynamic Sparse Matrix Handling


    Multi-User Support

    Unrestricted Cross Dimensional Operations

    Intuitive Data Manipulation

    Flexible Reporting

    Unlimited Dimensions and Aggregation Levels

An alternative definition of OLAP has been supplied by Nigel Pendse, who defines OLAP as "Fast Analysis of Shared Multidimensional Information", which means:

Fast in that users should get a response in seconds and so do not lose their

    chain of thought;

    Analysis in that the system can provide analysis functions in an intuitive

manner and that the functions should supply business logic and statistical analysis relevant to the user's application.

    Shared from the point of view of supporting multiple users concurrently;

    Multidimensional as a main requirement so that the system supplies a

    multidimensional conceptual view of the data including support for multiple

    hierarchies;

    Information is the data and the derived information required by the user

    application.

    It is essentially a way to build associations between dissimilar pieces of

    information using predefined business rules about the information you are

    using. Kirk Cruikshank of Arbor Software has identified three components to

    OLAP, in an issue of UNIX News on data warehousing;

A multidimensional database must be able to express complex business calculations very easily. The data must be referenced and mathematics defined. In a relational system there is no relation between line items, which makes it very difficult to express business mathematics.

Intuitive navigation in order to 'roam around' the data, which requires mining hierarchies.

Instant response, i.e. the need to give the user the information as quickly as possible.

    Data Visualization


    Data visualization makes it possible for the analyst to gain a deeper, more

intuitive understanding of the data and as such can work well alongside data

    mining. Data mining allows the analyst to focus on certain patterns and

    trends and explore in-depth using visualization. On its own data visualization

    can be overwhelmed by the volume of data in a database but in conjunction

    with data mining can help with exploration.

2. Describe the following with respect to SQL3 DB specification: A) Complex Structures B) Hierarchical Structures C) Relationships D) Large OBjects, LOBs E) Storage of LOBs

    SQL3, defined as a standard in 1999, supports all SQL2 functions and

    provides an extended set of data-types, including user-defined data types

    and functions. Unfortunately, the SQL3 standard came after many of its

    features had been implemented in different ORDBMS Systems. One of the

    first was presented by Object Services and Consulting, Inc, probably in 1996

    or 97, in their posting of an object-oriented presentation of SQL3, similar to

that implemented in Informix. This has led to the existence of various dialects. For example, IBM's DB2 and Oracle's ORDBMS support slightly different versions of data-types, structures, and features.

    A) Complex Structures

1. Create row type Address_t defines the address structure that is used in line 8.

    2. Street#, Street, are regular SQL2 specifications for atomic attributes.

3. PostCode and Geo-Loc are both defined as having user-defined data types, Pcode and Point respectively. Pcode is typically locally defined as a list or table of valid postal codes, perhaps with the post office name.

    4. Create function Age_f defines a function for calculation of an age, as a

decimal value, given a start date as the input argument and using a simple algorithm based on the current date. This function is used as the data type in line 9 and will be activated each time the Person.age attribute is retrieved.

    The function can also be used as a condition clause in a SELECT statement.

5. Create table PERSON initiates specification of the implementation structure for the Person entity-type.


6. Id is defined as the primary key. The not null phrase only controls that some non-null value is given. The primary key phrase indicates that the DBMS is to guarantee that the set of values for Id is unique.

    7. Name has a data-type, PersName, defined as a Row type similar to the one

defined in lines 1-3. BirthDate is a date that can be used as the argument for the function Age_f defined in line 4.

8. Address is defined using the row type Address_t, defined in lines 1-3. Picture is defined as a BLOB, or Binary Large Object. Here there are no functions for content search, manipulation or presentation which support BLOB data types. These must be defined either by the user as user-defined functions, UDFs, or by the ORDBMS vendor in a supplementary subsystem. In this case, we need functions for image processing.

9. Age is defined as a function, which will be activated each time the attribute is retrieved. This costs processing time (though this algorithm is very simple), but gives a correct value each time the attribute is used.
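The listing these numbered notes annotate is not reproduced in this transcript. A possible reconstruction, in an Informix-style SQL3 dialect, is sketched below; the exact syntax, column lengths, the PersName type and the body of Age_f are assumptions (the numbered comments map to the notes above):

CREATE ROW TYPE Address_t (            -- 1
    Street#    CHAR(6),                -- 2
    Street     VARCHAR(30),            -- 2
    PostCode   Pcode,                  -- 3: user-defined type
    Geo_Loc    Point );                -- 3: user-defined type

CREATE FUNCTION Age_f (BirthDate DATE) -- 4: age as a decimal value
    RETURNS DECIMAL(5,2)
    RETURN (CURRENT_DATE - BirthDate) / 365.25;

CREATE TABLE PERSON (                  -- 5
    Id        CHAR(11) NOT NULL PRIMARY KEY,   -- 6
    Name      PersName,                -- 7: row type defined like Address_t
    BirthDate DATE,                    -- 7
    Address   Address_t,               -- 8
    Picture   BLOB,                    -- 8
    Age       Age_f(BirthDate) );      -- 9: derived each time the attribute is read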

    B) Hierarchical Structures

    1. Create table STUDENT initiates specification of the implementation of a

    subclass entity type.

    2. GPA, Level, are the attributes for the subclass, here with simple SQL2

    data types.

3. under PERSON specifies the table as a subclass of the table PERSON. The DBMS thus knows that when the STUDENT table is requested, all attributes and

    functions in PERSON are also relevant. An OR-DBMS will store and use the

    primary key of PERSON as the key for STUDENT, and execute a join operation

    to retrieve the full set of attributes.

    4. Create table COURSE specifies a new table specification, as done for

    statements in lines 5 and 10 above.

    5. Id, Name, and Level are standard atomic attribute types with SQL2 data

    types. Id is defined as requiring a unique, non null value, as specified for

    PERSON in line 6 above.

    6. Note that attributes must have unique names within their tables, but the

    name may be reused, with different data domains in different tables.

    Both Id and Name are such attribute-names, appearing in both PERSON and

    COURSE, as is Level used in STUDENT and COURSE.


    7. Course.Description is defined as a character large object, CLOB. A CLOB

    data type has the same defined character-string functions as char, varchar,

    and long char, and can be compared to these. User_id is defined as Ucode,

    which is the name of a user defined data type, presumably a list of

    acceptable user codes. The DB implementer must define both the data type

    and the appropriate functions for processing this type.

8. User_Id is also specified as a foreign key which links the Course records to

    their "user" record, modeled as a category sub entity type, through the

    primary key in the User table.
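Again as a hedged sketch of the listing being annotated (syntax, lengths and the name of the "user" table are assumptions; the numbered comments map to the notes above):

CREATE TABLE STUDENT (                 -- 1
    GPA    DECIMAL(3,2),               -- 2
    Level  SMALLINT )                  -- 2
UNDER PERSON;                          -- 3: subclass of PERSON, inherits its attributes and key

CREATE TABLE COURSE (                  -- 4
    Id          CHAR(8) NOT NULL PRIMARY KEY,   -- 5
    Name        VARCHAR(40),                    -- 5
    Level       SMALLINT,              -- 6: attribute name reused with a different domain
    Description CLOB,                  -- 7
    User_Id     Ucode REFERENCES Users(Id) );   -- 7, 8: user-defined type and foreign key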

    C) Relationships

    The relationship definition needs only SQL2 specifications.

{Sid, Cid, and Term} form the primary key, PK. Since the key is composite, a separate Primary key clause is required (as compared with the single-attribute PK specifications for PERSON.Id and COURSE.Id).

The two foreign-key attributes in the PK must be defined separately.

TakenBy.Report is a foreign key to a report entity-type, forming a ternary relationship as modeled in Figure 6.7a. The ON DELETE trigger is activated if the referenced Report row is deleted and assures that the FK link has a valid value, in this case null.
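A corresponding sketch of the relationship table (attribute types, and any column other than Sid, Cid, Term and Report, are assumptions; REPORT stands for the report entity-type mentioned above):

CREATE TABLE TakenBy (
    Sid    CHAR(11) NOT NULL REFERENCES PERSON(Id),   -- FK to the student's inherited key
    Cid    CHAR(8)  NOT NULL REFERENCES COURSE(Id),   -- FK to COURSE
    Term   CHAR(6)  NOT NULL,
    Report INTEGER  REFERENCES REPORT(Id)
                    ON DELETE SET NULL,               -- keep the FK link valid (null) if the report is deleted
    PRIMARY KEY (Sid, Cid, Term) );                   -- composite key needs a separate clause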


    SSM Concepts and Syntax

    D) Large OBjects, LOBs

    The SSM syntax includes data types for potentially very long media types,

such as text, image, audio and video. If this model is to be realized in a single DBMS, the DBMS must provide corresponding large-object data types.


DBMS vendors who provide differentiated blob types have also extended the basic SQL string comparison operators so that they will function for LOBs, or at least CLOBs. These operators include the pattern match function "LIKE", which gives a true/false response if the search string is found/not found in the *LOB attribute.
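For example (reusing the COURSE.Description CLOB defined earlier; the search string is arbitrary):

SELECT Id, Name
FROM   COURSE
WHERE  Description LIKE '%data mining%';   -- pattern match applied to a CLOB attribute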

    E) Storage of LOBs

There are three strategies for storing LOBs in an OR-DB:

1. Embedded in a column of the defining relation, or

2. Stored in a separate table within the DB, linked from the *LOB column of the defining relation, or

3. Stored on an external (local or geographically distant) medium, again linked from the *LOB column of the defining relation.

Embedded storage in the defining relation closely maps the logical view of the media object with its physical storage. This strategy is best if the other

    attributes of the table are primarily structural metadata used to specify

    display characteristics, for example length, language, format.

    The problem with embedded storage is that a DMS must transfer at least a

    whole tuple, more commonly a block of tuples, from storage for processing. If

    blobs are embedded in the tuples, a great deal of data must be transmitted

    even if the LOB objects are not part of the query selection criteria or the

    result.

    Separate table storage gives indirect access via a link in the defining relation

    and delays retrieval of the LOB until it is to be part of the query result set.

    Though this gives a two-step retrieval, for example when requesting an

    image of Joan Nordbotten, it will reduce general or average transfer time for

    the query processing system.

    A drawback of this storage strategy is a likely fragmentation of the DB area,

    as LOBs can be stored anywhere. This will decrease the efficiency of any

    algorithm searching the content of a larger set of LOBs.

External storage is useful if the DB data is connected to established media databases, either locally on CD, DVD, or on other computers in a network,

    as will most likely be the case when sharing media data stored in

    autonomous applications, such as cooperating museums, libraries, archives,

    or government agencies. This storage structure eliminates the need for

    duplication of large quantities of data that are normally offered in read-only


    mode. The cost is in access time which may currently be nearly unnoticeable.

    A good multimedia DMS should support each of these storage strategies.
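A schematic comparison of the three strategies (all table and column names below are illustrative assumptions):

-- 1. Embedded: the LOB lives in a column of the defining relation
CREATE TABLE PersonEmbedded (
    Id       CHAR(11) PRIMARY KEY,
    Name     VARCHAR(40),
    Picture  BLOB );

-- 2. Separate table: the defining relation holds only a link to a media table inside the DB
CREATE TABLE Media (
    Media_Id  INTEGER PRIMARY KEY,
    Format    VARCHAR(10),
    Content   BLOB );

CREATE TABLE PersonLinked (
    Id         CHAR(11) PRIMARY KEY,
    Name       VARCHAR(40),
    Picture_Id INTEGER REFERENCES Media(Media_Id) );

-- 3. External: the defining relation stores only a reference (path or URL) to an object
--    kept on an external, possibly remote, read-only medium
CREATE TABLE PersonExternal (
    Id          CHAR(11) PRIMARY KEY,
    Name        VARCHAR(40),
    Picture_URL VARCHAR(255) );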

1. Explain the following with respect to Object Oriented databases: A) Query Processing Architecture B) Object Relational Database Implementation

    A) Query Processing Architecture

    Query Processing Methodology

A query processing methodology similar to that of relational DBMSs, but modified to deal with the difficulties discussed in the previous section, can be followed in OODBMSs.

    The steps of the methodology are as follows.

1. Queries are expressed in a declarative language.

2. No user knowledge of object implementations, access paths or processing strategies is required.

3. The query is first represented as a calculus expression, which then passes through the following steps:

4. Calculus Optimization

5. Calculus-to-Algebra Transformation

6. Type Check

7. Algebra Optimization

8. Execution Plan Generation

9. Execution


    Object Query Processing Methodology

    B) Object Relational Database Implementation

There are several methods and tools (data management systems) that can be used for implementation of multimedia databases. Object-relational technology is used here; the reasons are:

Application Oriented: We are considering applications that have a combination of structured and multimedia data.

Historic: Object-relational technology is an extension of the relational technology that is the dominant tool for management of data in administrative applications.

Research Oriented: The claims of new technology should be tested before they are accepted.

Pragmatic:

o Free software is available for experimentation and testing.

o Most readers of this text have a background in relational technology.

    DB Components

A database is defined as a logically coherent collection of related data, representing some aspect of the real world, designed, built, and populated for some purpose. In addition to the user data stored in the database proper (in accordance with the above definition), two other data sets are stored within the DB area. These data are necessary to support efficient data storage, retrieval, and management, and include:

1. A schema that defines the DB structure and is compiled from data definition language (DDL) statements.

2. A set of indexes, used to support efficient data access and system integrity.


In addition, a library of methods (functions and procedures) is maintained to process user input and/or database data. Methods are triggered by some DB event and may trigger another method.

User data can be stored as a set of files or tables, each of which represents some entity or relationship type. Large objects, LOBs, used for storage of media data, can be stored either within the file/table area of the parent entity, as implied for the Person.Picture attribute, or in a separate storage area.

    Media objects as attributes

The storage area for a DB is frequently non-contiguous, and DB segments may reside on separate storage units of a machine and/or machines at geographically separate locations. Thus media data stored on local systems can be viewed as belonging to the scope of the DBMS, and may be under the management of the DBMS.

An index can be specified for any combination of columns or fields in a table or for elements of unstructured data in a file, such as terms in a text. The primary purpose of any index is to support efficient access to data items. An everyday example of an index is that found in the back of most textbooks.

A DB index can be viewed as a table with 2 columns: an index term/value and a list of pointers to DB elements (entity instances/table rows) that contain


that value. In practice the index elements are ordered in some form of B-tree to minimize access time.

Indexes may be unique or clustered, meaning that an index entry references only one element or a set of elements, respectively. Unique indexes are commonly used to enforce the primary key integrity constraint that each tuple in a relation must be unique. Cluster indexes provide fast access to sets of data containing the same values as the index term.
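For example (index names are assumptions; the tables and columns come from earlier in this assignment):

-- unique index: at most one row per Id, enforcing primary-key uniqueness
CREATE UNIQUE INDEX person_id_idx ON PERSON (Id);

-- cluster (non-unique) index: fast access to the set of rows sharing a Level value
CREATE INDEX course_level_idx ON COURSE (Level);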

The method library contains user-defined functions, procedures, assertion statements, integrity rules and the trigger functions that maintain them. In an ORDBMS, this library can be extended to include user definitions of new (to the DBMS) data types and the functions necessary to manipulate them. A DB schema contains the metadata specified for the database as defined using the DBMS's Data Definition Language (DDL).


    Modeling complex media objects