lesson 3: an introduction to data modeling


  • 8/14/2019 Lesson 3: An Introduction to Data Modeling

    1/23

    Michael Brydon ([email protected]) Last update: 02-May-01

    Lesson 3: An introduction to data modeling

    3.1 Introduction: The importance of conceptual models

    Before you sit down in front of the keyboard and start creating a database application, it is critical that you take a step back and consider your business problem (in this case, the kitchen supply scenario presented in Lesson 2) from a conceptual point of view. To facilitate this process, a number of conceptual modeling techniques have been developed by computer scientists, psychologists, and consultants.

    For our purposes, we can think of a conceptual model as a picture of the information system we are going to build.

    To use an analogy, conceptual models are to information systems what blueprints are to buildings.

    There are many different conceptual modeling techniques used in practice. Each technique uses a different set of symbols and may focus on a different part of the problem (e.g., data, processes, information flows, objects, and so on). Despite differences in notation and focus, however, the underlying rationale for conceptual modeling techniques is always the same: understand the problem before you start constructing a solution.

    There are two important things to keep in mind when learning about and doing data modeling:

    1. Data modeling is first and foremost a tool for communication. There is no single "right" model. Instead, a valuable model highlights tricky issues, allows users, designers, and implementors to discuss the issues using the same vocabulary, and leads to better design decisions.

    2. The modeling process is inherently iterative: you create a model, check its assumptions with users, make the necessary changes, and repeat the cycle until you are sure you understand the critical issues.

    In this background lesson, you are going to use a data modeling technique (specifically, Entity-Relationship Diagrams, or ERDs) to model the business scenario from Lesson 2. The data model you create in this lesson will form the foundation of the database that you use throughout the remaining lessons.


    3.1.1 What is data modeling?

    A data model is simply a diagram that describes the most important things in your business environment from a data-centric point of view. To illustrate, consider the simple ERD shown in Figure 3.1. The purpose of the diagram is to describe the relationship between the data stored about products and the data stored about the organizations that supply the products.

    3.1.1.1 Entities and attributes

    The rectangles in Figure 3.1 are called entity types (typically shortened to "entities") and the ovals are called attributes. The entities are the things in the business environment about which we want to store data. The attributes provide us with a means of organizing and structuring the data. For example, we need to store certain information about the products that we sell, such as the typical selling price of the product ("Unit price") and the quantity of the product currently in inventory ("Qty on hand"). These pieces of data are attributes of the Product entity.

    It is important to note that the precise manner in which data are used and processed within a particular business application is a separate issue from data modeling. For example, the data model says nothing about how the value of "Qty on hand" is changed over time. The focus in data modeling is on capturing data about the environment. You will learn how to change this data (e.g., process orders so that the inventory values are updated) once you have mastered the art of database design.

    A data modeler assumes that if the right data is available, the other elements of the application will fall into place effortlessly and wonderfully. For now, this is a good working assumption.
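One way to make the entity/attribute distinction concrete is to write the two entity types from Figure 3.1 as plain record types. The sketch below is only an illustration, not part of the lesson's toolset: the class and field names are Python-style renderings of the labels in the figure, and the sample values are invented.

```python
from dataclasses import dataclass


@dataclass
class Supplier:
    """Entity type: an organization that supplies products."""
    name: str
    address: str


@dataclass
class Product:
    """Entity type: a product the wholesaler sells."""
    product_id: str     # "ProductID" in Figure 3.1
    unit_price: float   # "Unit price": typical selling price
    qty_on_hand: int    # "Qty on hand": quantity currently in inventory


# A single instance of the Product entity type (values are made up).
whisk = Product(product_id="P-001", unit_price=3.25, qty_on_hand=40)
print(whisk)
```

Note that the record types only say what data is stored. Like the data model itself, they say nothing about how "Qty on hand" changes over time.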

    FIGURE 3.1: An ERD showing a relationship between products and suppliers. [Diagram labels: entities Product and Supplier; relationship "supplied by"; attributes ProductID, Unit price, Qty on hand, Name, Address; callouts Entity, Attributes, Relationship, Cardinality.]


    3.1.1.2 Notation for relationships

    In addition to entities and attributes, Figure 3.1 shows a relationship between the two entities using a line and a diamond. The relationship construct is used (not surprisingly) to indicate the existence or absence of a relationship between entities. A crow's foot at either end of a relationship line is used to denote the cardinality of the relationship.

    For example, the crow's foot on the product side of the relationship in Figure 3.1 indicates that a particular supplier may provide your company with several different products, such as bowls, spatulas, wire whisks, and so on. The absence of a crow's foot on the supplier side indicates that each product in your inventory is provided by a single supplier. Thus, the relationship in Figure 3.1 indicates that you always buy all your wire whisks from the same company.

    3.1.1.3 Modeling assumptions

    The relationship shown in Figure 3.1 is called one-to-many: each supplier supplies many products (where "many" means any number, including zero) but each product is supplied by one supplier (where "one" means at most one).
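To see what the one-to-many assumption looks like once it reaches a database, here is a minimal sketch using Python's built-in sqlite3 module. The table and column names (including the `supplier_id` key, which does not appear in Figure 3.1) and the sample rows are assumptions made for illustration. The single `supplier_id` column on `product` is what encodes "at most one supplier per product".

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE supplier (
    supplier_id INTEGER PRIMARY KEY,   -- hypothetical key attribute
    name        TEXT NOT NULL,
    address     TEXT
);
CREATE TABLE product (
    product_id  INTEGER PRIMARY KEY,
    name        TEXT NOT NULL,
    supplier_id INTEGER REFERENCES supplier(supplier_id)  -- at most one supplier
);
""")

conn.execute("INSERT INTO supplier VALUES (1, 'Acme Kitchenware', '12 Main St')")
conn.execute("INSERT INTO supplier VALUES (2, 'Empty Shelf Co.', '99 Side Rd')")
conn.executemany(
    "INSERT INTO product (name, supplier_id) VALUES (?, ?)",
    [("bowl", 1), ("spatula", 1), ("wire whisk", 1)],
)

# "Many" includes zero: a supplier with no products still appears, with a count of 0.
counts = dict(conn.execute("""
    SELECT s.name, COUNT(p.product_id)
    FROM supplier s LEFT JOIN product p ON p.supplier_id = s.supplier_id
    GROUP BY s.supplier_id
"""))
print(counts)
```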

    The decision to use a one-to-many relationship reflects an assumption about the business environment in which your wholesale company operates. However, it is easy to imagine a different environment in which each product is supplied by multiple suppliers. For example, many suppliers may carry a particular brand of wire whisk. When you run out of whisks, it is up to you to decide where to place your order. In other words, it is possible that a many-to-many relationship exists between suppliers and products.

    If multiple suppliers exist, attributes of the product, such as its price and product number, may vary from supplier to supplier. In this situation, the data requirements of a many-to-many environment are slightly more complex than those of the one-to-many environment. If you design and implement your database around the one-to-many assumption but then discover that certain goods are supplied by multiple suppliers, much effort is going to be required to fix the problem.
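The extra data requirement can be sketched in a few lines of Python. In the hypothetical structure below, price and part number are stored per (product, supplier) pair rather than per product. All names and values are invented for illustration.

```python
# Supplier-specific details keyed by (product, supplier); all values are made up.
offers = {
    ("wire whisk", "Acme Kitchenware"): {"supplier_part_no": "A-117", "price": 2.10},
    ("wire whisk", "Baker Supply"): {"supplier_part_no": "WW-9", "price": 1.95},
    ("bowl", "Acme Kitchenware"): {"supplier_part_no": "A-200", "price": 4.00},
}


def suppliers_of(product):
    """All suppliers that carry the given product."""
    return sorted(s for (p, s) in offers if p == product)


def cheapest_supplier(product):
    """The supplier offering the lowest price for the given product."""
    return min(suppliers_of(product), key=lambda s: offers[(product, s)]["price"])


print(suppliers_of("wire whisk"))
print(cheapest_supplier("wire whisk"))
```

Under the one-to-many assumption there would be nowhere to record the second whisk offer, which is why discovering the second supplier late is expensive.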

    Herein lies the point of drawing an ERD: The diagram makes your assumptions about the relationships within a particular business environment explicit before you start building things.

    3.1.1.4 The role of the modeler

    In the environment used in these tutorials, you are the user, the designer, and the implementor of the system. In a more realistic environment, however, these roles are played by different individuals (or groups) with different backgrounds and priorities. For example, a common stereotype of implementors (programmers, database specialists, and so on) is that they seldom leave their cubicles to communicate with end-users of the software they are writing. Similarly, it is generally safe to assume that users have no interest in, or understanding of, low-level technical details (such as the cardinality of relationships on ERDs, mechanisms to enforce referential integrity, and so on). Thus, it is up to the business analyst to bridge the communication gap between the different groups involved in the construction, use, and administration of an information system.

    As a business analyst (or more generally, a designer), it is critical that you walk through your conceptual models with users and make sure that your modeling assumptions are appropriate. In some cases, you may have to examine sample data from the existing computer-based or manual system to determine whether (for instance) there are any products that are supplied by multiple suppliers.

    At the modeling stage, making changes such as converting a one-to-many relationship to a many-to-many relationship is trivial: all that is required is the addition of a crow's foot to one end of the relationship, as shown in Figure 3.2. In contrast, making the same change once you have implemented tables, built a user interface, and written code is a time-consuming and frustrating chore.

    Generally, you can count on the "10x" rule of thumb when building software: the cost of making a change increases by an order of magnitude for each stage of the systems development lifecycle that you complete.
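The arithmetic behind the rule of thumb is a simple power of ten. The sketch below is illustrative only: the stage names are hypothetical (the lesson does not enumerate the lifecycle stages), and the rule itself is a heuristic, not a measurement.

```python
# Hypothetical stage names; the lesson does not list the lifecycle stages.
STAGES = ["conceptual model", "table design", "user interface", "application code"]


def relative_change_cost(stages_completed: int, base_cost: float = 1.0) -> float:
    """Relative cost of a change after `stages_completed` stages, under the 10x rule."""
    return base_cost * 10 ** stages_completed


for completed, stage in enumerate(STAGES):
    print(f"change made during {stage!r}: {relative_change_cost(completed):g}x")
```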

    FIGURE 3.2: An ERD for an environment in which there is a many-to-many relationship between products and suppliers. [Diagram labels: entities Product and Supplier; relationship "supplied by"; attributes ProductID, Unit price, Qty on hand, Name, Address. Callout: The addition of a second crow's foot transforms the one-to-many relationship into a many-to-many relationship.]


    as the labels make the diagram easier to interpret.

    To illustrate, consider the relationship between products and suppliers shown in Figure 3.1. The relationship is described by the verb phrase "supplied by". Although one could have opted for the shorter relationship name "has" instead, the resulting diagram (e.g., "Supplier has product") would be more difficult for readers of the diagram to interpret.

    3.1.2.3 Relationship direction

    One issue that sometimes troubles neophyte data modelers is that the direction of the relationship is not made explicit on the diagram. Returning to Figure 3.1, it is obvious to me (since I drew the diagram) that the relationship should be read: "Product is supplied by supplier". Reading the relationship in the other direction ("Supplier is supplied by product") makes very little sense to anyone who is familiar with the particular problem domain.

    Generally, ERDs make certain assumptions about the reader's knowledge of the underlying business domain.

    A notational convention supported by some CASE tools is to require two names for each relationship: one that makes sense in one direction (e.g., "is supplied by"), and another that makes sense in the opposite direction (e.g., "supplies"). Although double-naming may make the diagram easier to read, it also adds clutter (twice as many labels) and imposes an additional burden on the modeler.

    3.1.2.4 Cardinality

    As discussed in Section 3.1.1.2, the cardinality of a relationship constrains the number of instances of one entity type that can be associated with a single instance of the other entity type.

    The cardinality of relationships has an important impact on the number and structure of the tables in the database. Consequently, it is important to get the cardinality right on paper before starting the implementation.

    FIGURE 3.4: A relationship named "buys".


    There are three fundamental types of cardinality in ERDs:

    - One-to-many: You have already seen an example of a one-to-many relationship in Figure 3.1. You will soon discover that one-to-many relationships are the bread and butter of relational databases.

    - One-to-one: At this point in your data modeling career, you should avoid one-to-one relationships. To illustrate the basic issue, consider the ERD shown in Figure 3.5. Based on an existing paper-based system, the modeler has assumed that each customer is associated with one customer record (i.e., a paper form containing information about the customer, such as address, fax number, and so on). Clearly, each customer has only one customer record and each customer record belongs to a single customer. However, if we automate the system and get rid of the paper form, then there is no reason not to combine the Customer and Customer Record entities into a single entity called Customer.

    In many cases, one-to-one relationships indicate a modeling error. When you have a one-to-one relationship such as the one shown in Figure 3.5, you should combine the two entities into a single entity.

    - Many-to-many: The world is full of many-to-many relationships. A well-used example is "Student takes course". Many-to-many relationships also arise when you consider the history of an entity. To illustrate, consider the ERD shown in Figure 3.6. At first glance, the relationship between Family and Single-Family Dwelling (SFD) might seem to be one-to-one since a particular family can only live in one SFD at a time and each SFD can (by definition) only contain a single family. However, it is possible for a family to live in different houses over time. Similarly, it is possible that many families inhabit a particular house over the years. Thus, if the concept of time is considered, the relationship becomes many-to-many.
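The effect of time on cardinality can be checked with a few lines of Python. The residency records below are invented for illustration. Each row links one family to one dwelling for a span of years, yet counting across rows shows "many" on both sides.

```python
from collections import defaultdict

# (family, dwelling, move-in year, move-out year); all values are made up.
residencies = [
    ("Nguyen", "14 Oak St", 1990, 1998),
    ("Nguyen", "3 Elm Ave", 1998, 2005),
    ("Patel", "14 Oak St", 1998, 2010),
]

homes_per_family = defaultdict(set)
families_per_home = defaultdict(set)
for family, dwelling, _start, _end in residencies:
    homes_per_family[family].add(dwelling)
    families_per_home[dwelling].add(family)

# One family, several dwellings over time; one dwelling, several families over time.
print(sorted(homes_per_family["Nguyen"]))
print(sorted(families_per_home["14 Oak St"]))
```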

    FIGURE 3.5: An incorrect one-to-one relationship. [Diagram: Customer "associated with" Customer Record.]

    We will discuss how you go about determining cardinality in subsequent sections. At this point, it is sufficient to recognize that there are two popular (and equivalent) approaches to denoting "one" and "many" on an ERD: the crow's foot notation you have already seen and the 1:N notation.

    1. Crow's foot notation: In the crow's foot notation, three little lines (resembling a crow's foot) are used to indicate "many". Not surprisingly, the absence of a crow's foot indicates "one". Thus, the relationship in Figure 3.7 indicates that each product is supplied by at most one supplier, whereas each supplier may supply many products.

    2. 1:N notation: In the 1:N notation, the symbol N (and/or M) is used to indicate "many" whereas 1 is used to indicate "one". An example of the 1:N notation is shown in Figure 3.8.

    The ERD model also supports additional cardinality information in the form of cardinality constraints. To keep things simple, however, the discussion of cardinality constraints is deferred until Section 6.3.2.

    3.1.2.5 Attributes

    Attributes are properties or characteristics of a particular entity about which we wish to collect and store data. In addition, there is typically one attribute that uniquely identifies particular instances of the entity. For example, each of your customers may have a unique customer ID. Such attributes are known as key attributes.
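A key attribute's defining property, uniqueness, is something a database can enforce. The following sketch uses Python's built-in sqlite3 module with an assumed `customer` table (the column names and customer names are invented) to show a duplicate customer ID being rejected.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customer (cust_id INTEGER PRIMARY KEY, name TEXT NOT NULL)"
)
conn.execute("INSERT INTO customer VALUES (1, 'Bistro Nine')")

try:
    # Same key value as an existing row: violates the uniqueness of the key.
    conn.execute("INSERT INTO customer VALUES (1, 'Cafe Duplicate')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

print("duplicate key rejected:", duplicate_rejected)
```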

    FIGURE 3.6: What is the cardinality of this relationship? [Diagram: Family "lives in" Single-Family Dwelling.]

    FIGURE 3.7: A one-to-many relationship in crow's foot notation. [Diagram: Product "supplied by" Supplier.]

    FIGURE 3.8: A one-to-many relationship in 1:N notation. [Diagram: Product "supplied by" Supplier, with N on the Product side and 1 on the Supplier side.]


    For ERDs that are drawn manually, attributes are traditionally shown as ovals containing the name of the attribute. If the attribute is a key, it is typically underlined. A number of attributes for the Customer entity are shown in Figure 3.9.

    Adding attributes to ERDs can result in very cluttered diagrams. Some CASE tools list the attributes inside the entity rectangle (see the discussion of CASE tools in Section 3.4.5). Another way to reduce clutter is to only show a handful of critical attributes on the diagram.

    3.2 Learning objectives

    - understand the core constructs of the entity-relationship model
    - create an ERD based on your understanding of a business scenario
    - use associative entities to add attributes to relationships
    - gain some familiarity with the role of data modeling and CASE tools in the development process

    3.3 Exercises

    In the sections that follow, we step through the construction of an ERD for the kitchen supply scenario. By following along, you should gain a better understanding of the basic techniques involved in data modeling as well as some of the design pitfalls that should be avoided.

    If this lesson were on how to play golf, you would not read it and then assume that you are a good golfer. Golf is a skill that requires both theoretical knowledge and hours of practice. Thus, the only way to become a good golfer is to acquire a solid understanding of the fundamentals and then go out and hit thousands of balls. The same principles apply to data modeling.

    3.3.1 Starting simple

    Let us begin with the simplest and most essential statement one can make about the wholesaling environment: customers buy products. It is natural that you would want to both automate and informate (recall Section 2.4) this important business process.

    FIGURE 3.9: A number of attributes of the Customer entity are shown in ovals. [Diagram: the Customer entity with attributes CustID, Name, Phone No., and Contact person.]

    3.3.1.1 Step one: identify the entities

    Entities are physical things, organizations, roles, and events about which we want to store information. In the wholesaling scenario, two entities are immediately obvious: Customer and Product. However, before we add the entities to our diagram, it is important that we have a firm understanding of what exactly these entities correspond to in the real world:

    1. Customers: In the wholesaling environment, a customer is an organization, not a person. There may be a single person at the organization through whom we conduct our business. We will refer to this person as the "contact person" for the customer in order to maintain a clean distinction between people and organizations.

    2. Products: It is not immediately clear whether the Product entity refers to a specific item or a class of similar items. For example, one of the products you sell is the "Fat Cat" mug. The Product ID of the mug is 88 4017 and it normally sells for $5.50. Note, however, that there are many individual "Fat Cat" mugs and each one is slightly different due to irregularities, variations in painting, and so on. In our case, there are advantages to ignoring the individuality of each mug and treating them all as a single group of interchangeable items. Thus, when we talk about a "product" or a "SKU", we are talking about an entire class of similar instances, not individual instances themselves (see footnote 1).

    Having made these assumptions explicitly, we can now create our first ERD.

    Take out a piece of paper and a pencil.

    Unless you have a special-purpose CASE tool, it is seldom worth the effort to draw the early drafts of your conceptual models on a computer.

    ERDs typically require many modifications, so you should not invest much time making your diagrams look nice. In fact, the diagram you are about to begin will

    1 In some environments, it may be necessary to treat products as individual items. For example, in the aerospace industry, there is a requirement to track individual parts by serial number in case a part fails. The requirement for "unit effectivity" necessitates a different set of assumptions about the Product entity and thus leads to a different database design.


    3.3.1.4 Step four: identify a few important attributes

    There is a need to find a balance between the descriptiveness of the ERD and the ease with which others can decipher it. As such, I prefer to show only a handful of attributes on the ERDs.

    Add a small number of important attributes to your diagram, as shown in Figure 3.13.

    By adding attributes such as "Qty on hand" to the Product entity, it is clear that the entity refers to a class of products, not an individual product. In contrast, if "Serial number" were added to the diagram as an attribute instead of "Qty on hand", the reader of the ERD would come to a different conclusion about the meaning of the Product entity.

    3.3.2 Dealing with many-to-many relationships

    Although the diagram in Figure 3.13 is technically correct, it is missing a great deal of information about how a purchase transaction occurs in reality.

    FIGURE 3.12: Indicate the many-to-many cardinality of the relationship. [Diagram: Customer "buys" Product. Callout 1: Add a crow's foot to the customer side to indicate that each product is purchased by many customers. Callout 2: Add a second crow's foot to the product side to indicate that each customer purchases many products.]

    FIGURE 3.13: An initial ERD for the kitchen supply environment. [Diagram: Customer (CustID, Name, Contact person) "buys" Product (ProductID, Unit price, Qty on hand).]

    To illustrate, consider the attribute "Unit price" belonging to the Product entity. "Unit price" contains the default selling price of the product. To understand why it is the default price, consider the case of a "Fat Cat" mug that typically sells for $5.50. What if there is a particular mug that has a minor flaw? Although the mug can still be sold, the customer may expect a discounted price to compensate for the flaw. The question is therefore: Where on the diagram do we indicate the actual selling price of a particular mug?

    A second example is the purchase quantity. What if the customer purchases a dozen "Fat Cat" mugs? Where is this information recorded? The "Qty ordered" attribute does not belong to the Customer entity because customers order many products besides mugs. Similarly, "Qty ordered" does not belong to the Product entity because different customers may order different quantities.

    This is a problem that typically arises in many-to-many relationships: There are certain important attributes that do not seem to belong to either of the entities participating in the relationship. The solution is to assign the attributes to the relationship itself.

    3.3.2.1 Attributes of relationships

    The issues surrounding price and order quantity arise because the attributes belong to the interaction of the entities in the many-to-many relationship. Thus, the price of a flawed mug is an attribute of a particular product being sold to a particular customer on a particular day.

    To summarize, there are a number of attributes that should be attached to the "buys" relationship in Figure 3.13:

    - Date: the date on which the purchase is made;
    - Actual price: the price at which the item (or multiple items within the same class of products) is actually sold to the customer;
    - Quantity ordered: the number of items with a certain product ID requested by the customer; and,
    - Quantity shipped: the actual number of items shipped to the customer.

    3.3.2.2 Associative entities

    Given the number and importance of the attributes attached to the "buys" relationship, it makes sense to treat the relationship as an entity in its own right. To transform a relationship into an entity on an ERD, we use a special symbol called an associative entity. The notation for an associative entity is a relationship diamond nested inside of an entity rectangle, as shown in Figure 3.14.

    To transform your many-to-many relationship (without attributes) into an associative entity (with attributes), do the following:

    - Draw a rectangle around the "buys" relationship.
    - Replace the relationship name "buys" with an appropriate noun, for example "Sale". Remember, entities (including associative entities) are named with nouns.
    - Decompose the many-to-many relationship into two one-to-many relationships.
    - Add the attributes to the associative entity, as shown in Figure 3.14.

    Although Sale is now treated as an entity, there is no requirement to add relationship diamonds between Product and Sale or Customer and Sale. An associative entity serves as an entity and a relationship at the same time.

    The meaning of the Sale associative entity in Figure 3.14 is the following: Each customer can be involved in many sales transactions, but each individual sales transaction involves only one customer. Similarly, each product can be involved in many sales transactions, but each sales transaction involves only one type of product.
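As a rough preview of where this design leads, the associative entity can be sketched as a table with one foreign key per participating entity plus the relationship's own attributes. The example uses Python's built-in sqlite3 module. The table and column names, the `sale_id` surrogate key, the product ID "P1", and all sample values are assumptions made for illustration, not the lesson's actual schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE customer (cust_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE product  (product_id TEXT PRIMARY KEY, unit_price REAL NOT NULL);
CREATE TABLE sale (
    sale_id          INTEGER PRIMARY KEY,  -- hypothetical surrogate key
    cust_id          INTEGER NOT NULL REFERENCES customer(cust_id),   -- one customer
    product_id       TEXT    NOT NULL REFERENCES product(product_id), -- one product
    sale_date        TEXT    NOT NULL,
    actual_price     REAL    NOT NULL,
    quantity_ordered INTEGER NOT NULL,
    quantity_shipped INTEGER NOT NULL
);
""")

conn.execute("INSERT INTO customer VALUES (1, 'Bistro Nine')")
conn.execute("INSERT INTO product VALUES ('P1', 5.50)")  # default price
# A flawed mug sold at a discount: the actual price lives on the sale, not the product.
conn.execute(
    "INSERT INTO sale (cust_id, product_id, sale_date, actual_price,"
    " quantity_ordered, quantity_shipped) VALUES (1, 'P1', '2001-05-02', 4.00, 1, 1)"
)

try:
    # A sale must point at an existing customer; customer 999 does not exist.
    conn.execute(
        "INSERT INTO sale (cust_id, product_id, sale_date, actual_price,"
        " quantity_ordered, quantity_shipped) VALUES (999, 'P1', '2001-05-03', 5.50, 1, 1)"
    )
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

print("sale for unknown customer rejected:", orphan_rejected)
```

The two `REFERENCES` clauses are the two one-to-many relationships that replaced the original many-to-many "buys" relationship.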

    3.3.2.3 Illustration

    To better understand how an associative entity works, it is worthwhile to jump ahead a bit and consider what the data might look like. In Figure 3.15, sample data for the Customer, Product, and Sale entities are shown. There are a couple of interesting things to notice about the sample data:

    1. Each entity contains information relevant to that entity only. For example, Customer only contains information about customers; Product only contains information about products, and so on.

    2. Each row in the Sale entity shows the details of a single sales transaction. Each sales transaction consists of a particular product being sold to a particular customer. The transaction-specific details (such as the actual selling price and quantity ordered) are attributes of the Sale entity.

    3. By using the data for the Sale entity, it is possible to determine which products have been purchased by a particular customer. Similarly, it is possible to determine which customers have purchased a particular product.

    4. Only the minimum amount of information required to identify the customer and product is included in the Sale associative entity. For example, neither the name of the customer nor the description of the product appears in Sale, since this information can easily be found elsewhere using the values of Customer ID and Product ID respectively. In the context of the Sale entity, the Customer ID and Product ID attributes are called foreign keys (foreign keys are so important that Lesson 6 is devoted to the topic).

    FIGURE 3.14: Transform a many-to-many relationship into an associative entity. [Diagram: Customer, Sale (associative entity with attributes Date, Quantity ordered, Quantity shipped, Actual price), Product.]

    FIGURE 3.15: Data showing the role of an associative entity. [Sample data tables for Customer, Product, and Sale. Callouts: Each customer can participate in many sales transactions. Each product can participate in many sales transactions.]
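Since the sample data of Figure 3.15 is not reproduced here, the following Python sketch uses invented stand-in data to illustrate points 3 and 4. The Sale rows carry only the two foreign key values plus transaction details, and names are looked up through those keys. All IDs, names, and quantities are hypothetical.

```python
# Stand-in data; not the contents of Figure 3.15.
customers = {1: "Bistro Nine", 2: "Cafe Verde"}          # cust_id -> name
products = {"P1": "Fat Cat mug", "P2": "wire whisk"}     # product_id -> description

# Sale rows: (cust_id, product_id, quantity_ordered, actual_price)
sales = [
    (1, "P1", 12, 5.50),
    (1, "P2", 6, 3.00),
    (2, "P1", 1, 4.00),
]


def products_bought_by(cust_id):
    """Descriptions of all products a customer has purchased."""
    return sorted({products[p] for c, p, _, _ in sales if c == cust_id})


def customers_who_bought(product_id):
    """Names of all customers who have purchased a product."""
    return sorted({customers[c] for c, p, _, _ in sales if p == product_id})


print(products_bought_by(1))
print(customers_who_bought("P1"))
```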

    3.3.3 Revising the ERD

    There is an important problem with the ERD as it now stands. The constraint that each sales transaction involves only a single product appears to be at odds with the reality of the business situation described in Lesson 2. For example, the "sales transactions" with which we are most familiar (the customer orders you receive by fax) typically request many different products: a dozen "Fat Cat" mugs, two dozen spatulas, some wire whisks, and so on. The mismatch between the diagram and the business environment means that the ERD must be revised.

    3.3.3.1 Identifying the problem

    The problem with the ERD in Figure 3.14 is that it ignores the technology (broadly speaking) used by customers to place orders. Specifically, customers normally wait until they need enough stock to make an order worthwhile. In addition, factors such as minimum order values and shipping costs favor the batching of small, single-product orders into large, multi-product orders.

    By taking the technology used for ordering into account, it becomes clear that we have failed to model an important event entity: the arrival of an order.

    Be careful: not all pieces of paper in the existing business process are automatically event entities. For example, the invoices that we send to our customers are more properly thought of as reports (which can be generated from the information already contained in other entities). With practice, the distinction between entities and non-entities will become clear.

    3.3.3.2 Adding the new entity

    In this section, you are going to modify your ERD to include an Order entity.

    - Create a new ERD consisting of entities for customers, orders, and products.

    Here is where a CASE tool pays off: you can delete entities or move entities around the screen and the relationships and connector lines follow automatically.

    - Create relationships to reflect the fact that customers place orders and orders consist of products.
    - Add cardinality symbols to reflect the fact that each customer can place many orders, but each order belongs to a single customer.
    - Add cardinality symbols to reflect the fact that each order can contain many products and each product may be contained in many orders.
    - Add a handful of attributes to the diagram to help clarify the meaning of the entities.

    The resulting ERD is shown in Figure 3.16.

    3.3.3.3 Creating OrderDetails

    Although Figure 3.16 is a great improvement over our previous ERD, much of the same information missing from Figure 3.13 (such as actual price and quantity ordered) is missing from the new ERD. As a consequence, we must transform the "contains" relationship into an associative entity with its own attributes.

    Transform the "contains" relationship into an associative entity using the procedure described in Section 3.3.2.2.

    To remain consistent with Microsoft's sample databases, I recommend using the name "Order Detail" for the associative entity. Alternatives include "Line Item" and "Order Item".

    The resulting ERD is shown in Figure 3.17. To help understand the relationship between the Order entity and the Order Detail associative entity, look at the orders included in the project package. Each order has header information (such as customer, order date, and so on) and multiple order details. Each detail has information such as quantity ordered, quantity shipped, and price.
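The header/detail split can be sketched with Python's built-in sqlite3 module. Everything below is an illustrative assumption rather than the lesson's actual schema: the table and column names, the composite key (which presumes at most one detail line per product per order), and the sample order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (                      -- header: one row per order
    order_id   INTEGER PRIMARY KEY,
    cust_id    INTEGER NOT NULL,
    order_date TEXT    NOT NULL
);
CREATE TABLE order_detail (                -- associative entity: one row per line
    order_id         INTEGER NOT NULL REFERENCES orders(order_id),
    product_id       TEXT    NOT NULL,
    quantity_ordered INTEGER NOT NULL,
    quantity_shipped INTEGER NOT NULL,
    actual_price     REAL    NOT NULL,
    PRIMARY KEY (order_id, product_id)     -- assumed: one line per product per order
);
INSERT INTO orders VALUES (1, 42, '2001-05-02');
INSERT INTO order_detail VALUES (1, 'MUG', 12, 12, 5.50);
INSERT INTO order_detail VALUES (1, 'SPATULA', 24, 20, 2.00);
""")

# One order header, many detail lines; totals are derived from the details.
line_count, shipped_value = conn.execute("""
    SELECT COUNT(*), SUM(quantity_shipped * actual_price)
    FROM order_detail WHERE order_id = 1
""").fetchone()
print(line_count, shipped_value)
```

Quantity shipped is kept separately from quantity ordered because, as in the spatula line above, an order may be only partially filled.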

    ?

    FIGURE 3.16: Revise the ERD to include an Order entity.

    [ERD: Customer (CustID, Name, Contact person) places Order (OrderID, Date); Order contains Product (ProductID, Unit price, Qty on hand).]

    3.4 Discussion

    3.4.1 Logical versus physical models

    A distinction is typically made between logical and physical data models:

    Logical data models capture general information about entities and relationships and are used for communication with business users.

    Physical data models serve as a precise specification for the implemented system. As a consequence, the models must take into account the technology used to store the data. For example, a given logical data model translates into very different physical data models depending on whether the target technology is a file-based system, a relational database, or an object-oriented database.

    Normally, you start with a high-level logical model and refine it with the help of users over several iterations. Once you are happy with the logical model, you transform it into a physical model and hand it to a database administrator (DBA) for implementation as a database.

    For relational databases, the translation process from logical to physical is relatively straightforward and involves the following steps:

    1. Decompose all many-to-many relationships: Since the relational database model does not support many-to-many relationships, you must replace all many-to-many relationships with associative entities as described in Section 3.3.2.2.
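    To see what the decomposition step amounts to in a relational database, here is a minimal sketch that uses the SQLite engine bundled with Python rather than ACCESS. The column types and the sample rows are assumptions made for illustration; only the entity and attribute names follow the ERDs in this lesson.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Product (
    ProductID INTEGER PRIMARY KEY,
    UnitPrice REAL
);
CREATE TABLE "Order" (
    OrderID   INTEGER PRIMARY KEY,
    OrderDate TEXT
);
-- Associative entity: one row per (order, product) pair.
CREATE TABLE OrderDetail (
    OrderID     INTEGER NOT NULL REFERENCES "Order"(OrderID),
    ProductID   INTEGER NOT NULL REFERENCES Product(ProductID),
    QtyOrdered  INTEGER,
    ActualPrice REAL,
    PRIMARY KEY (OrderID, ProductID)
);
""")

# Hypothetical sample data.
conn.executemany("INSERT INTO Product VALUES (?, ?)", [(1, 9.99), (2, 24.50)])
conn.executemany('INSERT INTO "Order" VALUES (?, ?)',
                 [(100, "2001-05-01"), (101, "2001-05-02")])
conn.executemany("INSERT INTO OrderDetail VALUES (?, ?, ?, ?)",
                 [(100, 1, 2, 9.99), (100, 2, 1, 24.50), (101, 1, 5, 8.99)])

# One order contains many products ...
products_in_order_100 = [row[0] for row in conn.execute(
    "SELECT ProductID FROM OrderDetail WHERE OrderID = 100 ORDER BY ProductID")]
# ... and one product appears in many orders.
orders_with_product_1 = [row[0] for row in conn.execute(
    "SELECT OrderID FROM OrderDetail WHERE ProductID = 1 ORDER BY OrderID")]
print(products_in_order_100, orders_with_product_1)
```

    The many-to-many relationship has become two one-to-many relationships, each of which a relational database can represent with an ordinary foreign key.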

    FIGURE 3.17: Create an associative entity to model individual order details.

    [ERD: Customer (CustID, Name, Contact person) places Order (OrderID, Date); Order detail (Actual price, Quantity ordered, Quantity shipped) is an associative entity linking Order and Product (ProductID, Unit price, Qty on hand).]

    between the release of ACCESS version 2.0 and ACCESS 2000 to make the product more accessible to data modeling neophytes. For example, there is a table analyzer, a table design wizard, and all sorts of other aids intended to automate the database design process. In my view, there are three problems with MICROSOFT's dumbing-down strategy:

    1. No wizard or add-in tool is going to change the fact that database management systems (DBMSs) are specialized software packages that presuppose an enormous amount of prior knowledge. Even the error messages generated by ACCESS (as you will soon discover) can only be understood if you have a firm grasp on the theoretical fundamentals of the relational database model. For example, what do you do when you accidentally violate a referential integrity constraint? What is a primary key? What is an ambiguous outer join?

    2. Trends in application development increasingly emphasize data and de-emphasize programming. Thus, a solid understanding of your data is critical. Put another way, just about anyone can build a reasonably sophisticated system if the underlying database is well designed (indeed, by doing these tutorials, you will see just how far you can go without writing a single line of programming code). In contrast, if the database design is poor, you will have to be a programming wizard just to create the illusion that the application works.

    3. CASE tools from vendors such as ORACLE, COMPUTER ASSOCIATES, VISIO (now part of MICROSOFT), and many others translate ERDs directly into database tables. Thus, if you know how to draw diagrams similar to the one shown in Figure 3.17, you know how to design databases.

    In short, the last thing you want to do is rely on wizards to shelter you from the database design process. Instead, you want to get in at the nitty-gritty level and understand the trade-offs between various designs. Once the design is complete you can use wizards and shortcuts for everything else.
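    As a concrete illustration of the referential integrity question raised in the first point, the sketch below provokes such a violation on purpose. It uses the SQLite engine bundled with Python, not ACCESS, so the wording of the error differs from what ACCESS displays; the customer name and key values are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when asked
conn.execute("CREATE TABLE Customer (CustID INTEGER PRIMARY KEY, CustName TEXT)")
conn.execute("""
    CREATE TABLE "Order" (
        OrderID INTEGER PRIMARY KEY,
        CustID  INTEGER NOT NULL REFERENCES Customer(CustID)
    )
""")
conn.execute("INSERT INTO Customer VALUES (1, 'Hypothetical Kitchens Ltd.')")
conn.execute('INSERT INTO "Order" VALUES (10, 1)')  # fine: customer 1 exists

try:
    # Customer 99 does not exist, so this order would be an orphan.
    conn.execute('INSERT INTO "Order" VALUES (11, 99)')
    violation = None
except sqlite3.IntegrityError as exc:
    violation = str(exc)

order_count = conn.execute('SELECT COUNT(*) FROM "Order"').fetchone()[0]
print(violation, order_count)
```

    The message only makes sense if you already know that every order must point at an existing customer, which is the kind of knowledge the data model captures.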

    3.4.4 How do I learn about data modeling?

    One important problem that I perceive as a university professor is that despite the importance of data modeling, it is very difficult to find good practical training as a data modeler.

    In the standard computer science database course, we tend to focus on theoretical issues such as relational algebra, set theory, indexing, normalization, and so on. Although such knowledge is certainly important for DBAs and other technical professionals, it provides little guidance when we are faced with real-world modeling problems.

    Conversely, the introductory information systems (IS) courses that we offer in business schools provide only cursory treatment of data modeling and database design. I suppose the rationale is that it should be possible to hire a computer science graduate to do the data modeling!

    3.4.5 CASE tools and the design process

    Computer-aided software engineering (CASE) tools are software packages that simplify the process of creating conceptual models. In addition, some CASE packages are more than just drawing tools: they translate the data models into database tables, programming code templates, and so on.

    To illustrate, consider the diagram in Figure 3.19, which was drawn using the database modeling tool in VISIO ENTERPRISE. The database modeling tool allows a designer to start with a physical ERD-like diagram and add implementation-level metadata (data about data). For example, in Figure 3.20, the physical-level properties of the ActualPrice attribute are specified in a dialog box.

    Once the physical-level metadata has been added to the model, many CASE tools can generate table schemas for the target database. For example, ORACLE DESIGNER can generate structured query language (SQL) data definition commands for ORACLE databases. VISIO ENTERPRISE can generate SQL or create the tables in various database packages (including ACCESS) directly.

    Given CASE tools with this type of functionality, it is clear that the most important skill for database designers is not a complete knowledge of SQL syntax. Instead, it is the ability to analyze a real-world problem and create a data model that accurately and elegantly captures the critical elements of the problem.
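    The idea of generating table schemas from physical-level metadata can be illustrated with a toy generator. This is a sketch only and says nothing about how VISIO ENTERPRISE or ORACLE DESIGNER work internally: the metadata layout, the create_table_sql helper, and the SQL types are all assumptions made for the example.

```python
import sqlite3

# Hypothetical physical-level metadata: table -> list of (column, SQL type, is_primary_key).
model = {
    "Product": [
        ("ProductID", "INTEGER", True),
        ("Description", "TEXT", False),
        ("UnitPrice", "REAL", False),
        ("QtyOnHand", "INTEGER", False),
    ],
    "OrderDetail": [
        ("OrderID", "INTEGER", True),
        ("ProductID", "INTEGER", True),
        ("QtyOrdered", "INTEGER", False),
        ("QtyShipped", "INTEGER", False),
        ("ActualPrice", "REAL", False),
    ],
}


def create_table_sql(table, columns):
    """Build one CREATE TABLE statement from the column metadata."""
    lines = [f"    {name} {sql_type}" for name, sql_type, _ in columns]
    key_columns = [name for name, _, is_key in columns if is_key]
    if key_columns:
        lines.append(f"    PRIMARY KEY ({', '.join(key_columns)})")
    body = ",\n".join(lines)
    return f"CREATE TABLE {table} (\n{body}\n);"


ddl = "\n".join(create_table_sql(name, cols) for name, cols in model.items())
print(ddl)

# Run the generated statements against an in-memory database to check that they are valid SQL.
conn = sqlite3.connect(":memory:")
conn.executescript(ddl)
tables = sorted(row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'"))
```

    The generator is mechanical; all of the design decisions are already present in the metadata, which is the point made in the paragraph above.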

    3.5 Application to the project

    Complete your own ERD for the order entry scenario. Remember that at this point, the scope of the project is very limited. It will be expanded somewhat in subsequent lessons.

    HINT: If you get stuck, you can refer to the VISIO diagram in Figure 3.19. Keep in mind, however, that Figure 3.19 is a physical ERD; you should be working with logical ERDs at this stage of the development process.

    FIGURE 3.19: A physical database model for the kitchen supply environment created using a CASE tool.

    [Physical ERD; primary keys are listed first for each entity.
    Customer: CustID; CustName, BillingAddress, City, ProvState, PostalCode, RegionCode, ContactPerson, ContactPhone, OnLineOrdering.
    Order: OrderID; CustID, OrderDate, Processed.
    OrderDetail: OrderID, ProductID; QtyOrdered, QtyShipped, ActualPrice.
    Product: ProductID; Description, Unit, UnitPrice, QtyOnHand.
    Relationship labels: places, consists of, is a.]

    VISIO ENTERPRISE can be used to draw various styles of logical and physical data models. This is an example of a physical model.

    Attributes are shown within the body of the entity and primary keys are shown in bold.

    Crow's foot notation is used to indicate the cardinality of the relationships.

    Since this is a physical database diagram, there are no many-to-many relationships or associative entities. For example, the Order Detail associative entity is shown as a regular entity.
