
Page 1: Data Management:  Databases and Organizations Richard Watson


Data Management: Databases and Organizations

Richard Watson

Summary of Chapters 3-6 prepared by Kirk Scott


Data Modeling and SQL

• Chapter 3. The Single Entity

• Chapter 4. The One-to-Many Relationship

• Chapter 5. The Many-to-Many Relationship

• Chapter 6. One-to-One and Recursive Relationships


Introduction

• Large parts of these overheads will be somewhat repetitive

• They cover in general terms some of the things that were specifically illustrated by concrete SQL examples

• However, the repetition shouldn’t be harmful

• It should put the examples into a broader context, and add new examples to flesh the ideas out

• The ultimate goal is for the basic concepts and diagramming to be clear so that there will be no trouble considering design questions in chapter 7


Chapter 3. The Single Entity

• The author starts with the entity relationship diagramming conventions and the concept of a single entity

• The author represents an entity with a box containing its name in capital letters inside, at the top

• Full field names are given after that in small letters

• The primary key field is marked with an asterisk
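• As a rough sketch of how a single entity maps to a table (not taken from the overheads), the following uses Python's built-in sqlite3 module; the table name, the column names, and the sample row are assumptions made for illustration

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# One entity becomes one table; the field marked with an asterisk
# in the diagram becomes the PRIMARY KEY.
conn.execute("""
    CREATE TABLE share (
        shrcode  TEXT PRIMARY KEY,   -- identifier (hypothetical name)
        shrfirm  TEXT NOT NULL,
        shrprice REAL
    )
""")
conn.execute("INSERT INTO share VALUES ('AB', 'Example Firm', 27.50)")

# A second row with the same primary key value is rejected.
try:
    conn.execute("INSERT INTO share VALUES ('AB', 'Duplicate', 1.00)")
    duplicate_accepted = True
except sqlite3.IntegrityError:
    duplicate_accepted = False

row_count = conn.execute("SELECT COUNT(*) FROM share").fetchone()[0]
```

• The primary key is what guarantees that each row of the entity can be identified uniquely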


• Different diagramming conventions are perfectly acceptable, as long as you are consistent

• The name of the entity may be given above the box representing it

• You may choose to capitalize just the first letter of the name


• In theory, you could qualify field names, although this would be redundant, given the entity name at the top

• You could also use short names for fields if space is at a premium

• Primary keys could be marked with pk or underlined


Chapter 4. The One-to-Many Relationship

• The author uses the crow’s foot to mark a one-to-many relationship in an ER diagram

• In a simple ER diagram fields may not be listed, just entity names and crow’s feet

• In a more complete diagram, fields can be listed


• The author does not include the embedded pk/fk in the list of fields in the fk/many table because it is redundant

• I do not follow this convention

• I believe that in the interests of clarity it is worthwhile to include the fk in the list of fields
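• A minimal sketch of a one-to-many relationship with the fk listed explicitly in the many table follows; the nation/stock names and rows are assumptions for illustration, and the PRAGMA is needed because SQLite only enforces foreign keys when asked to

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces fk's only when enabled

conn.executescript("""
    CREATE TABLE nation (                 -- the "one" side
        natcode TEXT PRIMARY KEY,
        natname TEXT NOT NULL
    );
    CREATE TABLE stock (                  -- the "many" side (crow's foot end)
        stkcode TEXT PRIMARY KEY,
        stkfirm TEXT NOT NULL,
        natcode TEXT NOT NULL REFERENCES nation(natcode)  -- embedded fk, listed explicitly
    );
    INSERT INTO nation VALUES ('UK', 'United Kingdom');
    INSERT INTO stock VALUES ('S1', 'First Firm', 'UK');
    INSERT INTO stock VALUES ('S2', 'Second Firm', 'UK');
""")

# One nation row is related to many stock rows.
stocks_per_nation = conn.execute(
    "SELECT natcode, COUNT(*) FROM stock GROUP BY natcode"
).fetchall()

# A stock row that points at a nonexistent nation is rejected.
try:
    conn.execute("INSERT INTO stock VALUES ('S3', 'Orphan Firm', 'ZZ')")
    orphan_accepted = True
except sqlite3.IntegrityError:
    orphan_accepted = False
```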


Chapter 5. The Many-to-Many Relationship

• As is known, the many-to-many relationship is the most “complicated” of the relationships

• The book presents some interesting examples that arise in real situations

• They illustrate ideas that are not immediately apparent from the examples that have gone before

• The first example is based on a bill of sale, shown on the next overhead


The Bill of Sale Example: An Interesting Case of a pk/fk Relationship


• The book analyzes this situation as consisting of base entities which are a sale and the items which are sold

• There is a many-to-many relationship between these base entities because each sale can consist of many items

• Also, each item can be present in many sales

• The book’s ER for this analysis is shown on the next overhead


• When first introducing many-to-many relationships, I referred to the table in the middle

• More formally, the book refers to an associative entity

• The associative entity is the table in the middle that captures the relationship between two base entities


• In the ER notation for this example the + sign is used

• This has not been seen before

• For the purposes of understanding the book’s example, it is important to know what this means


• The + sign is shown over a crow’s foot

• It symbolizes the fact that the embedded fk is part of the pk of the table it’s embedded in

• You have seen an example of a table in the middle where the pk is the concatenation of the two embedded fk’s

• This example is not the same as that


• In this example the saleno is the pk of the Sale table

• It is embedded as a fk in the Lineitem table

• A saleno value will appear in the Lineitem table as many times as there are separate lines belonging to the sale

• These separate lines are identified by lineno’s

• The lineno’s are not embedded fk’s based on the unique identifiers, itemno’s, of entries in the Item table


• An alternative way of representing the relationship would be to list the fields of the table in the middle this way:

• saleno pk, fk

• lineno pk

• itemno fk

• lineqty

• lineprice

• Note again that the saleno is both a pk and a fk, while the lineno is purely pk
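• The field list above can be sketched as table definitions; this is an illustration under assumptions (the saledate and itemname columns and all of the sample rows are made up), run here with Python's sqlite3 module

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")

conn.executescript("""
    CREATE TABLE sale (saleno INTEGER PRIMARY KEY, saledate TEXT);
    CREATE TABLE item (itemno INTEGER PRIMARY KEY, itemname TEXT);
    CREATE TABLE lineitem (
        saleno    INTEGER NOT NULL REFERENCES sale(saleno),  -- pk and fk
        lineno    INTEGER NOT NULL,                          -- pk only
        itemno    INTEGER NOT NULL REFERENCES item(itemno),  -- fk only
        lineqty   INTEGER,
        lineprice REAL,
        PRIMARY KEY (saleno, lineno)   -- the + sign: saleno is part of the pk
    );
    INSERT INTO sale VALUES (1, '2024-01-05'), (2, '2024-01-06');
    INSERT INTO item VALUES (10, 'shovel');
    INSERT INTO lineitem VALUES (1, 1, 10, 2, 19.99);
    INSERT INTO lineitem VALUES (2, 1, 10, 1, 19.99);  -- lineno 1 again, different sale
""")

# Repeating a (saleno, lineno) pair violates the concatenated primary key.
try:
    conn.execute("INSERT INTO lineitem VALUES (1, 1, 10, 5, 19.99)")
    repeated_key_accepted = True
except sqlite3.IntegrityError:
    repeated_key_accepted = False

line_count = conn.execute("SELECT COUNT(*) FROM lineitem").fetchone()[0]
```

• Note that the same lineno value can appear under different sales, because only the pair (saleno, lineno) has to be unique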


• At first glance it may seem a little strange, but the table in the middle contains every line of every sale, listed separately

• It is the saleno and the lineno together which uniquely identify the entries in the Lineitem table

• This model actually reflects reality well

• It differs, in particular, from the car sale example


• In the car sale example, there were individual cars that were sold

• In the example database they were only shown as being sold once

• In reality, the same car might be sold more than once

• This could be modeled by making the salesdate part of the pk of the Carsale table


• In the Sale, Lineitem, Item example, the items are not actually individual items

• An item is a kind of item, like a screw or a shovel or a microwave oven

• The seller may have many of each kind of item in stock and doesn’t distinguish between individual items


• Multiple instances of the same (kind of) item may be sold to the same customer

• Also, the same (kind of) item can be sold to more than one customer

• It’s not incredibly difficult, but it’s worth emphasizing that the itemno does appear in the table in the middle as a fk

• This tells which item that line of a sale was in reference to

• However, the itemno is not part of the pk of the table in the middle


• In a perfect world, you might argue that each item should appear on only one line of a sale

• If so, then you could dispense with individual line numbers and use the itemno as part of the pk instead

• However, reality makes the given solution better


• A data model should be flexible and accommodate all possibilities

• Could a customer, in the middle of making a purchase, decide that more instances of a certain item were desired?

• If so, do you allow this, and how do you support it?


• From a business point of view, few things are more destructive than a computer system whose model imposes artificial constraints on the user (seller and customer)

• Of course, if a customer decides that more instances are desired you want to sell them


• Have you ever heard things like these:

• “I’d like to let you buy more, but the computer won’t allow it.”

• The customer wants to scream

• “I’d like to let you buy more, but it will be necessary to start a completely new bill of sale.”

• The customer wants to scream


• “I’d like to let you buy more, but it will be necessary to go back and modify the earlier line of the sale for that item.”

• The customer again wants to scream

• The best scenario would go like this:

• “Oh, you want 20 instead of 10? We’ll just add another line here at the bottom for another 10.”

• The customer sighs with satisfaction…
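• The "add another line" scenario can be sketched as follows; the cut-down Lineitem table and its values are assumptions for illustration, and the way the next lineno is chosen is just one possible approach

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE lineitem (
        saleno  INTEGER NOT NULL,
        lineno  INTEGER NOT NULL,
        itemno  INTEGER NOT NULL,
        lineqty INTEGER NOT NULL,
        PRIMARY KEY (saleno, lineno)      -- itemno is deliberately not in the pk
    );
    INSERT INTO lineitem VALUES (1, 1, 10, 10);  -- the original 10 units of item 10
""")

# The customer wants 10 more of item 10: append a line with the next line number.
next_lineno = conn.execute(
    "SELECT COALESCE(MAX(lineno), 0) + 1 FROM lineitem WHERE saleno = ?", (1,)
).fetchone()[0]
conn.execute("INSERT INTO lineitem VALUES (?, ?, ?, ?)", (1, next_lineno, 10, 10))

# The total quantity of the item on the sale is recovered by aggregation.
total_qty = conn.execute(
    "SELECT SUM(lineqty) FROM lineitem WHERE saleno = 1 AND itemno = 10"
).fetchone()[0]
```

• If itemno had been part of the pk instead of lineno, the second insert would have been rejected and the earlier line would have had to be modified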


Relational Division, For All, and Not Exists

• The book points out that SQL, with operations like AND, OR, NOT, and so on, has qualities of algebra

• Similarly, there are set operators like UNION

• Although Microsoft Access SQL doesn’t support INTERSECT, some implementations do


• The Cartesian product represents a form of multiplication for relations

• The results of a join operation are a subset of the results of a product

• In an algebraic system, the existence of a multiplication operation implies the existence of a division operation


• As pointed out when doing the concrete SQL examples, there is no FOR ALL operator

• However, double NOT EXISTS can accomplish the same thing

• For those who are theoretically inclined, it may be worth noting that it is FOR ALL/double NOT EXISTS which is analogous to division in a relational system


• In any case, on test one, not everyone was clear on the order and role of the tables in a double NOT EXISTS query

• The book shows an ER diagram of 3 tables capturing a many-to-many relationship

• This diagram is labeled generically, but it is of the same structure as the Lineitem example


• It then outlines the double NOT EXISTS query that could be written for it

• The fact that this models the Lineitem example is not important

• The table in the middle could have a completely concatenated primary key

• It could also have its own, separate primary key


• The important point is that the base tables are at the ends of the ER diagram

• The book refers to these as target and source, respectively

• The table in the middle, the associative entity, is labeled Target-Source by the book


• If you want to find those rows of the target which are in relation to all of the rows of the source,

• Then in the double NOT EXISTS query:

– The target appears first, in the outermost query

– The source appears second, in the middle, in the first nested subquery

– And the table in the middle appears last, in the second nested subquery

• The ER diagram and the schematic query are shown on the next overhead
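• The ordering of the three tables can be sketched with a concrete query; as an assumption for illustration, Sale plays the role of the target, Item the source, and Lineitem the table in the middle, so the question is "which sales include every item?" (the rows are made up)

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sale (saleno INTEGER PRIMARY KEY);
    CREATE TABLE item (itemno INTEGER PRIMARY KEY);
    CREATE TABLE lineitem (
        saleno INTEGER, lineno INTEGER, itemno INTEGER,
        PRIMARY KEY (saleno, lineno)
    );
    INSERT INTO sale VALUES (1), (2);
    INSERT INTO item VALUES (10), (20);
    INSERT INTO lineitem VALUES (1, 1, 10), (1, 2, 20);  -- sale 1: both items
    INSERT INTO lineitem VALUES (2, 1, 10);              -- sale 2: item 10 only
""")

# "Sales for which there is no item that is not on the sale."
rows = conn.execute("""
    SELECT saleno
    FROM sale                           -- target: outermost query
    WHERE NOT EXISTS (
        SELECT * FROM item              -- source: first nested subquery
        WHERE NOT EXISTS (
            SELECT * FROM lineitem      -- table in the middle: second nested subquery
            WHERE lineitem.saleno = sale.saleno
              AND lineitem.itemno = item.itemno
        )
    )
""").fetchall()
sales_with_every_item = [r[0] for r in rows]
```

• The innermost subquery is correlated with both outer levels; that correlation is what ties each target row and each source row to the table in the middle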


A Design with a Cycle

• The next diagram illustrates a design containing a cycle

• Such designs will become especially important when considering normalization, the theory of correctness in designs

• For the time being simply note that there is nothing preventing designs with cycles


A Concatenated Key with Date

• The next example design is one where both of the embedded foreign keys are part of the primary key of a table in the middle

• However, it is more complicated than that because a date field is also included in the primary key

• This allows the same pair of base values to be paired with each other more than once
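• A sketch of such a key follows; the diagram itself is not reproduced in this transcript, so the loan table, its column names, and its rows are purely hypothetical stand-ins for the idea

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE loan (
        memberno INTEGER NOT NULL,   -- fk to a member table (omitted here)
        bookno   INTEGER NOT NULL,   -- fk to a book table (omitted here)
        loandate TEXT    NOT NULL,   -- the date is the third part of the pk
        PRIMARY KEY (memberno, bookno, loandate)
    );
    INSERT INTO loan VALUES (1, 100, '2024-01-05');
    INSERT INTO loan VALUES (1, 100, '2024-03-12');  -- same pair, later date
""")

# The same pair of base values appears twice, distinguished by date.
pair_count = conn.execute(
    "SELECT COUNT(*) FROM loan WHERE memberno = 1 AND bookno = 100"
).fetchone()[0]

# The same pair on the same date is still rejected.
try:
    conn.execute("INSERT INTO loan VALUES (1, 100, '2024-01-05')")
    same_day_repeat_accepted = True
except sqlite3.IntegrityError:
    same_day_repeat_accepted = False
```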


A Simple Concatenated Key

• The next design is actually somewhat simpler

• It also has two embedded pk/fk’s in the table in the middle

• The table in the middle isn’t pure key though

• There is also a non-key attribute field for the table in the middle


The Music CD Library Example

• In the overheads for chapters 3 and 4 some very primitive starting designs were given for a collection of music CD’s

• At the end of chapter 5, with the capability to model many-to-many relationships, this model blossoms

• On the next overhead an 8 entity design is shown

• Note that 4 of the 8 entities can be classified as associative entities


• The next overhead shows the music CD design blossoming further

• In the book, the new relationships are analyzed

• I will not list the analysis here

• In theory it would be possible to compare the two designs and understand what additional assumptions/capabilities result from the new design

• The new design should be a better model of reality, with fewer exceptions and more flexibility


Chapter 6. One-to-One and Recursive Relationships

• What one-to-one relationships are should be clear

• The book uses the term recursive relationship for those cases where a table is in a relationship with itself


One-to-One Relationships

• You may recall some of the different options for capturing one-to-one relationships

• If this is truly one-to-one in all cases at all times, then this can be a single relation

• Otherwise, you end up embedding the pk of one entity as a fk in another


• Maintaining this as a one-to-one relationship then becomes a question of data integrity

• When choosing which pk to embed as a fk, you should take into consideration any possible exceptions or changes in the relationship in the future

• The book has a number of examples which illustrate details of this concept


• The book’s examples start with a company with a two level management hierarchy

• There are bosses of departments and there is an overall managing director

• The (non-ER) diagram on the following overhead illustrates this


• Next the book shows an ER diagram illustrating that departments have employees and that departments have bosses

• A garden variety crow’s foot doesn’t have to be labeled

• A one-to-one relationship should be labeled


• The foregoing diagram doesn’t explicitly show whether the pk of Dept is embedded as a fk in Emp or vice-versa

• In this case it is likely that the pk of Emp is embedded as a fk in Dept

• This is because, all else being equal, a department will have a boss

• However, few employees will be bosses

• There would be lots of nulls if there were a “department which you’re the boss of” field in Emp
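• The choice of embedding the pk of Emp in Dept can be sketched as follows; the column names and rows are assumptions for illustration, and the UNIQUE constraint is one possible way of keeping the relationship one-to-one

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")

conn.executescript("""
    CREATE TABLE emp (
        empno    INTEGER PRIMARY KEY,
        empfname TEXT NOT NULL
    );
    CREATE TABLE dept (
        deptname TEXT PRIMARY KEY,
        -- UNIQUE keeps this one-to-one: no employee is the boss of two departments
        bossno   INTEGER NOT NULL UNIQUE REFERENCES emp(empno)
    );
    INSERT INTO emp VALUES (1, 'Alice'), (2, 'Ned'), (3, 'Andrew');
    INSERT INTO dept VALUES ('Management', 1), ('Marketing', 2);
""")

# Making employee 2 the boss of a second department is rejected.
try:
    conn.execute("INSERT INTO dept VALUES ('Accounting', 2)")
    second_dept_accepted = True
except sqlite3.IntegrityError:
    second_dept_accepted = False

# Every department has a boss, so the embedded fk is never null.
null_boss_fields = conn.execute(
    "SELECT COUNT(*) FROM dept WHERE bossno IS NULL"
).fetchone()[0]
```

• Embedding in the other direction would leave the fk null for employee 3 and for every other employee who is not a boss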


A One-to-Many Recursive Relationship

• Next, the book considers recording which employee is which other employee’s boss

• This leads to what the book calls a recursive relationship

• This is when there is a one-to-many relationship between a table and itself

• Such a one-to-many relationship should be labeled because the meaning of the embedding would not necessarily be clear

• An ER diagram illustrating this follows
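• In table form, a one-to-many relationship between a table and itself is a fk that refers back to the pk of the same table; in this sketch the column names and rows are assumptions for illustration, and a self-join pairs each employee with their boss

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE emp (
        empno    INTEGER PRIMARY KEY,
        empfname TEXT NOT NULL,
        bossno   INTEGER REFERENCES emp(empno)  -- NULL for the managing director
    );
    INSERT INTO emp VALUES (1, 'Alice', NULL);
    INSERT INTO emp VALUES (2, 'Ned', 1);
    INSERT INTO emp VALUES (3, 'Andrew', 2);
""")

# Join the table to itself: one alias plays the worker, the other the boss.
pairs = conn.execute("""
    SELECT wrk.empfname, boss.empfname
    FROM emp AS wrk
    JOIN emp AS boss ON wrk.bossno = boss.empno
    ORDER BY wrk.empno
""").fetchall()
```

• The two aliases are what make the meaning of the embedding explicit in the query, just as the label does in the diagram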


• The previous design may not be ideal

• If every employee is assigned to a department, it would seem that the employee’s boss would be the boss of that department

• At first glance, at the very least, this appears to be redundant

• Redundancy means that information is repeated, and it opens up the possibility of inconsistencies between the repeated representations of the same data


• However, this is another problem that arises from real life

• Ask yourself, what departments are the bosses of departments assigned to?

• For example, if “Bob” is the head of Marketing and his department is listed as Marketing, is he his own boss?

• It should be apparent that his boss is the managing director


• Another detail that might be considered is split assignments or temporary assignments

• If an employee is split 50-50 between departments, who is their boss?

• If an employee is only temporarily assigned to a department, who is their boss?

• The apparently redundant design allows such cases to be handled with full flexibility


A One-to-One Recursive Relationship that Forms a Linked List

• The next example the book pursues is a little artificial

• However, something like it might arise in real life, and this provides an introduction to the idea

• It is possible for there to be a one-to-one relationship between a table and itself


• The following overhead illustrates the idea with the succession of monarchs

• The idea is that the pk of the monarch table is embedded as a fk in the table

• Every monarch except the first has the previous monarch recorded

• The problem could also be solved by simply recording a numbering for the monarchs
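• The linked-list idea can be sketched as follows; the single-column integer key, the column names, and the three sample rows are simplifying assumptions for illustration and are not necessarily the key structure the book uses

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE monarch (
        monno    INTEGER PRIMARY KEY,
        monname  TEXT NOT NULL,
        -- UNIQUE makes it one-to-one: each monarch is the predecessor of at most one other
        premonno INTEGER UNIQUE REFERENCES monarch(monno)
    );
    INSERT INTO monarch VALUES (1, 'Victoria', NULL);   -- first row: no predecessor recorded
    INSERT INTO monarch VALUES (2, 'Edward VII', 1);
    INSERT INTO monarch VALUES (3, 'George V', 2);
""")

# Follow each link from a monarch back to the previous one.
succession = conn.execute("""
    SELECT prev.monname, cur.monname
    FROM monarch AS cur
    JOIN monarch AS prev ON cur.premonno = prev.monno
    ORDER BY cur.monno
""").fetchall()
```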


A Many-to-Many Recursive Relationship

• The next example considers a table in a many-to-many relationship with itself

• This is another example drawn from real life which is very instructive about how relational databases work

• It is helpful because it brings out one of the limitations of relational databases

• It provides insight into the subject of object-oriented databases


• Whenever a table is in a relationship with itself, the book refers to this as a recursive relationship

• As far as I’m concerned, the use of the term recursive is optional, although descriptive

• I am just as happy in this context with saying “in a relationship with itself”

• In any case, consider the ER diagram on the next overhead and the explanatory remarks that follow


• The idea is that the Product table contains entries for stand-alone products (possible sub-products) and for products (super-products) that consist of collections of other products

• Potentially the Product table might also contain things (sub-products) which themselves aren’t even individual products, but which only exist as components of finished products


• The Assembly table is the table which shows the relationship between products and sub-products (whether those sub-products have an independent existence or not)

• Notice that both of the crows’ feet in the diagram have + signs on them

• This means that the pk of an assembly is the concatenation of the embedded fk’s of a (super) product and a (sub) product


• In addition, the Assembly table has a quantity field, telling how many of the sub-product there are in the super-product

• If you assume that this is just a two-level hierarchy with super-products and sub-products, things seem relatively clear

• However, both from a database point of view and a real life point of view, there is no need for this restriction to apply
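• The Product and Assembly tables described above can be sketched as follows; the column names and the bicycle example are assumptions for illustration

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")

conn.executescript("""
    CREATE TABLE product (
        prodid   INTEGER PRIMARY KEY,
        proddesc TEXT NOT NULL
    );
    CREATE TABLE assembly (
        prodid    INTEGER NOT NULL REFERENCES product(prodid),  -- super-product
        subprodid INTEGER NOT NULL REFERENCES product(prodid),  -- sub-product
        quantity  INTEGER NOT NULL,     -- how many of the sub-product are used
        PRIMARY KEY (prodid, subprodid) -- both + signs: pk is the two fk's together
    );
    INSERT INTO product VALUES (1, 'bicycle'), (2, 'wheel'), (3, 'spoke');
    INSERT INTO assembly VALUES (1, 2, 2);    -- a bicycle has 2 wheels
    INSERT INTO assembly VALUES (2, 3, 32);   -- a wheel has 32 spokes
""")

# Immediate sub-products of the bicycle (one level down only).
parts = conn.execute("""
    SELECT sub.proddesc, assembly.quantity
    FROM assembly
    JOIN product AS sub ON sub.prodid = assembly.subprodid
    WHERE assembly.prodid = 1
""").fetchall()
```

• The wheel appears both as a sub-product (of the bicycle) and as a super-product (of the spoke), which is exactly the multi-level case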


• There is no reason why a given product might not consist of several other (sub) products

• Each of these (sub) products, in turn might be super-products consisting of other sub-products, and so on

• Now the descriptiveness of the term recursion becomes apparent


• There is no theoretical limit on how deeply things might be nested in this kind of “has-a” relationship

• Practically speaking, the only limit is how many rows there are in the Product table

• This last claim leads to one more observation


• Data integrity would require that no product be a super-product or sub-product of itself

• Otherwise you would have a containment cycle

• It seems apparent that in real life this shouldn’t occur
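• As a partial sketch of this integrity rule, a CHECK constraint can reject the simplest case, a product listed directly as its own sub-product; the table layout is an assumption for illustration, and the constraint does not catch longer cycles, as the second part shows

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE assembly (
        prodid    INTEGER NOT NULL,
        subprodid INTEGER NOT NULL,
        quantity  INTEGER NOT NULL,
        PRIMARY KEY (prodid, subprodid),
        CHECK (prodid <> subprodid)   -- no product directly contains itself
    );
""")

# Direct self-containment is rejected by the CHECK constraint.
try:
    conn.execute("INSERT INTO assembly VALUES (1, 1, 1)")
    self_containment_accepted = True
except sqlite3.IntegrityError:
    self_containment_accepted = False

# An indirect cycle (1 contains 2, 2 contains 1) is NOT caught by this constraint.
conn.execute("INSERT INTO assembly VALUES (1, 2, 1)")
conn.execute("INSERT INTO assembly VALUES (2, 1, 1)")
indirect_cycle_rows = conn.execute("SELECT COUNT(*) FROM assembly").fetchone()[0]
```

• Ruling out indirect cycles needs a check that walks the whole containment structure, which a simple column-level constraint cannot do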


• The product-assembly relationship crops up reasonably frequently in real life

• If you think about it, what’s really being captured is a tree-like containment structure

• Manufacturing is a problem domain where this is relevant


• A given sub-assembly is made of screws and panels

• A given super-assembly is made of multiple sub-assemblies

• That super-assembly, in turn, is part of something larger, and so on


SQL and Recursive Relationships

• The given relational design works, to a certain extent, but it has shortcomings

• For example, it is not necessarily an easy to understand or natural way to envision tree-like relationships

• In particular, consider what you know about SQL and what kind of query you might like to execute against products and assemblies


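• SQL is non-procedural

• For a given product you could ask for all of its immediate sub-products or sub-assemblies

• However, without recursive queries (which were added to the standard in SQL:1999 and are not available in every implementation), it would not be possible to form a single query that would retrieve all of the constituent parts of a given product

• Without them, SQL won’t allow you to travel “down the tree”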
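• Where an implementation does support recursive queries (the SQL:1999 WITH RECURSIVE form; SQLite, used here through Python's sqlite3 module, is one such implementation), the whole tree can be retrieved in a single query; the tables and the bicycle data in this sketch are assumptions for illustration

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE product (prodid INTEGER PRIMARY KEY, proddesc TEXT NOT NULL);
    CREATE TABLE assembly (
        prodid    INTEGER NOT NULL,
        subprodid INTEGER NOT NULL,
        quantity  INTEGER NOT NULL,
        PRIMARY KEY (prodid, subprodid)
    );
    INSERT INTO product VALUES (1, 'bicycle'), (2, 'wheel'), (3, 'spoke'), (4, 'frame');
    INSERT INTO assembly VALUES (1, 2, 2), (1, 4, 1), (2, 3, 32);
""")

# Start from the immediate sub-products of product 1, then repeatedly
# add the sub-products of whatever has been found so far.
rows = conn.execute("""
    WITH RECURSIVE parts(prodid, depth) AS (
        SELECT subprodid, 1 FROM assembly WHERE prodid = ?
        UNION ALL
        SELECT assembly.subprodid, parts.depth + 1
        FROM assembly
        JOIN parts ON assembly.prodid = parts.prodid
    )
    SELECT product.proddesc, parts.depth
    FROM parts
    JOIN product ON product.prodid = parts.prodid
    ORDER BY parts.depth, product.proddesc
""", (1,)).fetchall()
```

• The recursion stops on its own only if the data contains no containment cycle, which is another reason the integrity rule against cycles matters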


Object-Oriented Databases

• It is these problems that led, at least in part, to the development of what are known as object-oriented databases

• In essence, O-O databases are constructed around tree-like containment

• Although extremely useful in some problem domains, it is estimated that O-O db’s have about 5% of the commercial market

• The remaining 95% is relational because relational db’s are applicable and convenient in so many other problem domains


The CD Music Library Again

• The chapter concludes with the latest version of the CD music library

• It illustrates several points

• Although the ER diagram is useful for getting the big picture, it’s becoming clear that without written text explaining the problem and the assumptions made, you haven’t completely and clearly documented what’s going on

• This example illustrates another point, which is also relevant to the final project


• You might have thought that a CD music library was a pretty simple, toy application

• Notice that it has grown to 13 tables, twice as many as you’re required to have for your project

• It is likely that before you’re finished with your project, you will be simplifying the problem you tackled so that you meet the minimum requirements without inviting too much trouble for yourself

• The ER diagram is shown on the following overhead


The End