how to read a data model
DESCRIPTION
This is a presentation on how you can read a data model and understand the data and business rules contained in it. It is intended for non-technical people.
TRANSCRIPT
How to read a data model?
1
By: Sanjay Sharma, Consulting Enterprise and Data Architect
2
Goal
• To develop basic literacy about data models.
– To understand what it contains.
– To understand how information in it can be used
more effectively.
We will not touch upon the technicalities of developing a data model.
3
Session Structure
• What is a data model, its need and context.
• Different types of data models
• Semantics of data models
• How to read a data model
• How to use data models more effectively
• Question – answers.
4
Why Model?
• John Boyd (1927-1997)– Military Strategist and Thinker
• Often described as the most original military thinker since Sun Tzu (6th century BC)
• OODA Loop (Observe, Orient, Decide, Act): Every organization/organism uses the OODA loop to adapt to its surroundings and survive.
5
Why model?
• Observation is information gathering.
• Orientation is developing a mental framework of information by understanding its structure and relationships.
• Models are observation as well as orientation tools which use symbols for real world facts.
• Models are effective because the human mind absorbs more information visually than textually.
• Models in business and IT – Enterprise Models, Business Process Models, Workflow Models, Interaction Models, Network Models, etc.
6
Why model data?
• Data is a distinct component of an information system – the other component is application logic.
• It needs to be described in such a way that it is clearly and precisely communicated to all stakeholders: information analysts, application developers, data analysts, database administrators, etc.
• Every data element must have a defined business purpose.
A data model is an unambiguous and precise description of data, its structure and relationships, agreed upon by all stakeholders.
7
What is a data model?
• It is a paper sheet with coloured rectangles and a tangled web of crow's-foot lines joining them……
• For a given information system, it is a graphical representation of the data elements, their relationships and the constraints governing the data.
HospitalConsultationReport
inReportID: int IDENTITY (FK)
hospConReportID: int NOT NULL
doctorPersonRoleID: int NOT NULL
doctorLU: text NULL
dictationDate: datetime NULL
diagnosis: text NULL
findings: text NULL
procedures: text NULL
InReportHospital
inReportID: int NOT NULL (FK)
hospitalOrgRoleID: int NULL
hospitalLU: text NULL
admissionDate: datetime NULL
dischargeDate: datetime NULL
dischargeNote: text NULL
InReportOther
occupReportTypeCode: int NULL
servProvOrgRoleID: int NULL
servProvLU: text NULL
therapistPersonRoleID: int NULL
therapistLU: text NULL
inReportID: int NOT NULL (FK)
findings: text NULL
recs: text NULL
HospitalOtherReport
inReportID: int IDENTITY (FK)
hospitalOtherReportID: int NOT NULL
reportDate: datetime NULL
hosOthReportTypeCode: int NULL
source: varchar(50) NULL
findings: text NULL
procedures: text NULL
comments: text NULL
HospitalImagingReport
inReportID: int IDENTITY (FK)
hospImgReportID: int NOT NULL
reportDate: datetime NULL
proceduresCode: int NULL
findings: text NULL
opinions: text NULL
8
What is the context?
9
Types of data models
• Data can be described from different perspectives: Object-Role Models, Entity-Relationship Diagrams (ERDs), Data Flow Diagrams (DFDs), UML Class Diagrams, etc.
• Entity-Relationship (ER) Diagrams are the most popular for data modeling, as they can easily be converted into relational database designs.
InReportHospital
inReportID: int NOT NULL (FK)
hospitalOrgRoleID: int NULL
hospitalLU: text NULL
admissionDate: datetime NULL
dischargeDate: datetime NULL
dischargeNote: text NULL
HospitalConsultationReport
inReportID: int IDENTITY (FK)
hospConReportID: int NOT NULL
doctorPersonRoleID: int NOT NULL
doctorLU: text NULL
dictationDate: datetime NULL
diagnosis: text NULL
findings: text NULL
procedures: text NULL
HospitalOtherReport
inReportID: int IDENTITY (FK)
hospitalOtherReportID: int NOT NULL
reportDate: datetime NULL
hosOthReportTypeCode: int NULL
source: varchar(50) NULL
findings: text NULL
procedures: text NULL
comments: text NULL
HospitalImagingReport
inReportID: int IDENTITY (FK)
hospImgReportID: int NOT NULL
reportDate: datetime NULL
proceduresCode: int NULL
findings: text NULL
opinions: text NULL
GlasgowComaScale
gcsID: int IDENTITY
inReportID: int NOT NULL (FK)
time: datetime NULL
eyes: tinyint NULL
verbal: tinyint NULL
motor: tinyint NULL
total: tinyint NULL
InReportAmb
inReportID: int NOT NULL (FK)
scene: varchar(50) NULL
sceneTime: datetime NULL
destination: varchar(50) NULL
destinationTime: datetime NULL
complaint: varchar(100) NULL
injuryMech: text NULL
history: text NULL
medications: text NULL
allergies: text NULL
consciousness: varchar(50) NULL
airwayControlCode: int NULL
note: text NULL
InReportPsychTest
inReportID: int NOT NULL (FK)
source: varchar(50) NULL
assessmentDate: varchar(30) NULL
conclusions: text NULL
summary: text NULL
recs: text NULL
NeuroPsychTest
neuroPsychTestID: int IDENTITY
inReportID: int NOT NULL (FK)
test: varchar(75) NULL
result: varchar(75) NULL
note: text NULL
10
Types of ERD – domain model
• Domain Model (Subject Area Model): A very high level (10,000 feet) conceptual model showing the major entities and their relationships in a business or problem domain.
• Only entities are shown
11
Scope of domain models
• Business Domain Models or Business Subject Area Models – Very high level covering entire business
• Application Domain Models or Application Subject Area Models – covering an application/package.
12
Types of ERD – logical models
• Logical Models: Models showing the entities and their logical relationships for a given information system.
TOTAL LOSS REQUEST RECORD
Claim File Id (FK)
RequestNumber
ActualMileageFlag
CommentsNotOnValuation
CommentsOnValuation
Condition
Equipment
MarketValue
Other
OtherAdj
OtherDesc
Packages
RequestUploadFlag
SalvageType
SearchDays
SearchExtent
TransferFee
ValuationLevel
ValuationStatus
Create Date
ESTIMATE DAIS CHUNKS
Claim File Id (FK)
Sequence Number
DAIS Data
VEHICLE REPAIR LOG
Vehicle Repair Log Claim File Id (FK)
Vehicle Repair Log Secondary Id
Vehicle Repair Log Logon Id
Vehicle Repair Log TimeStamp
Vehicle Repair Log Car In Date
Vehicle Repair Log Car In Time
Vehicle Repair Log Customer Contact Date
Vehicle Repair Log Customer Contact Time
Vehicle Repair Log Car Out Date
Vehicle Repair Log Car Out Time
Vehicle Repair Log Exclude Flag
VRL_PVRT_NUM_DAYS
ESTIMATE PRINT IMAGE LINE
Claim File Id (FK)
EstimateID (FK)
Estimate Print Line Number
Estimate Print Line Text
CLAIM FILE
Claim File Id
ICBC Claim Number
ICBC Form Id
ClaimStatus
ControlLogNumber
EstimateCount
Creation Date
Creation Time
LastNet
PrimaryImpactPoint
SecondaryImpactPoint
Entered Car Model Year
Entered Car Model VIN
ADPHostControlLogNumber
DeviceAssetNumber
PenPro Claim Number
AcctControlNo
Adjuster Resource Name
Adjuster Resource Number
LossSecondPayee
LossPayee
LossType
LossDate
PolicyNumber
Insured Name
Claim Centre Number
Claim Centre Name
CLF_DAIS_NUM_BYTES
CLF_DAIS_NUM_ROWS
Claim Number Check Digit
Exposure Code
Kind Of Loss Code
Person Organization Id
Licence Series Year
Declared Value
Gross Vehicle Weight
CLAIM FILE ESTIMATE GROUP
Claim File Id (FK)
Claim Program Type
Estimating Business Facility Number
Maximum Estimate Id
Current Status
Last Status Change Timestamp
Stale Claim Flag
BF Logical Supplement Count
13
Types of ERD-physical models
• Physical Models: Models showing the physical implementation of the logical model at the data storage level.
• They contain columns for implementing relationships and for fast data access.
• Most tools can create schema scripts from physical models.
AUTOSOURCE_REQUEST
ASR_CLF_ID: DECIMAL(15,0) NOT NULL (FK)
ASR_REQ_ID: SMALLINT NOT NULL
ASR_ADXE_CREATE_ID: VARCHAR2(35) NOT NULL
ASR_EST_ID: SMALLINT NOT NULL
ASR_PRODUCT_TYP: CHAR(1) NOT NULL
ASR_DEVICE_NME: VARCHAR2(10) NOT NULL
ASR_SEARCH_DAYS: VARCHAR2(30) NOT NULL
ASR_SEARCH_PROV_CD: VARCHAR2(30) NOT NULL
ASR_SEARCH_PROV: VARCHAR2(30) NOT NULL
ASR_SEARCH_POSTAL: VARCHAR2(30) NOT NULL
ASR_SEARCH_CITY: VARCHAR2(30) NOT NULL
ASR_ASHOST_REQ_NUM: CHAR(8) NOT NULL
ASR_CURRENT_STAT: CHAR(18) NOT NULL
ASR_ADJ_POLARITY: CHAR(6) NOT NULL
ASR_ADJ_VALUE: DEC(8,0) NOT NULL
ASR_ADJ_DESC: VARCHAR2(30) NOT NULL
ASR_TITLE_FEE: DEC(4,0) NOT NULL
ASR_TRANSFER_FEE: DEC(4,0) NOT NULL
ASR_SALVAGE_TYP: SMALLINT NOT NULL
ASR_PUB_COMMENT: VARCHAR2(1000) NOT NULL
ASR_PRIV_COMMENT: VARCHAR2(1000) NOT NULL
ASR_RECEIVED_DTE: DATE NULL
AS_REQ_CONDITION
ASRC_CLF_ID: DECIMAL(15,0) NOT NULL (FK)
ASRC_REQ_ID: SMALLINT NOT NULL (FK)
ASRC_SEQ_NUM: SMALLINT NOT NULL
ASRC_COMPONENT: VARCHAR2(72) NOT NULL
ASRC_COND_TYP: CHAR(1) NULL
ASRC_CNDTYP_RATING: CHAR(18) NOT NULL
ASRC_COND_RATE: SMALLINT NOT NULL
ASRC_COND_DATE: DATE NULL
ASRC_COND_VALUE: DECIMAL(6,0) NOT NULL
ASRC_COND_NAME: VARCHAR2(30) NOT NULL
ASRC_COND_NOTES: VARCHAR2(30) NULL
CLAIM_FILE
CLF_ID: DECIMAL(15,0) NOT NULL
CLF_ICBC_CLM_NUM: CHAR(7) NOT NULL
CLF_ICBC_FORM_ID: CHAR(1) NOT NULL
CLF_CLM_STAT: SMALLINT NOT NULL
CLF_CNTL_LOG_NUM: CHAR(25) NOT NULL
CLF_EST_CNT: SMALLINT NOT NULL
CLF_SCHED_DTE: DATE NULL
CLF_SCHED_TME: DATE NULL
CLF_LAST_NET: DECIMAL(8,2) NOT NULL
CLF_PRIM_IMP_PNT: SMALLINT NOT NULL
CLF_SEC_IMP_PNT: SMALLINT NOT NULL
CLF_SCHED_YEAR: SMALLINT NOT NULL
CLF_SCHED_VIN: CHAR(20) NOT NULL
CLF_ADPH_CNTL_NUM: CHAR(7) NOT NULL
CLF_DEV_ASSET_NUM: CHAR(10) NOT NULL
CLF_PENPRO_CLM_NUM: CHAR(25) NOT NULL
CLF_ACCT_CNTL_NUM: CHAR(17) NOT NULL
CLF_ADJ_RSRC_NME: CHAR(35) NOT NULL
CLF_ADJ_RSRC_NUM: CHAR(5) NOT NULL
CLF_LOSS_SECND_PAY: CHAR(30) NOT NULL
CLF_LOSS_PAYEE: CHAR(30) NOT NULL
CLF_LOSS_TYP: SMALLINT NOT NULL
CLF_LOSS_DTE: DATE NULL
CLF_PLCY_NUM: CHAR(12) NOT NULL
CLF_INS_NME: CHAR(27) NOT NULL
CLF_CLM_CNTR_NUM: CHAR(3) NOT NULL
CLF_CLM_CNTR_NME: CHAR(30) NOT NULL
CLF_DAIS_NUM_BYTES: INTEGER NOT NULL
CLF_DAIS_NUM_ROWS: SMALLINT NOT NULL
CLF_CLM_NUM_CD: CHAR(1) NOT NULL
CLF_EXP_CDE: CHAR(1) NOT NULL
CLF_KOL_CDE: CHAR(2) NOT NULL
CLF_PO_ID: DECIMAL(15,0) NOT NULL
CLF_LIC_SER_YEAR: CHAR(1) NOT NULL
CLF_DEC_VALUE: DECIMAL(7,0) NOT NULL
CLF_GR_VEH_WT: CHAR(6) NOT NULL
CLF_PR_ID: DECIMAL(15,0) NULL
CLF_AQT_CDE: CHAR(3) NOT NULL
CLF_MIN_NO_DAM_TYP: CHAR(2) NOT NULL
CLF_EST_REM_CRC: INTEGER NOT NULL
CLF_EST_REM_CH_FLG: CHAR(1) NOT NULL
CLF_PURGE_FLG: CHAR(1) NOT NULL
CLF_PURGE_DTE: DATE NULL
VEHICLE_REPAIR_LOG
VRL_CLF_ID: DECIMAL(15,0) NOT NULL (FK)
VRL_SEC_ID: SMALLINT NOT NULL
VRL_LOGON_ID: CHAR(8) NOT NULL
VRL_TMESTMP: TIMESTAMP NOT NULL
VRL_CAR_IN_DTE: DATE NOT NULL
VRL_CAR_IN_TME: DATE NOT NULL
VRL_CUST_CNTCT_DTE: DATE NULL
VRL_CUST_CNTCT_TME: DATE NULL
VRL_CAR_OUT_DTE: DATE NULL
VRL_CAR_OUT_TME: DATE NULL
VRL_EXCLUDE_FLG: CHAR(1) NOT NULL
VRL_PVRT_NUM_DAYS: SMALLINT NULL
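To make the "schema script" idea concrete, here is a minimal sketch in Python of how a tool might turn a physical model into a script. The dictionary layout and the function name `create_table_script` are assumptions made for illustration only; real modeling tools have their own formats. The column names and types are copied from the first few columns of VEHICLE_REPAIR_LOG above.

```python
# Hypothetical, simplified description of a physical model:
# table name -> list of (column name, data type, nullable).
# Only a few VEHICLE_REPAIR_LOG columns are included to keep the sketch short.
PHYSICAL_MODEL = {
    "VEHICLE_REPAIR_LOG": [
        ("VRL_CLF_ID", "DECIMAL(15,0)", False),
        ("VRL_SEC_ID", "SMALLINT", False),
        ("VRL_LOGON_ID", "CHAR(8)", False),
        ("VRL_CAR_OUT_DTE", "DATE", True),
    ],
}


def create_table_script(table, columns):
    """Build a CREATE TABLE statement from the column descriptions."""
    lines = []
    for name, data_type, nullable in columns:
        null_clause = "NULL" if nullable else "NOT NULL"
        lines.append(f"    {name} {data_type} {null_clause}")
    return f"CREATE TABLE {table} (\n" + ",\n".join(lines) + "\n);"


script = create_table_script("VEHICLE_REPAIR_LOG", PHYSICAL_MODEL["VEHICLE_REPAIR_LOG"])
print(script)
```

Because the physical model already states the data type and NULL / NOT NULL rule of every column, producing the script is a mechanical step, which is why tools can automate it.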
14
Semantics of data models
• Data models use graphical notations and text strings called ‘Verb Phrases’.
• The semantics of notations depends upon the modeling technique followed and the tool being used.
15
Entities
• A thing of significance to the business, for which data has to be stored and manipulated.
• Nouns representing Objects, Events, Concepts, Relationships, Actions…..
• Represented as rectangles in data models.
• Examples: Insurance policy, Claim, Vehicle, Event etc.
16
Entity sub-types
• Some entities have many subtypes
• PERSON and ORGANIZATION entities are sub types of PARTY entity
• FULL TIME EMPLOYEE and CONTRACT EMPLOYEE are sub types of EMPLOYEE entity
• They are depicted either as contained in the main entity or as children of the main entity
[Diagram: EMPLOYEE with sub-types FULL TIME and CONTRACT; PARTY with sub-types PERSON and ORGANIZATION]
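For readers who know a little programming, sub-types behave much like subclasses. The sketch below is only an analogy in Python, not how a modeling tool stores them; the attribute names and sample values are assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class Party:
    """Main entity (super-type): holds what every PARTY has in common."""
    party_identifier: int


@dataclass
class Person(Party):
    """Sub-type of PARTY with an attribute that applies to persons only."""
    surname: str


@dataclass
class Organization(Party):
    """Sub-type of PARTY with an attribute that applies to organizations only."""
    business_number: str


# Illustrative values only.
parties = [Person(1, "Smith"), Organization(2, "BN-0001")]
for party in parties:
    # Every sub-type instance is also an instance of the main entity.
    print(type(party).__name__, party.party_identifier, isinstance(party, Party))
```

Each sub-type inherits the attributes of the main entity and adds its own, which is the same idea the diagram conveys by drawing the sub-types inside or beneath the main entity.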
17
Attributes
• The properties of Entities for which data has to be collected and stored.
• Attributes are represented as text strings contained inside the entities in data models.
• Examples: policy holder's name, event date, claim amount, etc.
18
Relationships
• Relationships represent how entities interact and create, use, modify or delete each other.
• They are represented by different types of lines going from one entity to another.
[Diagram: examples of dashed and solid relationship lines]
19
Cardinality of relationship
• The cardinality of a relationship is the number of instances of the entities at the two ends of the relationship.
• It is represented by three domain values: Zero, One or Many.
• It may be shown as a circle, a vertical line and a crow's foot at the end of relationship lines, or as some other symbol.
• Sometimes it is represented as ‘0’, ‘1’ or ‘n’ on relationship lines.
[Diagram: POLICY related to CLAIM; PRODUCT related to LINE ITEM with cardinalities 1 and 0…n]
20
Optionality of relationships
• The optionality of a relationship states whether the entity ‘may be present’ or ‘must be present’ in the relationship.
• It may be represented as a ‘solid line’ or ‘broken line’ part in the relationship (or in some other way).
[Diagram: POLICY related to CLAIM by a line that is part broken and part solid]
21
Self Referencing Relationships
22
Verb phrases
• Verb Phrases describe the relationship between two entities, read from one entity to the other in both directions.
[Diagram: CLAIM is ‘paid to’ POLICY HOLDER and POLICY HOLDER ‘makes’ CLAIM; ORGANIZATION ‘employs’ EMPLOYEE and EMPLOYEE ‘works for’ ORGANIZATION]
23
Keys
• Keys are for navigating through data: information retrieval.
• Primary Keys: A primary key is a group of attributes that uniquely identifies an entity instance. Every entity has exactly one primary key.
• Foreign Keys: A foreign key is an attribute (or group of attributes) in a child entity that refers to the primary key of a parent entity, so that you can navigate from one entity to the other. FK attributes implement relationships; they migrate from the parent entity into the child entity.
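The following Python sketch uses the built-in `sqlite3` module to show a primary key and a foreign key at work. The two tables are heavily simplified, hypothetical versions of CLAIM FILE and VEHICLE REPAIR LOG; the column names and sample values are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when this setting is switched on.
conn.execute("PRAGMA foreign_keys = ON")

# Parent entity: clf_id is the primary key.
conn.execute(
    "CREATE TABLE claim_file (clf_id INTEGER PRIMARY KEY, policy_number TEXT NOT NULL)"
)
# Child entity: vrl_clf_id is a foreign key pointing at the parent's primary key.
conn.execute(
    """
    CREATE TABLE vehicle_repair_log (
        vrl_id INTEGER PRIMARY KEY,
        vrl_clf_id INTEGER NOT NULL REFERENCES claim_file (clf_id)
    )
    """
)
conn.execute("INSERT INTO claim_file VALUES (1, 'P-100')")
conn.execute("INSERT INTO vehicle_repair_log VALUES (10, 1)")

# Navigation: follow the foreign key from the repair log to its claim file.
row = conn.execute(
    """
    SELECT c.policy_number
    FROM vehicle_repair_log v
    JOIN claim_file c ON c.clf_id = v.vrl_clf_id
    WHERE v.vrl_id = 10
    """
).fetchone()

# A repair log that points at a claim file that does not exist is rejected.
try:
    conn.execute("INSERT INTO vehicle_repair_log VALUES (11, 99)")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

print(row, orphan_rejected)
```

The join is the "navigation" the slide refers to, and the rejected insert shows how a foreign key turns a relationship in the model into a rule the database enforces.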
24
Relationships- identifying vs. non-identifying
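• Identifying: The parent entity is needed to identify the child entity, so the parent's primary key becomes part of the child's primary key.
• Non-identifying: The child entity can be identified without the parent entity.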
25
Domains
• A named set of data values, all of the same data type, from which the actual value for an attribute instance is drawn.
• Every attribute must be defined on exactly one underlying domain. Multiple attributes may be based on the same underlying domain.
• Examples of domains:
– Gender: M, F
– Province: Varchar(2) – BC, AB, ON, NF, QC, MN, SC, YU
– Short Description: Varchar(40)
– Long Description: Varchar(2000)
– Unique Identifier: Integer(9)
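A domain can be thought of as a reusable validity rule. The sketch below is one possible way to express that in Python, not a feature of any particular modeling tool; the `Domain` class, the attribute names and the `check` function are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Domain:
    """A named set of allowed values, expressed as a validity check."""
    name: str
    is_valid: Callable[[object], bool]


# Two of the example domains listed above.
GENDER = Domain("Gender", lambda v: v in {"M", "F"})
SHORT_DESCRIPTION = Domain(
    "Short Description", lambda v: isinstance(v, str) and len(v) <= 40
)

# Each attribute is defined on exactly one domain;
# several attributes may share the same domain.
ATTRIBUTE_DOMAINS = {
    "policy_holder_gender": GENDER,
    "product_name": SHORT_DESCRIPTION,
    "claim_summary": SHORT_DESCRIPTION,
}


def check(attribute, value):
    """Return True when the value belongs to the attribute's domain."""
    return ATTRIBUTE_DOMAINS[attribute].is_valid(value)


print(check("policy_holder_gender", "F"))
print(check("policy_holder_gender", "X"))
print(check("product_name", "x" * 41))
```

Defining the rule once on the domain, rather than separately on every attribute, is what keeps values consistent across the whole model.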
26
Cost of wrong domains
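• NASA's Mars Climate Orbiter, launched in 1998, was lost on arrival at Mars in 1999. One piece of ground software supplied thruster data in a domain based on US customary units (pound-force seconds), whereas the navigation software expected a domain based on SI units (newton seconds). Total cost - $327.6 million
• The European Ariane 5 expendable launch system failed about 37 seconds after launch in 1996. Wrong use of a domain (a floating-point value converted to an integer that was too small to hold it) caused an integer overflow. The loss is commonly put at around $370 million for the rocket and its payload; the $8 billion figure sometimes quoted appears to refer to the cost of developing the launcher.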
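Both failures come down to a value crossing from one domain to another without a check. The Python sketch below illustrates the two safeguards in miniature; it is not a reconstruction of either system's software, and the names `Impulse` and `to_int16` are hypothetical. The conversion factor is the standard definition of one pound-force in newtons.

```python
from dataclasses import dataclass

# One pound-force second expressed in newton seconds.
LBF_S_TO_N_S = 4.4482216152605


@dataclass(frozen=True)
class Impulse:
    """A number that always carries its unit, so it cannot be misread."""
    value: float
    unit: str  # "N*s" or "lbf*s"

    def to_newton_seconds(self):
        if self.unit == "N*s":
            return self.value
        if self.unit == "lbf*s":
            return self.value * LBF_S_TO_N_S
        raise ValueError(f"unknown unit: {self.unit}")


def to_int16(x):
    """Convert to a 16-bit signed integer, refusing values that do not fit."""
    n = int(x)
    if not -32768 <= n <= 32767:
        raise OverflowError(f"{x} does not fit in a 16-bit signed integer")
    return n


print(Impulse(10.0, "lbf*s").to_newton_seconds())
print(to_int16(1000.7))
```

Reading 10 pound-force seconds as 10 newton seconds would understate the value by a factor of about 4.45, and calling `to_int16(40000.0)` raises an error instead of silently producing a wrong number.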
27
Types of notations
• Different types of semantic notations are available for ER diagramming
– Chen Notations
– IDEF1X
– Information Engineering
– Barker Notations
28
Types of notations-IDEF1X
[Diagram: IDEF1X notation legend. Labels: Independent Entities; Dependent Entities; Attributes; Identifying (solid lines); Non-Identifying (dashed lines); Many-to-Many; Zero, One or Many; Optional; Mandatory; P; Z; Category Complete; Category Incomplete; Discriminator]
29
Types of notations-IDEF1X
• Supported by most of the available tools.
• More geared towards developing physical database design
• Needs combination of notations to capture rules.
• These combinations are not easily understood by business people, so they are difficult to use in JAD sessions.
30
IDEF1X model
31
Types of notations – Information Engineering (IE)
[Diagram: IE notation legend. Labels: Entities; Attributes; One to Many; Many to Many; Identifying; Non-Identifying; Zero-or-One; One and only One; Zero, One or Many; Super Type; Sub Type; Exclusive OR in Finkelstein]
32
Types of notations-Information Engineering (IE)
• Two variations: Clive Finkelstein and James Martin
• Different tools implement different variations of the notations.
• In the original version, attributes are not shown on the entities but in a separate document, such as Martin's Bubble Chart
• Supported by most of the available modeling tools.
• Easy to understand notations
• Suitable for JAD sessions.
33
IE model
34
Types of notations- Barker
[Diagram: Barker notation legend. Labels: Entities; Zero or More; Zero or One; One or More; One to One; Exclusive OR; Super Type; Sub Type; Solid and dashed lines for optionality]
35
Types of notations: Barker
• # before an attribute: unique identifier attribute
• Solid circles are for required attributes
• Blank circles are for optional attributes
• Sub Types are mutually exclusive
• Sub Types are always complete.
• A line across a relationship means the relationship is identifying.
36
Types of notations- Barker
• Developed by Richard Barker in the UK in 1986.
• Adopted by Oracle for its CASE methodology.
• Simple and easily understood by business people.
• Not supported by all tools.
37
Barker model
38
39
Reading business rules
Each <Entity 1> {may be | must be} <relationship> {zero | only one | one or more} <Entity 2>
– {may be | must be}: Optionality
– <relationship>: Verb Phrase
– {zero | only one | one or more}: Cardinality
Examples:
– An EMPLOYEE must be staff of only one DEPARTMENT
– A DEPARTMENT may be composed of one or more EMPLOYEE
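The reading pattern is mechanical enough to be written as a small program. The Python sketch below assembles the sentence from its four parts; the `Relationship` class and its field names are hypothetical and exist only to illustrate the pattern.

```python
from dataclasses import dataclass


@dataclass
class Relationship:
    """One direction of a relationship line, as read off a data model."""
    entity_1: str
    optional: bool     # True reads as "may be", False reads as "must be"
    verb_phrase: str
    cardinality: str   # "zero", "only one" or "one or more"
    entity_2: str

    def as_rule(self):
        optionality = "may be" if self.optional else "must be"
        return (
            f"Each {self.entity_1} {optionality} {self.verb_phrase} "
            f"{self.cardinality} {self.entity_2}"
        )


# The two directions of the EMPLOYEE / DEPARTMENT relationship.
rules = [
    Relationship("EMPLOYEE", False, "staff of", "only one", "DEPARTMENT"),
    Relationship("DEPARTMENT", True, "composed of", "one or more", "EMPLOYEE"),
]
for rule in rules:
    print(rule.as_rule())
```

Every relationship line yields two such sentences, one for each direction, and each sentence is a business rule that a stakeholder can confirm or challenge.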
40
Reading business rules
• A CLAIM FILE may contain Zero, One or More TOTAL LOSS REQUEST RECORD
• A TOTAL LOSS REQUEST RECORD must be on only one CLAIM FILE
41
Reading business rules
• A CLAIM FILE may have vehicle detail in zero, one or more VEHICLE RECORD
• A VEHICLE RECORD must be (..?..) one and only one CLAIM FILE
42
Reading a data model
• Find out what notations are being used.
• Get a chart of the notations giving graphical representations and their descriptions.
• Look at the important entities in the model: the entities which are at the center of many relationships.
• Look at the definition of each entity. The definition should convey the role the entity plays in the business.
• Following relationship lines and reading verb phrases, move from one entity to another.
• Note the relationships implemented in the model.
• Note the cardinality and optionality rules.
• Read the business rule implemented for the entities.
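The step "look at the entities at the center of many relationships" amounts to counting relationship lines per entity. Here is a rough Python sketch of that count. The relationship list is an assumption: it is inferred from the "Claim File Id (FK)" markers in the logical model shown earlier, not read from the actual diagram.

```python
from collections import Counter

# Assumed (parent, child) pairs, inferred from the (FK) markers in the
# logical model shown earlier; the real diagram may contain more lines.
relationships = [
    ("CLAIM FILE", "TOTAL LOSS REQUEST RECORD"),
    ("CLAIM FILE", "ESTIMATE DAIS CHUNKS"),
    ("CLAIM FILE", "VEHICLE REPAIR LOG"),
    ("CLAIM FILE", "ESTIMATE PRINT IMAGE LINE"),
    ("CLAIM FILE", "CLAIM FILE ESTIMATE GROUP"),
]

# Count how many relationship lines touch each entity.
degree = Counter()
for parent, child in relationships:
    degree[parent] += 1
    degree[child] += 1

hub, line_count = degree.most_common(1)[0]
print(hub, line_count)
```

The entity with the highest count is usually the best place to start reading, because most of the other entities are defined in relation to it.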
43
Let us read a data model
44
Reading a data model-gleaning the business rules
• It is an attributed logical model.
• It is using Information Engineering (IE) notations.
• A PARTY may place Zero, One or Many PURCHASE ORDER.
• A PURCHASE ORDER must be received from only one PARTY.
• A PARTY must be of either PERSON or ORGANIZATION type.
• A PURCHASE ORDER may contain Zero, One or Many LINE ITEM.
• A LINE ITEM must be placed on only one PURCHASE ORDER.
• A PRODUCT may be on Zero, One or More LINE ITEM.
• A LINE ITEM must show only one PRODUCT.
• A PRODUCT may be of SOURCED PRODUCT or SERVICE Type.
• Party Identifier is key identifier for PARTY.
• Product Identifier is key identifier for PRODUCT.
45
Reading a data model-gleaning the business rules
• Purchase Order Number combined with Party Identifier is the primary identifier for PURCHASE ORDER
• Line Item Number, Product Identifier, Party Identifier and Purchase Order Number combined are the primary identifier for LINE ITEM
• Surname is attribute of PERSON only
• Business Number is attribute of ORGANIZATION only.
• Sourced From is attribute of SOURCED PRODUCT only
• Cost Amount is attribute of SOURCED PRODUCT only.
• Service Location is attribute of SERVICE only.
• Rate Per Hour is attribute of SERVICE only.
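The combined identifier for PURCHASE ORDER is what an identifying relationship looks like once implemented: the parent's key becomes part of the child's key. A small Python sketch using the built-in `sqlite3` module shows the effect; the table layout, column names and sample values are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when this setting is switched on.
conn.execute("PRAGMA foreign_keys = ON")

conn.execute("CREATE TABLE party (party_identifier INTEGER PRIMARY KEY)")
# The parent's key (party_identifier) is part of the child's primary key.
conn.execute(
    """
    CREATE TABLE purchase_order (
        party_identifier INTEGER NOT NULL REFERENCES party (party_identifier),
        purchase_order_number INTEGER NOT NULL,
        PRIMARY KEY (party_identifier, purchase_order_number)
    )
    """
)
conn.execute("INSERT INTO party VALUES (1)")
conn.execute("INSERT INTO party VALUES (2)")
conn.execute("INSERT INTO purchase_order VALUES (1, 100)")
# The same order number is allowed for a different party.
conn.execute("INSERT INTO purchase_order VALUES (2, 100)")

# Repeating the same party and order number is rejected.
try:
    conn.execute("INSERT INTO purchase_order VALUES (1, 100)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

order_count = conn.execute("SELECT COUNT(*) FROM purchase_order").fetchone()[0]
print(duplicate_rejected, order_count)
```

In business terms, a purchase order number means nothing on its own in this model; it only identifies an order once you also know which party placed it.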
46
Reading a data model- deriving real value
• Very important exercise for flushing out hidden and missing business rules; it minimizes ‘later day change requests’.
• The value is in critical examination of the business rules.
– A PURCHASE ORDER must be received from only one PARTY:
• Can a party transfer its purchase order to another party?
• What if a party is dissolved, merged or acquired by another party after placing a purchase order? Do we need to know about the original party?
• Can two parties place a combined order to obtain a volume discount?
– Business Number is attribute of ORGANIZATION only:
• There are individuals who are incorporated and have a business number. Should we capture their business number?
– A PRODUCT may be of SOURCED PRODUCT or SERVICE Type:
• What about sourced products requiring installation service and support? Should we invoice service on a separate purchase order?
47
Avoiding high cost of change
48
Data models – maximizing ROI.
• Make data modeling a mandatory part of the development life cycle.
• Standardize on one data modeling tool so everybody is familiar with its semantics.
• Provide training to users in the modeling tool and its semantics.
• Capture additional business rules in separate documents for completeness.
• Keep data models up to date.
49
Further readings
• Help section of the data modeling tools: most of the tools come with good supporting documentation on modeling methodology and notations.
• Data Model Patterns: Conventions of Thought by David C. Hay
• Data Modeling Made Simple: A Practical Guide for Business and IT Professionals by Steve Hoberman
• Data Modeling for the Business: A Handbook for Aligning the Business with IT using High-Level Data Models (Take It with You Guides) by Steve Hoberman, Donna Burbank and Chris Bradley
50
Thank you for joining
51