Data-Ed Webinar: Design & Manage Data Structures

Data structures enable you to store and organize data so that it can be used efficiently. But how do you know to apply the correct one? There is a difference between structuring master data, reference data and analytics data. This webinar will discuss the various data structures available and when to use each one. We will show how data structures should support your organizational strategy and how each method can contribute to business value.

Learning Objectives:
• Application of correct data structures to fit business needs
• How different structures create different business value

Date: July 8, 2014
Time: 2:00 PM ET
Presented by: Dave Marsh & Peter Aiken

Copyright 2013 by Data Blueprint

Upload: dataversity

Post on 15-Jul-2015


Page 1: Data-Ed Webinar: Design & Manage Data Structures

Welcome: Design/Manage Data Structures

Page 2: Data-Ed Webinar: Design & Manage Data Structures

Get Social With Us!

• Like Us on Facebook: www.facebook.com/datablueprint. Post questions and comments. Find industry news, insightful content and event updates.
• Join the Group: Data Management & Business Intelligence. Ask questions, gain insights and collaborate with fellow data management professionals.
• Live Twitter Feed: Join the conversation! Follow us: @datablueprint @paiken. Ask questions and submit your comments: #dataed

Page 3: Data-Ed Webinar: Design & Manage Data Structures

Presented by Dave Marsh & Peter Aiken, Ph.D.

Design & Manage Data Structures

Macro Level

Page 4: Data-Ed Webinar: Design & Manage Data Structures

Your Presenters

Dave Marsh
• Lead Data Consultant, Data Blueprint
• 30+ years of experience designing and building solutions for the private and public sectors
• Architecture/design experience in:
  – Transactional processing
  – Shop floor automation
  – Data warehousing
  – Identity management
  – Mobile

Peter Aiken
• 30+ years DM experience
• 9 books/many articles
• Experienced with 500+ data management practices
• Multi-year immersions: US DoD, Nokia, Deutsche Bank, Wells Fargo, & Commonwealth of VA

Page 5: Data-Ed Webinar: Design & Manage Data Structures

Outline: Design/Manage Data Structures

• Context: Data Management/DAMA/DM BoK/CDMP?
• What is a data structure?
• Structured data storage, a bit of history and context
• Why are data structures important?
• Data Personas/Usage (interest over time)
• Data Topology and alignment to the data audience
• Internal data structures to fit the needs
• Q & A?

Page 6: Data-Ed Webinar: Design & Manage Data Structures

Maslow's Hierarchy of Needs


Page 7: Data-Ed Webinar: Design & Manage Data Structures

Data Management Practices Hierarchy

You can accomplish Advanced Data Practices without becoming proficient in the Basic Data Management Practices; however, this will:
• Take longer
• Cost more
• Deliver less
• Present greater risk

Advanced Data Practices
• MDM
• Mining
• Big Data
• Analytics
• Warehousing
• SOA

Basic Data Management Practices
• Data Program Management
• Data Stewardship
• Data Development
• Data Support Operations
• Organizational Data Integration

Page 8: Data-Ed Webinar: Design & Manage Data Structures

Data Management is an Integrated System of Five Practice Areas

[Figure: diagram of the five practice areas (Data Program Coordination, Organizational Data Integration, Data Stewardship, Data Development, Data Support Operations) together with Data Asset Use. Flow labels: Organizational Strategies, Goals, Direction, Guidance, Feedback, Standard Data, Business Data, Integrated Models, Application Models & Designs, Implementation, Business Value. Annotations: Leverage data in organizational activities; Data management processes and infrastructure; Combining multiple assets to produce extra value; Organizational-entity subject area data integration; Provide reliable data access; Achieve sharing of data within a business area.]

Page 9: Data-Ed Webinar: Design & Manage Data Structures

Five Integrated DM Practice Areas

• Data Program Coordination: Manage data coherently.
• Organizational Data Integration: Share data across boundaries.
• Data Stewardship: Assign responsibilities for data.
• Data Development: Engineer data delivery systems.
• Data Support Operations: Maintain data availability.

Page 10: Data-Ed Webinar: Design & Manage Data Structures

DAMA DM BoK & CDMP

Data Management Functions

• Published by DAMA International
  – The professional association for Data Managers (40 chapters worldwide)
  – DMBoK organized around primary data management functions focused around data delivery to the organization (more at dama.org)
  – Organized around several environmental elements
• CDMP
  – Certified Data Management Professional
  – DAMA International and ICCP
  – Membership in a distinct group made up of your fellow professionals
  – Recognition for your specialized knowledge in a choice of 17 specialty areas
  – Series of 3 exams
  – For more information, please visit:
    • http://www.dama.org/i4a/pages/index.cfm?pageid=3399
    • http://iccp.org/certification/designations/cdmp

Page 11: Data-Ed Webinar: Design & Manage Data Structures

Outline

Page 12: Data-Ed Webinar: Design & Manage Data Structures

What is a data structure?

• "An organization of information, usually in memory, for better algorithm efficiency, such as queue, stack, linked list, heap, dictionary, and tree, or conceptual unity, such as the name and address of a person. It may include redundant information, such as length of the list or number of nodes in a subtree." (http://www.nist.gov/dads/HTML/datastructur.html)
• Some data structure characteristics
  – Grammar for data objects
    • Grammar is the principles or rules of an art, science, or technique: "a grammar of the theater"
  – Constraints for data objects
  – Sequential order
  – Uniqueness
  – Arrangement
    • Hierarchical, relational, network, other
  – Balance
  – Optimality
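The quoted definition can be made concrete with a short Python sketch. The class `CountedLinkedList`, its methods and the sample values are hypothetical, chosen only for illustration. The class stores its own length, which is the kind of "redundant information" the definition mentions, and the last lines show a queue, a stack and a dictionary built from the standard library.

```python
from collections import deque


class Node:
    """One element of a singly linked list."""

    def __init__(self, value, next_node=None):
        self.value = value
        self.next = next_node


class CountedLinkedList:
    """Linked list that keeps its length as redundant information.

    The length could always be recomputed by walking the nodes; storing
    it trades a little bookkeeping for a constant-time answer.
    """

    def __init__(self):
        self.head = None
        self.length = 0  # redundant: derivable from the nodes themselves

    def push_front(self, value):
        self.head = Node(value, self.head)
        self.length += 1

    def walk_length(self):
        """Recompute the length the slow way, by visiting every node."""
        count, node = 0, self.head
        while node is not None:
            count += 1
            node = node.next
        return count


names = CountedLinkedList()
for name in ["Townsend", "Marsh", "Aiken"]:
    names.push_front(name)

queue = deque(["a", "b", "c"])  # first in, first out
first_out = queue.popleft()

stack = ["a", "b", "c"]         # last in, first out
last_out = stack.pop()

# Conceptual unity: the name and address of a person kept together.
person = {"name": "Pat Example", "address": "1 Hypothetical Way"}
```

Whether the stored length is worth keeping depends on how often it is read compared with how often the list changes, which is the same efficiency trade-off the definition points to.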

Page 13: Data-Ed Webinar: Design & Manage Data Structures

How are data structures expressed as architectures?

• Details are organized into larger components
• Larger components are organized into models
• Models are organized into architectures

[Figure: three arrangements of components A, B, C and D]

Page 14: Data-Ed Webinar: Design & Manage Data Structures

How are data structures expressed as architectures?

• Attributes are organized into entities/objects
  – Attributes are characteristics of "things"
  – Entities/objects are "things" whose information is managed in support of strategy
  – Examples
• Entities/objects are organized into models
  – Combinations of attributes and entities are structured to represent information requirements
  – Poorly structured data constrains organizational information delivery capabilities
  – Examples
• Models are organized into architectures
  – When building new systems, architectures are used to plan development
  – More often, data managers do not know what existing architectures are and therefore cannot make use of them in support of strategy implementation
  – Why no examples?
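The layering on this slide (attributes into entities, entities into models, models into architectures) can be sketched with plain Python dataclasses. Every class, field and sample name below is an assumption made for illustration; the slide does not prescribe a representation.

```python
from dataclasses import dataclass, field


@dataclass
class Attribute:
    """A characteristic of a "thing", e.g. a customer's name."""
    name: str
    data_type: str


@dataclass
class Entity:
    """A "thing" whose information is managed; a group of attributes."""
    name: str
    attributes: list = field(default_factory=list)


@dataclass
class Model:
    """Entities plus the relationships that connect them."""
    name: str
    entities: list = field(default_factory=list)
    relationships: list = field(default_factory=list)  # (from, verb, to)


@dataclass
class Architecture:
    """A collection of models, used to plan development."""
    name: str
    models: list = field(default_factory=list)

    def entity_names(self):
        """List every entity known anywhere in the architecture."""
        return sorted({e.name for m in self.models for e in m.entities})


customer = Entity("Customer", [Attribute("customer_id", "int"),
                               Attribute("name", "str")])
order = Entity("Order", [Attribute("order_id", "int"),
                         Attribute("order_date", "date")])

sales = Model("Sales", [customer, order], [("Customer", "places", "Order")])
billing = Model("Billing", [customer])

enterprise = Architecture("Enterprise data architecture", [sales, billing])
```

Here `enterprise.entity_names()` gives a single list of the entities shared across models, which is the kind of knowledge the slide says data managers often lack about their existing architectures.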

Page 15: Data-Ed Webinar: Design & Manage Data Structures

A Model Specifying Relationships Among Important Terms
[Built on definition by Dan Appleton 1983]

[Figure: diagram relating Fact, Meaning, Data, Request, Information, Strategic Use and Intelligence]

1. Each FACT combines with one or more MEANINGS.
2. Each specific FACT and MEANING combination is referred to as a DATUM.
3. An INFORMATION is one or more DATA that are returned in response to a specific REQUEST.
4. INFORMATION REUSE is enabled when one FACT is combined with more than one MEANING.
5. INTELLIGENCE is INFORMATION associated with its USES.

Wisdom & knowledge are often used synonymously
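The five numbered rules can be read almost directly as code. The following is a minimal sketch, assuming that a request can be modelled as a predicate over a datum. The names `Datum`, `make_data`, `information` and `Intelligence`, and the sample dates, are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Datum:
    """Rule 2: one specific FACT and MEANING combination."""
    fact: object
    meaning: str


def make_data(fact, meanings):
    """Rule 1: a FACT combines with one or more MEANINGS."""
    if not meanings:
        raise ValueError("a fact needs at least one meaning")
    return [Datum(fact, meaning) for meaning in meanings]


def information(data, request):
    """Rule 3: the DATA returned in response to a specific REQUEST.

    The request is modelled here as a predicate over a Datum.
    """
    return [d for d in data if request(d)]


@dataclass(frozen=True)
class Intelligence:
    """Rule 5: INFORMATION associated with its USES."""
    information: tuple
    uses: tuple


# Rule 4: one fact with two meanings, so the same fact is reused.
data = make_data("2014-07-08", ["order date", "ship date"])
data += make_data("2014-07-09", ["order date"])

order_dates = information(data, lambda d: d.meaning == "order date")
plan = Intelligence(tuple(order_dates), ("forecast daily order volume",))
```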

Page 16: Data-Ed Webinar: Design & Manage Data Structures

Outline

Page 17: Data-Ed Webinar: Design & Manage Data Structures

History (such as it is)

• Automate existing manual processing
• Data management was:
  – Running millions of punched cards through banks of sorting, collating & tabulating machines
  – Results printed on paper or punched onto more cards
  – Data management meant physically storing and hauling around punched cards
• Tasks (check signing, calculating, and machine control) were implemented to provide automated support for departmental-based processing
• Creating information silos
• Data Processing Manager

Page 18: Data-Ed Webinar: Design & Manage Data Structures

Chief Information Officer

Page 19: Data-Ed Webinar: Design & Manage Data Structures

CFO Necessary Prerequisites/Qualifications

• CPA
• CMA
• Masters of Accountancy
• Other recognized degrees/certifications
• These are necessary but insufficient prerequisites/qualifications

Page 20: Data-Ed Webinar: Design & Manage Data Structures

CIO Qualifications

• No specific qualifications
• Typically technological fields:
  – Computer science
  – Software engineering
  – Information systems
• Business
  – Master of Business Administration
  – Master of Science in Management
• Business acumen and strategic perspectives have taken precedence over technical skills.
  – CIOs appointed from the business side of the organization
    • Especially if they have project management skills.

Page 21: Data-Ed Webinar: Design & Manage Data Structures

What do we teach knowledge workers about data?

What percentage of them deal with it daily?

Page 22: Data-Ed Webinar: Design & Manage Data Structures

Outline

Page 23: Data-Ed Webinar: Design & Manage Data Structures

Data Leverage

[Figure: lever diagram labeled Less ROT, Technologies, Process, People]

• Permits organizations to better manage their sole non-depletable, non-degrading, durable, strategic asset: data
  – within the organization, and
  – with organizational data exchange partners
• Leverage
  – Obtained by implementation of data-centric technologies, processes, and human skill sets
  – Increased by elimination of data ROT (redundant, obsolete, or trivial)
• The bigger the organization, the greater the potential leverage
• Treating data more asset-like simultaneously
  1. lowers organizational IT costs and
  2. increases organizational knowledge worker productivity

Page 24: Data-Ed Webinar: Design & Manage Data Structures

Data Structure Questions

[Figure: Programs D, E, F, G, H and I grouped into Application domain 2 and Application domain 3]

• Who makes decisions about the range and scope of common data usage?

Page 25: Data-Ed Webinar: Design & Manage Data Structures

Running Query

Page 26: Data-Ed Webinar: Design & Manage Data Structures

Optimized Query

Page 27: Data-Ed Webinar: Design & Manage Data Structures

Repeat 100s, thousands, millions of times ...

Page 28: Data-Ed Webinar: Design & Manage Data Structures

Death by 1000 Cuts


Page 29: Data-Ed Webinar: Design & Manage Data Structures

5 Basic Data Structures

• Flat File: Records are typically sorted according to some criteria and must be searched from the beginning for each access
  – Program: Must start at the beginning and read each record when looking for person "Townsend"
• Indexed Sequential File: Built-in index permits location of records of persons with last names starting with "T"
  – Program: Where is the record for person "Townsend?"
  – Index: Start looking here where the "Ts" are stored
• Hierarchical Database: Records are related to each other hierarchically using 'parent child' relationships
• Network Database: Records are related to each other using arranged master records associated with multiple detail records using linked lists and pointers
• Relational Database: Records are related to each other using relationships describable using relational algebra

Other structures shown: Associative, Concept-oriented, Multi-dimensional, XML database, 3NF, Star schema, Data Vault
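The "Townsend" lookup contrasts a flat file with an indexed sequential file, and the difference is easy to see when the number of records read is counted. The sketch below uses an in-memory list in place of a file. The sample names and the function names are hypothetical, and the index is a simple first-letter index as the slide describes.

```python
# Records sorted by last name, standing in for a file (sample data).
records = [
    ("Aiken", "Peter"),
    ("Marsh", "Dave"),
    ("Taylor", "Sam"),
    ("Townsend", "Alex"),
    ("Young", "Kim"),
]


def flat_file_search(records, last_name):
    """Read from the beginning until the record is found.

    Returns the record (or None) and the number of records read.
    """
    reads = 0
    for record in records:
        reads += 1
        if record[0] == last_name:
            return record, reads
    return None, reads


def build_index(records):
    """Map each first letter to the position of its first record."""
    index = {}
    for position, (last_name, _) in enumerate(records):
        index.setdefault(last_name[0], position)
    return index


def indexed_search(records, index, last_name):
    """Jump to where names with this first letter start, then scan."""
    start = index.get(last_name[0])
    if start is None:
        return None, 0
    reads = 0
    for record in records[start:]:
        reads += 1
        if record[0] == last_name:
            return record, reads
        if record[0][0] != last_name[0]:
            break  # left the block of names with this first letter
    return None, reads


index = build_index(records)
flat_hit, flat_reads = flat_file_search(records, "Townsend")
indexed_hit, indexed_reads = indexed_search(records, index, "Townsend")
```

With this sample data the flat-file search reads four records and the indexed search reads two. The gap widens as the file grows, which is the effect the earlier "repeat millions of times" slides allude to.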

Page 30: Data-Ed Webinar: Design & Manage Data Structures

Data structures organized into an Architecture

• How do data structures support organizational strategy?
• Consider the opposite question:
  – Were your systems explicitly designed to be integrated or otherwise work together?
  – If not, then what is the likelihood that they will work well together?
  – In all likelihood your organization is spending between 20-40% of its IT budget compensating for poor data structure integration
  – They cannot be helpful as long as their structure is unknown
• Two answers/two separate strategies
  – Achieving efficiency and effectiveness goals
  – Providing organizational dexterity for rapid implementation

Page 31: Data-Ed Webinar: Design & Manage Data Structures

No Single Data Store

[Figure: Single Data Store]

• The thought of a single monolithic data store which can service all of an organization’s information needs has long since been abandoned. In the modern data management topology, multiple data stores are created to service specific processing needs and user groups within the organization.
• Implications:
  – The needs and characteristics of the multitude of audiences served by the data structures
  – Data lifecycle
  – The design styles (old and new) utilized to organize the data to service the audiences
  – A breakdown of the various stores
  – The resultant store characteristics

Page 32: Data-Ed Webinar: Design & Manage Data Structures

Conclusions

• 1 is not enough
• Most organizations have far too many different data structures and they become barriers to progress and integration
• Not much expertise to figure out these challenges

Page 33: Data-Ed Webinar: Design & Manage Data Structures

Outline

Page 34: Data-Ed Webinar: Design & Manage Data Structures

Data Personas (The Requirements)

[Figure: the personas shown along a spectrum from Operational to Analytic]

Operational Performer
Interested in alerts, notifications and reporting based on current-value (real-time) data. They use the information to make decisions and changes in the transactional systems. These changes are targeted to improve the organization's ability to deliver in the short term.

Operational Analyst (Manager)
Interested in aggregated real-time data for their domain of responsibility. The data is displayed using visualization techniques of scorecards, charts and reports, preferably within a single dashboard. They search for favorable/unfavorable trends that indicate adjustments are needed in the staff & resource allocations.

Data Analyst
Responsible for supporting detailed and typically complex analysis requests from business users/consumers of data. The analyst role spans both the operational and historical time windows, and thus they need to be versed in both the operational and analytic environments.

Data Miner/Scientist
Responsible for using statistical and machine learning techniques to identify patterns from the data. These patterns are correlated into insights and actions for better business outcomes. The miner may use operational and historical data for research.

Executive Consumer
Receives the data through summary dashboards with drill down/through capabilities. Requests detailed analysis and reporting on High Value Questions from the Data Analysts and Data Miners. These consumers are looking at the data to make short- and long-term decisions to improve organizational efficiency and customer experience.

Page 35: Data-Ed Webinar: Design & Manage Data Structures

Persona Data Interest

• Operational interest is high when data is introduced to the operational stores. This interest wanes over time.
• Analytic interest is low when data is first introduced. The interest increases as the data is collected and combined with other enterprise data.

[Figure: Operational Interest and Analytic Interest plotted as Interest against Time]

Page 36: Data-Ed Webinar: Design & Manage Data Structures

Outline

Page 37: Data-Ed Webinar: Design & Manage Data Structures

Data Topology Today

Page 38: Data-Ed Webinar: Design & Manage Data Structures

Data Store Purpose: a review of the Data Topology

• Master Data
  – Master Data is the term used to describe the data domains that drive business activities. Master data is the data that must first be in place before business transactions can occur. Master data is often shared across the organizational business units and it is typically at the center of business strategies. The transaction defines the business/process event (order, dispatch, sales) while the Master Data describes the ‘who’ (customers, drivers, account reps), the ‘what’ (load), the ‘when’ (date, time) and the ‘where’ (origin and destination location).
• Online Transaction Processing (OLTP)
  – “Transactional data” is the term used to describe the data involved in the execution of the business activities. Transactional data associates master data (i.e. customers and products) to a business activity that often represents a unit of work, such as the creation of an order.
• The Master Data and OLTP stores are where data is initially created and persisted within the organization’s data and thus carry a special classification of System of Record (SOR). They are created to capture the transactional data as it arrives and make the data available for the processes and services. The data arrives in these databases through manual entry or automated feeds. These data stores are logically (and sometimes physically) separated by the transactional subject area they are created to serve.

[Figure: data stores OLTP 1, OLTP 2, ... OLTP n and Master Data]
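The relationship between master data and transactional data can be illustrated with Python's built-in `sqlite3` module. This is a sketch under stated assumptions: the table names, column names and sample rows are hypothetical, and a single in-memory database stands in for what would normally be separate stores.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")

# Master data: must be in place before a transaction can refer to it.
conn.execute(
    "CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, name TEXT NOT NULL)"
)
conn.execute(
    "CREATE TABLE product (product_id INTEGER PRIMARY KEY, name TEXT NOT NULL)"
)

# Transactional data: associates master data with a business event.
conn.execute("""
    CREATE TABLE customer_order (
        order_id    INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customer(customer_id),
        product_id  INTEGER NOT NULL REFERENCES product(product_id),
        ordered_at  TEXT NOT NULL
    )
""")

conn.execute("INSERT INTO customer VALUES (1, 'Acme Freight')")
conn.execute("INSERT INTO product VALUES (10, 'Pallet')")
conn.execute(
    "INSERT INTO customer_order VALUES (100, 1, 10, '2014-07-08T14:00')"
)

# An order for a customer missing from the master data is rejected.
try:
    conn.execute(
        "INSERT INTO customer_order VALUES (101, 999, 10, '2014-07-08T14:05')"
    )
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

rows = conn.execute("""
    SELECT o.order_id, c.name, p.name
    FROM customer_order AS o
    JOIN customer AS c ON c.customer_id = o.customer_id
    JOIN product  AS p ON p.product_id  = o.product_id
""").fetchall()
```

The foreign keys express the slide's point that master data "must first be in place before business transactions can occur": the order row carries the event, and the 'who' and 'what' come from the master tables.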

Page 39: Data-Ed Webinar: Design & Manage Data Structures

Data Store Purpose: a review of the Data Topology

• Operational Data Store (ODS)
  – An Operational Data Store (ODS) is created to integrate data from two or more SORs. The ODS is normally created to satisfy reporting needs across functional SOR boundaries. The ODS should hold very little historical information and should focus on maintaining the most up-to-date data needed by the organization for daily operations. Depending on the application requirements, the ODS may institute a near real-time data feed from the source applications. The ODS is expected to be technically accurate and is considered to be an Authoritative Source. The data it contains can be used for non-critical needs instead of having to access the SOR. The more frequently the data is pushed into the ODS environment, the less reliance there will be on direct access to SORs for data reporting needs.
• Enterprise Data Warehouse (EDW)
  – An Enterprise Data Warehouse (EDW) is responsible for collection and integration of data from either SORs or from the Operational Data Store. An EDW has an enterprise scope as it will pull from many (if not all) SORs. The focus of the data warehouse is to be historical in nature, and in many instances it is loaded with a latency (every 24 hours). The data warehouse is created to support historical analytics. The expectation of the data warehouse is to be exhaustive in the data it collects, with a focus on collecting and storing the data.

[Figure: Operational Data Store (ODS) and Enterprise Data Warehouse (EDW)]
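A small sketch may help separate the two roles described here: the ODS holds one integrated, up-to-date row per key, while the EDW accumulates dated history. Dictionaries and lists stand in for real stores, and the source names, field names and daily load are assumptions for illustration only.

```python
# Hypothetical current-state extracts from two Systems of Record.
orders_sor = {"C-1": {"open_orders": 2}, "C-2": {"open_orders": 0}}
billing_sor = {"C-1": {"balance_due": 150.0}, "C-3": {"balance_due": 20.0}}


def refresh_ods(*sources):
    """Rebuild the ODS: one integrated, current row per customer."""
    ods = {}
    for source in sources:
        for customer_key, fields in source.items():
            ods.setdefault(customer_key, {}).update(fields)
    return ods


def load_edw(edw, ods, load_date):
    """Append a dated snapshot; earlier rows are never overwritten."""
    for customer_key, fields in sorted(ods.items()):
        edw.append({"customer_key": customer_key,
                    "load_date": load_date, **fields})


edw = []
ods = refresh_ods(orders_sor, billing_sor)
load_edw(edw, ods, "2014-07-07")

orders_sor["C-1"]["open_orders"] = 3        # the SOR changes overnight
ods = refresh_ods(orders_sor, billing_sor)  # ODS holds only the new value
load_edw(edw, ods, "2014-07-08")            # EDW keeps both days

c1_history = [row["open_orders"] for row in edw
              if row["customer_key"] == "C-1"]
```

After the second load the ODS can only say that customer C-1 has three open orders, while the EDW can still answer what the value was the day before.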

Page 40: Data-Ed Webinar: Design & Manage Data Structures

Data Store Purpose: a review of the Data Topology

• Data Marts
  – A Data Mart is a subset of a data warehouse. It is created to address specific questions and/or a subject area of questions. A Data Mart is built and tuned to deliver the data to the end users; it exists to get the data out of the data warehouse.

[Figure: Data Mart]

Page 41: Data-Ed Webinar: Design & Manage Data Structures

Data Store Purpose: a review of the Data Topology

• Event Data Store
  – The data store which logs, stores and reports the discrete business and technical events which occur within the process. This data store is a critical, and often overlooked, data domain for managing, controlling and creating transparency into the business processes. The events are used to report the overall health of the processes in both business and technical terms. This consolidated solution is key to obtaining a 360-degree view of the processes.
• Metadata Store
  – Metadata is a broad term which includes descriptive elements in both business and technical terms. It covers: business terms, data element descriptions, element display formats, element valid values, element quality targets, etc. Metadata is critical to an organization as it describes the organization’s business and processing infrastructure in detail. Metadata is entertainingly defined as “data about the data”. That is, Metadata characterizes other data and makes it easier to retrieve, interpret and use information.

[Figure: Event Data Store (Bus OPS Events, Tech OPS Events) and Metadata Store (Business Metadata, Technical Metadata)]
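As a rough illustration of how the metadata items listed above can be put to work, the sketch below keeps one metadata entry in a dictionary and uses it to check and display values. The structure, the field names and the 95% quality target are assumptions, not part of the webinar.

```python
# Hypothetical metadata entry; the field names mirror the items the
# slide lists (business term, description, display format, valid
# values, quality target).
metadata_store = {
    "order_status": {
        "business_term": "Order Status",
        "description": "Where an order is in its lifecycle",
        "display_format": "upper",
        "valid_values": {"placed", "shipped", "delivered"},
        "quality_target": 0.95,  # share of values expected to be valid
    },
}


def quality_report(element, values, store=metadata_store):
    """Compare observed values with the element's metadata."""
    meta = store[element]
    valid = [v for v in values if v in meta["valid_values"]]
    share = len(valid) / len(values) if values else 1.0
    return {"element": meta["business_term"],
            "valid_share": share,
            "meets_target": share >= meta["quality_target"]}


def display(element, value, store=metadata_store):
    """Render a value using the display format held in the metadata."""
    if store[element]["display_format"] == "upper":
        return value.upper()
    return value


report = quality_report("order_status",
                        ["placed", "shipped", "shiped", "delivered"])
shown = display("order_status", "placed")
```

The misspelled value "shiped" is caught only because the valid values are recorded as metadata, which is one practical reading of "makes it easier to retrieve, interpret and use information".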

Page 42: Data-Ed Webinar: Design & Manage Data Structures

Data Store Characteristics

Operational, in contrast with Analytic:

• Subject-Oriented (operational): Databases which are focused on a single or small set of business functions
  Integrated (analytic): Collecting and semantically aligning data from disparate sources to achieve a homogeneous view
• Volatile (operational): Data which may change frequently
  Non-Volatile (analytic): Data which, once entered into the database, will not change
• Atomic (operational): Low grain data, each transaction, each order with all of the attributes
  Aggregate (analytic): A summary of multiple orders or transactions performed to transform the atomic detail into more comprehensible information
• Current Valued (operational): The data and the system represent what is current in this moment; not yesterday, not last week, but now
  Time Variant (analytic): Data is marked and stored with a date/time element so that questions of what it was yesterday and last week can be answered
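Three of these contrasts (volatile against non-volatile, current valued against time variant, atomic against aggregate) can be shown with the same small feed of changes handled two ways. The feed, the field names and the helper `status_as_of` are hypothetical.

```python
from collections import defaultdict

# Hypothetical feed of order changes: (order_id, status, amount, day).
feed = [
    (1, "placed", 120.0, "2014-07-07"),
    (2, "placed", 80.0, "2014-07-07"),
    (1, "shipped", 120.0, "2014-07-08"),
]

# Operational style: volatile and current valued. Each new value
# overwrites the old one, so only "now" can be answered.
current = {}
for order_id, status, amount, day in feed:
    current[order_id] = {"status": status, "amount": amount}

# Analytic style: non-volatile and time variant. Rows are only
# appended and each carries a date.
history = []
for order_id, status, amount, day in feed:
    history.append({"order_id": order_id, "status": status,
                    "amount": amount, "as_of": day})


def status_as_of(history, order_id, day):
    """Latest known status of an order on or before the given day."""
    rows = [r for r in history
            if r["order_id"] == order_id and r["as_of"] <= day]
    return max(rows, key=lambda r: r["as_of"])["status"] if rows else None


# Atomic rows keep every event; an aggregate summarizes them.
placed_total_by_day = defaultdict(float)
for row in history:
    if row["status"] == "placed":
        placed_total_by_day[row["as_of"]] += row["amount"]
```

Dates are kept as ISO strings so that plain string comparison orders them correctly.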

Page 43: Data-Ed Webinar: Design & Manage Data Structures

Outline

Page 44: Data-Ed Webinar: Design & Manage Data Structures

Data Structure Design Styles

• 3rd Normal Form (3NF) – Inmon
• Dimensional – Kimball
• Data Vault – Linstedt

Page 45: Data-Ed Webinar: Design & Manage Data Structures

Design Styles – 3NF

• 3rd Normal Form Modeling
• A mathematical data design technique founded in the early 70s by E.F. Codd.
• Organizes data in simple rows and columns: Entities
• Creates connections between the entities, called relationships, to show how the data is inter-related
• In its purest form, 3NF removes all data redundancies: a piece of data is stored only once
• 3NF is based on mathematics: give the same facts to different modelers and the model should be the same.
• Creates a visual (Entity Relationship Diagram, ERD) which may be understood by less technical personnel
• 3NF is the modeling style most popularly used for operationally focused data stores.
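The claim that in 3NF "a piece of data is stored only once" can be demonstrated with `sqlite3`. In this minimal sketch a flat input repeats the customer's city on every order, and the normalized tables hold it in one row. The table names, column names and sample rows are hypothetical.

```python
import sqlite3

# Unnormalized input: the customer's city is repeated on every order.
flat_orders = [
    (100, "Acme Freight", "Richmond", "2014-07-07"),
    (101, "Acme Freight", "Richmond", "2014-07-08"),
    (102, "Blue Line", "Norfolk", "2014-07-08"),
]

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (
        customer_id INTEGER PRIMARY KEY,
        name        TEXT NOT NULL UNIQUE,
        city        TEXT NOT NULL
    );
    CREATE TABLE customer_order (
        order_id    INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customer(customer_id),
        order_date  TEXT NOT NULL
    );
""")

for order_id, name, city, order_date in flat_orders:
    # Store each customer once; later orders reuse the same row.
    conn.execute(
        "INSERT OR IGNORE INTO customer (name, city) VALUES (?, ?)",
        (name, city),
    )
    (customer_id,) = conn.execute(
        "SELECT customer_id FROM customer WHERE name = ?", (name,)
    ).fetchone()
    conn.execute(
        "INSERT INTO customer_order VALUES (?, ?, ?)",
        (order_id, customer_id, order_date),
    )

# A change of city is now a single-row update, not one per order.
conn.execute(
    "UPDATE customer SET city = 'Glen Allen' WHERE name = 'Acme Freight'"
)

customer_count = conn.execute("SELECT COUNT(*) FROM customer").fetchone()[0]
cities = conn.execute("""
    SELECT o.order_id, c.city
    FROM customer_order AS o
    JOIN customer AS c ON c.customer_id = o.customer_id
    ORDER BY o.order_id
""").fetchall()
```

Because the city lives in one place, both of Acme Freight's orders reflect the update without touching the order rows, which suits the frequently changing operational stores the slide mentions.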

Page 46: Data-Ed Webinar: Design & Manage Data Structures

Design Styles – Dimensional Modeling

• A data design approach created and refined by Ralph Kimball in the 80s
• Organizes data in Facts and Dimensions
  – Fact tables record the events (what) within the business domain
  – Dimension tables describe who, when, how and where
• Created to exploit the capabilities of the relational database to retrieve and report against large volumes of data.
• There are 2 variations to Dimensional Modeling:
  – Star Schema
  – Snowflake
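A star schema can be sketched in a few lines with `sqlite3`: one fact table of sales events surrounded by dimension tables that describe who and when. All table names, keys and rows are hypothetical sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Dimensions: who and when.
    CREATE TABLE dim_customer (customer_key INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE dim_date (date_key INTEGER PRIMARY KEY, day TEXT, month TEXT);

    -- Fact: one row per sales event, holding measures and dimension keys.
    CREATE TABLE fact_sales (
        customer_key INTEGER REFERENCES dim_customer(customer_key),
        date_key     INTEGER REFERENCES dim_date(date_key),
        amount       REAL
    );

    INSERT INTO dim_customer VALUES (1, 'Acme Freight'), (2, 'Blue Line');
    INSERT INTO dim_date VALUES (20140707, '2014-07-07', '2014-07'),
                                (20140808, '2014-08-08', '2014-08');
    INSERT INTO fact_sales VALUES (1, 20140707, 120.0),
                                  (2, 20140707, 80.0),
                                  (1, 20140808, 50.0);
""")

# Reporting query: join the fact to its dimensions, then aggregate.
sales_by_month = conn.execute("""
    SELECT d.month, c.name, SUM(f.amount)
    FROM fact_sales AS f
    JOIN dim_customer AS c ON c.customer_key = f.customer_key
    JOIN dim_date     AS d ON d.date_key     = f.date_key
    GROUP BY d.month, c.name
    ORDER BY d.month, c.name
""").fetchall()
```

In a snowflake variation the dimension tables would themselves be split into further related tables, for example a separate month table referenced by `dim_date`.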

Page 47: Data-Ed Webinar: Design & Manage Data Structures

Design Styles – Data Vault

• Newest of the relational database modeling techniques.
• Conceived in the 1990s by Dan Linstedt
• Focuses on linking the data from multiple disparate locations without forcing the data to be semantically aligned

NOTE: There is a Data-Ed presentation scheduled for 14 October 2014 to cover the details of Data Vault designs!
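The idea of linking data from disparate sources without forcing semantic alignment can be hinted at with a very small sketch. The terms "hub" and "satellite" follow common Data Vault terminology rather than anything stated on this slide, and the sources, field names and loader function are all hypothetical.

```python
from datetime import date

# Two hypothetical source systems describe the same customer differently.
crm_rows = [{"cust_no": "C-1", "full_name": "Acme Freight"}]
billing_rows = [{"customer": "C-1", "nm": "ACME FREIGHT INC", "terms": "NET30"}]

hub_customer = {}  # business key -> hub row
sat_customer = []  # descriptive attributes, kept as each source sent them


def load(rows, key_field, source, load_date):
    """Load one source into the hub and satellite structures."""
    for row in rows:
        key = row[key_field]
        # The hub records, once, that the business key exists.
        hub_customer.setdefault(key, {"customer_key": key,
                                      "first_source": source,
                                      "load_date": load_date})
        # The satellite keeps the source's own attribute names and values.
        attributes = {k: v for k, v in row.items() if k != key_field}
        sat_customer.append({"customer_key": key, "source": source,
                             "load_date": load_date,
                             "attributes": attributes})


load(crm_rows, "cust_no", "crm", date(2014, 7, 7))
load(billing_rows, "customer", "billing", date(2014, 7, 8))

views_of_c1 = [s["attributes"] for s in sat_customer
               if s["customer_key"] == "C-1"]
```

Both descriptions of customer C-1 are retained side by side under one business key; deciding which name is "right" is deferred rather than enforced at load time.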

Page 48: Data-Ed Webinar: Design & Manage Data Structures

Summary/Take Aways

DATA STORE | AUDIENCE SERVED | BUILD CHARACTERISTICS | DESIGN STYLE

OPERATIONAL
Master Data | Operations Manager, Operational Analyst | Subject Oriented, Volatile, Atomic, Current Valued | 3NF
OLTP | Operational Performer, Operations Manager | Subject Oriented, Volatile, Atomic, Current Valued | 3NF
ODS | Operational Manager, Operational Analyst, Executive Consumer | Integrated, Volatile, Atomic, Current Valued | 3NF
Event | All Personas | Integrated, Volatile, Atomic, Current Valued | 3NF

ANALYTIC
Data Warehouse | Data Miner/Scientist | Integrated, Non-volatile, Atomic, Time Variant | 3NF trending to Data Vault
Data Mart | Operational Analyst, Data Analyst, Executive Consumer | Subject Oriented, Non-volatile, Atomic -or- Aggregated, Time Variant | Dimensional

Page 49: Data-Ed Webinar: Design & Manage Data Structures

Outline: Design/Manage Data Structures

Page 50: Data-Ed Webinar: Design & Manage Data Structures

Questions to Ask

• Are you ready for a data warehouse?
• Foundational Practices
• Is the business environment constantly evolving?
• Will you get it right the first time?
• Do you have an agreed upon enterprise-wide vocabulary?
• Is your data warehouse intended to be the enterprise auditable system of record?
• Extract, Transform and Load
• Data Transformations
• How fast do you need results?
• Performance of inserts vs reads
• Project deliverables

Page 51: Data-Ed Webinar: Design & Manage Data Structures

Upcoming Events

August Webinar: Data Management Maturity
August 12, 2014 @ 2:00 PM ET/11:00 AM PT

Sign up here:
• www.datablueprint.com/webinar-schedule
• www.Dataversity.net

Page 52: Data-Ed Webinar: Design & Manage Data Structures

Questions?

Page 53: Data-Ed Webinar: Design & Manage Data Structures

Why Architectural Models?

• Would you build a house without an architecture sketch?
  – A model is the sketch of the system to be built in a project.
• Would you like to have an estimate of how much your new house is going to cost?
  – Your model gives you a very good idea of how demanding the implementation work is going to be!
• If you hired a set of constructors from all over the world to build your house, would you like them to have a common language?
  – The model is the common language for the project team.
• Would you like to verify the proposals of the construction team before the work gets started?
  – Models can be reviewed before thousands of hours of implementation work are done.
• If it was a great house, would you like to build something rather similar again, in another place?
  – It is possible to implement the system on various platforms using the same model.
• Would you drill into a wall of your house without a map of the plumbing and electric lines?
  – Models document the system built in a project. This makes life easier for support and maintenance!

Page 54: Data-Ed Webinar: Design & Manage Data Structures

Inmon Implementation

Page 55: Data-Ed Webinar: Design & Manage Data Structures

Kimball Implementation

Page 56: Data-Ed Webinar: Design & Manage Data Structures

Data Vault Implementation

Page 57: Data-Ed Webinar: Design & Manage Data Structures

Hybrid Approach

• (http://www.kimballgroup.com/2004/03/03/differences-of-opinion/)
• Learn Data Vault – “dv-in-kimball-bus-architecture”

Page 58: Data-Ed Webinar: Design & Manage Data Structures

10124 W. Broad Street, Suite C Glen Allen, Virginia 23060 804.521.4056