data modeling & data integration

Data Modeling & Data Integration Donna Burbank Global Data Strategy Ltd. Lessons in Data Modeling DATAVERSITY Series August 24 th , 2017

Data Modeling & Data Integration

Donna Burbank

Global Data Strategy Ltd.

Lessons in Data Modeling DATAVERSITY Series

August 24th, 2017

Global Data Strategy, Ltd. 2017

Donna Burbank

Donna is a recognised industry expert in information management with over 20 years of experience in data strategy, information management, data modeling, metadata management, and enterprise architecture. Her background is multi-faceted across consulting, product development, product management, brand strategy, marketing, and business leadership.

She is currently the Managing Director at Global Data Strategy, Ltd., an international information management consulting company that specializes in the alignment of business drivers with data-centric technology. Previously, she served in key brand strategy and product management roles at CA Technologies and Embarcadero Technologies for several of the leading data management products in the market.

As an active contributor to the data management community, she is a longtime DAMA International member, Past President and Advisor to the DAMA Rocky Mountain chapter, and was recently awarded the Excellence in Data Management Award from DAMA International in 2016. She was on the review committee for the Object Management Group’s (OMG) Information Management Metamodel (IMM) and the Business Process Modeling Notation (BPMN). Donna is also an analyst at the Boulder BI Brain Trust (BBBT), where she provides advice and gains insight on the latest BI and Analytics software in the market.

She has worked with dozens of Fortune 500 companies worldwide in the Americas, Europe, Asia, and Africa and speaks regularly at industry conferences. She has co-authored two books, Data Modeling for the Business and Data Modeling Made Simple with ERwin Data Modeler, and is a regular contributor to industry publications. She can be reached at [email protected] and is based in Boulder, Colorado, USA.

Follow on Twitter @donnaburbank
Today’s hashtag: #LessonsDM

Lessons in Data Modeling Series

• January 26th How Data Modeling Fits Into an Overall Enterprise Architecture

• February 23rd Data Modeling and Business Intelligence

• March Conceptual Data Modeling – How to Get the Attention of Business Users

• April The Evolving Role of the Data Architect – What does it mean for your Career?

• May Data Modeling & Metadata Management

• June Self-Service Data Analysis, Data Wrangling, Data Munging, and Data Modeling

• July Data Modeling & Metadata for Graph Databases

• August Data Modeling & Data Integration

• September Data Modeling & MDM

• October Agile & Data Modeling – How Can They Work Together?

• December Data Modeling, Data Quality & Data Governance

This Year’s Line Up

Both Business & Technical Drivers Require Data Integration

A Data Model is a Common Reference Hub for Business & Technical Rules

Business Drivers
• Enterprise Knowledge Inventory
• Mergers & Acquisitions
• Innovation & Collaboration
• Efficiency & Agility
• Etc.

Technology Drivers
• Data Warehousing
• Master Data Management (MDM)
• Data Lake
• APIs & Application Integration
• Etc.

The Data Model is the Common Reference

Levels of Data Models

Enterprise (Subject Areas)
• Purpose: Organization & Scoping of main business domain areas
• Audience: Business Stakeholders, Data Architects

Conceptual (Business Concepts)
• Purpose: Communication & Definition of Business Concepts & Rules
• Audience: Business Stakeholders, Data Architects

Logical (Data Entities)
• Purpose: Clarification & Detail of Business Rules & Data Structures
• Audience: Data Architects, Business Analysts

Physical (Physical Tables)
• Purpose: Technical Implementation on a Physical Database
• Audience: DBAs, Developers

Data Integration Team

Data Models are helpful for data integration at each level.

Enterprise Knowledge Inventory

• A data model describes a business, particularly at the conceptual & logical levels. It provides:
  • Inventory of the key data assets that run the organization
  • Clarification of core terminology & business definitions (e.g. what do we mean by Location…)
  • Definition of core business rules & practices (e.g. Staff are assigned to only one Location…)
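As a minimal sketch of how a rule such as “Staff are assigned to only one Location” can be written down in a model, the following uses Python's built-in sqlite3 module. The table and column names (location, staff, location_id) and the helper functions are illustrative assumptions, not taken from the slides.

```python
import sqlite3


def build_model() -> sqlite3.Connection:
    """Create an in-memory database for a hypothetical Staff/Location model."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces FKs when enabled
    conn.executescript(
        """
        CREATE TABLE location (
            location_id INTEGER PRIMARY KEY,
            name        TEXT NOT NULL
        );
        CREATE TABLE staff (
            staff_id    INTEGER PRIMARY KEY,
            name        TEXT NOT NULL,
            -- A single NOT NULL column: each staff member has exactly one Location.
            location_id INTEGER NOT NULL REFERENCES location(location_id)
        );
        """
    )
    return conn


def add_location(conn: sqlite3.Connection, location_id: int, name: str) -> None:
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT INTO location (location_id, name) VALUES (?, ?)",
            (location_id, name),
        )


def assign_staff(conn: sqlite3.Connection, name: str, location_id) -> bool:
    """Return True if the row satisfies the rule, False if the model rejects it."""
    try:
        with conn:
            conn.execute(
                "INSERT INTO staff (name, location_id) VALUES (?, ?)",
                (name, location_id),
            )
        return True
    except sqlite3.IntegrityError:
        return False
```

With this structure, a staff row with no location, or with a location that does not exist, is rejected by the database rather than by application code. If the business rule were different (for example, staff working at several locations), the model would need a separate assignment table instead.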

In a data-driven business, data is your core Intellectual Property (IP)

A “one page” enterprise business model should provide an overview of what the business does & how it operates. For example, the model to the left most likely describes a retail organization, not a health care provider.

Enterprise Knowledge Inventory

• A data model describes YOUR business, with your unique business rules, terminology, & definitions.
  • It describes the unique way your organization operates
  • It is your IP, so protect & manage it accordingly

Diagram: Your Organization, with Integration to applications, partners, agencies, etc.

You’ll need to integrate with applications, partners, agencies, etc., but they should not necessarily re-define how your organization operates.

Applications & Partners should NOT:
• Hold your data “captive”
• Re-define how you do business just to fit their canned data model. (There may be a reason to change, but do it purposefully.)

Enterprise Knowledge Inventory

• A common question is whether to use an industry standard data model.
  • Industry models can be a helpful reference & guide
  • But don’t blindly follow them without customizing for your unique organization.
  • Just as your organization is unique, so is its data.

Diagram: Your Organization, using an Industry Standard Model as a Reference.

Industry data models can be a great reference, but you’ll likely want to customize them to fit your unique organization.

Mergers & Acquisitions

• Data is a large part of the value of a business acquisition
  • Increasingly, data is a key driver for acquisitions: obtaining the data that another firm maintains about customers, products, recipes, innovations, etc.
  • The data holds the rules, history & IP of the business.

• Just as you would take an inventory of products, you need to take a data inventory, via a data model (reverse engineering).

• Disparate business processes are often manifested in the data. Ignoring these key business process differences can wreak havoc on operations.

Diagram: Organization A and Organization B

What issues might arise in integrating the customer accounts from the two organizations?
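To make the question concrete, here is a small sketch of one kind of issue: two organizations that store the same customer in different shapes. The record layouts (a single "Last, First" name field for Organization A, separate name fields for Organization B), the field names, and the name-based matching rule are all hypothetical; the slide's actual models are not reproduced in this transcript.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Customer:
    """Common (target) model that both sources are mapped into."""
    source: str
    source_id: str
    first_name: str
    last_name: str


def from_org_a(row: dict) -> Customer:
    # Hypothetical Organization A layout: numeric key, one "Last, First" field.
    last, first = [part.strip() for part in row["name"].split(",", 1)]
    return Customer("A", str(row["cust_no"]), first, last)


def from_org_b(row: dict) -> Customer:
    # Hypothetical Organization B layout: string key, separate name fields.
    return Customer(
        "B", row["customer_id"], row["first_name"].strip(), row["last_name"].strip()
    )


def possible_duplicates(customers):
    """Group records whose normalized names match; return groups of 2 or more."""
    groups = {}
    for customer in customers:
        key = (customer.first_name.lower(), customer.last_name.lower())
        groups.setdefault(key, []).append(customer)
    return [group for group in groups.values() if len(group) > 1]
```

Matching on names alone is a deliberately naive stand-in. Real integrations usually also have to reconcile differing keys, code sets, and business rules, such as whether one customer may hold several accounts.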


Efficiency & Agility

• In many organizations, a great deal of time, energy, and brainpower is wasted:
  • Reformatting or re-working data from disparate sources
  • Searching to understand the meaning of data
  • Looking for data that is unavailable

• Silos often exist, with key information not being shared. This is not out of malice, but because a common, published inventory & standards don’t exist, i.e. a common data model is lacking.

I’m just about done with my spreadsheet – Customers by Region, Age, and Income Level. Great!

If I have to reformat this spreadsheet one more time to account for mismatched Region Codes, I’m going to shoot myself.

Etc.!

Why can’t we get Income Levels for our customers? This is so dumb.


Innovation & Collaboration

• An Enterprise Data Model provides a “catalogue” of an organization’s data assets.

• Staff are able to see all of the data available across the organization – spurring innovation & collaboration.

Sharing the catalogue of enterprise data assets

I didn’t realize that the Insurance Dept was tracking Weather Events. I could use that to link Weather to Product Sales for Trend Analysis!! Cool!

Technical Data Integration
Many styles & methods

Data Modeling for Data Warehousing & Business Intelligence

For Data Warehousing, Data Modeling helps answer:
• What is the definition of customer?
• Where is the data stored?
• How is it structured?
• Who uses or owns the data?

For BI Reporting, Data Modeling helps answer:
• What are the definitions of key business terms?
• What do I want to report on?
• How do I optimize the database for these reports?

• Data Modeling is the “Intelligence behind Business Intelligence”
  • Creating business meaning & context
  • Understand source and target data systems
  • Optimize data structures to align queries with reports

Diagram labels: Source Systems, Relational Model, Dimensional Model, Data Warehouse, BI Report: Customers by Region (“Show me all customers by region”)
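The “customers by region” request can be sketched with a deliberately tiny reporting structure, again using Python's built-in sqlite3 module. The table names (dim_region, dim_customer), column names, and sample rows are assumptions made for illustration. This is not a full dimensional model (there is no fact table), only enough structure to show how modeling the target around the report keeps the query simple.

```python
import sqlite3


def build_warehouse(source_rows):
    """Load (customer_name, region_name) pairs into a small reporting schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE dim_region (
            region_key  INTEGER PRIMARY KEY,
            region_name TEXT NOT NULL UNIQUE
        );
        CREATE TABLE dim_customer (
            customer_key  INTEGER PRIMARY KEY,
            customer_name TEXT NOT NULL,
            region_key    INTEGER NOT NULL REFERENCES dim_region(region_key)
        );
        """
    )
    with conn:
        for customer_name, region_name in source_rows:
            # One row per distinct region, reused by every customer in it.
            conn.execute(
                "INSERT OR IGNORE INTO dim_region (region_name) VALUES (?)",
                (region_name,),
            )
            (region_key,) = conn.execute(
                "SELECT region_key FROM dim_region WHERE region_name = ?",
                (region_name,),
            ).fetchone()
            conn.execute(
                "INSERT INTO dim_customer (customer_name, region_key) VALUES (?, ?)",
                (customer_name, region_key),
            )
    return conn


def customers_by_region(conn):
    """Return (region_name, customer_count) rows, ordered by region name."""
    return conn.execute(
        """
        SELECT r.region_name, COUNT(*) AS customer_count
        FROM dim_customer AS c
        JOIN dim_region AS r ON r.region_key = c.region_key
        GROUP BY r.region_name
        ORDER BY r.region_name
        """
    ).fetchall()
```

Because region is stored once with a single agreed name, the report does not have to reconcile mismatched region codes at query time. That reconciliation happens when the data is loaded.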


The Need for Data Warehousing

True or False: “We don’t need data warehousing any more because storage is so cheap and processing power is so fast with today’s modern hardware.”


I can’t find anything in this file cabinet. It’s just a bunch of papers without any folders or organization!

Don’t worry—just get more file cabinets.

Much of the value in data warehousing is making data consumable & understandable for ease of reporting.

Metadata Matters

Even with today’s advanced hardware & storage options, self-service BI tools, and data science skills & tools, attention needs to be paid to the quality, context, & structure of data (aka Data Models & Metadata).

Raw data used in Self-Service Analytics and BI environments is often so poor that many data scientists and BI professionals spend an estimated 50–90% of their time cleaning and reformatting data to make it fit for purpose.
Source: DataCenterJournal.com

Correcting poor data quality is a Data Scientist’s least favorite task, consuming on average 80% of their working day.
Source: Forbes 2016

If I have to reformat this spreadsheet one more time to account for mismatched Region Codes, I’m going to shoot myself.

Master Data Management (MDM)

• Master Data Management (MDM) is the practice of identifying, cleansing, storing & governing core data assets of the organization (e.g. customer, product, etc.)

• There are many architectural approaches to MDM. Two are the following:

Centralized
• Core data stored in a common schema in a centralized “hub”.
• Used as a common reference for operational systems, DW, etc.

Virtualized/Registry
• Data remains in source systems.
• Referenced through a common virtualization layer.

BOTH require a Data Model
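The two styles can be contrasted in a short sketch. Everything here is a simplified assumption for illustration: the class names, the two-attribute CUSTOMER_MODEL, and the dictionary-based “source systems” are hypothetical, and real MDM products add matching, survivorship, and governance on top. The point is that both styles depend on the same shared model.

```python
# The shared data model: the attributes every consumer agrees on (hypothetical).
CUSTOMER_MODEL = ("name", "email")


class CentralizedHub:
    """Centralized style: master data is copied into one common schema."""

    def __init__(self):
        self._records = {}

    def load(self, master_id, record):
        # Keep only the attributes defined in the common model.
        self._records[master_id] = {f: record.get(f) for f in CUSTOMER_MODEL}

    def get(self, master_id):
        return dict(self._records[master_id])


class Registry:
    """Registry style: data stays in the sources; the registry holds pointers."""

    def __init__(self, sources):
        self._sources = sources  # source name -> {source key: record}
        self._links = {}         # master id -> [(source name, source key, field map)]

    def register(self, master_id, source_name, source_key, field_map):
        # field_map maps a model attribute to the source system's own field name.
        self._links.setdefault(master_id, []).append(
            (source_name, source_key, field_map)
        )

    def get(self, master_id):
        # Assemble the view at read time from the live source records.
        view = {f: None for f in CUSTOMER_MODEL}
        for source_name, source_key, field_map in self._links[master_id]:
            record = self._sources[source_name][source_key]
            for model_field, source_field in field_map.items():
                if view[model_field] is None:
                    view[model_field] = record.get(source_field)
        return view
```

In the registry version a change made in a source system is visible on the next read, while the hub holds its own copy until it is reloaded. In both cases the mapping to CUSTOMER_MODEL is what makes the result usable as a common reference.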


MDM Data Models

• In an MDM Model, the core attributes for master data entities can be identified.

• In some cases, Stewardship can be defined at the attribute level.
  • Multiple groups update/create/monitor certain field values.
  • Certain attributes are core to all (e.g. demographics info)

• (More on this next month…)

Diagram: Patient entity, with attributes marked as Core, Shared Attributes or as stewarded by Team A, Team B, or Team C.

Patient: Patient ID, Date of Birth, SSN, First Name, Last Name, Maiden Name, Middle Name, Name Prefix, Name Suffix, Date of Death, Phone Number, Email, Gender, Marital Status, Race, Ethnicity, Religion, Primary Language, Secondary Language, Primary Diagnosis Group, Secondary Diagnosis Group, Competency Status, Education Level, Need of Detox, Risk of Harm to Self, Special Needs Requirement, Current Risk, Veteran Status, National Guard/Military Reserve, Pregnant, Employment Status, Number in Household, Household Income, Living Arrangement, No Mailings
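Attribute-level stewardship can be sketched as a simple lookup from attribute to the teams allowed to maintain it. The attribute names come from the Patient entity above, but which team stewards which attribute is NOT recoverable from this transcript, so the assignments below are invented purely for illustration, as are the function names.

```python
ALL_TEAMS = frozenset({"Team A", "Team B", "Team C"})

# Hypothetical assignments: the real slide's grouping is not known.
STEWARDS = {
    "Date of Birth": ALL_TEAMS,  # treated here as a core, shared attribute
    "First Name": ALL_TEAMS,
    "Last Name": ALL_TEAMS,
    "Primary Diagnosis Group": frozenset({"Team A"}),
    "Household Income": frozenset({"Team B"}),
    "No Mailings": frozenset({"Team C"}),
}


def can_update(team: str, attribute: str) -> bool:
    """True if the team is a listed steward of the attribute."""
    return team in STEWARDS.get(attribute, frozenset())


def apply_update(record: dict, team: str, changes: dict) -> dict:
    """Return an updated copy of the record, or raise if any change is not allowed."""
    rejected = sorted(attr for attr in changes if not can_update(team, attr))
    if rejected:
        raise PermissionError(f"{team} is not a steward of: {', '.join(rejected)}")
    return {**record, **changes}
```

Keeping the mapping next to the model means a question like “who owns this field?” has one documented answer instead of being rediscovered for each integration.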


Data Lake Big Data Model

• With the Big Data and NoSQL paradigm, “Schema-on-Read” means you do not need to know how you will use your data when you are storing it.

• You do need to know how you will use your data when you are using it, and model accordingly.

• For example, you may first place the data on HDFS in files, then apply a table structure in Hive.

• Apache Hive provides a mechanism to project structure onto the data in Hadoop.

File system
hdfs dfs -put /local/path/userdump /hdfs/path/data/users

Table Structures
Create table …

Analysis
Analyze & understand the data. Build a data structure to suit your needs.

Diagram labels: Hive, HDFS, Exploration
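The schema-on-read idea can be illustrated without a Hadoop cluster. In this sketch a plain string stands in for a raw file stored as-is, and structure is applied only when the data is read. The file contents, column positions, and schema names are assumptions made for the example. In a real data lake the equivalent step would be a Hive table definition over the files in HDFS.

```python
import csv
import io

# Stand-in for a raw file that was stored without any declared structure.
RAW_USERDUMP = "1,Jane,Smith,West\n2,Raj,Patel,East\n"

# Two hypothetical schemas over the same raw data: (column name, position, converter).
USER_SCHEMA = [
    ("user_id", 0, int),
    ("first_name", 1, str),
    ("last_name", 2, str),
    ("region", 3, str),
]
REGION_SCHEMA = [
    ("user_id", 0, int),
    ("region", 3, str),
]


def read_with_schema(raw_text, schema):
    """Project a schema onto raw delimited text at read time."""
    for fields in csv.reader(io.StringIO(raw_text)):
        yield {name: convert(fields[position]) for name, position, convert in schema}
```

The raw data is written once, and each use (a full user listing, a region analysis) brings its own structure. The modeling work has not gone away. It has moved to the point of use.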


An Enterprise Data Inventory for the Data Lake

• An Enterprise Data Model can help provide an Inventory for what data resides in the Data Lake.

Data Lake contents shown in the diagram:
• Twitter Feeds
• Hive Table for Staff
• NOAA Weather Feeds
• Hive Table for Product
• Sensor Data

Allowing for Innovation & Discovery

APIs and Application Integration

• APIs are a standard way to share data to/from applications.

• These should be mapped to the Enterprise model, but the API model/design should focus on the User Perspective.

Diagram: Enterprise Perspective vs. User Perspective

PersonObject
• Person
  • PersonID: string
  • PersonFirstName: string
  • PersonLastName: string
• GetPersonObject (GET)
• PutPersonObject (PUT)

Application


Summary

• Business Data Models help create an Enterprise Knowledge Inventory that can help with business drivers such as:

• Innovation & Collaboration

• Mergers & Acquisitions

• Efficiency & Agility

• Business Data Models help define the core definitions & rules for your enterprise data

• Data is your organization’s IP

• A Data model is an inventory of the data asset

• Technical Data Models support the various ways to integrate data, from Data Warehousing, Data Lakes, and MDM to APIs.

• Technical source & target formats are key

• Helps define stewardship & ownership

• Format should suit the purpose & audience

• Using Data Models as a core part of data integration helps provide both structure & meaning for both business & technical team members

About Global Data Strategy, Ltd

• Global Data Strategy is an international information management consulting company that specializes in the alignment of business drivers with data-centric technology.

• Our passion is data, and helping organizations enrich their business opportunities through data and information.

• Our core values center around providing solutions that are:
  • Business-Driven: We put the needs of your business first, before we look at any technology solution.
  • Clear & Relevant: We provide clear explanations using real-world examples.
  • Customized & Right-Sized: Our implementations are based on the unique needs of your organization’s size, corporate culture, and geography.
  • High Quality & Technically Precise: We pride ourselves in excellence of execution, with years of technical expertise in the industry.

Data-Driven Business Transformation: Business Strategy Aligned With Data Strategy

Visit www.globaldatastrategy.com for more information

Contact Info

• Email: [email protected]

• Twitter: @donnaburbank, @GlobalDataStrat

• Website: www.globaldatastrategy.com


Questions?

Thoughts? Ideas?