business semantics for data governance and stewardship
Post on 17-Jul-2015
Business Semantics for Data Governance & Stewardship
Dr. Pieter De Leenheer
Sloan Hall, Stanford University
Feb 4, 2015
Overview
• ICT: from Truth to Trust
• The Spectrum of Business Semantics
• Situation Map
• Business Semantics Governance & Stewardship
– Principles
– Operating Framework
• Reflection and Questions
La Trahison des Images (Magritte, 1929)
La Trahison des Images (2)
https://deleenheer.wordpress.com/2009/12/15/magrittes-flirting-with-semantics/
What we talk about when we talk about no Data Governance
The Problem
• “Who approved this?”
• “I wish these guys spoke our language.”
• “I can’t understand this report!”
• “I’ve never seen this code! Who introduced this?”
• “This doesn’t seem right. Are we sure this data is correct?”
• “This rule is different in our country!”
• “This is an exception to the rule!”
Glossary Search
• How frequently do you look up a word for your business?
• To what purpose?
– Clarification
– Differentiation
• What are your main sources?
• Hierarchy-based navigation or keyword-based search?
• Authoritative truth or trust?
From Truth to Trust: Behind the Curtains
https://www.research.ibm.com/visual/projects/history_flow/results.htm
Spectrum of Business Semantics
Welty, C., Lehmann, F., Gruninger, G., and Uschold, M. (1999). Ontology: Expert systems all over again? In Invited panel at AAAI-99: The National Conference on Artificial Intelligence, Austin, Texas, USA.
The Big ‘Metadata’ Bang: Catalogue and text files
• The start of an organization’s data management
• Represented by shared folders with lists of things such as product, customer, templates
• First ‘clouds’ of metadata
– Naturally emerge as by-product
– For human consumption
– Locally understood
• From this point exponential expansion:
– in volume
– in consumers (receiver)
– in producers (sender)
– in entropy
Glossary
• List of terms and definitions
e.g., http://web.stanford.edu/dept/pres-provost/cgi-bin/dg/wordpress/data-governance-and-stewardship-materials/
Thesaurus
• Add homo-, syno-, mero-, hyper- and hyponymous relations
Taxonomy
• Formalized representation of a “thesaurus”
• Generalize and specialize properties and relations
– generalize Vendor and Customer with similar properties into Party
– specialize Location into Home Address and Office Address because of different properties
• Classifying a thing as a Term, Data Element or System
– E.g., “customer” vs. “CUST_TBL” vs. “CRM” to determine ownership
• Inheritance-based reasoning such as syllogisms
– Premise: “John Doe” is a lead
– Premise: All leads receive a mortgage offering
– Conclusion: “John Doe” receives a mortgage offering
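The syllogism above can be sketched as a lookup over a small taxonomy. This is only an illustration: the `Concept` class, the rule strings, and the choice to place Lead under Party are assumptions made for the example, not part of the talk.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Concept:
    """A taxonomy node; `parent` is the more general concept."""
    name: str
    parent: Optional["Concept"] = None
    rules: list = field(default_factory=list)

    def inherited_rules(self) -> list:
        """Rules stated on this concept, followed by those of its ancestors."""
        inherited = self.parent.inherited_rules() if self.parent else []
        return list(self.rules) + inherited


# Hypothetical taxonomy: Lead is modelled as a specialization of Party.
party = Concept("Party", rules=["has a name and an address"])
lead = Concept("Lead", parent=party, rules=["receives a mortgage offering"])

# Premise 1: "John Doe" is a lead.
instances = {"John Doe": lead}


def conclusions(individual: str) -> list:
    """Premise 2 (rules on the concept) applied to the individual."""
    concept = instances[individual]
    return [f"{individual} {rule}" for rule in concept.inherited_rules()]


print(conclusions("John Doe"))
```

Because rules are collected up the hierarchy, anything stated once on Party also holds for every specialization, which is the practical payoff of moving from a thesaurus to a taxonomy.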
Frames
Logical constraints
• Modal Logic:
– context determines meaning, truthfulness, validity
– plausibility vs. necessity
• Modalities determine:
– who owns a term per region, process, function
– where and how to enforce terms
– what the definition of a term is
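One way to picture "context determines meaning" is a glossary keyed by term and context, with a fallback from the most specific context to the global one. The sketch below assumes a two-dimensional context (region, function); the `Context` and `Entry` types, the definitions, and the owner names are all hypothetical.

```python
from typing import NamedTuple, Optional


class Context(NamedTuple):
    """A (region, function) context; None means 'any'."""
    region: Optional[str] = None
    function: Optional[str] = None


class Entry(NamedTuple):
    definition: str
    owner: str


# Hypothetical glossary content, keyed by (term, context).
GLOSSARY = {
    ("address", Context()): Entry(
        "Location where a party can be reached", "Global council"),
    ("address", Context(region="BE")): Entry(
        "Street, number, box, postal code, municipality", "BE steward"),
    ("address", Context(region="BE", function="Claims")): Entry(
        "Address of the insured risk", "BE claims steward"),
}


def resolve(term: str, context: Context) -> Entry:
    """Return the most specific entry: exact context, then region, then global."""
    for candidate in (context, Context(region=context.region), Context()):
        entry = GLOSSARY.get((term, candidate))
        if entry is not None:
            return entry
    raise KeyError(f"no definition of {term!r} in any context")


print(resolve("address", Context("BE", "Sales")).owner)
```

The same lookup answers both modality questions at once: which definition applies in a context, and who owns it there.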
Hierarchical Context in ACORD
Multidimensional Context
Situating an organization’s level of glossary need
• Size: 1 to 50
– Characterizing events: first term-and-condition templates, first products, customers
– Business needs: a catalogue of items like customers, products and offerings
– Technology support status: spreadsheet, database
• Size: 51 to 100
– Characterizing events: first customer segmentation; lead engine setup; business functions defined
– Business needs: as the catalogues grow in size, transform loose descriptions and definitions in text files into a glossary of terms
– Technology support status: shared file folders (for lead, prospect, customer, product, offering)
• Size: 101 to 500
– Characterizing events: business functions populated; inter-functional business processes develop; product and customer data volumes grow
– Business needs: the need for a thesaurus for comparing glossaries, differentiation of customer types, pricing models, reporting templates; local data analytics and storage
– Technology support status: spreadsheet, MediaWiki, functional processes like Salesforce, SDLC, ServiceNow; forecasting tools, reporting tools, databases
• Size: 501 to 1000
– Characterizing events: invested growth; mergers and acquisitions take place; first signs of corrupt data reports on the board table
– Business needs: the need to transform thesauri into taxonomies, data models and architecture frames; ISO/ACORD/BCBS standardization
– Technology support status: MediaWikis go viral without proper alignment between them; first metadata tools in IT to align certain functions; business limited to spreadsheets
• Size: 1001 plus
– Characterizing events: global operations; one or more red flags: legal (regulatory compliance breached), organizational (CxO fired), bad reputation (fraud), financial loss (penalties, debt)
– Business needs: reporting standards transformed into corporate data policies, rules and data quality. Modalities as to who is to define them, and how and where to enforce them, have been set. The need for the CDO function is mentioned, but there is resistance from the CIO/CTO. Big Data opportunities loom beyond the data nebula (screen with universe).
– Technology support status: platform with several data management systems (Informatica, IBM, Oracle) scarred by M&A. Lineage is fragmented and not properly validated by the business. The data governance organization is theorized (or failed before), so no one takes accountability; lack of functional descriptions or enterprise-wide championship. Glossaries’ usefulness implodes as their numbers increase. The enterprise data model is common ground for IT but useless to the business. Validation is urgent.
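The size bands of the situation map can be read as a simple lookup from organization size to the kind of semantic artefact the business needs. The sketch below is only a compression of the map for illustration: the function name and the one-word labels are assumptions, and "size" is used as on the slide, without a stated unit.

```python
import bisect

# Upper bounds of the first four size bands in the situation map.
UPPER_BOUNDS = [50, 100, 500, 1000]

# Shorthand labels for the "business needs" column (paraphrased, not quoted).
NEEDS = [
    "catalogue",
    "glossary",
    "thesaurus",
    "taxonomy",
    "data policies, rules and modalities",
]


def glossary_need(size: int) -> str:
    """Map an organization size to the artefact its band calls for."""
    if size < 1:
        raise ValueError("size must be at least 1")
    # bisect_left keeps a bound such as 50 inside its own band (1 to 50).
    return NEEDS[bisect.bisect_left(UPPER_BOUNDS, size)]


for size in (50, 51, 500, 1001):
    print(size, "->", glossary_need(size))
```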
Principles of Business Semantics
• Democracy
• Emergence
• Perspective rendering
• Perspective unification
• Validation
http://www.academia.edu/874733/Business_semantics_management_A_case_study_for_competency-centric_HRM
Principles at work in the Situation Map
• Emergence is a continuous principle at work
• Unification and rendering are continuously in flux, but at two different frequencies (business vs. IT)
• Validation is limited to technical lineage
• Democracy and Business Validation (socio-technical) are lacking
• Reactive rather than pro-active governance (defining) and stewardship (enforcing)
• Lack of tools
Gradually Build Trust based on Stewardship and Validation
• What?
– Qualitative metadata: e.g., definition for address, codes, mappings, classifications, etc.
• Who?
– Roles and responsibilities for people
• How?
– Collaborative workflows to orchestrate people in achieving high-quality metadata
– Start Simple, Buy-in, Council
– Measure Maturity and Trust
– Separate stewardship from integration
Data Governance Council: Governance Operating Model
[Diagram labels]
• Governance Operating Model: Roles & Responsibilities; Processes & Workflow; Asset Types & Traceability
• Layers: Data Governance Organization; Data Stewardship Activities; IT / Operational Data Management Activities
• Relations between layers: Establishes & drives; Aligns & Coordinates; Reports & Escalates; Monitors & Remediates
• Collibra Platform: Collibra Business Semantics Glossary (BSG); Collibra Reference Data Accelerator (RDA); Collibra Data Stewardship Manager (DSM)
• Activities shown: Data Quality Development; Data Modeling; Metadata Lineage; Metadata Scanning; Reference Data Authoring; Data Integration; Hierarchy Management; Business & Data Definitions; Business Traceability; Semantic Modeling; Mapping Specifications; Policy Management; Business Rules; Data Quality Rules; Data Quality Reporting; Issue Management; Reference Data Crosswalks; Master Data Stewardship; Data Quality Profiling; DQ Defect Resolution
• Other Data Management Vendor products ...
Example in Health Insurance
http://prezi.com/ve1ws8jmpqcn/workflow/
Global Data Governance
• Objective
– n Enterprise service buses => 1 Global Information Market Place
• Challenges
– Data Service = data sharing agreement across organization silos, policies, regulations, semantic assumptions. E.g., Address
– No clear balance between data ownership and control:
• responsibilities are not set
• for each data point: increasing exposure to risk regarding quality and policy compliance
• Service is more about trust because truth is relative
Solution: One Global Information Hub
• Phase 1: Jun-Sept
• Phase 2: Oct-Nov
• Phase 3: Dec -
What is to be governed?
Data Governance Questions
• What does the term “address” mean?
• How is the term “address” represented?
• In what system are data elements on “address” recorded?
• What views does a data sharing agreement include?
• With which policy does my data sharing agreement comply?
• Under which country is my term “address” classified?
• …
Collibra Traceability Paths
Term has attributes definition, description, etc.
Term is represented by Data Element
Data Element has system of record System
Data sharing Agreement groups Data View
…
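A traceability path can be sketched as a walk over (subject, relation, object) triples. The relation names below follow the slide; the instance names (the `CUST_TBL.ADDR` column, the agreement and view names) and the `follow` helper are hypothetical and do not represent Collibra's API.

```python
# (subject, relation, object) triples. Relation names follow the slide;
# all instance names are made up for illustration.
TRIPLES = [
    ("address", "is represented by", "CUST_TBL.ADDR"),
    ("CUST_TBL.ADDR", "has system of record", "CRM"),
    ("Claims sharing agreement", "groups", "Claims address view"),
]


def follow(start: str, *relations: str) -> set:
    """Walk from `start` along the given relations, one hop per relation."""
    frontier = {start}
    for relation in relations:
        frontier = {
            obj
            for subj, rel, obj in TRIPLES
            if subj in frontier and rel == relation
        }
    return frontier


# "In what system are data elements on 'address' recorded?"
print(follow("address", "is represented by", "has system of record"))
```

Each governance question in the previous list then corresponds to one path: a start asset plus a sequence of relations.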
Business Term ≠ Data Element
https://compass.collibra.com/display/COOK/Asset+Types+and+Traceability+Requirements
Operating Model
Traceability Diagram
Who? RACI
How is it to be governed?
• Status Types and Workflows
– For Domains, Terms, Users, and later for Issues and Data Sharing Agreements
BUSINESS SEMANTICS GLOSSARY
[Status diagram]
• Entry point: term requested on the domain page
• Status types: Candidate, In Progress, Under Review, Accepted, In Revision, Rejected, Deprecated
• Workflows:
– 1 Propose Business Term
– 2 Edit Business Term
– 3 Onboarding Business Term
– 4 Deprecate Business Term
– 5 Reactivate Business Term
https://compass.collibra.com/display/COOK/Lifecycle%3A+Workflows+and+Status+Types
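Status types and workflows together form a small state machine. The status and workflow names below come from the slide, but the arrows of the original diagram are not recoverable from this transcript, so the transition table is a guess made purely for illustration; the real lifecycle is documented at the link above.

```python
from enum import Enum


class Status(Enum):
    CANDIDATE = "Candidate"
    IN_PROGRESS = "In Progress"
    UNDER_REVIEW = "Under Review"
    ACCEPTED = "Accepted"
    IN_REVISION = "In Revision"
    REJECTED = "Rejected"
    DEPRECATED = "Deprecated"


# ASSUMPTION: which workflow moves a term between which statuses is
# hypothetical; only the names are taken from the slide.
TRANSITIONS = {
    (Status.CANDIDATE, "Edit Business Term"): Status.IN_PROGRESS,
    (Status.IN_PROGRESS, "Onboarding Business Term"): Status.UNDER_REVIEW,
    (Status.ACCEPTED, "Edit Business Term"): Status.IN_REVISION,
    (Status.ACCEPTED, "Deprecate Business Term"): Status.DEPRECATED,
    (Status.DEPRECATED, "Reactivate Business Term"): Status.CANDIDATE,
}


def apply(status: Status, workflow: str) -> Status:
    """Return the next status, or raise if the workflow is not allowed here."""
    try:
        return TRANSITIONS[(status, workflow)]
    except KeyError:
        raise ValueError(
            f"{workflow!r} is not allowed from {status.value!r}"
        ) from None


print(apply(Status.ACCEPTED, "Deprecate Business Term").value)
```

Keeping the transitions in a table rather than in code makes the lifecycle itself a governed asset: stewards can review and change it without touching the logic.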
How is it to be governed? Propose Workflow
How is it to be governed? Onboarding Workflow
How is it to be governed? Approval Workflow
Questions for the Audience
We presume the starting point is a glossary.
• Is it possible to establish data governance without a glossary?
• What factors would make it impossible?
• Do you know of cases where it has been achieved without one?