idq summit2014 ronald damhof - it's all about the data
DESCRIPTION
"It's all about the data, a managerial perspective" - these are the slides of the presentations I gave at Data Modeling Zone 2014 in Hamburg and at the International Data Quality Summit in Richmond (VA) 2014.
TRANSCRIPT
R.D.Damhof – October 2014 – IDQ Summit 2014
It’s all about the data !
A managerial perspective
By Ronald Damhof
R.D.Damhof – Prudenza BV - Copyright - 22 May 2014
I am an opinionated kind of guy…
Who am I - My Data Manifesto
The X commandments of data management
I. Context is leading
II. Data is the ultimate proprietary asset; it is to be managed and governed in line with morals & ethics, internal and external rules and legislation
III. Stop centering apps and process over data; data first, facts first
IV. It is all about the quality of our product: the data. Get clean, stay clean, get access
V. Thou shall abstract and separate concerns rigorously
Who am I - My Data Manifesto
The X commandments of data management
VI. a) Thou shall make a fundamentalistic distinction between Fact and Context
    b) Thou shall not forsake ‘Time’
VII. Data architecture is not the same as technology architecture
VIII. The science and practice of Information & Data Modeling needs to be upheld, improved and taught
IX. Specify, Standardise, Automate & Productise
X. Thou cannot buy your way out of the data misery you are in
Who am I - My Data Manifesto
The X commandments of data management
XI. There is a new saviour in town. Its name is Hadoop and it calls to us from its mountain:
‘We got a lake and thou shall throw all your data in it. The water will be clean so you can drink it, the water will flow so it will irrigate your lands, grow your stock, feed your kids and of course bring you world peace…’
Nah, kidding ;-)
R.D.Damhof – September 2014 – Data Modeling Zone
Logistics & Manufacturing
The Data Push Pull Point
Push/Supply/Source driven: all facts, fully temporal
▪ Mass deployment
▪ Control > Agility
▪ Validation of “ingredients”
▪ Repeatable & predictable processes
▪ Standardized processes
▪ High level of automation
▪ Relatively high IT/Data expertise
Pull/Demand/Product driven: truth, interpretation, context
▪ Piece deployment
▪ Agility > Control
▪ Plausibility
▪ User-friendliness
▪ Relatively low IT expertise
▪ Domain expertise essential
Business Rules Downstream
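One way to read "all facts, fully temporal" on the push side and "business rules downstream" on the pull side is sketched below. This is an illustration only, not the author's implementation: the `Fact` record, the two time axes (`valid_from`, `recorded_on`), the sample data, and the `is_high_value` rule with its threshold are all hypothetical. Facts are stored append-only with both dates; the interpretation is applied afterwards, when a product is requested.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass(frozen=True)
class Fact:
    """A recorded fact with two time axes (hypothetical structure)."""
    customer_id: int
    balance: float
    valid_from: date    # when the value became true in the real world
    recorded_on: date   # when the record was captured


# Push side: facts are only appended, never overwritten.
FACTS: List[Fact] = [
    Fact(1, 500.0, date(2014, 1, 1), date(2014, 1, 2)),
    Fact(1, 1500.0, date(2014, 6, 1), date(2014, 6, 3)),
    # A later correction of the June value, recorded in September.
    Fact(1, 1200.0, date(2014, 6, 1), date(2014, 9, 1)),
]


def as_of(facts: List[Fact], customer_id: int,
          valid_on: date, known_on: date) -> Optional[Fact]:
    """Return the fact valid on `valid_on`, as it was known on `known_on`."""
    candidates = [
        f for f in facts
        if f.customer_id == customer_id
        and f.valid_from <= valid_on
        and f.recorded_on <= known_on
    ]
    if not candidates:
        return None
    # Latest validity wins; among equals, the latest recording wins.
    return max(candidates, key=lambda f: (f.valid_from, f.recorded_on))


def is_high_value(fact: Fact, threshold: float = 1000.0) -> bool:
    """Pull side: a business rule (context) applied downstream of the facts."""
    return fact.balance >= threshold
```

Because the rule lives downstream, changing the threshold changes the interpretation without touching the stored facts, and the same question can be answered as it would have been answered on any earlier date.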
The Development Style
Systematic
▪ User and developer are separated
▪ Defensive governance; focus on control and compliance
▪ Strong focus on non-functionals; auditability, robustness, traceability, …
▪ Centralised and organisation-wide information domain
▪ Configured and controlled deployment environment (dev/tst/acc/prod)
Opportunistic
▪ User and developer are the same person or closely related
▪ Offensive governance; focus on adaptability & agility
▪ Decentralised, personal/workgroup/department/theme information domain
▪ All deployment is done in production
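A Data Deployment Quadrant
[Figure: a two-by-two quadrant. Vertical axis: Development Style (Systematic, Opportunistic). Horizontal axis: Data Push/Pull Point (Push/Supply/Source driven: Facts; Pull/Demand/Product driven: Context). Quadrants I and II sit in the systematic row, III and IV in the opportunistic row. Labels in the opportunistic row: “Research, Innovation & Design” and “Shadow IT, Incubation, Ad-hoc, Once off”.]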
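The two axes of the quadrant can be sketched as a small lookup. This is a minimal illustration: the enum and function names are hypothetical, and the numbering (I = push and systematic, II = pull and systematic, III = push and opportunistic, IV = pull and opportunistic) is an assumption read from the slide layout, with push on the left and systematic on top.

```python
from enum import Enum


class PushPull(Enum):
    PUSH = "push/supply/source driven"   # facts
    PULL = "pull/demand/product driven"  # context


class DevelopmentStyle(Enum):
    SYSTEMATIC = "systematic"
    OPPORTUNISTIC = "opportunistic"


# Assumed numbering, taken from the slide layout (I II over III IV).
_QUADRANTS = {
    (PushPull.PUSH, DevelopmentStyle.SYSTEMATIC): "I",
    (PushPull.PULL, DevelopmentStyle.SYSTEMATIC): "II",
    (PushPull.PUSH, DevelopmentStyle.OPPORTUNISTIC): "III",
    (PushPull.PULL, DevelopmentStyle.OPPORTUNISTIC): "IV",
}


def quadrant(side: PushPull, style: DevelopmentStyle) -> str:
    """Return the quadrant numeral for a data deployment."""
    return _QUADRANTS[(side, style)]
```

For example, `quadrant(PushPull.PUSH, DevelopmentStyle.SYSTEMATIC)` returns `"I"` under this assumed numbering.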
7 Applications of the Quadrant
(1) How we produce
How we produce, process variants
How we produce, automation
Rephrased - somewhat more nerdy:
• Model-driven, metadata-driven
• Declarative instead of imperative
Rephrased - somewhat more popular:
“In Data, the developer is the data modeller”
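A minimal sketch of what "model-driven, declarative instead of imperative" could look like in practice: the data model is declared as metadata, and the implementation is generated from it. The `Column` and `Table` classes, the `generate_ddl` function and the `customer` example are hypothetical, chosen only to illustrate the idea.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Column:
    name: str
    sql_type: str
    nullable: bool = True


@dataclass(frozen=True)
class Table:
    name: str
    columns: Tuple[Column, ...]


def generate_ddl(table: Table) -> str:
    """Generate a CREATE TABLE statement from the declared model."""
    lines = []
    for col in table.columns:
        null_clause = "" if col.nullable else " NOT NULL"
        lines.append(f"    {col.name} {col.sql_type}{null_clause}")
    body = ",\n".join(lines)
    return f"CREATE TABLE {table.name} (\n{body}\n);"


# The modeller declares *what* the structure is; the code is derived from it.
customer = Table(
    name="customer",
    columns=(
        Column("customer_id", "INTEGER", nullable=False),
        Column("name", "VARCHAR(100)"),
    ),
)

ddl = generate_ddl(customer)
```

In this style a change to the model means editing the declaration and regenerating, rather than hand-editing procedural code, which is one reading of "the developer is the data modeller".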
How we produce, production lines
Production-line: Data orientation
▪ Data Products, Information Products
▪ Access to data
▪ Analytical tools
▪ Processing power
Production-line: Forms orientation
▪ E.g. XBRL
(2) How we organize
To centralize or to decentralize
(3) How we govern
How we govern, products
How we govern, accountability
Never, never, never ‘ownership’
[Figure: the quadrant (I II / III IV) annotated with the labels below.]
▪ Deliverant is Accountable
▪ Demandee is Accountable
▪ Data scientist/Analyst/Researcher responsible
▪ In- and outbound Data Delivery Agreements
(4) How do people excel
(5) How to use technology
(6) How about Technology
▪ Storage: (R)DBMS; Processing: Automation Software; Data Quality: Validation, Profiling; Development: Data Modeling; Accessibility: Data Virtualization
▪ Storage: Pattern based; Processing: Automation/limited ETL; Data Quality: DQ rules/dashboards; User tooling: Reporting, dashboards, Data Visualization
▪ Storage: Analytical; Processing: Preptools for Data Analyst; User tooling: Advanced Analytics, Data Visualization
(7) Business, Information or Data Modeling is key
The Logical Model drives the technical data architecture, design and implementation
[Figure: modeling levels Conceptual and Logical. Labels shown: Ontology, Facts, Relational; e.g. Data Vault, Anchor Model; e.g. Dimensional, hierarchical, flat.]
Oh… data warehouse?
The classic distinction between ‘operational data environment’ and ‘informational data environment’ is fading.
Modern-day data warehouses have been split up, with the ‘fact’ part (Q1) moving to the operational side.
Although data warehouses have evolved, operational applications have not, at least not in terms of data architecture. They should, though…
Email: [email protected]
Linkedin: nl.linkedin.com/in/ronalddamhof/
Twitter: RonaldDamhof
Blog: prudenza.typepad.com
Website: www.prudenza.nl