Quality assurance of “Big Data” - qrn.no · 2019-07-31
TRANSCRIPT
DNV GL © SAFER, SMARTER, GREENER
Quality assurance of “Big Data”
Is the organization's data a business-critical means of production?
Per Myrseth, Data Management Competence Centre (DMCC), DNV GL Digital Solutions
Main topics of today
▪ Data as an asset and production means, including Big Data
▪ Data management as part of general quality management
▪ Risk management of data dependent operations
Digitalization: The relationship between processes, data and technology
People, Processes, Data and Technology: designed and managed to meet Business goals
Digitalisation roadmap and creating value from data
Current competitive landscape: People, Processes, Data and Technology supporting business roles and goals
Changed and/or improved:
• Technology
• Data
• Processes
• Competence & capacity
Future competitive landscape: People, Processes, Data and Technology supporting business roles and goals
Value creation from data and the risk of using bad data
https://hbr.org/2016/09/bad-data-costs-the-u-s-3-trillion-per-year
Intangible assets, investments and market value
https://ssrn.com/abstract=3009783
Creating value from data
Data/Info Management:
▪ Data ingestion, streaming or batches
▪ Data quality assessment
▪ Other data preparation
▪ Data cleansing
▪ Data harmonization
▪ Security and access control
Business Use-Cases:
▪ Condition based maintenance
▪ Analytics
▪ Safety and Barrier Mgmt.
▪ Operational Efficiency
Data life cycle, when does data bring value
Axes: value to business against time, from creation to several years later
Create the data today that brings value to business tomorrow
Data we have versus data we need
Data we have: c, l, p, q, r, s, v, y, z
Data relevant for the task: a to z (excluding t, which denotes time), placed by importance of contribution
Legend: Known good, Known insufficient, Known unavailable, Unidentified potential
$$Y(t) = a_t + b_t + c_t + d_t + e_t + f_t + g_t + h_t + i_t + j_t + k_t + l_t + m_t + n_t + o_t + p_t + q_t + r_t + s_t + u_t + v_t + w_t + x_t + y_t + z_t$$
Having the right data at hand: a prerequisite for transformation, automation, and success with data science and new services
Axes: data quality (low to high) and data sources (few to every data source of your dreams)
Current way of working → Future way of working
Data fit for purpose within cost and risk
Explorative or use case driven value creation
Axes: Small data to Big data; Know which problem to solve to Do not know what problem to solve
Quadrants: Project A, Project B, Project C, Project D
Where to place projects A-D:
• Incremental improvements
• High or low ROI risk
• Something totally new
• Easy to copy innovation
Understanding the data quality consequences of data in value chains
Lineage of data in a data flow:
Merge/Integrated: Dataset D1, Dataset D2, Dataset D3 → Dataset D123
Derived: Dataset D4
ORIGINAL
- SLA
- Cost
- Contract rights/IPR
- Liability
- Confidentiality
TRANSFORMED
- SLA
- Income
- Contract rights/IPR
- Liability
- Confidentiality
SLA - Service Level Agreement
IPR - Intellectual Property Rights
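The lineage idea on this slide can be sketched as a small dependency graph. This is a minimal illustration only: the dictionary `LINEAGE`, the function names, and the assumption that D4 is derived from D123 are hypothetical and not taken from the slide.

```python
from typing import Dict, List, Set

# Hypothetical lineage graph: each dataset maps to the datasets it is built from.
# Assumption: D4 is derived from D123 (the slide only labels D4 as "derived").
LINEAGE: Dict[str, List[str]] = {
    "D1": [],
    "D2": [],
    "D3": [],
    "D123": ["D1", "D2", "D3"],
    "D4": ["D123"],
}


def ancestors(dataset: str, lineage: Dict[str, List[str]]) -> Set[str]:
    """Return every dataset that `dataset` directly or indirectly depends on."""
    seen: Set[str] = set()
    stack = list(lineage.get(dataset, []))
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(lineage.get(node, []))
    return seen


def original_sources(dataset: str, lineage: Dict[str, List[str]]) -> Set[str]:
    """Return the upstream datasets that have no parents of their own."""
    return {d for d in ancestors(dataset, lineage) if not lineage.get(d)}


def impacted_by(source: str, lineage: Dict[str, List[str]]) -> Set[str]:
    """Return every dataset that would inherit a quality issue in `source`."""
    return {d for d in lineage if source in ancestors(d, lineage)}


if __name__ == "__main__":
    print(sorted(original_sources("D4", LINEAGE)))
    print(sorted(impacted_by("D2", LINEAGE)))
```

With a graph like this, terms attached to an original dataset (SLA, contract rights/IPR, liability, confidentiality) can be traced to every transformed dataset that depends on it.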
Data value chain
http://www.dlineage.com/assets/img/articles/impact-analysis-4.jpg
Data management: long term and enterprise focus is needed
‘Enterprise-Oriented’ Strategic Long Term Focus versus ‘Project-Oriented’ Short Term Focus
Pressures: Many system vendors, Project focus, Personnel change, System integrators, Quick fix, KPIs, Deadlines, Pilots, Outside my scope, Small budget, Standards, who cares…, Technology driven, Silos
Data management debt
Data management responsible
Data as an asset and business driver
▪ The primary driver for Data Management is
to enable organizations to get value from
their data assets, just as effective
management of financial and physical
assets enables organizations to get value
from those assets. – (Ref DAMA DMBOK 2.0)
▪ Information is recognized as an enterprise asset. Although
generally accepted accounting principles (GAAP) as yet do not
require the reporting of information assets on the balance
sheet, infonomics deems that organizations acknowledge that
information is more than merely a resource.
– (Ref: March 2017, Doug Laney of Gartner spoke on Infonomics)
Extended data quality assessment process
Define scope:
• Intended use
• Context
• Datasets & criticality
• Data origin and path
Data exploration and profiling: get access to data, prepare, explore
Data quality assessment:
• Identify and define req.
• Prepare and configure
• Perform assessment
Organizational maturity assessment:
• Organizational scope
• Interview & review
• Analyse and score
Data usage risk assessment (fit for use within acceptable risk):
• Define scope
• Consequence dimensions
• Perform risk assessment
Risk based data quality improvement:
• Root causes
• Measures
• Prioritize
Areas covered: Data, IT Systems, Processes, Organization
Extended scope:
Sensor system assessment:
• Clarify scope
• Choose methodology
• Analyse and score
Algorithm assessment:
• Clarify scope
• Choose methodology
• Analyse and score
Continuous improvement cycle
Data Quality Assessment Framework in use
Production line for data: Sensor system, Operational & IT Technology → Data Asset → Data Use
Risk
The Organization
Data quality assessment activities:
▪ Organizational Maturity Assessment
▪ Data Quality Assessments
▪ Sensor system assessment
▪ Algorithm assessment
▪ Continuous improvements
Organizational Maturity Assessment - 2
Capability areas (columns of the maturity matrix):
People and processes
▪ Governance: Objectives, Policy, Culture, Awareness, Risks, Capabilities to handle DQ issues
▪ Organization and people: Organization, roles, responsibilities, authority, skillsets
▪ Processes: Structured and vetted ways of handling and preventing DQ issues
▪ Process efficiency: Measure, monitor and use metrics to mitigate DQ issues
Definitions and requirements
▪ Requirement definition: DQ requirements defined, communicated and acted upon
▪ Metrics and dimensions: DQ metrics defined, set up, measured and monitored
Technology and standards
▪ Architecture, tools and technologies: DQ tools for processing, analysing and correcting DQ issues with data assets
▪ Data standards: Use available standards, models, ontologies and taxonomies – a corporate «DQ language»

LEVEL 5 - Optimized
▪ Governance: Data management policies govern and drive improvements
▪ Organization and people: Data management board oversees improvement activities
▪ Processes: Processes for continuous improvement in place
▪ Process efficiency: Processes provide feed-back and feed-forward to support continuous improvement
▪ Requirement definition: Baseline established and improvements measured according to requirements
▪ Metrics and dimensions: Metrics define baseline to support continuous improvement
▪ Architecture, tools and technologies: Tools support policy driven continuous improvement cycle
▪ Data standards: Standard compliance and domain models are subject to continuous improvement

LEVEL 4 - Managed
▪ Governance: Policies defined in relation to business objectives
▪ Organization and people: Skillset extended to include risk analysis of quality issues aligned with business objectives
▪ Processes: Processes for impact analysis and risk mgmt. in place
▪ Process efficiency: Monitoring is performed across enterprise and published as KPIs and trends
▪ Requirement definition: Requirements are linked to business impacts
▪ Metrics and dimensions: Metrics are linked to business impacts and risk analysis
▪ Architecture, tools and technologies: Tools are driven by business objectives and include support for root cause analysis and risk mgmt.
▪ Data standards: Standards are used actively to reduce risk for critical business operations

LEVEL 3 - Defined
▪ Governance: Policies defined at enterprise level
▪ Organization and people: Roles and required skills defined at enterprise level
▪ Processes: Processes are defined and implemented consistently across enterprise
▪ Process efficiency: Defined metrics are monitored in advance of business impact
▪ Requirement definition: Requirements defined and communicated at enterprise level
▪ Metrics and dimensions: Framework for metrics and dimensions defined at enterprise level
▪ Architecture, tools and technologies: Architecture in place at enterprise level supporting full stack data management
▪ Data standards: Standards, domain models and semantics used at enterprise level

LEVEL 2 - Repeatable
▪ Governance: Local initiatives address the requirement for policies
▪ Organization and people: Locally defined roles and some basic skills
▪ Processes: Best practices in place but not used consistently
▪ Process efficiency: Generic metrics are monitored at point of impact
▪ Requirement definition: Local initiatives define requirements
▪ Metrics and dimensions: Metrics are reused locally in projects
▪ Architecture, tools and technologies: Tools and technologies used consistently in selected projects
▪ Data standards: Industry standards and domain models used selectively across projects

LEVEL 1 - Initial
▪ Governance: Only ad-hoc or temporal policies in place
▪ Organization and people: No formally defined roles or skillset
▪ Processes: Ad-hoc or reactive responses to quality issues
▪ Process efficiency: No baseline and no monitoring of quality issues
▪ Requirement definition: Re-engineering used to derive requirements
▪ Metrics and dimensions: Project specific metrics
▪ Architecture, tools and technologies: Tools are used ad-hoc per project
▪ Data standards: Ad-hoc and inconsistent use of standards

Target line: some capability areas have a short way to improvements, others a longer way.
The objective is to use what has already been developed.
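The comparison of assessed maturity against a target line can be sketched as a simple gap calculation per capability area. The function name `maturity_gaps`, the example scores and the target level are hypothetical and only illustrate the idea; the slide itself does not prescribe a scoring formula.

```python
from typing import Dict, List, Tuple

# The eight capability areas named in the maturity matrix.
CAPABILITY_AREAS = [
    "Governance",
    "Organization and people",
    "Processes",
    "Process efficiency",
    "Requirement definition",
    "Metrics and dimensions",
    "Architecture, tools and technologies",
    "Data standards",
]


def maturity_gaps(scores: Dict[str, int], target: int) -> List[Tuple[str, int]]:
    """Return (area, gap) for every area below the target line, largest gap first."""
    for area in CAPABILITY_AREAS:
        level = scores.get(area)
        if level is None or not 1 <= level <= 5:
            raise ValueError(f"{area}: expected a maturity level from 1 to 5")
    gaps = [
        (area, target - scores[area])
        for area in CAPABILITY_AREAS
        if scores[area] < target
    ]
    # sorted() is stable, so areas with equal gaps keep the matrix order.
    return sorted(gaps, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical assessment result, for illustration only.
    example = {
        "Governance": 2,
        "Organization and people": 3,
        "Processes": 2,
        "Process efficiency": 1,
        "Requirement definition": 3,
        "Metrics and dimensions": 1,
        "Architecture, tools and technologies": 4,
        "Data standards": 3,
    }
    for area, gap in maturity_gaps(example, target=3):
        print(f"{area}: {gap} level(s) below target")
```

Areas with a small gap correspond to the "short way to improvements" on the slide, and areas with a large gap to the "longer way".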
How to set up, monitor and improve your data management
▪ Data Management provides foundation for organizing data effectively
Required capability areas:
▪ Governance
▪ Organization and people
▪ Processes
▪ Process efficiency
▪ Metrics & dimensions
▪ Requirement definition
▪ Architecture, tools & technologies
▪ Data standards
Data Quality Engine
Production line for data: Sensor system, Operational & IT Technology → Data Asset → Data Use
The Organization
DNV GL Runtime services:
▪ Data Quality Engine, configured with Data Quality rules
▪ Data Cleaning Engine, configured with Data cleaning rules
Output: Quality status & cleansed data
Data quality metrics – accurate and complete data
▪ Data types and format
▪ Data quality flags
▪ Collection frequency (missing records/duplicate records)
▪ Value distribution according to valid range
▪ Redundancy for reference data – correlations
▪ Rate of change
▪ Outliers
▪ Identification tags – traceability
▪ Missing values
▪ Sensor drift
▪ Consistency
▪ Code lists
▪ Event logs
Figure: time series of value v over time t, annotated with Min and Max limits, outside range, time collision, rate of change (RoC) and missing data
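Several of the metrics listed above (missing values, valid range, collection frequency, time collisions and rate of change) can be sketched as checks on a time series. This is a minimal illustration under stated assumptions: the function `check_series`, its parameters and the sample data are hypothetical, and comparing the rate of change against the last in-range value is one possible design choice, not a rule taken from the slide.

```python
from typing import Dict, List, Optional, Tuple

Sample = Tuple[int, Optional[float]]  # (timestamp in seconds, value or None)


def check_series(samples: List[Sample], v_min: float, v_max: float,
                 max_roc: float, interval: int) -> Dict[str, List[int]]:
    """Return the timestamps flagged by each data quality check."""
    issues: Dict[str, List[int]] = {
        "missing_value": [],   # record present, value absent
        "outside_range": [],   # value below v_min or above v_max
        "time_collision": [],  # duplicate timestamp
        "missing_record": [],  # expected timestamp with no record at all
        "rate_of_change": [],  # change per second above max_roc
    }
    seen = set()
    prev_t: Optional[int] = None
    last_good: Optional[Tuple[int, float]] = None  # last in-range (t, v)

    for t, v in sorted(samples, key=lambda s: s[0]):
        if t in seen:
            issues["time_collision"].append(t)
            continue
        seen.add(t)
        if prev_t is not None and t - prev_t > interval:
            issues["missing_record"].extend(range(prev_t + interval, t, interval))
        prev_t = t
        if v is None:
            issues["missing_value"].append(t)
            continue
        in_range = v_min <= v <= v_max
        if not in_range:
            issues["outside_range"].append(t)
        if last_good is not None:
            roc = abs(v - last_good[1]) / (t - last_good[0])
            if roc > max_roc:
                issues["rate_of_change"].append(t)
        if in_range:
            last_good = (t, v)
    return issues


if __name__ == "__main__":
    data = [(0, 10.0), (10, 10.5), (20, None), (30, 55.0), (30, 11.0), (60, 11.5)]
    print(check_series(data, v_min=0.0, v_max=50.0, max_roc=1.0, interval=10))
```

Checks such as sensor drift, code lists and cross-signal correlation need reference data or domain knowledge and are left out of this sketch.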
Data quality and cleansing process
RAW DATA → DATA QUALITY → DATA CLEANSING → CLEANSED DATA and CHANGE LOG
▪ Access raw data
▪ Use data quality rules to determine if data need cleansing; reuse rules from data quality assessment
▪ Clean data according to predefined activities
- Replace values (default/interpolation)
- Insert new values
- Delete values
- Tag values
- …
▪ Store cleansed data
▪ Record all changes applied to raw data
- What changed
- Why
- By whom
- When
- …
Data quality engine: Data quality rules, Data cleansing rules
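The cleansing process above can be sketched as a function that leaves the raw data untouched, writes a cleansed copy, and records what changed, why, by whom and when. The names `Change` and `cleanse`, the validity rule (a simple valid range) and the choice of linear interpolation are assumptions made for illustration; they are not the actual DNV GL engine.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Change:
    index: int              # what changed: position in the series
    old: Optional[float]    # value in the raw data
    new: float              # value written to the cleansed data
    reason: str             # why
    actor: str              # by whom
    when: datetime          # when


def _is_valid(v: Optional[float], v_min: float, v_max: float) -> bool:
    return v is not None and v_min <= v <= v_max


def cleanse(raw: List[Optional[float]], v_min: float, v_max: float, actor: str,
            when: Optional[datetime] = None
            ) -> Tuple[List[Optional[float]], List[Change]]:
    """Replace missing or out-of-range values by interpolation and log each change."""
    when = when or datetime.now(timezone.utc)
    cleansed = list(raw)  # copy, so the raw data stays as it was
    log: List[Change] = []
    valid = [i for i, v in enumerate(raw) if _is_valid(v, v_min, v_max)]

    for i, v in enumerate(raw):
        if _is_valid(v, v_min, v_max):
            continue
        left = max((j for j in valid if j < i), default=None)
        right = min((j for j in valid if j > i), default=None)
        if left is not None and right is not None:
            frac = (i - left) / (right - left)
            new = raw[left] + frac * (raw[right] - raw[left])
        elif left is not None:
            new = raw[left]   # no valid value after: carry the last one forward
        elif right is not None:
            new = raw[right]  # no valid value before: use the next one
        else:
            continue          # nothing to interpolate from; leave the value alone
        reason = "missing value" if v is None else "outside valid range"
        cleansed[i] = new
        log.append(Change(i, v, new, reason, actor, when))
    return cleansed, log


if __name__ == "__main__":
    series = [10.0, None, 12.0, 99.0, 14.0]
    clean, changes = cleanse(series, v_min=0.0, v_max=50.0, actor="dq-engine")
    print(clean)
    for change in changes:
        print(change)
```

Keeping the change log separate from the cleansed data makes every replacement traceable back to the raw value.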
Data storage architecture, where
Data sources: Sensor system, Operational IT-systems
Storage locations: On asset/plant, On premises, Cloud, Partner/third party
Business activities and data relationship
Production line and main process: Business activity 1 → Business activity 2 → Business activity 3
Data: Create, Use, Create, Use, Create
Supporting process (data management and data quality monitoring): Data quality monitoring, Data issue handling
Data value chain and interoperability layers
Interoperability layers between Enterprise 1 and Enterprise 2 in a value chain/value network:
▪ ICT and communication
▪ Data and semantics
▪ Business model and process
▪ Legal and contractual
Data quality and data cleansing on top of Veracity
Flow: Data capture and handling → Data transfer → Data access → Data quality assessment → Data cleansing → Data quality acceptance → Data transfer → Data access → Data use and analytics
Feedback: P2P, Data quality improvements, Data continuous improvements
Parties: Customer site, DNV GL, Provider on Veracity
Where to find us?
External access information:
Feature article on DNVGL.com front page, on treating data as asset
https://www.dnvgl.com/feature/data-quality.html
Guides and best practices
Recommended practice:
Data quality assessment framework, DNV GL RP-0497
https://rules.dnvgl.com/docs/pdf/DNVGL/RP/2017-01/DNVGL-RP-0497.pdf
Guide:
Creating Value from Data in Shipping
https://www.dnvgl.com/maritime/Creating-Value-from-Data-in-Shipping/index.html
Where to find us on Veracity
Organizational Maturity Assessment (Advisory) (based on ISO 8000 Data Quality)
https://store.veracity.com/organizational-maturity-assessment/product/24348a37-fa51-4fec-b36e-06c4d9c38615
Data Quality Assessment (Advisory): (based on ISO 8000 Data Quality)
https://store.veracity.com/data-quality-assessment/product/658d3000-51b0-424a-b4a6-0909a22b164a
Data Quality as a Service (Software as a service): (based on ISO 8000 Data Quality)
https://store.veracity.com/data-quality-as-a-service/product/1063bf6f-a4dd-4de1-a072-be50f7c79b81
Data Risk Assessment (Advisory): (based on ISO 31000 Risk Management)
https://store.veracity.com/data-risk-assessment/product/550f312e-5493-4cd4-a818-7af2145653ae
GDPR - Personal Data Risk Evaluation (Advisory) (based on ISO 31000)
https://store.veracity.com/gdpr-personal-data-risk-evaluation/product/81ceca7b-80c1-4c70-a2a4-4d37e59c81ee
Data management maturity self-assessment:
https://store.veracity.com/data-management-maturity-self-assessment/product/f83f117e-0338-4729-9dfe-e7aabbe225b8
SAFER, SMARTER, GREENER
www.dnvgl.com
The trademarks DNV GL®, DNV®, the Horizon Graphic and Det Norske Veritas®
are the properties of companies in the Det Norske Veritas group. All rights reserved.
Data is the new business enabler
Jørgen Stang, Chief Specialist
Per Myrseth, Chief Specialist
Head of Section [email protected]
Jarl S. Magnusson, Chief Specialist
Simen Sandelien, Principal Engineer
Karl John Pedersen, Principal Specialist