13 June 2017 © COPYRIGHT MARKLOGIC CORPORATION. ALL RIGHTS RESERVED. Effective Audit Trail of Data With PROV-O Scott Henninger, Senior Consultant, MarkLogic


Page 1: Effective Audit Trail of Data With PROV-O€¦ · Title: 2017_MLW_CHI_Effective Audit Trail with PROV-O_Henninger_FINAL_05142017 Created Date: 6/13/2017 4:09:05 PM

13 June 2017© COPYRIGHT MARKLOGIC CORPORATION. ALL RIGHTS RESERVED.

Effective Audit Trail of Data With PROV-O
Scott Henninger, Senior Consultant, MarkLogic

SLIDE: 2

Operationalizing the Metadata
EFFECTIVE AUDIT TRAIL WITH PROV-O

§ Data Governance: Quality management
§ Provenance Dimensions: Technical perspective
§ Provenance Models: Metadata description

SLIDE: 3

Strategy and Execution
DATA GOVERNANCE

§ Shared: Exchange of data between different departments is possible
§ Reliable: Source has competence in the field of interest
§ Accurate: All accounting events are correct in value and description
§ Current: Data is up-to-date for the world it models

[Diagram labels: DATA LINEAGE, RISK MANAGEMENT, REGULATORY COMPLIANCE, ORGANIZATION, PROCESSES, DATA QUALITY, POLICIES]

SLIDE: 4

Information Chain
DATA GOVERNANCE

§ Create: Generate new data entities or update their state
§ Derive: Create value from one or more contributing data entities
§ Analyze: Inspect data to discover new useful information
§ Report: Submit summary data as evidence of events

[Diagram: five numbered lifecycle stages labeled INGEST, PREPARE, TRANSFORM, PUBLISH, DELETE]

SLIDE: 5

Provenance Metadata
DATA GOVERNANCE

§ Origin: Proof of the data ownership during the history of the data
§ Timeline: Recorded timestamps of all events the data experienced
§ Process: Transformations that change the data during its lifecycle

[Diagram labels: UPDATES, RESPONSIBILITY, INFORMATION ORIGIN, REPRODUCIBILITY, EVOLUTION, ACCESS]

Data provenance documents the inputs, entities, systems, and processes that influence data of interest, providing a historical record of the data and its origins.

Data lineage includes the data's origins, what happens to it, and where it moves over time.

SLIDE: 6

“If the benefits of provenance are so well understood, why don’t more firms recognize it as a priority?”

What makes it difficult
§ Location: Data is spread across different systems in different organizational silos
§ Ownership: Lack of mature data governance makes the challenge of data lineage even more daunting
§ Spreadsheets: Business processes run outside of data management processes

Unforeseen costs
§ Compliance risk: The business gets exposed to difficult contract negotiations which can incur additional data costs
§ Redundant data activities: Duplicate controls are performed in different departments several times
§ Accuracy of analytics: Impossible to verify why models result in sub-optimal outcomes

SLIDE: 7

Operationalizing the Metadata
EFFECTIVE AUDIT TRAIL WITH PROV-O

§ Provenance Models: Metadata description
§ Data Governance: Quality management
§ Provenance Dimensions: Technical perspective

SLIDE: 8

Metadata Repository
PROVENANCE MODEL

§ LAZY: Complex technique for reasoning
§ EAGER: Derived directly from output database

[Diagram labels: ETL*, ETL, ETL, TRACING PROVENANCE]

SLIDE: 9

Provenance Storage
PROVENANCE MODEL

§ Envelope Pattern: Provenance stored with data
§ Separate Database: Large provenance payloads stored with reference to data

Envelope pattern:

    <envelope>
      <provenance>
        <sem:triple>
          <sem:subject>/doc/id_a12a3.xml</sem:subject>
          <sem:predicate>http://www.w3.org/ns/prov#wasGeneratedBy</sem:predicate>
          <sem:object>/xform2016-07-20</sem:object>
        </sem:triple>
        <sem:triple>
          <sem:subject>/CanonicalTransform2016-07-20</sem:subject>
          <sem:predicate>http://www.w3.org/ns/prov#endedAtTime</sem:predicate>
          <sem:object datatype="http://www.w3.org/2001/XMLSchema#dateTime">2016-07-20T12:01:42.987</sem:object>
        </sem:triple>
        ...
      </provenance>
      <content>
        <docId>a12a3</docId>
        <workflowStatus>Draft</workflowStatus>
        <version>2.3</version>
        ...
      </content>
    </envelope>

Separate database:

Provenance Database

    <provenance>
      <sem:triple>
        <sem:subject>/doc/id_a12a3.xml</sem:subject>
        <sem:predicate>wasGeneratedBy</sem:predicate>
        <sem:object>/xform2016-07-2</sem:object>
      </sem:triple>
    </provenance>

Content Database

    <content>
      <docId>a12a3</docId>
      <workflowStatus>Draft</workflowStatus>
      <version>2.3</version>
      ...
    </content>

uri: /doc/id_a12a3.xml
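A minimal sketch of the envelope pattern, assuming Python's standard `xml.etree.ElementTree` rather than any MarkLogic API. The helper name `make_envelope` is hypothetical, and the `sem` namespace URI is an assumption (the slide does not declare it). The sketch only shows how content and a provenance triple can travel in one document.

```python
import xml.etree.ElementTree as ET

PROV = "http://www.w3.org/ns/prov#"
SEM = "http://marklogic.com/semantics"  # assumed binding for the sem: prefix


def make_envelope(doc_uri, activity_uri, content_fields):
    """Wrap content and one wasGeneratedBy triple in a single envelope element."""
    ET.register_namespace("sem", SEM)
    envelope = ET.Element("envelope")

    # Provenance section: one subject/predicate/object triple.
    provenance = ET.SubElement(envelope, "provenance")
    triple = ET.SubElement(provenance, f"{{{SEM}}}triple")
    ET.SubElement(triple, f"{{{SEM}}}subject").text = doc_uri
    ET.SubElement(triple, f"{{{SEM}}}predicate").text = PROV + "wasGeneratedBy"
    ET.SubElement(triple, f"{{{SEM}}}object").text = activity_uri

    # Content section: the business data itself.
    content = ET.SubElement(envelope, "content")
    for name, value in content_fields.items():
        ET.SubElement(content, name).text = str(value)
    return envelope


envelope = make_envelope(
    "/doc/id_a12a3.xml",
    "/xform2016-07-20",
    {"docId": "a12a3", "workflowStatus": "Draft", "version": "2.3"},
)
xml_text = ET.tostring(envelope, encoding="unicode")
print(xml_text)
```

For the separate-database variant, the same triple would be written to its own document and would refer back to the content only through the subject URI.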

SLIDE: 10

PROV Data Model
PROVENANCE MODEL

§ Entity: a trade, order, document, or other kind of thing (physical, digital, or conceptual) with some fixed aspects
§ Activity: something that occurs over a period of time and acts upon or with entities, such as creating, consuming, transforming, or modifying them
§ Agent: the business line responsible for an activity taking place or for the existence of an entity

[Diagram: ENTITY wasDerivedFrom ENTITY; ENTITY wasAttributedTo AGENT; ENTITY wasGeneratedBy ACTIVITY; ACTIVITY used ENTITY; ACTIVITY wasAssociatedWith AGENT; ACTIVITY startedAtTime and endedAtTime are xs:dateTime values]

W3C standard, circa 2013: https://www.w3.org/TR/prov-o/
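The three node types and the relations between them can be sketched as a small type check. This is an illustration in plain Python, not a PROV library: `Node`, `relate`, and the instance names are hypothetical, while the domain and range pairs follow the PROV-O definitions of these five properties.

```python
from dataclasses import dataclass

# Domain and range of the relations drawn on the slide, as defined by PROV-O.
RELATIONS = {
    "wasDerivedFrom": ("Entity", "Entity"),
    "wasAttributedTo": ("Entity", "Agent"),
    "wasGeneratedBy": ("Entity", "Activity"),
    "used": ("Activity", "Entity"),
    "wasAssociatedWith": ("Activity", "Agent"),
}


@dataclass(frozen=True)
class Node:
    name: str
    kind: str  # "Entity", "Activity", or "Agent"


def relate(subject, relation, obj):
    """Return a (subject, relation, object) triple after checking node kinds."""
    domain, range_ = RELATIONS[relation]
    if (subject.kind, obj.kind) != (domain, range_):
        raise TypeError(
            f"{relation} links {domain} to {range_}, "
            f"not {subject.kind} to {obj.kind}"
        )
    return (subject.name, relation, obj.name)


# Hypothetical instances, loosely following the trade example.
trade = Node("trade-42", "Entity")
booking = Node("booking-run", "Activity")
desk = Node("rates-desk", "Agent")

triples = [
    relate(trade, "wasGeneratedBy", booking),
    relate(booking, "wasAssociatedWith", desk),
    relate(trade, "wasAttributedTo", desk),
]
```

A call such as `relate(desk, "used", trade)` raises `TypeError`, because `used` runs from an activity to an entity.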

SLIDE: 11

Encoding Specification
PROVENANCE MODEL

PROV-XML
• Types and elements are reusable

    <prov:document
        xmlns:prov="http://www.w3.org/ns/prov#"
        xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
        xmlns:xsd="http://www.w3.org/2001/XMLSchema"
        xmlns:ex="http://example.com/ns/ex#">
      <prov:entity prov:id="ex:e1">
        <prov:type xsi:type="xsd:string">approval</prov:type>
      </prov:entity>
      <prov:activity prov:id="ex:a1">
        <prov:type xsi:type="xsd:QName">Editing</prov:type>
      </prov:activity>
    </prov:document>

PROV-O
• Reason on provenance data
• Specialized properties
• Model-based extensions of the standard

    @prefix prov: <http://www.w3.org/ns/prov#> .
    @prefix xsd:  <http://www.w3.org/2001/XMLSchema#> .
    @prefix :     <http://example.com/> .

    :geneSequencing
        a prov:Activity ;
        prov:startedAtTime "2012-04-25T01:30:00Z"^^xsd:dateTime ;
        prov:used :drosophilaSample-84 ;
        prov:wasAssociatedWith :lab-technician-GH-32 .

    :drosophilaSample-84
        a prov:Entity ;
        prov:wasAttributedTo :lab-technician-FE-56 .

    :lab-technician-GH-32 a prov:Agent .

SLIDE: 12

Operationalizing the Metadata
EFFECTIVE AUDIT TRAIL WITH PROV-O

§ Provenance Dimensions: Technical perspective
§ Provenance Models: Metadata description
§ Data Governance: Quality management

SLIDE: 13

Content, Use, and Management
PROVENANCE DIMENSIONS

▪ Not mutually exclusive
▪ Who endorses the information
▪ How a decision is made
▪ User consumption of provenance
▪ What was considered for that decision

[Diagram labels: CONTENT, MANAGEMENT, USE]

SLIDE: 14

Content ▪ Use ▪ Management
PROVENANCE DIMENSIONS

▪ Not mutually exclusive
▪ Who endorses the information
▪ How a decision is made
▪ User consumption of provenance
▪ What was considered for that decision

[Diagram labels: CONTENT, USE, MANAGEMENT]

Scenario: ... an investment bank is implementing new regulatory reporting defined by the CFTC that will provide more information on its trading activities (extended types of financial products and pieces of data) in a shorter time frame (near-real-time publication), with more complex rules determining who is obliged to deliver the information.

SLIDE: 15

Attribution
PROVENANCE CONTENT

§ What the provenance is about
§ Sources used to create new result
§ Process that yielded the artifact

    @prefix prov: <http://www.w3.org/ns/prov#> .
    @prefix xsd:  <http://www.w3.org/2001/XMLSchema#> .
    @prefix :     <http://example.com#> .

    :TransactionReport a prov:Entity ;
        prov:generatedAtTime "2017-04-12T12:12:12"^^xsd:dateTime ;
        prov:wasDerivedFrom :TransactionA ;
        prov:wasGeneratedBy :ReportGen .

    :ReportGen a prov:Activity ;
        prov:used :TransactionA ;
        prov:used :Venue1 ;
        prov:wasAssociatedWith :Msma .

    :TransactionA a prov:Entity ;
        prov:wasAttributedTo :Murex .

    :Venue1 a prov:Entity .

    :Murex a prov:Agent, prov:SoftwareAgent .

    :Msma a prov:Agent, prov:Organization .
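The three attribution questions can be answered by simple lookups over the statements on this slide. The sketch below is an assumption-laden illustration: it stores the triples as plain Python tuples instead of using a triple store or SPARQL, and `objects` and `describe` are hypothetical helper names.

```python
# The slide's statements as (subject, predicate, object) tuples, prefixes dropped.
TRIPLES = [
    ("TransactionReport", "wasDerivedFrom", "TransactionA"),
    ("TransactionReport", "wasGeneratedBy", "ReportGen"),
    ("ReportGen", "used", "TransactionA"),
    ("ReportGen", "used", "Venue1"),
    ("ReportGen", "wasAssociatedWith", "Msma"),
    ("TransactionA", "wasAttributedTo", "Murex"),
]


def objects(subject, predicate):
    """All objects of the given subject/predicate pair, in insertion order."""
    return [o for s, p, o in TRIPLES if s == subject and p == predicate]


def describe(entity):
    """Answer the three attribution questions for one entity."""
    processes = objects(entity, "wasGeneratedBy")
    sources = [src for proc in processes for src in objects(proc, "used")]
    agents = [a for proc in processes for a in objects(proc, "wasAssociatedWith")]
    return {
        "about": entity,         # what the provenance is about
        "sources": sources,      # sources used to create the result
        "processes": processes,  # process that yielded the artifact
        "agents": agents,        # who was associated with that process
    }


summary = describe("TransactionReport")
print(summary)
```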

SLIDE: 16

Evolution
PROVENANCE CONTENT

§ Amendments are incorporated in the trade
§ Different aspects of the same trade are linked together

[Diagram labels: ORIGINAL VERSION, AMENDED VERSION]

    @prefix prov: <http://www.w3.org/ns/prov#> .
    @prefix xsd:  <http://www.w3.org/2001/XMLSchema#> .
    @prefix :     <http://example.com#> .

    :Transaction1 a prov:Entity .

    :Transaction2 a prov:Entity ;
        prov:wasRevisionOf :Transaction1 .

    :TransactionReport1 a prov:Entity ;
        prov:wasDerivedFrom :Transaction1 .

    :TransactionReport2 a prov:Entity ;
        prov:wasDerivedFrom :Transaction2 .

    :PostTradeReport a prov:Entity ;
        prov:generatedAtTime "2017-04-12T12:12:14"^^xsd:dateTime ;
        prov:wasDerivedFrom :Transaction2 ;
        prov:alternateOf :TransactionReport2 .
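Following `prov:wasRevisionOf` links gives the version history of a trade, and comparing it with `prov:wasDerivedFrom` shows which reports still rest on a superseded version. A minimal sketch, assuming the slide's statements are held in plain dictionaries; `history` and `reports_on_superseded_versions` are hypothetical names.

```python
# newer version -> the version it revises
REVISION_OF = {"Transaction2": "Transaction1"}

# report -> the transaction version it was derived from
DERIVED_FROM = {
    "TransactionReport1": "Transaction1",
    "TransactionReport2": "Transaction2",
    "PostTradeReport": "Transaction2",
}


def history(entity):
    """Return the entity followed by every earlier version, newest first.

    Assumes the revision links contain no cycles.
    """
    chain = [entity]
    while chain[-1] in REVISION_OF:
        chain.append(REVISION_OF[chain[-1]])
    return chain


def reports_on_superseded_versions():
    """Reports derived from a version that has since been revised."""
    superseded = set(REVISION_OF.values())
    return sorted(r for r, src in DERIVED_FROM.items() if src in superseded)


print(history("Transaction2"))
print(reports_on_superseded_versions())
```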

SLIDE: 17

Bitemporal Timelines
PROVENANCE CONTENT

§ Two temporal axes track the business valid time and the system time
  - Valid time: when the event occurred
  - System time: when it was recorded

[Timeline diagram: NOV 19, NOV 20, and NOV 21, with the LAG between valid time and system time]

    { "transaction":
      {
        "system-start": "2014-11-19T11:00:00",
        "valid-start": "2014-11-21T12:00:00",
        "trader": "12XL9A",
        "price": 12
      }
    }

    { "trader":
      {
        "system-start": "2014-11-19T11:00:00",
        "valid-start": "2014-11-20T12:00:00",
        "id": "12XL9A",
        "name": "John"
      }
    }
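The lag between the two axes is just the difference between the two timestamps on each record. A minimal sketch using the values from the slide; the function name `lag` and the sign convention (valid time minus system time) are assumptions made for illustration.

```python
from datetime import datetime, timedelta

transaction = {
    "system-start": "2014-11-19T11:00:00",
    "valid-start": "2014-11-21T12:00:00",
    "trader": "12XL9A",
    "price": 12,
}

trader = {
    "system-start": "2014-11-19T11:00:00",
    "valid-start": "2014-11-20T12:00:00",
    "id": "12XL9A",
    "name": "John",
}


def lag(record):
    """Valid time minus system time for one bitemporal record."""
    recorded = datetime.fromisoformat(record["system-start"])
    occurred = datetime.fromisoformat(record["valid-start"])
    return occurred - recorded


print(lag(transaction))
print(lag(trader))
```

With this convention a positive lag means the record entered the system before its valid time began; a negative lag would mean the event was recorded after the fact.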

SLIDE: 18

Content
PROVENANCE DIMENSIONS

[Provenance graph for the scenario.
 Node labels: Post Trade Report, Transaction Report, Trade v1, Trade v2, Murex, Reporting, Ingest, Feedback, Transaction System, Software Agent.
 Edge labels: wasDerivedFrom, alternateOf, wasRevisionOf, wasAttributedTo, used, generated, wasInfluencedBy, wasInvalidatedBy, generatedAtTime, receivedAtTime, value.
 Timestamps: 2017-04-24T12:12:12, 2017-04-20T10:22:12]

SLIDE: 19

Understanding
PROVENANCE USE

§ ... what was the trading and reference data used to generate this transaction report ...
§ ... why is there a difference between the transaction report and the post trade report ...
§ ... were there any changes in reference data at the time the correction was sent ...

[Diagram labels: DATA STEWARD, DATA QUALITY]

SLIDE: 20

Compliance
PROVENANCE USE

§ ... which department provided the trade data and when was the booking done ...
§ ... what transactions were not reported in the time required, and for what reasons ...
§ ... are there any transactions that should have been reported under new versions of the rules ...
§ ... have traders complied with rules ...

[Diagram labels: COMPLIANCE OFFICER, REGULATORY COMPLIANCE, BUSINESS DEPARTMENT, PENALTY]
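The question about transactions not reported in the time required reduces to comparing two provenance timestamps per trade. The sketch below is illustrative only: the records, the field names `executed` and `reported`, and the 15-minute deadline are all hypothetical and do not come from the slides or from any regulation.

```python
from datetime import datetime, timedelta

# Hypothetical records: when the trade was executed and when its report was generated.
REPORTS = [
    {"id": "T1", "executed": "2017-04-12T12:00:00", "reported": "2017-04-12T12:05:00"},
    {"id": "T2", "executed": "2017-04-12T12:00:00", "reported": "2017-04-12T13:30:00"},
    {"id": "T3", "executed": "2017-04-12T12:10:00", "reported": None},
]


def late_or_missing(reports, deadline):
    """Flag trades reported after the deadline or not reported at all."""
    flagged = []
    for r in reports:
        if r["reported"] is None:
            flagged.append((r["id"], "not reported"))
            continue
        delay = datetime.fromisoformat(r["reported"]) - datetime.fromisoformat(r["executed"])
        if delay > deadline:
            flagged.append((r["id"], "late"))
    return flagged


# The deadline is an assumed value chosen for the example.
flagged = late_or_missing(REPORTS, timedelta(minutes=15))
print(flagged)
```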

SLIDE: 21

Debugging
PROVENANCE USE

§ ... where did an error occur in a specific data field ...
§ ... was the notification for the post trade publisher sent for that specific trade ...
§ ... what version of the data extraction rules was used when the transaction report was created ...
§ ... what percentage of the daily volume consists of reportable transactions ...

[Diagram labels: IT OPERATIONS, OPERATIONAL DATA STORE, APPROVED PUBLICATION ARRANGEMENT]

SLIDE: 22

Trusting Data Sources
PROVENANCE USE

§ Forward-looking provenance
  - Anticipating problems given provenance information from other systems
§ Analysis may find some sources/transformations/ETLs are troublesome
  - ... sometimes in specific contexts, such as high load rates, etc.
§ Look for alternatives when designing future efforts
§ Target troublesome processes for future refactoring efforts
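One way to find troublesome sources is to count, per source, how many of the entities it produced were later invalidated. A minimal sketch over hypothetical summary rows; the source names, the `invalidated` flag, and the function `invalidation_rates` are assumptions made for illustration.

```python
from collections import Counter

# Hypothetical provenance summary: one row per generated entity.
RECORDS = [
    {"source": "etl-a", "invalidated": False},
    {"source": "etl-a", "invalidated": True},
    {"source": "etl-b", "invalidated": False},
    {"source": "etl-b", "invalidated": False},
]


def invalidation_rates(records):
    """Share of each source's entities that were later invalidated."""
    total = Counter(r["source"] for r in records)
    bad = Counter(r["source"] for r in records if r["invalidated"])
    return {source: bad[source] / total[source] for source in total}


rates = invalidation_rates(RECORDS)
worst = max(rates, key=rates.get)  # candidate for refactoring
print(rates, worst)
```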

SLIDE: 23

Publication
PROVENANCE MANAGEMENT

HOW TO CONSUME PROVENANCE?
§ Access
§ Locate
§ Query

[Diagram labels: ACCESS, XML, RESOURCES, PROVENANCE, CONTENT, LINK, SEARCH, BROWSE, rdf - SPARQL, html - HTTP, xml - HTTP, PROVENANCE URI, TARGET URI]
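One way to locate provenance over HTTP is the Link header described in the W3C PROV-AQ note, where the `has_provenance` relation points from a target URI to a provenance URI. Whether the diagram's LINK label refers to this mechanism is an assumption. The sketch below parses such a header with the standard library only; the example URIs and the function name `provenance_links` are hypothetical.

```python
import re

HAS_PROVENANCE = "http://www.w3.org/ns/prov#has_provenance"


def provenance_links(link_header):
    """Extract (provenance URI, target URI or None) pairs from a Link header value."""
    results = []
    # Split on commas that start a new <uri> entry.
    for part in re.split(r",\s*(?=<)", link_header):
        match = re.match(r"\s*<([^>]*)>(.*)", part)
        if not match:
            continue
        uri, params = match.groups()
        attrs = dict(re.findall(r';\s*(\w+)\s*=\s*"([^"]*)"', params))
        if attrs.get("rel") == HAS_PROVENANCE:
            results.append((uri, attrs.get("anchor")))
    return results


header = (
    '<http://example.com/provenance/doc-a12a3>; '
    'rel="http://www.w3.org/ns/prov#has_provenance"; '
    'anchor="http://example.com/doc/id_a12a3.xml"'
)
print(provenance_links(header))
```

The returned provenance URI could then be fetched over HTTP or used as the starting point for a SPARQL query, matching the access paths listed on the slide.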

SLIDE: 24

Dissemination
PROVENANCE MANAGEMENT

§ Security: secure HTTP should be used across unsecured networks; authentication should be enforced
§ Access control: provenance information should follow the same access control rules as the resources
§ Bundle: care is needed to ensure that the integrity of provenance is maintained

[Diagram labels: PRIVACY WALL, HTTPS, Provenance discovery, Provenance of provenance]
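Two of these points can be sketched in a few lines: a digest that detects tampering with a provenance bundle, and a read check that reuses the resource's own access rule. Both functions are hypothetical illustrations using the standard library; the role names are invented, and a production system would rely on its platform's security model instead.

```python
import hashlib
import json


def bundle_digest(triples):
    """SHA-256 over a canonical (sorted) serialization of the bundle."""
    canonical = json.dumps(sorted(triples), separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def can_read_provenance(user_roles, resource_roles):
    """Provenance inherits the read rule of the resource it describes."""
    return bool(set(user_roles) & set(resource_roles))


bundle = [
    ("/doc/id_a12a3.xml", "wasGeneratedBy", "/xform2016-07-20"),
    ("/xform2016-07-20", "wasAssociatedWith", "Msma"),
]
digest = bundle_digest(bundle)

# Reordering the statements does not change the digest; altering one does.
same = bundle_digest(list(reversed(bundle)))
tampered = bundle_digest([bundle[0], ("/xform2016-07-20", "wasAssociatedWith", "Other")])
print(digest == same, digest == tampered)
```

Storing the digest alongside the bundle, together with statements about who produced it, is one simple form of provenance of provenance.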

SLIDE: 25

PROVENANCE DIMENSIONS

§ USE: UNDERSTANDING, DEBUGGING, COMPLIANCE, FORWARD-LOOKING
§ CONTENT: SEMANTIC, XML, DOCUMENT
§ MANAGEMENT: PUBLICATION, ACCESS, QUERY

SLIDE: 26

What makes it easy
DATA LINEAGE

§ Solid compliance architecture
§ ALL of your data and metadata
§ Complete track of data changes
§ Full query composability
§ Security, publishing, monitoring, etc.
§ Fewer tools and processes to manage

SLIDE: 27

Questions?