turn gdpr’s accountability principles into an added-value for your business by andy petrella at...


Upload: big-data-spain

Post on 17-Mar-2018


www.kensu.io

DATA SCIENCE GOVERNANCE


Turn GDPR’s accountability principles into an added-value for your business

Big Data Spain, 2017


ANDY PETRELLA

CEO & Founder

Mathematics, Computer Science

KENSU & ME

Started in Belgium by building an enterprise stack for data scientists (Agile Data Science Toolkit)

Pivoted to an internal component: Data Science Catalog

Focus on Data Science Governance

Accelerated by Alchemist Accelerator in San Francisco and The Faktory in Belgium

Kensu Inc. in October!

Spark Notebook O’Reilly Training


TOPICS

1. Some thoughts on “Data Science”

2. Data Science Governance: What

3. Data Science Governance: How

4. GDPR: Accountability principle and transparency

5. Example Governing Spark


SOME THOUGHTS ON “DATA SCIENCE”


MACHINE LEARNING

Pioneers in 1950s

AI Winter in the 1970s due to pessimism

Resurgence in 1980s

Machine learning (and related fields) has been in use since the 1990s (esp. SVMs and RNNs)

Deep learning sees widespread commercial use in the 2000s

Machine learning receives great publicity (read: buzz) in 2010s

ref: https://en.wikipedia.org/wiki/Timeline_of_machine_learning


DATA SCIENCE: +ENGINEERING

Claim: “Data Scientist” coined by DJ Patil in 2008.

Roughly the point where machine learning became part of software

In a way, that is when we added “engineering” to the mix

Engineering is even more prominent with big data and distributed computing


DATA SCIENCE: +EXPERIMENTATION

So much data available

So many tools, libraries, frameworks, …

So many things we can try

We have distributed computing now, right? => Let’s try everything

Discover new insights (and potentially new businesses)


DATA SCIENCE: RECAP

Maths: stats, machine learning and so on

Engineering: ETL, databases, computing frameworks, software, platforms, …

Creativity: “From business intelligence To intelligent business” - Michael Fergusson

Data Science is an umbrella on top of all activities on data


DON’T BELIEVE ME?

https://papers.nips.cc/paper/5656-hidden-technical-debt-in-machine-learning-systems.pdf


DATA SCIENCE GOVERNANCE: WHAT


DATA PIPELINE

A data pipeline connects activities on data, potentially involving several technologies.

A pipeline is generally thought of as an end-to-end processing line that solves one problem.

But parts of pipelines are reused to save computation, storage, time, …

Thus the interdependency between pipeline segments grows with each new initiative.
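To make the reuse point concrete, here is a minimal sketch that lists the segments shared by more than one pipeline. The pipeline and segment names are hypothetical, invented for illustration only.

```python
from collections import defaultdict

# Hypothetical pipelines: each one is an ordered list of segment names.
pipelines = {
    "churn_model": ["load_customers", "clean_customers", "train_churn"],
    "sales_report": ["load_customers", "clean_customers", "aggregate_sales"],
    "fraud_alerts": ["load_payments", "score_fraud"],
}


def shared_segments(pipelines):
    """Map each segment used by more than one pipeline to the pipelines using it."""
    used_by = defaultdict(set)
    for pipeline_name, segments in pipelines.items():
        for segment in segments:
            used_by[segment].add(pipeline_name)
    # Keep only the segments that at least two pipelines depend on.
    return {seg: sorted(names) for seg, names in used_by.items() if len(names) > 1}


shared = shared_segments(pipelines)
```

Every entry in `shared` is a place where a change made for one initiative can silently affect another.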


GOAL: MAKE DECISIONS

Data pipelines, connected together, aren’t created for the beauty of it.

The ultimate goal is always to make decisions.

Decisions are generally made by, or linked to, humans with responsibilities (even for self-driving cars, when something goes wrong).

Given that pipelines are cut-and-wired, interleaved, …

How can we avoid anxiety when deploying the last piece used by the decision maker?


SOURCES OF ANXIETY

What if:

• one of the datasets used in the process suddenly shows different patterns?

• one of the tools, projects or similar is modified upstream?

• the insights deviate from reality?

• …


DEBUGGING?

To reduce the anxiety or, really, the risks, we need ways to debug.

In pure engineering, we have unit, functional, and integration tests, … but

What do we do when the problems come from the data itself?

We can’t generate every case of data variation, right?

How to debug? Without the big picture, we may spend weeks optimising a model for nothing
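One hedged illustration of monitoring the data itself: compare a summary statistic of a new batch against a reference batch. The function name, the threshold of 3 standard deviations, and the sample values are assumptions made for this sketch, not a method from the talk.

```python
from statistics import mean, stdev


def has_drifted(reference, current, threshold=3.0):
    """Flag a batch whose mean lies more than `threshold` reference
    standard deviations away from the reference mean."""
    ref_mean = mean(reference)
    ref_std = stdev(reference)
    if ref_std == 0:
        # A constant reference: any change in the mean counts as drift.
        return mean(current) != ref_mean
    return abs(mean(current) - ref_mean) / ref_std > threshold


# Hypothetical values of one numeric column, per batch.
reference = [10.0, 11.0, 9.5, 10.5, 10.2, 9.8]
similar_batch = [10.1, 9.9, 10.4, 10.0]
shifted_batch = [15.0, 16.2, 15.5, 14.9]
```

A check like this only catches a shift in the mean; real monitoring would track more statistics, but the principle of comparing production data against a known baseline is the same.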


DATA SCIENCE GOVERNANCE

Data governance: controlling that data meets precise standards, which involves monitoring production data.

Data Science Governance: controlling that data activity meets precise standards, which involves monitoring production data activity.

A Data Activity is described by at least: technologies, users, systems, data, and processing
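As a rough sketch, a data activity could be recorded with one field per element listed above. The class, its field names, and the example values are hypothetical and do not describe any actual product schema.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class DataActivity:
    """One record per activity: who did what, with which tool, on which data, where."""
    name: str                 # what was done
    technology: str           # tool or framework used
    user: str                 # who ran it
    system: str               # where it ran
    inputs: Tuple[str, ...]   # data read
    outputs: Tuple[str, ...]  # data produced
    processing: str           # short description of the transformation

    def summary(self) -> str:
        data = ", ".join(self.inputs)
        return f"{self.user} ran {self.name} with {self.technology} on {data} in {self.system}"


activity = DataActivity(
    name="train_churn",
    technology="Spark",
    user="alice",
    system="analytics-cluster",
    inputs=("customers_clean",),
    outputs=("churn_model",),
    processing="model training",
)
```

Keeping inputs and outputs explicit is what later allows activities to be connected into a lineage map.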


GOVERNING DATA SCIENCE

Who does what on which data and where it is done?

What is the impact of a process on the global system?

What are the performance metrics (quality, execution,…) of the processes?


CONTINUOUS INTEGRATION FOR DATA SCIENCE

Data scientists/citizens have a view of all the activities applied to the original sources used in their own processes.

They also have control over their own results in production

They have the opportunity to analyse and debug a pipeline involving all activities:

• independently of the technologies

• involving several people in the enterprise


DATA SCIENCE GOVERNANCE: HOW


CHALLENGES

So many tools are using data!

The number of processing activities is growing rapidly.

We have to take care of the legacy…


GET THE DATA

As usual, we have to collect the right data to make the right decision.

First, run an assessment to create a high-level map of all the tools used in the company.

For each tool, do whatever it takes to collect information about the activities it creates.

This information includes metadata, lineage, statistics, accuracy measures, …


CONNECT THE DATA

Data Science Governance needs the global picture.

To get it, we need to connect all the data that can be collected.

That makes it possible to create a map of all ongoing processes.

This map tracks all data and their descendants


USE THE DATA

This is where the fun part starts… the map of data activities is an amazing source of information

Here are a few things you can think of when using this kind of data:

• impact analysis

• dependency analysis

• optimisation

• recommendation
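As an illustration of the first two uses, the sketch below builds a small lineage map from activity records and walks it in both directions. The record layout, dataset names, and helper functions are hypothetical, chosen only to show the idea.

```python
from collections import defaultdict, deque

# Hypothetical activity records: (activity, input datasets, output datasets).
activities = [
    ("clean_customers", ["customers_raw"], ["customers_clean"]),
    ("train_churn", ["customers_clean"], ["churn_model"]),
    ("score_customers", ["customers_clean", "churn_model"], ["churn_scores"]),
    ("aggregate_sales", ["sales_raw"], ["sales_report"]),
]


def build_edges(activities):
    """Connect every input of an activity to every output it produces."""
    downstream = defaultdict(set)
    upstream = defaultdict(set)
    for _name, inputs, outputs in activities:
        for src in inputs:
            for dst in outputs:
                downstream[src].add(dst)
                upstream[dst].add(src)
    return downstream, upstream


def reachable(start, edges):
    """Breadth-first walk returning every dataset reachable from `start`."""
    seen = set()
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in edges.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


downstream, upstream = build_edges(activities)
impact = reachable("customers_raw", downstream)      # what breaks if this source changes
dependencies = reachable("churn_scores", upstream)   # what this result relies on
```

Impact analysis follows the descendants of a dataset; dependency analysis follows its ancestors. Both are the same traversal over the connected map.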


GDPR


General Data Protection Regulation


ACCOUNTABILITY PRINCIPLE

Implement appropriate technical and organisational measures that ensure and demonstrate that you comply. This may include internal data protection policies such as staff training, internal audits of processing activities, and reviews of internal HR policies.


TRANSPARENCY

As well as your obligation to provide comprehensive, clear and transparent privacy policies, if your organisation has more than 250 employees, you must maintain additional internal records of your processing activities.


ACCOUNTABILITY: DATA SCIENCE GOVERNANCE

To govern data science, we have to:

• collect activities

• connect activities

With this information we can reliably and automatically create the process registry
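A minimal sketch of how collected activities could be rolled up into a registry with one entry per process. The record fields and values are hypothetical, and the output is not an official GDPR record template.

```python
from collections import defaultdict

# Hypothetical activity records collected from different tools.
collected = [
    {"process": "churn_prediction", "activity": "clean_customers",
     "user": "alice", "technology": "Spark", "data": ["customers_raw"]},
    {"process": "churn_prediction", "activity": "train_churn",
     "user": "bob", "technology": "TensorFlow", "data": ["customers_clean"]},
    {"process": "sales_reporting", "activity": "aggregate_sales",
     "user": "carol", "technology": "Oracle", "data": ["sales_raw"]},
]


def build_registry(records):
    """Group activity records by process and summarise who touched which data with what."""
    grouped = defaultdict(
        lambda: {"activities": [], "users": set(), "technologies": set(), "data": set()}
    )
    for record in records:
        entry = grouped[record["process"]]
        entry["activities"].append(record["activity"])
        entry["users"].add(record["user"])
        entry["technologies"].add(record["technology"])
        entry["data"].update(record["data"])
    # Sort everything so the registry is stable from one run to the next.
    return {
        process: {
            "activities": entry["activities"],
            "users": sorted(entry["users"]),
            "technologies": sorted(entry["technologies"]),
            "data": sorted(entry["data"]),
        }
        for process, entry in sorted(grouped.items())
    }


registry = build_registry(collected)
```

Because the registry is derived from the collected activities rather than maintained by hand, it stays in step with what actually runs.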


TRANSPARENCY: DATA SCIENCE GOVERNANCE

To govern data science, seen as a continuous integration solution, we have to explain and measure activities independently of the technologies.

With this information we can reliably create transparent reports of activities across the whole processing chain


CONSEQUENCES


Connect data and business

Spoiler alert: one-liner ahead


DATA TO BUSINESS


Business KPIs are nothing but data!


BUSINESS TO DATA


Change the business to match the data

ADAPT!


KENSU

Taking the idea further


SOLUTION: DATA SCIENCE ON DATA SCIENCE

[Diagram. Labels: Data: Oracle; Activity: TensorFlow; (*) collect activities metadata; performance optimisations; Data Science Governance; Compliance; Performance]


OUR PRODUCT: KENSU DATA ACTIVITY MANAGEMENT


Data Science Governance

First Governance, Compliance and Performance solution for Data science

Feature: Connect. Collect. Learn.
Benefit: Automatically captures all data science relevant activities related to governance, compliance and performance within a given domain.
Why it matters: Provides end-to-end control and insights into all relevant aspects of data science related activities. #GDPR

Feature: DPO Dashboard
Benefit: One-stop control center for all potential data privacy violations.
Why it matters: Near-realtime notifications and actionable intelligence on the current state of “compliance health”. #GDPR

Feature: Compliance Reporting
Benefit: One-click reports for all relevant governance and compliance reports.
Why it matters: Guarantee for a good relationship with the authorities in charge by respecting their templates. #GDPR


DATA SCIENCE GOVERNANCE

Andy Petrella

CEO & Co-Founder

@noootsab

@kensuio