dark data

Amir Sedighi February 2017 Dark Data Risks and Opportunities @amirsedighi

Upload: amir-sedighi

Post on 21-Feb-2017


TRANSCRIPT

Page 1: Dark data

Amir Sedighi

February 2017

Dark Data: Risks and Opportunities

@amirsedighi

Page 2: Dark data

Speaker

Amir Sedighi

Software Engineer, Data Solutions Architect, Founder at recommender.ir

twitter: @amirsedighi

Page 3: Dark data

Data Era

By even the most conservative estimates, the amount of data in the world doubles every two years.
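
A quick arithmetic check of the doubling claim, as a minimal sketch. The function name `growth_factor` and its parameters are illustrative and not from the talk; the only input taken from the slide is the two-year doubling period.

```python
def growth_factor(years: float, doubling_period_years: float = 2.0) -> float:
    """Return how many times larger the data volume is after `years`,
    assuming it doubles every `doubling_period_years` (assumption: smooth
    exponential growth between doublings)."""
    return 2.0 ** (years / doubling_period_years)


if __name__ == "__main__":
    for years in (1, 2, 10):
        print(f"after {years:>2} years: x{growth_factor(years):.2f}")
```

Under that assumption, doubling every two years works out to roughly 41% growth per year and a 32-fold increase over a decade.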

Page 8: Dark data

Maybe a Venn diagram helps us!

Big Data

Dark Data

Tabular/Relational/RDBMS Data

(Structured/Unstructured)

(Almost Unstructured)

(Structured)

Almost can’t be processed or analyzed

Page 12: Dark data

Dark Data Definition by Gartner

Gartner defines dark data as the information assets organizations collect, process and store during regular business activities, but generally fail to use for other purposes (for example, analytics, business relationships and direct monetizing). Similar to dark matter in physics, dark data often comprises most organizations’ universe of information assets. Thus, organizations often retain dark data for compliance purposes only. Storing and securing data typically incurs more expense (and sometimes greater risk) than value.

Page 17: Dark data

Dark Data - A More Sensible Definition

Organizations Generate and Gather Data

A large portion of the collected data are never even analyzed!

90% of the data are never analyzed.

• Customer Information
• Log Files
• Previous Employee Information
• Previous Webpages
• Sensor Data
• Email Correspondence
• Account Information
• Notes or Presentations
• Old Versions of Relevant Documents

Page 18: Dark data

80% to 90% is Dark Data

Page 19: Dark data

Does Your Org have any Dark Data?

I am just going to check if we have any dark data in the cellar…

Page 20: Dark data

Bringing Dark Data into Light

1. Gathering

2. Storing/Processing

3. Analyzing and Bringing it into decisions
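
The three steps above can be sketched on a tiny in-memory example. This is only an illustration under stated assumptions: the log format, the `SAMPLE` lines, and the function names `gather`, `process`, and `analyze` are hypothetical and do not come from the talk.

```python
import re
from collections import Counter
from typing import Dict, Iterable, List

# Hypothetical log format: "<date> <time> <LEVEL> <message>"
LOG_LINE = re.compile(
    r"^(?P<date>\S+) (?P<time>\S+) (?P<level>[A-Z]+) (?P<message>.*)$"
)

# Made-up sample input, including a blank line and a line that is not a log entry.
SAMPLE = [
    "2017-02-21 10:00:01 INFO user login",
    "",
    "2017-02-21 10:00:05 ERROR payment timeout",
    "not a log line",
    "2017-02-21 10:00:09 ERROR payment timeout",
]


def gather(raw_lines: Iterable[str]) -> List[str]:
    """Step 1 (gathering): collect raw lines, dropping blank ones."""
    return [line.strip() for line in raw_lines if line.strip()]


def process(lines: Iterable[str]) -> List[Dict[str, str]]:
    """Step 2 (storing/processing): parse lines into structured records,
    skipping any line that does not match the assumed format."""
    records = []
    for line in lines:
        match = LOG_LINE.match(line)
        if match:
            records.append(match.groupdict())
    return records


def analyze(records: Iterable[Dict[str, str]]) -> Counter:
    """Step 3 (analyzing): count records per log level."""
    return Counter(record["level"] for record in records)


if __name__ == "__main__":
    print(analyze(process(gather(SAMPLE))))
```

The point of the sketch is the separation of stages: raw lines stay dark until the second step gives them structure and the third step turns them into something a decision can use.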

Page 27: Dark data

Question

All companies know data is going to provide value.

Why is there so much dark data?

Page 28: Dark data

Why is there so much dark data?

• Lack of insight about data
• Lack of ambition to improve
• Disconnect among departments
• Lopsided priorities
• Lack of technologies to capture and store
• Lack of resources/infrastructure to make it available
• Lack of CPU power and techniques to analyze the data

Page 29: Dark data

The issues you face with Dark Data

• Legal and Regulatory Issues
• Loss of Reputation
• Intelligence Risk
• Operation Costs
• Opportunity Costs

Page 30: Dark data

Some essential questions

• What can we gather?
• What may we extract from it?
• How may we prune it?
• How long should we keep it?
• What are the storage options?
• What are the processing options?
• How much is each block of data worth (approximately)?
• Running limited boundary scenarios
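
For the pruning and retention questions above, a first inventory might look like the following sketch. It assumes that file modification time is an acceptable rough proxy for "not used", and the function name `find_stale_files` and the 365-day threshold are illustrative choices, not recommendations from the talk.

```python
import os
import time
from pathlib import Path
from typing import Iterator, Optional, Tuple


def find_stale_files(
    root: str, max_age_days: float, now: Optional[float] = None
) -> Iterator[Tuple[Path, int, float]]:
    """Yield (path, size_bytes, age_days) for every file under `root`
    whose last modification is older than `max_age_days`."""
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400  # 86400 seconds per day
    for path in Path(root).rglob("*"):
        if path.is_file():
            stat = path.stat()
            if stat.st_mtime < cutoff:
                yield path, stat.st_size, (now - stat.st_mtime) / 86400


if __name__ == "__main__":
    # Assumption: scan the current directory with a one-year threshold.
    total_bytes = 0
    for path, size, age_days in find_stale_files(os.getcwd(), max_age_days=365):
        total_bytes += size
        print(f"{age_days:8.0f} days  {size:>12} bytes  {path}")
    print(f"stale data: {total_bytes} bytes")
```

A report like this does not say what the data is worth, but it gives a size and age per candidate, which is a starting point for deciding what to keep, archive, or delete.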

Page 33: Dark data

Software Tools & Frameworks on DD

Log Management

Page 34: Dark data

Software Tools & Frameworks on DD

Indexing and Search

Page 35: Dark data

Software Tools & Frameworks on DD

Data Streaming

Page 38: Dark data

Software Tools & Frameworks on DD

Machine Learning and Graph Processing

• Mahout
• MLlib
• FlinkML
• Theano
• Torch
• TensorFlow
• GraphX
• Gelly

Page 40: Dark data

A common Pipeline

Machine Learning

Stream Processing

Query

Already Processed Data

Real World RT Events

New Pipeline
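
The diagram itself is not recoverable from the transcript, so the following is only a guess at how two of its labels could fit together: real-time events pass through a stream-processing step, and queries run against the already processed data. The event shape, the tumbling-window aggregation, and the names `stream_process` and `query` are all assumptions made for illustration.

```python
from collections import defaultdict
from typing import DefaultDict, Iterable, Tuple

# Hypothetical event shape: (timestamp in seconds, event key)
Event = Tuple[float, str]
Store = DefaultDict[int, DefaultDict[str, int]]


def stream_process(events: Iterable[Event], window_seconds: int = 60) -> Store:
    """Aggregate events into tumbling windows.

    Returns a store of the form {window_start: {key: count}}."""
    store: Store = defaultdict(lambda: defaultdict(int))
    for timestamp, key in events:
        window_start = int(timestamp // window_seconds) * window_seconds
        store[window_start][key] += 1
    return store


def query(store: Store, key: str) -> int:
    """Query already processed data: total count of `key` across all windows."""
    return sum(window.get(key, 0) for window in store.values())


if __name__ == "__main__":
    events = [(0.5, "click"), (10.0, "view"), (61.0, "click"), (125.0, "click")]
    store = stream_process(events)
    print(sorted(store))          # window start times
    print(query(store, "click"))  # total clicks across windows
```

In a real deployment the in-memory dictionary would be replaced by a streaming framework and a durable store; the sketch only shows the split between processing events as they arrive and querying the aggregated results later.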

Page 41: Dark data

Questions?

Keep in touch: twitter: @amirsedighi

Page 42: Dark data

References

1. http://www.gartner.com/it-glossary/dark-data/

2. http://www.itproportal.com/2016/03/07/5-benefits-of-putting-dark-data-to-work/

3. http://www.kdnuggets.com/2015/11/importance-dark-data-big-data-world.html

4. https://www.youtube.com/watch?v=_fBMmQo-Z4E

5. http://confluent.io

6. https://www.ecmconnection.com/doc/the-various-shades-of-dark-data-0001

7. https://www.datanami.com/2015/11/30/spark-streaming-what-is-it-and-whos-using-it/