![Page 1: Dark data](https://reader031.vdocuments.net/reader031/viewer/2022022200/58ac00e91a28abb6718b58cd/html5/thumbnails/1.jpg)
Amir Sedighi
February 2017
Dark Data: Risks and Opportunities
@amirsedighi
![Page 2: Dark data](https://reader031.vdocuments.net/reader031/viewer/2022022200/58ac00e91a28abb6718b58cd/html5/thumbnails/2.jpg)
Speaker
Amir Sedighi
Software Engineer, Data Solutions Architect, Founder at recommender.ir
twitter: @amirsedighi
![Page 3: Dark data](https://reader031.vdocuments.net/reader031/viewer/2022022200/58ac00e91a28abb6718b58cd/html5/thumbnails/3.jpg)
Data Era
By even the most conservative estimates, the amount of data in the world doubles every two years.
![Page 8: Dark data](https://reader031.vdocuments.net/reader031/viewer/2022022200/58ac00e91a28abb6718b58cd/html5/thumbnails/8.jpg)
Maybe a Venn diagram can help us!
Dark Data
Tabular/Relational/RDBMS Data
(Structured/Unstructured)
(Almost Unstructured)
(Structured)
Big Data
Can hardly be processed or analyzed
![Page 12: Dark data](https://reader031.vdocuments.net/reader031/viewer/2022022200/58ac00e91a28abb6718b58cd/html5/thumbnails/12.jpg)
Dark Data Definition by Gartner
Gartner defines dark data as the information assets organizations collect, process and store during regular business activities, but generally fail to use for other purposes (for example, analytics, business relationships and direct monetizing). Similar to dark matter in physics, dark data often comprises most organizations’ universe of information assets. Thus, organizations often retain dark data for compliance purposes only. Storing and securing data typically incurs more expense (and sometimes greater risk) than value.
![Page 17: Dark data](https://reader031.vdocuments.net/reader031/viewer/2022022200/58ac00e91a28abb6718b58cd/html5/thumbnails/17.jpg)
Dark Data - A More Sensible Definition
Organizations generate and gather data.
A large portion of the collected data is never even analyzed!
90% of the data is never analyzed.
• Customer Information
• Log Files
• Previous Employee Information
• Previous Webpages
• Sensor Data
• Email Correspondences
• Account Information
• Notes or Presentations
• Old Versions of Relevant Documents
![Page 18: Dark data](https://reader031.vdocuments.net/reader031/viewer/2022022200/58ac00e91a28abb6718b58cd/html5/thumbnails/18.jpg)
80% to 90% is Dark Data
![Page 19: Dark data](https://reader031.vdocuments.net/reader031/viewer/2022022200/58ac00e91a28abb6718b58cd/html5/thumbnails/19.jpg)
Does Your Org have any Dark Data?
I am just going to check if we have any dark data in the cellar…
![Page 20: Dark data](https://reader031.vdocuments.net/reader031/viewer/2022022200/58ac00e91a28abb6718b58cd/html5/thumbnails/20.jpg)
Bringing Dark Data into Light
1. Gathering
2. Storing/Processing
3. Analyzing and Bringing it into decisions
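The three steps above can be sketched with a toy example. This is only an illustration and not part of the original talk: it assumes the dark data is a handful of raw log lines, and the log format, the sample lines, and the function names `gather`, `store`, and `analyze` are all hypothetical.

```python
import re
from collections import Counter

# Assumed (hypothetical) log format: "<YYYY-MM-DD> <LEVEL> <service> <message>"
LINE_RE = re.compile(r"^(\d{4}-\d{2}-\d{2}) (INFO|WARN|ERROR) (\S+) (.*)$")


def gather():
    """Step 1, gathering: collect raw lines (a real system would read files or sockets)."""
    return [
        "2017-02-01 ERROR checkout payment gateway timeout",
        "2017-02-01 INFO search query served",
        "2017-02-02 ERROR checkout payment gateway timeout",
        "not a well-formed line",
    ]


def store(lines):
    """Step 2, storing/processing: turn raw lines into structured records, skipping malformed ones."""
    records = []
    for line in lines:
        match = LINE_RE.match(line)
        if match:
            date, level, service, message = match.groups()
            records.append(
                {"date": date, "level": level, "service": service, "message": message}
            )
    return records


def analyze(records):
    """Step 3, analyzing: rank services by error count to support a decision."""
    errors = Counter(r["service"] for r in records if r["level"] == "ERROR")
    return errors.most_common()


records = store(gather())
error_ranking = analyze(records)
print(error_ranking)
```

Even in this small sketch, the decision-relevant fact (which service fails most often) only becomes visible after the raw lines have been given structure.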
![Page 27: Dark data](https://reader031.vdocuments.net/reader031/viewer/2022022200/58ac00e91a28abb6718b58cd/html5/thumbnails/27.jpg)
Question
All companies know data is going to provide value.
Why is there so much dark data?
![Page 28: Dark data](https://reader031.vdocuments.net/reader031/viewer/2022022200/58ac00e91a28abb6718b58cd/html5/thumbnails/28.jpg)
Why is there so much dark data?
• Lack of insight about data
• Lack of ambition to improve
• Disconnect among departments
• Lopsided priorities
• Lack of technologies to capture and store it
• Lack of resources/infrastructure to make it available
• Lack of computing power and techniques to analyze the data
![Page 29: Dark data](https://reader031.vdocuments.net/reader031/viewer/2022022200/58ac00e91a28abb6718b58cd/html5/thumbnails/29.jpg)
The issues you face with Dark Data
• Legal and Regulatory Issues
• Loss of Reputation
• Intelligence Risk
• Operational Costs
• Opportunity Costs
![Page 30: Dark data](https://reader031.vdocuments.net/reader031/viewer/2022022200/58ac00e91a28abb6718b58cd/html5/thumbnails/30.jpg)
Some essential questions
• What can we gather?
• What may we extract from it?
• How may we prune it?
• How long should we keep it?
• What are the storage options?
• What are the processing options?
• What is the approximate value of each block of data?
• Running limited boundary scenarios
![Page 31: Dark data](https://reader031.vdocuments.net/reader031/viewer/2022022200/58ac00e91a28abb6718b58cd/html5/thumbnails/31.jpg)
Software Tools & Frameworks on DD
![Page 33: Dark data](https://reader031.vdocuments.net/reader031/viewer/2022022200/58ac00e91a28abb6718b58cd/html5/thumbnails/33.jpg)
Software Tools & Frameworks on DD
Log Management
![Page 34: Dark data](https://reader031.vdocuments.net/reader031/viewer/2022022200/58ac00e91a28abb6718b58cd/html5/thumbnails/34.jpg)
Software Tools & Frameworks on DD
Indexing and Search
![Page 35: Dark data](https://reader031.vdocuments.net/reader031/viewer/2022022200/58ac00e91a28abb6718b58cd/html5/thumbnails/35.jpg)
Software Tools & Frameworks on DD
Data Streaming
![Page 38: Dark data](https://reader031.vdocuments.net/reader031/viewer/2022022200/58ac00e91a28abb6718b58cd/html5/thumbnails/38.jpg)
Software Tools & Frameworks on DD
Machine Learning and Graph Processing
• Mahout
• MLlib
• FlinkML
• Theano
• Torch
• TensorFlow
• GraphX
• Gelly
![Page 40: Dark data](https://reader031.vdocuments.net/reader031/viewer/2022022200/58ac00e91a28abb6718b58cd/html5/thumbnails/40.jpg)
A Common Pipeline
Machine Learning
Stream Processing
Query
Already Processed Data
Real-World Real-Time Events
New Pipeline
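How the parts of this pipeline might fit together can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the architecture shown on the slide: the `Pipeline` class, the event fields `sensor` and `value`, and the `anomaly_factor` threshold are all hypothetical, and the "machine learning" stage is replaced by a simple threshold rule as a stand-in for a trained model.

```python
class Pipeline:
    """Toy pipeline: real-time events -> stream processing -> processed data -> query."""

    def __init__(self, anomaly_factor=2.0):
        self.count = {}   # already processed data: number of events per sensor
        self.total = {}   # already processed data: sum of values per sensor
        self.alerts = []  # events flagged by the stand-in "ML" rule
        self.anomaly_factor = anomaly_factor

    def mean(self, sensor):
        """Query step: answer from already processed data, not from raw events."""
        n = self.count.get(sensor, 0)
        return self.total[sensor] / n if n else None

    def ingest(self, event):
        """Stream processing step: handle one real-time event as it arrives."""
        sensor, value = event["sensor"], event["value"]
        baseline = self.mean(sensor)
        # Stand-in for the machine learning stage (assumption): flag values
        # far above the running mean instead of calling a trained model.
        if baseline is not None and value > self.anomaly_factor * baseline:
            self.alerts.append(event)
        self.count[sensor] = self.count.get(sensor, 0) + 1
        self.total[sensor] = self.total.get(sensor, 0.0) + value


# Hypothetical real-world events, invented for this sketch.
events = [
    {"sensor": "t1", "value": 20.0},
    {"sensor": "t1", "value": 22.0},
    {"sensor": "t1", "value": 60.0},
    {"sensor": "t2", "value": 5.0},
]

pipeline = Pipeline()
for event in events:
    pipeline.ingest(event)

print(pipeline.mean("t1"))
print(pipeline.alerts)
```

The point of the design is that queries read only the running aggregates, so the raw events never have to be rescanned.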
![Page 41: Dark data](https://reader031.vdocuments.net/reader031/viewer/2022022200/58ac00e91a28abb6718b58cd/html5/thumbnails/41.jpg)
Questions?
Keep in touch: twitter: @amirsedighi
![Page 42: Dark data](https://reader031.vdocuments.net/reader031/viewer/2022022200/58ac00e91a28abb6718b58cd/html5/thumbnails/42.jpg)
References
1. http://www.gartner.com/it-glossary/dark-data/
2. http://www.itproportal.com/2016/03/07/5-benefits-of-putting-dark-data-to-work/
3. http://www.kdnuggets.com/2015/11/importance-dark-data-big-data-world.html
4. https://www.youtube.com/watch?v=_fBMmQo-Z4E
5. http://confluent.io
6. https://www.ecmconnection.com/doc/the-various-shades-of-dark-data-0001
7. https://www.datanami.com/2015/11/30/spark-streaming-what-is-it-and-whos-using-it/