New Trends and Directions in Data Science - MIT Information Quality Conference - July 19th 2013
DESCRIPTION
Panel I hosted at MIT for the 7th Information Quality Conference in July 2013, with J. Andrew Rogers (SpaceCurve) and Matt Piekarczyk (Cortix Systems)
TRANSCRIPT
![Page 1: New Trends and Directions in Data Science - MIT Information Quality Conference - July 19th 2013](https://reader034.vdocuments.net/reader034/viewer/2022051818/54bd7df84a7959975b8b461f/html5/thumbnails/1.jpg)
The 7th Annual MIT Chief Data Officer & Information Quality Symposium (CDOIQ)
New Trends and Directions in Data Science
Moderator: Mario Faria
July 19th, 2013
July 17, 2012
![Page 2: New Trends and Directions in Data Science - MIT Information Quality Conference - July 19th 2013](https://reader034.vdocuments.net/reader034/viewer/2022051818/54bd7df84a7959975b8b461f/html5/thumbnails/2.jpg)
Panelists
• J. Andrew Rogers (SpaceCurve)
• Matt Piekarczyk (Cortix Systems)
![Page 3: New Trends and Directions in Data Science - MIT Information Quality Conference - July 19th 2013](https://reader034.vdocuments.net/reader034/viewer/2022051818/54bd7df84a7959975b8b461f/html5/thumbnails/3.jpg)
Format
• Mario’s introduction on the subject
• Each panelist will have 20 minutes to present a point of view
• Mario will ask a few questions
• Panelists will debate among each other or answer questions from the audience
![Page 4: New Trends and Directions in Data Science - MIT Information Quality Conference - July 19th 2013](https://reader034.vdocuments.net/reader034/viewer/2022051818/54bd7df84a7959975b8b461f/html5/thumbnails/4.jpg)
Data Science
The process of taking raw data, producing information from data, and using this information to guide actions that will bring financial benefits to business
![Page 5: New Trends and Directions in Data Science - MIT Information Quality Conference - July 19th 2013](https://reader034.vdocuments.net/reader034/viewer/2022051818/54bd7df84a7959975b8b461f/html5/thumbnails/5.jpg)
Quality is mandatory for Data Science to work
![Page 6: New Trends and Directions in Data Science - MIT Information Quality Conference - July 19th 2013](https://reader034.vdocuments.net/reader034/viewer/2022051818/54bd7df84a7959975b8b461f/html5/thumbnails/6.jpg)
Where we stand today
• Fragmented ecosystem
• Overuse of the term “Big Data”
• The “how to compete on analytics” is still hard to achieve
• In the majority of companies, data is still managed with an IT mindset
![Page 7: New Trends and Directions in Data Science - MIT Information Quality Conference - July 19th 2013](https://reader034.vdocuments.net/reader034/viewer/2022051818/54bd7df84a7959975b8b461f/html5/thumbnails/7.jpg)
Mario Faria
The Big Data Fragmented Tech Vendors data life cycle process view
![Page 8: New Trends and Directions in Data Science - MIT Information Quality Conference - July 19th 2013](https://reader034.vdocuments.net/reader034/viewer/2022051818/54bd7df84a7959975b8b461f/html5/thumbnails/8.jpg)
![Page 9: New Trends and Directions in Data Science - MIT Information Quality Conference - July 19th 2013](https://reader034.vdocuments.net/reader034/viewer/2022051818/54bd7df84a7959975b8b461f/html5/thumbnails/9.jpg)
![Page 10: New Trends and Directions in Data Science - MIT Information Quality Conference - July 19th 2013](https://reader034.vdocuments.net/reader034/viewer/2022051818/54bd7df84a7959975b8b461f/html5/thumbnails/10.jpg)
![Page 11: New Trends and Directions in Data Science - MIT Information Quality Conference - July 19th 2013](https://reader034.vdocuments.net/reader034/viewer/2022051818/54bd7df84a7959975b8b461f/html5/thumbnails/11.jpg)
New Trends and Directions in Data Science
J. Andrew Rogers, Founder and CTO
SpaceCurve
![Page 12: New Trends and Directions in Data Science - MIT Information Quality Conference - July 19th 2013](https://reader034.vdocuments.net/reader034/viewer/2022051818/54bd7df84a7959975b8b461f/html5/thumbnails/12.jpg)
www.spacecurve.com
© 2013 SpaceCurve, Inc. All rights reserved.
Five Big Data Trends and Directions In Data Science
J. Andrew Rogers Founder & CTO
July 18, 2013
![Page 13: New Trends and Directions in Data Science - MIT Information Quality Conference - July 19th 2013](https://reader034.vdocuments.net/reader034/viewer/2022051818/54bd7df84a7959975b8b461f/html5/thumbnails/13.jpg)
The Evolution Of Data Science
• 1st Generation
– An organization’s structured data
– Example: OLAP / Data Warehouse
• 2nd Generation
– An organization’s unstructured data
– Example: Hadoop / MapReduce
• 3rd Generation
– Real-time context and actionability of an organization’s data
– Example: SpaceCurve
![Page 14: New Trends and Directions in Data Science - MIT Information Quality Conference - July 19th 2013](https://reader034.vdocuments.net/reader034/viewer/2022051818/54bd7df84a7959975b8b461f/html5/thumbnails/14.jpg)
Capturing and Fusing In-Motion Data
• Monetization of data-in-motion
– Satellites, smartphones, sensors, social media, spatial, radar, …
• Real-time processing and fusing
• Immediate insights from multiple layers of data in motion and historical data at once
• Immersive intelligence with real-time location analysis
![Page 15: New Trends and Directions in Data Science - MIT Information Quality Conference - July 19th 2013](https://reader034.vdocuments.net/reader034/viewer/2022051818/54bd7df84a7959975b8b461f/html5/thumbnails/15.jpg)
Trend #1. Use of diverse data sources for better situational awareness
• Proliferation of inexpensive sensors creates new possibilities
– Imagery and video: satellite, UAV, coincidental
– GPS-tagged entities and entity motion vectors
– Sensor networks, RF, radar
• Many challenges
– Integration and fusion of unrelated data sources
– Domain expertise required to use data effectively
– Standardization of data representation
![Page 16: New Trends and Directions in Data Science - MIT Information Quality Conference - July 19th 2013](https://reader034.vdocuments.net/reader034/viewer/2022051818/54bd7df84a7959975b8b461f/html5/thumbnails/16.jpg)
Trend #2. Leveraging machine-generated data to increase model quality
• Machines continuously make measurements of reality
– Sensor networks, e.g. imaging, radar, GPS tracking, RF, seismic
– Operational sensors on machines, e.g. automotive and aircraft
– Computer network activity and audit logs
• Challenge is extreme data generation rates
– Few big data platforms designed for continuous data ingest
– Computers and sensors are not constrained by human biology
![Page 17: New Trends and Directions in Data Science - MIT Information Quality Conference - July 19th 2013](https://reader034.vdocuments.net/reader034/viewer/2022051818/54bd7df84a7959975b8b461f/html5/thumbnails/17.jpg)
Real-world scenario: Hurricane Sandy
![Page 18: New Trends and Directions in Data Science - MIT Information Quality Conference - July 19th 2013](https://reader034.vdocuments.net/reader034/viewer/2022051818/54bd7df84a7959975b8b461f/html5/thumbnails/18.jpg)
Trend #3. Real-time data ingestion concurrent with analysis (“round-trip real-time”)
• Minimizing latency from new data availability to updated analytic models and actionable intelligence is a multi-faceted advantage
– Leverage highly perishable contextual data before it expires
– Identify operational risks as soon as they manifest in the data
– Continuously evolve models to reflect operational environment
• Challenges for traditional data science platforms
– Moving from batch to on-line or near-line analytical models
– Minimizing data movement in analytical processes
– Scaling out analytic query performance with online updates
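The move from batch to on-line models listed above can be illustrated with a small sketch. It is not SpaceCurve's implementation: it assumes a single numeric stream and uses Welford's running mean and variance, and the names `OnlineStats`, `ingest`, and `is_outlier` are hypothetical. The point is only that each new reading updates the model in constant time, so queries can run while ingestion continues.

```python
import math
from dataclasses import dataclass


@dataclass
class OnlineStats:
    """Running mean and variance, updated one observation at a time (Welford's method)."""

    count: int = 0
    mean: float = 0.0
    _m2: float = 0.0  # running sum of squared deviations from the mean

    def ingest(self, value: float) -> None:
        """Fold one new observation into the model without revisiting old data."""
        self.count += 1
        delta = value - self.mean
        self.mean += delta / self.count
        self._m2 += delta * (value - self.mean)

    @property
    def stddev(self) -> float:
        """Sample standard deviation of everything ingested so far."""
        if self.count < 2:
            return 0.0
        return math.sqrt(self._m2 / (self.count - 1))

    def is_outlier(self, value: float, z: float = 3.0) -> bool:
        """True if value lies more than z standard deviations from the running mean."""
        sd = self.stddev
        return sd > 0 and abs(value - self.mean) > z * sd


# Illustrative readings (made up): query the model, then update it, per record.
stats = OnlineStats()
for reading in [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]:
    flagged = stats.is_outlier(reading)  # analysis runs against the current model
    stats.ingest(reading)                # the model is updated immediately afterwards
```

A batch job would recompute the statistics from the full history on every run; here the history is never re-read, which is one way to keep data movement low.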
![Page 19: New Trends and Directions in Data Science - MIT Information Quality Conference - July 19th 2013](https://reader034.vdocuments.net/reader034/viewer/2022051818/54bd7df84a7959975b8b461f/html5/thumbnails/19.jpg)
Trend #4. Space and time relationships for data fusion and deeper insights
• Space and time are primary keys of reality
– Entities and events can be localized at a point in time
– Robust method for fusing unrelated slow and fast moving data
– Interactions and movement over time can be modeled as graphs
• Powerful and unique analytical capability
– Correlation of data by time and space relationships
– Relationship discovery by analyzing unrelated entity vectors
– Anomaly detection using vector analysis
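Correlating data by time and space relationships can be sketched as a join on proximity rather than on a shared identifier. The example below is an assumption-laden illustration, not the product's method: the `Event` record, the `correlate` function, and the sample events are all hypothetical, distance is the great-circle (haversine) distance, and the pairwise scan is quadratic where a real system would use a spatial index.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    source: str   # which data set the event came from
    entity: str   # identifier local to that data set
    lat: float    # degrees
    lon: float    # degrees
    t: float      # seconds since an arbitrary shared epoch


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in kilometres between two points on a spherical Earth."""
    radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * radius_km * math.asin(math.sqrt(a))


def correlate(left, right, max_km: float, max_seconds: float):
    """Pair events from two unrelated sources that are close in both space and time."""
    pairs = []
    for a in left:
        for b in right:
            close_in_time = abs(a.t - b.t) <= max_seconds
            if close_in_time and haversine_km(a.lat, a.lon, b.lat, b.lon) <= max_km:
                pairs.append((a, b))
    return pairs


# Made-up sample data: one social-media post and three flight positions.
post = Event("social", "user1", 42.36, -71.06, 0.0)
near = Event("flight", "flight-A", 42.37, -71.05, 60.0)        # close in space and time
far = Event("flight", "flight-B", 47.61, -122.33, 30.0)        # close in time only
late = Event("flight", "flight-C", 42.36, -71.06, 7200.0)      # close in space only
matches = correlate([post], [near, far, late], max_km=5.0, max_seconds=300.0)
```

Neither data set needs to know about the other's identifiers; position and timestamp act as the join key.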
![Page 20: New Trends and Directions in Data Science - MIT Information Quality Conference - July 19th 2013](https://reader034.vdocuments.net/reader034/viewer/2022051818/54bd7df84a7959975b8b461f/html5/thumbnails/20.jpg)
Real-world scenario: Correlating entities on social media with flight data
![Page 21: New Trends and Directions in Data Science - MIT Information Quality Conference - July 19th 2013](https://reader034.vdocuments.net/reader034/viewer/2022051818/54bd7df84a7959975b8b461f/html5/thumbnails/21.jpg)
Trend #5. Layering many data sources for data quality and immersive intelligence
• Understanding the full context in which events occur for maximum model fidelity
• Reinforce signal and cancel out noise by overlaying different measurements of the same event
– Fill in incomplete or missing data from single data sources
– Corroborate similar data sources against each other to detect errors and fraud
– Corroborate a fact analytically from dissimilar data sources
– Identify subtle semantic and representation differences across data sets
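Overlaying several measurements of the same event can be sketched with a simple fusion rule. This is one possible approach chosen for illustration, not something the slides specify: it takes the median of the readings that are present (so a missing source is filled in from the others) and flags any source that disagrees with the median by more than a tolerance. The function name `fuse`, the source names, and the values are hypothetical.

```python
from statistics import median
from typing import Dict, List, Optional, Tuple


def fuse(
    readings: Dict[str, Optional[float]], tolerance: float
) -> Tuple[Optional[float], List[str]]:
    """Combine several sources' measurements of one event.

    Returns (fused_value, suspects): the median of the available readings, and
    the names of sources whose reading is further than `tolerance` from it.
    """
    # Drop sources that reported nothing; the fused value stands in for them.
    available = {name: value for name, value in readings.items() if value is not None}
    if not available:
        return None, []
    fused = median(available.values())
    # Sources far from the consensus are candidates for error or fraud review.
    suspects = sorted(
        name for name, value in available.items() if abs(value - fused) > tolerance
    )
    return fused, suspects


# Made-up readings of one event from four sources; "camera" reported nothing.
fused_value, suspect_sources = fuse(
    {"radar": 10.1, "gps": 9.9, "camera": None, "report": 25.0},
    tolerance=1.0,
)
```

The median is used instead of the mean so that a single bad source cannot drag the fused value toward itself.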
![Page 22: New Trends and Directions in Data Science - MIT Information Quality Conference - July 19th 2013](https://reader034.vdocuments.net/reader034/viewer/2022051818/54bd7df84a7959975b8b461f/html5/thumbnails/22.jpg)
New Big Data capabilities needed to meet future market requirements
![Page 23: New Trends and Directions in Data Science - MIT Information Quality Conference - July 19th 2013](https://reader034.vdocuments.net/reader034/viewer/2022051818/54bd7df84a7959975b8b461f/html5/thumbnails/23.jpg)
Delivering immediately actionable intelligence
![Page 24: New Trends and Directions in Data Science - MIT Information Quality Conference - July 19th 2013](https://reader034.vdocuments.net/reader034/viewer/2022051818/54bd7df84a7959975b8b461f/html5/thumbnails/24.jpg)
Thank You!
For More Information, Please Contact:
J. Andrew Rogers
Office: +1 206.453.2236
Email: [email protected]
Twitter: @jandrewrogers
![Page 25: New Trends and Directions in Data Science - MIT Information Quality Conference - July 19th 2013](https://reader034.vdocuments.net/reader034/viewer/2022051818/54bd7df84a7959975b8b461f/html5/thumbnails/25.jpg)
New Trends and Directions in Data Science
Matt Piekarczyk, President
Cortix Systems
![Page 27: New Trends and Directions in Data Science - MIT Information Quality Conference - July 19th 2013](https://reader034.vdocuments.net/reader034/viewer/2022051818/54bd7df84a7959975b8b461f/html5/thumbnails/27.jpg)
![Page 28: New Trends and Directions in Data Science - MIT Information Quality Conference - July 19th 2013](https://reader034.vdocuments.net/reader034/viewer/2022051818/54bd7df84a7959975b8b461f/html5/thumbnails/28.jpg)
![Page 29: New Trends and Directions in Data Science - MIT Information Quality Conference - July 19th 2013](https://reader034.vdocuments.net/reader034/viewer/2022051818/54bd7df84a7959975b8b461f/html5/thumbnails/29.jpg)
17 hrs/week spent gathering and fusing data
![Page 30: New Trends and Directions in Data Science - MIT Information Quality Conference - July 19th 2013](https://reader034.vdocuments.net/reader034/viewer/2022051818/54bd7df84a7959975b8b461f/html5/thumbnails/30.jpg)
• 80% Effort
• 1/3 Cost
• 11% Integrated
![Page 31: New Trends and Directions in Data Science - MIT Information Quality Conference - July 19th 2013](https://reader034.vdocuments.net/reader034/viewer/2022051818/54bd7df84a7959975b8b461f/html5/thumbnails/31.jpg)
Fundamental Law
[Chart: axis ticks 0 to 5 and 1 to 801; scale factor x 100000]
![Page 32: New Trends and Directions in Data Science - MIT Information Quality Conference - July 19th 2013](https://reader034.vdocuments.net/reader034/viewer/2022051818/54bd7df84a7959975b8b461f/html5/thumbnails/32.jpg)
Parse Clean Map Find
Use
![Page 33: New Trends and Directions in Data Science - MIT Information Quality Conference - July 19th 2013](https://reader034.vdocuments.net/reader034/viewer/2022051818/54bd7df84a7959975b8b461f/html5/thumbnails/33.jpg)
![Page 34: New Trends and Directions in Data Science - MIT Information Quality Conference - July 19th 2013](https://reader034.vdocuments.net/reader034/viewer/2022051818/54bd7df84a7959975b8b461f/html5/thumbnails/34.jpg)
![Page 35: New Trends and Directions in Data Science - MIT Information Quality Conference - July 19th 2013](https://reader034.vdocuments.net/reader034/viewer/2022051818/54bd7df84a7959975b8b461f/html5/thumbnails/35.jpg)
![Page 36: New Trends and Directions in Data Science - MIT Information Quality Conference - July 19th 2013](https://reader034.vdocuments.net/reader034/viewer/2022051818/54bd7df84a7959975b8b461f/html5/thumbnails/36.jpg)
![Page 37: New Trends and Directions in Data Science - MIT Information Quality Conference - July 19th 2013](https://reader034.vdocuments.net/reader034/viewer/2022051818/54bd7df84a7959975b8b461f/html5/thumbnails/37.jpg)
![Page 38: New Trends and Directions in Data Science - MIT Information Quality Conference - July 19th 2013](https://reader034.vdocuments.net/reader034/viewer/2022051818/54bd7df84a7959975b8b461f/html5/thumbnails/38.jpg)
There is a better way
![Page 39: New Trends and Directions in Data Science - MIT Information Quality Conference - July 19th 2013](https://reader034.vdocuments.net/reader034/viewer/2022051818/54bd7df84a7959975b8b461f/html5/thumbnails/39.jpg)
Learn Learn Learn Learn
Use Share
![Page 40: New Trends and Directions in Data Science - MIT Information Quality Conference - July 19th 2013](https://reader034.vdocuments.net/reader034/viewer/2022051818/54bd7df84a7959975b8b461f/html5/thumbnails/40.jpg)
Learning solutions
![Page 41: New Trends and Directions in Data Science - MIT Information Quality Conference - July 19th 2013](https://reader034.vdocuments.net/reader034/viewer/2022051818/54bd7df84a7959975b8b461f/html5/thumbnails/41.jpg)
Custom dynamic fused data go
Data is the platform
![Page 42: New Trends and Directions in Data Science - MIT Information Quality Conference - July 19th 2013](https://reader034.vdocuments.net/reader034/viewer/2022051818/54bd7df84a7959975b8b461f/html5/thumbnails/42.jpg)
![Page 43: New Trends and Directions in Data Science - MIT Information Quality Conference - July 19th 2013](https://reader034.vdocuments.net/reader034/viewer/2022051818/54bd7df84a7959975b8b461f/html5/thumbnails/43.jpg)
Cost
Focus
Underpowered High Risk
![Page 44: New Trends and Directions in Data Science - MIT Information Quality Conference - July 19th 2013](https://reader034.vdocuments.net/reader034/viewer/2022051818/54bd7df84a7959975b8b461f/html5/thumbnails/44.jpg)
Cost
Focus
Optimize Resource Allocation and Focus
![Page 45: New Trends and Directions in Data Science - MIT Information Quality Conference - July 19th 2013](https://reader034.vdocuments.net/reader034/viewer/2022051818/54bd7df84a7959975b8b461f/html5/thumbnails/45.jpg)
The Debate
• Mario Faria (Moderator)
• J. Andrew Rogers (SpaceCurve)
• Matt Piekarczyk (Cortix Systems)