#abbyysummit15 (1/10): the future of information, the explosion of unstructured data
TRANSCRIPT
© 2015 Experian Information Solutions, Inc. All rights reserved. Experian and the marks used herein are service marks or registered trademarks of Experian Information Solutions, Inc. Other product and company names mentioned herein are the trademarks of their respective owners. No part of this copyrighted work may be reproduced, modified, or distributed in any form or manner without the prior written permission of Experian.
The Future of Information Kevin Chen Chief Data Scientist Experian DataLabs, North America
People go to
but NOT
Like? _________________
§ 800%+ growth in data volume within next 5 years
§ Amount of unstructured data is growing 62% faster
§ 80% of data will be unstructured data in 2019
Explosion of Unstructured Data
Organizations have little awareness of the volume, composition, risk and business value of their unstructured data (Gartner)
Big Data Analytics
Information is the oil of the 21st century, and analytics is the combustion engine – Peter Sondergaard
Living in a World of Unstructured Data
Structured data:
§ Well-studied
§ Columnar + Relational
§ Interval/categorical/ordinal
Unstructured data:
§ Diverse types, many inputs
§ Text, audio, image, video, metadata, health records, etc.
§ Need to be able to search, compare, understand, and predict
Key Questions:
§ How do we capture and understand unstructured data?
§ How do we represent unstructured data (words, sentences, phrases, concepts, objects) and use it in predictive modeling?
§ What are the applications?
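One common answer to the representation question is a bag-of-words vector: each document becomes a sparse count of its words, and documents can then be compared or fed to a model. The sketch below is a minimal illustration in plain Python, not something shown in the talk; the function names and the sample documents are made up for the example.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def bag_of_words(text):
    """Represent a document as a sparse word-count vector."""
    return Counter(tokenize(text))


def cosine_similarity(a, b):
    """Cosine between two sparse count vectors (0.0 if either is empty)."""
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Made-up sample documents, for illustration only.
docs = [
    "Coffee shop purchase downtown",
    "Downtown coffee and bakery purchase",
    "Airline ticket to Chicago",
]
vectors = [bag_of_words(d) for d in docs]

# Documents sharing words score higher than documents sharing none.
print(cosine_similarity(vectors[0], vectors[1]))
print(cosine_similarity(vectors[0], vectors[2]))
```

Word counts ignore word order and meaning, so this is only a starting point; the same vectors can still serve as input features for a predictive model.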
Challenges and Opportunities
Google, Microsoft, Baidu, Stanford, Berkeley, CMU & UCLA all reported significant progress in automatic image captioning using deep learning in 2014
Machine Learning on Unstructured Data: Automatic Image Captioning
Transforming big data
Experian breathes big data
§ 19 credit information and 13 business information bureaus
§ Credit data on 600 million consumers & 60 million businesses
§ Demographic data on 260+ million households
§ Online behavior data for 25 million users across 5 million websites
Experian DataLabs: Understanding consumer behavior through big data analytics
Debit/Credit Card Transaction Data
Social Media Data
Mobile/Geolocation Information
Business Entity Data
Credit Bureau Data
Online Behavior
Understanding Consumer Transactions Using Machine Learning
Structured & Unstructured Machine Learning Algorithms
Behavior Profiles & Lifestyle Segments
Merchant Name
Merchant Location
Product / SKU Purchased
$ Amount
Transaction Time
Card Present?
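The transaction fields listed on this slide can be turned into a simple behavior profile. The sketch below is an illustration only and not Experian's method: the `Transaction` class, the `CATEGORY_KEYWORDS` table, and the category names are all hypothetical, and a hand-written keyword lookup stands in for the machine learning algorithms the slide mentions.

```python
import re
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Transaction:
    """Hypothetical record holding the fields named on the slide."""
    merchant_name: str
    merchant_location: str
    product: str
    amount: float
    time: datetime
    card_present: bool


# Hypothetical keyword-to-category table; a real system would learn this.
CATEGORY_KEYWORDS = {
    "coffee": "dining",
    "grill": "dining",
    "airlines": "travel",
    "hotel": "travel",
    "market": "grocery",
}


def categorize(merchant_name):
    """Map an unstructured merchant string to a category by whole-word match."""
    for word in re.findall(r"[a-z]+", merchant_name.lower()):
        if word in CATEGORY_KEYWORDS:
            return CATEGORY_KEYWORDS[word]
    return "other"


def spend_profile(transactions):
    """Share of total spend per category, as fractions that sum to 1."""
    totals = defaultdict(float)
    for t in transactions:
        totals[categorize(t.merchant_name)] += t.amount
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}
    return {cat: amt / grand_total for cat, amt in totals.items()}


# Made-up transactions, for illustration only.
when = datetime(2015, 3, 1, 8, 30)
transactions = [
    Transaction("CITY COFFEE #12", "Austin, TX", "latte", 5.0, when, True),
    Transaction("SKYWAY AIRLINES", "online", "ticket", 300.0, when, False),
    Transaction("CORNER MARKET", "Austin, TX", "groceries", 95.0, when, True),
]
profile = spend_profile(transactions)
print(profile)
```

The resulting shares per category are the kind of structured feature that a segmentation or scoring model could consume alongside conventional bureau data.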
Applications
Opportunity
§ Offer card best reflecting customer spend choices
Targeting the right card for a customer based on lifestyle segmentation promotes spend and deeper customer loyalty
Current card
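As a rough illustration of matching a card to a customer's spend choices, the sketch below scores each card by the reward rate it would earn on a customer's spend profile and picks the highest. This is an assumption made for the example, not the approach described in the talk: the card names, reward rates, `DEFAULT_RATE`, and the scoring rule are all hypothetical.

```python
# Hypothetical card catalog: reward rate per spend category.
CARDS = {
    "travel_rewards": {"travel": 0.03, "dining": 0.02},
    "cash_back_grocery": {"grocery": 0.04},
    "flat_rate": {},
}

# Hypothetical rate applied to any category a card does not list.
DEFAULT_RATE = 0.01


def expected_reward_rate(card_rates, spend_profile):
    """Blended reward rate for a profile of spend shares per category."""
    return sum(
        share * card_rates.get(category, DEFAULT_RATE)
        for category, share in spend_profile.items()
    )


def best_card(spend_profile, cards=CARDS):
    """Name of the card with the highest blended reward rate."""
    return max(
        cards,
        key=lambda name: expected_reward_rate(cards[name], spend_profile),
    )


# Made-up profile: fractions of total spend per category.
profile = {"travel": 0.75, "grocery": 0.2, "dining": 0.05}
print(best_card(profile))
```

A production system would likely weigh more than rewards, for example credit risk and eligibility, so the scoring function here is only a placeholder.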
Future of Hadoop - Spark
From Hadoop Stack to Spark Stack
• Comprehensive support for ETL, SQL, Machine Learning, Graphs, and Streaming
• Faster with in-memory calculation
• Easier to use with flexible APIs such as join, union, intersection, etc.
• Tight integration with Python and R
• Extensive machine-learning/data-mining libraries
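To make the three operations named above concrete, the sketch below shows what join, union, and intersection compute on small in-memory datasets. It is plain Python and does not use Spark; it is meant to mirror the usual behavior of the same-named dataset operations (union keeps duplicates, intersection returns distinct common elements, join pairs values by key), and the sample data is made up.

```python
from collections import defaultdict


def union(left, right):
    """Concatenate two datasets; duplicates are kept."""
    return list(left) + list(right)


def intersection(left, right):
    """Distinct elements present in both datasets, sorted for stable output."""
    return sorted(set(left) & set(right))


def join(left, right):
    """Inner join of (key, value) pairs: one output per matching value pair."""
    right_by_key = defaultdict(list)
    for key, value in right:
        right_by_key[key].append(value)
    return [
        (key, (left_value, right_value))
        for key, left_value in left
        for right_value in right_by_key.get(key, [])
    ]


# Made-up data: spend per customer and a segment label per customer.
card_spend = [("alice", 120.0), ("bob", 40.0)]
segments = [("alice", "traveler"), ("carol", "foodie")]

print(join(card_spend, segments))
print(union([1, 2], [2, 3]))
print(intersection([1, 2, 2], [2, 3]))
```

In a distributed engine the same operations run across partitions of data that may not fit on one machine, which is where the in-memory execution mentioned above matters.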
The Future of Information Is Here
Data Scientists
Structured & Unstructured Data
Big Data Analytics Platform
Machine Learning