big data, little data, whatever
Big Data, little data, whatever… Making the world a little smarter
Matt Denesuk
Manager, Natural Resources Modeling and Social Analytics, IBM Research
Partner, IBM Venture Capital Group
Launch of SPE Technical Section, Petroleum
Data-Driven Analytics (PD2A), October 8, 2012
3 big things
• Physical-meets-Digital
• Data-driven approach
• Heterogeneity & integration (data & approaches)
Physical-meets-digital is driving highly physical industries toward being more about moving & manipulating data.
3 key things:
• INSTRUMENTED: meters, sensors, actuators, IP enablement, ...
• + INTERCONNECTED: transmitters, networks, taxonomies, ...
• + INTELLIGENT: reporting, visualization, predictive analytics & modeling, decision mgmt, closed-loop automation, ...
= Physical-meets-Digital, Smarter Planet, Cyber-physical systems, …
Heavy, physical industries are increasingly infusing their operations with information technology, and this will result in higher growth & productivity trajectories.
[Charts: IT Spending / Revenue (%) vs. Operating Margin (%); panels labeled 2009 – 2010 and 2009]
A 0.5pt increase in IT spend ratio would drive $31B in incremental IT spend.
Industries where value is generated by moving and manipulating data have high IT-spend ratios (and high productivity growth).
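The statement that a 0.5pt increase in IT spend ratio would drive $31B in incremental spend implies a revenue base, since incremental spend is the ratio change times revenue. A minimal sketch of that arithmetic follows. It assumes both figures refer to the same revenue base, and the function name is illustrative, not from the talk.

```python
def implied_revenue_base(incremental_spend_usd, ratio_increase_pts):
    """Revenue base implied by: incremental spend = ratio increase x revenue."""
    ratio_increase = ratio_increase_pts / 100.0  # percentage points -> fraction
    return incremental_spend_usd / ratio_increase

# Figures quoted on the slide: $31B incremental spend for a 0.5pt ratio increase.
revenue = implied_revenue_base(31e9, 0.5)
print(f"Implied revenue base: ${revenue / 1e12:.1f} trillion")
```

Under that assumption the implied base is on the order of several trillion dollars of revenue.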
Data-driven approach
How big the data are is just one factor…
[Chart: Analytical &/or Data Complexity vs. Data Size, positioning Watson, Computer Chess, Search Engines, Statistical Translation, and Customer Churn]
But bigger data sets let us use a whole new set of “dumb” tools that can deliver high value with remarkable speed.
Example: Google & Statistical Translation
Regular Science approach
Use of language is infinitely complex, but you can teach a computer all the rules and content.
• Employ language experts to codify rules, exceptions, vocabulary mappings, etc.
• Apply transformation to user’s query.
• Costly, hard to scale
• Can translate nearly any statement (but accuracy variable)
• In theory, could be better than human.

Statistical (data-driven) approach
People say the same kind of things over and over. And somebody has already translated it.
• Gather and classify lots of translated docs (websites, UN, books, …)
• Identify & match patterns
• Map to user’s translation query.
• Incrementally low cost, highly scalable.
• Limited in scope to digitized docs that have been translated before
• Limited by skill of human translators
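The statistical approach above (gather translated material, match patterns, map to the user's query) can be sketched as a toy phrase lookup. This is only an illustration: the corpus is made up, the function names are hypothetical, and production systems rely on alignment and language models rather than exact phrase matching.

```python
from collections import Counter, defaultdict

# Tiny hypothetical parallel corpus: (source phrase, human translation) pairs.
PARALLEL_CORPUS = [
    ("good morning", "bonjour"),
    ("good morning", "bonjour"),
    ("good morning", "bon matin"),
    ("thank you", "merci"),
    ("see you soon", "à bientôt"),
]

def build_phrase_table(pairs):
    """Count how often each source phrase was translated each way."""
    table = defaultdict(Counter)
    for source, target in pairs:
        table[source.lower()][target] += 1
    return table

def translate(phrase, table):
    """Return the most frequent prior translation, or None if never seen."""
    candidates = table.get(phrase.lower())
    if not candidates:
        return None  # scope is limited to what has been translated before
    return candidates.most_common(1)[0][0]

table = build_phrase_table(PARALLEL_CORPUS)
print(translate("Good morning", table))
print(translate("quantum chromodynamics", table))
```

The second lookup fails by design, which mirrors the scope limitation listed above: nothing can be returned for text that nobody has translated before.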
Heterogeneity & Integration
Two ways of seeing a data set (and the world)
Computer Scientist – “get the knowledge locked in the data”
• The data set is a record of everything that happened, e.g.,
– All customer transactions last month
– All friendship links between members of a social networking site
• Goal is to find interesting patterns, rules, and/or associations.
Regular Scientist – “get the knowledge”
• The data set is a partial, and often very noisy, reflection of some underlying phenomenon, e.g.,
– Emission spectra from stars
– Battery voltage varying with current, time, and temperature
• Goal is better understanding or ability to predict, often through a mathematical model
(See D. Lambert, or R. Mahoney, e.g.)
But the approaches & skill sets can be joined…
Examples of hybrid, integrated approaches
Computer Chess
• Simple, well-defined rules, but computationally impossible to solve (today)
• Relies on position evaluation function.
– Use human-derived chess theory to set up initially.
– But tune by comparing to the best games humans have played.
• Better than any human (1997)
• Issues
– Saturation, fatigue, psychology, …
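The evaluation-function idea for computer chess (start from human chess theory, then tune against expert games) can be sketched as follows. This is a minimal illustration and not the method of any actual chess engine: the starting weights are the conventional piece values, the single training example is invented, and the perceptron-style update is an assumed stand-in for whatever tuning was really used.

```python
# Conventional piece values serve as the human-derived starting point.
INITIAL_WEIGHTS = {"pawn": 1.0, "knight": 3.0, "bishop": 3.0, "rook": 5.0, "queen": 9.0}

def evaluate(features, weights):
    """Linear position score: weighted sum of material differences (own minus opponent)."""
    return sum(weights[name] * value for name, value in features.items())

def tune(weights, examples, learning_rate=0.1, epochs=20):
    """Perceptron-style tuning: nudge weights until the expert's choice scores higher."""
    weights = dict(weights)  # leave the seed weights untouched
    for _ in range(epochs):
        for expert_pos, rejected_pos in examples:
            if evaluate(expert_pos, weights) <= evaluate(rejected_pos, weights):
                for name in weights:
                    diff = expert_pos.get(name, 0) - rejected_pos.get(name, 0)
                    weights[name] += learning_rate * diff
    return weights

# Hypothetical example: the expert kept a bishop where a knight was the alternative.
expert_choice = {"bishop": 1, "knight": 0}
alternative = {"bishop": 0, "knight": 1}

tuned = tune(INITIAL_WEIGHTS, [(expert_choice, alternative)])
print(tuned["bishop"], tuned["knight"])
```

The seed weights rate the two positions equally; after tuning, the position the expert chose scores higher, which is the hybrid idea in miniature.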
Buzz & the CMO
• People’s opinions reflected in many digitized forms
• Articles, blogs, social media, playlists, …
• “Big Data” search & transform capabilities can generate buzz metrics (“ink”, sentiment, category, …)
• But what to do with them? → Need to apply traditional, small-data modeling approaches.
• Examples
– Pre-launch promotion management for albums
– Movie trailer management
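The Buzz & the CMO point is that the big-data step only yields metrics, and a small-data model still has to turn them into a decision. A minimal sketch of that second step is an ordinary least-squares line relating a buzz score to an outcome. The data below are synthetic and chosen only for illustration; the variable names are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

# Synthetic data: pre-launch buzz score vs. first-week sales (thousands of units).
buzz_scores = [10, 20, 30, 40, 50]
first_week_sales = [25, 45, 65, 85, 105]

intercept, slope = fit_line(buzz_scores, first_week_sales)
forecast = intercept + slope * 35  # forecast for a hypothetical buzz score of 35
print(f"intercept={intercept:.1f}, slope={slope:.1f}, forecast={forecast:.1f}")
```

Only five data points are involved here, which is the point: the heavy lifting on the buzz side is large-scale, while the decision model is small.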
Hybrid example: “equipment health” models driving operational optimization
Oil & Gas Scenario
• Gas compressor showing signs of trouble 3 months before a scheduled turnaround.
• The system indicates that lowering pressure by 20% will extend health enough to make it to turnaround.
– But then production levels will not be sufficient to fulfill scheduled shipment.
• The system identifies that another platform can be run for 30 days at 115% throughput without significant risk before its next scheduled turnaround.
• Coordinated actions taken, and $40M production loss avoided.
Trying to combine 3 different kinds of modeling
• Data-driven / Machine-learning
– Early days, often not enough data
– Bias → limited region of parameter space explored (by management design)
• Knowledge-based
– Rule capture, experience
– Initial use to generate hypotheses for other approaches.
• Physics-based
– Difficult to scale
– Use for seed models
– Locked-up in OEMs?
Also simulation, for what-if analyses, and verification. See Peng et al.
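One way to combine a physics-based seed model with a data-driven correction is to fit only the residual that the physics leaves unexplained. The sketch below uses battery voltage as the example; the seed parameters, the observations, and the choice of a linear temperature correction are all assumptions made for illustration.

```python
def physics_seed_voltage(current_a, open_circuit_v=12.6, resistance_ohm=0.05):
    """Seed model: terminal voltage from Ohm's law (parameters are hypothetical)."""
    return open_circuit_v - current_a * resistance_ohm

def fit_residual(temps_c, residuals):
    """Least-squares line through (temperature, residual) pairs."""
    n = len(temps_c)
    mean_t = sum(temps_c) / n
    mean_r = sum(residuals) / n
    stt = sum((t - mean_t) ** 2 for t in temps_c)
    str_ = sum((t - mean_t) * (r - mean_r) for t, r in zip(temps_c, residuals))
    slope = str_ / stt
    return mean_r - slope * mean_t, slope

# Synthetic observations: (current in A, temperature in C, measured voltage in V).
observations = [(10, 0, 11.9), (10, 20, 12.1), (20, 0, 11.4), (20, 40, 11.8)]

# Data-driven step: learn what the physics seed model misses.
residuals = [v - physics_seed_voltage(i) for i, _, v in observations]
intercept, slope = fit_residual([t for _, t, _ in observations], residuals)

def hybrid_voltage(current_a, temp_c):
    """Physics seed plus the learned temperature correction."""
    return physics_seed_voltage(current_a) + intercept + slope * temp_c

print(f"seed only: {physics_seed_voltage(20):.2f} V, hybrid: {hybrid_voltage(20, 40):.2f} V")
```

Because the data only have to explain the residual, far fewer observations are needed than for a purely data-driven model, which speaks to the "often not enough data" concern.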
Example: Condition-based Management
Example process:
• Inputs: multiple sensor data streams, environmental data, text data, image data, outcomes
• Higher-order “Events” & measures
• Probabilistic Models / Rule Mining, Physical Models
• Actionable Rules, measures, & options
• Management system
– Maintenance optimization
– Use / output optimization
– Energy / comfort / safety balancing
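The condition-based management flow (sensor streams to higher-order events, then rule mining, then actionable rules) can be sketched with a single sensor. Everything below is illustrative: the readings and failure log are synthetic, and the threshold, horizon, and confidence cut-off are arbitrary assumptions.

```python
def detect_events(readings, threshold):
    """Higher-order events: time steps where the reading exceeds a threshold."""
    return [i for i, value in enumerate(readings) if value > threshold]

def rule_confidence(event_times, failure_times, horizon):
    """Fraction of events followed by a failure within `horizon` time steps."""
    if not event_times:
        return 0.0
    hits = sum(
        any(0 < f - e <= horizon for f in failure_times) for e in event_times
    )
    return hits / len(event_times)

# Synthetic vibration readings and failure log (time-step indices).
vibration = [0.2, 0.3, 0.9, 0.4, 0.3, 1.1, 0.2, 0.3, 0.95, 0.3]
failures = [4, 7]

events = detect_events(vibration, threshold=0.8)
confidence = rule_confidence(events, failures, horizon=2)

# Actionable rule: only surface it to the management system if it is reliable enough.
if confidence >= 0.6:
    print(f"Rule: inspect within 2 steps of high vibration (confidence {confidence:.2f})")
```

In a fuller system the mined rule would feed maintenance or output optimization rather than a print statement.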
Broad range of applications.
Categories: Heavy Infrastructure; Business Equipment / Consumer Products; Human Health?
• Bridges
• Water Infrastructure
• Railroads
• Aircraft
• Mining Equipment
• Oil Pipelines
• Oil Platforms
• Steel manufacture
• Trucking
• Mobile Computers
• IT Infrastructure
• Home Appliances
• Buildings (HVAC, Elevators, Lighting, …)
• Photocopiers
• Refrigeration
Business value requires both Modeling and Process Integration
• Many organizations are not used to making data-driven decisions.
– Culturally
– Process-wise
• Mathematical proof of business value not initially compelling
• Example: CbM & false positives.
• Initial deployment very risky!
[Chart: Process Integration vs. Modeling & Analytics, showing capability & value growth. Stages: Models developed & tested; 1. Integration pilot & evaluation; 2. Deploy/scale]
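The CbM false-positive example can be made concrete with base rates: when real failures are rare, even a good model raises mostly false alarms, which makes the business case hard to prove at first. The sketch below applies Bayes' rule; the rates are hypothetical and chosen only to illustrate the effect.

```python
def alarm_precision(failure_rate, sensitivity, false_positive_rate):
    """Probability that an alarm corresponds to a real impending failure (Bayes' rule)."""
    true_alarms = failure_rate * sensitivity
    false_alarms = (1.0 - failure_rate) * false_positive_rate
    return true_alarms / (true_alarms + false_alarms)

# Hypothetical numbers: 1% of monitored assets are actually failing,
# the model catches 90% of those, and it falsely flags 5% of healthy assets.
precision = alarm_precision(failure_rate=0.01, sensitivity=0.90, false_positive_rate=0.05)
print(f"Share of alarms that are real: {precision:.1%}")
```

With these assumed rates, most alarms send a crew to healthy equipment, so early deployments can look costly even when the model is working as designed.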
Key points
• Physical-meets-Digital is happening
• This makes data-driven approaches much more important
• But most real problems require integration of very different approaches and data types
– Not easy to build these teams
• The realities of current culture & process must be addressed early.