big data, little data, whatever

Big Data, little data, whatever… Making the world a little smarter. Matt Denesuk, Manager, Natural Resources Modeling and Social Analytics, IBM Research; Partner, IBM Venture Capital Group. Launch of SPE Technical Section, Petroleum Data-Driven Analytics (PD2A), October 8, 2012.

Upload: denesuk

Post on 14-Jul-2015



TRANSCRIPT

Page 1: Big data, little data, whatever

Big Data, little data, whatever… Making the world a little smarter

Matt Denesuk

Manager, Natural Resources Modeling and Social Analytics, IBM Research

Partner, IBM Venture Capital Group

Launch of SPE Technical Section, Petroleum Data-Driven Analytics (PD2A), October 8, 2012

Page 2: Big data, little data, whatever

3 big things

• Physical-meets-Digital

• Data-driven approach

• Heterogeneity & integration (data & approaches)

Page 3: Big data, little data, whatever

Physical-meets-digital is driving highly physical industries toward being more about moving & manipulating data.

3 key things:

INSTRUMENTED: meters, sensors, actuators, IP enablement, ...

+

INTERCONNECTED: transmitters, networks, taxonomies, ...

+

INTELLIGENT: reporting, visualization, predictive analytics & modeling, decision management, closed-loop automation, ...

=

Physical-meets-Digital, Smarter Planet, Cyber-physical systems, …

Page 4: Big data, little data, whatever

Heavy, physical industries are increasingly infusing their operations with information technology, and this will result in higher growth & productivity trajectories.

[Chart: IT Spending / Revenue (%) vs. Operating Margin (%); panels labeled 2009 – 2010 and 2009]

A 0.5pt increase in IT spend ratio would drive $31B in incremental IT spend.

Industries where value is generated by moving and manipulating data have high IT-spend ratios (and high productivity growth)
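The $31B figure implies a revenue base that the slide does not state. A minimal sketch of that back-calculation, assuming the 0.5pt increase applies uniformly to aggregate revenue (the function name is illustrative, not from the talk):

```python
def implied_revenue_base(incremental_spend: float, ratio_increase_pts: float) -> float:
    """Revenue base implied by an increase in the IT-spend ratio.

    incremental_spend: extra IT spend in dollars.
    ratio_increase_pts: increase in IT spend / revenue, in percentage points.
    """
    return incremental_spend / (ratio_increase_pts / 100.0)


# Figures from the slide: $31B of incremental spend for a 0.5pt increase.
base = implied_revenue_base(31e9, 0.5)
print(f"Implied revenue base: ${base / 1e12:.1f} trillion")
```

Under that assumption the implied aggregate revenue is roughly $6.2 trillion.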

Page 5: Big data, little data, whatever

Data-driven approach

Page 6: Big data, little data, whatever

How big the data are is just one factor…

[Chart: Analytical &/or Data Complexity vs. Data Size. Plotted examples: Watson, Computer Chess, Search Engines, Statistical Translation, Customer Churn]

But bigger data sets let us use a whole new set of “dumb” tools that can deliver high value, with remarkable speed.

Page 7: Big data, little data, whatever

Example: Google & Statistical Translation

Regular Science approach

Use of language is infinitely complex, but you can teach a computer all the rules and content.

• Employ language experts to codify rules, exceptions, vocabulary mappings, etc.
• Apply transformation to user’s query.
• Costly, hard to scale
• Can translate nearly any statement (but accuracy variable)
• In theory, could be better than human.

Statistical (data-driven) approach

People say the same kind of things over and over. And somebody has already translated it.

• Gather and classify lots of translated docs (websites, UN, books, …)
• Identify & match patterns
• Map to user’s translation query.
• Incrementally low cost, highly scalable.
• Limited in scope to digitized docs that have been translated before
• Limited by skill of human translators
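The statistical approach can be illustrated with a toy phrase table. This is only a sketch under stated assumptions: the parallel corpus is hand-made for illustration, the function names are hypothetical, and real systems use far richer alignment and language models than a greedy longest-match lookup.

```python
from collections import Counter, defaultdict

# Hypothetical, hand-made parallel corpus of (source phrase, human translation).
# A real system would mine a very large number of such pairs.
PARALLEL_CORPUS = [
    ("good morning", "bonjour"),
    ("good morning", "bonjour"),
    ("good morning", "bon matin"),
    ("thank you", "merci"),
    ("see you soon", "à bientôt"),
]


def build_phrase_table(pairs):
    """Count how often each source phrase was translated to each target phrase."""
    table = defaultdict(Counter)
    for source, target in pairs:
        table[source.lower()][target] += 1
    return table


def translate(query, table):
    """Greedily match the longest known phrase at each position in the query.

    Words never seen in the corpus pass through unchanged, which mirrors the
    limitation on the slide: only previously translated text is covered.
    """
    words = query.lower().split()
    output = []
    i = 0
    while i < len(words):
        for j in range(len(words), i, -1):  # try the longest span first
            phrase = " ".join(words[i:j])
            if phrase in table:
                # Pick the translation that humans used most often.
                output.append(table[phrase].most_common(1)[0][0])
                i = j
                break
        else:
            output.append(words[i])  # unseen word: no translation available
            i += 1
    return " ".join(output)


table = build_phrase_table(PARALLEL_CORPUS)
print(translate("Good morning thank you", table))
```

Note that no grammar rules appear anywhere: quality depends entirely on what the corpus already contains and on how well the human translators did.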

Page 8: Big data, little data, whatever

Heterogeneity & Integration

Page 9: Big data, little data, whatever

Two ways of seeing a data set (and the world)

Computer Scientist – “get the knowledge locked in the data”

• The data set is a record of everything that happened, e.g.,
– All customer transactions last month
– All friendship links between members of a social networking site
• Goal is to find interesting patterns, rules, and/or associations.

Regular Scientist – “get the knowledge”

• The data set is a partial, and often very noisy, reflection of some underlying phenomenon, e.g.,
– Emission spectra from stars
– Battery voltage varying with current, time, and temperature
• Goal is better understanding or ability to predict, often through a mathematical model

(See D. Lambert, or R. Mahoney, e.g.)

But the approaches & skill sets can be joined…

Page 10: Big data, little data, whatever

Examples of hybrid, integrated approaches

Computer Chess

• Simple, well-defined rules, but computationally impossible to solve (today)
• Relies on position evaluation function.
– Use human-derived chess theory to set up initially.
– But tune by comparing to the best games humans have played.
• Better than any human (1997)
• Issues
– Saturation, fatigue, psychology, …

Buzz & the CMO

• People’s opinions reflected in many digitized forms
– Articles, blogs, social media, playlists, …
• “Big Data” search & transform capabilities can generate buzz metrics (“ink”, sentiment, category, …)
• BUT WHAT TO DO WITH THEM? → Need to apply traditional, small-data modeling approaches.
• Examples
– Pre-launch promotion management for albums
– Movie trailer management

Page 11: Big data, little data, whatever

Hybrid example: “equipment health” models driving operational optimization

Oil & Gas Scenario

• Gas compressor showing signs of trouble 3 months before a scheduled turnaround.
• The system indicates that lowering pressure by 20% will extend health enough to make it to turnaround.
– But then production levels will not be sufficient to fulfill scheduled shipment.
• The system identifies that another platform can be run for 30 days at 115% throughput without significant risk before its next scheduled turnaround.
• Coordinated actions taken, and $40M production loss avoided.

Page 12: Big data, little data, whatever

Trying to combine 3 different kinds of modeling

• Data-driven / Machine-learning
– Early days, often not enough data
– Bias → limited region of parameter spaces explored (by management design)
• Knowledge-based
– Rule capture, experience
– Initial use to generate hypotheses for other approaches.
• Physics-based
– Difficult to scale
– Use for seed models
– Locked-up in OEMs?

Also simulation, for what-if analyses, and verification.

See Peng et al.
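One common way to combine a physics-based seed model with a data-driven component is to fit the data-driven part to the seed model's residuals. The sketch below assumes that pattern; the talk does not say which combination method was used. The seed model, its coefficients, and the observations are all hypothetical and synthetic.

```python
def physics_seed_model(load: float) -> float:
    """Hypothetical first-principles estimate: temperature rises linearly with load."""
    return 20.0 + 0.5 * load


def fit_linear_residual(xs, residuals):
    """Ordinary least-squares fit of residual = a + b * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_r = sum(residuals) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxr = sum((x - mean_x) * (r - mean_r) for x, r in zip(xs, residuals))
    b = sxr / sxx
    a = mean_r - b * mean_x
    return a, b


# Synthetic observations, made up for illustration (not field data).
loads = [10.0, 20.0, 30.0, 40.0, 50.0]
observed = [26.0, 32.0, 38.0, 44.0, 50.0]

# The data-driven part learns only what the seed model gets wrong.
residuals = [obs - physics_seed_model(x) for x, obs in zip(loads, observed)]
a, b = fit_linear_residual(loads, residuals)


def hybrid_model(load: float) -> float:
    """Seed-model prediction plus the learned residual correction."""
    return physics_seed_model(load) + a + b * load


print(f"seed: {physics_seed_model(60.0):.1f}, hybrid: {hybrid_model(60.0):.1f}")
```

Because the correction is fitted only where data exist, the seed model still governs behavior in regions of parameter space that operations have never explored, which is the bias problem noted in the first bullet.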

Page 13: Big data, little data, whatever

Example: Condition-based Management

[Diagram: example process. Inputs: multiple sensor data streams, environmental data, text data, image data, and outcomes. These feed higher-order “events” & measures, then probabilistic models / rule mining (alongside physical models), which yield actionable rules, measures, & options for the management system.]

Management system
• Maintenance optimization
• Use / output optimization
• Energy / comfort / safety balancing

Broad range of applications (Heavy Infrastructure; Business Equipment / Consumer Products; Human Health?):

Bridges, Water Infrastructure, Railroads, Aircraft, Mining Equipment, Oil Pipelines, Oil Platforms, Steel manufacture, Trucking, Mobile, Computers, IT Infrastructure, Home Appliances, Buildings (HVAC, Elevators, Lighting, …), Photocopiers, Refrigeration
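The condition-based management process can be sketched as a small pipeline: raw readings become higher-order events, and a rule maps events plus environmental context to an action. Everything here is an assumption for illustration: the sensor channels, thresholds, and action names are hypothetical, and a hand-written rule stands in for the probabilistic models and rule mining named on the slide.

```python
from dataclasses import dataclass


@dataclass
class Reading:
    hour: int
    vibration_mm_s: float  # hypothetical sensor channel
    ambient_temp_c: float  # hypothetical environmental channel


def detect_events(readings, vibration_limit=7.0, min_consecutive=3):
    """Turn raw readings into a higher-order event: sustained high vibration.

    Returns the hours at which vibration has exceeded the limit for at least
    `min_consecutive` readings in a row.
    """
    events, run = [], 0
    for r in readings:
        run = run + 1 if r.vibration_mm_s > vibration_limit else 0
        if run >= min_consecutive:
            events.append(r.hour)
    return events


def recommend_action(events, readings, hot_threshold_c=35.0):
    """Map events plus environmental context to an actionable option."""
    if not events:
        return "continue normal operation"
    hot = any(r.ambient_temp_c > hot_threshold_c
              for r in readings if r.hour in events)
    return "reduce load and inspect" if hot else "schedule inspection"


# Synthetic readings, made up for illustration.
readings = [
    Reading(0, 5.0, 30.0),
    Reading(1, 7.5, 31.0),
    Reading(2, 8.0, 33.0),
    Reading(3, 8.2, 36.0),
    Reading(4, 6.0, 34.0),
]
events = detect_events(readings)
print(events, recommend_action(events, readings))
```

Requiring several consecutive exceedances before raising an event is one simple way to keep single noisy samples from triggering maintenance actions.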

Page 14: Big data, little data, whatever

Business value requires both Modeling and Process Integration

• Many organizations are not used to making data-driven decisions.
– Culturally
– Process-wise
• Mathematical proof of business value not initially compelling
• Example: CbM & false positives.
• Initial deployment very risky!

[Chart: Process Integration vs. Modeling & Analytics, showing capability & value growth. Labeled steps: models developed & tested; 1. Integration pilot & evaluation; 2. Deploy/scale]
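The false-positive concern in condition-based management follows from base rates: when real failures are rare, even a fairly accurate model produces mostly false alarms. A minimal sketch using Bayes' rule, with hypothetical rates chosen only for illustration:

```python
def alarm_precision(failure_rate, sensitivity, false_positive_rate):
    """Fraction of alarms that correspond to a real impending failure."""
    true_alarms = failure_rate * sensitivity
    false_alarms = (1.0 - failure_rate) * false_positive_rate
    return true_alarms / (true_alarms + false_alarms)


# Hypothetical numbers: 1% of monitored units are actually failing, the model
# catches 90% of them and wrongly flags 5% of healthy units.
precision = alarm_precision(0.01, 0.90, 0.05)
print(f"{precision:.0%} of alarms are real")
```

With these assumed rates only about 15% of alarms are real, which suggests why crews may lose trust in a system during an initial deployment unless the process around the alarms is designed with that in mind.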

Page 15: Big data, little data, whatever

Key points

• Physical-meets-Digital is happening

• This makes data-driven approaches much more important

• But most real problems require integration of very different approaches and data types
– Not easy to build these teams

• The realities of current culture & process must be addressed early.