© 2012 IBM Corporation
Enterprise Intelligence
Jeff Jonas, IBM Distinguished Engineer, Chief Scientist, IBM Entity Analytics
Email: [email protected]
Blog: www.jeffjonas.typepad.com
Twitter: http://www.twitter.com/jeffjonas
My Background
Early ’80s: Founded Systems Research & Development (SRD), a custom software consultancy
Personally designed and deployed roughly 100 systems, a number of which contained billions of transactions describing hundreds of millions of entities
1989 – 2003: Built numerous systems for Las Vegas casinos including a technology known as Non-Obvious Relationship Awareness (NORA)
2001: Funded by In-Q-Tel, the venture capital arm of the CIA
2005: IBM acquires SRD
Today: Primarily focused on “sensemaking on streams,” with special attention to privacy and civil liberties protections
Trend: Organizations Are Getting Dumber
[Figure: computing power growth, available observation space, sensemaking algorithms and context plotted over time, with a region labeled Enterprise Amnesia.]
“Every two days now we create as much information as we did from the dawn of civilization up until 2003.”
~ Eric Schmidt, CEO Google
Amnesia, definition
A defect in memory, especially resulting from brain damage.
Enterprise Amnesia, definition
A defect in memory, resulting in wasted resources, lower revenues, unnecessary fraud losses, etc.
Trend: Organizations Are Getting Dumber
[Figure: computing power growth, available observation space, sensemaking algorithms and context plotted over time, annotated “WHY?”]
Algorithms at Dead End.
You Can’t Squeeze Knowledge Out of a Pixel.
Context, definition
Better understanding something by taking into account the things around it.
Information in Context … and Accumulating
Top 200 Customer
Job Applicant
Identity Thief
Criminal Investigation
The Puzzle Metaphor
Imagine an ever-growing pile of puzzle pieces of varying sizes, shapes and colors
What it represents is unknown – there is no picture on hand
Is it one puzzle, 15 puzzles, or 1,500 different puzzles?
Some pieces are duplicates, missing, incomplete, low quality, or have been misinterpreted
Some pieces may even be professionally fabricated lies
Until you take the pieces to the table and attempt assembly, you don’t know what you are dealing with
Puzzling
Cottage Garden: © 2010 Royce B. McClure, Artist, All Rights Reserved; © 2010 Ravensburger USA, Inc.
Down Home Music: © Kay Lamb Shannon, Artist, licensed by Cypress Fine Art Licensing; © 2011 Ravensburger USA, Inc.
Neuschwanstein Beauty: © 2009 photo copyright Robert Cushman Hayes; © 2009 Ravensburger USA, Inc.
Vegas: artwork provided by Hadley House Licensing, Minneapolis; © 2011 Giesla Hoelscher, All Rights Reserved; © 2011 Ravensburger USA, Inc.
270 pieces (90%); 200 pieces (66%); 150 pieces (50%); 6 pieces (2%); 30 pieces (10%, duplicates)
First Discovery
More Data Finds Data
Duplicates in Front Of Your Eyes
First Duplicate Found Here
Incremental Context – Incremental Discovery
6:40pm START
22min “Hey, this one is a duplicate!”
35min “I think some pieces are missing.”
37min “Looks like a bunch of hillbillies on a porch.”
44min “Hillbillies, playing guitars, sitting on a porch, near a barber sign … and a banjo!”
150 pieces (50%)
Incremental Context – Incremental Discovery
47min “We should take the sky and grass off the table.”
2hr “Let’s switch sides, and see if we can make sense of this from different perspectives.”
2hr10m “Wait, there are three … no, four puzzles.”
2hr17m “We need a bigger table.”
2hr18m “I think you threw in a few random pieces.”
How Context Accumulates
With each new observation … one of three assertions is made: 1) un-associated; 2) placed near like neighbors; or 3) connected
Must favor the false negative
New observations sometimes reverse earlier assertions
Some observations produce novel discovery
As the working space expands, computational effort increases
Given sufficient observations, there can come a tipping point
Thereafter, confidence improves while computational effort decreases!
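The three assertions above can be illustrated with a small Python sketch. It is not the author’s algorithm. It assumes that each observation is a flat dict of feature values and that a candidate entity is scored by counting shared values. It also uses two hypothetical thresholds (`CONNECT_SCORE`, `NEAR_SCORE`). Favoring the false negative appears as a high bar for connecting: weaker evidence only places the observation near its neighbors.

```python
from dataclasses import dataclass, field

# Hypothetical thresholds: shared feature values needed to connect a new
# observation to an existing entity, or merely to place it near one.
CONNECT_SCORE = 3
NEAR_SCORE = 1


@dataclass
class Entity:
    entity_id: int
    observations: list = field(default_factory=list)
    neighbors: set = field(default_factory=set)

    def shared_features(self, obs: dict) -> int:
        """Count feature values in obs that appear in any earlier observation."""
        return sum(
            1
            for key, value in obs.items()
            if any(prev.get(key) == value for prev in self.observations)
        )


class Context:
    def __init__(self) -> None:
        self.entities: list[Entity] = []

    def observe(self, obs: dict) -> tuple[str, int]:
        """Make one of three assertions about a new observation."""
        best, best_score = None, 0
        for entity in self.entities:
            score = entity.shared_features(obs)
            if score > best_score:
                best, best_score = entity, score

        if best is not None and best_score >= CONNECT_SCORE:
            # Connect only on strong evidence: favor the false negative.
            best.observations.append(obs)
            return "connected", best.entity_id

        new = Entity(len(self.entities), [obs])
        self.entities.append(new)
        if best is not None and best_score >= NEAR_SCORE:
            # Some overlap, but not enough to connect: keep them close.
            new.neighbors.add(best.entity_id)
            best.neighbors.add(new.entity_id)
            return "placed near", new.entity_id
        return "un-associated", new.entity_id
```

In this sketch, an observation that shares only a name with an existing entity is placed near it. A later observation that shares a name, phone number and date of birth is connected. The sketch never revisits an earlier assertion, so reversing assertions as new observations arrive is left out.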
Big Data [in context]. New Physics.
More data: the better the predictions
– Lower false positives
– Lower false negatives
More data: bad data good
– Suddenly glad your data is not perfect
More data: less compute
Big Data
Pile of ____ In Context
One Form of Context: “Expert Counting”
Is it 5 people each with 1 account … or is it 1 person with 5 accounts?
Is it 20 cases of H1N1 in 20 cities … or one case reported 20 times?
If one cannot count … one cannot estimate vector or velocity (direction and speed).
Without vector and velocity … prediction is nearly impossible.
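Expert counting means resolving records before counting them. Below is a minimal Python sketch of that step. It assumes the caller supplies a hypothetical `same_entity` rule. A union-find structure groups the records the rule says match, and the count is the number of groups rather than the number of records.

```python
from typing import Callable, Sequence


def count_entities(records: Sequence[dict],
                   same_entity: Callable[[dict, dict], bool]) -> int:
    """Count distinct entities by merging records that the rule says match."""
    parent = list(range(len(records)))

    def find(i: int) -> int:
        # Path halving keeps the union-find trees shallow.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Compare every pair; matching records end up under one root.
    for i in range(len(records)):
        for j in range(i + 1, len(records)):
            if same_entity(records[i], records[j]):
                parent[find(i)] = find(j)

    return len({find(i) for i in range(len(records))})
```

Take five accounts that share one hypothetical SSN with the rule `lambda a, b: a["ssn"] == b["ssn"]`. They count as one person, not five. Likewise, twenty reports carrying the same hypothetical `case_id` count as one case. The pairwise loop is quadratic, which is acceptable for a sketch. At scale, candidates would come from an index rather than from comparing every pair.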
Entity Resolution Demonstration
VOTER: George F Balston
YOB: 1951 D/L: 4801
13070 SW Karen Blvd Apt 7
Beaverton, OR 97005
Last voted: 2008
DECEASED PERSON: George Balston
YOB: 1951 SSN: 5598
DOD: 1995
Entity Resolution Demonstration
Under voter-matching best practices, a match on name and year of birth alone is insufficient proof that two records describe the same person: many different people in the U.S. share a name and year of birth.
Human review is required.
Unfortunately, there are thousands and thousands of cases just like this, and state election offices don’t have the staff (or budget) to manually review such volumes.
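The review rule above can be written as a small decision function. This is a sketch built on several assumptions. The field names (`name`, `yob`, `ssn`, `dl`) are hypothetical. Names are compared on first and last token, so middle initials are ignored. The rule paraphrases the best practice described on the slide, not any official election-office standard: name and year of birth plus one shared strong identifier count as a match, while name and year of birth alone go to human review.

```python
def name_key(name: str) -> tuple[str, str]:
    """Reduce a name to (first, last), ignoring middle initials."""
    parts = name.upper().split()
    return parts[0], parts[-1]


def match_decision(a: dict, b: dict) -> str:
    """Return 'match', 'review' or 'no match' for two identity records."""
    if name_key(a["name"]) != name_key(b["name"]) or a["yob"] != b["yob"]:
        return "no match"
    # Hypothetical strong identifiers; a field missing on either side never counts.
    for key in ("ssn", "dl"):
        if a.get(key) is not None and a.get(key) == b.get(key):
            return "match"
    return "review"


voter = {"name": "George F Balston", "yob": 1951, "dl": "4801"}
deceased = {"name": "George Balston", "yob": 1951, "ssn": "5598"}
print(match_decision(voter, deceased))  # review
```

The sketch treats the partial SSN and D/L values shown on the slide as whole identifiers. The voter and deceased records share only a name and year of birth, so the pair lands in the human-review queue.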
Now Consider This Tertiary DMV Record
DMV: George F Balston
YOB: 1951 SSN: 5598 D/L: 4801
3043 SW Clementine Blvd Apt 210
Beaverton, OR 97005
The DMV record contains enough features to match the voter record (name, year of birth and driver’s license), the deceased person’s record (name, year of birth and SSN), or both. For the sake of argument, let’s say it matches the voter best.
Features Accumulate
The combined voter/DMV record now shares a name, year of birth and SSN with the deceased person’s record. Under voter-matching best practices, this evidence is sufficient to determine that this voter is in fact deceased. This case no longer needs human review.
Useful Insight Revealed!
As features accumulate, it becomes possible to resolve previously unresolvable identity records.
As events and transactions accumulate, detection of relevance improves.
Here we can see that George, who died in 1995, voted in 2008.
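The accumulation step can be sketched in Python: resolve the DMV record into the voter, then test the enlarged record against the death record again. Everything here is a hypothetical illustration, not IBM’s implementation. The field names are invented, merging simply unions features, and the match rule requires name and year of birth plus a shared SSN or driver’s license number. The final check compares the year of the last vote with the year of death.

```python
def name_key(name: str) -> tuple[str, str]:
    """Reduce a name to (first, last), ignoring middle initials."""
    parts = name.upper().split()
    return parts[0], parts[-1]


def strong_match(a: dict, b: dict) -> bool:
    """Name and year of birth plus a shared SSN or driver's license number."""
    if name_key(a["name"]) != name_key(b["name"]) or a["yob"] != b["yob"]:
        return False
    return any(a.get(k) is not None and a.get(k) == b.get(k) for k in ("ssn", "dl"))


def merge(a: dict, b: dict) -> dict:
    """Accumulate features: keep a's values and add features only b has."""
    return {**b, **a}


def voted_after_death(voter: dict, dmv: dict, deceased: dict) -> bool:
    """Resolve the DMV record into the voter, then re-test against the death record."""
    if strong_match(voter, dmv):
        voter = merge(voter, dmv)  # the voter entity now carries the SSN
    return strong_match(voter, deceased) and voter["last_voted"] > deceased["dod"]


voter = {"name": "George F Balston", "yob": 1951, "dl": "4801", "last_voted": 2008}
dmv = {"name": "George F Balston", "yob": 1951, "ssn": "5598", "dl": "4801"}
deceased = {"name": "George Balston", "yob": 1951, "ssn": "5598", "dod": 1995}

print(strong_match(voter, deceased))            # False: name and YOB only
print(voted_after_death(voter, dmv, deceased))  # True
```

On its own, the voter record cannot be resolved against the death record. Once the DMV record contributes the SSN, the same rule succeeds, and the vote in 2008 after a death in 1995 becomes visible.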
IBM InfoSphere Identity Insight V8
MoneyGram International
Enterprise Intelligence: One Plausible Journey
[Diagram: Sense and Respond. Labels: Observation Space; What You Know; New Observations.]
[Diagram: Sense and Respond. Labels: Observation Space; Data Finds Data; Relevance Finds the Sensor (<200ms); Decide?]
[Diagram: Sense and Respond; Explore and Reflect. Labels: Observation Space; Data Finds Data; Relevance Finds the Sensor (<200ms); Decide?; Curated Data; Deep Reflection; Pattern Discovery; Directed Attention; Relevance Finds You.]
[Diagram: Sense and Respond; Explore and Reflect. Labels: Observation Space; Data Finds Data; Relevance Finds the Sensor (<200ms); Decide?; Curated Data; Deep Reflection; Pattern Discovery; Directed Attention; New Interests.]
[Diagram: Sense and Respond; Explore and Reflect, annotated with IBM products. Labels: Observation Space; Data Finds Data; Relevance Finds the Sensor (<200ms); Decide?; Curated Data; Deep Reflection; Pattern Discovery; Directed Attention; New Interests; Sensemaking. Products: InfoSphere Streams; ILog; Netezza; SPSS; Watson; Cognos.]
[Diagram: Sense and Respond; Explore and Reflect; Report and Manage. Labels: Observation Space; Data Finds Data; Relevance Finds the Sensor (<200ms); Decide?; Curated Data; Deep Reflection; Pattern Discovery; Directed Attention; New Interests.]
[Diagram: Report and Manage. Labels: Decide?; Directed Attention; New Interests; Pattern Discovery; Relevance Finds the Sensor (<200ms); Data Finds Data; Info Management Systems (Content Management, Case Management, Data Warehousing).]
Big Data Trends
The Greater the Context, the Greater the Value
[Figure: value of data (y-axis) plotted against records managed, from Big to Ludicrous Big (x-axis), for a pile of data and for data in context.]
Time Is Of The Essence
The better the predictions … the faster they will be wanted.
“Why did we have to wait until the end of the day for the smart answer?”
[Figure: willingness to wait (Day, Hour, 200ms; Batch to Real-Time) plotted against relevance (from Iffy to Totally).]
Closing Thoughts
The most competitive organizations
are going to make sense of what they are observing
fast enough to do something about it
while they are observing it.
Wish This On The Competitor
[Figure: computing power growth, available observation space, sensemaking algorithms and context plotted over time, with a region labeled Enterprise Amnesia.]
The Way Forward: Enterprise Intelligence
[Figure: computing power growth, available observation space, sensemaking algorithms and context plotted over time.]
Related Blog Posts
Algorithms At Dead-End: Cannot Squeeze Knowledge Out Of A Pixel
Puzzling: How Observations Are Accumulated Into Context
On A Smarter Planet … Some Organizations Will Be Smarter-er Than Others
G2 | Sensemaking – One Year Birthday Today. Cognitive Basics Emerging.
Email: [email protected]
Blog: www.jeffjonas.typepad.com
Twitter: http://www.twitter.com/jeffjonas
Questions?
Sensemaking on Streams: My G2 Secret Little IBM Project
3+ years in the making
G2 Mission Statement
1) Evaluate each new observation against previous observations.
2) Determine if what is being observed is relevant.
3) Deliver this actionable insight to its consumer … fast enough to do something about it while it is still happening.
4) Do this with sufficient accuracy and scale to really matter.
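Mission steps 1 to 3 can be read as a per-observation loop. The Python sketch below is hypothetical and does not describe G2’s design. It makes three assumptions. Prior observations are kept in an in-memory index keyed by feature value. An observation counts as relevant when one of its values is on a watch list (`watch`). Elapsed time is checked against the sub-200ms sense-and-respond target mentioned elsewhere in this deck.

```python
import time
from collections import defaultdict

LATENCY_BUDGET_S = 0.200  # the sub-200ms sense-and-respond target


class SenseAndRespond:
    def __init__(self, watch: set[str]) -> None:
        self.index: dict[str, list[int]] = defaultdict(list)  # value -> observation ids
        self.watch = watch  # hypothetical values of interest
        self.count = 0

    def ingest(self, obs: dict[str, str]) -> dict:
        start = time.perf_counter()
        obs_id = self.count
        self.count += 1
        # 1) Evaluate against previous observations: data finds data.
        related = sorted({i for v in obs.values() for i in self.index.get(v, [])})
        for v in obs.values():
            self.index[v].append(obs_id)
        # 2) Determine whether the observation is relevant.
        relevant = any(v in self.watch for v in obs.values())
        elapsed = time.perf_counter() - start
        # 3) Hand the result to the consumer, with a latency check.
        return {"id": obs_id, "related": related, "relevant": relevant,
                "within_budget": elapsed <= LATENCY_BUDGET_S}
```

A real system would push the result to an analyst, another system or the sensor rather than return it. Step 4, accuracy and scale, is what this toy version leaves out.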
From Pixels to Pictures to Action
[Diagram labels, in order: Observations; Data Finds Data; Persistent Context; Relevance Finds You; Consumer (an analyst, a system, the sensor itself, etc.). Caption: “This is G2.”]
Uniquely G2
More scalable, faster and extensible
– Designed for grid compute and sub-200ms sense and respond
Smarter
– Tolerance for disagreement (no such thing as a single version of truth)
– Support for more abstract entities (e.g., locations, products, asteroids)
– Support for more exotic features (e.g., biometrics, social circles)
Crazy stuff
– Detects on its own when it is confused and makes a “note to self”
– Geospatial reasoning, including a sense of here and now
Privacy by Design (PbD)
– More privacy- and civil-liberties-enhancing features baked in than any other commercial technology
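One hypothetical reading of “tolerance for disagreement” is an entity that keeps every asserted value together with its source, rather than overwriting everything into a single version of truth. The Python sketch below assumes plain string features and invented source labels. It is not G2’s data model.

```python
from collections import defaultdict


class EntityRecord:
    """Keeps every asserted feature value, together with its source."""

    def __init__(self) -> None:
        self.features: dict[str, list[tuple[str, str]]] = defaultdict(list)

    def assert_feature(self, name: str, value: str, source: str) -> None:
        # Record each (value, source) pair once; never overwrite earlier values.
        if (value, source) not in self.features[name]:
            self.features[name].append((value, source))

    def values(self, name: str) -> set[str]:
        """All distinct values asserted for a feature."""
        return {value for value, _ in self.features.get(name, [])}

    def disputed(self) -> list[str]:
        """Features whose sources disagree."""
        return sorted(n for n in self.features if len(self.values(n)) > 1)
```

If a hypothetical bank source and DMV source report different dates of birth, both values survive, and `disputed()` flags `dob` for attention. Agreeing sources for a phone number leave that feature undisputed.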
PbD: Self-Correcting False Positives
Record 1: John T Smith Jr, 123 Main Street, 703 111-2000, DOB: 03/12/1984
Record 2: John T Smith, 123 Main Street, 703 111-2000, DL: 009900991
A plausible claim: these two people are the same
Until this record comes into view
Record 3: John T Smith Sr, 123 Main Street, 703 111-2000, DL: 009900991
Which reveals this is a FALSE POSITIVE
PbD: Self-Correcting False Positives
Record 1: John T Smith Jr, 123 Main Street, 703 111-2000, DOB: 03/12/1984
Record 3: John T Smith Sr, 123 Main Street, 703 111-2000, DL: 009900991
Record 2: John T Smith, 123 Main Street, 703 111-2000, DL: 009900991 (now grouped with record 3 instead of record 1)
New Best Practice: FIXED IN REAL-TIME (not end of month)
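The self-correction above can be sketched in Python as greedy agglomerative clustering with cannot-link constraints. Pairs are merged strongest evidence first, and a Jr never merges with a Sr. The whole clustering is recomputed each time a record arrives. The weights and threshold are hypothetical assumptions, not G2’s scoring, and a real-time system would re-evaluate only the affected records rather than everything.

```python
import math
from itertools import combinations
from typing import Optional

# Hypothetical evidence weights: a shared DOB or driver's license counts for
# more than a shared base name (worth 1), address or phone number.
WEIGHTS = {"address": 1, "phone": 1, "dob": 3, "dl": 3}
LINK_THRESHOLD = 3


def split_name(name: str) -> tuple[str, Optional[str]]:
    """Separate a generational suffix (Jr/Sr) from the base name."""
    parts = name.split()
    if parts[-1] in ("Jr", "Sr"):
        return " ".join(parts[:-1]), parts[-1]
    return name, None


def score(a: dict, b: dict) -> float:
    base_a, sfx_a = split_name(a["name"])
    base_b, sfx_b = split_name(b["name"])
    if sfx_a and sfx_b and sfx_a != sfx_b:
        return -math.inf  # cannot-link: a Jr and a Sr are different people
    total = 1 if base_a == base_b else 0
    for key, weight in WEIGHTS.items():
        if a.get(key) is not None and a.get(key) == b.get(key):
            total += weight
    return total


def resolve(records: list[dict]) -> list[set[int]]:
    """Cluster record indexes, merging the strongest pairs first."""
    clusters = [{i} for i in range(len(records))]
    edges = sorted(((score(records[i], records[j]), i, j)
                    for i, j in combinations(range(len(records)), 2)), reverse=True)
    for s, i, j in edges:
        if s < LINK_THRESHOLD:
            break
        ci = next(c for c in clusters if i in c)
        cj = next(c for c in clusters if j in c)
        if ci is cj:
            continue
        if any(score(records[x], records[y]) == -math.inf for x in ci for y in cj):
            continue  # merging would put a Jr and a Sr in one entity
        clusters.remove(cj)
        ci |= cj
    return clusters
```

With only records 1 and 2, the shared base name, address and phone number reach the threshold, and the two are merged. That is the plausible claim. When record 3 arrives, its shared driver’s license gives a stronger link to record 2. Record 2 therefore joins record 3, and the Jr/Sr constraint keeps record 1 apart.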
[Diagram labels: Customer Facing Systems; Data Mining; Back-of-House Accounting Systems; Fraud; This System; That System; Sensemaking.]