data science strategies for real-time analytics · “real time data analytics for the resilient...
TRANSCRIPT
Principal Data Scientist
Booz | Allen | Hamilton
Kirk Borne @KirkDBorne
Data Science Strategies
for Real-Time Analytics
http://www.boozallen.com/datascience
https://careers.boozallen.com/en-US/search?keywords=data
http://www.mif.pg.gda.pl/homepages/jasiu/stud/EiM/pdf/22-ind-zastosowania.pdf
Can our electric grid
be more resilient?
“Real Time Data Analytics for the Resilient Electric Grid”
So, what is Resilience?
6
1. The capacity to recover quickly (bounce back) from difficulties; toughness.
2. The ability of a substance or object to spring back into shape; elasticity.
http://www.starservice.org.uk/curriculumpage.php?subject=57
http://signsofpolitics.blogspot.com/2009/03/around-and-about-resilience.html
7
e.g., Resilient Communities have the sustained ability to utilize available resources to respond to, withstand, and recover from adverse situations = resources + data + analytics = insights + decisions!
“Enhancing Resilience in Infrastructure”
8
https://my.vanderbilt.edu/universityfundingprograms/2017/02/enhancing-safety-and-resilience-of-civil-infrastructure-through-interdisciplinary-research-vanderbilts-iris-initiative/
9
Smart Electric Grid Use Cases
• Spatiotemporal insight
• Situational / Context awareness
• Fast diagnosis & response
• Anomaly / Fraud / Loss detection
• Predictive maintenance
• Digital twins / Prescriptive action
• System performance optimization
• Resiliency
• Load balancing
• Predictive demand forecasting
• Real-time pricing
• New products
• Customer ‘nudge’
• Targeted offerings
• Smart contracts (Blockchain)
• Regulatory compliance
http://smartgridcenter.tamu.edu/sgc/web/wp-content/uploads/2016/03/grey-border-BigData_illustration_v5.jpg
https://www.slideshare.net/ImpetusInfo/realtime-streaming-analytics-business-value-use-cases-and-architectural-considerations-impetus-webinar
10
Emerging and Disruptive Digital Technologies in the Energy Industry:
This is what Digital Disruption looks like!
http://www.leadingpractice.com/industry-standards/energy/oil-gas/
11
The Data Science Revolution = Moving from data to insight to action!
12
Data Science enables the art of the possible:
The easy button for real-time analytics!
Manage the Digital Disruption with a Data Science Strategy
13
Data needs a Transformer (like Electricity)
to make it accessible to all.
Massive data collections unlock deeper insights into hard problems and complex systems
…Be careful what you wish for!!!!
14
15
Adding more data doesn’t necessarily help…
https://paulmead.com.au/blog/understand-perceptions/
Unless we can combine and integrate the different signals
into a “single view” of the thing, there will continue to be
many possible interpretations of what the source is!
Combining, connecting, and linking diverse data makes data “smart”!
Think of data not as information, but as facts that encode knowledge.
Environmental Analytics example
16
Transforming Data to Information to Knowledge to Understanding
17
4 Types of Discovery from Data Science:
1) Class Discovery: Finding the categories of objects (population segments), events, and behaviors in your data, and learning the rules that constrain the class boundaries (that uniquely distinguish them).
2) Correlation (Predictive and Prescriptive Power) Discovery: Finding trends, patterns, and dependencies in data, which reveal the governing principles or behavioral patterns (the object’s “DNA”).
3) Novelty (Surprise!) Discovery: Finding new, rare, one-in-a-[million / billion / trillion] objects, events, or behaviors.
4) Association (or Link) Discovery (Graph and Network Analytics): Finding the unexpected (unusual) co-occurring associations / links / connections among the entities in your domain.
What is your data analytics use case?
18
(Graphic by S. G. Djorgovski, Caltech)
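Novelty Discovery, the third type in the list above, can be illustrated with a minimal sketch. It is not taken from the slides: it assumes a batch of sensor readings and flags rare values with a modified z-score based on the median absolute deviation (MAD). The function name, the 3.5 threshold, and the sample grid-frequency readings are all hypothetical.

```python
import statistics

def robust_outliers(values, threshold=3.5):
    """Return (index, value, score) for points with a large modified z-score.

    The score uses the median and the median absolute deviation (MAD),
    so a single extreme reading does not distort the baseline.
    """
    med = statistics.median(values)
    mad = statistics.median(abs(v - med) for v in values)
    if mad == 0:
        return []  # no spread in the data, nothing can be scored
    flagged = []
    for i, v in enumerate(values):
        score = 0.6745 * (v - med) / mad  # 0.6745 rescales MAD to a std-like unit
        if abs(score) > threshold:
            flagged.append((i, v, score))
    return flagged

# Hypothetical grid-frequency readings in Hz, with one sag at index 6.
readings = [59.98, 60.01, 60.00, 59.99, 60.02, 60.01, 58.70, 60.00, 59.97, 60.03]
novel = robust_outliers(readings)
for index, value, score in novel:
    print(f"reading {index}: {value} Hz (score {score:.1f})")
```

The median-based score is one possible choice here; a mean and standard deviation would be pulled toward the very outlier being searched for.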
19
5 Levels of Analytics Maturity in Data-Driven Applications
1) Descriptive Analytics
– Hindsight (What happened?)
2) Diagnostic Analytics
– Oversight (real-time / What is happening? Why did it happen?)
3) Predictive Analytics
– Foresight (What will happen?)
4) Prescriptive Analytics
– Insight (How can we optimize what happens?) (Follow the dots / connections in the graph!)
5) Cognitive Analytics
– Right Sight (the 360 view: what is the right question to ask for this set of data in this context = Game of Jeopardy)
– Finds the right insight, the right action, the right decision, … right now!
– Moves beyond simply providing answers, to generating new questions and hypotheses.
20
3 Examples of Analytics
1) Descriptive
2) Predictive
3) Cognitive
21
22
All of the features in the data histogram convey valuable (actionable) information (the long tail, outliers, multi-modal peaks, …)
[Figure: data histogram; x-axis spans -8 to 8, y-axis counts span 0 to 14000]
23
Mixture Models = Statistical Clustering
Each of these data histograms can be represented by the mixture (i.e., sum) of several Gaussian normal distributions, such as the 3 Gaussian distributions shown in the lower right.
Each Gaussian statistically represents (characterizes) one “cluster” of data values within the full set of data values.
Comprehensive web resource for Mixture Models for clustering and unsupervised learning in Data Mining: http://www.csse.monash.edu.au/~dld/mixture.modelling.page.html
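As a rough illustration of the mixture-model idea above, the following sketch fits a one-dimensional mixture of three Gaussians with the expectation-maximization (EM) algorithm. It is an assumption-laden example, not the method behind the slides: the data are synthetic, the function names are hypothetical, and the quantile-based starting point is just one simple way to initialize EM.

```python
import math
import random

def normal_pdf(x, mean, std):
    """Density of a Gaussian with the given mean and standard deviation."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))

def fit_gaussian_mixture(data, k, iterations=100):
    """Fit a 1-D mixture of k Gaussians with EM.

    Returns a list of (mean, std, weight) tuples sorted by mean.
    """
    ordered = sorted(data)
    n = len(ordered)
    # Deterministic start: means at evenly spaced quantiles, one shared spread.
    means = [ordered[(2 * j + 1) * n // (2 * k)] for j in range(k)]
    spread = max((ordered[-1] - ordered[0]) / (2.0 * k), 1e-3)
    stds = [spread] * k
    weights = [1.0 / k] * k
    for _ in range(iterations):
        # E-step: responsibility of each component for each data value.
        resp = []
        for x in ordered:
            p = [w * normal_pdf(x, m, s) for w, m, s in zip(weights, means, stds)]
            total = sum(p) or 1e-300
            resp.append([v / total for v in p])
        # M-step: re-estimate amplitude (weight), mean, and std per component.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            if nj < 1e-9:
                continue  # component has no support; leave it unchanged
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, ordered)) / nj
            var = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, ordered)) / nj
            stds[j] = max(math.sqrt(var), 1e-3)
    return sorted(zip(means, stds, weights))

# Synthetic data: three hypothetical population segments.
rng = random.Random(7)
data = ([rng.gauss(-4.0, 0.8) for _ in range(300)]
        + [rng.gauss(0.0, 1.0) for _ in range(300)]
        + [rng.gauss(4.0, 0.8) for _ in range(300)])
components = fit_gaussian_mixture(data, k=3)
for mean, std, weight in components:
    print(f"mean={mean:.2f} std={std:.2f} weight={weight:.2f}")
```

Each fitted tuple corresponds to one "cluster" in the sense used above; in practice a library implementation would usually replace this hand-written loop.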
24
Statistical Clustering tags (characterizes) the data, enabling discovery: making the data “smart”!
25
Each Gaussian in the mixture can be characterized by various parameters, such as the mean, variance (standard deviation), and amplitude (i.e., the strength of that particular Gaussian component within the mixture).
These parameters can be plotted as a function of some independent (treatment) variable, to discover trends and correlations in the effects across the different segments of the population.
https://www.researchgate.net/publication/6200224_Conformational_entropy_in_molecular_recognition_by_proteins
3 Examples of Analytics
1) Descriptive
2) Predictive
3) Cognitive
26
Association Discovery Example #1
Classic Textbook Example of Data Mining (Legend?): Data mining of grocery store logs indicated that men who buy diapers also tend to buy beer at the same time.
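The diapers-and-beer story is usually told in terms of association rules. The sketch below shows how the three standard rule measures (support, confidence, lift) could be computed for item pairs. The baskets are invented for illustration and the function name is hypothetical; nothing here reproduces an actual retail study.

```python
from collections import Counter
from itertools import combinations

def pair_rules(transactions, min_support=0.2):
    """Score rules lhs -> rhs for every item pair that meets min_support.

    Returns {(lhs, rhs): (support, confidence, lift)}.
    """
    n = len(transactions)
    item_counts = Counter()
    pair_counts = Counter()
    for basket in transactions:
        items = sorted(set(basket))
        item_counts.update(items)
        pair_counts.update(combinations(items, 2))
    rules = {}
    for (a, b), count in pair_counts.items():
        support = count / n  # fraction of baskets containing both items
        if support < min_support:
            continue
        for lhs, rhs in ((a, b), (b, a)):
            confidence = count / item_counts[lhs]        # P(rhs | lhs)
            lift = confidence / (item_counts[rhs] / n)   # P(rhs | lhs) / P(rhs)
            rules[(lhs, rhs)] = (support, confidence, lift)
    return rules

# Invented baskets, for illustration only.
baskets = [
    ["diapers", "beer", "chips"],
    ["diapers", "beer"],
    ["milk", "bread"],
    ["diapers", "beer", "milk"],
    ["bread", "chips"],
    ["milk", "bread", "diapers"],
    ["beer", "chips"],
    ["milk", "bread", "eggs"],
]
rules = pair_rules(baskets)
support, confidence, lift = rules[("diapers", "beer")]
print(f"diapers -> beer: support={support:.3f} confidence={confidence:.2f} lift={lift:.2f}")
```

A lift above 1 means the two items co-occur more often than they would if purchased independently, which is the sense of "association" used in this example.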
27
Association Discovery Example #2
Wal-Mart studied product sales in their Florida stores in 2004 when several hurricanes passed through Florida.
Wal-Mart found that, before the hurricanes arrived, people purchased 7 times as many of {one particular product} compared to everything else.
28
Association Discovery Example #2
Wal-Mart studied product sales in their Florida stores in 2004 when several hurricanes passed through Florida.
Wal-Mart found that, before the hurricanes arrived, people purchased 7 times as many strawberry pop tarts compared to everything else.
29
Strawberry pop tarts???
http://www.nytimes.com/2004/11/14/business/yourmoney/14wal.html
http://www.hurricaneville.com/pop_tarts.html
http://bit.ly/1gHZddA
30
Association Rule Discovery for Hurricane Intensification Forecasting
• Research by GMU geoscientists
• Predict the final strength of a hurricane at landfall.
• Find co-occurrence of final hurricane strength with specific values of measured physical properties of the hurricane while it is still over the ocean.
• Result: the association rule discovery prediction is better than the National Hurricane Center prediction!
• Research Paper by GMU scientists: https://ams.confex.com/ams/pdfpapers/84949.pdf
31
3 Examples of Analytics
1) Descriptive
2) Predictive
3) Cognitive
32
“You can see a lot by just looking”
(and you can see around corners!)
Cognitive, Contextual, Insightful, Forecastful
33
https://www.speedcafe.com/2017/07/12/f1-demo-take-place-london-streets/
Final Thoughts
34
Data Science Strategies for Real-time Analytics
1) Design Patterns for Streaming Data Analytics:
• Detecting POI (Pattern, Product, Process, Person, or any Point Of Interest)
• Detecting BOI (Behavior Of Interest from any “dynamic actor”)
• Precomputed scenarios and their responses (to speed up “best action”)
• Design Thinking: UX, CX, EX (User / Customer / Employee eXperience)
2) Edge Analytics (move the algorithms to the sensor: intelligence at the point of data collection)
• Locality in Time
3) Near-field Analytics (what else is local to my asset?)
• Locality in Geospace
4) Related-entity Analytics (what else is similar to this event / entity?)
• Locality in Feature Space
5) Agile Analytics
• DataOps
• Culture of Experimentation
• Fail-fast / Learn-fast
• Build and deploy Learning Systems / Resilient Systems
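One way to picture the Edge Analytics strategy is a detector small enough to run at the sensor, scoring each reading as it arrives without storing the stream. The sketch below is a hypothetical example under that assumption: it keeps a running mean and variance with Welford's online algorithm and flags readings far from the baseline. The class name, the threshold of 4 standard deviations, the warm-up length, and the sample load values are illustrative choices, not taken from the slides.

```python
import math

class StreamingDetector:
    """Flags readings that deviate strongly from a running baseline."""

    def __init__(self, threshold=4.0, warmup=20):
        self.threshold = threshold  # flag beyond this many standard deviations
        self.warmup = warmup        # readings to observe before flagging starts
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0               # running sum of squared deviations

    def update(self, x):
        """Score x against the baseline so far; return True if it is flagged."""
        flagged = False
        if self.n >= self.warmup:
            std = math.sqrt(self.m2 / (self.n - 1))
            if std > 0 and abs(x - self.mean) / std > self.threshold:
                flagged = True
        if not flagged:
            # Welford update; flagged readings are kept out of the baseline.
            self.n += 1
            delta = x - self.mean
            self.mean += delta / self.n
            self.m2 += delta * (x - self.mean)
        return flagged

# Hypothetical feeder load in MW: a steady cycle with one spike at index 30.
stream = [100.0 + (i % 5) - 2 for i in range(50)]
stream[30] = 140.0
detector = StreamingDetector()
flags = [i for i, x in enumerate(stream) if detector.update(x)]
print("points of interest at indices:", flags)
```

Excluding flagged readings from the baseline is a design choice: it keeps one spike from masking the next, at the cost of adapting more slowly if the system genuinely shifts to a new operating level.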
35
In the Big Data era, Everything is Quantified and Monitored:
– Populations & Persons
– Smart Cities, Energy, Grids, Farms, Highways
– Environmental Sensors
– IoE = Internet of Everything!
Discovery through Machine Learning and Data Science:
– Class Discovery, Correlation Discovery, Novelty Discovery, and
– Association Discovery: Find interesting cases where condition X is associated with event Y with time shift Z.
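The "condition X is associated with event Y with time shift Z" pattern can be sketched as a search over candidate lags. The example below is a simplified illustration, not the author's method: it assumes two equally sampled series, uses plain Pearson correlation as the association measure, and runs on synthetic data in which Y is built to follow X by three steps. The function names are hypothetical.

```python
import math
import random

def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    vx = sum((a - mx) ** 2 for a in xs)
    vy = sum((b - my) ** 2 for b in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)

def best_time_shift(x, y, max_shift):
    """Return (shift, correlation) maximizing |corr(x[t], y[t + shift])|."""
    best = (0, 0.0)
    for z in range(max_shift + 1):
        corr = pearson(x[:len(x) - z], y[z:])  # pair x[t] with y[t + z]
        if abs(corr) > abs(best[1]):
            best = (z, corr)
    return best

# Synthetic series: Y repeats X three steps later, plus a little noise.
rng = random.Random(1)
x = [rng.gauss(0.0, 1.0) for _ in range(200)]
y = [0.0, 0.0, 0.0] + x[:-3]
y = [v + rng.gauss(0.0, 0.1) for v in y]
shift, corr = best_time_shift(x, y, max_shift=10)
print(f"Y follows X with time shift Z={shift} (correlation {corr:.2f})")
```

Correlation only captures linear association; for categorical conditions and events, a lagged version of an association-rule measure would be the closer analogue.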
17 SDGs are KPIs for the World!
(Currently, the SDGs have 229 Key Performance Indicators. SDG: Sustainable Development Goal)
Big Data + the IoT + Citizen Data Scientists = Partners in Sustainability
The Internet of Things (IoT): Knowing the knowable via deep, wide, and fast data from ubiquitous sensors!
Big Data:
Sustainable Development Goals
http://www.unglobalpulse.org
36
Thank you! Contact information, for further questions or inquiries:
Dr. Kirk Borne, Principal Data Scientist, Booz Allen Hamilton
Twitter: @KirkDBorne or Email: [email protected]
Get slides here: http://www.kirkborne.net/Portland2018/
37