industry of things world - berlin 19-09-16
Impact of IoT analytics on the development budget
Dr. Boris Adryan @BorisAdryan
Industry of Things World, Berlin, 19th September 2016
Dr. Boris Adryan
• with Zühlke Engineering since September 2016
• longstanding IoT enthusiast
• Founder of thingslearn Ltd.
• Board Member & Strategic Advisor for Pycom (microcontrollers), BioSelf (biosensors) and OpenSensors (IoT platform)
• before: research group leader for data analytics and machine learning at University of Cambridge, England.
“I disagree with the notion that data is the new oil. It’s as infinite as the sun, and just like the power of the sun, we’re barely using it at the moment.”
Mike Gualtieri, Forrester Research
5V of Big Data
Volume: “doesn’t fit on my local drive”
Velocity: “process deals with hundreds of events per second”
Variety: “wouldn’t even know how to save this in a RDBMS”
Veracity: “not sure how current, valid or complete it is”
Value: “actionable insight”
It’s worth looking at the actual data problem before hiring a ‘big data specialist’ or buying an ‘analytics solution’.
IoT = Big Data
Sensor devices produce large and small data.
You may not immediately know how to deal with them - but that doesn’t automatically make them ‘big data’.
39% of survey participants are worried about the cost of an industrial IoT solution.
“Why aren’t you doing IoT?”
Hardware is often perceived as an investment that customers understand and therefore anticipate.
This talk is about unfounded IoT fears.
There’s an air of magic around data and analytics. This leads to fear of:
• having to hire specialists (for both data plumbing and analytics)
• having to buy expensive services
• losing control over the process due to a lack of understanding
You want actionable insight.
[Diagram: data → “magic” (here be dragons!): how to deal and what to do with the data → insight → whatever you do in your vertical: ✓ better ✓ faster ✓ cheaper]
Do you need to employ a specialist?
✓ small (fits on your drive) ✓ you know exactly what you’re looking for
not a ‘data problem’: ask your programmer
✓ large (think data centre-scale) ✓ you know exactly what you’re looking for
potentially ‘big data’: ask your sysadmin, then your programmer
“My data problem must be special!”
Customers fear costs because they’re facing:
✓ unstructured data
✓ distributed ingestion and storage
Or they believe from hearsay that IoT automatically requires:
✓ real-time analytics
✓ sophisticated machine learning
My company went to an IoT conference & all I got was this t-shirt and a bunch of buzzwords.
“I receive UNSTRUCTURED data!”
RDBMS
name   age
Boris  40

name   city  job
Boris  Fra…  IoT

key-value DBs
name: Boris, age: 40, city: Frankfurt
name: Boris, job: IoT / data science
name: Ilka, age: 39
name: Ilka, city: Frankfurt, job: pharma R&D
SQL-ish syntax
neither a ‘big data’ nor a ‘cloud’ problem
NoSQL DBs run on commodity hardware
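The contrast between fixed tables and key-value records can be sketched in a few lines. This is a minimal illustration, not a real database: a plain Python dict stands in for a key-value store, and the `upsert` and `find` helpers are hypothetical names introduced here. Records with different fields are merged under one key without any schema change.

```python
# A plain dict stands in for a key-value / document store (assumption:
# a real NoSQL database would offer comparable put/get semantics).
store = {}

def upsert(store, key, **fields):
    """Merge new fields into the record stored under `key`."""
    record = store.setdefault(key, {})
    record.update(fields)
    return record

def find(store, **criteria):
    """Return the sorted keys of all records matching every given field value."""
    return sorted(
        key for key, record in store.items()
        if all(record.get(field) == value for field, value in criteria.items())
    )

# Records arrive with different fields, as on the slide.
upsert(store, "Boris", age=40, city="Frankfurt")
upsert(store, "Boris", job="IoT / data science")
upsert(store, "Ilka", age=39)
upsert(store, "Ilka", city="Frankfurt", job="pharma R&D")

print(find(store, city="Frankfurt"))
```

Nothing here needs a cluster or a cloud service; the point is only that "unstructured" usually means "fields vary per record", which commodity tooling handles.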
“I got too many things!”
[Diagram: many things sending messages over time to several brokers, each backed by storage]
even standard cloud offerings can do distributed ingestion and storage very well
not a big data ‘problem’
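One common way to spread many devices over several brokers is to hash the device ID. The sketch below assumes three brokers with made-up names and uses only the standard library; it is an illustration of the idea, not how any particular cloud offering or the speaker's platform does it.

```python
import hashlib

BROKERS = ["broker-0", "broker-1", "broker-2"]  # hypothetical broker names

def broker_for(device_id, brokers=BROKERS):
    """Map a device ID to one broker via a stable hash."""
    digest = hashlib.sha256(device_id.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(brokers)
    return brokers[index]

def ingest(messages, brokers=BROKERS):
    """Group (device_id, payload) messages by the broker that would receive them."""
    queues = {broker: [] for broker in brokers}
    for device_id, payload in messages:
        queues[broker_for(device_id, brokers)].append((device_id, payload))
    return queues

# 1000 made-up devices, one message each.
messages = [(f"thing-{i}", {"t": i}) for i in range(1000)]
queues = ingest(messages)
print({broker: len(queue) for broker, queue in queues.items()})
```

Because the hash is stable, a given device always lands on the same broker, which keeps its messages in order within that broker's storage.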
Adapting a PaaS to your needs.
your USP: Your apps & corporate design; Your products and analytical services; Your devices
Zühlke IoT Platform, standard components (still, tedious to configure): Security; I/O / broker; fast storage; device management; gateway; portal & user management; basic analytics
Basic data plumbing and storage is usually not the issue.
“The message is that there are known knowns. There are things we know that we know. There are known unknowns. That is to say there are things that we now know we don't know. But there are also unknown unknowns. There are things we don't know we don't know.”
Donald Rumsfeld, ex US Secretary of Defense
✓small or large
✓you don’t know what to connect or how to find it (the “known unknowns”)
✓you want to increase operational awareness (the “unknown unknowns”)
a ‘data science problem’
We can help to establish a machine learning pipeline to extract relevant information automatically.
Do you need to employ a specialist?
You may just need a one-off solution.
What is machine learning?
unsupervised learning - get an overview of what’s in your data set
supervised learning - teach the machine to classify data on the basis of some previous training
statistical analysis - find rules and outliers on the basis of numerical data
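As a hedged illustration of the "statistical analysis" category (finding outliers in numerical data), here is a minimal z-score check using only the standard library. The temperature readings and the threshold of 2.0 are made up for the example; a real project would pick both from its own data.

```python
import statistics

def find_outliers(values, z_threshold=2.0):
    """Return (index, value) pairs lying more than z_threshold
    standard deviations away from the mean."""
    mean = statistics.mean(values)
    spread = statistics.pstdev(values)
    if spread == 0:
        return []  # all values identical, nothing can stand out
    return [
        (i, v) for i, v in enumerate(values)
        if abs(v - mean) / spread > z_threshold
    ]

# Made-up sensor readings with one obvious glitch.
temperatures = [20.1, 19.8, 20.3, 20.0, 19.9, 20.2, 35.0, 20.1]
print(find_outliers(temperatures))
```

This is the kind of one-off analysis that rarely needs a dedicated specialist.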
?
y
          wheels  motor  windows  seats
skates    4       n      n        0
bike      2       n      n        1
car       4       y      y        4
bus       6       y      y        9
lorry     6       y      y        2
very relevant for predictive maintenance etc.
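To make the vehicle example concrete, the sketch below classifies a new item by finding the most similar known vehicle (1-nearest neighbour). This technique is swapped in here for illustration and is not stated in the talk. The feature values are as read from the slide table, with y/n encoded as 1/0, and the two query items are invented.

```python
# Features per vehicle: (wheels, motor, windows, seats); y/n encoded as 1/0.
TRAINING = {
    "skates": (4, 0, 0, 0),
    "bike":   (2, 0, 0, 1),
    "car":    (4, 1, 1, 4),
    "bus":    (6, 1, 1, 9),
    "lorry":  (6, 1, 1, 2),
}

def squared_distance(a, b):
    """Squared Euclidean distance between two feature tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def classify(features, training=TRAINING):
    """Return the label of the closest training example."""
    return min(training, key=lambda label: squared_distance(features, training[label]))

print(classify((4, 1, 1, 5)))  # four wheels, motor, windows, five seats
print(classify((2, 0, 0, 2)))  # two wheels, no motor, two seats
```

With one example per class this is only a toy; predictive maintenance would use many labelled examples per class and features scaled to comparable ranges.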
[Diagram: classification as a BLACK BOX]
data: weather forecast, airport location, # of gates, # of runways, # of snowploughs, airline, aircraft
training: flights cancelled in the past
classifier: ranked list of relevant features, weight of features, thresholds for features, performance metric
new data → prediction
How does classification work?
[Flowchart: data → training → classifier → performance assessment → good enough? yes: success! / no: more data for training]
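The train, assess, "good enough?" loop can be sketched with a deliberately tiny classifier. Everything below is an assumption made for illustration: the classifier is a single learned threshold (a decision stump), the readings and labels are made up, and 95% accuracy is an arbitrary target.

```python
def train(examples):
    """Learn a threshold halfway between the two classes (a decision stump)."""
    highest_negative = max(x for x, label in examples if label == 0)
    lowest_positive = min(x for x, label in examples if label == 1)
    return (highest_negative + lowest_positive) / 2

def accuracy(threshold, examples):
    """Fraction of examples where 'x >= threshold' matches the label."""
    hits = sum((x >= threshold) == bool(label) for x, label in examples)
    return hits / len(examples)

def train_until_good_enough(pool, test_set, target, batch=2):
    """Add training data batch by batch until the target accuracy is reached."""
    used = batch
    while True:
        threshold = train(pool[:used])
        score = accuracy(threshold, test_set)
        if score >= target or used >= len(pool):
            return threshold, score, used
        used += batch  # not good enough: more data for training

# Made-up labelled readings; the true rule is "faulty (1) if reading >= 50".
POOL = [(5, 0), (60, 1), (20, 0), (58, 1), (40, 0), (52, 1), (48, 0), (50, 1)]
TEST_SET = [(x, int(x >= 50)) for x in range(100)]

threshold, score, used = train_until_good_enough(POOL, TEST_SET, target=0.95)
print(threshold, score, used)
```

The held-out test set plays the role of the performance assessment box: the classifier is never scored on the examples it was trained on.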
Is this reliable?
[Plot: sensitivity (“true positives”) on the y-axis against 1-specificity (“false positives”) on the x-axis, both from 0 to 1.0, with a region labelled “worse than random guess”]
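The two axes of that plot can be computed directly from a classifier's scores. The sketch below uses made-up scores and labels and a hypothetical helper `roc_point`; it returns one point (1-specificity, sensitivity) per decision threshold.

```python
def roc_point(scores, labels, threshold):
    """Return (1 - specificity, sensitivity) for one decision threshold."""
    predicted = [s >= threshold for s in scores]
    tp = sum(p and l == 1 for p, l in zip(predicted, labels))  # true positives
    fp = sum(p and l == 0 for p, l in zip(predicted, labels))  # false positives
    positives = sum(1 for l in labels if l == 1)
    negatives = len(labels) - positives
    return fp / negatives, tp / positives

# Made-up classifier scores and the true labels they belong to.
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3]
labels = [1, 1, 0, 1, 0, 0]

for threshold in (0.75, 0.5, 0.25):
    fpr, tpr = roc_point(scores, labels, threshold)
    verdict = "better than random" if tpr > fpr else "not better than random"
    print(threshold, round(fpr, 2), round(tpr, 2), verdict)
```

A point where sensitivity does not exceed the false positive rate is no better than guessing, which is the region the plot marks as "worse than random guess".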
Where is your classifier located?
[Diagram: data → “magic” (here be dragons!) → insight → whatever you do in your vertical: ✓ better ✓ faster ✓ cheaper]
model building, training, operation, performance tracking
R & D
on device, cloud or mobile app
“Do I need real-time analytics?”
Time scale: microseconds to seconds; seconds to minutes; minutes to hours; hours to weeks
Example question: am I falling? counteract; battery level: should I land?; how many times did I stall?; what’s the best weather for flying?
Where: on device; on stream; in batch
How: in process; in database
Insight: operational insight; performance insight; strategic insight
Methods: e.g. Kalman filter; e.g. with machine learning; e.g. rules engine; e.g. summary stats
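For the fastest time scale, the slide names the Kalman filter as an example method. Below is a minimal scalar (one-dimensional) Kalman filter that smooths a noisy reading. The noise parameters `q` and `r`, the starting uncertainty `p` and the readings are assumptions chosen for illustration, not values from the talk.

```python
def kalman_1d(measurements, initial=0.0, p=1.0, q=0.01, r=1.0):
    """Smooth noisy scalar readings.

    p: initial estimate uncertainty, q: process noise, r: measurement noise.
    """
    x = initial
    estimates = []
    for z in measurements:
        p = p + q            # predict: uncertainty grows between readings
        k = p / (p + r)      # Kalman gain: how much to trust the new reading
        x = x + k * (z - x)  # move the estimate towards the measurement
        p = (1 - k) * p      # uncertainty shrinks after the update
        estimates.append(x)
    return estimates

# Made-up noisy altitude readings around 5.0.
readings = [5.2, 4.8, 5.1, 4.9, 5.3, 4.7, 5.0, 5.1, 4.9, 5.0]
smoothed = kalman_1d(readings, initial=readings[0])
print([round(v, 2) for v in smoothed])
```

The loop uses a handful of multiplications per reading and no stored history, which is why this kind of filter fits on a device.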
Edge, fog and cloud computing
Edge
Pro:
- immediate compression from raw data to actionable information
- cuts down traffic
- fast response
Con:
- loses potentially valuable raw data
- developing analytics on embedded systems requires specialists
- compute costs valuable battery life
Cloud
Pro:
- compute power
- scalability
- familiarity for developers
- integration center across all data
Con:
- traffic
Fog
Pro:
- same as Edge
- closer to ‘normal’ development work
- gateways often mains-powered
Con:
- loses potentially valuable raw data
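The edge and fog trade-off (less traffic, but the raw data is gone) can be shown with a small sketch. It reduces a window of made-up millivolt readings to summary statistics and compares the JSON payload sizes. The `summarise` helper and the choice of statistics are assumptions for illustration.

```python
import json

def summarise(window):
    """Reduce a window of raw readings to a handful of summary statistics."""
    return {
        "count": len(window),
        "min": min(window),
        "max": max(window),
        "mean": sum(window) / len(window),
    }

# 600 made-up millivolt readings, e.g. one per second for ten minutes.
raw = [3300 + (i % 10) for i in range(600)]
summary = summarise(raw)

raw_bytes = len(json.dumps(raw).encode("utf-8"))
summary_bytes = len(json.dumps(summary).encode("utf-8"))
print(summary, raw_bytes, summary_bytes)
```

The summary is far smaller to transmit, but any question not answered by these four numbers can no longer be asked later, which is the "loses potentially valuable raw data" point.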
Some of our examples for real-time analytics
Choosing the appropriate method and toolset on every level.
Options for cloud-based real-time analytics
Some features can cost a bit, especially when you don’t really know what you’re doing and want to ‘try it out’.
A badly configured SMACK stack on your own commodity hardware can be slow and unreliable.
your pre-trained classifier
My current pet hate: Deep Learning
Deep learning has delivered impressive results mimicking human reasoning, strategic thinking and creativity.
At the same time, big players have released libraries such that even ‘script kiddies’ can apply deep learning.
It’s already leading to the uncritical use of deep learning where other methods would be more appropriate.
Summary
‣ Super-fast analytics and state-of-the-art methods are not automatically the most useful solution.
‣ A good understanding of the type of insight that is required by the business model is essential.
‣ There are many solutions readily available that might enable IoT projects very cost-effectively.
Zühlke can advise on your options around IoT and data analytics, and provide complete solutions where needed.