big data analytics for transport
DITEN - University of Genoa - Italy
www.smartlab.ws
(Big) Data Analytics and Intelligent Systems
(for Transport)
SmartLab
University of Genoa Polytechnic School
Polytechnic School Established in 1870 – ~1000 students/year
Genuense Athenaeum Established in 1481 – 35000 students – Italian rank: 2nd (CENSIS 2010, among medium-large universities)
DITEN Dept. of Information Technology, Electrical and Naval Engineering
SmartLab People
Prof. Sandro Ridella SmartLab Scientific Advisor
Prof. Davide Anguita SmartLab Coordinator
Dr. Alessandro Ghio Postdoc Research Assistant
Luca Ghelardoni Postdoc Research Assistant
Luca Oneto Ph.D. Student
Isah Abdullahi Lawal ICE Ph.D. Student
(with Univ. of London, UK)
Jorge Luis Reyes Ortiz ICE Ph.D. Student
(with Univ. Politec. de Catalunya, Spain)
Giuseppe Ripepi Ph.D. Student
(now Postdoc @ CNR)
+ Master students in:
• Industrial Engineering
• Electronic Engineering
• Computer Engineering
• Robotics Engineering
Mehrnoosh Vahdat ICE Ph.D. Student (end of 2013)
Teaching and training
• Master Course in Industrial Engineering (SV) – Business Intelligence
• Istituto Superiore di Studi in Tecnologie dell'Informazione e della Comunicazione – Business Intelligence & Analytics
• Master Course in Electronic Engineering – Computational Intelligence
• Corporate training
(Big) Data Analytics
• Present – What can be done
• Past – What we have learned to do
• Future – What we intend to do
Analytics: a process
Abstraction
Information storage
Induction
Deduction
Action
Learning from Data
Big Data
Source: UC Berkeley School of Information
(Big) Data
Servers Running Hadoop at Yahoo.com
Big Data Analytics: V3
• Volume: The increase in data volumes within enterprise systems is caused by transaction volumes and other traditional data types, as well as by new types of data. Too much volume is a storage issue, but too much data is also a massive analysis issue.
• Variety: IT leaders have always had an issue translating large volumes of transactional information into decisions — now there are more types of information to analyze — mainly coming from social media and mobile (context-aware). Variety includes tabular data (databases), hierarchical data, documents, e-mail, metering data, video, still images, audio, stock ticker data, financial transactions and more.
• Velocity: This involves streams of data, structured record creation, and availability for access and delivery. Velocity means both how fast data is being produced and how fast the data must be processed to meet demand.
(Gartner – 2011)
(Big) Data Analytics
Data storage / Data warehouse / OLAP
Visual Analytics / Data Mining / Machine Learning / …
(Big) Data Analytics
• Present – What can be done
• Past – What we have learned to do
• Future – What we intend to do
Real-time analytics
Ferrari
Fuel prediction
Skid prediction
Fuel prediction - problem
[Plot: signal "Fuel i_ssr2"; vertical axis from -1 to 1, horizontal axis from 0 to 14000]
© WikipediaProlific
KPIs: Fuel injectors current
Fuel prediction - solution
Gaussian Kernel Support Vector Regressor with Cross-validated Model Selection
DB
Offline
Online
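The offline/online scheme named on this slide can be sketched roughly as follows. This is a minimal illustration, not the system described in the talk: it assumes scikit-learn and NumPy are available, uses synthetic data in place of the telemetry database, and the grid of hyperparameter values is an arbitrary choice.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVR

# Synthetic stand-in for the logged telemetry (hypothetical data, not from the slides).
rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(200, 3))  # three made-up input channels
y = np.sin(3.0 * X[:, 0]) + 0.5 * X[:, 1] + rng.normal(0.0, 0.05, size=200)

# Offline phase: 5-fold cross-validated model selection over C, gamma and epsilon.
param_grid = {
    "C": [1.0, 10.0, 100.0],
    "gamma": [0.1, 1.0, 10.0],
    "epsilon": [0.01, 0.1],
}
search = GridSearchCV(SVR(kernel="rbf"), param_grid, cv=5,
                      scoring="neg_mean_squared_error")
search.fit(X[:150], y[:150])

# Online phase: apply the selected model to samples it has not seen.
model = search.best_estimator_
predictions = model.predict(X[150:])
test_mse = float(np.mean((predictions - y[150:]) ** 2))
print(search.best_params_, test_mse)
```

Separating the expensive hyperparameter search (offline) from the cheap `predict` call (online) is what makes this kind of model usable under real-time constraints.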
Fuel prediction - results
Brazil 06-Jun-03, laps 21-28
OK
Alert
No fuel
Skid prediction - problem
© Robert
KPIs: Acc_x, Acc_y, Speed
© Brian Nelson
Skid prediction - solution
Gaussian Kernel Support Vector Classifier with Cross-validated Model Selection
DB
Offline
Skid / No skid
Online
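A rough sketch of the classification variant: a Gaussian-kernel support vector classifier is selected offline by cross-validation and then queried one sample at a time online. It assumes scikit-learn and NumPy; the three input columns mirror the KPIs named on the problem slide (Acc_x, Acc_y, Speed), but the data and the labelling rule are invented purely for illustration.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic samples (hypothetical): columns are Acc_x, Acc_y, Speed.
rng = np.random.default_rng(1)
n = 400
acc_x = rng.normal(0.0, 1.0, n)
acc_y = rng.normal(0.0, 1.5, n)
speed = rng.uniform(50.0, 300.0, n)
X = np.column_stack([acc_x, acc_y, speed])
# Made-up labelling rule, used only to give the classifier something to learn.
y = (np.abs(acc_y) * speed / 300.0 > 1.0).astype(int)

# Offline phase: scale the inputs, then pick C and gamma by 5-fold cross-validation.
pipeline = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
param_grid = {"svc__C": [1.0, 10.0, 100.0], "svc__gamma": [0.01, 0.1, 1.0]}
search = GridSearchCV(pipeline, param_grid, cv=5)
search.fit(X[:300], y[:300])
model = search.best_estimator_
holdout_accuracy = model.score(X[300:], y[300:])

# Online phase: classify incoming samples one at a time.
online_labels = []
for sample in X[300:305]:
    predicted = model.predict(sample.reshape(1, -1))[0]
    online_labels.append("skid" if predicted == 1 else "no skid")
print(holdout_accuracy, online_labels)
```

Keeping the scaler inside the pipeline ensures that the online samples are normalised with the statistics learned offline.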
Skid prediction - result
[Plot: analog output vs. real target; vertical axis from -3 to 2.5, horizontal axis from 0 to 12000]
M. Schumacher - Fiorano
Prediction
Smart Waves
In cooperation with
Motion prediction for Landing Period Designator
NeuroZenit
Forecasting of urban traffic
Part of Elsag Zenit system
In cooperation with
Smart Bus
In cooperation with
Arrival time forecasting for bus fleets
Tests performed on ATM (Milan) bus #90
Oracle Data Mining Suite
Oracle 10g DM Suite – Beta testing
EUNITE European Network on Intelligent Technologies
ISAAC Internet Smart Adaptive Algorithm
Computational Server
(2002 – 2004)
… 2013…
(Grimilde) 4 x Xeon (8C) – 64 virtual cores – 128 GB RAM
(Arla) 2 x Xeon (4C) – 16 virtual cores – 32 GB RAM
6 TB NAS – Storage
1 Gb/s Ethernet
…2015
(IBM Cluster - 256 nodes)
Business Intelligence on Clouds
Courtesy: Salesforce.com
In cooperation with:
(Big) Data Analytics
• Present – What can be done
• Past – What we have learned to do
• Future – What we intend to do
Analytics for Complex Data: Process Mining
In cooperation with:
Log file
Process description
BigData@SIIT: NoSQL DBs…
• Wide Column: Hadoop / Hbase; Cassandra; Hypertable; Accumulo; Amazon SimpleDB; Cloudata; Cloudera; HPCC; Stratosphere;
• Document Store: MongoDB; CouchDB; RavenDB; Clusterpoint Server; ThruDB; Terrastore; RaptorDB; JasDB; SisoDB; SDB; SchemaFreeDB; djondb;
• Key Value/ Tuple Store: DynamoDB; Azure Table Storage; Couchbase Server; Riak; Redis; LevelDB; Chordless; GenieDB; Scalaris; Tokyo Cabinet / Tyrant; Scalien; Berkeley DB; Voldemort; Dynomite; KAI; MemcacheDB; Faircom C-Tree; HamsterDB; STSdb; Tarantool/Box; Maxtable; RaptorDB; TIBCO Active Spaces; allegro-C; nessDB; HyperDex; Mnesia; LightCloud; Hibari; BangDB; OpenLDAP;
• Graph Databases: Neo4J; Infinite Graph; Sones; InfoGrid; HyperGraphDB; DEX; GraphBase; Trinity; AllegroGraph; BrightstarDB; Bigdata; Meronymy; OpenLink Virtuoso; VertexDB; FlockDB;
• Multimodel Databases: OrientDB; ArangoDB; AlchemyDB;
• Object Databases: db4o; Versant; Objectivity; Gemstone; Starcounter; Perst; ZODB; Magma; NEO; PicoLisp; siaqodb; Sterling; Morantex; EyeDB; HSS Database; FramerD; Ninja Database Pro; Ndatabase;
• …
Source: nosql-database.org
BigData@SIIT - Condition Based Maintenance
© ERDMANN Software
Advanced Data Analytics
• Hierarchical functionality
– Descriptive Analytics (what happened?): data fusion, correlation, association, …
– Predictive Analytics (what will happen?): modelling, forecasting, …
– Prescriptive Analytics (what should we do?): interpretation, optimization, …
From: Shift2Rail EC PPP
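The three levels can be told apart with a toy example. The sketch below is an illustration only: the delay figures, the linear trend model and the 15-minute threshold are all invented assumptions, not values from the presentation.

```python
from statistics import mean

# Toy observations (made up): hour of day and average bus delay in minutes.
hours = [7, 8, 9, 10, 11]
delays = [4.0, 6.0, 8.0, 10.0, 12.0]

# Descriptive analytics: what happened?
average_delay = mean(delays)

# Predictive analytics: what will happen? Fit a least-squares line to the history.
h_bar, d_bar = mean(hours), mean(delays)
slope = (sum((h - h_bar) * (d - d_bar) for h, d in zip(hours, delays))
         / sum((h - h_bar) ** 2 for h in hours))
intercept = d_bar - slope * h_bar


def forecast(hour):
    """Predicted delay in minutes at the given hour, using the fitted line."""
    return intercept + slope * hour


# Prescriptive analytics: what should we do? Turn the forecast into an action.
DELAY_THRESHOLD_MIN = 15.0  # hypothetical service-level threshold


def recommend(hour):
    """Suggest an action based on the forecast delay at the given hour."""
    return "add a vehicle" if forecast(hour) > DELAY_THRESHOLD_MIN else "keep schedule"


print(average_delay, forecast(12), recommend(12), recommend(13))
```

Each level builds on the one before it: the forecast uses the recorded history, and the recommendation uses the forecast.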
Incremental Data Analytics
Time
Incremental Knowledge Building for Decision Support
From: Shift2Rail EC PPP
Adaptive Data Analytics
• Domain adaptation
Knowledge transfer
From: Shift2Rail EC PPP
Contract based knowledge exchange
Open Data
From: Shift2Rail EC PPP
Open Linked Data
RDF: Resource Description Framework format
RDF query language: SPARQL
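RDF represents data as (subject, predicate, object) triples, and a SPARQL query is essentially a set of triple patterns with variables that are matched against them. The sketch below imitates that idea in plain Python so it runs without an RDF library. It is not a SPARQL engine, and all `ex:` identifiers are hypothetical.

```python
# A tiny in-memory set of RDF-style triples (subject, predicate, object).
# All identifiers are hypothetical examples, not data from the slides.
TRIPLES = {
    ("ex:bus90", "rdf:type", "ex:BusLine"),
    ("ex:bus90", "ex:operatedBy", "ex:ATM"),
    ("ex:ATM", "ex:locatedIn", "ex:Milan"),
    ("ex:line1", "rdf:type", "ex:BusLine"),
    ("ex:line1", "ex:operatedBy", "ex:OtherOperator"),
}


def match(pattern, triples=TRIPLES):
    """Yield variable bindings for one triple pattern; terms starting with '?' are variables."""
    for triple in triples:
        binding = {}
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                # A variable repeated within one pattern must bind to the same value.
                if binding.setdefault(term, value) != value:
                    break
            elif term != value:
                break
        else:
            yield binding


def query(patterns, triples=TRIPLES):
    """Join several triple patterns, similar in spirit to a SPARQL basic graph pattern."""
    solutions = [{}]
    for pattern in patterns:
        next_solutions = []
        for solution in solutions:
            # Substitute variables that are already bound before matching.
            bound = tuple(solution.get(term, term) for term in pattern)
            for binding in match(bound, triples):
                next_solutions.append({**solution, **binding})
        solutions = next_solutions
    return solutions


# Roughly equivalent to:
#   SELECT ?line ?operator
#   WHERE { ?line rdf:type ex:BusLine . ?line ex:operatedBy ?operator }
results = query([
    ("?line", "rdf:type", "ex:BusLine"),
    ("?line", "ex:operatedBy", "?operator"),
])
pairs = sorted((r["?line"], r["?operator"]) for r in results)
print(pairs)
```

Because every source exposes the same triple shape, data sets published by different organisations can be joined on shared identifiers, which is what makes linked open data suitable for mashups.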
Open Data mashup (example)
Open Data 1
Connectivity and information sharing for intelligent mobility
Taken from http://whatinspiresnick.files.wordpress.com/2011/09/urban-density-11.jpg
Boost of pollution
Congestion of people/freight
Urban congestion costs approx. 8 B£/yr in the UK
Life span of UK citizens living in large urban areas reduced by approx. 8 months
Source: IBM
Human, Social, Environmental, Economic (HSE2) sustainability issues encompassed
Open data
On-field sensors
WWW
…
Citizen-centric approach
Towards TAVA decision-making: Timing, Accurate, Valuable, Actionable
HSE2 KPIs
(Big) Data Analytics engine
Things simply do not work (yet..)
Marassi Stadium
Lack of ability to plan activities by taking the heterogeneous available information into account
Analytics Engine
References
National Patents
• D.Anguita, S.Pischiutta, S.Ridella, D.Sterpi, Dispositivo per l'esecuzione della fase in avanti di un classificatore automatico (Device for the computation of the feed-forward phase of a classifier), N. 0001371367, Dep. 10/01/2006, 08/03/2010.
• D.Anguita, S.Ridella, D.Sterpi, Procedimento e sistema per la classificazione automatica multiclasse di dati di misura di una grandezza fisica (Method and system for the automatic classification of multi-class data), N. 0001352198, Dep. 23/07/2004, 19/01/2009.
Selected publications
• L.Ghelardoni, A.Ghio, D.Anguita, Energy Load Forecasting Using Empirical Mode Decomposition and Support Vector Regression, IEEE Transactions on Smart Grid, Vol. 4, No. 1, pp. 549-556, 2013.
• L.Oneto, A.Ghio, D.Anguita, S.Ridella, An Improved Analysis of the Rademacher Data-dependent Bound Using Its Self-Bounding Property, Neural Networks, Vol. 44, pp. 107-111, 2013.
• D.Anguita, A.Ghio, L.Oneto, S.Ridella, In-Sample Model Selection for Trimmed Hinge Loss Support Vector Machine, Neural Processing Letters, Vol. 36, No. 3, pp. 275-283, 2012.
• D.Anguita, A.Ghio, L.Oneto, S.Ridella, In-Sample and Out-of-Sample Model Selection and Error Estimation for Support Vector Machines, IEEE Trans. on Neural Networks and Learning Systems, Vol. 23, No. 9, pp. 1390-1406, 2012.
Technology Transfer
Spin-off founded in February 2007:
• 10%: University of Genoa
• 10%: Researchers (University of Genoa)
• 60%: Industry partner (IsoSistemi S.r.l.)
• 20%: Private investors
Target market:
• Steel Industry Intelligence
• BI & Analytics
Technology Transfer
Start-up founded in March 2013:
• 49%: Researchers (University of Genoa)
• 49%: Industry partner (Infinity S.p.A.)
• 2%: Private investors
In preparation: request for recognition as academic spin-off
Target market:
• Manufacturing Intelligence
• Real-time Analytics
• Scheduling & Planning