Management and Analysis of Large Scale Heterogeneous Time-Series Data


Upload: danube-university-krems-centre-for-e-governance

Post on 07-May-2015


DESCRIPTION

Copyright Martin Litzenberger at CeDEM14

TRANSCRIPT

Page 1: Management and Analysis of Large Scale Heterogeneous Time-Series Data

Management and Analysis of Large Scale Heterogeneous Time-Series Data

Sensor and Government Data: Their Role in Public Policy

Martin Litzenberger

Safety and Security Department

AIT Austrian Institute of Technology

Martin Litzenberger | Senior Engineer | DSS SNI

Page 2: Motivation

• Public institutions today collect a plethora of heterogeneous data with various sensors

• But the data and their use are usually restricted to the domain or department they belong to, e.g.

  - security surveillance, traffic, public transport, air quality, power grids, ...

• Reasons: lack of interoperability, and often a lack of communication and cooperation between data owners

23.05.2014

Page 3: Advantages

• Connecting these data, or even collecting them on a common platform, would allow new ways of analysis and insight into important and interesting mechanisms (e.g. the relation between traffic and air quality)

• But the data are heterogeneous in many aspects, such as format, update frequency, representation, owner and accessibility, which makes a joint analysis a big challenge

• The aim is real-time 24/7 processing and availability, not a “one-time” academic investigation


Page 4: Challenge: Heterogeneity of Data

• Temporal heterogeneity

  - Discrete events versus regular time series

• Spatial heterogeneity

  - “On-site” versus “as near as possible”

• Semantic heterogeneity

  - The same parameter may have a different significance in a different context

• Technical heterogeneity

  - Non-standardized interfaces, formats, etc.

• Political heterogeneity

  - “Owners” of data have different missions and goals


Page 5: Case Study

• Investigating the effects of traffic state (free flow versus stop-and-go) on local air quality

• Data sources

  - Traffic monitor for traffic volume and acceleration

  - Black carbon sensors at the road side and at a background station

  - Meteorological station

Page 6: Case Study: Combined Air Quality and Traffic Monitoring

• Different owners

  - City council, state air quality (AQ) department, and the project's own sensors

• Different data intervals

  - Traffic: individual vehicles, about 4000 data sets per hour (speed, acceleration, vehicle class)

  - Air quality and meteorology: fixed frequency, 30-minute averages (48 data points per day)

• Pre-processing

  - Temporal alignment and aggregation
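The pre-processing step can be illustrated with a minimal Python sketch: per-vehicle events are counted into 30-minute bins and then joined with the 30-minute air-quality averages. The record layout, the function names and the acceleration threshold of 0.5 m/s² are assumptions made for illustration only; the slides do not state how "accelerating" was defined or whether intervals are stamped at their start or end.

```python
from collections import defaultdict
from datetime import datetime, timedelta

BIN = timedelta(minutes=30)  # interval of the air-quality averages


def bin_start(ts):
    """Floor a timestamp to the start of its 30-minute interval (assumed convention)."""
    midnight = ts.replace(hour=0, minute=0, second=0, microsecond=0)
    return midnight + ((ts - midnight) // BIN) * BIN


def aggregate_vehicles(events, acc_threshold=0.5):
    """Count vehicles per 30-minute bin.

    events: iterable of (timestamp, speed, acceleration, vehicle_class).
    acc_threshold is a hypothetical cut-off for "accelerating".
    """
    bins = defaultdict(lambda: {"total": 0, "accelerating": 0})
    for ts, _speed, acceleration, _vehicle_class in events:
        counts = bins[bin_start(ts)]
        counts["total"] += 1
        if acceleration > acc_threshold:
            counts["accelerating"] += 1
    return dict(bins)


def align(traffic_bins, air_quality):
    """Keep only the intervals present in both series, in time order."""
    shared = sorted(traffic_bins.keys() & air_quality.keys())
    return {t: (traffic_bins[t], air_quality[t]) for t in shared}
```

Joining on the shared interval start drops any interval for which one of the sources delivered no data, which is the simplest way to handle gaps.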

Page 7: Goal: Investigating a “black carbon equivalent” for traffic

• Accelerating cars have higher tailpipe emissions than “free-flowing” vehicles

Approach:

$Q_{\mathrm{BC}} = Q_{\text{total vehicles}} + 6 \cdot Q_{\text{accelerating vehicles}}$

(this can be made more complex, e.g. by including weight factors for heavy goods vehicles (HGV))

• Local (road-side) black carbon concentrations need to be reduced by the “background” values to isolate the traffic-related component

$C_{\mathrm{BC}} = C_{\text{road}} - C_{\text{background}}$

Wind speed is, of course, of interest at the same time.
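The two relations on this slide amount to simple arithmetic, sketched below in Python. The function and parameter names are illustrative assumptions; only the weight of 6 and the two formulas come from the slide.

```python
ACCEL_WEIGHT = 6  # weight for accelerating vehicles, as given on the slide


def bc_equivalent(total_vehicles, accelerating_vehicles, accel_weight=ACCEL_WEIGHT):
    """Traffic "black carbon equivalent": Q_BC = Q_total + 6 * Q_accelerating."""
    return total_vehicles + accel_weight * accelerating_vehicles


def traffic_related_bc(c_road, c_background):
    """Traffic-related concentration: C_BC = C_road - C_background."""
    return c_road - c_background
```

For example, an interval with 100 vehicles, 10 of them accelerating, gives `bc_equivalent(100, 10) == 160`.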


Page 8: Solution: What is openUwedat?

• openUwedat is a toolbox for building time-series applications

• The toolbox contains many ready-made, adaptable programs

• It also contains libraries for writing your own programs, which integrate seamlessly with the existing ones

[Figure: architecture diagram showing three “Driver” blocks, a “Database” block and the label “configurable”]

Page 9: What can I do with openUwedat?

• openUwedat can interact with any kind of time-series device. New devices are integrated by writing modules that act as “drivers”.

• Typical devices are:

  - Measurement devices

  - Data acquisition systems (station computers)

  - Other time-series management systems

  - Databases (SQL and NoSQL)

  - …
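The driver idea can be pictured with a small Python sketch. This is not openUwedat's actual API, which the slides do not show; every class and function name below is hypothetical and only illustrates the general pattern of hiding each device behind a common read interface.

```python
from abc import ABC, abstractmethod
from datetime import datetime
from typing import Iterable, List, NamedTuple


class Sample(NamedTuple):
    """One time-stamped value (hypothetical common record type)."""
    timestamp: datetime
    parameter: str
    value: float


class Driver(ABC):
    """Hypothetical driver interface: one subclass per kind of device."""

    @abstractmethod
    def read(self, start: datetime, end: datetime) -> Iterable[Sample]:
        """Return all samples with start <= timestamp < end."""


class InMemoryDriver(Driver):
    """Stand-in for a real device, database or station computer."""

    def __init__(self, samples):
        self._samples = list(samples)

    def read(self, start, end):
        return [s for s in self._samples if start <= s.timestamp < end]


def collect(drivers, start, end) -> List[Sample]:
    """Merge samples from several drivers into one time-ordered list."""
    merged = [s for d in drivers for s in d.read(start, end)]
    return sorted(merged, key=lambda s: s.timestamp)
```

With such an interface, the analysis code only sees `Sample` records and does not need to know which owner or device produced them.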

Page 10: Implementation in openUwedat

• Powerful scripting language “Formula 3”

• Real-time interfaces and real-time processing pipes

Example code implementing the BC-equivalent function in Formula 3:

@A="name=Database; type=Aggregation;Source=TDS;Sensor=S4.TDS1;Lane=0"

@B="name=Database; type=Aggregation;Source=TDS;Sensor=S4.TDS1;Lane=1"

<<(A.accCount[i]+B.accCount[i]+A.decCount[i]+B.decCount[i])*6+A.totalFlow[i]+B.totalFlow[i]>> |

<< sum( _ ]t-60mins..t] ) >> every 60 mins
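For readers unfamiliar with Formula 3, the following Python sketch mirrors what the script above appears to compute: a weighted per-interval value from the two lanes, summed over the half-open window ]t-60 min, t]. This is a reading of the slide, not a documented description of Formula 3 semantics; the dictionary layout and function names are assumptions. Note that the script applies the factor 6 to both `accCount` and `decCount`, whereas the formula on page 7 names accelerating vehicles only.

```python
from datetime import datetime, timedelta


def bc_equivalent_series(lane_a, lane_b):
    """Per-interval BC equivalent from two lanes.

    lane_a, lane_b: dict mapping timestamp -> dict with the keys
    "accCount", "decCount" and "totalFlow" (assumed layout).
    """
    series = {}
    for ts in sorted(lane_a.keys() & lane_b.keys()):
        a, b = lane_a[ts], lane_b[ts]
        weighted = (a["accCount"] + b["accCount"]
                    + a["decCount"] + b["decCount"]) * 6
        series[ts] = weighted + a["totalFlow"] + b["totalFlow"]
    return series


def hourly_sum(series, t):
    """Sum all values with timestamps in the window ]t - 60 min, t]."""
    start = t - timedelta(minutes=60)
    return sum(value for ts, value in series.items() if start < ts <= t)
```

Calling `hourly_sum` once per full hour corresponds to the `every 60 mins` clause of the script.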


Page 11: Typical Result Traffic / Air Quality

Very good correlation, but it depends on the meteorological conditions: during episodes of stronger wind, the correlation drops.
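One way to check the wind dependence described on this slide is to compute the correlation separately for calm and windy intervals. The Python sketch below does this with a plain Pearson coefficient; the record layout and the wind threshold of 3.0 m/s are hypothetical choices, not values from the study.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need two sequences of equal length >= 2")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def correlation_by_wind(records, wind_threshold=3.0):
    """Return (r_calm, r_windy) for records of (q_bc, c_bc, wind_speed).

    wind_threshold (m/s) is an assumed cut-off between calm and windy.
    """
    calm = [(q, c) for q, c, wind in records if wind < wind_threshold]
    windy = [(q, c) for q, c, wind in records if wind >= wind_threshold]

    def r(pairs):
        return pearson([p[0] for p in pairs], [p[1] for p in pairs])

    return r(calm), r(windy)
```

A clearly lower coefficient for the windy subset than for the calm one would correspond to the behaviour reported above.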

Page 12: Conclusions

• Public authorities collect plenty of heterogeneous data on a regular basis, day by day

• The potential to analyse these data together remains mostly unused because of:

  - Lack of cooperation between authorities and departments

  - Lack of interoperability of the systems

• The case study on traffic and air quality shows how analysing heterogeneous data together creates new insights

• AIT's openUwedat data management toolbox enables:

  - Collection of large-scale heterogeneous time-series data from different sources

  - Complex analysis using a powerful scripting language


Page 13: Management and Analysis of Large Scale Heterogeneous Time-Series Data

AIT Austrian Institute of Technology

your ingenious partner

Martin Litzenberger