Time Series Analysis (TSA): patterns, algorithms, examples

© Cloudera, Inc. All rights reserved. Mirko Kämpf | April 2016

Upload: mirko-kaempf

Post on 14-Jan-2017

Warmup

• Correlation

• Short Term Correlation

• Long Range Correlation

• Return Interval Statistics

Agenda

• Recap: The Data Science Process (DSP) - Where are all the time series?
• Similarity Graphs
• Hadoop.TS and HDGS
• Typical patterns in TSA
• Application of Hadoop.TS
• HDGS: Current status
• Practical hints

Recap: The Data Science Process

Application of Big-Data-Technology

Images from: http://semanticommunity.info/Data_Science/Doing_Data_Science

Where are the time series?

- Events are collected, grouped, and sorted
- Normalization of raw series
- Quality inspection
- Derive new information about the series
- Plot the right charts
- Visualize system properties as networks
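
The first two steps above (grouping and sorting events, then normalizing the raw series) can be sketched in a few lines of plain Python. This is only an illustration, not Hadoop.TS code: the `(key, timestamp, value)` event layout, the function names `build_series` and `z_normalize`, and the choice of z-score normalization are all assumptions made for this example.

```python
from collections import defaultdict
from statistics import mean, pstdev


def build_series(events):
    """Group (key, timestamp, value) events by key and sort each group by time."""
    groups = defaultdict(list)
    for key, ts, value in events:
        groups[key].append((ts, value))
    return {key: sorted(points) for key, points in groups.items()}


def z_normalize(values):
    """Rescale values to zero mean and unit (population) standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # A constant series carries no fluctuation; map it to all zeros.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


# Hypothetical, unordered raw events from two sensors.
events = [
    ("sensor-b", 2, 10.0),
    ("sensor-a", 3, 3.0),
    ("sensor-a", 1, 1.0),
    ("sensor-b", 1, 30.0),
    ("sensor-a", 2, 2.0),
]

series = build_series(events)
normalized = {
    key: z_normalize([value for _, value in points])
    for key, points in series.items()
}
```

After normalization, series with very different value ranges can be compared directly, which is what the later similarity measures rely on.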

“A time series describes a thing over time.” Many time series describe many things over time.

Correlation networks are derived from time series. Correlation networks (CNs) describe the system.

Why should I care about time series analysis?
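
One common way to derive such a network is to compute a pairwise correlation for all series and keep only the strong links as edges. The sketch below assumes Pearson correlation and a fixed threshold of 0.8; both choices, and the names `pearson` and `correlation_network`, are assumptions for illustration, not the method used in the talk.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0  # correlation is undefined for a constant series
    return cov / sqrt(vx * vy)


def correlation_network(series, threshold=0.8):
    """Return edges (a, b, r) for every pair whose |r| reaches the threshold."""
    edges = []
    for a, b in combinations(sorted(series), 2):
        r = pearson(series[a], series[b])
        if abs(r) >= threshold:
            edges.append((a, b, r))
    return edges


# Hypothetical example data: A and B move together, C does not.
series = {
    "A": [1.0, 2.0, 3.0, 4.0, 5.0],
    "B": [2.0, 4.0, 6.0, 8.0, 10.0],
    "C": [5.0, 1.0, 4.0, 2.0, 3.0],
}
edges = correlation_network(series, threshold=0.8)
```

Here only the pair A and B ends up connected; C correlates weakly (r = -0.3) with both and stays isolated.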

Similarity Graphs

What is similar among nodes?

(a) static properties
(b) dynamic properties

Inspection of system properties: data quality screening

Motivation for Hadoop.TS & HDGS

Overview & Concepts

Univariate TSA

Properties per time series

Distribution of values (PDF) …
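
A simple way to inspect the distribution of values is a normalized histogram, which approximates the probability density function (PDF). The following is a minimal sketch in plain Python; the function name `empirical_pdf`, the equal-width binning, and the example values are assumptions for illustration.

```python
def empirical_pdf(values, bins=10):
    """Estimate the PDF of values with an equal-width, normalized histogram.

    Returns (bin_centers, density); density integrates to 1 over the bins.
    """
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("all values are identical; no histogram can be built")
    width = (hi - lo) / bins
    counts = [0] * bins
    for v in values:
        # The maximum value falls into the last bin instead of a new one.
        index = min(int((v - lo) / width), bins - 1)
        counts[index] += 1
    n = len(values)
    centers = [lo + (i + 0.5) * width for i in range(bins)]
    density = [c / (n * width) for c in counts]
    return centers, density


# Hypothetical example values.
values = [0.0, 1.0, 1.0, 2.0, 2.0, 2.0, 3.0, 3.0, 4.0]
centers, density = empirical_pdf(values, bins=4)
```

Dividing the counts by `n * width` is what turns a plain histogram into a density, so distributions of series with different lengths remain comparable.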

Fluctuation Analysis

Return Interval Statistics

Detect Long Term Correlation in Time Series
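
Return interval statistics look at the waiting times between consecutive values that exceed a threshold. For uncorrelated data these intervals are expected to follow a roughly exponential distribution, so a clear deviation from that shape can hint at correlations in the series. The sketch below only extracts the intervals; the function name, threshold, and example values are assumptions for illustration.

```python
def return_intervals(values, threshold):
    """Return the gaps (in samples) between consecutive values above threshold."""
    hits = [i for i, v in enumerate(values) if v > threshold]
    return [later - earlier for earlier, later in zip(hits, hits[1:])]


# Hypothetical series with four exceedances of the threshold 2.0.
values = [0.1, 2.5, 0.3, 0.2, 3.1, 2.9, 0.4, 0.1, 0.2, 2.7]
intervals = return_intervals(values, threshold=2.0)
mean_interval = sum(intervals) / len(intervals)
```

The resulting list of intervals can then be passed to a histogram to compare its shape against the exponential case.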

Multivariate TSA

Properties per time series

Similarity Measures

• Cross-Correlation
• Event-Synchronization
• Cosine-Similarity
• Granger Causality
• Mutual Information

Question: How can I identify significant links?

Modifications and variations lead to better results in special use cases.
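
Two of the measures listed above, together with one possible answer to the significance question, can be sketched in plain Python. Everything here is an assumption for illustration: the function names, the restriction to non-negative lags, and the use of shuffled surrogates as a significance test (a common technique, but not necessarily the one used in Hadoop.TS).

```python
import random
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0  # correlation is undefined for a constant series
    return cov / sqrt(vx * vy)


def lagged_cross_correlation(x, y, lag):
    """Pearson correlation between x[t] and y[t + lag] for a non-negative lag."""
    if lag < 0:
        raise ValueError("this sketch only supports non-negative lags")
    if lag == 0:
        return pearson(x, y)
    return pearson(x[:-lag], y[lag:])


def cosine_similarity(x, y):
    """Cosine of the angle between x and y, treated as vectors."""
    dot = sum(a * b for a, b in zip(x, y))
    nx = sqrt(sum(a * a for a in x))
    ny = sqrt(sum(b * b for b in y))
    if nx == 0 or ny == 0:
        return 0.0
    return dot / (nx * ny)


def shuffle_p_value(x, y, n_surrogates=200, seed=0):
    """Fraction of shuffled surrogates whose |r| reaches the observed |r|."""
    rng = random.Random(seed)
    observed = abs(pearson(x, y))
    surrogate = list(y)
    hits = 0
    for _ in range(n_surrogates):
        rng.shuffle(surrogate)  # destroys the temporal order, keeps the values
        if abs(pearson(x, surrogate)) >= observed:
            hits += 1
    return hits / n_surrogates


# Hypothetical example: y repeats x with a delay of two samples.
x = [1.0, 3.0, 2.0, 5.0, 4.0, 6.0, 8.0, 7.0, 9.0, 10.0]
y = [0.0, 0.0] + x[:-2]

r_lag2 = lagged_cross_correlation(x, y, lag=2)
cos_xy = cosine_similarity(x, y)
p_value = shuffle_p_value(x[:-2], y[2:])
```

A link would be kept as significant only if its p-value stays below a chosen level such as 0.05. Shuffling removes all temporal structure, so for strongly autocorrelated series a more careful surrogate method may be needed.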

Get Meaning out of Correlation Metrics …

Application of Hadoop.TS: Results

HDGS: Current status

Data Flow, Prototype & Architecture Overview

Hadoop Ecosystem incl. Apache Spark

Spark is our time series workbench.

Hadoop.TS brings in domain-specific functions.

HDGS exposes metadata and data set properties as "linked data" in RDF.
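
To make the idea of data set properties as linked data more concrete, the sketch below serializes a few properties of one series as RDF triples in N-Triples syntax. It does not reflect the actual HDGS vocabulary or API: the `example.org` URIs, the property names, and the function `to_ntriples` are all hypothetical.

```python
def to_ntriples(subject_uri, properties, vocab="http://example.org/tsa#"):
    """Serialize a dict of properties as N-Triples with plain string literals.

    The vocabulary prefix is a placeholder, not a real HDGS namespace.
    """
    lines = []
    for name, value in sorted(properties.items()):
        # Escape backslashes and double quotes inside the literal.
        literal = str(value).replace("\\", "\\\\").replace('"', '\\"')
        lines.append(f'<{subject_uri}> <{vocab}{name}> "{literal}" .')
    return "\n".join(lines)


# Hypothetical metadata for one time series.
triples = to_ntriples(
    "http://example.org/dataset/sensor-a",
    {"length": 1440, "samplingIntervalSeconds": 60},
)
print(triples)
```

Triples in this form can be loaded into a triple store and queried across data sets, which is the kind of use the linked-data approach is aiming at.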

How to use your code in Apache Spark-Shell?

A. Interactively, by loading it into the spark-shell.
B. Contribute to existing Spark projects.
C. Create your module and use it in a spark-shell session.
D. Build a data-product which uses Apache Spark.

For simple and reliable usage of Java classes and complete third-party libraries, we define a Spark Module as a self-contained artifact created by Maven.

http://blog.cloudera.com/blog/2015/03/how-to-build-re-usable-spark-programs-using-spark-shell-and-maven/

Such packages can be shared and added to any Spark session via the --packages option and Maven coordinates (groupId:artifactId:version).

Hadoop Distributed Graph Space (HDGS)

HDGS (2)

• Deploy a graph meta-store as a parcel and activate it via CSD
• MediaWiki or Fuseki/Jena are available

Where is the Oscilloscope for Hadoop?

Practical Tips

Data Management

• Think about typical access patterns:
  • random access to each event, record or field?
  • access to entire groups of records?
  • variable size or fixed size sets?
• In general, prepare for a "full table scan"
• OPTIMIZE FOR YOUR DOMINANT ACCESS PATTERN!
• Select efficient storage formats: Avro, Parquet
• Index your data in Solr for random access and data exploration
  • Indexing can be done with just a few clicks in Hue …
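
As an illustration of optimizing for a dominant access pattern, the sketch below assumes that whole groups of records are read together and therefore stores events in per-series time buckets. The `(series_id, timestamp, value)` layout, the bucket size, and the function name `bucket_events` are assumptions for this example; in practice each bucket could become one record in a format such as Avro or Parquet.

```python
from collections import defaultdict


def bucket_events(events, bucket_seconds):
    """Group (series_id, timestamp, value) events into per-series time buckets.

    Each key is (series_id, bucket_start); each value is a time-sorted list.
    """
    buckets = defaultdict(list)
    for series_id, ts, value in events:
        bucket_start = ts - (ts % bucket_seconds)
        buckets[(series_id, bucket_start)].append((ts, value))
    for points in buckets.values():
        points.sort()
    return dict(buckets)


# Hypothetical events with timestamps in seconds.
events = [("a", 5, 1.0), ("a", 65, 2.0), ("b", 10, 3.0), ("a", 30, 4.0)]
buckets = bucket_events(events, bucket_seconds=60)
```

With this layout, reading one minute of one series is a single lookup, while random access to an individual event still requires scanning its bucket.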

Collecting Sensor Data with Spark Streaming …

• Spark Streaming works on fixed time slices only.
• Use the original time stamp?
  • Requires additional storage and bandwidth
  • Original system clock defines resolution
• Use "Spark-Time" or a local time reference:
  • You may lose information!
  • You have a limited resolution, defined by batch size.
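
The loss of resolution can be shown with a small simulation in plain Python, without Spark itself. The sketch assumes that each record is stamped with the end of the fixed batch interval in which it arrives; the function name, the arrival delays, and the 2-second interval are made up for illustration.

```python
def assign_batch_time(arrival_times, batch_interval):
    """Map each arrival time to the end of the fixed batch interval containing it."""
    return [(int(t // batch_interval) + 1) * batch_interval for t in arrival_times]


# Hypothetical sensor readings (all times in seconds).
event_times = [0.2, 0.9, 1.1, 1.8, 3.4]    # original time stamps at the sensor
arrival_times = [0.5, 1.2, 1.3, 2.1, 3.9]  # when the records reach the receiver

batch_times = assign_batch_time(arrival_times, batch_interval=2.0)

distinct_original = len(set(event_times))  # 5 distinguishable points in time
distinct_batched = len(set(batch_times))   # only 2 remain after batching
```

Five distinct original time stamps collapse into two batch times, and the fourth event (measured at 1.8 s) ends up in the second batch because of its arrival delay. Keeping the original time stamp in each record avoids this, at the cost of extra storage and bandwidth.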

Thank you!

Enjoy Apache Spark and all your data …