512.231.6000 - 512.231.6010 fax - www.pervasive.com Big Data & KNIME Michael Hoskins, CTO Pervasive Software KNIME User Conf, Zurich, 1 February 2012

Page 1

512.231.6000 - 512.231.6010 fax - www.pervasive.com

Big Data & KNIME

Michael Hoskins, CTO Pervasive Software

KNIME User Conf, Zurich, 1 February 2012

Page 2

Big Data and the Digital Data Revolution

• Every two days we create as much information as we did from the dawn of civilization until 2003 – Eric Schmidt, Google, 2010

Page 3

How Big? Surging to Exabytes

Page 4

Data Inflation

Page 5

• Where is all this Big Data coming from?

Page 6

The Internet is a Driver

Page 7

The Real Culprit: an Internet of Things, aka Machine-Generated Data

Page 8

• What to do with all this Big Data?

Page 9

Analyze it!

Page 10

Using Machine Learning Techniques

• Association rule learning

• Classification

• Cluster analysis

• Crowdsourcing

• Data fusion and data integration

• Data mining

• Ensemble learning

• Genetic algorithms

• Natural language processing (NLP)

• Neural networks

• Network analysis

• Optimization

• Pattern recognition

• Predictive modeling

• Regression

• Sentiment analysis

• Signal processing

• Spatial analysis

• Statistics

• Supervised learning

• Simulation

• Time series analysis

• Unsupervised learning

• Visualization

Page 11

To Predict the Future

Page 12

• What does Big Data mean to you and KNIME?

Page 13

Big Data means a new Data Science

Page 14

Big Data Scientists need good Tools!

Page 15

• What is Pervasive doing about this?

Page 16

Introducing Pervasive DataRush™

DataRush is a parallel dataflow platform that eliminates performance bottlenecks in your data-intensive applications.

• Scalable: Performance dynamically scales with increased core/server counts. No change to the code.

• High Throughput: Patented parallel dataflow technology enables fast, deep analysis of large data sets with no limit on input data size.

• Cost Efficient: Fully exploit commodity multicore servers – save significant capital and energy costs via efficient node utilization.

• Easy to Implement: DataRush takes care of complex parallel processing issues at design time: hides threading complexity; no deadlocks; runs on any platform – including Hadoop; etc.

• Extensible: DataRush is a component-based platform with an open API so you can easily extend it for your own needs.
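The general idea of a parallel dataflow pipeline can be illustrated with a minimal Python sketch. This is not DataRush code and does not use the DataRush API; the names `run_pipeline`, `source`, and `map_stage` are invented for illustration. Each operator runs in its own thread and passes records to the next through a bounded queue, so stages overlap in time and a slow stage applies backpressure to the stages before it. The sketch only shows pipeline parallelism; in CPython the global interpreter lock limits the speedup for CPU-bound stages.

```python
import queue
import threading

_DONE = object()  # sentinel that marks the end of the record stream


def source(records, out_q):
    """Feed every input record into the first queue, then signal completion."""
    for record in records:
        out_q.put(record)
    out_q.put(_DONE)


def map_stage(fn, in_q, out_q):
    """Apply fn to each record from in_q and forward the result to out_q."""
    while True:
        item = in_q.get()
        if item is _DONE:
            out_q.put(_DONE)  # propagate end-of-stream downstream
            return
        out_q.put(fn(item))


def run_pipeline(records, stages, capacity=64):
    """Run the stages as a chain of threads joined by bounded FIFO queues."""
    queues = [queue.Queue(maxsize=capacity) for _ in range(len(stages) + 1)]
    threads = [threading.Thread(target=source, args=(records, queues[0]))]
    for i, fn in enumerate(stages):
        threads.append(
            threading.Thread(target=map_stage, args=(fn, queues[i], queues[i + 1]))
        )
    for t in threads:
        t.start()

    # The calling thread acts as the sink and drains the last queue.
    results = []
    while True:
        item = queues[-1].get()
        if item is _DONE:
            break
        results.append(item)

    for t in threads:
        t.join()
    return results


if __name__ == "__main__":
    print(run_pipeline(range(5), [lambda x: x + 1, lambda x: x * 2]))
```

Because each stage is a single thread reading from a FIFO queue, record order is preserved. The caller only lists the stages; threads, queues, and shutdown are handled by `run_pipeline`, which is the kind of complexity-hiding the bullets above describe.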

Page 17

Pervasive DataRush Plug-in for KNIME

[Diagram labels: DataRush Plug-Ins; drag-and-drop high-performance nodes; DataRush for KNIME; Predictive Analytics]

Page 18

Genomic Analysis: Align and Assemble

Page 19

Scalable Predictive Analytics

Page 20

Demo of Big Data in DataRush for KNIME

• KNIME with distributed (nextgen v6) DataRush, reading >120 million historical airline flight records at scale from native HDFS on our test Hadoop cluster, then performing a Linear Regression and Visualization.

Runtime = 47 seconds!
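One reason a linear regression can scale across a cluster is that a simple least-squares fit needs only a handful of running sums, which can be computed independently per data partition and then merged. The sketch below shows that idea in plain Python. It is an assumption about how such a job could be structured, not a description of how DataRush or the demo actually works; the `Stats` class, the function names, and the sample values are all invented for illustration, and the slide does not say which columns were regressed.

```python
from dataclasses import dataclass


@dataclass
class Stats:
    """Sufficient statistics for a one-variable least-squares fit."""
    n: int = 0
    sx: float = 0.0
    sy: float = 0.0
    sxx: float = 0.0
    sxy: float = 0.0

    def add(self, x, y):
        """Fold one (x, y) record into the running sums."""
        self.n += 1
        self.sx += x
        self.sy += y
        self.sxx += x * x
        self.sxy += x * y

    def merge(self, other):
        """Combine the sums from two partitions into a new Stats."""
        return Stats(
            self.n + other.n,
            self.sx + other.sx,
            self.sy + other.sy,
            self.sxx + other.sxx,
            self.sxy + other.sxy,
        )


def partition_stats(rows):
    """Scan one partition once; this step can run on each node in parallel."""
    stats = Stats()
    for x, y in rows:
        stats.add(x, y)
    return stats


def fit(stats):
    """Return (slope, intercept) of the least-squares line y = slope*x + intercept."""
    denom = stats.n * stats.sxx - stats.sx ** 2
    if denom == 0:
        raise ValueError("x values are constant or no data; slope is undefined")
    slope = (stats.n * stats.sxy - stats.sx * stats.sy) / denom
    intercept = (stats.sy - slope * stats.sx) / stats.n
    return slope, intercept


if __name__ == "__main__":
    # Two hypothetical partitions of (x, y) pairs lying on y = 2x + 1.
    partitions = [[(0, 1), (1, 3)], [(2, 5), (3, 7)]]
    total = Stats()
    for part in partitions:
        total = total.merge(partition_stats(part))
    print(fit(total))
```

Each partition is read exactly once and only five numbers per partition travel over the network, so the merge step stays cheap regardless of how many records are scanned.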

Page 21

Thank You!

[email protected]