512.231.6000 - 512.231.6010 fax - www.pervasive.com

Big Data & KNIME

Michael Hoskins, CTO Pervasive Software

KNIME User Conf, Zurich, 1 February 2012

Big Data and the Digital Data Revolution

• Every two days we create as much information as we did from the dawn of civilization until 2003 – Eric Schmidt, Google, 2010

How Big? Surging to Exabytes

Data Inflation

• Where is all this Big Data coming from?

The Internet is a Driver

The Real Culprit: an Internet of Things aka: Machine Generated Data

• What to do with all this Big Data?

Analyze it!

Using Machine Learning Techniques

• Association rule learning

• Classification

• Cluster analysis

• Crowdsourcing

• Data fusion and data integration

• Data mining

• Ensemble learning

• Genetic algorithms

• Natural language processing (NLP)

• Neural networks

• Network analysis

• Optimization

• Pattern recognition

• Predictive modeling

• Regression

• Sentiment analysis

• Signal processing

• Spatial analysis

• Statistics

• Supervised learning

• Simulation

• Time series analysis

• Unsupervised learning

• Visualization

To Predict the Future

• What does Big Data mean to you and KNIME?

Big Data means a new Data Science

Big Data Scientists need good Tools!

• What is Pervasive doing about this?

Introducing Pervasive DataRush™

DataRush is a parallel dataflow platform that eliminates performance bottlenecks in your data-intensive applications

• Scalable: Performance scales dynamically with increased core/server counts. No change to the code.

• High Throughput: Patented parallel dataflow technology enables fast, deep analysis of large data sets with no limit on input data size.

• Cost Efficient: Fully exploit commodity multicore servers – save significant capital and energy costs via efficient node utilization.

• Easy to Implement: DataRush takes care of complex parallel processing issues at design time: hides threading complexity; no deadlocks; runs on any platform – including Hadoop; etc.

• Extensible: DataRush is a component-based platform with an open API so you can easily extend it for your own needs.
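The "no change to the code" idea in the bullets above can be illustrated with a minimal Python sketch. This is not the DataRush API, which these slides do not show; every name here (`parse`, `delay_per_mile`, `run`) and the record layout are hypothetical. The operators are plain functions, and the degree of parallelism is a runtime parameter rather than something written into the pipeline.

```python
from concurrent.futures import ThreadPoolExecutor
import os


def parse(line):
    """Operator 1: turn a 'distance,delay' text line into two floats."""
    distance, delay = line.split(",")
    return float(distance), float(delay)


def delay_per_mile(record):
    """Operator 2: derive one value from a parsed record."""
    distance, delay = record
    return delay / distance


def pipeline(line):
    """Compose the operators; this code never mentions worker counts."""
    return delay_per_mile(parse(line))


def run(lines, workers=None):
    """Apply the pipeline to every line using `workers` threads.

    Threads are used only to keep the sketch portable; for CPU-bound
    work in Python a process pool would be the usual substitute.
    Executor.map returns results in input order.
    """
    workers = workers or os.cpu_count() or 1
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(pipeline, lines))
```

Calling `run(lines, workers=1)` and `run(lines, workers=8)` executes the same `pipeline` function and returns the same results; only the runtime setting differs.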

Pervasive DataRush Plug-in for KNIME

[Diagram labels: DataRush Plug-Ins (drag and drop, high performance nodes); DataRush for KNIME; Predictive Analytics]

Genomic Analysis: Align and Assemble

Scalable Predictive Analytics

Demo of Big Data in DataRush for KNIME

• KNIME with distributed (nextgen v6) DataRush, reading >120 million historical airline flight records at scale from native HDFS on our test Hadoop cluster, then performing a Linear Regression and Visualization.

Runtime = 47 seconds!
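The slides do not say how the regression in this demo is computed. As a hedged illustration of why simple linear regression suits partitioned data, the Python sketch below fits y = intercept + slope * x from per-partition running sums that are merged at the end. The function names and the idea of (x, y) pairs per partition are assumptions made for the example, not the DataRush or KNIME implementation.

```python
from functools import reduce


def partial_sums(rows):
    """One pass over a partition of (x, y) pairs, keeping only five sums."""
    n = sx = sy = sxx = sxy = 0.0
    for x, y in rows:
        n += 1
        sx += x
        sy += y
        sxx += x * x
        sxy += x * y
    return (n, sx, sy, sxx, sxy)


def merge(a, b):
    """Combine the sums of two partitions element by element."""
    return tuple(u + v for u, v in zip(a, b))


def fit(partitions):
    """Least-squares slope and intercept from the merged sums.

    Assumes at least one partition and at least two distinct x values.
    """
    n, sx, sy, sxx, sxy = reduce(merge, map(partial_sums, partitions))
    slope = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    intercept = (sy - slope * sx) / n
    return slope, intercept
```

Each partition can be summarized independently, for example one per file block, and only five numbers per partition need to be combined, which is what makes this kind of fit easy to spread across cores or nodes.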

Thank You!

mike.hoskins@pervasive.com