Post on 01-Aug-2020


www.pervasivedatarush.com

Pervasive DataRush™

Parallel Data Analysis with KNIME

SCALE YOUR BIG DATA APPLICATION ACROSS MANY CORES AND MULTIPLE NODES

Company Overview

Global Software Company

• Tens of thousands of users across the globe

• Americas, EMEA, Asia

• ~230 employees worldwide

Strong Financials

• $46 million revenue (trailing 12 months)

• 40 consecutive quarters of profitability

• $36 million in the bank

• NASDAQ:PVSW since 1997

Leader in Data Innovation

• Cloud-Based and On-Premises Data Integration

• Data Management

• Web-based Business-to-Business Data Interchange

• Highly Parallel Data-Intensive and Analytic Applications

The Challenge of Big Data

[Figure: data size (GB to PB) plotted against complexity]

• HPC: climate modeling, seismic analysis, fluid dynamics

• Internet scale: web indexing, web search

• Enterprise data: custom solutions, data quality, data analytics

Need to deal with increased data and complexity

Pervasive DataRush™

• Scalable: Performance scales dynamically with increased core counts and additional nodes.

• High Throughput: Fast, deep analysis of large data sets with no limit on input data size.

• Cost Efficient: Maximum performance from commodity multicore servers, SMP systems and clusters.

• Easy to Implement: No need to handle complex parallel processing issues; visual and API-level interfaces.

• Extensible: An extensible platform, so you remain in control of development.

… a parallel dataflow platform that eliminates performance bottlenecks in your data-intensive applications

[Figure: DataRush Apps Scale Up and Out. An analytics and big data application runs on multicore, SMP, cluster, and Hadoop cluster configurations.]

Pervasive DataRush Architecture

[Figure: architecture diagram]

• PDR Modules: DR DataMatcher, DR Recommender, DR Profiler, user-defined modules

• Data preparation: DR Core Data Prep Lib

• Data analytics: DR Core Analytics Lib

• Dynamic Processing Graph

• DataRush SDK

• User-defined libraries

• PDR Parallel Dataflow Engine

• KNIME

• JVM: Java, Python, JRuby, Scala…

High Performance Data-intensive Application (large volumes of data; quality data; actionable analytics)

DataRush & KNIME Integration

• Desktop plug-in for DataRush usage

– Nodes for data preparation and manipulation

– Base set of parallelized data mining functionality

– Highly efficient & parallelized data staging

– Parallel execution extension

• SDK plug-in for DataRush node development

– Create your own DataRush based nodes

– Access to the full DataRush APIs

– Wizard for creating DataRush based nodes


Normal Execution

[Figure: normal execution of a KNIME workflow, shown across several frames in which the DataRush Engine spawns]

Normal Execution - Complete
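In contrast to the parallel executor's "no intermediate staging", normal execution can be pictured as running each node to completion and staging its full output table before the next node starts. The minimal Python sketch below assumes each node is a hypothetical per-row function and a table is a plain list; it illustrates the staging pattern only and is not KNIME or DataRush code.

```python
from typing import Callable, List

Row = str
Node = Callable[[Row], Row]

def execute_staged(table: List[Row], nodes: List[Node]) -> List[List[Row]]:
    """Run each node over the whole table before starting the next one.

    Every intermediate result is kept, mirroring how node-by-node
    execution stages (materializes) a full table at each node boundary.
    """
    staged = [table]
    for node in nodes:
        staged.append([node(row) for row in staged[-1]])
    return staged

# Hypothetical two-node workflow: normalize case, then tag each row.
tables = execute_staged(["a", "b", "c"], [str.upper, lambda s: s + "!"])
print(tables[-1])                   # ['A!', 'B!', 'C!']
print(sum(len(t) for t in tables))  # 9 rows held across all staged tables
```

The number of rows held grows with the number of nodes, because every boundary keeps its own copy of the data.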


Parallel DataRush Executor

• Capabilities

– Supports parallel execution of DataRush-based nodes without intermediate staging

– Automatically splits workflows into executable graphs at staging boundaries

– Executes non-DataRush nodes, including meta-nodes, for loops and branches

– Usable within desktop, command-line and server environments
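The "split workflows into executable graphs at staging boundaries" capability can be illustrated for a simple linear workflow. This Python sketch assumes, for illustration only, that every non-DataRush node forces a staging boundary; the node names (including "Python Script") and the segment labels are hypothetical and do not come from the DataRush executor.

```python
from itertools import groupby
from typing import List, Tuple

# Hypothetical linear workflow: (node name, True if the node is DataRush-based).
WORKFLOW: List[Tuple[str, bool]] = [
    ("Parse", True),
    ("Replace", True),
    ("Python Script", False),
    ("Aggregate", True),
    ("Format", True),
    ("Write", True),
]

def split_at_staging_boundaries(workflow):
    """Fuse each run of consecutive DataRush nodes into one executable graph.

    Assumption: every non-DataRush node is a staging boundary, so it runs
    on its own and its input and output are materialized as tables.
    """
    segments = []
    for is_datarush, group in groupby(workflow, key=lambda node: node[1]):
        names = [name for name, _ in group]
        if is_datarush:
            # Consecutive DataRush nodes stream into each other without staging.
            segments.append(("datarush-graph", names))
        else:
            # Each non-DataRush node is executed individually.
            segments.extend(("staged-node", [name]) for name in names)
    return segments

for kind, nodes in split_at_staging_boundaries(WORKFLOW):
    print(kind, nodes)
```

Grouping consecutive nodes keeps the rule simple; a real workflow is a graph with branches and loops, which this linear sketch does not attempt to handle.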


Parallel Execution

[Figure: parallel execution of the workflow, shown across several frames featuring the DataRush Engine]

Parallel Execution - Complete
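Pipelined dataflow execution, where records stream between operators instead of being staged as full tables, can be sketched with one thread per stage connected by bounded queues. This is a generic Python illustration under stated assumptions (hypothetical stage functions, a sentinel object marking end of stream), not the DataRush engine. In CPython the GIL prevents these threads from running Python code simultaneously, so the sketch shows the streaming structure rather than a speedup.

```python
import queue
import threading
from typing import Any, Callable, Iterable, List

_DONE = object()  # sentinel marking the end of the stream

def _start_stage(fn: Callable[[Any], Any], inbox: queue.Queue,
                 outbox: queue.Queue) -> threading.Thread:
    """Run one operator in its own thread, forwarding records as they arrive."""
    def run() -> None:
        while True:
            item = inbox.get()
            if item is _DONE:
                outbox.put(_DONE)  # propagate end-of-stream downstream
                return
            outbox.put(fn(item))
    thread = threading.Thread(target=run, daemon=True)
    thread.start()
    return thread

def run_pipeline(records: Iterable[Any], fns: List[Callable[[Any], Any]],
                 capacity: int = 4) -> List[Any]:
    """Connect stages with bounded queues so records stream through them.

    No stage materializes a full intermediate table: each queue holds at
    most `capacity` records, and a fast upstream stage blocks until the
    downstream stages catch up.
    """
    queues = [queue.Queue(maxsize=capacity) for _ in range(len(fns) + 1)]
    threads = [_start_stage(fn, queues[i], queues[i + 1])
               for i, fn in enumerate(fns)]

    def feed() -> None:
        for record in records:
            queues[0].put(record)
        queues[0].put(_DONE)

    feeder = threading.Thread(target=feed, daemon=True)
    feeder.start()

    results = []
    while (item := queues[-1].get()) is not _DONE:
        results.append(item)
    feeder.join()
    for thread in threads:
        thread.join()
    return results

# Hypothetical Parse -> Replace -> Format stages.
stages = [
    lambda line: line.split(","),
    lambda fields: [f.strip().upper() for f in fields],
    lambda fields: "|".join(fields),
]
print(run_pipeline(["a, b", "c ,d"], stages))  # ['A|B', 'C|D']
```

A separate feeder thread is used so that the caller can drain the last queue while input is still being produced; feeding everything first would deadlock once a bounded queue fills.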

Parallel Execution – Details

[Figure: four parallel lanes, each running Parse → Replace → Aggregate → Format, feeding a single Write]
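The details slide combines pipelining with data partitioning: four lanes run the same Parse, Replace, Aggregate and Format steps, and one Write collects the results. The Python sketch below mimics that shape with a thread pool. The "key,value" input format and the helper names are hypothetical. The slide does not show how rows reach the Aggregate lanes, so the sketch assumes a hash repartition by key so that each key is aggregated in exactly one lane.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List, Tuple

LANES = 4  # four parallel lanes, as in the slide

def parse_and_replace(lines: List[str]) -> List[Tuple[str, int]]:
    """Parse: split hypothetical 'key,value' lines. Replace: normalize the key."""
    rows = []
    for line in lines:
        key, value = line.split(",")
        rows.append((key.strip().lower(), int(value)))
    return rows

def aggregate_and_format(rows: List[Tuple[str, int]]) -> List[str]:
    """Aggregate: sum values per key. Format: render each total as text."""
    totals: Dict[str, int] = defaultdict(int)
    for key, value in rows:
        totals[key] += value
    return [f"{key}={total}" for key, total in sorted(totals.items())]

def run(lines: List[str]) -> List[str]:
    # Split the input round-robin so each lane parses a share of the lines.
    chunks = [lines[i::LANES] for i in range(LANES)]
    with ThreadPoolExecutor(max_workers=LANES) as pool:
        parsed = list(pool.map(parse_and_replace, chunks))
        # Assumed repartition step: route each row to a lane chosen by its
        # key, so every key is aggregated in exactly one lane.
        by_key: List[List[Tuple[str, int]]] = [[] for _ in range(LANES)]
        for rows in parsed:
            for key, value in rows:
                by_key[hash(key) % LANES].append((key, value))
        formatted = list(pool.map(aggregate_and_format, by_key))
    # Write: a single writer gathers the lanes' output in a stable order.
    return sorted(line for lane in formatted for line in lane)

print(run(["A,1", "b,2", "a,3", "B,4", "c,5"]))  # ['a=4', 'b=6', 'c=5']
```

Without the repartition, two lanes could each hold a partial total for the same key, and the per-lane Format output would then need a further merge before Write.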


Demo


Vision for Levels of Usage

• Level 0

– No code changes on your part

– Install the DataRush plug-in and most nodes will see a performance benefit

• Level 1

– Some code changes required

– Use DataRush to access its parallelized data staging capability, bypassing the BDT API

• Level 2

– Use the DataRush SDK to build nodes using the full parallel dataflow capability of DataRush

– Available today


Demo


DataRush Benefits

• High Throughput

– Process data quickly and efficiently

– Accomplish complex processing in a single pass

• Scalable

– Takes advantage of multicore processors

– Runs faster as more cores are added

– Scales with the amount of data

• Easy to use and extend

– Dataflow abstraction hides parallelism details

– SDK to ease development
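The "complex processing in a single pass" benefit can be illustrated generically: several statistics are computed in one scan of the input instead of one scan per statistic. This is a plain Python sketch, not DataRush code; the function name and the choice of statistics are assumptions for illustration.

```python
import math
from typing import Dict, Iterable

def single_pass_stats(values: Iterable[float]) -> Dict[str, float]:
    """Compute count, sum, min, max and mean while reading the data once.

    Works on one-shot iterators (for example a stream read from disk),
    where a separate scan per statistic would not even be possible.
    """
    count, total = 0, 0.0
    lo, hi = math.inf, -math.inf
    for v in values:
        count += 1
        total += v
        lo = min(lo, v)
        hi = max(hi, v)
    mean = total / count if count else math.nan
    return {"count": count, "sum": total, "min": lo, "max": hi, "mean": mean}

print(single_pass_stats(iter([3, 1, 4, 1, 5])))
```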


Summary

• Scale performance on commodity multicore systems

– A single server already offers massive performance

– Core counts are growing with Moore’s Law

• Scale up and scale out

– Economical, environmental, and manageable

• Scale to big data

– Handle diverse, complex, massive data sets

• Scale development

– Easy for your existing team to implement parallel applications

– Extensible platform keeps you in control

Simplify how you develop Big Data applications


Questions?
