Spark Data Streaming Pipeline

Upload: jonathan-bradshaw

Post on 16-Apr-2017

Category:

Software


TRANSCRIPT

Page 1: Spark Data Streaming Pipeline

Spark DSM
Data Streaming Pipeline
ORCHESTRATING DATA STORAGE, PROCESSING, AND MOVEMENT

Page 2: Spark Data Streaming Pipeline

Background

Today’s data landscape for enterprises continues to grow exponentially in volume, variety, and complexity.

Multiple geographic locations, on-premises and cloud
Combination of open source, commercial solutions and custom processing code
Can be expensive, hard to integrate and maintain
Ever-increasing volumes of data (terabytes, petabytes)
New ways of processing data (Hadoop, Spark, etc.)

.NET developers write large amounts of custom point-solution logic
Difficult to maintain and orchestrate
Performance bottlenecks

Page 3: Spark Data Streaming Pipeline

SparkPipe Framework

A development framework to deliver a .NET information production system that co-ordinates all of this data and processing.

Familiar technologies for .NET developers, including:
.NET Framework 4.0
Windows Workflow Foundation
Task Parallel Library Dataflow

Drag-and-drop business process pipeline modeling

Designed for performance, to scale across processor cores and servers, from the local data center to cloud providers such as Microsoft Azure
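
To illustrate the dataflow idea behind scaling a pipeline across workers, here is a minimal Python sketch. It is not SparkPipe code and not the Task Parallel Library Dataflow API: the stage functions `parse` and `enrich`, the record layout, and `run_pipeline` are all assumptions made for the example. A thread pool stands in for the worker pool; for CPU-bound stages a process pool would be the closer analogue to spreading work across cores.

```python
from concurrent.futures import ThreadPoolExecutor


def parse(line: str) -> dict:
    """Hypothetical first stage: turn a 'name, amount' line into a record."""
    name, amount = line.split(",")
    return {"name": name.strip(), "amount": float(amount)}


def enrich(record: dict) -> dict:
    """Hypothetical second stage: add a derived field to each record."""
    return {**record, "amount_cents": round(record["amount"] * 100)}


def run_pipeline(lines, max_workers=4):
    """Run both stages over a shared worker pool, preserving input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        parsed = pool.map(parse, lines)      # stage 1, fanned out over workers
        enriched = pool.map(enrich, parsed)  # stage 2 consumes stage 1 results
        return list(enriched)


records = run_pipeline(["alice, 12.50", "bob, 3.25"])
```

Each stage is an ordinary function, so adding a step means adding one more `pool.map` call; the pool size is the only knob that changes when more workers are available.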

Page 4: Spark Data Streaming Pipeline

Build Solutions

Build data-driven workflows (pipelines) that join, aggregate and transform data sourced from on-premises, cloud-based, and internet data stores.

Transform semi-structured, unstructured and structured data from diverse data sources into trusted information.

Produce data that can be easily consumed by business intelligence (BI) and analytics tools, and by other applications.

Set up complex data processing through simple composition.
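
The join, aggregate and transform steps named on this slide, and the idea of building a pipeline by composing them, can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the framework's API: `join_on`, `aggregate_sum`, `transform`, `compose`, and the sample orders and customers are all hypothetical.

```python
from collections import defaultdict
from functools import reduce


def join_on(key, right_rows):
    """Build a step that inner-joins incoming rows with right_rows on key."""
    index = {row[key]: row for row in right_rows}

    def step(rows):
        return [{**row, **index[row[key]]} for row in rows if row[key] in index]

    return step


def aggregate_sum(group_key, value_key):
    """Build a step that sums value_key per distinct group_key."""
    def step(rows):
        totals = defaultdict(float)
        for row in rows:
            totals[row[group_key]] += row[value_key]
        return [{group_key: k, value_key: v} for k, v in totals.items()]

    return step


def transform(fn):
    """Build a step that applies fn to every row."""
    def step(rows):
        return [fn(row) for row in rows]

    return step


def compose(*steps):
    """Chain steps left to right into a single pipeline function."""
    def pipeline(rows):
        return reduce(lambda data, step: step(data), steps, rows)

    return pipeline


# Illustrative data only.
orders = [
    {"customer_id": 1, "total": 10.0},
    {"customer_id": 2, "total": 5.0},
    {"customer_id": 1, "total": 2.5},
]
customers = [
    {"customer_id": 1, "region": "east"},
    {"customer_id": 2, "region": "west"},
]

pipeline = compose(
    join_on("customer_id", customers),
    aggregate_sum("region", "total"),
    transform(lambda row: {**row, "label": f"{row['region']}: {row['total']:.2f}"}),
)
report = pipeline(orders)
```

Because every step takes a list of rows and returns a list of rows, steps can be reordered or swapped without touching the others, which is the property a visual pipeline designer relies on.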

Page 5: Spark Data Streaming Pipeline

Visual Pipeline Design

Page 6: Spark Data Streaming Pipeline

Built for “Cloud Scale”

Support for Microsoft Azure offerings including:
Azure SQL Server
HDInsight (Hadoop)
Blob, Tables, Queues and Service Bus

Automatically spin up cloud servers, process the data, and then shut the servers down for cost-effective processing.
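
The spin-up, process, shut-down pattern can be sketched as a scoped resource. The Python below is a minimal illustration, not the framework's implementation and not an Azure SDK call: `FakeCluster` is a hypothetical in-memory stand-in for a provisioning client, and `temporary_workers` is an assumed helper name.

```python
from contextlib import contextmanager


class FakeCluster:
    """Hypothetical stand-in for a cloud provisioning client."""

    def __init__(self):
        self.running = 0
        self.log = []

    def start(self, count):
        self.running += count
        self.log.append(f"start {count}")

    def stop(self, count):
        self.running -= count
        self.log.append(f"stop {count}")


@contextmanager
def temporary_workers(cluster, count):
    """Start workers, hand control to the caller, and always stop them after."""
    cluster.start(count)
    try:
        yield cluster
    finally:
        # Runs even if processing raises, so no server is left billing.
        cluster.stop(count)


cluster = FakeCluster()
with temporary_workers(cluster, 3) as active:
    running_during_job = active.running
    processed = [value * 2 for value in (1, 2, 3)]
```

Putting the shutdown in a `finally` clause is the design choice that matters here: the cost saving only holds if servers are released on failure as well as on success.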

Page 7: Spark Data Streaming Pipeline

Support for Healthcare

Out-of-the-box components include:
HL7 v2
Clinical Document Architecture
EDI 834
PGP Encryption
Secure FTP
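
As a rough idea of what an HL7 v2 component has to handle: HL7 v2 messages are delimited text, with segments separated by carriage returns, fields by `|`, and components by `^`. The Python sketch below splits a message along those delimiters. It is a simplified illustration, not the framework's parser; `parse_hl7_v2`, `components`, and the sample message are made up for the example, and escape sequences, repetitions and subcomponents are ignored.

```python
def parse_hl7_v2(message: str) -> dict:
    """Group a message's segments by segment ID, each as a list of fields."""
    segments = {}
    for raw in message.strip().split("\r"):
        if not raw:
            continue
        fields = raw.split("|")
        # fields[0] is the segment ID (MSH, PID, ...); the rest are its fields.
        segments.setdefault(fields[0], []).append(fields[1:])
    return segments


def components(field: str) -> list:
    """Split one field into its '^'-separated components."""
    return field.split("^")


# Illustrative message only; every value here is invented.
sample = (
    "MSH|^~\\&|SENDING_APP|SENDING_FAC|RECEIVING_APP|RECEIVING_FAC|"
    "20170416120000||ADT^A01|MSG00001|P|2.5\r"
    "PID|1||12345^^^HOSPITAL||DOE^JANE\r"
)

parsed = parse_hl7_v2(sample)
patient_name = components(parsed["PID"][0][4])
message_type = components(parsed["MSH"][0][7])
```

A production parser would read the delimiters from the MSH segment itself instead of hard-coding them, since the message header declares which characters are in use.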

Page 8: Spark Data Streaming Pipeline
Page 9: Spark Data Streaming Pipeline

Typical Process Flow