spark data streaming pipeline
Spark DSM
Data Streaming Pipeline
ORCHESTRATING DATA STORAGE, PROCESSING, AND MOVEMENT
Background
Today’s data landscape for enterprises continues to grow exponentially in volume, variety, and complexity.
Multiple geographic locations, on-premises and cloud
Combination of open source, commercial solutions and custom processing code
Can be expensive, hard to integrate and maintain
Ever-increasing volumes of data (terabytes, petabytes)
New ways of processing data (Hadoop, Spark, etc.)
.NET developers write large amounts of custom point-solution logic
Difficult to maintain and orchestrate
Performance bottlenecks
SparkPipe Framework
A development framework to deliver a .NET information production system that co-ordinates all of this data and processing.
Familiar technologies for .NET developers, including:
.NET Framework 4.0
Windows Workflow Foundation
Task Parallel Library Dataflow
Drag-and-drop business process pipeline modeling
Designed for performance to scale across processor cores and servers, from the local data center to cloud providers such as Microsoft Azure
Build Solutions
Build data-driven workflows (pipelines) that join, aggregate and transform data sourced from on-premises, cloud-based, and internet data stores.
Transform semi-structured, unstructured and structured data from diverse data sources into trusted information.
Produce data that can be easily consumed by using business intelligence (BI), analytics tools, and other applications.
Set up complex data processing by composing simple steps.
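As an illustration of composing simple steps into a join, transform, and aggregate pipeline, here is a minimal Python sketch. It is not SparkPipe code: the helper names (`compose`, `join_on`, `transform`, `aggregate`) and the sample records are hypothetical, chosen only to show the idea.

```python
from collections import defaultdict
from functools import reduce


def compose(*stages):
    """Chain stages left to right into a single callable pipeline."""
    return lambda records: reduce(lambda data, stage: stage(data), stages, records)


def join_on(key, lookup):
    """Stage: enrich each record with the matching row from `lookup` (inner join)."""
    index = {row[key]: row for row in lookup}

    def stage(records):
        for rec in records:
            if rec[key] in index:
                yield {**index[rec[key]], **rec}

    return stage


def transform(fn):
    """Stage: apply `fn` to every record."""
    def stage(records):
        return (fn(rec) for rec in records)

    return stage


def aggregate(group_key, value_key):
    """Stage: sum `value_key` per distinct `group_key`."""
    def stage(records):
        totals = defaultdict(float)
        for rec in records:
            totals[rec[group_key]] += rec[value_key]
        return dict(totals)

    return stage


# Hypothetical sample data standing in for two different data stores.
orders = [
    {"customer_id": 1, "amount": "10.50"},
    {"customer_id": 2, "amount": "4.00"},
    {"customer_id": 1, "amount": "2.25"},
]
customers = [
    {"customer_id": 1, "region": "east"},
    {"customer_id": 2, "region": "west"},
]

pipeline = compose(
    join_on("customer_id", customers),                               # join
    transform(lambda rec: {**rec, "amount": float(rec["amount"])}),  # transform
    aggregate("region", "amount"),                                   # aggregate
)
totals_by_region = pipeline(orders)
```

Each stage takes an iterable of records and returns a new one, so stages can be reordered or swapped without touching the others.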
Visual Pipeline Design
Built for “Cloud Scale”
Support for Microsoft Azure offerings, including:
Azure SQL Server
HDInsight (Hadoop)
Blob, Tables, Queues and ServiceBus
Automatically spin up cloud servers, process data, and then shut them down for cost-effective processing.
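The spin-up, process, shut-down cycle can be sketched as a scoped resource. The Python below is an assumption-laden illustration, not the framework's implementation: `FakeCluster` and `temporary_cluster` are hypothetical stand-ins and no real cloud provider API is called.

```python
from contextlib import contextmanager


class FakeCluster:
    """Hypothetical stand-in for a set of cloud servers."""

    def __init__(self, size):
        self.size = size
        self.running = False

    def start(self):
        self.running = True

    def stop(self):
        self.running = False

    def run(self, job, data):
        if not self.running:
            raise RuntimeError("cluster is not running")
        return [job(item) for item in data]


@contextmanager
def temporary_cluster(size):
    """Start servers, hand them to the caller, and always shut them down."""
    cluster = FakeCluster(size)
    cluster.start()
    try:
        yield cluster
    finally:
        # Runs even if the job raises, so servers are not left running.
        cluster.stop()


with temporary_cluster(size=4) as cluster:
    results = cluster.run(lambda x: x * 2, [1, 2, 3])
```

Tying shutdown to the end of the `with` block is what keeps the cost bounded: the servers exist only for the duration of the job, including when the job fails.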
Support for Healthcare
Out-of-the-box components include:
HL7 v2
Clinical Document Architecture
EDI 834
PGP Encryption
Secure FTP
Typical Process Flow