Hortonworks HDP
Hortonworks HDP
On-premise or VM
Based on HDFS

HDInsight
Seamlessly scales in the cloud
Backed by Azure Storage Vault (ASV)/Azure Blob Storage

HDInsight
Lack of community support
Untested at the scale of a traditional Hadoop setup in a production setting
Lack of a clear migration path to an alternative Hadoop setup
Reliance on MS to bake in required Hadoop tools

Hortonworks HDP
Huge community support
Can be set up on a multitude of Linux and Windows VMs
Migration to alternate platforms is a known quantity
Support for new tools such as MRv2 or YARN is quickly available
[Diagram components: Azure SQL, Cassandra, Sqoop, MapReduce, Hadoop/HDFS, Hive, Data Warehouse, Reporting Tools, ODBC connections]
Problem… Hadoop is great for batch processing of millions of records. What about real-time processing?
[Diagram components: Trustev API, Azure Queue, Azure Worker Roles, Data Warehouse]
Message routing…
Can be complex, brittle and hard to scale
[Diagram: three Azure Queues]
Message routing…
Routing must be re-configured when scaling out
[Diagram: four Azure Queues]
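To make the scaling problem concrete, here is a minimal sketch of one hypothetical static routing scheme, in which each message key is hashed to one of N queues. Nothing in the slides says the routing worked this way; the `route` function, the key format and the queue counts are illustrative assumptions. When the number of queues changes, many keys map to a different queue, so producers and consumers have to be re-configured together.

```python
import zlib

def route(key: str, queue_count: int) -> int:
    """Pick a queue index for a message key using a stable hash (hypothetical scheme)."""
    return zlib.crc32(key.encode("utf-8")) % queue_count

def moved_keys(keys, old_count, new_count):
    """Return the keys whose destination queue changes when the queue count changes."""
    return [k for k in keys if route(k, old_count) != route(k, new_count)]

# Illustrative keys only; real message keys are not given in the slides.
keys = [f"merchant-{i}" for i in range(1000)]
moved = moved_keys(keys, 3, 4)
print(f"{len(moved)} of {len(keys)} keys change queue when scaling from 3 to 4 queues")
```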
And… The definitions of fraud detection algorithms, weightings and rules get trapped in a release cycle. Fraud moves too fast!
Enter… Apache Storm. Doing for real-time data what Hadoop did for batch processing.
[Diagram components: Trustev API, Azure Queue, Storm Cluster, Data Warehouse, Shared Algorithms, ML Generated Algorithms]
Tuples
Ordered list of elements
Named list of values of any type

Streams
Unbounded sequence of tuples
Can come from multiple sources, like the Twitter API or bolts

Spout
Source of a stream
Can talk with queues, logs, API calls, event data

Bolts
Process tuples, create new streams
Apply functions, transforms, filter, aggregate, join and access DBs and APIs etc.
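The four concepts can be sketched in plain Python. This is a toy model and not the Storm API: the spout is a generator that emits named tuples, and each bolt consumes one stream and produces a new stream or an aggregate. The `Transaction` fields, the amount threshold and all function names are made up for illustration, and the input is a short finite list where a real stream would be unbounded.

```python
from typing import Iterable, Iterator, NamedTuple

class Transaction(NamedTuple):
    """A tuple: a named list of values (these fields are hypothetical)."""
    merchant: str
    amount: float

def queue_spout(messages: Iterable[tuple]) -> Iterator[Transaction]:
    """Spout: the source of a stream, here reading from an in-memory 'queue'."""
    for merchant, amount in messages:
        yield Transaction(merchant, amount)

def filter_bolt(stream: Iterable[Transaction], min_amount: float) -> Iterator[Transaction]:
    """Bolt: filter incoming tuples and emit the survivors as a new stream."""
    for tx in stream:
        if tx.amount >= min_amount:
            yield tx

def count_bolt(stream: Iterable[Transaction]) -> dict:
    """Bolt: aggregate a stream into a count of tuples per merchant."""
    counts = {}
    for tx in stream:
        counts[tx.merchant] = counts.get(tx.merchant, 0) + 1
    return counts

# A finite stand-in for what would be an unbounded stream.
messages = [("A", 20.0), ("B", 950.0), ("A", 1200.0), ("A", 700.0)]
large = filter_bolt(queue_spout(messages), min_amount=500.0)
result = count_bolt(large)
print(result)
```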
Storm topologies
A topology is a directed graph of spouts and bolts. Using the correct tools, topologies can be created by fraud analysts and conversion analysts and, most importantly, created and published automatically using machine learning.
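One way to picture a topology that an analyst or an ML system could publish is as plain data: a mapping from each node to the nodes that consume its output. The sketch below is a toy model with hypothetical node names, not Storm's topology builder API. It finds the spouts (nodes with no incoming edges) and walks the graph to list every bolt that receives data from a given spout.

```python
from collections import deque

# Topology as data: each node lists the nodes that consume its output.
# All node names are hypothetical.
TOPOLOGY = {
    "queue_spout": ["parse_bolt"],
    "parse_bolt": ["velocity_bolt", "geo_bolt"],
    "velocity_bolt": ["score_bolt"],
    "geo_bolt": ["score_bolt"],
    "score_bolt": [],
}

def spouts(topology):
    """Nodes with no incoming edges are the sources of the graph."""
    targets = {t for outs in topology.values() for t in outs}
    return sorted(n for n in topology if n not in targets)

def reachable(topology, start):
    """Breadth-first walk: every node that receives data originating at `start`."""
    seen, order, todo = {start}, [], deque([start])
    while todo:
        node = todo.popleft()
        for nxt in topology[node]:
            if nxt not in seen:
                seen.add(nxt)
                order.append(nxt)
                todo.append(nxt)
    return order

print(spouts(TOPOLOGY))
print(reachable(TOPOLOGY, "queue_spout"))
```

Because the graph here is only a dictionary, a revised topology could be swapped in without changing the code that walks it.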
Merchant A has a fraud problem that needs solving quickly. Merchant A can use our Shared Algorithm topology to immediately block common fraud problems.
Merchant A has been on our system for an extended period of time, and our system knows better what their fraud problem actually looks like. Our ML systems create a new topology to better deal with Merchant A’s fraud problem.
Hadoop + Storm
Hadoop: batch processing system that can churn through huge volumes of data
Storm: real-time complex event processing system that can process data streams
Speed Layer
Only new data
Compensates for high-latency ‘Serving Layer’ updates
‘Batch Layer’ overrides ‘Speed Layer’

Serving Layer
Loads and exposes the batch views for querying
Random access to batch views

Batch Layer
Immutable, constantly growing datasets
Batch views computed from this raw dataset
This gives us our Lambda Architecture.
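A minimal sketch of how a query might combine the layers, assuming the view is a per-merchant count of events (the metric, the event shape and the function names are hypothetical). The batch view is recomputed from the full dataset, the speed view covers only events that arrived after the last batch run, and a query adds the two together. When the next batch run completes, its result replaces the speed layer's for the period it covers.

```python
from collections import Counter

def compute_batch_view(master_dataset):
    """Batch layer: recompute the view from the full, append-only dataset."""
    return Counter(event["merchant"] for event in master_dataset)

def update_speed_view(speed_view, event):
    """Speed layer: incrementally fold in one new event."""
    speed_view[event["merchant"]] += 1

def query(batch_view, speed_view, merchant):
    """Serving side: batch result plus whatever the batch has not yet absorbed."""
    return batch_view[merchant] + speed_view[merchant]

master = [{"merchant": "A"}, {"merchant": "A"}, {"merchant": "B"}]
batch_view = compute_batch_view(master)

# Events that arrived after the last batch run.
speed_view = Counter()
recent = [{"merchant": "A"}, {"merchant": "C"}]
for event in recent:
    update_speed_view(speed_view, event)

before = query(batch_view, speed_view, "A")  # 2 from batch + 1 from speed

# Next batch run: the recent events join the master dataset, the batch view
# is recomputed, and the speed view is discarded (batch overrides speed).
master = master + recent
batch_view = compute_batch_view(master)
speed_view = Counter()
after = query(batch_view, speed_view, "A")

print(before, after)
```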
Real Time Big Data = Storm Process + Hadoop Process
Use the historical data produced by Hadoop to make your real-time results faster and more accurate
You can build this out in hours!
A simple combination of Azure Queues, SQL Azure, and Azure VMs running Cassandra, Hadoop and Storm