IPC Global: Big Data to Decision Solution Overview
Enterprise Intelligence
Big Data to Decisions
Pete Zybrick
Enterprise Solutions Architect
Cloudera Certified Developer for Apache Hadoop
IPC Global
T: [email protected]
Objectives
• Big Data to Decisions
• Cloudera CDH5
• AWS Elastic MapReduce
• Demonstrate End-to-End Example
• Overview of IPC Global Tools and Processes
Topics
• Data Source
• Randomly generated SiteCatalyst data (500K rows/day, 7 days, 554 columns)
• 2% Random Error Injection
• Process the Data: Hadoop
• Cloudera: Oozie job specification, MapReduce program
• AWS: EMR program
• Both call the same Hadoop Driver and Mapper programs
• Store Big Data: HDFS, Redshift, Delimited
• Selective Big Data Reduction
• Direct from Big Data: QVD
• Data Warehouse: MySQL
• Robust Application(s): QlikView
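
The data source and processing topics above can be illustrated with a minimal sketch. It generates wide tab-delimited rows, injects errors at the stated 2% rate, and applies a mapper-style validity check to each line. The column count and error rate come from the slide; everything else (all-integer fields, the `#ERR#` marker, the function names) is a hypothetical stand-in, not the IPC Global generator or its Hadoop Mapper.

```python
import random

NUM_COLUMNS = 554     # column count stated on the slide
ERROR_RATE = 0.02     # 2% random error injection, per the slide
BAD_TOKEN = "#ERR#"   # hypothetical marker for a corrupted field


def make_row(rng, row_id):
    """Build one synthetic row: an id followed by random integer fields (assumed types)."""
    return [str(row_id)] + [str(rng.randint(0, 9999)) for _ in range(NUM_COLUMNS - 1)]


def inject_error(rng, fields):
    """With probability ERROR_RATE, corrupt one non-id field in place."""
    if rng.random() < ERROR_RATE:
        fields[rng.randrange(1, len(fields))] = BAD_TOKEN
        return True
    return False


def is_valid(line):
    """Mapper-style check: expected column count and every field numeric."""
    fields = line.rstrip("\n").split("\t")
    return len(fields) == NUM_COLUMNS and all(f.isdigit() for f in fields)


def generate(num_rows, seed=0):
    """Return the generated TSV lines and the number of rows that were corrupted."""
    rng = random.Random(seed)
    lines, injected = [], 0
    for row_id in range(num_rows):
        fields = make_row(rng, row_id)
        if inject_error(rng, fields):
            injected += 1
        lines.append("\t".join(fields))
    return lines, injected


lines, injected = generate(2000, seed=42)
rejected = sum(1 for line in lines if not is_valid(line))
print(f"rows={len(lines)} injected={injected} rejected={rejected}")
```

Because every injected error breaks the numeric check, the rejected count should match the injected count. That gives a simple end-to-end test of the processing step: at the stated volumes (500K rows/day for 7 days, 3.5M rows in total) a 2% rate would put roughly 70K bad rows in front of the Mapper.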
Process Flow
[Process flow diagram. Components shown: Test Data Generator, Input Files, Live Data, Corp DB, Cloudera CDH5, AWS Elastic MapReduce, Oozie Job, EMR Job, MapReduce, HDFS, TSV Files, ToImpala, Impala, ToRedshift, Redshift, DailyDW, DataWarehouse, DailyQVD, QVD Files, QlikView, Power Data Users.]
IPC AWS Infrastructure
• Capabilities
• Cloudera CDH5 Cluster – ClouderaManager + Managed Nodes
• AWS Elastic MapReduce – Dynamic launch of Hadoop cluster – Run Till Done
• Database Servers – RDS, MySQL, On Demand, QLIK
• VPN Integration with Client Network
• Rapid POC and Test Turnaround
Development / Testing
• Big Data Test Generator
• Economically Generate Millions Of Rows Of Test Data Within Hours
• Runs as Cluster on AWS EC2 instances – Parallel Generation
• Configurable Random Data Types
• AWS Tools – Component Library
• Encapsulate Complex Mechanisms into Basic Calls
• Consistent Error Recovery
• Consistent Security Model
• Library of Demonstration Programs
• Working with Amazon SAs to Validate and Enhance
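
The test generator bullets above (parallel generation, configurable random data types) can be sketched as follows. This is an illustration under stated assumptions, not the IPC Global tool: the column spec, the type names, and the function names are hypothetical, and local threads stand in for the EC2 instances that the slide says run the generator as a cluster.

```python
import random
import string
from concurrent.futures import ThreadPoolExecutor

# Hypothetical column spec: (name, type) pairs. A real feed would list all of its columns.
COLUMN_SPEC = [("visit_id", "int"), ("page_name", "str"), ("revenue", "float")]

# One random-value generator per configurable type (assumed type names).
GENERATORS = {
    "int": lambda rng: str(rng.randint(0, 10**6)),
    "float": lambda rng: f"{rng.uniform(0, 500):.2f}",
    "str": lambda rng: "".join(rng.choice(string.ascii_lowercase) for _ in range(8)),
}


def generate_shard(shard_index, rows_in_shard, base_seed=0):
    """Generate one shard with its own seeded RNG so shards are independent and repeatable."""
    rng = random.Random(base_seed + shard_index)
    return [
        "\t".join(GENERATORS[kind](rng) for _, kind in COLUMN_SPEC)
        for _ in range(rows_in_shard)
    ]


def split_rows(total_rows, num_shards):
    """Divide total_rows across shards as evenly as possible."""
    base, extra = divmod(total_rows, num_shards)
    return [base + (1 if i < extra else 0) for i in range(num_shards)]


def generate_all(total_rows, num_shards):
    """Generate all shards concurrently; each worker here models one cluster node."""
    sizes = split_rows(total_rows, num_shards)
    with ThreadPoolExecutor(max_workers=num_shards) as pool:
        return list(pool.map(generate_shard, range(num_shards), sizes))


shards = generate_all(total_rows=1000, num_shards=4)
print([len(s) for s in shards])  # prints [250, 250, 250, 250]
```

Seeding each shard from its index keeps a run reproducible regardless of how many workers finish first, which is useful when a test data set has to be regenerated after a failure.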
Summary
• On-Premises, AWS, Hybrid – Rapid Turnaround
• Early Adopter – BI, AWS, Big Data
• Investing in Data to Decisions Pipeline
• Next Steps…