IPC Global: Big Data to Decision Solution Overview

Enterprise Intelligence: Big Data to Decisions. Pete Zybrick, Enterprise Solutions Architect, Cloudera Certified Developer for Apache Hadoop, IPC Global. T: 973-214-8820, pete.zybrick@ipc-global.com


Post on 30-Jul-2015



Objectives

• Big Data to Decisions

• Cloudera CDH5

• AWS Elastic MapReduce

• Demonstrate End-to-End Example

• Overview of IPC Global Tools and Processes

Topics

• Data Source

  • Randomly generated SiteCatalyst data (500K rows/day, 7 days, 554 columns)

  • 2% Random Error Injection

• Process the Data: Hadoop

  • Cloudera: Oozie job specification, MapReduce program

  • AWS: EMR program

  • Both call the same Hadoop Driver and Mapper programs

• Store Big Data: HDFS, Redshift, Delimited

• Selective Big Data Reduction

  • Direct from Big Data: QVD

  • Data Warehouse: MySQL

• Robust Application(s): QlikView
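The data source described above (wide tab-separated rows with a small share of deliberately bad records) can be sketched as follows. This is a minimal illustration, not the generator used in the demo: the function names, the integer field values, and the choice of "drop one field" as the injected error are all assumptions; only the 554-column width and the 2% error rate come from the slide.

```python
import random

NUM_COLUMNS = 554   # column count stated on the slide
ERROR_RATE = 0.02   # 2% random error injection, per the slide


def make_row(rng, num_columns=NUM_COLUMNS):
    """Build one tab-separated row of random integer fields (hypothetical layout)."""
    return "\t".join(str(rng.randint(0, 9999)) for _ in range(num_columns))


def inject_error(rng, row):
    """Corrupt a row by dropping one randomly chosen field (assumed error type)."""
    fields = row.split("\t")
    del fields[rng.randrange(len(fields))]
    return "\t".join(fields)


def generate_day(num_rows, seed, error_rate=ERROR_RATE):
    """Yield num_rows rows; each one is corrupted with probability error_rate."""
    rng = random.Random(seed)  # seeded so a day's data can be regenerated exactly
    for _ in range(num_rows):
        row = make_row(rng)
        if rng.random() < error_rate:
            row = inject_error(rng, row)
        yield row


def is_valid(row, num_columns=NUM_COLUMNS):
    """A row is valid here only if it has exactly the expected number of fields."""
    return len(row.split("\t")) == num_columns
```

Calling `generate_day(500_000, seed=day_number)` once per day would approximate the stated volume; seeding per day keeps each file reproducible, which helps when checking that the downstream job rejects the same rows on every platform.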

Process Flow

[Process flow diagram. Component labels recovered from the slide: Live Data, Test Data Generator, Input Files, Cloudera CDH5, Oozie Job, AWS Elastic MapReduce, EMR Job, MapReduce, HDFS, TSV Files, ToImpala, Impala, ToRedshift, Redshift, DailyDW, Data Warehouse, Corp DB, DailyQVD, QVD Files, QlikView, Power Data Users. MapReduce, TSV Files, DailyDW and DailyQVD each appear twice in the diagram.]
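The flow lists MapReduce under both the Oozie job and the EMR job, and the Topics slide states that both call the same Driver and Mapper programs. The structure of that reuse can be sketched as below. This is a Python illustration of the idea only; the actual programs are not shown in the deck, and `map_record`, `run_driver`, `oozie_entry`, `emr_entry`, and the column-count check are all hypothetical.

```python
def map_record(line, expected_columns=554):
    """Shared map step: classify one TSV line as good or rejected (assumed rule)."""
    text = line.rstrip("\n")
    if len(text.split("\t")) != expected_columns:
        return ("rejected", text)
    return ("good", text)


def run_driver(lines, expected_columns=554):
    """Shared driver: apply the mapper to every line and group the output by key."""
    output = {"good": [], "rejected": []}
    for line in lines:
        key, value = map_record(line, expected_columns)
        output[key].append(value)
    return output


def oozie_entry(lines, expected_columns=554):
    """Thin platform-specific wrapper: what an Oozie-launched job would call."""
    return run_driver(lines, expected_columns)


def emr_entry(lines, expected_columns=554):
    """Thin platform-specific wrapper: what an EMR step would call."""
    return run_driver(lines, expected_columns)
```

Keeping the platform-specific code down to a thin entry point means the same input should produce the same TSV output on either cluster, which is what makes a side-by-side comparison of the two paths meaningful.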

IPC AWS Infrastructure

• Capabilities

  • Cloudera CDH5 Cluster – Cloudera Manager + Managed Nodes

  • AWS Elastic MapReduce – Dynamic launch of Hadoop cluster – Run Till Done

  • Database Servers – RDS, MySQL, On Demand, QLIK

  • VPN Integration with Client Network

  • Rapid POC and Test Turnaround
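"Dynamic launch ... Run Till Done" corresponds to a transient EMR cluster that shuts itself down once its steps finish. A minimal sketch of such a request is shown below, shaped as keyword arguments for the `run_job_flow` call of the boto3 EMR client. The helper name, instance types, release label, and role names are placeholder assumptions, not values from the deck; the block only builds the request and does not contact AWS.

```python
def build_run_till_done_request(name, jar_uri, args, log_uri, instance_count=3):
    """Return keyword arguments shaped for boto3's EMR client run_job_flow call."""
    return {
        "Name": name,
        "LogUri": log_uri,
        "ReleaseLabel": "emr-5.36.0",            # placeholder release (assumption)
        "Instances": {
            "MasterInstanceType": "m5.xlarge",   # placeholder instance type
            "SlaveInstanceType": "m5.xlarge",    # placeholder instance type
            "InstanceCount": instance_count,
            # The cluster terminates itself once all steps have completed.
            "KeepJobFlowAliveWhenNoSteps": False,
            "TerminationProtected": False,
        },
        "Steps": [
            {
                "Name": f"{name}-step",
                # Shut the cluster down if the step fails, too.
                "ActionOnFailure": "TERMINATE_CLUSTER",
                "HadoopJarStep": {"Jar": jar_uri, "Args": list(args)},
            }
        ],
        "JobFlowRole": "EMR_EC2_DefaultRole",    # default role names (assumption)
        "ServiceRole": "EMR_DefaultRole",
    }
```

With credentials configured, the request could be submitted as `boto3.client("emr").run_job_flow(**request)`. Setting `KeepJobFlowAliveWhenNoSteps` to `False` is what gives the run-till-done behaviour: no cluster is left running after the job ends.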

Development / Testing

• Big Data Test Generator

  • Economically Generate Millions of Rows of Test Data Within Hours

  • Runs as Cluster on AWS EC2 instances – Parallel Generation

  • Configurable Random Data Types

• AWS Tools – Component Library

  • Encapsulate Complex Mechanisms into Basic Calls

  • Consistent Error Recovery

  • Consistent Security Model

  • Library of Demonstration Programs

  • Working with Amazon Solutions Architects (SAs) to Validate and Enhance
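One common way to get "consistent error recovery" behind "basic calls" is to route every remote operation through a single retry helper. The sketch below shows that pattern with exponential backoff; it is an assumption about how such a library might be organised, not the IPC Global component library itself, and `call_with_retry` and its parameters are hypothetical.

```python
import time


def call_with_retry(operation, max_attempts=3, base_delay=0.5,
                    retry_on=(Exception,), sleep=time.sleep):
    """Run operation(); on a listed exception, wait and retry with doubling delays.

    The last failure is re-raised once max_attempts has been reached.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except retry_on:
            if attempt == max_attempts:
                raise
            # Delay doubles each time: base_delay, 2 * base_delay, 4 * base_delay, ...
            sleep(base_delay * 2 ** (attempt - 1))
```

A caller would wrap each cloud operation in a zero-argument function, for example `call_with_retry(lambda: upload(path), retry_on=(ConnectionError,))`, where `upload` stands in for whatever call is being protected. Passing `sleep` as a parameter lets tests run without real delays.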

Summary

• On-Premises, AWS, Hybrid – Rapid Turnaround

• Early Adopter – BI, AWS, Big Data

• Investing in Data to Decisions Pipeline

• Next Steps…
