Delivering Apache Hadoop for the Modern Data Architecture
DESCRIPTION
Join Hortonworks and Cisco as we discuss trends and drivers for a modern data architecture. Our experts will walk you through some key design considerations when deploying a Hadoop cluster in production. We'll also share practical best practices around Cisco-based big data architectures and Hortonworks Data Platform to get you started on building your modern data architecture.
TRANSCRIPT
Page 1 © Hortonworks Inc. 2014
Delivering Apache Hadoop for the Modern Data Architecture
Cisco & Hortonworks. We do Hadoop Together
Page 2 © Hortonworks Inc. 2014
Our speakers…
Ajay Singh, Director, Technical Channels, Hortonworks
Sean McKeown, Solutions Architect, Data Center, Cisco
Page 3 © Hortonworks Inc. 2014
Why Hadoop: Traditional Data Architecture Pressured
2.8 ZB in 2012
85% from New Data Types
15x Machine Data by 2020
40 ZB by 2020
Data source: IDC
SOURCES
OLTP, ERP, CRM
Documents, Emails
Web Logs, Click Streams
Social Networks
Machine Generated
Sensor Data
Geolocation Data
Page 4 © Hortonworks Inc. 2014
What: Business Applications of Hadoop
Data types: Sensor, Server Logs, Text, Social, Geographic, Machine, Clickstream, Structured, Unstructured
Financial Services
New Account Risk Screens ✔ ✔
Trading Risk ✔
Insurance Underwriting ✔ ✔ ✔
Telecom
Call Detail Records (CDR) ✔ ✔
Infrastructure Investment ✔ ✔
Real-time Bandwidth Allocation ✔ ✔ ✔
Retail
360° View of the Customer ✔ ✔
Localized, Personalized Promotions ✔
Website Optimization ✔
Page 5 © Hortonworks Inc. 2014
What: Business Applications of Hadoop
Data types: Sensor, Server Logs, Text, Social, Geographic, Machine, Clickstream, Structured, Unstructured
Manufacturing
Supply Chain and Logistics ✔
Preventive Maintenance ✔
Crowd-sourced Quality Assurance ✔
Healthcare
Use Genomic Data in Medical Trials ✔ ✔
Monitor Patient Vitals in Real-Time
Pharmaceuticals
Recruit & Retain Patients for Drug Trials ✔ ✔
Improve Prescription Adherence ✔ ✔ ✔
Oil & Gas
Unify Exploration & Production Data ✔ ✔ ✔
Monitor Rig Safety in Real-Time ✔ ✔
Government
ETL Offload in Response to Budgetary Pressures ✔
Sentiment Analysis for Gov’t Programs ✔
Page 6 © Hortonworks Inc. 2014
OPERATIONS TOOLS
Provision, Manage & Monitor
DEV & DATA TOOLS
Build & Test
DATA SYSTEMS
APPLICATIONS
Repositories
ROOMS
Statistical Analysis
BI / Reporting, Ad Hoc Analysis
Interactive Web & Mobile Apps
Enterprise
Applications
RDBMS EDW MPP
How: Modern Data Architecture with Hadoop
Governance & Integration
Security
Operations
Data Access
Data Management
ENTERPRISE HADOOP
SOURCES
OLTP, ERP, CRM
Documents, Emails
Web Logs, Click Streams
Social Networks
Machine Generated
Sensor Data
Geolocation Data
Page 7 © Hortonworks Inc. 2014
YARN Transforms Hadoop’s Architecture
Enables deep insight across a large, broad, diverse set of data at efficient scale
Multi-Use Data Platform: Store all data in one place, process in many ways
Batch, Interactive, Iterative, Streaming
Store any/all raw data sources and processed data over extended periods of time.
YARN: Data Operating System
Page 8 © Hortonworks Inc. 2014
Designing Hadoop Cluster
§ Cluster Storage Capacity
§ Server Specification
§ Cluster Size
§ Factoring Performance
Key Considerations
§ Any piece of hardware can and will fail
§ More nodes means less impact on failure
§ Resiliency and fault tolerance improve with scale
§ Build resiliency through scale
§ Still use modern hardware
§ Software handles hardware failures
Page 9 © Hortonworks Inc. 2014
Storage Capacity
§ Key Input
  § Initial Data Size
  § 3 year YOY growth
  § Compression ratio
  § Intermediate and materialized views
  § Replication Factor
§ Note
  § Hard to accurately predict the size of intermediate & materialized views at the start of a project
  § Be conservative with compression ratio. Mileage varies by data type
  § Hadoop needs temp space to store intermediate files
Hadoop Cluster
Raw Data
Work In Process Data
Master Data
Materialized Views
Page 10 © Hortonworks Inc. 2014
Storage Capacity
Total Storage Required = (Initial Size + YOY Growth + Intermediate Data Size) × Replication Count × 1.2 / Compression Ratio
Good Rule of Thumb
§ Replication Count = 3
§ Compression Ratio = 4-5
§ Intermediate Data Size = 50%-100% of Raw Data Size
Note
§ The 1.2 factor is included in the sizing estimator to account for the temp space requirement of Hadoop
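The sizing formula on this slide can be sketched as a small Python function. This is a minimal illustration, not a Hortonworks tool: the function name, parameter names, and the example inputs (100 TB initial, 150 TB growth, 50 TB intermediate) are assumptions made up for the example; the defaults follow the slide's rule of thumb (replication 3, compression 4, temp-space factor 1.2).

```python
def total_storage_required(initial_size_tb, yoy_growth_tb, intermediate_size_tb,
                           replication_count=3, compression_ratio=4.0,
                           temp_space_factor=1.2):
    """Estimate raw cluster storage (TB) using the slide's sizing formula.

    All names here are illustrative; only the formula comes from the slide.
    """
    if compression_ratio <= 0:
        raise ValueError("compression_ratio must be positive")
    # Data that must be stored before replication and compression are applied.
    logical_tb = initial_size_tb + yoy_growth_tb + intermediate_size_tb
    return logical_tb * replication_count * temp_space_factor / compression_ratio


# Hypothetical inputs: 100 TB initial data, 150 TB of growth over three years,
# and intermediate data at 50% of the initial raw data size.
estimate_tb = total_storage_required(100, 150, 50)
print(f"Estimated storage: {estimate_tb:.1f} TB")
```

Because the slide advises being conservative with compression, it may be worth rerunning the estimate with a lower `compression_ratio` to see how sensitive the result is.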
Page 11 © Hortonworks Inc. 2014
Server Specification
§ Master Nodes – NameNode, Resource Manager, HBase Master
  § Dual Intel Xeon E5-26xx series processors
  § 128GB or 256GB RAM per chassis
  § 4+ – 1TB NL-SAS/SATA Drives RAID10+ Spares
§ Worker Nodes – DataNode, Node Manager and Region Server
  § Dual Intel Xeon E5-26xx series processors
  § 128GB RAM or 256GB RAM
  § 12 – 1-4 TB NL-SAS/SATA Drives
§ Gateway Nodes / Edge Nodes
  § Mirror of Master Nodes configuration
Page 12 © Hortonworks Inc. 2014
Cluster Size
Number of Data Nodes = Total Storage Required / Storage Per Server
Number of Master Nodes
§ Name Node, Zookeeper
§ Resource Manager, Zookeeper
§ Failover Name Node, HBase Master, Hive Server, Zookeeper
  § In a half-rack cluster, this would be combined with Resource Manager
§ Management Node (Ambari, Ganglia, Nagios)
  § In a half-rack cluster, this would be combined with the Name Node
Note
§ Large clusters may need more than 4 master nodes
§ Start at 2/4 and grow based on usage
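Reading this slide as "number of data nodes = total storage required / storage per server", the calculation can be sketched as below. The function name and the 270 TB input are hypothetical; the drive counts and sizes follow the worker-node specification given earlier in the deck (12 drives of 1 to 4 TB each). The result is rounded up because a fractional server cannot be deployed.

```python
import math


def data_nodes_needed(total_storage_tb, drives_per_server=12, drive_size_tb=4.0):
    """Return the number of data nodes needed to hold total_storage_tb.

    Illustrative helper: storage per server is drives x drive size.
    """
    storage_per_server_tb = drives_per_server * drive_size_tb
    if storage_per_server_tb <= 0:
        raise ValueError("storage per server must be positive")
    # Round up: a partial server still has to be a whole machine.
    return math.ceil(total_storage_tb / storage_per_server_tb)


# Hypothetical requirement of 270 TB of total storage.
print(data_nodes_needed(270, drive_size_tb=1.0))  # performance-style 1 TB drives
print(data_nodes_needed(270, drive_size_tb=4.0))  # archive-style 4 TB drives
```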
Page 13 © Hortonworks Inc. 2014
Factoring Performance
§ Data Nodes
  § 1 TB drives for performance clusters
  § 4 TB drives for archive clusters
§ Meeting SLA Requirements
  § Hadoop workloads are varied
  § Difficult to assess cluster size based on SLAs without actual testing
  § Good News: Hadoop performs linearly with scale
    § Enables one to design experiments using a fraction of data
§ Best Practice Guidance
  § Create a test configuration with a rack of servers
  § Load a slice of data
  § Run tests with real-life queries to measure performance & fine tune the system
  § Scale cluster size based on observed performance
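The best-practice guidance above (test on one rack with a slice of data, then scale) can be sketched as a simple extrapolation. This assumes the linear scaling the slide describes, which real workloads may only approximate; the function name and all example numbers are hypothetical.

```python
import math


def nodes_for_sla(test_nodes, test_data_tb, test_runtime_min,
                  full_data_tb, sla_min):
    """Estimate nodes needed to process full_data_tb within sla_min.

    Assumes runtime scales linearly with data volume and inversely with
    node count, based on one measured test run. Illustrative only.
    """
    if min(test_nodes, test_data_tb, test_runtime_min, full_data_tb, sla_min) <= 0:
        raise ValueError("all inputs must be positive")
    # node-minutes per TB observed in the test, applied to the full data set
    # and divided by the minutes the SLA allows.
    return math.ceil(
        test_nodes * test_runtime_min * full_data_tb / (test_data_tb * sla_min)
    )


# Hypothetical test: one rack of 16 servers processed a 10 TB slice in 60 minutes.
# Target: 100 TB within a 90-minute SLA.
print(nodes_for_sla(16, 10, 60, 100, 90))
```

An estimate like this is a starting point for the next test round, not a substitute for measuring the real queries at larger scale.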
Page 14 © Hortonworks Inc. 2014
HDP and Cisco are deeply integrated in the data center
OPERATIONAL TOOLS
DEV & DATA TOOLS
INFRASTRUCTURE
SOURCES: EXISTING Systems, Clickstream, Web & Social, Geolocation, Sensor & Machine, Server Logs, Unstructured
DATA SYSTEM: RDBMS, EDW, MPP, HANA
APPLICATIONS: BusinessObjects BI
HDP 2.1: Governance & Integration, Security, Operations, Data Access, Data Management, YARN
Deep Partnerships
Hortonworks and Cisco engage in deep engineered relationships with the leaders in the data center, such as Microsoft, Teradata, Red Hat, & SAP
Broad Partnerships
Over 600 partners work with Hortonworks to certify their applications to work with Hadoop so they can extend big data to their users
Page 15 © Hortonworks Inc. 2014
Cisco + Hortonworks Validated Design
Sean McKeown, Solutions Architect, Data Center, Cisco
Page 16 © Hortonworks Inc. 2014
Cisco + Hortonworks Validated Design
Page 17 © Hortonworks Inc. 2014
Cisco UCS Common Platform Architecture (CPA) Building Blocks for Big Data
UCS 6200 Series Fabric Interconnects
Nexus 2232 Fabric Extenders
UCS Manager
UCS C240 M3 Servers
LAN, SAN, Management
Page 18 © Hortonworks Inc. 2014
UCS + Hortonworks Reference Configurations
Cisco Common Platform Architecture Version 2 for Big Data
unformatted storage per rack for a total of 7.68 petabytes (PB) when scaled to a 10-rack configuration.
Capacity Optimized with Flash Memory
This is the industry’s first big data solution to accelerate performance with a transparent, high-performance flash-memory cache powered by LSI Nytro MegaRAID technology. The card’s 200 GB of flash memory can be used as a transparent cache tier for hard disk drives and operating system images, freeing all 12 hard disk drives for data. It offers 768 TB of unformatted storage and 3.12 TB of flash memory per rack, for a total of 7.68 PB and 31.25 TB of flash memory per domain. It is designed for big data applications including Cloudera, Hortonworks, Intel Distribution for Apache Hadoop, MapR, MarkLogic, Oracle NoSQL Database, ParAccel, and Pivotal Greenplum Database and Pivotal HD solutions.
Easy Ordering
Cisco UCS CPA v2 for Big Data is available through Cisco UCS Solution Accelerator Paks (Table 1). The program helps you quickly and easily deploy a powerful, secure big data environment in your enterprise without the expense entailed in designing and building your own custom solution. The solution scales by adding servers as needed.
For More Information
For more information about Cisco UCS big data solutions, please visit http://www.cisco.com/go/bigdata.
For more information about the Cisco UCS CPA v2 for Big Data, please visit http://blogs.cisco.com/datacenter/cpav2.
Visit the Cisco big data design zone at http://www.cisco.com/go/bigdata_design.
Table 1. Cisco CPA v2 for Big Data Includes Four Optimized Configurations

Performance Optimized (UCS-SL-CPA2-P)
Connectivity:
• 2 Cisco UCS 6248UP 48-Port Fabric Interconnects
• 2 Cisco Nexus® 2232PP 10GE Fabric Extenders
Management: Cisco UCS Manager
Servers: 8 Cisco UCS C240 M3 Rack Servers, each with:
• 2 Intel Xeon processors E5-2680 v2
• 256 GB of memory
• LSI MegaRaid 9271CV-8i card
• 24 900-GB 10K SFF SAS drives (168 TB total)

Performance and Capacity Balanced (UCS-SL-CPA2-PC)
Connectivity:
• 2 Cisco UCS 6296UP 96-Port Fabric Interconnects
• 2 Cisco Nexus 2232PP 10GE Fabric Extenders
Management: Cisco UCS Manager
Servers: 16 Cisco UCS C240 M3 Rack Servers, each with:
• 2 Intel Xeon processors E5-2660 v2
• 256 GB of memory
• LSI MegaRaid 9271CV-8i card
• 24 1-TB 7.2K SFF SAS drives (384 TB total)

Capacity Optimized (UCS-SL-CPA2-C)
Connectivity:
• 2 Cisco UCS 6296UP 96-Port Fabric Interconnects
• 2 Cisco Nexus 2232PP 10GE Fabric Extenders
Management: Cisco UCS Manager
Servers: 16 Cisco UCS C240 M3 Rack Servers, each with:
• 2 Intel Xeon processors E5-2640 v2
• 128 GB of memory
• LSI MegaRaid 9271CV-8i card
• 12 4-TB 7.2K LFF SAS drives (768 TB total)

Capacity Optimized with Flash Memory (UCS-SL-CPA2-CF)
Connectivity:
• 2 Cisco UCS 6296UP 96-Port Fabric Interconnects
• 2 Cisco Nexus 2232PP 10GE Fabric Extenders
Management: Cisco UCS Manager
Servers: 16 Cisco UCS C240 M3 Rack Servers, each with:
• 2 Intel Xeon processors E5-2660 v2
• 128 GB of memory
• Cisco UCS Nytro MegaRAID 200-GB Controller
• 12 4-TB 7.2K LFF SAS drives (768 TB total)
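As a quick check on how the per-rack and per-domain figures relate, the arithmetic for the Capacity Optimized configuration can be sketched as follows. The function name is made up for illustration, and the sketch assumes decimal units (1 PB = 1000 TB) and the 10-rack domain described in the datasheet text.

```python
def raw_capacity_tb(servers, drives_per_server, drive_size_tb):
    """Raw (unformatted) capacity in TB: servers x drives x drive size."""
    return servers * drives_per_server * drive_size_tb


# Capacity Optimized rack: 16 servers, each with 12 drives of 4 TB.
rack_tb = raw_capacity_tb(servers=16, drives_per_server=12, drive_size_tb=4)

# A single domain scales to 10 racks; convert TB to PB with decimal units.
racks_per_domain = 10
domain_pb = rack_tb * racks_per_domain / 1000
domain_servers = 16 * racks_per_domain

print(f"{rack_tb} TB per rack, {domain_pb:.2f} PB and {domain_servers} servers per domain")
```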
Page 19 © Hortonworks Inc. 2014
Installing Servers Today
LAN
SAN
• RAID settings
• Disk scrub actions
• Number of vHBAs
• HBA WWN assignments
• FC Boot Parameters
• HBA firmware
• FC Fabric assignments for HBAs
• QoS settings
• Border port assignment per vNIC
• NIC Transmit/Receive Rate Limiting
• VLAN assignments for NICs
• VLAN tagging config for NICs
• Number of vNICs
• PXE settings
• NIC firmware
• Advanced feature settings
• Remote KVM IP settings
• Call Home behaviour
• Remote KVM firmware
• Server UUID
• Serial over LAN settings
• Boot order
• IPMI settings
• BIOS scrub actions
• BIOS firmware
• BIOS Settings
Page 20 © Hortonworks Inc. 2014
UCS Service Profiles
LAN
SAN
Service Profile
Page 21 © Hortonworks Inc. 2014
Abstracting the Logical Architecture
[Figure: physical view ("What you see") versus logical view ("What you get"). Labels: Server, Chassis, Adapter, 10GE A, FEX A, Eth 1/1, Switch 6200-A, Physical Cable, Virtual Cable (VN-Tag), Service Profile (Server), vNIC 1, vEth 1, vHBA 1, vFC 1, Cables]
✔ Dynamic, Rapid Provisioning
✔ State abstraction
✔ Location Independence
✔ Blade or Rack
Page 22 © Hortonworks Inc. 2014
Cisco UCS: Physical Architecture
6200 Fabric A
6200 Fabric B
B200 VIC
FEX B
FEX A
SAN A SAN B ETH 1 ETH 2
MGMT MGMT
Chassis 1
Fabric Switch
Fabric Extenders
Uplink Ports
Compute Blades Half / Full width
OOB Mgmt
Server Ports
Virtualized Adapters
Cluster
Rack Mount C240
VIC
FEX A FEX B
Page 23 © Hortonworks Inc. 2014
Simple Scalability
Single Rack: 16 servers
Single Domain: up to 10 racks, 160 servers, 7 PB
Multiple Domains
L2/L3 Switching
Page 24 © Hortonworks Inc. 2014
Proven performance and linear scalability
Page 25 © Hortonworks Inc. 2014
Simplified Management Throughout Cluster Lifecycle
Provisioning
Monitoring
Maintenance
Growth
UCSM provides:
• Speed
• Ease of experimentation
• Consistency
• Simplicity
• Visibility
Page 26 © Hortonworks Inc. 2014
Complete Network Flexibility
Example:
• vNIC0 for management
• vNIC1 for internal
• vNIC2 for external
• No OS bonding needed with Fabric Failover
Configure as many vNICs and VLANs as you need with the click of a mouse
Data ingress/egress
VNIC 0
VNIC 0
VNIC 1
L2/L3 Switching
Data Node 1
VNIC 2
Data Node 2
6200 A
VNIC 2
6200 B
VNIC 1
Page 27 © Hortonworks Inc. 2014
Creating QoS Policies and Enabling Jumbo Frames
Best Effort policy for management VLAN
Platinum policy for cluster VLAN
Page 28 © Hortonworks Inc. 2014
Switch Buffer Usage With Network QoS Policy to Prioritize HBase Read Operations
[Chart: Read Latency Comparison of Non-QoS vs. QoS Policy. Y-axis: Latency (us), 0 to 40000; X-axis: Time. Series: READ Average Latency (us); QoS READ Average Latency (us)]
[Chart: switch buffer usage. Y-axis: Buffer Used; X-axis: Timeline. Series: Hadoop TeraSort; HBase]
~60% Read Improvement
HBase + Hadoop Map Reduce (Terasort)
Page 29 © Hortonworks Inc. 2014
UCS Rack-Mount Servers
UCS Blade Servers
UCS Common Platform Architecture with Hortonworks
SAN/NAS Arrays
Enterprise Applications
Single Platform for Traditional and Big Data Applications
Page 30 © Hortonworks Inc. 2014
THANK YOU