2015 SNIA Analytics and Big Data Summit. © NetApp All Rights Reserved.

Successfully Deploying Alternative Storage Architectures for Hadoop

Gus Horn Iyer Venkatesan

NetApp

Agenda

Hadoop and storage
Alternative storage architecture for Hadoop
Use cases and customer examples
Guidelines and best practices
NFS Connector for Hadoop
Conclusion and next steps

Hadoop and Storage

Traditional Hadoop Storage Flow

Ingest (logs, images, text) goes to DataNode A
Ingest is replicated to DataNode B and DataNode C (replication R=3)

[Figure: Name Node, network switch, and Data Nodes A, B, and C; blocks data1 to data4 ingested on Data Node A are replicated to Data Nodes B and C.]

Implications of three copies

Network congestion
Server congestion, RAM utilization

Hadoop and memory
– Memory issues are a large part of support calls (root cause = server memory contention)
– Reducing server replication reduces memory consumption for a more reliable, faster cluster

Server replication can be messy
– Hadoop uses server-based replication to keep three copies
– Causes high levels of I/O over the server system bus
– Causes poor disk utilization (1/3 of raw capacity)

[Figure: Servers A, B, and C connected over a network, each with an I/O controller, memory controller, CPU, disk drives, and memory (RAM, DIMM). Each server holds the master of its own LUN and copies of the other two (LUN-A, LUN-B, LUN-C).]
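
The cost of keeping three copies can be sketched with simple arithmetic. The Python below is a minimal illustration, not a sizing tool: the function name `replication_cost` and the 10 TB ingest figure are assumptions made for the example, and the model ignores compression, temporary data, and filesystem overhead.

```python
def replication_cost(ingest_tb: float, replicas: int) -> dict:
    """Estimate disk and replication traffic for one ingest (illustrative model)."""
    if replicas < 1:
        raise ValueError("replicas must be at least 1")
    return {
        # Every replica occupies raw disk somewhere in the cluster.
        "raw_tb": ingest_tb * replicas,
        # Every copy beyond the first is shipped to another server.
        "replica_traffic_tb": ingest_tb * (replicas - 1),
        # Share of raw capacity that holds unique data.
        "usable_fraction": 1 / replicas,
    }


# Hypothetical 10 TB ingest with the default of three copies.
three_copies = replication_cost(10.0, 3)
print(three_copies)
```

With three copies, the 10 TB ingest consumes 30 TB of raw disk and sends 20 TB between servers, which is where the 1/3 utilization figure comes from.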

Alternative DAS Architecture

Dedicated storage with E-Series
External DAS architecture

Higher capacity and density
– 180TB in 4U
– Less footprint in datacenter

Two copies of data (not three)
– Less network congestion, better throughput
– Less data to manage, higher efficiency

High availability for Hadoop
– Reliable NameNode protection
– Jobs continue when nodes go off-line
– Faster cluster recovery

NetApp Storage Layout for HDFS

Two 7-disk RAID 5 groups with two LUNs per node
Dedicated set of disks per DataNode
Shared-nothing architecture
Spare disks shared globally
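
To see how this layout compares with three plain copies on internal disks, the usable share of raw capacity can be estimated as RAID efficiency divided by the HDFS replica count. This is a rough sketch under stated assumptions: each 7-disk RAID 5 group gives up one disk's worth of capacity to parity, HDFS keeps two copies on the RAID-protected LUNs, and the white-box case uses unprotected disks with three copies. Spare disks, formatting overhead, and map output space are ignored, and the helper name `usable_fraction` is hypothetical.

```python
def usable_fraction(disks_per_group: int, parity_disks: int, replicas: int) -> float:
    """Fraction of raw disk capacity that holds unique data (illustrative model)."""
    if not 0 <= parity_disks < disks_per_group:
        raise ValueError("parity_disks must be between 0 and disks_per_group - 1")
    if replicas < 1:
        raise ValueError("replicas must be at least 1")
    raid_efficiency = (disks_per_group - parity_disks) / disks_per_group
    return raid_efficiency / replicas


# 7-disk RAID 5 group (6 data + 1 parity) with two HDFS copies.
external_das = usable_fraction(disks_per_group=7, parity_disks=1, replicas=2)

# Plain internal disks (no RAID) with three HDFS copies.
white_box = usable_fraction(disks_per_group=1, parity_disks=0, replicas=3)

print(f"external DAS: {external_das:.1%}, white box: {white_box:.1%}")
```

Under these assumptions the RAID 5 plus two-copy layout stores unique data on 3/7 of raw capacity, compared with 1/3 for three plain copies.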

Use Cases

Service Provider Leveraging Hadoop

Significant growth in network log data from remote data centers that couldn’t be consolidated
Analytical queries couldn’t be run with existing tools – stakeholders couldn’t access data

[Figure: remote servers and central servers feed an analytics solution made up of Hadoop HDFS/MapReduce, archiving and indexing tools, and a UI + search tool used by analysts and business users.]

Faster consolidation, indexing, searching of log data
Information needed for auditing and compliance
New analytics capabilities
Eight-node Hadoop cluster with open source search, indexing tools

Security Use Case in Government

Challenges
– Protect IT/data assets from cyber attacks
– Implementation: how to combine big data with cyber analytics

Benefits
– Defensive perimeter around financial data to thwart potential attacks
– Better situational awareness
– Required both Hadoop and custom analytical application for complete solution

[Figure label: Customer analytics application]

Alternative Architecture in Healthcare

Challenges
– Extract, Transform, Load (ETL) offload for increasing amounts of unstructured data
– Integration of Hadoop with traditional systems

Benefits
– Cost-effective ingest solution for semi-structured and unstructured data
– New treatment analytics capabilities
– Highly available Hadoop cluster

[Figure: Hadoop, data warehouse, and business intelligence; data sources are images, insurance claims, and patient records.]

Other customers and use cases

Manufacturing: electronics, industrial coating
High Tech: semiconductor design and packaging, networking
Healthcare: hospitals, pharmaceutical, managed healthcare, clinical testing
Transportation: airline, automotive
Consumer: retail, household goods
Financial Services: insurance, banking, mobile payments
Government: education, security
Telco/SP: wireless hotspots, logs analysis

Advantages of Alternative Architecture

| Feature | External or Managed DAS | White Box DAS |
| Replication count | 2: reduction of hardware required by one third; single copy planned | 3 minimum |
| Application availability | Enterprise hardware RAID 5, 6 and Dynamic Disk Pools; much higher uptime (five nines) | Slower recovery from disk drive failure and NameNode failure; less uptime |
| Performance | Consistent performance during “healthy and unhealthy” modes of operation; 33% less network traffic | Degradation of up to 240% with single drive failure |
| Fan-In Ratio | Up to 8:1 (nodes per E-Series); SAS options: I-Band, FC | Limited scalability, only with internal drives |
| Solution Architecture | Validated designs and Technical Reports, expediting time to market and reducing risk | Iterative, time-consuming tuning process; multiple failure points; resource intensive |
| Growth Flexibility | Storage and compute decoupled; non-disruptive lifecycle management | Can only grow both simultaneously; disruptive migration and rebalancing |
| DataNode Management | Non-disruptive DataNode replacement; no rebalancing or migration | Disruptive DataNode replacement: must rebalance and/or migrate content |

Best practices from customer use cases

Start with the use case or business problem to leverage new data sources

Determine the workload, technologies, infrastructure

Enhance or update your data warehouse and BI tools (ETL offload and active archiving)

Think about redesigning or updating the analytic platform

Best Practices

Minimize network overhead
– Replication factor of 2 and RAID 5
– Use compression wherever possible

Storage and Hadoop optimization
– Start with 4:1 storage to compute ratio
– Allocate 30% of storage capacity to map output
– Disk group layout

Turn on rack awareness
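
Rack awareness in Hadoop is commonly enabled by pointing the `net.topology.script.file.name` property at an executable that receives host names or IP addresses as arguments and prints one rack path per argument. The Python script below is a minimal sketch of such a topology script; the subnet prefixes and rack names are hypothetical placeholders, and a real cluster would substitute its own mapping.

```python
import sys

# Hypothetical subnet-prefix-to-rack mapping; replace with the real cluster layout.
RACKS = {
    "10.0.1.": "/dc1/rack1",
    "10.0.2.": "/dc1/rack2",
}
# Fallback for hosts that match no known prefix.
DEFAULT_RACK = "/default-rack"


def resolve_rack(host: str) -> str:
    """Return the rack path for a host name or IP address."""
    for prefix, rack in RACKS.items():
        if host.startswith(prefix):
            return rack
    return DEFAULT_RACK


def main(hosts: list) -> None:
    # Hadoop may pass several hosts per call; print one rack per argument, in order.
    print(" ".join(resolve_rack(host) for host in hosts))


if __name__ == "__main__":
    main(sys.argv[1:])
```

Keeping the mapping in one dictionary makes it easy to review against the physical rack layout before the script is rolled out.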

Best Practices

Use E5560 (or later) as storage array, supporting four DataNodes

Use FAS22xx for diskless and network boot, storage administration

Separate network for data; separate for node interconnect

Use Jumbo Frames and 10GbE

Determine DataNodes by storage and job run requirements
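
One way to read the last guideline is that the DataNode count is the larger of what storage needs and what job run time needs, with arrays then sized at four DataNodes each. The sketch below illustrates that reading only; the function name `size_cluster`, the per-node usable capacity, and the sample inputs are assumptions, not figures from the presentation.

```python
import math


def size_cluster(dataset_tb: float, usable_tb_per_node: float,
                 nodes_for_jobs: int, nodes_per_array: int = 4) -> tuple:
    """Return (data_nodes, storage_arrays) for a simple sizing model."""
    if usable_tb_per_node <= 0 or nodes_per_array <= 0:
        raise ValueError("capacities and fan-in must be positive")
    # Nodes needed just to hold the data.
    nodes_for_storage = math.ceil(dataset_tb / usable_tb_per_node)
    # The cluster must satisfy both the storage and the job run requirement.
    data_nodes = max(nodes_for_storage, nodes_for_jobs)
    # Each storage array serves a fixed number of DataNodes (four by default).
    arrays = math.ceil(data_nodes / nodes_per_array)
    return data_nodes, arrays


# Hypothetical inputs: 200 TB of data, 20 TB usable per node, 8 nodes needed for jobs.
print(size_cluster(200, 20, 8))  # prints (10, 3)
```

In this example storage is the binding constraint; with a smaller dataset the job run requirement would set the node count instead.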

Best practices (continued)

Start a POC or pilot sooner rather than later
– POC is for business validation
– Pilot is for technology validation

Focus on performance after deployment
Application and cluster size determine most of the configuration

Putting the Stack Together

Storage and File Systems

Servers, Networking, Hardware

Data Management

Applications and Analytics

Reporting/Dashboard/Visualization

Scenario for storage and analytics

1) Data is sitting on FAS, NFS-based storage
2) If Hadoop or Map Reduce analysis is needed, HDFS-based storage has to be created
3) Data has to be moved to newly created Hadoop storage
4) Analysis can now be done on data

[Figure: enterprise data on NetApp FAS NFS-based storage, alongside the Hadoop analytics stack (HDFS, YARN, Map-Reduce, HBase, Spark), with steps 1 to 4 marked. Hadoop diagram courtesy Hortonworks.]

Introducing NetApp NFS Connector

Map Reduce analytics natively on data sitting on FAS, NFS-based storage
NFS Connector is a thin software application between Map Reduce and NFS

[Figure: the NFS Connector sits between the Hadoop analytics stack (HDFS, YARN, Map-Reduce, HBase, Spark) and enterprise data on NetApp FAS NFS-based storage, so analytics run directly on NFS data. Hadoop diagram courtesy Hortonworks.]

Next Steps

Download information at netapp.com/hadoop
– Technical Reports, Solution Guides, Cisco Validated Designs, Solution Briefs

Start a POC
Engage NetApp or partner

Contact us: [email protected] or [email protected] or NetApp System Engineer

Thank You!
