Running Hadoop as Service in AltiScale Platform

Experiences in running Hadoop As A Service [email protected] = #HadoopSherpa DAVID CHAIKEN • 21 NOVEMBER 2014

Upload: inmobi-technology

Post on 14-Jul-2015


Page 1: Running Hadoop as Service in AltiScale Platform

Experiences in running Hadoop As A Service [email protected] = #HadoopSherpa

DAVID CHAIKEN • 21 NOVEMBER 2014

Page 2: Running Hadoop as Service in AltiScale Platform

Talk Outline

Altiscale Company Introduction and Perspective

Altiscale Architecture

Use Cases: Performance, Job Analysis, Scheduling

Infinite Hadoop

Challenges to the Hadoop Community

Copyright © 2014 Altiscale, Inc.

Page 3: Running Hadoop as Service in AltiScale Platform

Corporate Background

Hadoop-as-a-Service (HaaS) innovator

Company founded in 2012 (Palo Alto & Chennai)

Founding team from Yahoo
• Raymie Stata, CEO, Former CTO
• David Chaiken, CTO, Former Chief Architect
• Charles Wimmer, Head of Operations, Former SRE

Employees from Yahoo, Google, Netflix, LinkedIn, VMware and others

Top-tier investors

Page 4: Running Hadoop as Service in AltiScale Platform

Altiscale Chennai

Long-term colleagues from Yahoo and before

IIT Madras Research Park (back gate of IIT-M)

Architecture, Core Development, Test (Apache Bigtop)

Control Plane agile development, 2-week sprints

Next: Test++, Customer Support, Operations

Page 5: Running Hadoop as Service in AltiScale Platform

Everybody Loves Hadoop But…

Significant capital expenditure on infrastructure

Complex to manage and maintain

Time to get cluster up and running is long

Capacity planning is difficult

Skillset is difficult to recruit, train and retain

What about the cloud?

Page 6: Running Hadoop as Service in AltiScale Platform

True Hadoop-as-a-Service

Altiscale is the industry’s first purpose-built, petabyte-scale Hadoop cloud

• Altiscale operates Hadoop for you
• Infrastructure optimized to run Hadoop fast and reliably
• Pay for Hadoop service, not infrastructure

Page 7: Running Hadoop as Service in AltiScale Platform

We Team With You To Help Deliver Insights

Customer: potential insights from a flood of data generated by the connected world

+

Altiscale: our Operations Team and Hadoop Cloud help realize those insights

Page 8: Running Hadoop as Service in AltiScale Platform

Customers

Page 9: Running Hadoop as Service in AltiScale Platform

How We Do It

Diagram: Virtual Hadoop Cluster (YARN Service, HDFS Service); Pre-configured Apps (Hive, Pig, Oozie, More Apps); Data Connect (File Transfer, Kafka, Flume)

• Your data is migrated to HDFS and a virtual Hadoop cluster in our cloud
• We optimize the job to complete fast and cost-effectively
• Our Hadoop Helpdesk gives you access to Hadoop experts
• Our Hadoop Operations Team maintains the cluster and plans the job
• Our team monitors and manages the job through to completion
• We provide an uptime SLA so our Hadoop cloud is always available

Page 10: Running Hadoop as Service in AltiScale Platform

Altiscale Architecture: Data and Control Planes

Page 11: Running Hadoop as Service in AltiScale Platform

Altiscale Architecture: Data and Control Planes

Page 12: Running Hadoop as Service in AltiScale Platform

Altiscale Architecture: Customer Environments

Page 13: Running Hadoop as Service in AltiScale Platform

Altiscale Architecture: O&O Hadoop Cluster

Page 14: Running Hadoop as Service in AltiScale Platform

Altiscale Architecture: Host Components

Page 15: Running Hadoop as Service in AltiScale Platform

Altiscale Architecture: Workbenches

Page 16: Running Hadoop as Service in AltiScale Platform

Altiscale Architecture: Data Transfer

Page 17: Running Hadoop as Service in AltiScale Platform

Altiscale Architecture: Portal and REST API

Page 18: Running Hadoop as Service in AltiScale Platform

Altiscale Architecture: Control Plane Databases

Page 19: Running Hadoop as Service in AltiScale Platform

Altiscale Architecture: Control Plane Services

Page 20: Running Hadoop as Service in AltiScale Platform

Altiscale Architecture: Hadoop-Based Analysis

Page 21: Running Hadoop as Service in AltiScale Platform
Page 22: Running Hadoop as Service in AltiScale Platform

Hadoop as a Service Offering

1. Data is migrated to our HDFS service
• HDFS Service
• Data Connectors

2. Terminal access to Hadoop cluster and associated apps
• Core Apps: Apache Hive, Apache Pig, Apache Oozie, Apache HCatalog, Apache Flume, R, JDK/JRE, Python, HttpFS, FUSE, LZOP, Snappy, gzip
• Foundry Apps: Apache Mahout, Cascading, Revolution R, Kafka/Camus, Avro, Pentaho Kettle, Matlab, Spark, Sqoop, H2O

3. Portal provides job status, billing and support information

Page 23: Running Hadoop as Service in AltiScale Platform

Challenges…

Page 24: Running Hadoop as Service in AltiScale Platform

Performance Challenges…

• Disks: Configuration, Controllers, Density, Cost
• Network: Jumbo Packet MTU
• Memory: echo never > /sys/kernel/mm/redhat_transparent_hugepage/enabled
• Network: When does locality matter?
• Flash: When to use SSD?
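The memory item on this slide disables transparent huge pages on a Red Hat kernel. As a minimal sketch (not from the talk), the Python below reads the setting back. The kernel reports the available modes on one line and brackets the active one, for example `always madvise [never]`. The helper names are hypothetical, and the second path is the mainline kernel location, included only as a fallback assumption.

```python
import re
from pathlib import Path

# First path is the one quoted on the slide (Red Hat kernels).
# Second path is the mainline location, listed as a fallback assumption.
THP_PATHS = [
    Path("/sys/kernel/mm/redhat_transparent_hugepage/enabled"),
    Path("/sys/kernel/mm/transparent_hugepage/enabled"),
]


def active_thp_mode(sysfs_text):
    """Return the bracketed (active) mode from the sysfs line, or None."""
    match = re.search(r"\[(\w+)\]", sysfs_text)
    return match.group(1) if match else None


def read_thp_mode():
    """Return the active mode from the first sysfs path that exists, else None."""
    for path in THP_PATHS:
        if path.exists():
            return active_thp_mode(path.read_text())
    return None


if __name__ == "__main__":
    print("transparent hugepage mode:", read_thp_mode())
```

After the `echo never` command has been applied, `read_thp_mode()` would be expected to return `never` on a host that exposes one of these paths.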

Page 25: Running Hadoop as Service in AltiScale Platform

Customer Case Study: Analyze Query

• Customer provided Hive query + data sets (100GBs to ~5 TBs)
• Needed help optimizing the query
• Didn’t rewrite query immediately
• Wanted to characterize query performance and isolate bottlenecks first

Page 26: Running Hadoop as Service in AltiScale Platform

Analyze and Tune Execution

Ran original query on the datasets in our environment:
• Two M/R Stages: Stage-1, Stage-2

Long running reducers run out of memory
• set mapreduce.reduce.memory.mb=5120
• Reduces slots and extends reduce time

Query fails to launch Stage-2 with out of memory
• set HADOOP_HEAPSIZE=1024 on client machine

Query has 250,000 Mappers in Stage-2 which causes failure
• set mapred.max.split.size=5368709120 to reduce Mappers
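The split-size value on this slide is easier to read once converted: 5368709120 bytes is exactly 5 GiB. The sketch below checks that and gives a rough estimate of how a larger maximum split size shrinks the mapper count. The 5 TiB input is an assumption taken from the upper end of the data set sizes mentioned earlier, and the real number of mappers also depends on the input format and file layout, so treat the result as an order-of-magnitude figure only.

```python
GIB = 1024 ** 3
TIB = 1024 ** 4

# Value from the slide, in bytes.
MAX_SPLIT_SIZE = 5368709120


def estimate_mappers(input_bytes, max_split_bytes):
    """Rough estimate: one mapper per full-size split (integer ceiling division)."""
    return -(-input_bytes // max_split_bytes)


if __name__ == "__main__":
    print("split size is 5 GiB:", MAX_SPLIT_SIZE == 5 * GIB)
    # Assumed input size: 5 TiB, the upper end of the customer's data sets.
    print("estimated mappers:", estimate_mappers(5 * TIB, MAX_SPLIT_SIZE))
```

Under these assumptions the estimate is on the order of a thousand mappers instead of 250,000.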

Page 27: Running Hadoop as Service in AltiScale Platform

Analysis: Job Execution Characteristics

Next challenge: how to visualize job execution?
Existing Hadoop/Hive logs not sufficient for this task
Wrote internal tools
• parse job history files
• plot mapper and reducer execution
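The internal tools themselves are not shown in the talk. As a hedged sketch of the plotting step only, the code below assumes the job history has already been parsed into hypothetical `(start, finish)` pairs per task and derives the number of concurrently running tasks over time, which is the series one would plot for mappers or reducers.

```python
def running_tasks_timeline(tasks):
    """Return [(time, running_count), ...] from (start, finish) pairs.

    `tasks` is a hypothetical, already-parsed list of task intervals;
    parsing real job history files is out of scope for this sketch.
    """
    events = []
    for start, finish in tasks:
        events.append((start, 1))    # task starts running
        events.append((finish, -1))  # task stops running

    timeline = []
    running = 0
    for time, delta in sorted(events):
        running += delta
        if timeline and timeline[-1][0] == time:
            # Several events at the same instant collapse into one point.
            timeline[-1] = (time, running)
        else:
            timeline.append((time, running))
    return timeline


if __name__ == "__main__":
    sample = [(0, 10), (5, 15), (10, 20)]
    for time, running in running_tasks_timeline(sample):
        print(time, running)
```

A single task that keeps the count at 1 long after the others have finished shows up as a long tail in this series.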

Page 28: Running Hadoop as Service in AltiScale Platform

Analysis: Map (Stage-1)

Page 29: Running Hadoop as Service in AltiScale Platform

Analysis: Reduce (Stage-1) Long Tail

Single reduce task

Page 30: Running Hadoop as Service in AltiScale Platform

Analysis: Map (Stage-2)

Page 31: Running Hadoop as Service in AltiScale Platform

Analysis: Reduce (Stage-2)

Page 32: Running Hadoop as Service in AltiScale Platform

Analysis Execution: Findings

Lone, long running reducer in first stage of query
Analyzed input data:
• Query split input data by userId
• Bucketizing input data by userId
• One very large bucket: “invalid” userId
• Discussed “invalid” userId with customer

An error value is a common pattern!
• Need to differentiate between “don’t know and don’t care” and “don’t know and do care.”
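The finding on this slide is key skew: one userId value owns most of the records, so one reducer does most of the work. A minimal sketch of how such a bucket could be spotted, assuming the keys are available as a plain Python iterable (the function name and sample data are hypothetical, not the tooling used in the talk):

```python
from collections import Counter


def key_skew(keys, top_n=3):
    """Return the top_n most frequent keys with their share of all records."""
    counts = Counter(keys)
    total = sum(counts.values())
    return [(key, count / total) for key, count in counts.most_common(top_n)]


if __name__ == "__main__":
    # Hypothetical sample: an "invalid" sentinel dominates the key space.
    sample = ["invalid"] * 8 + ["u1", "u2"]
    for key, share in key_skew(sample):
        print(f"{key}: {share:.0%} of records")
```

A key whose share is far above the rest is a candidate for filtering or separate handling, subject to the "don't care" versus "do care" question raised above.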

Page 33: Running Hadoop as Service in AltiScale Platform

Interactive (DRAM-centric) Processing Systems

• Loading data into DRAM makes processing fast!
• Examples: Spark, Impala, 0xdata, …, [SAP HANA], …
• Streaming systems (Storm, DataTorrent) may be similar
• Need to increase YARN container memory size

Page 34: Running Hadoop as Service in AltiScale Platform

Hive + Interactive: Watch Out for Container Size

Caution: larger YARN container settings for interactive jobs may not be right for batch systems like Hive
Container size: needs to combine vcores and memory:
• yarn.scheduler.maximum-allocation-vcores
• yarn.nodemanager.resource.cpu-vcores
• ...
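To see why container size has to be considered in both dimensions, the sketch below computes how many containers of a given shape fit on one node when both memory and vcores are limiting. All numbers are hypothetical, and whether vcores are actually enforced depends on how the scheduler is configured, so this is an illustration of the arithmetic rather than a model of YARN.

```python
def containers_per_node(node_memory_mb, node_vcores,
                        container_memory_mb, container_vcores):
    """Containers that fit on one node when memory and vcores both limit."""
    by_memory = node_memory_mb // container_memory_mb
    by_vcores = node_vcores // container_vcores
    return min(by_memory, by_vcores)


if __name__ == "__main__":
    # Hypothetical node: 64 GB of container memory and 16 vcores.
    node_mb, node_cores = 65536, 16
    # Hypothetical batch (Hive-style) container: 2 GB, 1 vcore.
    print("batch:", containers_per_node(node_mb, node_cores, 2048, 1))
    # Hypothetical interactive container: 16 GB, 4 vcores.
    print("interactive:", containers_per_node(node_mb, node_cores, 16384, 4))
```

With these assumed values the batch shape is limited by vcores while the interactive shape is limited equally by both, which is the kind of mismatch the caution above points at.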

Page 35: Running Hadoop as Service in AltiScale Platform

Hive + Interactive: Watch Out for Fragmentation

• Attempting to schedule interactive systems and batch systems like Hive on the same cluster may result in fragmentation
• Interactive systems may require all-or-nothing scheduling
• Batch jobs with many small tasks may starve interactive jobs
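Fragmentation here means the cluster has enough free memory in total, but not in large enough pieces. The sketch below illustrates that with hypothetical per-node free memory, plus a simple all-or-nothing (gang-style) placement check. It uses first-fit placement and is not how the YARN schedulers are implemented.

```python
def can_place(free_mb_per_node, container_mb):
    """True if at least one node has room for a single container."""
    return any(free >= container_mb for free in free_mb_per_node)


def place_all_or_nothing(free_mb_per_node, container_mb, count):
    """Place `count` equal containers first-fit, or place none.

    Returns the remaining free memory per node on success, None otherwise.
    The input list is not modified.
    """
    free = list(free_mb_per_node)
    for _ in range(count):
        for index, available in enumerate(free):
            if available >= container_mb:
                free[index] -= container_mb
                break
        else:
            return None  # one container did not fit, so the whole gang fails
    return free


if __name__ == "__main__":
    # Hypothetical cluster: batch tasks left 6 GB free on each of 4 nodes.
    fragmented = [6144, 6144, 6144, 6144]
    print("total free MB:", sum(fragmented))
    print("8 GB container fits:", can_place(fragmented, 8192))
```

In the sample, 24 GB is free overall yet an 8 GB interactive container cannot start anywhere.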

Page 36: Running Hadoop as Service in AltiScale Platform

Hive + Interactive: Watch Out for Fragmentation

Solutions for fragmentation…
• Reserve interactive nodes before starting batch jobs
• Reduce interactive container size (if the algorithm permits)
• Node labels (YARN-796) and gang scheduling (YARN-624)

Page 37: Running Hadoop as Service in AltiScale Platform

Altiscale: Hadoop Storage and Compute

Altiscale’s point of view on Hadoop as a Service:
• sell HDFS in increments of 10 TB
• sell compute in increments of 10K TaskHours/Month

We market Infinite Hadoop, and provide services so that customers need not worry about cluster nodes.

But Apache Hadoop user interfaces provide a node-oriented view of clusters…
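The two sales increments on this slide translate into simple ceiling arithmetic. The sketch below works out how many increments a given requirement maps to. The increment sizes come from the slide; the function names and the sample requirement are hypothetical, and no pricing is implied.

```python
import math

HDFS_INCREMENT_TB = 10                  # from the slide
COMPUTE_INCREMENT_TASK_HOURS = 10_000   # from the slide, per month


def increments_needed(amount, increment):
    """Smallest whole number of increments that covers `amount`."""
    return math.ceil(amount / increment)


def plan(storage_tb, task_hours_per_month):
    """Return (storage increments, compute increments) for a requirement."""
    return (
        increments_needed(storage_tb, HDFS_INCREMENT_TB),
        increments_needed(task_hours_per_month, COMPUTE_INCREMENT_TASK_HOURS),
    )


if __name__ == "__main__":
    # Hypothetical requirement: 37 TB of HDFS and 42,000 TaskHours per month.
    print(plan(37, 42_000))
```

Note that the plan is expressed entirely in storage and compute units, with no node count anywhere, which is the contrast with the node-oriented user interfaces mentioned above.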

Page 38: Running Hadoop as Service in AltiScale Platform

ResourceManager User Interface

Page 39: Running Hadoop as Service in AltiScale Platform

ResourceManager User Interface

Page 40: Running Hadoop as Service in AltiScale Platform

NameNode User Interface

Page 41: Running Hadoop as Service in AltiScale Platform

NameNode User Interface

Page 42: Running Hadoop as Service in AltiScale Platform

Feedback from Customers

Storage plan normally easy to estimate

Compute plan is hard to estimate
• Customer pain point: meeting compute needs sometimes requires more peak compute capacity than the nodes required for storage can provide
• Opportunity: average compute often requires fewer nodes than the number required for storage
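The pain point and the opportunity on this slide are two views of the same demand curve. As a rough sketch with hypothetical numbers, the code below compares compute demand over time (expressed in nodes) with the node count implied by storage, and reports the shortfall at peak alongside the idle capacity on average.

```python
def compute_vs_storage(hourly_demand_nodes, storage_nodes):
    """Compare compute demand (in nodes) with the storage-driven node count."""
    peak = max(hourly_demand_nodes)
    average = sum(hourly_demand_nodes) / len(hourly_demand_nodes)
    return {
        # Pain point: nodes missing when demand peaks.
        "peak_shortfall_nodes": max(0, peak - storage_nodes),
        # Opportunity: nodes idle on average.
        "average_idle_nodes": max(0.0, storage_nodes - average),
    }


if __name__ == "__main__":
    # Hypothetical demand: mostly quiet, with one burst.
    demand = [2, 2, 2, 10]
    print(compute_vs_storage(demand, storage_nodes=5))
```

In the sample, the same cluster is both too small at peak and partly idle on average.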

Page 43: Running Hadoop as Service in AltiScale Platform

Solution: Change Altiscale’s Product!

Make “Infinite” computation available to customers

Multitenancy implementation phases, each of which includes a milestone with production deliverables

0. Automation for burn/add/remove nodes
1. Deploy Linux containers using Docker
2. Decouple compute/storage + manual bursting
3. Automation: orchestrate add/remove nodes according to allocation plan from the capacity team.
4. Optimized: predictive allocation, economic incentives
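Phase 3 describes adding and removing nodes according to an allocation plan. The talk does not show the plan format or the orchestration code, so the sketch below is purely illustrative: it assumes the current state and the plan are both hypothetical mappings from customer name to node count, and computes the per-customer change an orchestrator would need to carry out.

```python
def node_changes(current, plan):
    """Return {customer: delta} where positive means add nodes, negative remove.

    `current` and `plan` are hypothetical {customer: node_count} mappings.
    Customers whose allocation does not change are omitted.
    """
    changes = {}
    for customer in sorted(set(current) | set(plan)):
        delta = plan.get(customer, 0) - current.get(customer, 0)
        if delta:
            changes[customer] = delta
    return changes


if __name__ == "__main__":
    current = {"customer_a": 10, "customer_b": 4}
    plan = {"customer_a": 12, "customer_b": 4, "customer_c": 3}
    print(node_changes(current, plan))
```

Computing a diff rather than rebuilding allocations from scratch keeps unchanged customers untouched, which seems a reasonable property for a production multitenant system.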

Page 44: Running Hadoop as Service in AltiScale Platform

Physical Cluster per Customer

Page 45: Running Hadoop as Service in AltiScale Platform

NM and DN in Docker Containers

Page 46: Running Hadoop as Service in AltiScale Platform

Decouple Compute/Storage

Page 47: Running Hadoop as Service in AltiScale Platform

What Customers Get

• On demand access to “Infinite” Computation
• Ability to handle unexpected needs without contacting Altiscale
• “Access to a $10M cluster for just $1M”

Future…
• Ability to package Hadoop job environment using Docker (YARN-1964)

Page 48: Running Hadoop as Service in AltiScale Platform

Challenges to the Hadoop Community

Hive + Hadoop debugging can get very complex
• Sifting through many logs and screens
• Automatic transmission versus manual transmission

Static partitioning induced by Java Virtual Machine has benefits but also induces challenges.

Where there are difficulties, there’s opportunity:
• Better tooling, instrumentation, integration of logs/metrics

YARN still evolving into an operating system
Just starting to build real multitenancy into Hadoop.
Hadoop as a Service: aggregate and share expertise

Page 49: Running Hadoop as Service in AltiScale Platform