Let's Talk Operations! (Hadoop Summit 2014)

Post on 26-Jan-2015

DESCRIPTION

These are the introductory slides I used (in some form or another) for the Let's Talk Operations! sessions for the 2014 Hadoop Summits. No video for this one!

TRANSCRIPT

Let’s Talk Operations!

Allen Wittenauer

Twitter: @_a__w_ Email: aw @ apache.org

How many individual grids should I have?

One big grid

• Pros
  • Lower ops overhead
  • One location for all data
• Cons
  • Dev and Prod on one system

Grid per project

• Pros
  • Capacity planning per project
• Cons
  • More headcount to maintain
  • Multiple copies of data
  • Data ingress is a mess

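[Diagram: one data center containing Production, ETL, and Development grids.]

[Diagram: ETL, Dev, and Prod grids, labeled with Base ETL Pull, Event Feeds, Database Feeds, and Post-Processed Data.]

[Diagram: two data centers, DC1 and DC2, with Production, ETL, and Development grids.]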

How do I solve some common distcp issues?

• Common issues
  • Version incompatibilities
  • Network bandwidth consumption
• Some tricks
  • Use WebHDFS
    • All modern versions support it
    • Read and write in both directions
  • Create a separate queue with hard limits
  • Pull from larger, push from smaller
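As a rough illustration of how these tricks might combine, here is a minimal Python sketch that assembles a distcp command line for the "pull" case: the job runs on the larger destination grid, reads the smaller source grid over WebHDFS, and is submitted to a dedicated queue. The helper name, hostnames, ports, path, queue name, and limits are all hypothetical. The property name `mapreduce.job.queuename` and the `-bandwidth` option assume a Hadoop 2 style distcp, so check them against the version you run. The queue's hard limits themselves would be set in the scheduler configuration, which is not shown here.

```python
from typing import List, Optional


def build_distcp_command(
    src_namenode_http: str,
    dst_namenode_rpc: str,
    path: str,
    queue: str = "distcp",            # hypothetical dedicated queue name
    max_maps: int = 20,               # hypothetical cap on simultaneous copies
    bandwidth_mb_per_map: Optional[int] = 10,  # hypothetical per-map limit
) -> List[str]:
    """Build an argv list for a distcp 'pull' run from the destination grid."""
    # Read the remote grid over WebHDFS to sidestep RPC version mismatches.
    src = f"webhdfs://{src_namenode_http}{path}"
    # Write to the local grid over native HDFS.
    dst = f"hdfs://{dst_namenode_rpc}{path}"

    # Generic -D options go before distcp's own options.
    cmd = ["hadoop", "distcp", f"-Dmapreduce.job.queuename={queue}"]
    cmd += ["-m", str(max_maps)]
    if bandwidth_mb_per_map is not None:
        cmd += ["-bandwidth", str(bandwidth_mb_per_map)]
    cmd += [src, dst]
    return cmd


if __name__ == "__main__":
    argv = build_distcp_command(
        "nn-small.example.com:50070",  # hypothetical source NameNode HTTP address
        "nn-big.example.com:8020",     # hypothetical destination NameNode RPC address
        "/data/events",                # hypothetical path
    )
    print(" ".join(argv))
```

For the "push" case (the job runs on the larger source grid and writes to the smaller one), the same idea applies with the `webhdfs://` URI on the destination side instead.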

Q&A

Allen Wittenauer Twitter: @_a__w_ Email: aw @ apache.org

Bonus Slide

• Root disk partitioning: 20 GB /, ..., 200 GB task space, (rest) HDFS

• Non-root disk partitioning: 5 GB swap, 200 GB task space, (rest) HDFS
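To make the "(rest) HDFS" arithmetic concrete, here is a small Python sketch for the non-root layout. The 5 GB swap and 200 GB task space figures come from the slide. The 2000 GB disk size and the helper name are hypothetical, chosen only for illustration.

```python
from typing import Dict


def hdfs_space_gb(disk_gb: int, reserved_gb: Dict[str, int]) -> int:
    """Return what is left for HDFS after the fixed-size partitions."""
    total_reserved = sum(reserved_gb.values())
    if total_reserved > disk_gb:
        raise ValueError("fixed partitions do not fit on this disk")
    return disk_gb - total_reserved


# Fixed-size partitions on a non-root disk, per the slide.
non_root_layout = {"swap": 5, "task space": 200}

if __name__ == "__main__":
    # Hypothetical 2000 GB disk: 2000 - (5 + 200) = 1795 GB for HDFS.
    print(hdfs_space_gb(2000, non_root_layout))
```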
