Unraveling Hadoop Meltdown Mysteries

Post on 11-Aug-2014

Category:

Data & Analytics

DESCRIPTION

As powerful and flexible as Hadoop is, jobs still sometimes fail or thrash unpredictably. Pepperdata co-founder and CEO Sean Suchter, one of the first commercial users of Hadoop in the early days at Yahoo, will give real-world examples of Hadoop meltdowns, complete with metrics, and discuss what we can learn from them. He'll also show how to automatically increase Hadoop cluster throughput through fine-grained visibility into each job's hardware usage.

TRANSCRIPT

Meltdown Mysteries

Sean Suchter

Disks are thrashing!

Solution

• Make job author aware of surprising behavior.

• Modify job code & settings to be nicer to disks.

Nodes are dying!

Initial diagnosis…

• Nodes abruptly started swapping and becoming non-responsive. (Required physical power cycling.)

• Job submitters report “I didn’t change anything”

• Question: What’s doing this to the cluster?

Cause & solution

• While the job didn’t change, its input data did.

• Stop that user’s jobs immediately.

• Better use of capacity scheduler virtual memory controls.

• Use Pepperdata protection to limit physical memory as well.

Take-away

• You see problems at the node level.

• You see the root causes at the task level.
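The take-away above can be illustrated with a minimal sketch, which is not taken from the talk. It assumes hypothetical per-task memory samples (the node names, job IDs, task IDs, and the 8 GB node size are all invented for illustration). Summing the samples per node shows the symptom; drilling into the tasks on a flagged node shows the likely culprit.

```python
from collections import defaultdict

# Hypothetical per-task memory samples: (node, job_id, task_id, rss_mb).
# All values are invented for illustration only.
SAMPLES = [
    ("node1", "job_a", "task_1", 900),
    ("node1", "job_b", "task_7", 6200),
    ("node1", "job_a", "task_2", 850),
    ("node2", "job_a", "task_3", 880),
    ("node2", "job_c", "task_4", 1100),
]

NODE_RAM_MB = 8192  # assumed physical memory per node


def node_totals(samples):
    """Node-level view: total task memory per node (where the problem shows)."""
    totals = defaultdict(int)
    for node, _job, _task, rss_mb in samples:
        totals[node] += rss_mb
    return dict(totals)


def top_task(samples, node):
    """Task-level view: the single largest memory consumer on a node."""
    on_node = [s for s in samples if s[0] == node]
    return max(on_node, key=lambda s: s[3])


def diagnose(samples, threshold=0.9):
    """Flag nodes near their memory limit and name the biggest task on each."""
    report = {}
    for node, total in node_totals(samples).items():
        if total >= threshold * NODE_RAM_MB:
            _node, job, task, rss_mb = top_task(samples, node)
            report[node] = (job, task, rss_mb)
    return report


if __name__ == "__main__":
    for node, (job, task, rss_mb) in diagnose(SAMPLES).items():
        print(f"{node}: near memory limit; largest task is {job}/{task} at {rss_mb} MB")
```

In this sketch the node totals alone only say that node1 is close to its limit; the per-task breakdown is what points at one task of one job.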

Pepperdata meetup tomorrow!

• War Stories from the Hadoop Trenches

• Allen Wittenauer (Apache Hadoop committer and former LinkedIn)

• Eric Baldeschwieler (former Hortonworks CEO / CTO)

• Todd Nemet (Looker; former Altiscale, ClearStory Data, Cloudera)

• 6pm Wed 6/25

• Firehouse Brewery, 111 S Murphy, Sunnyvale

• http://www.meetup.com/pepperdata/
