
Page 1: Troubleshooting Hadoop: Distributed Debugging

Troubleshooting Hadoop: Distributed Debugging
Dustin Cote | Customer Operations Engineer

Page 2: Troubleshooting Hadoop: Distributed Debugging

Roadmap

• The Hadoop Ecosystem
  • What is Hadoop?
  • What are some clear challenge areas?
• Debugging tools
  • How do built-in Linux tools help?
  • Where do we look for typical problems?
  • Custom tooling to facilitate problem solving
• Deep dive example
  • Application with intermittent failure
  • Some data is bigger than others

Page 3: Troubleshooting Hadoop: Distributed Debugging

The Hadoop Ecosystem

Page 4: Troubleshooting Hadoop: Distributed Debugging

What is Hadoop?

• Top-level Apache project for storing and processing large data sets
• Originally an implementation of Google's MapReduce and Google File System papers
• Has since evolved into the general platform for working with petabyte-scale datasets
  • Specifically relevant for this presentation
• Mostly implemented in Java
• Users generally expand to 20+ other components that work with Hadoop
• Master-slave architecture
• Commonly used "in the cloud"

Page 7: Troubleshooting Hadoop: Distributed Debugging

Challenge Areas

• Infrastructure
  • Network sensitivity
  • Disk contention
• JVM scaling
  • Garbage collection
  • Memory sizing
• Configuration management
  • Host inconsistencies
  • Platform config inconsistencies
  • Version tracking

Page 8: Troubleshooting Hadoop: Distributed Debugging

Debugging tools

Page 9: Troubleshooting Hadoop: Distributed Debugging

Linux-based utilities

• Hadoop runs on Linux
  • Leverage existing skill sets
• Log parsing
  • grep, sed, awk, perl, etc.
• Network health
  • ifconfig, telnet, traceroute, tcpdump, etc.
• Process health
  • top, ps, etc.
• System health
  • dmesg, /var/log/messages, etc. (a quick triage sketch follows below)
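
A minimal first-pass triage sketch using the stock tools listed above; the hostname, port, and log paths are placeholders, not anything this deck prescribes:

    # Is the host under obvious pressure? Load, memory, runaway processes.
    top -b -n 1 | head -20

    # Can this node reach the NameNode RPC port? (example host; 8020 is the default port)
    telnet namenode.example.com 8020 < /dev/null

    # Anything nasty at the kernel level: OOM killer, disk resets, NIC flaps?
    dmesg | tail -50
    grep -i 'killed process' /var/log/messages   # the OOM killer leaves its mark here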

Page 10: Troubleshooting Hadoop: Distributed Debugging

Extending Linux-based utilities

• My application logs are 80 GB!
  • split, filter, slice -- but how?
• ERROR is a good place to start
  • zgrep when you have time
• Keywords for YARN applications
  • ApplicationMaster, MRAppMaster
  • FAIL, KILL, timed out
  • Map those container IDs (container_XXXXX_XX) -- see the sketch below
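
A minimal sketch of that first pass over a large aggregated application log, assuming the log has already been pulled into out.file with yarn logs as shown later in this deck; the keywords are the ones on this slide:

    # Start from errors and failure keywords, then see which containers they map to.
    grep -E 'ERROR|FAIL|KILL|timed out' out.file > suspects.txt

    # Rank containers by how often they appear in the suspect lines
    # (the container ID layout can vary slightly between Hadoop versions).
    grep -oE 'container_[0-9]+_[0-9]+_[0-9]+_[0-9]+' suspects.txt | sort | uniq -c | sort -rn | head

    # zgrep works the same way while the logs are still gzipped on disk.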

Page 11: Troubleshooting Hadoop: Distributed Debugging

JVM tools

• Mostly Java means a mostly familiar toolkit
  • jstack, jmap, jconsole, jps
• Careful with heap dumps: data-processing JVMs can have 10+ GB heaps
• Garbage collection logging (-XX:+PrintGCDetails)
• Lots of different users, so make sure you are running as the right user when collecting JVM metrics
  • Do not just run as root everywhere
  • sudo to the JVM owner when collecting jstacks and jmaps (see the sketch below)
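
A minimal sketch of collecting a thread dump the way the slide suggests, using a DataNode as the example process; the process name and output path are illustrative only:

    # Find the DataNode JVM and its owning user without assuming it runs as root.
    PID=$(pgrep -f org.apache.hadoop.hdfs.server.datanode.DataNode | head -1)
    OWNER=$(ps -o user= -p "$PID")

    # Take the thread dump as the JVM owner, not as root, so the attach works cleanly.
    sudo -u "$OWNER" jstack "$PID" > "jstack.$PID.$(date +%s).txt"

    # A heap histogram is far cheaper than a full heap dump on a 10+ GB heap.
    sudo -u "$OWNER" jmap -histo "$PID" | head -40

    # GC logging (-XX:+PrintGCDetails -Xloggc:...) is normally set in the service's startup options.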

Page 12: Troubleshooting Hadoop: Distributed Debugging

Source code!

• Most of the code base is open source!
  • Found a NullPointerException? Hop on GitHub and find the line.
  • https://github.com/apache/hadoop
• Even better, JIRA is available to see known issues
  • Hadoop Common
  • HDFS
  • MapReduce
  • YARN

Page 13: Troubleshooting Hadoop: Distributed Debugging

Log analysis helps identify anomalies

• Word counts are simple but powerful
• Tracking service logging over time shows patterns (a sketch follows below)
• Master tracking helps drill into which slaves may be unhealthy

Custom tooling
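
A minimal example of that word-count style of analysis: counting WARN and ERROR lines per hour in a service log. The log path and the timestamp layout are assumptions based on typical Hadoop log4j output, not something this deck specifies:

    # Count WARN and ERROR lines per hour to spot when a service started misbehaving.
    # Assumes log4j's usual "2015-04-20 15:40:04,938 LEVEL ..." line layout.
    grep -hE ' (WARN|ERROR) ' /var/log/hadoop-hdfs/hadoop-*-namenode-*.log* \
      | awk '{ print substr($1 " " $2, 1, 13) }' \
      | sort | uniq -c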

Page 14: Troubleshooting Hadoop: Distributed Debugging

Configuration management is hard

• Validating that configuration is in lock-step across all instances is ideal (a sketch follows below)
• Keep configuration simple and logical
• At Cloudera, we pull whole-cluster configurations for validation

Custom tooling
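
A minimal sketch of that kind of cross-host validation, assuming passwordless SSH; the host list, config path, and reference copy are placeholders:

    # Compare one config file across the cluster against a known-good reference copy.
    REF=/etc/hadoop/conf/hdfs-site.xml        # reference copy on the machine running the check
    while read -r host; do
      ssh -n "$host" cat /etc/hadoop/conf/hdfs-site.xml | diff -q - "$REF" > /dev/null \
        || echo "config drift on $host"
    done < cluster-hosts.txt                  # hypothetical list of cluster hostnames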

Page 15: Troubleshooting Hadoop: Distributed Debugging

Deep dive examples

Page 16: Troubleshooting Hadoop: Distributed Debugging

Example

• Initial complaint
  • MapReduce job shows "SUCCESSFUL" but does not generate any output
  • Job was known to produce output on smaller datasets
• User environment
  • ~100 node cluster
  • Running YARN with MapReduce v2
  • Job uses the Kite SDK and Apache Crunch APIs (also open source)
  • Job runs for several hours, so reproducing is painful

Page 17: Troubleshooting Hadoop: Distributed Debugging

Example

• Debugging the environment
  • Searching on errors, this was found first:
  • 2015-04-20 15:40:04,938 WARN [Readahead Thread #1] org.apache.hadoop.io.ReadaheadPool: Failed readahead on ifile EINVAL: Invalid argument
  • Bad disk? Probably not -- this job runs when the data batch is smaller (see the sketch below)
  • User mailing lists confirm this is a false positive!
  • File a JIRA and move on
  • Other node problems? Probably not -- no indication of other jobs failing
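
A hedged sketch of the quick host-level checks behind the "bad disk?" question; the NodeManager log path is an assumption about where this warning would land:

    # Kernel-level I/O errors would show up here if a disk were actually failing.
    dmesg | grep -iE 'i/o error|blk_update_request|ata[0-9]+.*error'

    # How widespread is the readahead warning: one node, or every node?
    grep -c 'Failed readahead on ifile' /var/log/hadoop-yarn/yarn-*-nodemanager-*.log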

Page 18: Troubleshooting Hadoop: Distributed Debugging

Example

• Debugging the application
  • Logging obtained through Hadoop commands:
  • yarn logs -applicationId APP_ID > out.file
  • Logs are huge, so we need a strategy (see the sketch below)
  • First check whether a write-out failure is being silently ignored -- it was not
  • Check whether any output data is created at all -- yes!
  • The output data is then destroyed when moving to the final location -- bad, but why?
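
A minimal sketch of that narrowing strategy, using only the command already on this slide plus grep; the _temporary convention is an assumption about the job using a standard file output committer:

    # Pull the aggregated logs once, then work against the local copy.
    yarn logs -applicationId APP_ID > out.file

    # Did any task write output at all? Temporary task output paths usually mention _temporary.
    grep -c '_temporary' out.file

    # Focus on the commit/move-to-final-location phase rather than all 80 GB of log.
    grep -nE 'OutputCommitter|commit|rename' out.file | less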

Page 19: Troubleshooting Hadoop: Distributed Debugging

Example

• Debugging the application
  • Need more information, so let's get DEBUG-level logging
  • Logs are already 80 GB, so now we have even more data to sift through; let's focus on the final move stage
  • org.kitesdk.data.mapreduce.DatasetKeyOutputFormat$MergeOutputCommitter makes that move happen, so let's raise just that class to DEBUG (see the sketch below)
  • Success! We see this class toss aside the dataset -- but why?
  • The code shows an int is being used to count the records to output :( (an int overflows past roughly 2.1 billion, which only bites on the biggest datasets)
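
A hedged sketch of how that targeted DEBUG setting and the follow-up sifting could look; the log4j line raises the whole Kite mapreduce package, which is slightly coarser than the single class but reliably covers the committer's logger:

    # Class-level override in the job's log4j configuration (log4j 1.x syntax):
    #   log4j.logger.org.kitesdk.data.mapreduce=DEBUG
    # After rerunning, pull the logs again and read only the committer's trail.
    yarn logs -applicationId APP_ID > out.debug.file
    grep -n 'MergeOutputCommitter' out.debug.file | less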

Page 20: Troubleshooting Hadoop: Distributed Debugging

Example 2

• Initial complaint
  • A Hive query that used to run in 10 minutes is now not complete after 10 hours
  • Nothing has changed (!)
• User environment
  • ~50 node cluster
  • Hive tables on the scale of several hundred GB
  • Query with JOIN operations

Page 21: Troubleshooting Hadoop: Distributed Debugging

Example 2

• The user may not be aware of changes, but what do the logs say?
• Hive generates MapReduce jobs deterministically based on:
  • Table structure
  • Optimization flags
  • HQL (SQL-like) query structure
• The user shows it is the same query and no properties have changed (one way to verify this is sketched below)
• Back to those challenge areas (infrastructure, JVM, config management)
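
One hedged way to double-check the "no properties have changed" claim instead of taking it on faith, assuming the Hive CLI is available on both the old and the new client machines; the file names are placeholders:

    # Dump every effective Hive/Hadoop session property from each client, then diff.
    hive -e 'set -v;' > props_old_client.txt   # run on the client that used to be fast
    hive -e 'set -v;' > props_new_client.txt   # run on the client that is slow now
    diff props_old_client.txt props_new_client.txt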

Page 22: Troubleshooting Hadoop: Distributed Debugging

Example 2

• Config management is easiest to check
  • Running from another client machine?
  • Cluster-side default changes? (upgrades, patches, etc.)
• JVM is next easiest
  • Let's pull in the MapReduce logs again (see the sketch below)
  • yarn logs -applicationId APP_ID > out.log
  • 2015-11-24 17:56:46,324 INFO [main] org.apache.hadoop.hive.ql.exec.CommonJoinOperator: table 0 has 3628000 rows for join key [00011]
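
A minimal sketch of turning that CommonJoinOperator line into a skew report, using only grep and awk over the log pulled above; the field positions are taken from the exact line quoted on this slide:

    # The join operator periodically logs its running row count per join key;
    # the same key can appear several times as its count grows, so take the top lines.
    grep 'rows for join key' out.log | awk '{ print $9, $NF }' | sort -rn | head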

Page 23: Troubleshooting Hadoop: Distributed Debugging

Example 2

• Why so many rows for one join key?
• How many rows overall? ~1.8 billion!
  • Disk write throughput alone will take several hours at that size (a back-of-envelope check follows below)
• So, what changed?
  • Configuration changes would not create more rows in the output
  • JVM settings and memory management do not seem likely
  • Infrastructure was never going to be fast enough to do this in 10 minutes
• UAT testing was being used as the performance baseline!
• Hadoop scales linearly only if you scale your data linearly :)
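
A back-of-envelope check of the "several hours just for disk writes" claim. The row size and throughput numbers are assumptions, and it assumes the hot join key funnels through a single reduce task, which is how a common join handles one key:

    # ~1.8e9 rows * ~200 bytes per row, through one task at ~100 MB/s effective throughput:
    echo 'scale=1; (1800000000 * 200) / (100 * 1000000) / 3600' | bc   # ~1.0 hour for the raw write alone
    # Sort spills, shuffle, and 3x HDFS replication multiply that, so "several hours" is about right.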

Page 24: Troubleshooting Hadoop: Distributed Debugging

Thank you