Optimizing Hive Queries

Download Optimizing Hive Queries

Post on 15-Jan-2015




2 download

Embed Size (px)


Owen O'Malley gave a talk at Hadoop Summit EU 2013 about optimizing Hive queries.


<ul><li> 1. Optimizing Hive QueriesOwen OMalleyFounder and Architectowen@hortonworks.com@owen_omalley Hortonworks Inc. 2013: Page 1</li></ul> <p> 2. Who Am I?Founder and Architect at Hortonworks Working on Hive, working with customer Formerly Hadoop MapReduce &amp; Security Been working on Hadoop since beginningApache Hadoop, ASF Hadoop PMC (Original VP) Tez, Ambari, Giraph PMC Mentor for: Accumulo, Kafka, Knox Apache Member Hortonworks Inc. 2013 Page 2 3. OutlineData LayoutData FormatJoinsDebugging Hortonworks Inc. 2013 Page 3 4. Data LayoutLocation, Location, Location Hortonworks Inc. 2013Page 4 5. Fundamental QuestionsWhat is your primary use case?What kind of queries and filters?How do you need to access the data?What information do you need together?How much data do you have?What is your year to year growth?How do you get the data? Hortonworks Inc. 2013Page 5 6. HDFS CharacteristicsProvides Distributed File SystemVery high aggregate bandwidthExtreme scalability (up to 100 PB)Self-healing storageRelatively simple to administerLimitationsCant modify existing filesSingle writer for each fileHeavy bias for large files ( &gt; 100 MB) Hortonworks Inc. 2013Page 6 7. Choices for LayoutPartitionsTop level mechanism for pruningPrimary unit for updating tables (&amp; schema)Directory per value of specified columnBucketingHashed into a file, good for samplingControls write parallelismSort orderThe order the data is written within file Hortonworks Inc. 2013 Page 7 8. Example Hive LayoutDirectory Structurewarehouse/$database/$tablePartitioning/part1=$partValue/part2=$partValueBucketing/$bucket_$attempt (eg. 000000_0)SortEach file is sorted within the file Hortonworks Inc. 2013 Page 8 9. Layout GuidelinesLimit the number of partitions1,000 partitions is much faster than 10,000Nested partitions are almost always wrongGauge the number of bucketsCalculate file size and keep big (200-500MB)Dont forget number of files (Buckets * Parts)Layout related tables the same wayPartitionBucket and sort order Hortonworks Inc. 2013Page 9 10. NormalizationMost databases suggest normalizationKeep information about each thing togetherCustomer, Sales, Returns, Inventory tablesHas lots of good properties, butIs typically slow to queryOften best to denormalize during loadWrite once, read many timesAdditionally provides snapshots in time. Hortonworks Inc. 2013Page 10 11. Data FormatHow is your data stored? Hortonworks Inc. 2013Page 11 12. Choice of FormatSerdeHow each record is encoded?Input/Output (aka File) FormatHow are the files stored?Primary ChoicesTextSequence FileRCFileORC (Coming Soon!) Hortonworks Inc. 2013Page 12 13. Text FormatCritical to pick a SerdeDefault - ^As between fieldsJSON top level JSON recordCSV commas between fields (on github)Slow to read and writeCant split compressed filesLeads to huge mapsNeed to read/decompress all fields Hortonworks Inc. 2013 Page 13 14. Sequence FileTraditional MapReduce binary file formatStores keys and values as classesNot a good fit for Hive, which has SQL typesHive always stores entire row as valueSplittable but only by searching fileDefault block size is 1 MBNeed to read and decompress all fields Hortonworks Inc. 2013Page 14 15. RC (Row Columnar) FileColumns stored separatelyRead and decompress only needed onesBetter compressionColumns stored as binary blobsDepends on metastore to supply typesLarger blocks4 MB by defaultStill search file for split boundary Hortonworks Inc. 2013Page 15 16. ORC (Optimized Row Columnar)Columns stored separatelyKnows typesUses type-specific encodersStores statistics (min, max, sum, count)Has light-weight indexSkip over blocks of rows that dont matterLarger blocks256 MB by defaultHas an index for block boundaries Hortonworks Inc. 2013Page 16 17. ORC - File Layout Hortonworks Inc. 2013 Page 17 18. Example File Sizes from TPC-DS Hortonworks Inc. 2013 Page 18 19. CompressionNeed to pick level of compressionNoneLZO or Snappy fast but sloppyBest for temporary tablesZLIB slow and completeBest for long term storage Hortonworks Inc. 2013 Page 19 20. JoinsPutting the pieces together Hortonworks Inc. 2013 Page 20 21. Default AssumptionHive assumes users are either:NoobiesHive developersDefault behavior is always finishLittle Engine that Could!Experts could override default behaviorsGet better performance, but riskierWere working on improving heuristics Hortonworks Inc. 2013 Page 21 22. Shuffle JoinDefault choiceAlways works (Ive sorted a petabyte!)Worst case scenarioEach processReads from part of one of the tablesBuckets and sorts on join keySends one bucket to each reduceWorks everytime! Hortonworks Inc. 2013Page 22 23. Map JoinOne table is small (eg. dimension table)Fits in memoryEach processReads small table into memory hash tableStreams through part of the big fileJoining each record from hash tableVery fast, but limited Hortonworks Inc. 2013Page 23 24. Sort Merge Bucket (SMB) JoinIf both tables are:Sorted the sameBucketed the sameAnd joining on the sort/bucket columnEach process:Reads a bucket from each tableProcess the row with the lowest valueVery efficient if applicable Hortonworks Inc. 2013 Page 24 25. DebuggingWhat could possibly go wrong? Hortonworks Inc. 2013 Page 25 26. Performance QuestionWhich of the following is faster?select count(distinct(Col)) from Tblselect count(*) from (select distict(Col) from Tbl) Hortonworks Inc. 2013Page 26 27. Count Distinct Hortonworks Inc. 2013 Page 27 28. AnswerSurprisingly the second is usually fasterIn the first case:Maps send each value to the reduceSingle reduce counts them allIn the second case:Maps split up the values to many reducesEach reduce generates its listFinal job counts the size of each listSingleton reduces are almost always BAD Hortonworks Inc. 2013Page 28 29. Communication is Good!Hive doesnt tell you what is wrong.Expects you to know!Lucy, you have some splaining to do!Explain tool provides query planFilters on inputNumbers of jobsNumbers of maps and reducesWhat the jobs are sorting byWhat directories are they reading or writing Hortonworks Inc. 2013 Page 29 30. Blinded by ScienceThe explanation tool is confusing.It takes practice to understand.It doesnt include some critical details like partition pruning.Running the query makes things clearer!Pay attention to the detailsLook at JobConf and job history files Hortonworks Inc. 2013 Page 30 31. SkewSkew is typical in real datasets.A user complained that his job was slowHe had 100 reduces98 of them finished fast2 ran really slowThe key was a boolean Hortonworks Inc. 2013Page 31 32. Root Cause AnalysisAmbariApache project building Hadoop installation and management toolProvides metrics (Ganglia &amp; Nagios)Root Cause AnalysisProcesses MapReduce job logsDisplays timing of each part of query plan Hortonworks Inc. 2013Page 32 33. Root Cause Analysis Screenshots Hortonworks Inc. 2013Page 33 34. Root Cause Analysis Screenshots Hortonworks Inc. 2013Page 34 35. Thank You!Questions &amp; Answers@owen_omalley Hortonworks Inc. 2012: DO NOT SHARE. CONTAINS HORTONWORKS CONFIDENTIAL &amp; PROPRIETARY INFORMATION Page 35 36. ORCFile - Comparison RC File Trevni ORC File Hive Type Model N NY Separate complex columnsN YY Splits found quicklyN YY Default column group size 4MB 64MB*250MB Files per a bucket1 &gt;1 1 Store min, max, sum, countN NY Versioned metadataN YY Run length data encodingN NY Store strings in dictionary N NY Store row count N YY Skip compressed blocksN NY Store internal indexesN NY Hortonworks Inc. 2013 Page 36 </p>