Toward Progress Indicators on Steroids for Big Data Systems


Toward Progress Indicators on Steroids for Big Data Systems

Jiexing Li#*, Rimma Nehme*, Jeff Naughton#*
# University of Wisconsin-Madison
* Microsoft Jim Gray Systems Lab

Slide 2: Explosive growth in big data systems

Explosive growth in the complexity, diversity, number of deployments, and capabilities of big data processing systems. [Ailamaki et al., 2011]

[Figure: the big data systems landscape. Programming models: SQL, Map/Reduce, Cascading, Pig, Hive, Jaql, PACT, AQL, SimpleDB PMP, Nuts, Algebrix. Runtime/execution engines: PDW Engine, Azure Engine, Hadoop Map/Reduce Engine, Hyracks, Nephele. Stores: PDW Store (SQL Server), Azure Store, HDFS, Amazon S3, SimpleDB KVS, HBase, MySQL indexes, Asterix B-Tree.]

Speaker notes: With the unprecedented data growth, we need to store, archive, and, most importantly, analyze increasingly large datasets.

Slide 3: Big data systems

They are large and complex beasts. To operate them efficiently, we need information about what is going on in the system.

[Figure: thousands of servers: Node 1 through Node n, each running its own task (Task 1..n) over its own data (Data 1..n).]

Instantaneous snapshot information is important and nice, but not sufficient. We also need to know what the system will look like in the future.

Slide 4: Need to know the future state

[Figure: the same nodes, tasks, and data, now annotated with looming problems: CPU overload! Bad disk! Lack of memory!]

Predicting the future in these systems is difficult or impossible. We don't require perfect predictions. Instead:
- Anticipate the presence of errors.
- Detect them and react as time progresses.

Slide 5: Need a predict, monitor, revise paradigm

Progress indicators fit this predict, monitor, revise paradigm really well.

[Figure: "one-shot predict and ignore" is unreliable; "predict, monitor, and revise" adapts.]

Slide 6: Progress indicator (PI)

A PI provides feedback to users on how much of the task has been completed or when the task will finish. It begins with a prediction of the query's progress and, while the query executes, revises that prediction based on observed information.

But current PIs are too weak and limited for big data systems.
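The predict, monitor, revise loop can be sketched in a few lines. This is a hypothetical minimal PI, not any real system's design: it predicts a remaining time from an initial speed guess and revises that prediction from observed progress as the query runs (the class and method names are invented for illustration).

```python
class ReviseOnObservePI:
    """Minimal sketch of the predict, monitor, revise paradigm:
    start from a one-shot prediction of processing speed, then
    revise the remaining-time estimate from observed progress."""

    def __init__(self, total_bytes, predicted_speed):
        self.total = total_bytes        # total work, in bytes
        self.speed = predicted_speed    # initial prediction, bytes/sec

    def revise(self, bytes_done, elapsed_sec):
        """Monitor progress and return a revised remaining time (sec)."""
        if elapsed_sec > 0 and bytes_done > 0:
            # Replace the prediction with the speed observed so far.
            self.speed = bytes_done / elapsed_sec
        return (self.total - bytes_done) / self.speed
```

A real PI would track speed per operator or pipeline rather than from a single global rate, but the revise-as-you-observe structure is the same.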

Slide 7: Progress indicators on steroids

Our goal: to advocate for the creation of, and research into, progress indicators on steroids.

What?
- Use more practical evaluation metrics to depict quality.
- Expand the class of computations they can serve.
- Expand the kinds of information they can provide.
- Continue to increase the accuracy of their predictions.

Slide 8: Our vision

How? First, change our way of evaluating progress indicator technology. Current PIs are judged on accuracy alone; they should also be:
- Helpful for specific tasks.
- Accurate when nothing changes.
- Quick to react to changes.

[Figure: current PIs address only accuracy; PIs on steroids should satisfy all of these criteria.]

Slide 9: Our vision (cont.)

Expand the class of computations they can serve. Current PIs serve only the user interface; PIs on steroids could also serve the optimizer, the straggler/skew handler, the scheduler, the resource manager, and the performance debugger.

Speaker notes:
- User interface.
- Handling stragglers and skew.
- Scheduling: determine a better execution order to avoid contention or to share computation (e.g., synchronized scans).
- Resource management: grow or shrink a database based on service-level agreements (SLAs); allocate tasks to machines (e.g., ones with high processing speeds).
- Query optimization: stop bad plans; select plans based on future resource availability.
- Performance debugging.

Slide 10: Our vision (cont.)

Expand the kinds of information they can provide. Current PIs report only "p% done" or a time estimate; PIs on steroids could also report disk fragmentation, straggling tasks, good/bad machines, resource availability, and automatic failure diagnoses.

Slide 11: A promising simple example

The progress score provided by Pig for a MapReduce job:
- Divide the job into 4 phases.
- For each phase, the score is the percentage of data read/processed.
- The overall progress for the job is the average of these 4 scores.

This is a very rough estimate, which assumes that each phase contributes equally to the overall score.
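The averaging scheme above can be written down directly. This is a sketch of the scheme as described, not Pig's actual API; the function names are invented.

```python
def phase_progress(processed, total):
    """Score for one phase: fraction of its input read/processed."""
    return 0.0 if total == 0 else min(processed / total, 1.0)

def job_progress(phases):
    """Overall job progress: the unweighted average of the per-phase
    scores, as in the Pig scheme described above. `phases` is a list
    of (processed, total) pairs, one per phase."""
    scores = [phase_progress(done, total) for done, total in phases]
    return sum(scores) / len(scores)
```

For example, a job whose first phase is finished and whose second is half done reports (1.0 + 0.5 + 0 + 0) / 4 = 37.5%, regardless of how expensive each phase really is. That is exactly the equal-weight assumption called "very rough" above.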

[Figure: the phases of a MapReduce job: Record Reader, Map, and Combine in the map task; Copy, Sort, and Reduce in the reduce task.]

Hadoop uses these progress estimates to select stragglers and schedule backup executions on other machines:
- Improved execution time by 44%. [Dean et al., OSDI, 2004]
- Improved execution time further by a factor of 2. [Zaharia et al., OSDI, 2008]

Slide 12: A promising simple example (cont.)

A straggler is a task that makes much less progress than the other tasks in its category.

[Figure: Node 1 through Node n report progress P1% through Pn%; the straggler gets a backup execution on another node.]

Already deployed! Simple and rough estimates, but really helpful.

One line of research: retargeting even today's simple progress indicators to new systems can be interesting and challenging. Consider the complexity and diversity of the different data processing systems.
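The straggler rule above amounts to comparing each task's progress score against the mean of its category. A simplified sketch follows; the 0.2 threshold and the mean-based rule are illustrative assumptions, not Hadoop's actual speculative-execution heuristic.

```python
def find_stragglers(progress, threshold=0.2):
    """Return tasks whose progress score falls more than `threshold`
    below the mean of their category. These would be the candidates
    for a backup (speculative) execution on another machine.
    `progress` maps task id -> progress score in [0, 1]."""
    mean = sum(progress.values()) / len(progress)
    return [task for task, p in progress.items() if mean - p > threshold]
```

With scores {t1: 0.9, t2: 0.85, t3: 0.3}, the mean is about 0.68, so only t3 is flagged as a straggler.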

Slide 13: Achieving the vision requires research

Example: we attempted to apply a debug run-based PI, developed for MapReduce jobs, to a parallel database system.

Slide 14: The idea of a debug run-based PI [Morton et al., SIGMOD, 2010]

For a query plan, estimate the processing speed for each phase/pipeline using information from earlier (debug) runs:
1. Start from the original data.
2. Sample the data.
3. Execute the job on the sample.
4. Calculate the processing speed (e.g., how many bytes can be processed per second) for each phase.
5. Remaining time (RT) = remaining data / speed.
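Steps 4-5 above can be sketched in a few lines. This is a simplified model, not the paper's implementation: the per-phase speeds measured on the sample (debug) run are applied to the bytes still to be processed in the full run.

```python
def remaining_time(phases):
    """Remaining time (seconds) for a job, debug-run style.
    `phases` is a list of (remaining_bytes, bytes_per_sec) pairs,
    where bytes_per_sec was measured in the earlier debug run."""
    return sum(remaining / speed for remaining, speed in phases if speed > 0)
```

The Q4 experiment later in the talk shows where this breaks down: when sampling starves later pipelines of tuples, the speeds measured in the debug run say very little about the full run.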

Slide 15: Questions

This worked very well for MapReduce jobs. But what happens when we apply the debug-run approach to a parallel database system? We ran a simple experiment to find out.

Slide 16: Experimental setup

We implemented the progress indicator in SQL Server PDW. Cluster: 18 nodes (1 control node, 1 data landing node, and 16 compute nodes), connected with a 1 Gbit Ethernet switch. Each node:
- 2 Intel Xeon L5630 quad-core processors.
- 32 GB memory (at most 24 GB for the DBMS).
- 10 300 GB hard drives (8 disks for data).

Slide 17: Experimental setup (cont.)

Database: 1 TB TPC-H. Each table is either hash partitioned or replicated. When a table is hash partitioned, each compute node contains 8 horizontal data partitions (8*16 = 128 in total).
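The layout above (16 compute nodes with 8 horizontal partitions each) can be illustrated with a toy placement function. This is a hypothetical sketch of hash partitioning in general, not PDW's actual hash function or placement logic.

```python
def place(partition_key, n_nodes=16, parts_per_node=8):
    """Map a row's partition key to one of n_nodes * parts_per_node
    horizontal partitions, then to the compute node that holds it."""
    part = hash(partition_key) % (n_nodes * parts_per_node)  # 0..127
    node = part // parts_per_node                            # 0..15
    return node, part
```

Replicated tables (Nation, Region) skip this step entirely: a full copy lives on every compute node.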

The TPC-H tables and their partition keys:

Table      Partition key
Customer   c_custkey
Part       p_partkey
Lineitem   l_orderkey
Partsupp   ps_partkey
Nation     (replicated)
Region     (replicated)
Orders     o_orderkey
Supplier   s_suppkey

Slide 18: Debug run-based PI can work well

TPC-H Q1: no joins, 7 pipelines, and the speed estimates are accurate.

Slide 19: Complex queries are more challenging

TPC-H Q4: later joins in the debug run have very few tuples to process. With a 1% sample of the 1 TB data, the fractions of data reaching successive pipelines were 1%, 0.01%, 0.0001%, 0%, and 0%.

Slide 20: The optimizer also presents challenges

Cost-based optimization may yield different plans for the sampled versus the entire dataset. Only 6 out of 22 TPC-H queries used the same plans.

[Figure: two plans for the same query. On the original data: TableScan [l], Filter, and a Shuffle Move feeding a Hash Match join with TableScan [o]. On the sample: TableScan [l], Filter, and a Broadcast Move feeding a Nested Loop join with TableScan [o].]

Slide 21: Conclusions from the experiment

Even a simple task (porting a debug run-based PI from MapReduce to a parallel DBMS) is challenging; new ideas are needed to make it work. How to build progress indicators for a variety of systems, for a variety of uses, is a wide-open problem.

Slide 22: Some specific technical challenges

- Operators.
- Work and speed estimation.
- Pipeline definition and shape.
- Dynamicity.
- Statistics.
- Parallelism.

Conclusions

A promising direction, but still a really long way to go! We proposed and discussed the desirability of developing progress indicators on steroids. Issues to consider include:
- Evaluation metrics.
- Computations to serve.
- Information to provide.
- Accuracy.

A small case study illustrates that even small steps toward progress indicators on steroids require effort and careful thought.