A Performance Profiling Strategy for High-Performance Map Re-Projection of Coarse-Scale Spatial Raster Data

Babak Behzad (1,3), Yan Liu (1,2,4), Eric Shook (1,2), Michael P. Finn (5), David M. Mattli (5) and Shaowen Wang (1,2,3,4)

(1) CyberInfrastructure and Geospatial Information Laboratory (CIGI)
(2) Department of Geography and Geographic Information Science
(3) Department of Computer Science
(4) National Center for Supercomputing Applications (NCSA)
University of Illinois at Urbana-Champaign
(5) Center of Excellence for Geospatial Information Science, U.S. Geological Survey (USGS)

AutoCarto’12

Upload: yannis

Post on 22-Feb-2016


DESCRIPTION

A Performance Profiling Strategy for High-Performance Map Re-Projection of Coarse-Scale Spatial Raster Data. Babak Behzad (1,3), Yan Liu (1,2,4), Eric Shook (1,2), Michael P. Finn (5), David M. Mattli (5) and Shaowen Wang (1,2,3,4).

TRANSCRIPT

Page 1: 1 CyberInfrastructure and Geospatial Information Laboratory (CIGI)

Babak Behzad (1,3), Yan Liu (1,2,4), Eric Shook (1,2), Michael P. Finn (5), David M. Mattli (5) and Shaowen Wang (1,2,3,4)

(1) CyberInfrastructure and Geospatial Information Laboratory (CIGI)
(2) Department of Geography and Geographic Information Science
(3) Department of Computer Science
(4) National Center for Supercomputing Applications (NCSA)
University of Illinois at Urbana-Champaign
(5) Center of Excellence for Geospatial Information Science, U.S. Geological Survey (USGS)

AutoCarto’12

A Performance Profiling Strategy for High-Performance Map Re-Projection of Coarse-Scale Spatial Raster Data

Page 2: 1 CyberInfrastructure and Geospatial Information Laboratory (CIGI)

Outline

Overview
– Map re-projection
– pRasterBlaster: HPC Solution to Map Re-Projection
Performance Profiling
– pRasterBlaster Computational and Scaling Bottlenecks
Conclusion

Page 3: 1 CyberInfrastructure and Geospatial Information Laboratory (CIGI)

Introduction

Map re-projection
– An important cartographic operation
  Desktop application: mapIMG
– Challenges exist when scaling to coarse-scale spatial datasets
– Re-projecting a 1 GB raster dataset can take 45-60 minutes
Parallel computing techniques will help scale to large datasets
– Raster was born to be parallelized

Page 4: 1 CyberInfrastructure and Geospatial Information Laboratory (CIGI)

Parallelizing Map Re-Projection

Map re-projection of large datasets is too slow, or even impossible, on desktop machines
pRasterBlaster
– mapIMG in an HPC (High-Performance Computing) environment
– Early days
  Row-wise decomposition
  I/O occurred directly in the program's inner loop
– Rigorous geometry handling and novel resampling
  Resampling options for categorical data and population counts (also standard continuous data resampling methods)
– Able to project/re-project large maps in a short amount of time

Page 5: 1 CyberInfrastructure and Geospatial Information Laboratory (CIGI)

pRasterBlaster

Fast and accurate raster re-projection in three (primary) steps
Step 1: Calculate and partition output space
Step 2: Read input and re-project
Step 3: Combine temporary files

Page 6: 1 CyberInfrastructure and Geospatial Information Laboratory (CIGI)

Performance Profiling: Motivation and Objectives

Exploit performance profiling tools to make pRasterBlaster more scalable and efficient
– Early version was not scalable to a large number of processors
– Resolve computational bottlenecks to allow pRasterBlaster to leverage thousands of processors
Demonstrate techniques for using performance profilers
– Potentially useful for many GIS applications

Page 7: 1 CyberInfrastructure and Geospatial Information Laboratory (CIGI)

What is performance profiling?

A form of dynamic program analysis
Measures
– memory footprint of the program
– time complexity of the program
– usage of particular instructions
– frequency and duration of function calls
Aids program optimization

Page 8: 1 CyberInfrastructure and Geospatial Information Laboratory (CIGI)

How do profilers work?

Statistical profilers
– Operate by sampling
– Probe the program at regular intervals
– Pros: Low overhead
– Cons: Typically less numerically accurate and specific

Page 9: 1 CyberInfrastructure and Geospatial Information Laboratory (CIGI)

How do profilers work?

Instrumenting profilers
– Instrument target programs with additional instructions to collect required information
– Pros: Much more accurate than statistical profilers
– Cons: Potentially slow the program (since new instructions are added)
Different kinds of instrumenting profilers
– Manual instrumenting
  Done by the programmers
– Automatic profilers
  Software instruments the program automatically
  TAU and IPM were used in this research

Page 10: 1 CyberInfrastructure and Geospatial Information Laboratory (CIGI)

Manual Instrumenting

The traditional way of instrumenting C code is with the time() call, declared in the time.h header. Here is a code fragment that demonstrates its use:

    #include <stdio.h>
    #include <time.h>

    int main(void) {
        time_t start, finish;
        ...
        time(&start);
        /* section to be timed */
        time(&finish);
        /* difftime returns the elapsed seconds as a double */
        printf("Elapsed time: %.0f s\n", difftime(finish, start));
        ...
    }

Page 11: 1 CyberInfrastructure and Geospatial Information Laboratory (CIGI)

Manual Instrumenting in Parallel Programs

Instrument the portion of the program running on individual processors

    #include <stdio.h>
    #include <time.h>

    int main(void) {
        time_t start, finish;
        ...
        time(&start);
        /* section to be timed */
        time(&finish);
        /* report the elapsed seconds together with this process's rank */
        printf("Elapsed time on Process %d: %.0f s\n", my_rank,
               difftime(finish, start));
        ...
    }
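Because time() resolves only whole seconds, short sections often report an elapsed time of zero. The following is a minimal sketch of finer-grained manual instrumentation, assuming a POSIX system where clock_gettime and CLOCK_MONOTONIC are available. The helper names elapsed_seconds, time_section and example_section are illustrative assumptions and are not taken from pRasterBlaster. In an MPI program, MPI_Wtime could play the same role.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 199309L /* expose clock_gettime in strict C modes */
#endif
#include <assert.h>
#include <stdio.h>
#include <time.h>

/* Seconds between two timestamps, including the nanosecond part. */
double elapsed_seconds(const struct timespec *start,
                       const struct timespec *finish)
{
    return (double)(finish->tv_sec - start->tv_sec)
         + (double)(finish->tv_nsec - start->tv_nsec) / 1e9;
}

/* Stand-in for the section to be timed (hypothetical workload). */
void example_section(void)
{
    volatile double sum = 0.0;
    for (long i = 0; i < 1000000L; ++i)
        sum += (double)i;
}

/* Time one call of `section` and report it for process `my_rank`.
 * Returns the elapsed seconds, or -1.0 if the clock could not be read. */
double time_section(void (*section)(void), int my_rank)
{
    struct timespec start, finish;

    assert(section != NULL);
    if (clock_gettime(CLOCK_MONOTONIC, &start) != 0)
        return -1.0;
    section();
    if (clock_gettime(CLOCK_MONOTONIC, &finish) != 0)
        return -1.0;

    double elapsed = elapsed_seconds(&start, &finish);
    printf("Elapsed time on Process %d: %.6f s\n", my_rank, elapsed);
    return elapsed;
}
```

Calling time_section(example_section, my_rank) on each process prints one line per rank. Comparing those lines across ranks is one way a load imbalance becomes visible.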

Page 12: 1 CyberInfrastructure and Geospatial Information Laboratory (CIGI)

IPM (Integrated Performance Monitoring)

IPM is a portable profiling infrastructure for MPI programs
– Provides a low-overhead profile of the performance aspects and resource utilization of the parallel program
– Communication, computation, and I/O are the primary focus
– http://ipm-hpc.sourceforge.net
We initially profiled pRasterBlaster with IPM to understand how communication, computation, and I/O usage break down for this application

Page 13: 1 CyberInfrastructure and Geospatial Information Laboratory (CIGI)

TAU (Tuning and Analysis Utilities)

The TAU performance system is a portable profiling and tracing toolkit
– Analysis of parallel programs written in Fortran, C, C++, Java, Python
– http://tau.uoregon.edu
TAU is capable of gathering performance information through instrumentation of functions, methods, basic blocks, and statements
IPM is designed to profile MPI applications, while TAU can profile any kind of parallel application

Page 14: 1 CyberInfrastructure and Geospatial Information Laboratory (CIGI)

TAU for pRasterBlaster

Page 15: 1 CyberInfrastructure and Geospatial Information Laboratory (CIGI)

TAU for pRasterBlaster

Page 16: 1 CyberInfrastructure and Geospatial Information Laboratory (CIGI)

Computational Bottleneck I: Symptom

Page 17: 1 CyberInfrastructure and Geospatial Information Laboratory (CIGI)

Computational Bottleneck I: Symptom

Page 18: 1 CyberInfrastructure and Geospatial Information Laboratory (CIGI)

Computational Bottleneck I: Symptom

Page 19: 1 CyberInfrastructure and Geospatial Information Laboratory (CIGI)

Cause: Workload Distribution Issue

N rows on P processor cores
(figure panels: when P is small; when P is big)

Page 20: 1 CyberInfrastructure and Geospatial Information Laboratory (CIGI)

Solution: Load Balancing

N rows on P processor cores
(figure panels: when P is small; when P is big)
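The slides do not show the original or the revised distribution algorithm, so the following is only a sketch of one common way this symptom arises and one common remedy. It assumes the N output rows were handed out in fixed chunks of ceil(N/P) rows, which leaves trailing processes idle when P is large, and contrasts that with a balanced block partition in which row counts differ by at most one. All function names are hypothetical.

```c
#include <assert.h>

/* Fixed-chunk scheme (assumed for illustration): every rank is offered
 * ceil(n_rows / n_procs) rows until the rows run out. */
long naive_row_count(long n_rows, long n_procs, long rank)
{
    assert(n_rows >= 0 && n_procs > 0 && rank >= 0 && rank < n_procs);
    long chunk = (n_rows + n_procs - 1) / n_procs;
    long first = rank * chunk;
    if (first >= n_rows)
        return 0;                       /* this rank is left idle */
    long remaining = n_rows - first;
    return remaining < chunk ? remaining : chunk;
}

/* Balanced block partition: the first (n_rows % n_procs) ranks get one
 * extra row, so counts differ by at most one. */
long balanced_row_count(long n_rows, long n_procs, long rank)
{
    assert(n_rows >= 0 && n_procs > 0 && rank >= 0 && rank < n_procs);
    long base = n_rows / n_procs;
    long extra = n_rows % n_procs;
    return base + (rank < extra ? 1 : 0);
}

/* Index of the first row owned by `rank` under the balanced partition. */
long balanced_first_row(long n_rows, long n_procs, long rank)
{
    assert(n_rows >= 0 && n_procs > 0 && rank >= 0 && rank < n_procs);
    long base = n_rows / n_procs;
    long extra = n_rows % n_procs;
    return rank * base + (rank < extra ? rank : extra);
}
```

With N = 100 and P = 4, both schemes give every rank 25 rows, consistent with the imbalance going unnoticed when P is small. With P = 64, the fixed-chunk scheme gives 2 rows to each of the first 50 ranks and none to the remaining 14, while the balanced partition gives 2 rows to 36 ranks and 1 row to the other 28.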

Page 21: 1 CyberInfrastructure and Geospatial Information Laboratory (CIGI)

Computational Bottleneck I: Summary

Symptom
– Load imbalance
– Detected by TAU first
– Verified by manual instrumenting
Cause
– Workload distribution algorithm problem (not obvious on small platforms)
Solution
– Revised algorithm for distributing workload

Page 22: 1 CyberInfrastructure and Geospatial Information Laboratory (CIGI)

Computational Bottleneck II: Symptom

Page 23: 1 CyberInfrastructure and Geospatial Information Laboratory (CIGI)

Computational Bottleneck II: Symptom

Page 24: 1 CyberInfrastructure and Geospatial Information Laboratory (CIGI)

Computational Bottleneck II: Cause

Page 25: 1 CyberInfrastructure and Geospatial Information Laboratory (CIGI)

Computational Bottleneck II: Analysis

Spatial data-dependent performance anomaly
– The anomaly is data dependent
– The four corners of the raster were processed by processors whose indexes are close to the two ends
Exception handling in C++ is costly
– Coordinate transformation on nodata areas was handled as an exception
Solution
– Remove the C++ exception handling part

Page 26: 1 CyberInfrastructure and Geospatial Information Laboratory (CIGI)

Computational Bottleneck II: Performance Improvement

Page 27: 1 CyberInfrastructure and Geospatial Information Laboratory (CIGI)

Computational Bottleneck II: Summary

Symptom
– Processors responsible for polar regions spent more time than those processing equatorial regions
Cause
– Corner cells were mapped to invalid input raster cells, generating exceptions
– C++ exception handling was expensive
Solution
– Removed C++ exception handling
– Corner cells need not be processed
  They now contribute less computation time

Page 28: 1 CyberInfrastructure and Geospatial Information Laboratory (CIGI)

Conclusions

Performance profiling identified computational bottlenecks in pRasterBlaster
We demonstrated the value of profilers for pRasterBlaster
– The techniques are likely valuable for other GIS applications
Performance profiling is an important tool for developing scalable and efficient high-performance applications

Page 29: 1 CyberInfrastructure and Geospatial Information Laboratory (CIGI)

Future Work

Identify and resolve remaining performance issues in pRasterBlaster
– Recently identified: I/O is the next major roadblock