Pittsburgh Supercomputing Center RP Update, July 16, 2009
Bob Stock
Associate Director
Center for Analysis & Prediction of Storms
• Oklahoma/NOAA Spring Severe Weather Forecast Experiment for 2009
• CAPS used NICS (1 km) and PSC (4 km)
• At PSC
– from 4/20 to 6/5
– Sunday-Thursday: reservations of 2000 cores for 10-12 hours starting at 10:30 a.m. (Eastern)
• Lots of data generated: e.g., 66 terabytes ingested into the archive during May
2009 CAPS Spring Experiment on PSC BigBen
• Data Access and Screening
• Create Input Files
• Create Job Scripts
• Remap Radar Data [800 proc, 20 proc each radar]
• Process Initial and Boundary Conditions
• Run Weather Analysis [80 processors]
• Create Ensemble Perturbations
• Run WRF & ARPS Forecast Models [18 x 80 processors] (see the sketch after this list)
• Extraction & reformatting of 2-D output
• Archive of 3-D results, over 50 TB data
• Generate derived products
• Data display and interrogation
• Analysis and verification
• Publication
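The ensemble step is the largest single phase of the workflow. As a rough illustration of how 18 independent 80-processor forecast runs could be launched concurrently inside one reservation, here is a minimal Python sketch; the executable name, per-member directories, and the exact aprun invocation are illustrative assumptions, not the actual CAPS scripts.

    import subprocess

    MEMBERS = 18            # ensemble members, 80 processors each (18 x 80 = 1440 cores)
    PROCS_PER_MEMBER = 80

    # Start every ensemble member as its own 80-process run inside the reservation.
    # "aprun" is the Cray application launcher on BigBen; the executable name and
    # per-member directories are hypothetical placeholders.
    procs = []
    for member in range(1, MEMBERS + 1):
        procs.append(subprocess.Popen(
            ["aprun", "-n", str(PROCS_PER_MEMBER), "./wrf.exe"],
            cwd=f"member_{member:02d}"))

    # Wait for all members before the 2-D extraction and 3-D archiving steps.
    exit_codes = [p.wait() for p in procs]
    print("ensemble complete, exit codes:", exit_codes)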
Sample 4-km Ensemble Forecast Products
18h Forecasts Valid 1800 UTC, May 8, 2009
[Figure panels: Predicted probability-matched reflectivity; Actual observed radar reflectivity; Predicted spaghetti diagram of 35 dBZ reflectivity; Predicted probability of reflectivity > 35 dBZ; Midwest zoom, all ensemble forecast members]
Enhancing Operations on Pople
• Automatic Performance Measurement
– Utilize the Performance Monitor Unit (PMU)
• Backfilling using Predictive Walltimes
Automatic Performance Measurement
Goal: Collect Intel Itanium 2 PMU stats for each job in order to
• Identify underperforming codes (MFLOPS)
• Provide users with PMU stats for their runs
Based on the open source package Perfmon2: http://perfmon2.sourceforge.net/
• Collection started for each job using pfmon
• Counters collected: CPU_OP_CYCLES_ALL, FP_OPS_RETIRED, L3_REFERENCES, L3_MISSES
• Counter detail collected for each process and thread
• Report issued from the digested stats
• Currently testing and evaluating the load on the system
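To make the collection step concrete, here is a minimal Python sketch of a per-job wrapper that launches a command under pfmon and derives an approximate MFLOPS figure from FP_OPS_RETIRED. The pfmon option spelling and output format shown are assumptions for illustration and would need to match the pfmon build actually installed on Pople.

    import re
    import subprocess
    import time

    # Events named on the slide, collected per job via pfmon (Perfmon2).
    EVENTS = "CPU_OP_CYCLES_ALL,FP_OPS_RETIRED,L3_REFERENCES,L3_MISSES"

    def run_with_pfmon(cmd):
        """Run a command under pfmon and return (elapsed seconds, counter dict).

        Assumes pfmon accepts a comma-separated event list via -e and prints
        '<count> <event name>' lines at exit; adjust to the local build.
        """
        start = time.time()
        result = subprocess.run(["pfmon", "-e", EVENTS] + cmd,
                                capture_output=True, text=True, check=True)
        elapsed = time.time() - start
        counters = {}
        for line in result.stdout.splitlines():
            m = re.match(r"\s*([\d,]+)\s+(\w+)", line)
            if m:
                counters[m.group(2)] = int(m.group(1).replace(",", ""))
        return elapsed, counters

    if __name__ == "__main__":
        elapsed, counters = run_with_pfmon(["./my_app"])  # hypothetical user code
        flops = counters.get("FP_OPS_RETIRED", 0)
        print(f"approximate MFLOPS: {flops / elapsed / 1e6:.1f}")
        if counters.get("L3_REFERENCES"):
            print(f"L3 miss rate: {counters['L3_MISSES'] / counters['L3_REFERENCES']:.2%}")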
Backfilling using Predictive Walltimes
Goal: Maximize backfilling during drains for larger jobs
Problem: Backfilling for large jobs idles the machine because users overestimate job run times
Solution: Store the estimated and actual run times for each job and statistically predict run times
• The statistically calculated run time is used to optimize backfilling opportunities
• A database stores the actual and estimated walltimes for each job
• SQLite, a lightweight database engine, is used to store the data
• 70,000 jobs in the database; the database uses only 87 KB
• The scheduler uses data from the database to select jobs for backfill (see the sketch below)
• Still studying impact and benefits; shows promise
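A minimal sketch of what the walltime store and predictor could look like with Python's built-in sqlite3 module is shown below. The table layout and the choice of statistic (scaling a user's request by the median of their historical actual-to-estimated ratio) are illustrative assumptions; the slides do not say which statistical model the scheduler actually uses.

    import sqlite3
    import statistics

    conn = sqlite3.connect("walltimes.db")  # hypothetical filename
    conn.execute("""
        CREATE TABLE IF NOT EXISTS walltimes (
            jobid     TEXT PRIMARY KEY,
            username  TEXT,
            estimated REAL,  -- user-requested walltime, seconds
            actual    REAL   -- measured run time, seconds
        )""")

    def record_job(jobid, username, estimated, actual):
        """Store the requested and actual walltime of a completed job."""
        conn.execute("INSERT OR REPLACE INTO walltimes VALUES (?, ?, ?, ?)",
                     (jobid, username, estimated, actual))
        conn.commit()

    def predict_walltime(username, requested):
        """Scale the request by the user's median historical actual/estimated ratio.

        Falls back to the raw request when there is no history, so the
        prediction never exceeds what the user asked for.
        """
        rows = conn.execute("SELECT actual / estimated FROM walltimes "
                            "WHERE username = ? AND estimated > 0",
                            (username,)).fetchall()
        if not rows:
            return requested
        ratio = statistics.median(r[0] for r in rows)
        return min(requested, requested * ratio)

    # Example: a user who historically uses about 40% of the requested time.
    record_job("1001", "alice", 7200, 2900)
    record_job("1002", "alice", 7200, 3000)
    print(predict_walltime("alice", 10800))  # roughly 4400 s rather than 3 hours

Capping the prediction at the user's own request keeps the scheduler conservative when history is sparse, since a prediction that runs short could let a backfilled job collide with the reserved drain window.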
PSC at TG09: Organization
• Shawn Brown: Science Track Co-Chair
• Pallavi Ishwad: EOT Track Chair
• Laura McGinnis: Student Program Chair
• Shandra Williams: Communications Committee member in charge of signage
• Mike Schneider: Wrote news items about the conference
PSC at TG09: Participation
• Phil Blood and Robin Flaus: Presented a paper on the Computation Exploration (Comp Ex) program in the EOT Track
• Greg Foss: Presented visualizations in the Visualization Showcase
• Ed Hanna and Rob Light, with Dave Hart (SDSC): Presented a paper on RDR in the Technology Track
• Anirban Jana and Sergiu Sanielevici, with several people from other institutions: Presented the tutorial "Preparing Your Application for TeraGrid Beyond 2010"
• Nick Nystrom, with several people from other institutions: Presented the tutorial "Using Tools to Understand Performance Issues on TeraGrid Machines: IPM and the POINT Project"
• Josephine Palencia: Presented the poster "JWAN: PSC's Secure, Federated, Distributed Lustre Filesystem on the WAN (TeraGrid)"