The Lab’s Computing Support Strategy for CDF and D0
Victoria White, Associate Lab Director for Computing and CIO
October 3, 2011
Our job in the Computing Sector
• Is to enable science and to optimize the support (human and technological) of the scientific programs of the lab (including the Experiment program)
  • Within funding and resource constraints
  • In the face of growing demands
  • To meet emerging needs
  • To deal with rapidly changing technology
• We also have to provide computing to support the lab’s operations and provide all the standard services that an organization needs (and often expects 24x7)
Computing Division -> Computing Sector
Fermilab Computing Facilities
• Lattice Computing Center (LCC)
  • High Performance Computing (HPC)
  • Accelerator Simulation, Cosmology nodes
  • No UPS
• Feynman Computing Center (FCC)
  • High availability services, e.g. core network, email, etc.
  • Tape Robotic Storage (3 10000-slot libraries)
  • UPS & Standby Power Generation
  • ARRA project: upgrade cooling and add HA computing room (completed)
• Grid Computing Center (GCC)
  • High Density Computational Computing
  • CMS, Run II, Grid Farm batch worker nodes
  • Lattice HPC nodes
  • Tape Robotic Storage (4 10000-slot libraries)
  • UPS & taps for portable generators
  • EPA Energy Star award 2010
Facilities: more than just space power and cooling – continuous planning
ARRA-funded new high availability computer room in Feynman Computing Center
Cooling problems at GCC this summer
• Soaker hoses to cool concrete condenser pad
• Increased computer room operating temperatures
• Numerous air management improvements inside the computer room, including cold aisle containment test
• Extended monitoring outside to the condenser pad
• Executed load shed plan twice during hottest days
• Rented portable air conditioning for use in CRB & outside under the condensers (the latter was effective, not efficient)
The air intake to the condensers can reach temps of 120F, causing the cooling to shut down (20-25F above ambient on pad)
• $650–950k to move condensers to a platform for Computer Rooms B and C
  • Rough estimate from FESS
  • Does not include Computer Room A
  • Better estimate when the study is complete in November
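The temperature figures above imply an ambient threshold at which the condensers get into trouble. The following is a rough sketch of that arithmetic, assuming the 20-25F offset applies directly to the condenser intake air; the function names are illustrative and do not come from the talk.

```python
# Figures quoted on the slide
INTAKE_SHUTDOWN_F = 120.0      # intake temperature at which cooling shuts down
PAD_OFFSET_F = (20.0, 25.0)    # pad runs this far above ambient (low, high)


def f_to_c(deg_f: float) -> float:
    """Convert degrees Fahrenheit to degrees Celsius."""
    return (deg_f - 32.0) * 5.0 / 9.0


def ambient_threshold_f(shutdown_f: float, offset_f: float) -> float:
    """Ambient temperature at which intake air reaches the shutdown point.

    Assumption: intake temperature = ambient temperature + pad offset.
    """
    return shutdown_f - offset_f


if __name__ == "__main__":
    for offset in PAD_OFFSET_F:
        ambient = ambient_threshold_f(INTAKE_SHUTDOWN_F, offset)
        print(f"pad offset {offset:.0f}F -> ambient {ambient:.0f}F "
              f"({f_to_c(ambient):.1f}C)")
```

Under that assumption, the shutdown point is reached at ambient temperatures of roughly 95 to 100F, which is why the hottest summer days were the ones that required load shedding.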
Need to fix Grid Computing Center quickly – ready for next summer
• Need to be able to use the computer rooms as designed and plan for that going forward.
• Need to move forward with CRA renovations for greater power per rack.
• We cannot do this and run everyone ragged and be unreliable every summer
Run II Computing Strategy
• Production processing and Monte Carlo production capability after the end of data taking
  • Ability to do some reprocessing if needed
  • Monte Carlo production at the current rate through mid-2013?
• Analysis computing capability for at least 5 years, but diminishing after end of 2012
  • Push for 2012 conferences for many results: no large drop in computing requirements through this period
• Continued support for up to 5 years for
  • Code management and science software infrastructure
  • Data handling for production (+MC) and Analysis Operations
• Curation of the data: > 10 years with possibly some support for continuing analyses
We have pushed/insisted on sharing strategies for computing for many years. Why?
• Cost
• Coherent technical approaches and architectures
• Support over the entire lifecycle of an experiment/project
Experiment/Project Lifecycle and funding
[Diagram: lifecycle phases and the computing support labeled for each]
• Early period (R&D, simulations, LOI, proposals): shared services
• Mature phase (construction, operations, analysis): shared services; experiment- or project-specific
• Final data-taking and beyond (final analysis, data preservation and access): shared services; project-specific
Sharing via the Grid – FermiGrid
[Diagram: FermiGrid architecture]
• External grids: TeraGrid, WLCG, NDGF, Open Science Grid
• User Login & Job Submission
• FermiGrid Site Gateway
• FermiGrid services: Monitoring/Accounting, Infrastructure, Authentication/Authorization
• Clusters: Grid Farm (3284 slots), CMS (7485 slots), CDF (5600 slots), D0 (6916 slots)
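The slot counts in the FermiGrid diagram can be tallied to see how the shared pool is split among the clusters. This is a minimal sketch using only the numbers shown on the slide; the helper names are illustrative, and it assumes the four clusters shown make up the whole pool.

```python
from typing import Iterable, Mapping

# Batch slot counts as shown in the FermiGrid diagram
SLOTS = {
    "Grid Farm": 3284,
    "CMS": 7485,
    "CDF": 5600,
    "D0": 6916,
}


def total_slots(slots: Mapping[str, int]) -> int:
    """Total number of batch slots across all clusters."""
    return sum(slots.values())


def share(slots: Mapping[str, int], names: Iterable[str]) -> float:
    """Fraction of all slots owned by the named clusters."""
    return sum(slots[name] for name in names) / total_slots(slots)


if __name__ == "__main__":
    print(f"total slots: {total_slots(SLOTS)}")
    for name in SLOTS:
        print(f"{name:>9}: {100 * share(SLOTS, [name]):.1f}%")
    print(f"Run II (CDF + D0): {100 * share(SLOTS, ['CDF', 'D0']):.1f}%")
```

On these numbers the two Run II experiments together hold a little over half of the slots, which is the capacity that sharing through the grid makes available to other users as Run II demand declines.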
Budget/resource allocation for 2012 +
• There is always upward pressure for computing
  • more disk and more cpu leads to faster results and greater flexibility
  • more help with software & operations is always requested
• Within a fixed budget each experiment can usually optimize between tape drives, tapes, disk, cpu, servers, assuming basic shared services are provided.
• With so many experiments in so many different stages we intend to convene a “Scientific Computing Portfolio Management Team” to examine the needs/computing models of the different Fermilab-based experiments and help in allocating the finite dollars to optimize scientific output.
“Data Preservation” for Tevatron data
• Data will be stored and migrated to new tape technologies for ~10 years
  • Eventually 16 PB of data will seem modest
• If we want to maintain the ability to reprocess and do analysis on the data there is a lot of work to be done to keep the entire environment viable
  • Code, access to databases, libraries, I/O routines, Operating Systems, documentation...
• If there is a goal to provide “open data” that scientists outside of CDF and Dzero could use there is even more work to do.
• 4th Data Preservation Workshop at Fermilab in May
• The collaboration has to decide soon if we need to do more than maintain data for collaboration use.
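To give a feel for what migrating 16 PB to a new tape technology involves, the sustained copy rate can be estimated for a chosen migration window. This is only a sketch: the talk gives no migration window, so the one-year figure below is a hypothetical input, and decimal petabytes (10^15 bytes) are assumed.

```python
DATA_VOLUME_PB = 16.0      # Tevatron data volume quoted on the slide
BYTES_PER_PB = 1e15        # assumption: decimal petabytes
SECONDS_PER_DAY = 86400


def sustained_rate_mb_s(volume_pb: float, window_days: float) -> float:
    """Average rate in MB/s needed to copy volume_pb within window_days."""
    if window_days <= 0:
        raise ValueError("window_days must be positive")
    seconds = window_days * SECONDS_PER_DAY
    return volume_pb * BYTES_PER_PB / seconds / 1e6


if __name__ == "__main__":
    # Hypothetical one-year migration window, not a figure from the talk
    rate = sustained_rate_mb_s(DATA_VOLUME_PB, window_days=365)
    print(f"16 PB in one year needs about {rate:.0f} MB/s sustained")
```

With a one-year window the average works out to roughly 500 MB/s around the clock, before allowing for drive contention, verification passes, or downtime.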