M.C. Vetterli – WLCG-OB, CERN; October 27, 2008 – #1
Simon Fraser
Status of the WLCG Tier-2 Centres
M.C. Vetterli, Simon Fraser University and TRIUMF
WLCG Overview Board, CERN, October 27th 2008

Sources of Information
Discussions with experiment representatives in July
APEL monitoring portal http://www3.egee.cesga.es/gridsite/accounting/CESGA/egee_view.php
WLCG reliability reports http://lcg.web.cern.ch/LCG/accounts.htm
October GDB meeting, dedicated to Tier-2 issues http://indico.cern.ch/conferenceDisplay.py?confId=20234
Talks from the last OB & LHCC
Slides labeled with a * are from MV’s LHCC rapporteur talk

Tier-2 Performance Summary*
Overall, the Tier-2s are contributing much more now
Significant fractions of the Monte Carlo simulations are being done in the T2s for all experiments
Reliability is better, but still needs to improve
The CCRC’08 exercise is generally considered a success for the Tier-2s

Tier-2 Centres in CCRC’08 – General*
Overall, the Tier-2s and the experiments considered the CCRC’08 exercise to be a success
The networking/data transfers were tested extensively; some FTS tuning was needed, but it worked out
Experiments tended to continue other activities in parallel, which is a good test of the system, although the load was not as high as anticipated
While CMS did include significant user analysis activities, the chaotic use of the Grid by a large number of inexperienced people is still to be tested

Tier-2 Issues/Concerns
As of the CB and the meetings with experiments this summer
Communications: Do Tier-2s have a voice? Is there a good mechanism for disseminating information?
Better monitoring: Pledges vs actual vs used
Hardware acquisitions: What should be bought? kSI2006?
Tier-2 capacity: Size of datasets? Effect of LHC delay?
…

Tier-2 Issues/Concerns
Upcoming onslaught of users: Some user analysis tests have been done but scaling is a concern
User Support: Ticketing system exists but it is not really used for user support issues. This affects Tier-2s especially.
Federated Tier-2s: Tools to federate? Monitoring? (averaging)
Interoperability of EGEE, OSG, and NDGF should be improved
Software/Middleware updates: Could be smoother; too frequent

Communications for Tier-2s
Identified by the T2s at the last CB as a serious problem. Interesting to me that many in experiment computing management did not share this concern.
Should communication be organized according to experiment or to Tier-1 association? There are also differing opinions on this. There are two issues:
- Grid middleware/operations
- Experiment software
My view after studying this is that the situation is OK for “tightly coupled” Tier-2s, but not for remote and smaller Tier-2s that are not well coupled to a Tier-1.

Communications for Tier-2s
Many lines of communication do indeed exist.
Some examples are:
CMS has two Tier-2 coordinators: Ken Bloom (Nebraska) and Giuseppe Bagliesi (INFN)
- attend all operations meetings
- feed T2 issues back to the operations group
- write T2-relevant minutes
- organize T2 workshops
ALICE has designated 1 Core Offline person in 3 to have privileged contact with a given T2 site manager
- weekly coordination meetings
- Tier-2 federations provide a single contact person
- A Tier-2 coordinates with its regional Tier-1

Communications for Tier-2s
ATLAS uses its cloud structure for communications
- Every Tier-2 is coupled to a Tier-1
- 5 national clouds; others have foreign members (e.g. “Germany” includes Krakow, Prague, Switzerland; Netherlands includes Russia, Israel, Turkey)
- Each cloud has a Tier-2 coordinator
Regional organizations, such as:
+ France Tier-2/3 technical group:
  - coordinates with Tier-1 and with experiments
  - monthly meetings
  - coordinates procurement and site management
+ GRIF: Tier-2 federation of 5 labs around Paris
+ Canada: Weekly teleconferences of technical personnel (T1 & T2) to share information and prepare for upgrades, large production, etc.
+ Many others exist; e.g. in the US and the UK

Communications for Tier-2s
Tier-2 Overview Board reps: Michel Jouvin and Atul Gurtu have just been appointed to the OB to give the Tier-2s a voice there.
Tier-2 mailing list: Actually exists and is being reviewed for completeness & accuracy
Tier-2 GDB: The October GDB was dedicated to Tier-2 issues
+ reports from experiments: role of the T2s; communications
+ talks on regional organizations
+ discussion of accounting
+ technical talks on storage, batch systems, middleware
Seems to have been a success; repeat a couple of times per year?

Tier-2 Installed Resources
But how much of this is a problem of under-use rather than under-contribution? A task force has been set up to extract installed capacities from the Glue schema.
Monthly APEL reports still undergo significant modifications from the first draft.
+ Good because communication with the T2s is better
+ Bad because APEL accounting still has problems
Accounting seems to be very finicky; it breaks when the CE or MON box is upgraded
How are jobs distributed to the Tier-2s?
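The under-use versus under-contribution question above can be made concrete with two ratios: installed/pledged (contribution) and used/installed (utilisation). The sketch below is only an illustration. The `SiteAccounting` record, the site names, the numbers, and the 0.8 threshold are all assumptions, not values from APEL or the Glue schema.

```python
from dataclasses import dataclass


@dataclass
class SiteAccounting:
    """Hypothetical per-site record; all three values in the same capacity unit."""
    name: str
    pledged: float    # capacity pledged for the period
    installed: float  # capacity actually deployed at the site
    used: float       # average capacity consumed by jobs


def diagnose(site: SiteAccounting, threshold: float = 0.8) -> str:
    """Classify a shortfall as under-contribution, under-use, both, or neither.

    The 0.8 threshold is an arbitrary illustration, not a WLCG figure.
    """
    contributed = site.installed / site.pledged
    utilised = site.used / site.installed
    if contributed < threshold and utilised < threshold:
        return "under-contribution and under-use"
    if contributed < threshold:
        return "under-contribution"
    if utilised < threshold:
        return "under-use"
    return "ok"


# Made-up numbers, purely for illustration.
sites = [
    SiteAccounting("site-a", pledged=1000, installed=1000, used=450),
    SiteAccounting("site-b", pledged=1000, installed=600, used=580),
    SiteAccounting("site-c", pledged=1000, installed=950, used=900),
]
for s in sites:
    print(f"{s.name}: {diagnose(s)}")
```

Separating the two ratios is what requires installed capacity to be reported independently of the accounting of used CPU time.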

Tier-2 Hardware Questions
How does the LHC delay affect the requirements and pledges for 2009?
+ We are told to go ahead and buy what was planned, but we have already seen some under-use of CPU and we are now starting to see this for storage as well
We need to use something other than SpecInt2000!
+ this benchmark is totally out-of-date & useless for new CPUs
+ continued delays in SpecHEP can cause sub-optimal decisions

Tier-2 Hardware Questions
Networking to the nodes is now an issue.
+ with 8 cores per node, 1 GigE connection ≈ 16.8 MB/sec/core
+ Tier-2 analysis jobs run on reduced data sets and can do rather simple operations; we have seen 7.5 MB/sec at ATLAS and much more (x10?)
+ Do we need to go to Infiniband?
+ We certainly need increased capability for the uplinks; we should have a minimum of fully non-blocking GigE to the worker nodes.
We need more guidance from the experiments. The next round of purchases is now!
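The per-core bandwidth figure on the networking slide can be checked with a few lines of arithmetic. This is a sketch under stated assumptions: the function names are hypothetical, 1 MB is taken as 10^6 bytes, and protocol overhead is ignored. The 16.8 MB/sec/core quoted on the slide appears to correspond to counting 1 Gbit as 2^30 bits; with the nominal 10^9 bit/s line rate the share is slightly lower.

```python
def per_core_bandwidth_mb_s(link_bits_per_s: float, cores: int) -> float:
    """Even share of a node's network link per core, in MB/s (1 MB = 10**6 bytes)."""
    bytes_per_s = link_bits_per_s / 8
    return bytes_per_s / cores / 1e6


def required_link_gbit_s(job_rate_mb_s: float, cores: int) -> float:
    """Link speed (in 10**9 bit/s) needed if every core runs a job reading at job_rate_mb_s."""
    return job_rate_mb_s * 1e6 * 8 * cores / 1e9


CORES_PER_NODE = 8  # from the slide

print(f"10^9 bit/s link: {per_core_bandwidth_mb_s(1e9, CORES_PER_NODE):.1f} MB/s per core")
print(f"2^30 bit/s link: {per_core_bandwidth_mb_s(2**30, CORES_PER_NODE):.1f} MB/s per core")

# 7.5 MB/s is the observed ATLAS rate from the slide; 75 MB/s is the "x10?" case.
for rate in (7.5, 75.0):
    print(f"{rate} MB/s per job -> {required_link_gbit_s(rate, CORES_PER_NODE):.2f} Gbit/s per node")
```

Under these assumptions, eight jobs at 7.5 MB/s fit within a single GigE link, while jobs ten times faster would need several Gbit/s per node, which is the motivation behind the Infiniband and uplink questions.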

Summary
The role of the Tier-2 centres has increased markedly in the last year; >50% of Monte Carlo simulation is now done in the T2s.
The CCRC’08 exercise is considered a success by the Tier-2s and by the experiments.
Availability and reliability are up, but still need improvement.
Resource acquisition vs pledges is better but still needs work
Issues for Tier-2s:
- communication should be (& is being) improved
- work should ramp up on chaotic user analysis
- reporting actual resources should be established
- improved user support is needed