Summary of the Analysis Systems
13th June 2006, David Colling, Imperial College London
Slightly unusual to be asked to summarise a session that everyone has just sat through, so:
• I will try to summarise the important points of each model. This will be a personal view:
- I "manage" a distributed Tier 2 in the UK that currently supports all LHC experiments
- I am involved in CMS computing/analysis
• Then there will be a further opportunity to question the experiment experts about the implementation of the models on the Tier 2s.
Outline
• Firstly, only three of the four LHC experiments plan to do any analysis at the Tier 2s.
• However, conceptually, those three have very similar models.
• They have the majority (if not all) of end-user analysis performed at the Tier 2s. This gives the Tier 2s a crucial role in extracting the physics of the LHC.
• The analyses share the Tier 2s with Monte Carlo production.
Comparing the Models
• The experiments want to be able to control the fraction of the Tier 2 resources used for different purposes (analysis vs. production, analysis A vs. analysis B).
• They all realise that data movement, together with knowledge of the content and location of those data, is vitally important:
- They all separate the data-content and data-location databases
- All have jobs going to the data (see the sketch below)
• The experiments all realise that there is a need to shield the user from the complexity of the WLCG.
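As a rough illustration of that separation (all names and datasets hypothetical): a content catalogue answers "what is in this dataset?", a location catalogue answers "which sites hold it?", and the broker sends the job to a site that already hosts the data.

#include <iostream>
#include <map>
#include <string>
#include <vector>

int main() {
    // Content catalogue: dataset -> logical file names (what the data are).
    std::map<std::string, std::vector<std::string>> content = {
        {"/top/candidates", {"lfn:/store/top/file1.root",
                             "lfn:/store/top/file2.root"}}};

    // Location catalogue: dataset -> sites holding a replica (where they are).
    std::map<std::string, std::vector<std::string>> location = {
        {"/top/candidates", {"T2_London", "T2_SouthGrid"}}};

    // "Jobs go to the data": broker the job to a hosting site
    // rather than moving the files to the job.
    const std::string dataset = "/top/candidates";
    std::cout << "Submit analysis of " << dataset << " ("
              << content[dataset].size() << " files) to "
              << location[dataset].front() << "\n";
}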
So where do they differ?
• Implementation. Here they differ widely on:
- What services need to be installed/maintained at each Tier 2
- What additional software they need above the "standard" grid installations
- Details of the job submission system (e.g. pilot jobs or not, very different UIs, etc.)
• How they handle different Grids
• Maturity:
- CMS has a system capable of running >100K jobs/month, whereas Atlas only has a few hundred GB of appropriate data
Implementations
Let's start with Atlas…
• Different implementations on different Grids.
Looking at the Atlas EGEE implementation:
• No services required at the Tier 2; only software, installed by the SGM (experiment software manager) account.
• All services (the file catalogue, the data-moving services of Don Quixote, etc.) run at the local Tier 1.
• As a Tier 2 "manager" this makes me very happy, as it minimises the support load at the Tier 2 and leaves it to experts at the Tier 1. It means that all sites within the London Tier 2 will be available for Atlas analysis.
Accessing data for analysis in the Atlas EGEE installation
[Diagram: the dataset catalogue at Tier 0 is consulted over http; a VO Box and the LRC at the Tier 1 resolve replicas via the lrc protocol; FTS transfers files to the Tier 2 SE over gridftp; jobs on the Tier 2 CE read from the SE via rfio, dcap or nfs.]
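To make those access protocols concrete: from ROOT an analysis job just opens a URL, and the matching I/O plugin handles the transport. A minimal sketch, with hypothetical endpoints and paths:

// ROOT picks the I/O plugin from the URL scheme; which scheme works
// depends on the Tier 2's storage system (all endpoints hypothetical).
TFile *f = TFile::Open(
    "dcap://se.tier2.example:22125//pnfs/tier2.example/atlas/aod.root");
if (!f || f->IsZombie())  // e.g. fall back to rfio for a DPM/Castor SE
    f = TFile::Open("rfio:/dpm/tier2.example/home/atlas/aod.root");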
Atlas Implementations
Prioritisation mechanism will come from the EGEE Priorities Working Group.
[Diagram: jobs routed to separate CE queues for Production, Long and Short analysis, and Software installation, with shares of 70%, 20%, 9% and 1%.]
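The Working Group mechanism was still to come at this point; purely as an illustration, a site could approximate shares like those in the diagram with ordinary batch-system fair-share, for example in a Maui scheduler configuration (group names hypothetical; this is not the Working Group's actual mechanism):

# maui.cfg fragment: fair-share targets approximating the diagram's shares
FSPOLICY   DEDICATEDPS          # account usage in dedicated processor-seconds
FSDEPTH    7                    # number of fair-share windows remembered
FSINTERVAL 24:00:00             # length of one window
FSWEIGHT   1

GROUPCFG[atlasprd] FSTARGET=70  # production
GROUPCFG[atlaslng] FSTARGET=20  # long analysis
GROUPCFG[atlassht] FSTARGET=9   # short analysis
GROUPCFG[atlassgm] FSTARGET=1   # software installation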
Atlas Implementations and maturity
US: using the Panda system
• Much more work at the Tier 2.
• However, the US Tier 2s seem to be better endowed with support effort, so this may not be a problem.
NorduGrid
• Implementation still ongoing.
Maturity
• Only a few hundred GB of appropriate data.
• Experience of SC4 will be important, especially for Don Quixote.
CMS Implementation
• Requires the installation of some services at the Tier 2s: PhEDEx and a trivial file catalogue (an illustrative catalogue rule appears after this list).
• However, it is possible to run the instances for different sites within a distributed Tier 2 at a single site.
• So as a distributed Tier 2 "manager" I am not too unhappy… for example, in the UK I can see all sites in the London Tier 2 and in SouthGrid running CMS analysis, but this is less likely in NorthGrid and ScotGrid.
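The trivial file catalogue is not a service to babysit: it is a set of site-local rewrite rules from logical file names to physical ones. An illustrative storage.xml in the CMS rule format (hostnames and paths hypothetical):

<storage-mapping>
  <!-- LFN to PFN for direct (POSIX-like) access -->
  <lfn-to-pfn protocol="direct" path-match="/+store/(.*)"
              result="/pnfs/tier2.example/data/cms/store/$1"/>
  <!-- LFN to PFN for dcap access from the worker nodes -->
  <lfn-to-pfn protocol="dcap" path-match="/+store/(.*)"
              result="dcap://se.tier2.example:22125/pnfs/tier2.example/data/cms/store/$1"/>
  <!-- reverse mapping -->
  <pfn-to-lfn protocol="direct"
              path-match="/pnfs/tier2.example/data/cms/(store/.*)"
              result="/$1"/>
</storage-mapping>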
CMS Implementation across Grids
• Installation as similar as possible across EGEE and OSG.
• The same UI for both… called CRAB.
• CRAB can be used to submit to OSG sites via an EGEE WMS, or directly via Condor-G (an illustrative workflow follows).
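A sketch of the user's view (configuration keys approximate for the tool as it stood in 2006; values hypothetical): a task is described in a small crab.cfg and then driven from the command line.

[CRAB]
jobtype   = cmssw       # the analysis framework job to wrap
scheduler = glite       # EGEE WMS; a Condor-G scheduler targets OSG directly

crab -create     # split the task into jobs
crab -submit     # submit them via the chosen scheduler
crab -status     # poll the jobs
crab -getoutput  # retrieve the output of finished jobs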
CMS Maturity
• PhEDEx has proved to be very reliable since DC04.
• CRAB has been in use since the end of 2004.
• A hundred thousand jobs a month.
• Tens of sites, both for execution and submission.
• Note that there are still failures.
Alice Implementation
• Only really running on EGEE and Alice-specific sites.
• Puts many requirements on a site: xrootd, a VO Box running the AliEn SE and CE, the package-management (PackMan) server, a MonALISA server, an LCG UI and AliEn file transfer (FTD).
• All jobs are submitted via AliEn tools.
• All data are accessed only via AliEn (an illustrative JDL follows).
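For flavour, a minimal AliEn JDL of the kind a user might submit (script and file names hypothetical): Executable names a script registered in the AliEn file catalogue, InputData is a logical file name, and Split produces one sub-job per input file. From an AliEn shell it would be submitted with something like "submit analysis.jdl".

Executable = "runAnalysis.sh";
InputData  = { "LF:/alice/sim/2006/run00123/AliESDs.root" };
Split      = "file";
OutputFile = { "histograms.root" };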
Tier-2 Infrastructure/Setup Example
[Diagram: a VO Box running the AliEn SE and CE, FTD, MonALISA, PackMan and an LCG UI, alongside the LCG CE and the LCG FTS/SRM-SE/LFC; an xrootd redirector, which can run on the VO Box, fronts the disk servers of the storage.]
Port requirements (access: outgoing, plus the following incoming services):
8082: incoming from World, SE (Storage Element)
8083: incoming from World, FTD (FileTransferDaemon)
8084: incoming from CERN, CM (ClusterMonitor)
9991: incoming from CERN, PackMan
1094: incoming from World, xrootd file transfer
Worker-node configuration/requirements are the same for batch processing at Tier 0/1/2 centres (2 GB RAM per CPU, 4 GB local scratch space).
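As an illustration of what that port list implies for a site firewall (iptables syntax; CERN's 137.138.0.0/16 range is used here only as an example of the "from CERN" restriction):

iptables -A INPUT -p tcp --dport 8082 -j ACCEPT                    # SE, world
iptables -A INPUT -p tcp --dport 8083 -j ACCEPT                    # FTD, world
iptables -A INPUT -p tcp --dport 1094 -j ACCEPT                    # xrootd, world
iptables -A INPUT -p tcp --dport 8084 -s 137.138.0.0/16 -j ACCEPT  # ClusterMonitor
iptables -A INPUT -p tcp --dport 9991 -s 137.138.0.0/16 -j ACCEPT  # PackMan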
Alice Implementation
• All data access is via xrootd… this allows innovative access to the data. However, it is a requirement on the site (see the ROOT sketch after this list).
• It may be possible to use xrootd front-ends to standard SRM storage.
• Batch analysis implicitly allows prioritisation through a central job queue.
• However, this does involve using glexec-like functionality.
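The ROOT sketch mentioned above: the client opens a file through the redirector, which hands it to whichever disk server actually holds the data (hostname and path hypothetical):

// ROOT's xrootd plugin handles root:// URLs transparently.
TFile *f = TFile::Open("root://xrd.tier2.example//alice/data/run00123/AliESDs.root");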
Alice Implementation – Batch analysis
[Diagram: the user's JDL enters the central AliEn TaskQueue via the API services; an optimiser applies splitting, requirements, FTS replication and policies; the AliEn CE at a Tier 2 matches waiting JDLs and submits agents through an LCG UI and RB to the LCG CE and local batch system; the agents pull the real jobs, which run ROOT, resolve their inputs (XML collections) through the AliEn FC, and read from the Tier-2 SE via xrootd.]
Alice Implementation
• As a distributed Tier 2 "manager", this setup does not fill me with joy.
• I cannot imagine installing such VO Boxes within the London Tier 2, and I would be surprised if any UK Tier 2 sites (with the exception of Birmingham) install such boxes.
Alice Implementation
Interactive Analysis
• More important to Alice than to the other experiments.
• A novel and interesting approach based on PROOF and xrootd (a sketch follows).
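A minimal sketch of such a session from ROOT (cluster URL, tree and selector names hypothetical): the PROOF master partitions the dataset across the workers, which read their share via xrootd.

TProof *p = TProof::Open("proof://caf.tier2.example");  // connect to the cluster
TDSet *data = new TDSet("TTree", "esdTree");            // a set of trees to analyse
data->Add("root://xrd.tier2.example//alice/data/run00123/AliESDs.root");
p->Process(data, "MySelector.cxx+");                    // run the selector in parallel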
Maturity
• Currently, only a handful of people are trying to perform Grid-based analysis.
• Not a core part of the SC4 activity for Alice.
Conclusions
• Three of the four experiments plan to use Tier 2 sites for end-user analysis.
• These three experiments have conceptually similar models (at least for batch analysis).
• The implementations of these similar models have very different implications for the Tier 2s supporting the VOs.
Discussion…