Data Analysis for SLAC Physics
Richard P. Mount
CHEP 2000, Padova, February 10, 2000
Some Hardware History

1994  Still IBM mainframe dominated; AIX farm growing (plus SLD VAXes)
1996  Tried to move SLD to the AIX Unix farm
1997  The rise of Sun: farm plus SMP
1998  Sun E10000 plus farm plus 'datamovers'; IBM mainframe removed
1999  Bigger E10000, 300 Ultra 5s, more datamovers
2000  E10000, 700+ farm machines, tens of datamovers etc. (plus SLD VAXes)
Some non-Hardware History
• Historical Approaches:
– Offline computing for SLAC experiments was not included explicitly in the cost of constructing or operating the experiments;
– SLAC Computing Services (SCS) was responsible for running systems (only);
– Physics groups were responsible for software tools.
• Some things have changed . . .
BaBar Data Analysis

• 6 STK Powderhorn silos with 20 'Eagle' drives
• Tapes managed by HPSS
• Data access mainly via Objectivity
STK Powderhorn Silo
HPSS: High Performance Storage System (Andy Hanushevsky/SLAC)

[Architecture diagram: Bitfile Server, Name Server, Storage Servers, Physical Volume Library, Physical Volume Repositories, Storage System Manager, Migration/Purge Server, Metadata Manager, Log Daemon, Log Client, Startup Daemon, and Encina/SFS over DCE, connected by separate control and data networks.]
HPSS at SLAC (Andy Hanushevsky/SLAC)

[The same component diagram, showing the HPSS deployment at SLAC with its control and data networks.]
Objectivity DB in BaBar (Andy Hanushevsky/SLAC)

[Diagram: Objectivity data access layered as a file-system interface over the oofs interface, on both the client side and the Datamover servers.]
Principal Data Flows

[Diagram of the principal data flows among three Objectivity federations: the IR2 FED (conditions, configuration, ambient data) at the interaction region, the OPR FED (events, conditions, configuration) used by Prompt Reconstruction, and the Analysis FED (events, conditions, configuration) in the Computer Center. Events flow into HPSS; conditions etc. flow between the federations.]
Database "Sweeps"

[The same data-flow diagram, annotated with sweep frequencies: a daily "sweep" and two twice-a-week "sweeps" move data among the IR2, OPR, and Analysis federations via HPSS.]
OPR to Analysis "Sweep"

1) Flush OPR databases (tag, collection . . .) to HPSS
2) "diff" the Analysis and OPR federation catalogs
3) Stage in (some) missing Analysis databases from HPSS
4) Attach the new databases to the Analysis federation

200 Gbytes are moved per sweep; a further 1 Tbyte per sweep is left in HPSS but attached to the Analysis federation.

A sweep currently takes about 6 hours; an achievable target is under 30 minutes. Note that staging in 1 TB with 10 tape drives takes at least 3 hours: 10 drives at 10 Mbytes/s give 100 Mbytes/s, so 1 Tbyte needs roughly 10,000 seconds.
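The sweep is essentially a catalog diff followed by selective staging. Below is a minimal sketch of that logic, using plain dictionaries as stand-ins for the federation catalogs, HPSS, and the disk cache; all names are illustrative, not BaBar's actual tools.

```python
# Hypothetical sketch of the OPR-to-Analysis sweep (illustrative names).

def sweep(opr_catalog, analysis_catalog, hpss_store, disk_cache, hot_names):
    """opr_catalog / analysis_catalog: {db_name: db_contents} dicts.
    hpss_store / disk_cache: dicts standing in for tape and disk.
    hot_names: set of databases worth staging to disk immediately."""
    # 1) Flush OPR databases (tag, collection, ...) to HPSS.
    hpss_store.update(opr_catalog)

    # 2) "diff" the federation catalogs: databases OPR has produced
    #    but the Analysis federation has not yet attached.
    missing = set(opr_catalog) - set(analysis_catalog)

    # 3) Stage in (some) of the missing databases from HPSS; the rest
    #    stay tape-resident but still become visible to Analysis.
    for name in missing & hot_names:
        disk_cache[name] = hpss_store[name]

    # 4) Attach all new databases to the Analysis federation catalog.
    for name in missing:
        analysis_catalog[name] = opr_catalog[name]
    return missing
```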
BaBar Offline Systems: August 1999
Datamovers, end 1999

Datamove1     Reconstruction (real + MC)
Datamove2     RAW, REC managed stage-in
Datamove3     Export
Datamove4     OPR
Datamove5     OPR
Datamove6     Reconstruction (real + MC)
Datamove7     Testbed
Datamove8     Testbed
Datamove9     RAW, REC anarchistic stage-in
Shire (E10k)  Physics Analysis (6 disk arrays)

Most are 4-processor Sun SMPs with two disk arrays (0.5 or 0.8 TB each).
SLAC-BaBar Data Analysis System
50/400 simultaneous/total physicists, 300 Tbytes per year

HARDWARE                                         UNITS      END FY1999   END FY2000
Tape Silos (STK Powderhorn, 6000 tapes each)     silos           6            6
Tape Drives (STK Eagle, 20 Gbyte, 10 Mbytes/s)   drives         20           40
Disk (net capacity of RAID arrays)               Tbytes         20           56
File Servers and Data Movers (Sun)               CPUs           73          150
Interactive Servers (Sun + Linux)                CPUs           82          140
Batch Servers (Sun + Linux)                      CPUs          300          900
Network Switches (Cisco 6509)                    switches        5           14
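As a rough sanity check on these numbers (my own arithmetic, not from the talk): the end-FY2000 tape configuration has roughly 40 times the bandwidth that an average ingest rate of 300 Tbytes/year implies, leaving headroom for re-reads, sweeps, and analysis staging.

```python
# Back-of-envelope check derived from the table above (my numbers).
drives = 40                       # STK Eagle drives, end FY2000
drive_rate_mb_s = 10              # 10 Mbytes/s per drive
aggregate_mb_s = drives * drive_rate_mb_s             # 400 Mbytes/s peak

data_per_year_tb = 300
seconds_per_year = 365 * 24 * 3600
average_mb_s = data_per_year_tb * 1e6 / seconds_per_year  # ~9.5 Mbytes/s

print(f"peak tape bandwidth:  {aggregate_mb_s} MB/s")
print(f"average ingest rate:  {average_mb_s:.1f} MB/s")
```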
Problems, August-October 1999: Complex Systems, Lots of Data

• OPR could not keep up with the data
  – blamed on Objectivity (partially true)
• Data analysis was painfully slow
  – blamed on Objectivity (partially true)
• Linking BaBar code took forever
  – blamed on SCS, Sun, AFS, NFS and even BaBar
• The Sun E10000 had low reliability and throughput
  – blamed on AFS (reliability), Objectivity (throughput) . . .
BaBar Reconstruction Production: Performance Problems with Early Database Implementation
Fixing the "OPR Objectivity Problem"

[Plot: BaBar Prompt Reconstruction throughput (test system).]
Fixing Physics Analysis "Objectivity Problems": Ongoing Work

• Applying fixes found in the OPR Testbed
• Using the Analysis systems and BaBar physicists as an Analysis Testbed
• Extensive instrumentation is essential
• A current challenge:
  – Can we de-randomize disk access (by tens of physicists and hundreds of jobs)? (See the sketch after this list.)
  – Partial relief is now available by making real copies of popular collections.
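One way to picture the de-randomization idea, as a sketch under my own assumptions (the talk does not describe BaBar's actual mechanism): collect the pending reads from many jobs and reorder them per device, so each disk array sees mostly-sequential sweeps instead of interleaved random seeks.

```python
from collections import defaultdict

def schedule_reads(pending):
    """pending: list of (device, file, offset) read requests from many jobs.
    Returns a per-device request list ordered for mostly-sequential access.
    The request fields are hypothetical; this is not BaBar code."""
    by_device = defaultdict(list)
    for device, file, offset in pending:
        by_device[device].append((file, offset))
    # Elevator-style ordering: group by file, then ascending offset,
    # so each disk array serves sequential sweeps instead of seeks.
    return {dev: sorted(reqs) for dev, reqs in by_device.items()}
```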
Extensive (but still insufficient) Instrumentation

[Plots: two days of traffic on one Datamove machine; six weeks of traffic on one Tapemove machine.]
Kanga, the BaBar "Objectivity-Free" Root-I/O-Based Alternative

• Aimed at the final stages of data analysis
• Easy for universities to install
• Supports the BaBar analysis framework
• A very successful validation of the insulating power of the BaBar transient-persistent interface (sketched below)
• Nearly working
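The "insulating power" point is that analysis code touches only transient objects behind an abstract persistence interface, so the storage backend can be swapped without changing physics modules. A minimal sketch of that pattern follows; class and method names are mine, not BaBar's.

```python
from abc import ABC, abstractmethod

class Event:
    """Transient object: all the physics code ever sees."""
    def __init__(self, tracks):
        self.tracks = tracks

class EventStore(ABC):
    """Persistent interface separating transient objects from storage."""
    @abstractmethod
    def read(self, event_id): ...
    @abstractmethod
    def write(self, event_id, event): ...

class ObjectivityStore(EventStore):
    """Stand-in for the ODBMS backend."""
    def __init__(self):
        self._db = {}
    def read(self, event_id):
        return self._db[event_id]
    def write(self, event_id, event):
        self._db[event_id] = event

class RootIOStore(EventStore):
    """Stand-in for the Kanga Root-I/O backend."""
    def __init__(self):
        self._files = {}
    def read(self, event_id):
        return self._files[event_id]
    def write(self, event_id, event):
        self._files[event_id] = event

def analyze(store, event_id):
    """Physics code depends only on EventStore, never on the backend."""
    return len(store.read(event_id).tracks)
```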
Exporting the Data

• CCIN2P3 (France)
  – Plans to mirror (almost) all BaBar data
  – Currently has "Fast" (DST) data only (~3 TB)
  – Typical delay is one month
  – Using Objectivity
• CASPUR (Italy)
  – Plans to store only "Fast" data (but it's too big)
  – Data are at CASPUR but not yet available
  – Prefers Kanga
• RAL (UK)
  – Plans to store only "Fast" data
  – Using Objectivity
Particle Physics Data Grid
Universities, DoE Accelerator Labs, DoE Computer Science

• Particle physics: a network-hungry collaborative application
  – Petabytes of compressed experimental data;
  – Nationwide and worldwide university-dominated collaborations analyze the data;
  – Close DoE-NSF collaboration on the construction and operation of most experiments;
  – The PPDG lays the foundation for lifting the network constraint from particle-physics research.
• Short-term targets (the cached-access target is sketched below):
  – High-speed site-to-site replication of newly acquired particle-physics data (> 100 Mbytes/s);
  – Multi-site cached file access to thousands of ~10 Gbyte files.
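A sketch of what multi-site cached file access means in practice (my illustration; the cache path and fetch mechanism are hypothetical, not PPDG middleware): a university job asks its local cache first and pulls the ~10 Gbyte file over the WAN only on a miss.

```python
import os

CACHE_DIR = "/scratch/ppdg-cache"   # hypothetical local cache location

def open_cached(logical_name, fetch_from_primary):
    """Return a local path for a file, fetching over the WAN on a miss.
    fetch_from_primary(logical_name, local_path) is the (hypothetical)
    site-to-site transfer; each miss moves a ~10 Gbyte file."""
    local = os.path.join(CACHE_DIR, logical_name)
    if not os.path.exists(local):                    # cache miss
        os.makedirs(os.path.dirname(local), exist_ok=True)
        fetch_from_primary(logical_name, local)      # WAN transfer
    return local                                     # hit: no WAN traffic
```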
Bulk Transfer Service: 100 Mbytes/s, 100 Tbytes/year

[Diagram of two PPDG services. High-Speed Site-to-Site File Replication Service: a primary site (data acquisition, CPU, disk, tape robot) feeding a partial replica site (CPU, disk, tape robot). Multi-Site Cached File Access: a primary site serving satellite sites (CPU, disk, tape robot), which in turn serve universities (CPU, disk, users).]
PPDG Resources

• Network testbeds:
  – ESnet links at up to 622 Mbits/s (e.g. LBNL-ANL);
  – Other testbed links at up to 2.5 Gbits/s (e.g. Caltech-SLAC via NTON).
• Data and hardware:
  – Tens of terabytes of disk-resident particle-physics data (plus hundreds of terabytes of tape-resident data) at accelerator labs;
  – Dedicated terabyte university disk cache;
  – Gigabit LANs at most sites.
• Middleware developed by collaborators:
  – Many components needed to meet the short-term targets (e.g. Globus, SRB, MCAT, Condor, OOFS, Netlogger, STACS, Mass Storage Management) have already been developed by collaborators.
• Existing achievements of collaborators:
  – WAN transfer at 57 Mbytes/s;
  – Single-site database access at 175 Mbytes/s.
Picture Show
Sun A3500 disk arrays used by BaBar (about 20 TB)
NFS File Servers: Network Appliance F760 et al. (~3 TB)
BaBar Datamovers (AMS Servers) and Tapemovers
More BaBar Servers: Build, Objy Catalog, Objy Journal, Objy Test ...
Sun Ultra5 Batch Farm
Sun Netra T1 Farm Machines (440 MHz UltraSPARC, one rack unit high)
Sun Netra T1 Farm

Now installing 450 machines; about to order another 260.
Linux Farm
Core Network Switches and Routers
Cisco 12000 External Router

• One OC48 (2.4 Gbits/s) interface (OC12 interfaces to be added)
• Four Gigabit Ethernet interfaces
• "Grid-Testbed Ready"
Money and People
BaBar Offline Computing at SLAC: Costs other than Personnel

[Bar chart: costs in $M for FY 1997 through FY 2000 (vertical axis 0-12 $M), broken down into smp cpu, farm cpu, disk, tape, datamove, net, and other.]

(Does not include "per physicist" costs such as desktop support, help desk, telephone, and the general site network. Does not include tapes.)
BaBar Offline Computing at SLAC: Costs other than Personnel

[Bar chart: the same costs in $M for FY 1997 through FY 2000 (vertical axis 0-12 $M), broken down into Equipment, Materials and Supplies, and Software.]

(Does not include "per physicist" costs such as desktop support, help desk, telephone, and the general site network.)
BaBar Computing at SLAC: Personnel (SCS)

[Bar chart: people for FY 1997 through FY 2000 (vertical axis 0-120), split into SLAC-SCS BaBar Systems and SLAC-SCS BaBar Applications.]
BaBar Computing at SLAC: Personnel for Applications and Production Support

[Bar chart: people for FY 1997 through FY 2000 (vertical axis 0-120), split into SLAC-SCS BaBar Systems, SLAC-SCS BaBar Applications, DoE BaBar Applications and Production (at SLAC), and non-DoE BaBar Applications and Production (at SLAC). Some guesses.]
BaBar Computing Personnel: The Whole Story?

[Bar chart: people for FY 1997 through FY 2000 (vertical axis 0-160), split into SLAC-SCS BaBar Systems, SLAC-SCS BaBar Applications, DoE BaBar Applications and Production (at SLAC), non-DoE BaBar Applications and Production (at SLAC), SLAC physicists doing computing, and BaBar physicists doing computing. Many guesses.]
Issues
Complexity

• BaBar (and CDF, D0, RHIC, LHC) is driven to systems with ~1000 boxes performing tens of functions
• How to deliver reliable throughput with hundreds of users?
  – Instrument heavily
  – Build huge test systems
  – "Is this a physics experiment or a computer science experiment?"
Objectivity

• Current technical problems:
  – Too few Object IDs (fix in ~1 year?);
  – Lockserver bottleneck (inelegant workarounds possible; more elegant fixes also possible, e.g. read-only databases);
  – Endian translation problem (e.g. lousy Linux performance on Solaris-written databases); see the illustration below.
• Non-technical problems:
  – Will the (VL)ODBMS market take off?
  – If so, will Objectivity Inc. prosper?
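The endian problem in miniature (a generic illustration, not Objectivity internals): SPARC/Solaris writes big-endian words, so a little-endian Linux client must byte-swap every numeric field it reads; the per-value cost is tiny but adds up over millions of objects.

```python
import struct

# A 32-bit integer as written on big-endian Solaris/SPARC:
big_endian = struct.pack(">i", 1)                  # b'\x00\x00\x00\x01'

# Read naively on little-endian Linux, the value comes out wrong:
assert struct.unpack("<i", big_endian)[0] == 16777216

# A correct read requires a byte swap on every numeric field:
assert struct.unpack(">i", big_endian)[0] == 1
```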
Personnel versus Equipment

• Should SLAC be spending more on people and buying cheaper stuff? We buy:
  – Disks at 5 x rock bottom
  – Tape drives at 5 x rock bottom
  – Farm CPU at 2-3 x rock bottom
  – Small SMP CPU at 2-3 x farms
  – Large SMP CPU at 5-10 x farms
  – Network stuff at "near monopoly" pricing
• All at (or slightly after) the very last moment.
• I am uneasily happy with all these choices.
Personnel Issues

• Is the SLAC equipment/personnel ratio a good model? SLAC-SCS staff are:
  – smart
  – motivated
  – having fun
  – (unofficially) on call 24 x 7
  – in need of reinforcements
BaBar Computing Coordinator
The search is now on
An exciting challenge
Strong SLAC backing
Contact me with your suggestions and enquiries ([email protected])