Hall D Online Data Acquisition
CEBAF provides us with a tremendous scientific
opportunity for understanding one of the
fundamental forces of nature.
75 MB/s
900 MB/s
Critical Role for Computing in Hall D
The quality of Hall D science depends critically upon the
collaboration’s ability to conduct its computing tasks.
Design Focus
Get the job done
Minimize the effort required to perform computing
Fewer physicists
Lower development costs
Lower hardware costs
Keep it simple
Provide for ubiquitous access and participation
Goals for the Computing Environment
1. Only two people are required to run the experiment.
2. Everyone can participate in solving experimental problems, no matter where they are located.
3. Offline analysis can more than keep up with the online acquisition.
4. Simulations can more than keep up with the online acquisition.
5. First pass analysis and simulations can be planned, conducted, monitored, validated and used by a group.
6. First pass analysis and simulations can be conducted automatically with group monitoring.
7. Subsequent analysis can be done automatically if individuals so choose.
Goal #1: Two person acquisition team
100 MB/s raw data.
Need an estimate of designed good event rate to set online trigger performance.
Automated system monitoring
Automated slow controls
Automated data acquisition
Automated online farm
Collaborative environment for access to experts
Integrated problem solving database links current to past problems and solutions
Well defined procedures
Good training procedures
Goal #2: Ubiquitous expert participation
Online system information available from the web.
Collaborative environment for working with online team.
Experts can control systems from elsewhere when the data acquisition team allows it or the DAQ is inactive.
Goal #3: Concurrent Offline Analysis
Offline analysis can be completed in the same length of time as is required for data taking (including detector and accelerator down time). This includes:
Calibration overhead.
Multiple passes through the data (average of 2).
Evaluation of results.
Dissemination of results.
Goal #4: Concurrent Simulations
Simulations can be completed in the same length of time as is required for data taking (including detector and accelerator down time). This includes:
Simulation planning.
Systematic studies (up to 5-10 times as much data as is required for experimental measurements).
Analysis of simulation results.
Dissemination of results.
Goal #5: Collaborative computing
First pass analysis and simulations can be planned by a group.
Multiple people can conduct, validate, monitor, evaluate and use first pass analysis and simulations without unnecessary duplication.
A single individual or a large group can manage appropriate scale tasks effectively.
Goal #6: Automated computing
First pass analysis and simulations can be conducted automatically without intervention.
Progress is reported automatically. Errors in automatic processing are
automatically flagged.
Goal #7: Extensibility
Subsequent analysis can be done automatically if individuals so choose.
The computational management system can be extended to include any Hall D computing tasks.
April 16, 2001 L. Dennis, FSU
Technical Details
Technical requirements that the computing system must
meet.
Technical Details
100 MB/s raw data.
Need an estimate of designed good event rate to set online trigger performance.
Average of two analysis passes through the data.
Average of 10 events simulated for every event taken.
All required information available online; no electronically generated information will go unrecorded.
All computer tasks automated; they can be submitted and monitored from any computer system that can reach the internet.
Trigger Rates for Hall D
Detector: 180 kev/s
Trigger: 15 kev/s
5 kB/ev -> 75 MB/s
Trigger requires ~100 CPU’s*
* Assume a factor of 10 improvement over existing CPU’s
5 CPU-ms/ev: Full Reconstruction (CLAS: 50 ms/ev today).
100 CPU-ms/ev: Full Simulation (CLAS: 1-3 s/ev today).
1/3: Assumed detector & accelerator efficiency.
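The 75 MB/s figure follows from the trigger rate and the event size. A minimal Python sketch of that arithmetic is given here; the function and constant names are illustrative and do not come from the talk, only the numbers do.

```python
# Illustrative names; only the numeric values are taken from the slide.
DETECTOR_RATE_KEV_S = 180   # detector rate, thousands of events per second
TRIGGER_RATE_KEV_S = 15     # rate accepted by the trigger, kev/s
EVENT_SIZE_KB = 5           # event size, kB per event


def data_rate_mb_s(rate_kev_s, event_size_kb):
    """(10^3 ev/s) * (10^3 B/ev) = 10^6 B/s, so kev/s * kB/ev gives MB/s."""
    return rate_kev_s * event_size_kb


def rejection_factor(input_rate, accepted_rate):
    """Number of detector events arriving per event the trigger keeps."""
    return input_rate / accepted_rate


print(data_rate_mb_s(TRIGGER_RATE_KEV_S, EVENT_SIZE_KB), "MB/s")
print(rejection_factor(DETECTOR_RATE_KEV_S, TRIGGER_RATE_KEV_S), "x reduction")
```

The unit trick (kev/s times kB/ev equals MB/s) is what lets the slide multiply the two numbers directly.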
Required Sustained Reconstruction Rate
[15 kev/s] * [1/3] * [2] = 10 kev/s
(Raw Rate) * (Equipment Duty Factor) * (Duplication Factor)
10 kev/s * 5 CPU-ms/ev = 50 CPU’s
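The reconstruction estimate can be reproduced in a few lines. This is a sketch under the slide's own assumptions (15 kev/s raw rate, 1/3 duty factor, 2 passes, 5 CPU-ms/ev); the function names are hypothetical, and `Fraction` is used only to keep the 1/3 exact.

```python
from fractions import Fraction


def sustained_rate_kev_s(raw_rate_kev_s, duty_factor, multiplier):
    """Average event rate the farm must sustain, in kev/s."""
    return raw_rate_kev_s * duty_factor * multiplier


def cpus_needed(rate_kev_s, cpu_ms_per_event):
    """(10^3 ev/s) * (10^-3 CPU-s/ev) = CPUs, so kev/s * CPU-ms/ev gives CPUs."""
    return rate_kev_s * cpu_ms_per_event


# Raw rate 15 kev/s, duty factor 1/3, duplication factor 2 (two passes).
recon_rate = sustained_rate_kev_s(15, Fraction(1, 3), 2)
print(recon_rate, "kev/s ->", cpus_needed(recon_rate, 5), "CPUs")
```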
Required Sustained Simulation Rate
[15 kev/s] * [1/3] * [10] * [1/10] = 5 kev/s
(Raw Rate) * (Equipment Duty Factor) * (Systematics Studies) * (Good Event Fraction)
5 kev/s * 100 CPU-ms/ev = 500 CPU’s
PWA error is determined by one’s knowledge of systematic errors. This requires extensive simulations, but not all events simulated are accepted events.
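The simulation estimate depends strongly on the assumed CPU time per event. The sketch below recomputes the CPU count at the assumed 100 CPU-ms/ev and at the 1-3 s/ev quoted for CLAS today; the names are illustrative, and the comparison is an added illustration rather than a result from the talk.

```python
from fractions import Fraction

# Sustained simulation rate in kev/s: raw rate * duty factor
# * systematics multiplier * good-event fraction (values from the slide).
SIM_RATE_KEV_S = 15 * Fraction(1, 3) * 10 * Fraction(1, 10)


def sim_cpus(cpu_ms_per_event, rate_kev_s=SIM_RATE_KEV_S):
    """kev/s * CPU-ms/ev gives the number of CPUs kept busy."""
    return rate_kev_s * cpu_ms_per_event


scenarios = [("assumed", 100), ("CLAS today, low", 1000), ("CLAS today, high", 3000)]
for label, cost_ms in scenarios:
    print(f"{label:>17}: {int(sim_cpus(cost_ms)):>6} CPUs")
```

Without the assumed improvement in CPU performance, the same workload would need thousands of CPUs instead of hundreds.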
Annual Data Rate to Archive
Raw Data
75 MB/sec * (3 * 10^7 s/yr) * (1/3) = 0.75 PB/yr
Simulation Data
25 MB/sec * (3 * 10^7 s/yr) = 0.75 PB/yr
Reconstructed Data
50 MB/sec * (3 * 10^7 s/yr) = 1.50 PB/yr
Total Rate to Archive ~ 3 PB/yr
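The archive volumes can be checked with a short script. This sketch assumes decimal units (1 PB = 10^9 MB) and the slide's round figure of 3 * 10^7 seconds per year; the function name is illustrative.

```python
SECONDS_PER_YEAR = 3e7   # the slide's round figure for one year
MB_PER_PB = 1e9          # decimal units: 1 PB = 10^9 MB


def annual_volume_pb(rate_mb_s, duty_factor=1.0):
    """Data written per year in PB for a given sustained rate in MB/s."""
    return rate_mb_s * SECONDS_PER_YEAR * duty_factor / MB_PER_PB


volumes = {
    "raw": annual_volume_pb(75, duty_factor=1 / 3),  # peak rate, 1/3 efficiency
    "simulation": annual_volume_pb(25),              # already a sustained rate
    "reconstructed": annual_volume_pb(50),           # already a sustained rate
}
total = sum(volumes.values())

for name, pb in volumes.items():
    print(f"{name:>14}: {pb:.2f} PB/yr ({pb / total:.0%} of total)")
print(f"{'total':>14}: {total:.2f} PB/yr")
```

Only the raw-data line carries the 1/3 efficiency factor, because the slide quotes the simulation and reconstruction rates as sustained averages.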
Requirements Summary
Hall D CPU Requirements
First Pass: 7%
Trigger: 13%
Analysis: 13%
Simulation: 67%
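These percentages are consistent with the CPU counts quoted on the earlier slides, under two assumptions that are not stated in the talk: that "First Pass" corresponds to the 50-CPU reconstruction estimate, and that physics analysis needs about 100 CPUs (a value chosen here because it reproduces the 13% share). A minimal sketch:

```python
# Trigger (~100) and simulation (500) are quoted on earlier slides.
# ASSUMPTION: "First Pass" is the 50-CPU reconstruction estimate.
# ASSUMPTION: the analysis figure is not given in the talk; 100 is inferred
# from the 13% share in the chart.
cpus = {"First Pass": 50, "Trigger": 100, "Analysis": 100, "Simulation": 500}

total_cpus = sum(cpus.values())
shares = {task: round(100 * n / total_cpus) for task, n in cpus.items()}

for task, pct in shares.items():
    print(f"{task:>10}: {cpus[task]:>3} CPUs, {pct}%")
print(f"{'Total':>10}: {total_cpus} CPUs")
```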
Hall D Annual Data Rates
Analyzed Data: 50%
Raw Data: 25%
Simulated Data: 25%
Some comparisons: Hall D vs. other HENP
                     Data Volumes (tape)  Data rates  Disk Cache  CPU         People
                     TB/year              MB/s        TB          SI95/year
CMS                  2 000 (total)        100         500         500 000     ~1800
US Atlas (Tier 1)    300                  100         100         100 000     ~500 (?)
STAR                 200                  40          >20         7000        ~300
D0/CDF Run II        300                                                      ~500
BaBar                300                                                      ~500
JLAB (current)       120                  10-22       25          8000        ~240 (CLAS)
Hall D               1 - 3000             75          200         200 000     100

Not just an issue of equipment. These experiments all have the support of:
large dedicated computing groups within the experiments
well defined computing models
Proposed Solution
“You can’t always get what you want.
You can’t always get what you want.
You can’t always get what you want.
But if you try sometimes, well you just might find
You’ll get what you need.”
Rolling Stones, You can’t always get what you want.
Meeting the Hall D Computational Challenges
Moore’s law: Computer performance increases by a factor of 2 every 18 months.
Gilder’s Law: Network bandwidth triples every 12 months.
Dennis’ Law: Neither Moore’s Law nor Gilder’s Law will solve our computing problems.
Solving the information management problems requires people working on the software and developing a workable computing environment.
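To put the two growth laws in numbers, the sketch below computes the gain they predict over a given number of years, and how long Moore's law would take to deliver the factor of 10 in CPU performance assumed in the rate estimates. The function names are illustrative, and the laws are treated as exact exponentials, which is an idealization.

```python
import math


def moore_factor(years, doubling_months=18):
    """CPU performance gain if performance doubles every `doubling_months`."""
    return 2 ** (12 * years / doubling_months)


def gilder_factor(years):
    """Network bandwidth gain if bandwidth triples every 12 months."""
    return 3 ** years


def years_to_gain(factor, doubling_months=18):
    """Years needed for Moore's law to deliver a given performance factor."""
    return doubling_months / 12 * math.log2(factor)


print(f"Moore, 3 years:  x{moore_factor(3):.0f}")
print(f"Gilder, 3 years: x{gilder_factor(3)}")
print(f"Years to a factor of 10 in CPU: {years_to_gain(10):.1f}")
```

Under these idealized assumptions the factor of 10 arrives in roughly five years, so hardware growth covers the raw CPU estimate but, as the slide argues, not the information management problem.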
Hall D Computing Tasks
First Pass Analysis
Data Mining
Physics Analysis
Partial Wave Analysis
Physics Analysis
Acquisition
Monitoring
Slow Controls
Data Archival
Planning
Simulation
Publication
Calibrations
Initial Estimate of Software Tasks & Timeline
Hall D Grid
Hall D Grid Sites
First Pass Analysis (Jefferson Lab)
Simulation Sites (3-5)
Physics Analysis Sites (3-5)
Partial Wave Analysis Sites (2)
Calibration Site
Hall D Offline Data Flow
Grid Efficiency Considerations
Need extensive resources.
Need universal access.
Need good workflow.
Need good communication about what has been done and what needs to be done.