Page 1: Computing in HEP A Introduction to Data Analysis in High Energy Physics Max Sang Applications for Physics Infrastructure Group IT Division, CERN, Geneva

Aug 2001 Max Sang, CERN/IT, [email protected] 2

Introduction to HEP

Accelerators produce high intensity, high energy beams of particles like protons or electrons.

Detectors are huge, multi-layered electronic devices constructed around the points where the beams collide with targets or other beams.

Planned and constructed by multinational collaborations of hundreds of people over several years.

Once operational, they run for years (e.g. LEP program 1989-2000).

The Large Hadron Collider

27km circumference

100m below surface

First beam 2006

CERN

Eight underground caverns for detectors

CMS

Under construction now - ready 2006

21 m long, 15 m diameter

12500 tons

As much iron as the Eiffel Tower

1900 physicists from 31 countries

Introduction to HEP (II)

‘Events’ are like photographs of individual subatomic interactions taken by the detectors.

Events produced at high rates (kHz-MHz) for months at a time with minimal human intervention. Analysis continues for years.

Fundamental physics processes are quantum (probabilistic). They are uncorrelated (consecutive events unconnected) but occur at a wide range of frequencies - some very rare. Some are more interesting than others...

Introduction to HEP (III)

Data are grouped into runs, periods, years. Calibrations, detector faults, beam conditions, etc. are associated with certain time periods, e.g. “The calorimeter was off during run 1234”

‘Event Generators’ simulate the collisions and produce the final state particles.

These are processed by simulated detectors to produce ‘Monte Carlo data’ for comparison with what we see in the real thing. Iterative process of comparison, tuning, model verification.
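The generator-plus-simulated-detector chain can be illustrated with a toy sketch. Everything in it is an assumption made for illustration: the ‘physics’ is an exponential decay-time distribution, the ‘detector’ is a plain Gaussian smearing, and the names `generate_event` and `simulate_detector` are hypothetical and do not correspond to any real event generator or detector simulation package.

```python
import random

TRUE_LIFETIME = 2.0   # assumed model parameter (arbitrary units)
RESOLUTION = 0.3      # assumed Gaussian detector resolution (same units)


def generate_event(rng):
    """Toy 'event generator': draw a true decay time from an exponential."""
    return rng.expovariate(1.0 / TRUE_LIFETIME)


def simulate_detector(true_time, rng):
    """Toy 'detector simulation': smear the true value with Gaussian noise."""
    return true_time + rng.gauss(0.0, RESOLUTION)


def make_monte_carlo(n_events, seed=1):
    """Run generator and detector simulation for n_events toy events."""
    rng = random.Random(seed)
    return [simulate_detector(generate_event(rng), rng) for _ in range(n_events)]


mc_sample = make_monte_carlo(100_000)
mc_mean = sum(mc_sample) / len(mc_sample)
print(f"mean measured decay time in Monte Carlo: {mc_mean:.3f}")
```

In a real analysis the same summary would be computed on recorded data, and the model parameters tuned until simulation and data agree.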

Extracting the Data

Passage of particles through detector components produces ionisation which is amplified to a detectable level.

Front-end electronics turn pulses into digits.

Hardware processing turns digits into ‘hits’.

Software turns hits into ‘tracks’, ‘clusters’ etc.

Multi-level trigger/filter decides what events to keep (sometimes only one event in 10^7).

‘Online reconstruction’, then storage.
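A multi-level trigger can be sketched as a chain of increasingly expensive tests, where an event is kept only if every level accepts it. The event fields (`energy`, `n_tracks`), the thresholds and the function names below are hypothetical, chosen only to show the structure; real triggers run partly in hardware and are far more selective.

```python
import random


def level1(event):
    """Fast, coarse decision (in a real experiment, done in hardware)."""
    return event["energy"] > 50.0


def level2(event):
    """Slower, more refined decision (in a real experiment, done in software)."""
    return event["n_tracks"] >= 4


def trigger(events, levels=(level1, level2)):
    """Yield only events accepted by every level, testing the cheapest first."""
    for event in events:
        # all() stops at the first level that rejects the event.
        if all(level(event) for level in levels):
            yield event


rng = random.Random(42)
events = [
    {"energy": rng.expovariate(1 / 10.0), "n_tracks": rng.randint(0, 6)}
    for _ in range(100_000)
]
kept = list(trigger(events))
print(f"kept {len(kept)} of {len(events)} events")
```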

The LEP Era (Started 1989)

Four detectors (300 people each).

50 kHz collision rate reduced to 5 Hz storage rate.

Event size ~100kB, reconstructed by small farm of O(10) very high-end workstations.

< 500 GB/year/experiment

Stored on tape (with disk caching) at CERN.

Analysed on mainframes by remote batch jobs.

Ntuples (of order 100MB) returned to user for more (interactive) analysis and calculation.

Plots produced for presentations and papers.

The LHC Era (Starts 2006)

4 detectors (6k people in total).

50 MHz collision rate reduced to 100 Hz storage rate.

500 GB/s raw data rate after triggering.

Event size 1-2 MB, reconstructed by farm of 1k PCs.

1 PB/year/experiment in 2007, increasing rapidly. Total by 2015 for all detectors = 100 PB.

Searches may look for single events in 10^7.

Every user (in 30 countries) will want to eat millions of events at a single sitting, with reasonably democratic data access.
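The yearly volume follows from the storage rate and the event size once a live time is chosen. The sketch below assumes 10^7 seconds of data taking per year; that figure is not stated in the talk and is used here only as a round number, and the helper name `yearly_volume_pb` is hypothetical.

```python
STORAGE_RATE_HZ = 100         # events written per second (from the slide)
LIVE_SECONDS_PER_YEAR = 1e7   # ASSUMPTION: not stated in the talk


def yearly_volume_pb(event_size_bytes):
    """Stored data per experiment per year, in petabytes (1 PB = 1e15 bytes)."""
    return STORAGE_RATE_HZ * event_size_bytes * LIVE_SECONDS_PER_YEAR / 1e15


for size_mb in (1, 2):        # event size of 1-2 MB (from the slide)
    volume = yearly_volume_pb(size_mb * 1e6)
    print(f"{size_mb} MB/event gives about {volume:.0f} PB/year")
```

Under that assumption, 1 MB events give about 1 PB per year, in line with the figure quoted for 2007.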

Physicists are also Programmers

All data analysis done using computers.

The physicists are all ‘programmers’, but almost none of them have any formal CS training.

Some will be very experienced (usually F77). Will write lots of code for reconstruction, triggering etc.

Others write more modest programs for their own data analysis.

Some will be fresh graduate students who’ve never written a line of code.

Our job is to help them do physics.

What Software do they Need?

Experiment-specific code: triggering, data acquisition, slow controls, reconstruction, new ‘physics code’. Mostly written by the experimentalists without assistance.

Event generators: highly technical, constantly in flux. Written by phenomenologists.

We don’t help with these!

What Software do they Need? (II)

Specialised HEP tools: detector simulation tools, relativistic kinematics, ...

General purpose scientific tools with a HEP slant: data visualisation, histogramming, ...

General purpose technical libraries: random numbers, matrices, geometry, analytical statistics, 2D and 3D graphics, ...

We do help with these!

The Situation in ~1995

Millions of lines of F77, some of it very technical.

Thousands of man-years of debugging.

Users know and love/hate the software, and they don’t want to change.

Serious and unavoidable maintenance commitment for old code - F77 is here to stay!

Shrinking manpower in IT division.

Not long until the start of the LHC programme.

Change now or wait until 2020!

The Old Software

Largely home-grown in 70s and 80s:

Persistent storage and memory management: ZEBRA

Code management: PATCHY

Scripting: KUIP/COMIS

Histograms and Ntuples: HBOOK

Detector simulation: GEANT 3

Fitting & Minimisation: MINUIT

Mathematics, random numbers, kinematics: MATHLIB

Graphics: HIGZ/HPLOT

Visualisation and interactive analysis: PAW

The Anaphe Project

Provide a modern, object-oriented, more flexible, more powerful replacement for CERNLIB with fewer people in less time.

Identify areas where commercial and/or Open Source products can (or must) be used instead of home-grown solutions.

Concentrate efforts on HEP-specific tasks.

Use object-oriented techniques and plan for very long term maintenance and evolution.

Detector simulation is a separate project (v. big).

Commodity Solutions

Luckily, computing has also evolved.

What can we get off-the-shelf?

Open Source tools: code management (CVS), graphics (Qt, OpenGL), scripting (Python, Perl).

Commercial products: persistency (Objectivity OODB), mathematics (Nag library ‘CERN edition’).

HEP Community Developments

Not everything is being done solely at CERN!

CLHEP - C++ class libraries for HEP: random numbers; 3D geometry, vectors, matrices, kinematics; units and dimensions; generic HEP classes (particles, decay chains etc).

Generators being moved (slowly) to C++.

The competition (JAS, Open Scientist, Root).

Anaphe C++ Libraries (I)

Fitting: FML (fitting and minimisation library). Flexible, extensible library based on Gemini engine. Gemini - core fitting engine based on Nag or MINUIT.

Histograms: HTL (histogram template library). Histograms are statistical distributions of measured quantities - the workhorse of HEP analysis. Must be flexible, extensible and very efficient.
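For readers new to the idea, a one-dimensional histogram with fixed-width bins can be sketched in a few lines. This is a generic illustration, not the HTL interface: the class name `Histogram1D` and its methods are hypothetical.

```python
class Histogram1D:
    """Toy fixed-width histogram with underflow and overflow counters."""

    def __init__(self, n_bins, low, high):
        self.n_bins, self.low, self.high = n_bins, low, high
        self.bins = [0.0] * n_bins
        self.underflow = 0.0
        self.overflow = 0.0

    def fill(self, x, weight=1.0):
        """Add one measurement; the range is [low, high)."""
        if x < self.low:
            self.underflow += weight
        elif x >= self.high:
            self.overflow += weight
        else:
            index = int((x - self.low) / (self.high - self.low) * self.n_bins)
            # Guard against rounding pushing a value just below `high` out of range.
            self.bins[min(index, self.n_bins - 1)] += weight


h = Histogram1D(n_bins=5, low=0.0, high=10.0)
for value in [0.5, 1.5, 2.5, 2.7, 9.9, 10.0, -1.0]:
    h.fill(value)
print(h.bins, h.underflow, h.overflow)
```

Each fill is a constant-time update, which is one reason histograms scale well to millions of events.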

Anaphe C++ Libraries (II)

QPlotter: Graphics package. For drawing histograms and more. Based on Qt (superset of Motif).

NtupleTag: Extends concept of ntuple (~ static table of data). Can add new columns as you work. Can navigate back to original events. Smart clustering of data. See Zsolt’s presentation...
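The idea of a table that grows new columns during analysis while keeping a link back to the original events can be sketched as follows. This is not the NtupleTag API: the class `Ntuple`, its methods and the column names (`event_id`, `px`, `py`, `pt`) are hypothetical.

```python
import math


class Ntuple:
    """Toy column-wise table: one row per event, one named column per variable."""

    def __init__(self, **columns):
        if len({len(values) for values in columns.values()}) > 1:
            raise ValueError("all columns must have the same length")
        self.columns = {name: list(values) for name, values in columns.items()}

    def rows(self):
        """Iterate over the table row by row, as dictionaries."""
        names = list(self.columns)
        for values in zip(*(self.columns[name] for name in names)):
            yield dict(zip(names, values))

    def add_column(self, name, func):
        """Derive a new column from the existing ones, row by row."""
        self.columns[name] = [func(row) for row in self.rows()]


# `event_id` plays the role of the link back to the original event.
nt = Ntuple(event_id=[101, 102, 103], px=[3.0, 0.0, 1.0], py=[4.0, 2.0, 1.0])
nt.add_column("pt", lambda row: math.hypot(row["px"], row["py"]))
high_pt_ids = [row["event_id"] for row in nt.rows() if row["pt"] > 2.5]
print(nt.columns["pt"], high_pt_ids)
```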

Interactive Analysis

Analysis in HEP = ‘Data Mining’. Extract parameters from large multi-dimensional samples.

Typical tasks:

Plot one or more variables with cuts on yet others - exploring the variable space.

Perform statistical tests on distributions (fitting, moments etc.)

Produce histograms etc. for papers or talks.

Interactive Analysis (II)

Almost all analyses begin as interactive ‘playing’ with the data and progress organically to large, complex, CPU intensive procedures.

Step 1: single commands to a script interpreter, e.g. “plot x for all events with y > 5”

Step 2: multi-command scripts/macros

Step 3: procedures can be translated into C++ functions and called interactively

Step 4: user can build new libraries and interact with them through the command line (etc...)
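The first steps of that progression can be sketched in plain Python. The event records and the helper `select` are hypothetical, and the selection is expressed with ordinary list operations rather than any actual analysis tool's command set.

```python
events = [
    {"x": 1.2, "y": 7.0},
    {"x": 3.4, "y": 2.0},
    {"x": 0.7, "y": 9.5},
]

# Step 1: a one-off command typed at the prompt ("x for all events with y > 5").
x_with_cut = [e["x"] for e in events if e["y"] > 5]


# Steps 2-3: the same selection wrapped in a reusable function.
def select(events, variable, cut):
    """Return `variable` for every event that passes `cut`."""
    return [e[variable] for e in events if cut(e)]


x_from_function = select(events, "x", lambda e: e["y"] > 5)
print(x_with_cut, x_from_function)
```

Both expressions give the same list; the function form is what would later be moved into a compiled library.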

Interactive Analysis (III)

The progression from command line, to macro, to compiled library, should be smooth and simple.

Doing the easy things should be easy to allow rapid development and prototyping of algorithms.

Doing complex things then becomes significantly easier than starting from scratch in C++

Distributed analysis must also be possible (see Kuba’s talk)

Lizard (I)

Interactive environment for data analysis using the other Anaphe components.

First prototype (with limited functionality) available since CHEP 2000.

Re-design started in April 2000.

Beta version October 2000.

Full version out since June 2001.

Much more work and testing to do, but already approaching (and surpassing) PAW functionality.

Embedded in Python.

Lizard (II)

Architecture:

Every component interacts with the others only through abstract interfaces, so the implementations are hidden.

‘Commander’ C++ classes load the implementation classes at run time and become proxies for them.

Use SWIG to generate ‘shadow’ classes from the Commander header files. These are compiled into the Python library and become accessible as new Python objects.

Swapping components at run time becomes trivial.
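The proxy pattern described above can be sketched in Python alone. This is an illustration under stated assumptions, not Lizard's code: the interface `IMinimiser`, the two implementations, the `REGISTRY` lookup (standing in for loading a library at run time) and the `MinimiserCommander` class are all hypothetical, and neither implementation calls Nag or MINUIT.

```python
import math
from abc import ABC, abstractmethod


class IMinimiser(ABC):
    """Abstract interface: callers see only this."""

    @abstractmethod
    def minimise(self, func, low, high):
        """Return the x in [low, high] that minimises func."""


class GridMinimiser(IMinimiser):
    def minimise(self, func, low, high):
        step = (high - low) / 1000
        return min((low + i * step for i in range(1001)), key=func)


class GoldenSectionMinimiser(IMinimiser):
    def minimise(self, func, low, high):
        ratio = (math.sqrt(5) - 1) / 2
        a, b = low, high
        while b - a > 1e-6:
            c, d = b - ratio * (b - a), a + ratio * (b - a)
            if func(c) < func(d):
                b = d          # minimum lies in [a, d]
            else:
                a = c          # minimum lies in [c, b]
        return (a + b) / 2


# Stand-in for run-time loading of implementation libraries.
REGISTRY = {"grid": GridMinimiser, "golden": GoldenSectionMinimiser}


class MinimiserCommander:
    """Proxy that picks an implementation by name and forwards calls to it."""

    def __init__(self, name):
        self.load(name)

    def load(self, name):
        self._impl = REGISTRY[name]()

    def minimise(self, func, low, high):
        return self._impl.minimise(func, low, high)


def parabola(x):
    return (x - 2.0) ** 2


commander = MinimiserCommander("grid")
grid_result = commander.minimise(parabola, 0.0, 5.0)
commander.load("golden")       # swap the component at run time
golden_result = commander.minimise(parabola, 0.0, 5.0)
print(grid_result, golden_result)
```

The calling code never names a concrete class, which is what makes the swap a one-line change.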

Lizard Screenshot

Behind the Scenes

[Diagram: layered stack running from the User through Python, the Controller and the Shadow classes (automatically generated by SWIG) to the C++ interfaces (AIDA interfaces) and the C++ implementations (Anaphe implementations).]

AIDA

Use of abstract interfaces promotes weak coupling between components.

AIDA (Abstract Interfaces for Data Analysis) project is extending this to community-wide standard interfaces which will allow use of C++ components in Java and vice versa.

Developers only need to learn one way of interacting with a ‘histogram’, which works with all compliant implementations.
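As a rough illustration of that last point, the sketch below defines one abstract histogram interface and two unrelated implementations that a single piece of client code can use interchangeably. The names `IHistogram1D`, `fill`, `entries` and `mean` are chosen for illustration and are not taken from the AIDA specification; binning is omitted for brevity.

```python
from abc import ABC, abstractmethod


class IHistogram1D(ABC):
    """Hypothetical abstract interface shared by all implementations."""

    @abstractmethod
    def fill(self, x, weight=1.0): ...

    @abstractmethod
    def entries(self): ...

    @abstractmethod
    def mean(self): ...


class RunningSumHistogram(IHistogram1D):
    """Keeps only running sums, so memory use is constant."""

    def __init__(self):
        self._n, self._sum_w, self._sum_wx = 0, 0.0, 0.0

    def fill(self, x, weight=1.0):
        self._n += 1
        self._sum_w += weight
        self._sum_wx += weight * x

    def entries(self):
        return self._n

    def mean(self):
        return self._sum_wx / self._sum_w


class StoredValuesHistogram(IHistogram1D):
    """Keeps every filled value, at the cost of memory."""

    def __init__(self):
        self._values = []

    def fill(self, x, weight=1.0):
        self._values.append((x, weight))

    def entries(self):
        return len(self._values)

    def mean(self):
        total_weight = sum(w for _, w in self._values)
        return sum(x * w for x, w in self._values) / total_weight


def fill_and_summarise(histogram, data):
    """Client code written once, against the interface only."""
    for x in data:
        histogram.fill(x)
    return histogram.entries(), histogram.mean()


data = [1.0, 2.0, 3.0, 6.0]
summary_a = fill_and_summarise(RunningSumHistogram(), data)
summary_b = fill_and_summarise(StoredValuesHistogram(), data)
print(summary_a, summary_b)
```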

Summary

HEP has (and has always had) serious computing requirements

The old model (F77 monoliths) is no longer workable in the LHC era

New software in C++ and Java uses modern software design to plan for the long term

Anaphe is CERN IT division’s contribution: flexible, extensible, modular, efficient.

The LHC is coming and we must be ready!

Further information

More information about the detectors and HEP in general: http://cmsinfo.cern.ch and http://cern.ch/atlas

CERN IT Division: http://cern.ch/IT

The Anaphe project: http://cern.ch/Anaphe