Page 1: Physics Analysis Tools  for the CMS experiment at LHC

Physics Analysis Tools for the CMS experiment at LHC

Luca Lista, INFN Napoli
Francesco Fabozzi, INFN Napoli
Benedikt Hegner, DESY
Christopher D. Jones, Cornell

Luca Lista, CHEP 2007 2

Outline

• Data Tiers in CMS EDM

• Analysis Tools

• Analysis Workflow

Main Features of CMS EDM

• The CMS Event Data Model (EDM) is the uniform format for all CMS event data
  – An Event is a container of many “products” of any possible (C++) type
    • Most of the products are collections of objects such as tracks, clusters, particles, …
  – The EDM allows no bare “C” pointers, and provides custom persistent references instead
    • A product ID and an index in a collection identify the referred object
• Persistent and transient data representations are identical (based on ROOT I/O)
• All EDM data are accessible interactively with ROOT
  – See Chris Jones’ talk, Event processing session
• Reflex dictionaries must be provided for all products
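The idea of a persistent reference that stores a product ID and an index instead of a pointer can be sketched in a few lines of C++. This is only an illustration under stated assumptions: `ToyEvent`, `TrackRef`, `Track` and `ProductID` are hypothetical stand-ins invented here, not the actual CMS EDM classes.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <vector>

// Hypothetical toy types, not the CMS EDM classes.
using ProductID = unsigned int;

struct Track {
  double pt;
};

// A toy "event": maps a product ID to a collection of tracks.
struct ToyEvent {
  std::map<ProductID, std::vector<Track>> tracks;
};

// A persistent-style reference: it stores (product ID, index), never a raw
// pointer, so it stays meaningful after the event is written and read back.
struct TrackRef {
  ProductID id;
  std::size_t index;

  // Resolve the reference against an event at access time.
  const Track& get(const ToyEvent& event) const {
    auto found = event.tracks.find(id);
    assert(found != event.tracks.end());    // the product must exist
    assert(index < found->second.size());   // the index must be in range
    return found->second[index];
  }
};
```

A reference built as `TrackRef{7, 1}` would resolve to the second track of the collection registered under product ID 7, whatever memory address that collection happens to occupy in the current job.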

Data Tiers and Analysis Object Data

• CMS defines different data tiers containing different levels of detail of an event
  – FEVT: full event output, containing (almost…) the complete output of all intermediate reconstruction steps
  – RECO: detailed reconstruction output, allowing new calibrations and alignments to be applied and many of the products to be reprocessed
  – AOD: a proper subset of RECO chosen to satisfy the needs of a large fraction of analysis studies
• Adding or dropping object collections to/from AOD/RECO/FEVT is just a matter of changing a job’s configuration
  – The actual AOD content (and disk size…) is still being defined; it will likely also evolve with data taking

Modular Event Products

• Object collections can be split into different products
• This lets us define different levels of detail without storing redundant information

[Figure: a track collection split into three products: Tracks (kinematics, helix parameters); TracksExtra (track extrapolation, references to RecHits); TracksHits (RecHits). Side labels: AOD, RECO.]

Particle Candidates

• Candidate is a common base class for all high-level physics objects
  – Muons, electrons, photons, jets, missing ET, … inherit from Candidate
  – Can contain references to AOD components, like tracks, clusters, calorimeter towers, …
  – Supports mother(s) ↔ daughter(s) navigation in specialized sub-classes
• Composite particle reconstruction from multi-body decay chains uses specialized Candidates
  – E.g.: Z, H → ZZ → μμee, Bs → J/ψ φ → μμKK, …
• The event generator tree in AOD is stored using Candidates with mother/daughter references
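A minimal sketch of the pattern described above, a common base class with daughter navigation provided by a composite sub-class, might look as follows. All names (`ToyCandidate`, `ToyCompositeCandidate`) are hypothetical and much simpler than the real Candidate classes; `std::shared_ptr` is used here only as a stand-in for the EDM persistent references.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical, simplified base class for all particle-like objects.
class ToyCandidate {
public:
  ToyCandidate(int charge, double px, double py, double pz, double energy)
    : charge_(charge), px_(px), py_(py), pz_(pz), energy_(energy) {}
  virtual ~ToyCandidate() = default;

  int charge() const { return charge_; }
  double px() const { return px_; }
  double py() const { return py_; }
  double pz() const { return pz_; }
  double energy() const { return energy_; }
  double mass() const {
    const double m2 = energy_ * energy_ - px_ * px_ - py_ * py_ - pz_ * pz_;
    return m2 > 0.0 ? std::sqrt(m2) : 0.0;
  }

  // Leaf candidates have no daughters; composites override these.
  virtual std::size_t numberOfDaughters() const { return 0; }
  virtual const ToyCandidate* daughter(std::size_t) const { return nullptr; }

protected:
  int charge_;
  double px_, py_, pz_, energy_;
};

// A composite candidate: charge and four-momentum are the sums over daughters.
class ToyCompositeCandidate : public ToyCandidate {
public:
  ToyCompositeCandidate() : ToyCandidate(0, 0.0, 0.0, 0.0, 0.0) {}

  void addDaughter(std::shared_ptr<const ToyCandidate> d) {
    assert(d != nullptr);
    charge_ += d->charge();
    px_ += d->px();
    py_ += d->py();
    pz_ += d->pz();
    energy_ += d->energy();
    daughters_.push_back(std::move(d));
  }

  std::size_t numberOfDaughters() const override { return daughters_.size(); }
  const ToyCandidate* daughter(std::size_t i) const override {
    return i < daughters_.size() ? daughters_[i].get() : nullptr;
  }

private:
  std::vector<std::shared_ptr<const ToyCandidate>> daughters_;
};
```

Because composites are themselves candidates, a decay chain can be built by adding composite daughters to another composite, and generic code only needs the base-class interface to walk the tree.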

Jet from Heterogeneous Sources

[Figure: AOD collections (CaloTowers, Muons, Electrons) feed a collection of jet constituents (Candidates), from which Jets are built.]

Annotations in the figure:
• Contain updated kinematics info, so energy corrections can be applied
• Multiple Jet collections can have links to the same constituent collection
• Further energy corrections can be applied

Candidates and Associated Data

[Figure: an Electrons collection with an associated Electron isolation collection, and Z candidates built from pairs of electron clones.]

Annotations in the figure:
• Associated collection
• Standard RECO collection used as “master clone”
• Electron clones with reference to master (“shallow” clones)
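The “shallow clone” idea, an object that carries its own kinematics but refers back to a master object for everything else, can be illustrated with the sketch below. It is not the CMS implementation: `ToyElectron` and `ToyShallowClone` are hypothetical names, and a pointer to the master collection plus an index stands in for an EDM persistent reference.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical master object as stored in a standard collection.
struct ToyElectron {
  double pt;
  double isolation;
};

// A shallow clone keeps its own copy of the kinematics (which may be updated)
// and a reference (collection + index) to the master for all other data.
class ToyShallowClone {
public:
  ToyShallowClone(const std::vector<ToyElectron>& masters, std::size_t index)
    : masters_(&masters), index_(index), pt_(0.0) {
    assert(index < masters.size());
    pt_ = masters[index].pt;  // start from the master's kinematics
  }

  double pt() const { return pt_; }
  void setPt(double pt) { pt_ = pt; }  // e.g. after a constrained fit

  // Everything not stored locally is looked up in the master object.
  const ToyElectron& master() const { return (*masters_)[index_]; }

private:
  const std::vector<ToyElectron>* masters_;  // must outlive the clone
  std::size_t index_;
  double pt_;
};
```

The design choice is that only the quantities an analysis may want to modify are duplicated; the bulk of the data stays in the master collection and is never copied.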

Framework modules

• Reconstruction and analysis code is organized as independent modules steered by the framework
• A job configuration script defines the modules to be loaded (as plugins), their parameters and their execution order
  – Module execution sequences are organized into “paths”
• Each module can get data from the Event and can add new products to the Event
• Product provenance tracking, including module parameters, is saved as part of the Event output file
• Once a product is added to the Event it can’t be changed by another module
• Modules can act as event filters, stopping the processing path if a condition is not fulfilled
  – E.g.: High Level Trigger paths
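The module and path mechanics listed above (ordered execution, write-once products, filters that stop a path) can be mimicked in a toy model. The sketch below is an assumption-laden illustration, not the CMS framework: `ToyEvent`, `ToyModule` and `runPath` are hypothetical names, and products are reduced to named lists of numbers.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Toy event: named products, each a list of numbers (hypothetical layout).
class ToyEvent {
public:
  // Products are write-once: adding under an existing label is rejected.
  bool put(const std::string& label, std::vector<double> product) {
    return products_.emplace(label, std::move(product)).second;
  }

  const std::vector<double>& get(const std::string& label) const {
    auto found = products_.find(label);
    assert(found != products_.end());
    return found->second;
  }

private:
  std::map<std::string, std::vector<double>> products_;
};

// A module reads from and adds to the event; returning false stops the path.
using ToyModule = std::function<bool(ToyEvent&)>;

// Run the modules in order and return how many were executed.
std::size_t runPath(const std::vector<ToyModule>& path, ToyEvent& event) {
  std::size_t executed = 0;
  for (const auto& module : path) {
    ++executed;
    if (!module(event)) break;  // a filter rejected the event
  }
  return executed;
}
```

In this model a producer always returns true after calling `put`, while a filter returns the outcome of its condition, so modules placed after a failing filter never see the event.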

Available Common Tools

• Layered approach to common tools:
  – AOD (and RECO…): basic “primitive” objects for analysis
    • Tracks, super-clusters, calo-towers, μ, e, γ, jets, MET
    • Mainly data containers, no “fancy” C++ structures
  – Generic common tools (for AOD and more)
    • Selectors, filters, lepton isolation, matching tools
  – Particle Candidates
    • Generic class hierarchy to manage particles for analysis
    • Base class for high level objects: μ, e, γ, jets, MET, gen-particles, composite decays (Z, J/ψ, Bs, Higgs, …)
  – Particle Candidates common tools
    • Combiners, selectors, filters, overlap removal
    • MC truth matching tools
    • Generic isolation algorithms
    • Constrained fitters (initial integration examples)

[Side labels in the figure: “Event collections”, “Algorithms and modules”]

Generic AOD Framework Modules

• A uniform interface is enforced throughout AOD classes
  – Everywhere pt(), eta(), phi(), etc.
• Generic programming is used to write algorithms applicable to different object types
• A suite of generic selector and filter modules is provided as part of the common Physics Tools
• More high level algorithms are being written using generic programming
  – Isolation algorithms can run on muons, electrons, tracks, …
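To show why a uniform pt(), eta(), phi() interface matters, here is a sketch of a generic track-isolation sum written once as a template and usable with any object type that offers those three methods. The types `ToyTrack` and `ToyMuon` and the functions `deltaR` and `sumPtInCone` are hypothetical names chosen for this illustration; phi is assumed to lie in [-π, π].

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Two unrelated toy types sharing only the uniform accessor names.
struct ToyTrack {
  double pt_, eta_, phi_;
  double pt() const { return pt_; }
  double eta() const { return eta_; }
  double phi() const { return phi_; }
};

struct ToyMuon {
  double pt_, eta_, phi_;
  double pt() const { return pt_; }
  double eta() const { return eta_; }
  double phi() const { return phi_; }
};

// Angular distance in the (eta, phi) plane, with phi wrapped around 2*pi.
template <typename T1, typename T2>
double deltaR(const T1& a, const T2& b) {
  const double pi = std::acos(-1.0);
  double dphi = std::fabs(a.phi() - b.phi());
  if (dphi > pi) dphi = 2.0 * pi - dphi;
  const double deta = a.eta() - b.eta();
  return std::sqrt(deta * deta + dphi * dphi);
}

// Sum the pt of all objects in `others` inside a cone around `object`.
template <typename T, typename Collection>
double sumPtInCone(const T& object, const Collection& others, double maxDeltaR) {
  assert(maxDeltaR > 0.0);
  double sum = 0.0;
  for (const auto& other : others) {
    if (deltaR(object, other) < maxDeltaR) sum += other.pt();
  }
  return sum;
}
```

The same `sumPtInCone` template compiles for a muon, an electron or a track as the central object, which is the kind of reuse the bullet on isolation algorithms refers to.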

Generic Object Selectors

• A selection criterion can generate specialized selectors performing specific actions:
  – Save clones of the selected objects
  – Save references to the selected objects (i.e.: “indices”)
  – Clone the selected objects and all the underlying constituents
    • e.g.: clone selected electrons with clones of tracks and clusters
• Internal implementation specializations use template traits on the basis of the input and output collection types
• The simplest object selections can be written as a simple function object (returning a Boolean result)
• A string-configurable selector functor is provided to parse a configurable string-based cut:

  string cut = "(pt>10 & abs(eta)<2.5) & normalizedChi2<10"

  – Variable names are mapped to object methods via the Reflex dictionary

Generic Selector Examples

// Function object selecting any object with pt() above a threshold
struct PtMinSelector {
  PtMinSelector(double ptMin) : ptMin_(ptMin) { }
  template<typename T>
  bool operator()(const T& t) const { return t.pt() >= ptMin_; }
private:
  double ptMin_;
};

// Selector module saving clones of the selected muons
typedef SingleObjectSelector<
  reco::MuonCollection,
  PtMinSelector
> PtMinMuonSelector;

// Selector module with a string-configurable cut, saving clones of the selected tracks
typedef SingleObjectSelector<
  reco::TrackCollection,
  StringCutObjectSelector<reco::Track>
> TrackSelector;

// Same selection, saving references to the selected tracks
typedef SingleObjectSelector<
  reco::TrackCollection,
  StringCutObjectSelector<reco::Track>,
  reco::TrackRefVector
> TrackRefSelector;
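The separation between the selection criterion and the action performed on the selected objects can be tried out without the framework. The sketch below is a self-contained toy, not the SingleObjectSelector implementation: `ToyMuon`, `selectClones` and `selectIndices` are hypothetical names, and plain indices stand in for the framework's reference vectors.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy object exposing the uniform pt() accessor.
struct ToyMuon {
  double pt_;
  double pt() const { return pt_; }
};

// The selection criterion: a function object returning a Boolean.
struct PtMinSelector {
  explicit PtMinSelector(double ptMin) : ptMin_(ptMin) { assert(ptMin >= 0.0); }
  template <typename T>
  bool operator()(const T& t) const { return t.pt() >= ptMin_; }
private:
  double ptMin_;
};

// Action 1: save clones (copies) of the selected objects.
template <typename Collection, typename Selector>
Collection selectClones(const Collection& input, const Selector& select) {
  Collection output;
  for (const auto& object : input) {
    if (select(object)) output.push_back(object);
  }
  return output;
}

// Action 2: save "references" to the selected objects, here plain indices.
template <typename Collection, typename Selector>
std::vector<std::size_t> selectIndices(const Collection& input, const Selector& select) {
  std::vector<std::size_t> output;
  for (std::size_t i = 0; i < input.size(); ++i) {
    if (select(input[i])) output.push_back(i);
  }
  return output;
}
```

The same `PtMinSelector` instance drives both actions, so a new cut only requires writing the function object, not another loop.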

Selector configuration

module highPtMuons = PtMinMuonSelector {
  InputTag src = allMuons
  double ptMin = 10
}

module bestTracks = TrackSelector {
  InputTag src = allTracks
  string cut = "pt > 10 & normalizedChi2 < 20"
}

module bestTrackReferences = TrackRefSelector {
  InputTag src = allTracks
  string cut = "pt > 10 & normalizedChi2 < 20"
}

Common Physics Tools

• Combinatorial analysis
• Overlap checking
• Monte Carlo matching tools
  – Navigate to parents to find the match for a composite particle
• Constrained fitter
  – Examples of integration with external fitting packages exist
  – Covariance matrices (5x5) are fetched from AOD objects for vertex fits using tracks
  – Specialized candidates containing error matrices are being developed for the cases where errors are not stored in AOD objects
    • E.g.: jet or photon mass-constrained fits require Ecal and Hcal energy resolutions, retrieved from specialized framework services

Example of Combinatorial Search

module JPsiCandidates = CandCombiner {
  string decay = "muonCandidates@+ muonCandidates@-"
  string cut = "2.8 < mass < 3.4"
}

module PhiCandidates = CandCombiner {
  string decay = "trackCandidates@+ trackCandidates@-"
  string cut = "0.9 < mass < 1.1"
}

module BsCandidates = CandCombiner {
  string decay = "JPsiCandidates PhiCandidates"
  string cut = "5.3 < mass < 5.6"
}
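What a two-body opposite-charge combination with a mass window does can be sketched in plain C++. This is a toy reading of the configuration above, not the CandCombiner code: `ToyParticle`, `mass` and `combineOppositeCharge` are hypothetical names, and overlap checking between daughters is left out.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Toy particle: charge and four-momentum (hypothetical layout).
struct ToyParticle {
  int charge;
  double px, py, pz, energy;
};

// Invariant mass from the four-momentum; negative m^2 is clamped to zero.
double mass(const ToyParticle& p) {
  const double m2 = p.energy * p.energy - p.px * p.px - p.py * p.py - p.pz * p.pz;
  return m2 > 0.0 ? std::sqrt(m2) : 0.0;
}

// Pair every positive particle with every negative one from the same input
// collection and keep the pairs whose invariant mass is inside the window.
std::vector<ToyParticle> combineOppositeCharge(const std::vector<ToyParticle>& input,
                                               double massMin, double massMax) {
  assert(massMin < massMax);
  std::vector<ToyParticle> output;
  for (const auto& plus : input) {
    if (plus.charge <= 0) continue;
    for (const auto& minus : input) {
      if (minus.charge >= 0) continue;
      const ToyParticle sum{plus.charge + minus.charge,
                            plus.px + minus.px, plus.py + minus.py,
                            plus.pz + minus.pz, plus.energy + minus.energy};
      const double m = mass(sum);
      if (m > massMin && m < massMax) output.push_back(sum);
    }
  }
  return output;
}
```

Since the output has the same type as the input, the result of one combination can be fed into another, which mirrors how the Bs candidates above are built from J/ψ and φ candidates.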

Analysis Custom Data Types

• Analysis groups can easily define new data types to be added to the Event for analysis
  – The output of an analysis job is fully configurable
  – It need not always be standard RECO or AOD
• Analysis “skim” productions run centrally
  – Event pre-selection is performed in central skims
  – New analysis collections can be added to standard AOD (or any other data format) for the events selected by each particular analysis skim
• Analysis collections can contain either standard or user-defined types
• Particle Candidate collections can be added to the Event as analysis output

CMS Analysis Work-Flow

[Figure: data flow across the computing tiers.]
• Tier0/CAF: first pass, RAW → RECO, AOD
• RECO, AOD shipped to Tier1
• Tier1: central analysis skims (analysis algos) produce AOD + Analysis Data
• Analysis skim output shipped to Tier2
• Tier2: final analysis pre-selection (further selection, reduced output) on AOD + Analysis Data
• Final samples shipped to Tier3
• Tier3: fast processing and FWLite on fewer AOD collections + Analysis Data

CMS Analysis Work-Flow

[Same workflow diagram as on the previous slide, annotated with reprocessing frequencies:]
• Full reprocessing ~ twice a year (?)
• Reprocess central analysis skims every ~3 months (?)
• Reprocess Tier2 analysis selection every ~2 weeks
• Analyze data locally daily with frequent developments

Conclusions

• A flexible event content and a variety of common tools help implement the tasks most commonly required for CMS analysis.
• The organization of data formats and tools is designed to integrate with the CMS analysis workflow running on distributed computing, as well as with the final stage of analysis.
• A realistic exercise of analysis skims, using custom data formats containing analysis collections reconstructed with common analysis modules, is being put in production
  – Will run in summer and autumn this year.

Backup slides

Polymorphism and “Views”

• Modules can retrieve event products in a type-safe way by specifying the collection type:

  Handle<MuonCollection> muons;
  event.getByLabel("muons", muons);

• Modules can also specify the base class of the contained (or referred-to) objects via a collection “View”:

  Handle<View<Candidate> > leptons;
  event.getByLabel(tag, leptons);  // tag: product tag, typically part of the configuration

• Both collections of objects and collections of references are supported
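The “View” concept, accessing collections of different concrete types through a common base class, can be approximated by the sketch below. It is an illustration only: `ToyView`, `ToyCandidate`, `ToyMuon`, `ToyElectron` and `leadingPt` are hypothetical names, and the toy view simply stores base-class pointers into a collection that must outlive it.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy base class and two concrete types stored in separate collections.
struct ToyCandidate {
  explicit ToyCandidate(double pt) : pt_(pt) {}
  virtual ~ToyCandidate() = default;
  double pt() const { return pt_; }
private:
  double pt_;
};
struct ToyMuon : ToyCandidate { using ToyCandidate::ToyCandidate; };
struct ToyElectron : ToyCandidate { using ToyCandidate::ToyCandidate; };

// A read-only view exposing any collection of derived objects as Base.
template <typename Base>
class ToyView {
public:
  template <typename Collection>
  explicit ToyView(const Collection& collection) {
    for (const auto& item : collection) items_.push_back(&item);
  }
  std::size_t size() const { return items_.size(); }
  const Base& operator[](std::size_t i) const {
    assert(i < items_.size());
    return *items_[i];
  }
private:
  std::vector<const Base*> items_;  // the viewed collection must stay alive
};

// An algorithm written once against the view works for muons and electrons.
double leadingPt(const ToyView<ToyCandidate>& view) {
  double best = 0.0;
  for (std::size_t i = 0; i < view.size(); ++i) {
    if (view[i].pt() > best) best = view[i].pt();
  }
  return best;
}
```

Code written against `ToyView<ToyCandidate>` never needs to know which concrete collection it was built from, which is what lets a lepton algorithm accept either muons or electrons depending on the configured tag.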

Generic Selectors Development

• The definition of the selection criteria is decoupled from the technical implementation details of selector module specializations
  – Specific selections are written for alignment and calibration samples by people who do not necessarily have experience with “core” software
  – No explicit definition of cut configuration, reference and clone management is needed in most cases
• The most commonly used framework modules are provided as part of the release and need not be explicitly instantiated by users
• If new modules are needed, most users request them centrally rather than instantiating them privately
  – The reuse of common modules occurs very naturally

Utility Classes vs Modules

• Many common utilities are provided as framework modules
  – Plugging modules into sequences is easy to do, and module reuse is very simple
  – The EDM Provenance mechanism is useful to track the analysis process
• A number of tools are also provided as utility classes that can be included in “private” modules
  – Framework overhead is reduced