ATLAS Distributed Analysis: Overview

David Adams, BNL. December 8, 2004. Distributed Analysis working group, ATLAS software workshop.


Page 1: ATLAS Distributed Analysis: Overview

ATLAS Distributed Analysis: Overview

David Adams
BNL

December 8, 2004

Distributed Analysis working group
ATLAS software workshop

Page 2: ATLAS Distributed Analysis: Overview

ATLAS SW Wkshp ADA Overview Dec 8, 2004 David Adams

ATLAS

Contents

ADA Architecture

Components
• Datasets
• Transformations

Services

Changes
• Generic dataset schema
• Hierarchical content
• DIAL catalog interfaces

Goals for this release

Current status

Goals for the next release
• Transformation interface

Conclusions

Page 3: ATLAS Distributed Analysis: Overview

ADA Architecture

[Figure: ADA architecture, drawn as three layers.
• GUI and command line clients: ROOT, PYTHON
• High level services for cataloging and job submission and monitoring: AMI DBS, DIAL AS, ATPROD AS, ARDA AS
• Workload management systems: LSF, CONDOR; gLite WMS; ATPROD
Labels on the connections: AJDL, sh, SQL, gLite, AMI ws, AJDL. Other label in the figure: Generalized.]

Page 4: ATLAS Distributed Analysis: Overview

Components

ADA model
• Data described by a dataset
  – Location of data, e.g. files
  – Content, e.g. list of event ID’s and the type of data for each event
• Transformation describes an operation that can act on a dataset to produce a new dataset
  – Application = code shared by multiple transformations
  – Task = user-supplied configuration (parameters or code)
• Job is an instance of a transformation acting on a dataset
  – User preferences may be provided
    > Should not affect the essential result
  – Typically run as a collection of sub-jobs by splitting the input dataset
    > Each sub-job applies the same xform to its sub-dataset
    > Results (output datasets) must be merged
  – More generally the transformation might be a DAG (future)
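
The model above can be sketched in a few lines of Python. This is an illustration only, not DIAL code: the names Dataset, Transformation, split, merge and run_job, and the toy application select_multiples, are assumptions made for the example. It shows a job run as sub-jobs, with the number of sub-jobs treated as a user preference that does not change the merged result.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Dataset:
    """Content only: a tuple of event IDs (file locations are left out of this sketch)."""
    event_ids: tuple


@dataclass(frozen=True)
class Transformation:
    """Application (shared code) plus task (user-supplied configuration)."""
    application: Callable[[Dataset, dict], Dataset]
    task: dict


def split(dataset, n_sub):
    """Split the input dataset into at most n_sub non-empty sub-datasets."""
    ids = dataset.event_ids
    size = max(1, -(-len(ids) // n_sub))  # ceiling division
    return [Dataset(ids[i:i + size]) for i in range(0, len(ids), size)]


def merge(results):
    """Merge the output datasets of the sub-jobs into one output dataset."""
    merged = []
    for result in results:
        merged.extend(result.event_ids)
    return Dataset(tuple(merged))


def run_job(xform, dataset, n_sub=1):
    """A job: one transformation acting on one dataset, run as sub-jobs."""
    sub_results = [xform.application(sub, xform.task) for sub in split(dataset, n_sub)]
    return merge(sub_results)


def select_multiples(dataset, task):
    """Toy application: keep event IDs divisible by task['modulus']."""
    return Dataset(tuple(e for e in dataset.event_ids if e % task["modulus"] == 0))


xform = Transformation(application=select_multiples, task={"modulus": 2})
data = Dataset(tuple(range(10)))
whole = run_job(xform, data, n_sub=1)   # single job
parts = run_job(xform, data, n_sub=3)   # same job run as three sub-jobs
print(whole == parts)
```

Keeping the split and merge steps outside the application is what lets the same application code run unchanged on the whole dataset or on any sub-dataset.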

Page 5: ATLAS Distributed Analysis: Overview

Components (cont)

[Figure: DIAL components, September 20, 2004.
Boxes: User analysis framework; Analysis Service; Dataset, Dataset 1, Dataset 2; Application; Task; Job 1, Job 2; Result 1, Result 2.
Numbered steps: 1. create or locate; 2. select; 3. create or select; 4. select; 5. submit(app,tsk,ds); 6. split; 7. create; 8. run(app,tsk,ds1) and run(app,tsk,ds2); 9. fill; 10. gather.
Annotations: ROOT, GANGA, ...; Event data, summary data, tuples, ...; Athena, dialpaw, ROOT, ...; Code; Params; Transformation.]

Page 6: ATLAS Distributed Analysis: Overview

Datasets

Datasets enable users to examine and access data

For ATLAS data, we identify
• Types of data
  – Used to define dataset categories
  – Category is part of the content specification
• Types of datasets
  – Currently C++ classes with XML data representation
  – Third column indicates if this class exists
  – Parameter in the new dataset XML
• See table on following page for ATLAS examples

There is now a single XML schema for all types of datasets

Page 7: ATLAS Distributed Analysis: Overview

Datasets

Name    Type                         Exists?  Description
EVIDS   EventDataset                 ×        List of event ID’s
EVGEN   AtlasPoolEventDataset        ×        From event generator
HITS    AtlasPoolEventDataset        ×        Hits, e.g. from GEANT
DIGITS  AtlasPoolEventDataset        ×        Digitization of hits
BYTE    AtlasByteStreamEventDataset           Raw data
ESD     AtlasPoolEventDataset        ×        Event summary data
AOD     AtlasPoolEventDataset        ×        Analysis oriented data
TAG     AtlasPoolTagEventDataset              Event metadata
NTUP    RootNtupleDataset                     Ntuples
HISTO   RootHistogramDataset         ×        Histograms
CBNT    CbntDataset                  ×        DC1 combined ntuples
TEXT    TextDataset                           Text data, e.g. log files

Page 8: ATLAS Distributed Analysis: Overview

Datasets (cont)

Example dataset

acas001> dataset_property -i 10003-20151 print
AtlasPoolEventDataset 10003-20151 with no parent is locked and not empty
  Content includes 1 block:
    AtlasPoolEventDataset:AOD
      Content ID list has 17 entries:
        type MissingET with with key MET_Base
        type MissingET with with key MET_Calib
        type MissingET with with key MET_Truth
        type ParticleBaseContainer with with key BCandidates
        type ParticleBaseContainer with with key ElectronCollection
        type ParticleBaseContainer with with key MuonCollection
        type ParticleBaseContainer with with key ParticleJetContainer
        type ParticleBaseContainer with with key PhotonCollection
        type ParticleBaseContainer with with key TauJetCollection
        type LVL1_ROI with with key LVL1_ROI
        type VxContainer with with key VxPrimaryCandidate
        type CTP_Decision with with key CTP_Decision
        type INavigable4MomentumCollection with with key MuonboyTrackParticles
        type INavigable4MomentumCollection with with key TrackParticleCandidate
        type Rec::TrackParticleContainer with with key MooreTrackParticles
        type Rec::TrackParticleContainer with with key MuidCombnoSeedTrackParticles
        type Rec::TrackParticleContainer with with key MuidStandAloneTrackParticles
      Event count is 1073
  Location has 1 logical file:
    Logical file:
      Catalog: MagdaFileCatalog:Atlas
      ID: AOD_3401_MultiLeptonGamma.AOD.pool.root
      State: READONLY

[Callouts on the slide: Type; ID; No sub-datasets; Content; Content type; Too many events to list; Location]

Page 9: ATLAS Distributed Analysis: Overview

Transformations (cont)

[Table: transformations by input dataset category (rows) and output dataset category (columns). Output columns: EVTIDS, EVGEN, HITS, DIGITS, BYTE, ESD, AOD, TAG, NTUP, HISTO. Entries for each input row, in column order:
(no input): IDBLD, DAQ
EVTIDS: GEN
EVGEN: G4SIM, G4SIM, G4SIM
HITS: DIGI, DIGI, DIGI
DIGITS: PACK, RECO, RECO, RECO
BYTE: UNPACK
ESD: AODBLD
AOD: SELECT, TAGBLD, ANALYZE, ANALYZE
TAG: SELECT
NTUP: ANALYZE, ANALYZE
Highlight legend: Now, Soon]

For ATLAS we identify the above transformations
• Characterized by input and output dataset categories
• Most common ones listed above
  – Others likely
• Those available now are highlighted
  – See talks by F. Fassi and C. Haeberli

Page 10: ATLAS Distributed Analysis: Overview

Services

Services enable users to find and examine existing data and create new data.

Services include:
• Analysis services to submit and monitor jobs
• Catalog services to
  – Select data
  – Record data, metadata and transformations
  – Examine and record data provenance
• Data management services to access the data (files)

Clients provide the user interface to these services
• ROOT command line
• Python command line (back soon)
• GUI (based on Python) planned

Page 11: ATLAS Distributed Analysis: Overview

Changes

Move from DIAL 0.92 to 0.94 (almost)
• Generic dataset schema (see following)
• Hierarchical content (see following)
• Unique ID service
• Many changes to catalog interface (see following)

Transformations
• Integration with production system (C. Haeberli)
• Integrate analysis algorithm from the analysis tools group (F. Fassi)

Package management
• Define user/application interface (G. Rybkine)
• Provide reference implementation (G. Rybkine)

Page 12: ATLAS Distributed Analysis: Overview

Changes (cont)

Analysis services
• Continued integration with gLite (D. Liko)
• Begin work on prodsys analysis service (F. Brochu)

Data management
• Improved understanding of SRM
• Integration of gLite prototype file catalog into DQ (F. Orellana)

Page 13: ATLAS Distributed Analysis: Overview

Generic dataset schema

Version 0.94 of DIAL includes a class GenericDataset
• Means to write to and read from an XML description
• All ADA datasets inherit from this without adding persistent data

Advantages
• Processing system does not need to know the full dataset type
• Much easier to make use of datasets outside of DIAL
  – Including languages other than C++, e.g. Python

Other components already have generic schema
• I.e., the application, task, job
• Schema for the first two need work
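
A minimal Python sketch of the idea, assuming a made-up XML layout: the element and attribute names (dataset, file, id, type, name) and the method names to_xml and from_xml are hypothetical and are not the DIAL schema. It illustrates why a reader needs only the generic class when the derived classes add no persistent data.

```python
import xml.etree.ElementTree as ET


class GenericDataset:
    """Holds all persistent data. XML element and attribute names are hypothetical."""

    def __init__(self, dataset_id, dataset_type, files=()):
        self.dataset_id = dataset_id
        self.dataset_type = dataset_type
        self.files = list(files)

    def to_xml(self):
        """Write the dataset to an XML description."""
        root = ET.Element("dataset", {"id": self.dataset_id, "type": self.dataset_type})
        for name in self.files:
            ET.SubElement(root, "file", {"name": name})
        return ET.tostring(root, encoding="unicode")

    @staticmethod
    def from_xml(text):
        """Read any dataset description without knowing its full type."""
        root = ET.fromstring(text)
        files = [f.get("name") for f in root.findall("file")]
        return GenericDataset(root.get("id"), root.get("type"), files)


class AtlasPoolEventDataset(GenericDataset):
    """Derived type: may add behaviour, but adds no persistent data."""

    def __init__(self, dataset_id, files=()):
        super().__init__(dataset_id, "AtlasPoolEventDataset", files)


specific = AtlasPoolEventDataset("10003-20151", ["AOD_3401_MultiLeptonGamma.AOD.pool.root"])
xml_text = specific.to_xml()
generic = GenericDataset.from_xml(xml_text)  # the processing side sees only the generic class
print(generic.dataset_type, generic.files)
```

Because the description is plain XML, the same round trip could be written in any language with an XML parser.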

Page 14: ATLAS Distributed Analysis: Overview

Hierarchical content

Each dataset description includes content:
• List of event ID’s if relevant and not too large
• List of type-keys describing the contained objects
  – For each event in an event dataset
  – Like the type-keys in StoreGate
• New release sorts this list into groups
  – Typically one per processing stage
  – For ATLAS: RDO, ESD, AOD, …
  – Dataset can now hold both ESD and AOD with clear distinction
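
The grouping can be pictured with a short Python sketch. The function name group_content and the tuple layout are assumptions for illustration, and the ESD entry is a hypothetical type-key; the AOD entries are taken from the example dataset shown earlier.

```python
def group_content(type_keys):
    """Group (stage, type, key) entries into one content block per processing stage."""
    blocks = {}
    for stage, type_name, key in type_keys:
        blocks.setdefault(stage, []).append((type_name, key))
    return blocks


entries = [
    ("ESD", "ExampleEsdContainer", "ExampleEsdKey"),  # hypothetical ESD type-key
    ("AOD", "MissingET", "MET_Base"),
    ("AOD", "ParticleBaseContainer", "ElectronCollection"),
]

blocks = group_content(entries)
for stage, contents in blocks.items():
    print(stage, contents)
```

With one block per stage, a single dataset description can list ESD and AOD objects side by side without mixing them.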

Page 15: ATLAS Distributed Analysis: Overview

DIAL catalog interface

Much work in DIAL to rationalize the interface through which users interact with catalogs
• Class interface for standard catalog types
  – XyzRepository stores string (XML) descriptions of Xyz objects
  – XyzSelectionCatalog associates metadata with Xyz ID and name
  – XyzReplicaCatalog associates replica-logical ID’s for Xyz
  – Here Xyz = Dataset, Job, Application, Task, …
• Generic interface for each of the above
  – String ID instead of Xyz ID
  – So implementation of GenericRepository interface can be shared by DatasetRepository, JobRepository, …
• Generic implementations include
  – File based (only for GenericRepository)
  – MySQL table
  – AMI
  – Web service (so far only GenericRepository)
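
A rough Python sketch of the sharing described above, assuming an in-memory store in place of the file, MySQL, AMI or web service implementations. Only the class names GenericRepository, DatasetRepository and JobRepository come from the slide; the put and get methods, the TypedRepository helper and the "kind:id" string ID format are hypothetical.

```python
class GenericRepository:
    """Stores string (XML) descriptions under string IDs. In-memory stand-in."""

    def __init__(self):
        self._descriptions = {}

    def put(self, string_id, description):
        self._descriptions[string_id] = description

    def get(self, string_id):
        return self._descriptions.get(string_id)


class TypedRepository:
    """Typed front end that delegates storage to a shared generic implementation."""

    kind = "generic"

    def __init__(self, backend):
        self._backend = backend

    def _string_id(self, object_id):
        # Hypothetical convention: prefix the typed ID with the object kind.
        return f"{self.kind}:{object_id}"

    def put(self, object_id, description):
        self._backend.put(self._string_id(object_id), description)

    def get(self, object_id):
        return self._backend.get(self._string_id(object_id))


class DatasetRepository(TypedRepository):
    kind = "dataset"


class JobRepository(TypedRepository):
    kind = "job"


backend = GenericRepository()
datasets = DatasetRepository(backend)
jobs = JobRepository(backend)
datasets.put("10003-20151", "<dataset id='10003-20151'/>")
print(datasets.get("10003-20151"))
print(jobs.get("10003-20151"))
```

Swapping the in-memory store for another generic implementation would leave the typed repositories unchanged, which is the point of the string ID interface.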

Page 16: ATLAS Distributed Analysis: Overview

Goals for this release

User should be able to
• Select dataset from DSC (dataset selection catalog)
• Run aodhisto transformation
  – Input is any AOD (or other event collection) dataset
  – Output is a dataset containing root histograms
  – Makes use of the analysis tools algorithm
  – User can supply their own job options and analysis algorithm
• Run atlasreco transformation
  – Input is any RDO dataset
  – Output is ESD dataset
  – Makes use of the production system transform for release 9.0.x
• Monitor job status for running jobs
• Get description including location of any output dataset
• Easily view the histograms in a root histogram dataset
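
As a sketch of how a client might check a dataset category before submitting one of these two transformations, assuming a simple lookup table: the table layout and the function output_category are hypothetical, and HISTO is used here as the category name for a root histogram dataset. Only the transformation names and their input and output kinds come from the list above.

```python
# Input and output dataset categories for the two release-goal transformations.
TRANSFORMATIONS = {
    "aodhisto": {"input": "AOD", "output": "HISTO"},
    "atlasreco": {"input": "RDO", "output": "ESD"},
}


def output_category(xform_name, input_category):
    """Return the output category, or raise if the input category does not match."""
    spec = TRANSFORMATIONS[xform_name]
    if spec["input"] != input_category:
        raise ValueError(
            f"{xform_name} expects {spec['input']} input, got {input_category}"
        )
    return spec["output"]


print(output_category("aodhisto", "AOD"))
print(output_category("atlasreco", "RDO"))
```

The sketch is stricter than the slide, which also allows aodhisto to take other event collection datasets.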

Page 17: ATLAS Distributed Analysis: Overview

Current status

Releases
• DIAL release 0.94 is on hold until everything else needed for the release goals is in place
• DIAL 0.93 changes often but is now close to what 0.94 will be

Functionality
• Root demos 4 and 5 have been added to illustrate use of aodhisto and atlasreco, respectively
• aodhisto has only been run with one dataset at one site
• atlasreco cannot use 9.0.2 and is flaky with 9.0.1 due to ATLAS SW problems
• Magda is being used to catalog and move files
• A few demo single-file datasets have been cataloged
  – See http://www.atlasgrid.bnl.gov/dialds/dlShowMain.pl

Page 18: ATLAS Distributed Analysis: Overview

Goals for the next release

Transformations
• Clarify transformation interface (see following)
  – So users can add transformations
• Continue development of aodhisto (F. Fassi)
• Complete suite of prodsys transformations (C. Haeberli)

Catalogs
• Build catalog of datasets from existing production and user data
• Add transformation catalogs
• Add local (to server) and global job catalogs
• Provide catalog interface integrated with job submission client(s)

Analysis services
• Enable ADA production with DC2 production system (F. Brochu)
• Enable ADA production and analysis with gLite WMS (D. Liko)

Page 19: ATLAS Distributed Analysis: Overview

Goals for the next release (cont)

Data management
• User anywhere can put and get data from a storage element (SE)
• SE can retrieve requested data from other SE’s
• Integrate DIAL with DQ and SRM (F. Orellana)

Package management
• Continue development of ADA package management interface and implementations (G. Rybkine)
• Integrate DIAL with ADA PM system
• Deploy PM at processing sites, i.e. integrate with existing systems

AJDL
• Revisit transformation specification
• Integrate with GANGA and DC2 production system

Better error reporting

Page 20: ATLAS Distributed Analysis: Overview

Transformation interface

Clarify and document transformation interface
• How xform is packaged and released
• How analysis service finds xform
• Runtime environment that a xform can expect
• How xform is called
• How xform finds input dataset and extracts its files
• How transform locates software (including itself and its task)
• How transform stores output files and creates output dataset
• How transform indicates job status (running, failed, done, …)
• How task (user code) is built and accessed

Make it easy for users to add their own transformations
• E.g. run my athena algorithm
• Keep task mechanism for runtime configuration
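
One way to picture a few of these questions (how the xform is called, how it receives input files and a task, how it reports status) is the Python sketch below. It is not the ADA interface, which the slide says is still to be defined: the names JobStatus, run_transformation and rename_files and the calling convention are all hypothetical. Only the status values running, failed and done come from the list above.

```python
from enum import Enum


class JobStatus(Enum):
    RUNNING = "running"
    FAILED = "failed"
    DONE = "done"


def run_transformation(xform, task, input_files):
    """Call xform(task, input_files); return the final status and the output file names."""
    status = JobStatus.RUNNING
    try:
        output_files = list(xform(task, input_files))
        status = JobStatus.DONE
    except Exception:
        # Any error raised by the user transformation marks the job as failed.
        output_files = []
        status = JobStatus.FAILED
    return status, output_files


def rename_files(task, input_files):
    """Toy user transformation: one output name per input file."""
    return [f"{task['prefix']}_{name}" for name in input_files]


status, outputs = run_transformation(rename_files, {"prefix": "histo"}, ["a.root", "b.root"])
print(status.value, outputs)
```

Here the task is a plain dictionary passed at run time, in the spirit of keeping the task mechanism for runtime configuration.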

Page 21: ATLAS Distributed Analysis: Overview

Conclusions

Status
• Much progress since last meeting but more to do
• Still in demo mode

Releases
• Expect DIAL 0.94 soon
  – When other pieces are in place
  – Then would like to get feedback on interface and functionality
• Aim for ADA/DIAL 1.0 in February
  – Useful system: more than demo
  – Meeting the short-term goals outlined earlier
• Need more people
  – Within ADA
  – More attention from external providers (DQ, AMI, prodsys)
  – Physics contributions of data and algorithms

Page 22: ATLAS Distributed Analysis: Overview

More information

For more information on ADA, see the home page
• http://www.usatlas.bnl.gov/ADA
• Includes status of subprojects, relevant talks and documents, and links to associated projects

DIAL release 0.94 is described at
• http://www.usatlas.bnl.gov/~dladams/dial/releases/0.94/index.html

To try it out, run DIAL root demos 4 and 5 in that release

Comments and questions
• ADA mailing list
• ADA Savannah coming soon
• DIAL Savannah (with bug reporting) linked from DIAL page