ACAT 2002
June 2002, Lassi A. Tuura, Northeastern University, http://iguana.cern.ch
CMS Data Analysis: Current Status and Future Strategy
On behalf of the CMS Collaboration
Lassi A. Tuura
Northeastern University, Boston
Overview
The Context: CMS Analysis Today
Data Analysis Environment
– Architecture Overview
– COBRA
– IGUANA
– GRID/Production
Tomorrow and Beyond
– Leveraging current frameworks in the Grid-enriched analysis environment
– Clarens client-server prototype
– Other prototype activities
Context
Challenges:
– Complexity
– Geographic Dispersion
– Direct Access To Data
– Migration from Reconstruction to Trigger
Environments:
– Real-Time Event Filter, Online Monitoring
– Pre-emptive Simulation, Reconstruction, Analysis
– Interactive Statistical Analysis
Current CMS Production
[Diagram: production chain. Pythia → HEPEVT Ntuples → CMSIM (GEANT3) → Zebra files with HITS → ORCA/COBRA ooHit Formatter → Objectivity Database → ORCA/COBRA Digitization (merge signal and pile-up) → Objectivity Database → ORCA User Analysis → Ntuples or Root files. Also shown: OSCAR/COBRA (GEANT4) writing to an Objectivity Database, and IGUANA Interactive Analysis reading from an Objectivity Database.]
Complexity of Production 2002
Number of Regional Centers: 11
Number of Computing Centers: 21
Number of CPUs: ~1000
Largest Local Center: 176 CPUs
Number of Production Passes for each Dataset (including analysis group processing done by production): 6-8
Number of Files: ~11,000
Data Size (not including fz files from Simulation): 17 TB
File Transfer by GDMP and by perl Scripts over scp/bbcp: 7 TB toward T1, 4 TB toward T2
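The figures above imply some average scales. Here is a quick back-of-the-envelope check. It assumes decimal terabytes (1 TB = 10^12 bytes) and treats the approximate counts (~11,000 files, ~1000 CPUs) as exact. The transferred-to-stored ratio is only indicative, because the slide does not say whether the transferred data is part of the 17 TB:

```python
# Figures from the "Complexity of Production 2002" slide; approximate values taken at face value.
data_size_tb = 17            # excludes fz files from simulation
n_files = 11_000
n_cpus = 1_000
n_computing_centres = 21
to_t1_tb, to_t2_tb = 7, 4    # moved by GDMP and perl scripts over scp/bbcp

TB = 10**12  # assumption: decimal terabytes
GB = 10**9

avg_file_gb = data_size_tb * TB / n_files / GB
avg_cpus_per_centre = n_cpus / n_computing_centres
transfer_ratio = (to_t1_tb + to_t2_tb) / data_size_tb

print(f"average file size:       {avg_file_gb:.2f} GB")
print(f"average CPUs per centre: {avg_cpus_per_centre:.0f}")
print(f"transferred / stored:    {transfer_ratio:.0%}")
```

This gives roughly 1.55 GB per file and about 48 CPUs per computing centre. The largest centre, with 176 CPUs, sits well above that average. The transferred-to-stored ratio comes out at about 65%.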
Interactive Analysis
[Screenshot of an interactive analysis session, with callouts:]
– Lizard Qt plotter
– ANAPHE histogram extended with pointers to CMS events
– Emacs used to edit a CMS C++ plugin to create and fill histograms
– OpenInventor-based display of a selected event
– Python shell with Lizard & CMS modules
Most of the analysis is done using NTUPLEs in PAW, some in ROOT.
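The slide mentions an ANAPHE histogram extended with pointers to CMS events. The idea can be sketched as a histogram that keeps, besides its counts, per-bin references to the events that filled it. A selected bin can then be traced back to its events. This is not the ANAPHE or Lizard API. The class name and the use of (run, event) tuples as event references are hypothetical:

```python
from collections import defaultdict


class EventHistogram:
    """1D histogram whose bins also remember which events filled them.

    Hypothetical sketch: event references are plain (run, event) tuples here,
    standing in for persistent references to CMS events.
    """

    def __init__(self, nbins, low, high):
        self.nbins, self.low, self.high = nbins, low, high
        self.counts = [0] * nbins
        self._events = defaultdict(list)  # bin index -> event references

    def bin_index(self, x):
        """Return the bin for x, or None for underflow/overflow (not kept here)."""
        if not (self.low <= x < self.high):
            return None
        i = int((x - self.low) / (self.high - self.low) * self.nbins)
        return min(i, self.nbins - 1)  # guard against floating-point round-up

    def fill(self, x, event_ref):
        i = self.bin_index(x)
        if i is not None:
            self.counts[i] += 1
            self._events[i].append(event_ref)

    def events_in_bin(self, i):
        """Events behind bin i, e.g. to hand them to an event display."""
        return list(self._events.get(i, []))


# Usage: fill with a per-event quantity, then go back from a bin to its events.
h = EventHistogram(10, 0.0, 100.0)
for run, event, pt in [(1, 1, 12.0), (1, 2, 15.5), (1, 3, 47.0), (2, 7, 150.0)]:
    h.fill(pt, (run, event))
```

Here `h.events_in_bin(1)` returns the two events with pt in [10, 20). The overflow entry at 150 is simply dropped in this sketch.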
Behind the Scenes: Frameworks
[Diagram: the CMS software frameworks. Components shown: Federation wizards, Detector/Event Display, Data Browser, Analysis job wizards, Generic analysis Tools, ORCA, FAMOS, Objy tools, GRID, OSCAR, COBRA, Distributed Data Store & Computing Infrastructure, CMS tools. Annotations: "Consistent User Interface"; "Coherent basic tools and mechanisms".]
Frameworks Dissected
[Diagram: layered framework structure. Basic Services: ODBMS, GEANT 3/4, CLHEP, PAW Replacement, C++ Standard Library + Extension Toolkits. Generic Application Framework, with Adapters and Extensions. Specific Frameworks: Event Filter, Reconstruction Algorithms, Physics Analysis, Data Monitoring. Also shown: Physics modules (Grid-uploadable); Calibration Objects, Configuration Objects, Event Objects; (Grid-aware) Data-Products.]
Framework Design Basis
Several frameworks together provide the environment.
Open: no central framework with all functionality
– Frameworks are designed to be extensible
– … and to collaborate with other software
Coherent: the user sees a "final", smooth interface
– Achieved by integrating the frameworks together
– … but users do not have to do this work themselves!
Design applied at both the framework and the object design level
Successfully applied in many parts of CMS software
– Applications, persistency, sub-frameworks, visualisation, …
– No loss of usability, functionality or performance
– Has made it easy to integrate directly with many existing tools
This is nothing novel: it is part of the standard risk-mitigation strategy of any modern industrial solution.
Frameworks: COBRA
[The framework diagram from "Behind the Scenes: Frameworks", shown again to introduce COBRA.]
COBRA: Main Components
Push- and pull-mode execution, and any mixture of the two
– Reconstruction-on-demand is a key concept in COBRA
– Detector-centric reconstruction: push data from the event
– Reconstruction-unit-centric reconstruction: pull/create data as needed
Event data and related structures
– Basic support for commonly needed objects (hits, digis, containers, …)
Application environments
– Basic application frameworks, various semi-specialised applications
– Lots of error-handling and recovery code (automatic recovery after a crash, …)
Meta data: a key component
– Data chunking, system and user collections, data streams, file management, job concepts, configuration and setup records, redirected navigation after reprocessing, …
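Pull-mode reconstruction-on-demand can be illustrated with a minimal sketch. It assumes a per-event product cache and a table mapping each product name to the function that makes it; both are hypothetical and are not COBRA's actual classes. Asking for a product runs its producer only if the product does not exist yet, and producers pull their own inputs the same way. A push-mode, detector-centric pass would instead request every product in a fixed order.

```python
from typing import Callable, Dict

# Hypothetical registry: product name -> function that produces it from an event.
PRODUCERS: Dict[str, Callable[["Event"], object]] = {}


def produces(name):
    """Decorator with which a producer declares the product it makes."""
    def register(fn):
        PRODUCERS[name] = fn
        return fn
    return register


class Event:
    """Holds raw data and creates derived products only when first requested."""

    def __init__(self, raw):
        self._products = {"raw": raw}
        self.ran = []  # order in which producers were invoked (for illustration)

    def get(self, name):
        if name not in self._products:
            producer = PRODUCERS[name]             # KeyError: nobody produces this
            self.ran.append(name)
            self._products[name] = producer(self)  # may pull further products
        return self._products[name]


@produces("hits")
def make_hits(event):
    return [x for x in event.get("raw") if x > 0]


@produces("clusters")
def make_clusters(event):
    hits = event.get("hits")
    return [sum(hits)] if hits else []


@produces("tracks")
def make_tracks(event):
    return [c / 2 for c in event.get("clusters")]


event = Event([3, -1, 5])
tracks = event.get("tracks")   # pulls clusters, which pulls hits
event.get("tracks")            # second request: served from the cache
```

Only the producers needed for "tracks" run, each exactly once. Nothing is computed for products nobody asks for.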
COBRA: Main Strengths
Algorithms in plug-ins
– "Publish-yourself plug-ins": self-describing data producers
Strong meta-data facilities
– Reconstruction-on-demand matches the data-product concept very well
– The Grid virtual-data-product concept is really just an extension of it
– Convenient mapping of data products to chunks: files, containers, …
Scatter / gather: decompose jobs, gather data
– One logical job can be chopped into many physical processes; we still know it is logically the same job, no matter which process it is running in
Adapts automatically to many environments without special configuration: interactive, batch, farm, stand-alone, trigger, …
– Through appropriate use of enabling techniques (transactions, locking, refs)
– No data post-processing required
– Well matched to production tools (IMPALA)
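The scatter/gather idea can be sketched as follows. The sketch assumes a simple event-range split and a logical job identifier carried by every piece. The names are hypothetical, and how COBRA actually tracks logical jobs is not shown on the slide. Each piece runs independently. The gather step then uses the shared identifier to confirm that all partial results belong to the same logical job:

```python
import uuid
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass

# Hypothetical names; the event-range split is an assumed decomposition.


@dataclass(frozen=True)
class SubJob:
    logical_job_id: str  # identical for every piece of one logical job
    index: int
    first_event: int
    stop_event: int      # exclusive


def scatter(logical_job_id, n_events, n_pieces):
    """Split one logical job into contiguous, nearly equal event ranges."""
    size, extra = divmod(n_events, n_pieces)
    pieces, start = [], 0
    for i in range(n_pieces):
        stop = start + size + (1 if i < extra else 0)
        pieces.append(SubJob(logical_job_id, i, start, stop))
        start = stop
    return pieces


def run_piece(job):
    """Stand-in for one physical process: select every 7th event in its range."""
    selected = [e for e in range(job.first_event, job.stop_event) if e % 7 == 0]
    return job.logical_job_id, job.index, selected


def gather(results, logical_job_id):
    """Merge partial results in piece order, rejecting pieces of another job."""
    merged = []
    for job_id, index, selected in sorted(results, key=lambda r: r[1]):
        if job_id != logical_job_id:
            raise ValueError(f"piece {index} belongs to job {job_id}")
        merged.extend(selected)
    return merged


job_id = str(uuid.uuid4())
pieces = scatter(job_id, 100, 3)
with ThreadPoolExecutor() as pool:  # threads stand in for separate processes
    results = list(pool.map(run_piece, pieces))
merged = gather(results, job_id)
```

The merged selection is identical to what a single process over all 100 events would produce, regardless of which piece finished first.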
[Diagram: Objectivity components: StorageManager, SchemaManager, TransactionManager, C++ Binding, File I/O, Lock Server, Page Server, Catalog Manager, DDL Source Processing, MetaData, Object Access, MSS, Grid & Farm Interface.]
[The Objectivity component diagram again, with: Refs & Navigation, Queries, Cache Management.]
[The Objectivity component diagram again, with: Object Naming, Configurations (Data Sets), Collections, Run Resume & Crash Recovery.]
[The Objectivity component diagram again, with: File Size Control, Farm Management, System Management.]
Frameworks: IGUANA
[The framework diagram from "Behind the Scenes: Frameworks", shown again to introduce IGUANA.]
User Interface and Visualisation
IGUANA: a generic toolkit for user interfaces and visualisation
– Builds on existing high-quality libraries (Qt, OpenInventor, Anaphe, …)
– Used to implement specific visualisation applications in other projects
Main technical focus: provide a platform that makes it easy to integrate GUIs as a coherent whole, to provide application services and to visualise any application object
– Many categories / layers: GUI gadgets & support, application environment, data visualisers, data representation methods, control panels, …
– Designed to integrate with and into other applications
– Virtually everything is in plug-ins (which can still be statically linked)
[Diagram: plug-in architecture: Component Database, Plug-In Caches, Plug-Ins and Object Factories, with plug-ins shown as attached or unattached.]
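The diagram's elements (plug-in cache, object factory, attached and unattached plug-ins) suggest a common lazy-loading pattern. The sketch below assumes that the cache records what each plug-in provides, so a plug-in only needs to be loaded when one of its objects is first requested. The class names, the loader callables (standing in for loading a shared library) and the browser names are hypothetical; this is not IGUANA's C++ API:

```python
class PluginInfo:
    """Plug-in cache entry: what a plug-in provides, known without loading it."""

    def __init__(self, name, provides, loader):
        self.name = name
        self.provides = list(provides)  # component names offered by the plug-in
        self._loader = loader           # hypothetical stand-in for loading a shared library
        self.module = None              # None while the plug-in is unattached

    @property
    def attached(self):
        return self.module is not None

    def attach(self):
        if self.module is None:
            self.module = self._loader()
        return self.module


class ObjectFactory:
    """Creates objects by component name, attaching plug-ins only on demand."""

    def __init__(self, cache):
        self._index = {comp: info for info in cache for comp in info.provides}

    def create(self, component, *args):
        info = self._index[component]  # KeyError: no plug-in provides it
        module = info.attach()
        return module[component](*args)


# Hypothetical plug-ins: each "module" is a dict of constructors.
def load_vis3d():
    return {"3DBrowser": lambda title: f"3D browser '{title}'"}


def load_twig():
    return {"TwigBrowser": lambda title: f"Twig browser '{title}'"}


cache = [PluginInfo("Vis3D", ["3DBrowser"], load_vis3d),
         PluginInfo("Twig", ["TwigBrowser"], load_twig)]
factory = ObjectFactory(cache)
browser = factory.create("TwigBrowser", "events")
```

After the call, only the "Twig" plug-in is attached. "Vis3D" stays unattached because nothing has asked for a 3D browser.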
Illustration: 3D Visualisation
[Screenshot callouts: QMainWindow browser site, QMDIShell browser sites, 3D Browser, Twig Browser.]
IGUANA GUI Integration
[Diagram labels: Integration; Action; Visualise Results, Modify Objects, Further Interaction.]
Tomorrow and Beyond
Leverage the current frameworks on the grid
– Many native COBRA concepts match the grid well:
  – (Virtual) data products ~ reconstruction-on-demand
  – Recording and matching configuration and setup information
  – Production interfaces: catalogs, redirection, MSS hooks
  – Scatter/gather job decomposition, production environment
– COBRA-based applications can be encapsulated for distributed analysis
– IGUANA already separates application objects, model and viewer
  – Many possibilities for introducing distributed links
– IGUANA+COBRA provide a platform for a coherent, well-integrated interface, no matter where the code runs or where the data comes from and goes to
  – Both have loads of knobs and hooks for integration
Aim to adapt the existing software where possible
– Adapt and work within CMS software (COBRA, ORCA, …) and existing analysis tools (ROOT, Lizard, …); don't replace them
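Because IGUANA keeps application objects, model and viewer apart, a distributed link can be inserted between model and viewer without touching either. Here is a minimal observer-pattern sketch of that idea. The class names are hypothetical, and the JSON message format is chosen only for illustration; neither reflects IGUANA's interfaces. The model notifies any object with a `changed` method. A proxy with the same method forwards the change over a transport instead of drawing it.

```python
import json

# Hypothetical classes illustrating model/viewer separation; not IGUANA's interfaces.


class Model:
    """Application objects; knows nothing about how or where they are shown."""

    def __init__(self):
        self.objects = {}
        self._viewers = []

    def attach(self, viewer):
        self._viewers.append(viewer)

    def update(self, key, value):
        self.objects[key] = value
        for viewer in self._viewers:
            viewer.changed(key, value)


class LocalViewer:
    def __init__(self):
        self.shown = {}

    def changed(self, key, value):
        self.shown[key] = value


class RemoteViewerProxy:
    """Same viewer interface, but forwards each change over a link.

    `send` is a hypothetical transport (a socket, an RPC call, ...).
    """

    def __init__(self, send):
        self._send = send

    def changed(self, key, value):
        self._send(json.dumps({"key": key, "value": value}))


wire = []                      # stands in for the network
model = Model()
local = LocalViewer()
model.attach(local)
model.attach(RemoteViewerProxy(wire.append))
model.update("track/1", {"pt": 12.5})
```

The model code is the same whether its viewers are local or remote. Only the object attached to it changes.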
Prototypes: Clarens Web Portals
[Diagram: Client, RPC over http/https, Web Server, Clarens, Service.]
Grid-enabling the working environment for physicists' data analysis
Communication with clients via the commodity XML-RPC protocol
– Implementation independence
Server implemented in C++: access to the CMS OO analysis toolkit
Server provides a remote API to Grid tools:
– The Virtual Data Toolkit: object collection access
– Data movement between tier centres using GSI-FTP
– CMS analysis software (ORCA/COBRA)
Security services provided by the Grid (GSI)
– No Globus needed on the client side, only a certificate
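Clarens clients talk to the server with XML-RPC over http/https. The sketch below shows only the message encoding and dispatch, using Python's standard `xmlrpc.client.dumps` and `xmlrpc.client.loads`. The network transport is replaced by a direct function call, and the real server is C++. The method name `catalog.list_collections` and the dataset names are hypothetical; they are not the Clarens API:

```python
import xmlrpc.client


# Hypothetical service method and data; not the real Clarens API.
def list_collections(dataset):
    catalog = {"jets2002": ["run1_jets", "run2_jets"]}
    return catalog.get(dataset, [])


SERVICE = {"catalog.list_collections": list_collections}


def handle(request_xml):
    """Server side: decode an XML-RPC call, dispatch it, encode the reply."""
    params, method = xmlrpc.client.loads(request_xml)
    func = SERVICE.get(method)
    if func is None:
        return xmlrpc.client.dumps(xmlrpc.client.Fault(1, f"unknown method {method}"))
    return xmlrpc.client.dumps((func(*params),), methodresponse=True)


def call(method, *args):
    """Client side: encode the call, 'send' it, decode the reply.

    A direct function call replaces the http/https transport here;
    xmlrpc.client.loads raises xmlrpc.client.Fault for a fault reply.
    """
    request = xmlrpc.client.dumps(args, methodname=method)
    (result,), _ = xmlrpc.client.loads(handle(request))
    return result
```

For example, `call("catalog.list_collections", "jets2002")` returns `["run1_jets", "run2_jets"]`. Because both sides only exchange XML documents, client and server can be written in different languages. That is the implementation independence the slide refers to.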
Prototypes: Clarens Web Portals…
[Diagram: distributed analysis architecture spanning Tier 0/1/2, Tier 1/2 and Tier 3/4/5. Components: production system and data repositories; ORCA analysis farm(s) (or a distributed "farm" using grid queues); RDBMS-based data warehouse(s); PIAF/Proof/.. type analysis farm(s); TAG and AOD extraction/conversion/transport services; data extraction web service(s); query web service(s); local disk; the user with a local analysis tool (Lizard/ROOT/…) and tool plug-in module, and a web browser. Flows: production data flow, TAGs/AODs data flow, physics query flow.]
Other Prototypes
Tag database optimisation
– Fast sample selection is crucial
– Various models already tried
– Experimenting with RDBMS
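This sketch illustrates what an RDBMS-based tag database does for fast sample selection. Each event gets a compact row of summary quantities, and selections become indexed SQL queries. Only the (run, event) keys of passing events are returned, for reading the full data afterwards. SQLite from the Python standard library is used purely for illustration. The slide does not say which RDBMS or schema was tried, and the column names are hypothetical:

```python
import sqlite3

# Hypothetical TAG schema: a few summary quantities per event, keyed by (run, event).
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE tag (
        run INTEGER, event INTEGER,
        n_muons INTEGER, max_muon_pt REAL, missing_et REAL,
        PRIMARY KEY (run, event)
    )""")
# Index on the quantities used for selection, so cuts need not scan every row.
conn.execute("CREATE INDEX tag_muons ON tag (n_muons, max_muon_pt)")
conn.executemany(
    "INSERT INTO tag VALUES (?, ?, ?, ?, ?)",
    [(1, 1, 2, 35.0, 10.0),
     (1, 2, 0, 0.0, 55.0),
     (1, 3, 1, 22.0, 5.0),
     (2, 1, 2, 48.5, 30.0)])


def select_events(conn, min_muons, min_pt):
    """Return (run, event) keys passing the cuts; full events are fetched elsewhere."""
    return conn.execute(
        "SELECT run, event FROM tag"
        " WHERE n_muons >= ? AND max_muon_pt > ?"
        " ORDER BY run, event",
        (min_muons, min_pt)).fetchall()


selected = select_events(conn, 2, 30.0)
```

Here `selected` holds the keys of the two events with at least two muons and a leading muon above 30. Deciding which summary quantities go into the tag row, and which are indexed, is exactly the optimisation question the slide raises.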
MOP: distributed job submission system
– Allows CMS production jobs to be submitted from a central location, run at remote locations, and return their results
– Job Specification: IMPALA
– Replication: GDMP
– Globus GRAM
– Job Scheduling: Condor-G and local systems
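The flow MOP automates can be pictured as a chain of steps applied to each job. In this sketch, every step is a plain Python function named after the role the slide assigns to a tool. None of them calls IMPALA, GDMP, GRAM or Condor-G. The step order, the assumption that GDMP's part is to replicate output back to the central site, the dataset and site names, and the round-robin scheduling are all hypothetical simplifications:

```python
from dataclasses import dataclass, field
from itertools import cycle

# Hypothetical sketch: no real IMPALA, GDMP, GRAM or Condor-G calls are made.


@dataclass
class ProductionJob:
    dataset: str
    run: int
    n_events: int
    site: str = ""
    output: str = ""
    history: list = field(default_factory=list)


def specify(dataset, runs, events_per_run):
    """Job specification (the role IMPALA plays): one concrete job per run."""
    return [ProductionJob(dataset, run, events_per_run) for run in runs]


def schedule(jobs, sites):
    """Job scheduling (Condor-G / local systems): round robin here, for simplicity."""
    for job, site in zip(jobs, cycle(sites)):
        job.site = site
        job.history.append(f"scheduled to {site}")
    return jobs


def run_remote(job):
    """Remote execution (via Globus GRAM); simulated locally in this sketch."""
    job.output = f"{job.dataset}_run{job.run}.db"
    job.history.append(f"ran at {job.site}")
    return job


def replicate(job, home):
    """Replication (GDMP): bring the output back to the central location."""
    job.history.append(f"replicated {job.output} to {home}")
    return job


jobs = specify("jets2002", [1, 2, 3], 500)
jobs = [replicate(run_remote(job), "central-site") for job in schedule(jobs, ["site-A", "site-B"])]
```

Each job's `history` records the path it took, from submission at the central location to its results coming back.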