Michele Gulmini, CHEP2003, San Diego USA, March 2003
CMS
Run Control and Monitor System for the CMS Experiment
Michele Gulmini
CERN/EP – INFN Legnaro
On behalf of the CMS DAQ collaboration
CHEP 2003, San Diego USA, March 2003

Outline
Run Control and Monitor System: RCMS
• RCMS Architecture
• Session Managers
• Subsystem Controllers
• Services
• RCMS Prototypes
• RCMS for Small DAQ Systems
• RCMS Demonstrators
– Performance and Scalability Tests
• Plans
• Summary

Run Control and Monitor System
[Figure: user interfaces (UI) reaching RCMS over Internet/Intranet]
• The Run Control and Monitor System (RCMS) is the collection of hardware and software components responsible for controlling and monitoring the CMS experiment during the data taking.
• RCMS enables users to access and control the experiment from anywhere in the world, providing a “virtual counting room” where physicists and operators can effectively take shifts from a distance.
• RCMS views the experiment as a set of partitions, where a partition is a grouping of entities that can be operated independently.
• Main operations are configuration, monitoring, error handling, logging and synchronization with other subsystems.
[Figure: RCMS Context, showing RCMS together with Trigger, Event Builder, Event Filter, DCS, Computing Services and UI]

Partitions Example
[Figure: partitions example with Session Manager-A and Session Manager-B, their UIs, the Services Connection and the Services; sub-system controllers CSCtrl, TRGCtrl, DCSCtrl, EVFCtrl, EVBCtrl, FED-BCtrl and RU-BCtrl; sub-systems CS, DCS, EVB, TRG, EVF, FED Builder and RU Builder; labels Glbl, Mu, Cal]

RCMS Logical Layout
• The execution of the RCMS is organized on the basis of “Sessions”.
• A Session is the allocation of the hardware and software of a CMS partition needed to perform data-taking.
• Multiple Sessions may coexist and operate concurrently.
• Each Session is associated with a Session Manager (SMR) that coordinates all the actions.
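The Session idea can be sketched in a few lines of Java. This is only an illustration: the class name `SessionRegistrySketch`, the resource names, and in particular the rule that two concurrent Sessions may not allocate the same resource are assumptions, not something stated on the slides.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical registry of open Sessions and the resources each one has allocated. */
class SessionRegistrySketch {
    // session id -> resources (hardware and software) allocated to that Session
    private final Map<String, Set<String>> openSessions = new HashMap<>();

    /** Opens a Session for a partition; refuses if any resource is already allocated (assumed rule). */
    boolean open(String sessionId, Set<String> partitionResources) {
        if (openSessions.containsKey(sessionId)) {
            return false;
        }
        for (Set<String> allocated : openSessions.values()) {
            if (!Collections.disjoint(allocated, partitionResources)) {
                return false; // partitions overlap, so they could not be operated independently
            }
        }
        openSessions.put(sessionId, new HashSet<>(partitionResources));
        return true;
    }

    /** Closes a Session and releases its resources; returns false if it was not open. */
    boolean close(String sessionId) {
        return openSessions.remove(sessionId) != null;
    }
}
```

With this sketch, two Sessions over disjoint partitions can be open at the same time, while a third Session that reuses one of their resources is rejected until the owning Session is closed.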

Sub-System Controller (SSC)
• An SSC consists of a Function Manager (FM) and a local database (DB) service.
• There is one FM per partition; it receives requests from a Session Manager (SMR) and transforms them into the corresponding action requests, which are sent to the sub-system.
• The local DB service can be used as a proxy to the services.
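The request translation done by a Function Manager can be illustrated with a short Java sketch. The interface `ControlledComponent`, the class `FunctionManagerSketch` and the request and action names are hypothetical; the real FM talks to the sub-system over the network, which is left out here.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in for one controlled sub-system component. */
interface ControlledComponent {
    String name();
    boolean execute(String action); // true if the action succeeded
}

/** Minimal sketch: one Session Manager request becomes a list of sub-system actions. */
class FunctionManagerSketch {
    // Assumed mapping from a session-level request to the ordered sub-system actions.
    private final Map<String, List<String>> translation = new LinkedHashMap<>();
    private final List<ControlledComponent> components = new ArrayList<>();

    void addComponent(ControlledComponent component) {
        components.add(component);
    }

    void map(String request, List<String> actions) {
        translation.put(request, actions);
    }

    /** Handles a request from the Session Manager; returns the names of components that failed. */
    List<String> handle(String request) {
        List<String> actions = translation.get(request);
        if (actions == null) {
            throw new IllegalArgumentException("Unknown request: " + request);
        }
        List<String> failed = new ArrayList<>();
        for (ControlledComponent component : components) {
            for (String action : actions) {
                if (!component.execute(action)) {
                    failed.add(component.name());
                    break; // skip the remaining actions for a failed component
                }
            }
        }
        return failed;
    }
}
```

Returning the list of failed components, rather than a single boolean, is a design choice of this sketch; it would let the Session Manager decide how to react.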

Basic RCMS Services
– SECURITY SERVICE
• Login and user account management
– RESOURCE SERVICE (RS)
• Information about DAQ resources and partitions
– INFORMATION AND MONITOR SERVICE (IMS)
• Collects messages and monitor data; distributes them to the subscribers
– JOB CONTROL
• Starts, monitors and stops the software elements of RCMS, including the DAQ components
– PROBLEM SOLVER
• Uses information from the RS and IMS to identify malfunctions and attempts to provide automatic recovery procedures where applicable

Resource Service Block Diagram
• The Resource Service (RS) handles all the hardware and software components of the DAQ system including its partitions.
[Figure: RCMS block diagram. Blocks shown: UIs, Session Manager, Services Connection, SS, RS, IMS, Job Ctrl, PS, SSC, UserDB, ConfDB, LogDB]

Information and Monitor Service Block Diagram
• The Information and Monitor Service (IMS) collects information (logs, warnings, errors, monitoring data, etc.) from the sub-systems and provides it to the subscribers.
[Figure: RCMS block diagram. Blocks shown: UIs, Session Manager, Services Connection, SS, RS, IMS, Job Ctrl, PS, SSC, UserDB, ConfDB, LogDB]
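The collect-and-distribute role of the IMS can be sketched as a small publish/subscribe class in Java. This is a simplified illustration, not the prototype's code: it runs in-process instead of over SOAP, it uses the standard `javax.xml.xpath` API to decide which subscribers receive a message, and the class name `ImsSketch` and the message layout are assumptions.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

/** Minimal in-process sketch of SUBSCRIBE / PUBLISH / NOTIFY with XPath filters. */
class ImsSketch {
    private static final class Subscription {
        final String filter;               // XPath expression chosen by the subscriber
        final Consumer<String> subscriber; // callback standing in for a NOTIFY message

        Subscription(String filter, Consumer<String> subscriber) {
            this.filter = filter;
            this.subscriber = subscriber;
        }
    }

    private final List<Subscription> subscriptions = new ArrayList<>();
    private final XPath xpath = XPathFactory.newInstance().newXPath();

    /** SUBSCRIBE: register a callback for every message matching the XPath filter. */
    void subscribe(String xpathFilter, Consumer<String> subscriber) {
        subscriptions.add(new Subscription(xpathFilter, subscriber));
    }

    /** PUBLISH: notify each subscriber whose filter matches; returns how many were notified. */
    int publish(String xmlMessage) {
        int notified = 0;
        for (Subscription s : subscriptions) {
            try {
                Boolean matches = (Boolean) xpath.evaluate(
                        s.filter, new InputSource(new StringReader(xmlMessage)), XPathConstants.BOOLEAN);
                if (matches) {
                    s.subscriber.accept(xmlMessage);
                    notified++;
                }
            } catch (XPathExpressionException e) {
                throw new IllegalArgumentException("Bad filter or message: " + s.filter, e);
            }
        }
        return notified;
    }
}
```

A subscriber interested only in errors could register with a filter such as `/message[@severity='error']` (a hypothetical message layout), while a logger could subscribe with `/message` to receive everything.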

Time Requirements
– Configuration and setup of the system: minutes
– Control (state change, execution of commands): seconds
– Monitoring: depending on the amount of data required
Information and Monitor Service:
• Tens of subscribers
• Peak: about 2000 messages (status change, log)
• Average: tens to a few hundred messages/s

RCMS Prototypes
• RCMS for small DAQ systems
– Fully functional RCMS systems targeted at small DAQs (production systems, testbeam DAQ systems)
– Real-life examples used to check the RCMS functionality.
• RCMS demonstrators
– Partially functional RCMS systems aimed at demonstrating scalability.
– Test bed systems used to emulate slices or parts of the hierarchical structure of the final DAQ.
– Help to confirm the architecture and to evaluate, and possibly select, the technologies to be used in the final system.

RCMS for small DAQs
• Current running prototype:
– Designed to work together with the XDAQ CMS online software framework (XDAQ: see the CHEP 2003 talk by J. Gutleber, “Using XDAQ in Application Scenarios of the CMS Experiment”)
– Available services:
• Resource Service (RS)
• Information and Monitor Service (IMS)
• SubSystem Controllers (Function Managers)
• Session Managers
• GUIs
• Technologies and tools:
• Java Servlets (Apache Tomcat)
• Sun “Java Web Services Developer Pack” (JWSDP)
– JAXP, JAXM, XPath, ...
• SOAP communication protocol
• Databases
– XMLDB interface
» eXist native XML database
– MySQL

RCMS for Small DAQs – Current Applications
• CMS Muon Drift Tubes
– Chamber Production DAQ (Legnaro - Italy)
– Testbeam (CERN – next May)
• CMS Tracker
– “ROD System Tests” (CERN)
– Testbeam (CERN – next May)
• CMS TriDAS (CERN)
– DAQ Column
– TDR Demonstrator

Session and Function Manager Prototype
[Figure: RCMS block diagram (UIs, Session Manager, Services Connection, SS, RS, IMS, Job Ctrl, PS, SSC, UserDB, ConfDB, LogDB) next to an SM/FM servlet containing an FSM built from an XML definition and a Java implementation]
• Function Managers and Session Managers have a built-in Finite State Machine (FSM) to command the controlled components and to track their state.
• The FSM is composed of an XML definition and a Java class implementation representing the actions to be performed.
• The definition and the implementation of the FSMs are managed by the Resource Service.
• Session Managers and Function Managers are launched when a new “Session” is opened, and can have a hierarchical structure.
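The split between an XML definition and a Java implementation of the actions can be sketched as follows. The XML layout (`fsm`, `transition`, `from`, `input`, `to`), the state names and the class `FsmSketch` are all hypothetical; the slides do not show the actual schema.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

/** Minimal FSM sketch: transitions come from XML, actions are supplied as Java code. */
class FsmSketch {
    private final Map<String, String> transitions = new HashMap<>(); // "state/input" -> next state
    private final Map<String, Runnable> actions = new HashMap<>();   // input -> action to perform
    private String state;

    /** Parses an assumed layout: <fsm initial="..."><transition from="" input="" to=""/></fsm>. */
    FsmSketch(String xmlDefinition) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xmlDefinition.getBytes(StandardCharsets.UTF_8)));
            Element root = doc.getDocumentElement();
            state = root.getAttribute("initial");
            NodeList list = root.getElementsByTagName("transition");
            for (int i = 0; i < list.getLength(); i++) {
                Element t = (Element) list.item(i);
                transitions.put(t.getAttribute("from") + "/" + t.getAttribute("input"),
                        t.getAttribute("to"));
            }
        } catch (Exception e) {
            throw new IllegalArgumentException("Invalid FSM definition", e);
        }
    }

    /** Binds the Java action to run when the given input is accepted. */
    void onInput(String input, Runnable action) {
        actions.put(input, action);
    }

    String state() {
        return state;
    }

    /** Applies an input; returns false and leaves the state unchanged if it is not allowed. */
    boolean fire(String input) {
        String next = transitions.get(state + "/" + input);
        if (next == null) {
            return false;
        }
        Runnable action = actions.get(input);
        if (action != null) {
            action.run();
        }
        state = next;
        return true;
    }
}
```

Keeping the transition table in XML means the allowed state changes can be stored and edited as data, while the behaviour stays in compiled Java; the sketch only mirrors that separation and makes no claim about how the Resource Service stores either part.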

RS and IMS Prototype
[Figure: IMS prototype. Elements shown: IMS with XPath filter engine in a Tomcat servlet container, subscriber info (JDOM, FS), Java publisher and Java subscriber using JAXM (Tomcat/Jetty), XDAQ application using XOAP, SOAP XML messages (SUBSCRIBE, PUBLISH, NOTIFY), DB (eXist, file, mySQL) through XMLDB]
[Figure: Resource Service prototype. Elements shown: Java servlet in a Tomcat servlet container, Java client with Java objects and XML parser (Castor), C++ client with XML parser, XML over SOAP, XMLDB interface to an XML:DB database, relational DB]

RCMS GUIs
• Generic GUI:
– Insertion and retrieval of resources (PCs, software, partitions, etc.)
– Ability to command, set and retrieve parameters from XDAQ applications
– Scripting facility
– Customisation facilities (plugins)
• Muon DT TestBeam GUI

RCMS Demonstrators
Legnaro T2 CMS farm: 136 P3 1-1.2 GHz processors

Demonstrator 1
• Exploring the ability to command a set of XDAQ executives running “empty” applications
• The time measured represents the time required to perform a state change of the entire cluster
[Figure: test setup with Function Managers (FM) on PCs commanding XDAQ executives on PCs via SOAP, in flat and hierarchical FM arrangements]

Demonstrator 1
[Plot: time (ms, axis 0 to 400) versus number of nodes (axis 0 to 130) for three configurations: Sequential FM, FM with Threads, 2 intermediate FMs. Annotation: 120 nodes, 100 ms]
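The difference between the "Sequential FM" and "FM with Threads" configurations can be illustrated with a small Java sketch. It does not reproduce the measurement: the SOAP request is replaced by a `Thread.sleep` with an assumed latency, the class `FanOutSketch` is hypothetical, and the "2 intermediate FMs" configuration is not modelled.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class FanOutSketch {
    /** Stand-in for one state-change request to a node; the sleep emulates the round trip. */
    static boolean changeState(int node, long latencyMs) {
        try {
            Thread.sleep(latencyMs);
            return true;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    /** Commands the nodes one after another; returns the elapsed time in ms. */
    static long sequential(int nodes, long latencyMs) {
        long start = System.nanoTime();
        for (int node = 0; node < nodes; node++) {
            changeState(node, latencyMs);
        }
        return (System.nanoTime() - start) / 1_000_000;
    }

    /** Commands the nodes from a fixed pool of worker threads; returns the elapsed time in ms. */
    static long threaded(int nodes, long latencyMs, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Callable<Boolean>> requests = new ArrayList<>();
            for (int n = 0; n < nodes; n++) {
                final int node = n;
                requests.add(() -> changeState(node, latencyMs));
            }
            long start = System.nanoTime();
            // invokeAll blocks until all requests finish; get() surfaces any failure.
            for (Future<Boolean> reply : pool.invokeAll(requests)) {
                reply.get();
            }
            return (System.nanoTime() - start) / 1_000_000;
        } catch (Exception e) {
            throw new IllegalStateException("State change did not complete", e);
        } finally {
            pool.shutdown();
        }
    }
}
```

In the sequential case the total time grows with the number of nodes times the per-node latency, whereas with enough threads it stays close to a single round trip plus scheduling overhead.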

Demonstrator 2
[Plot: total number of messages/s received by the services (axis 0 to 4000) versus number of Web Services (axis 0 to 5)]
• Simplified version of a log message service based on Web Services technologies (Glue platform)
• 15 clients and a variable number of Web Services used
• The performance scales linearly with the number of instances of the service available

IMS Prototype Test (1)
[Plot: IMS prototype, total number of messages/s (axis 0 to 400) versus number of publishers (axis 0 to 72) for three configurations: no persistency, persistency on file, persistency on DB (mySQL)]
[Figure: publishers sending PUBLISH messages to the IMS, with a mySQL DB]
• Persistency on the eXist native XML DB not plotted – very slow
• Between 200 and 300 SOAP messages/s handled by the IMS prototype

IMS Prototype Test (2)
[Figure: publishers sending PUBLISH messages over SOAP to several IMS instances, subscribers sending SUBSCRIBE and receiving NOTIFY, with a mySQL DB]
• Performance improves as the number of service instances increases
• Notification mechanism not optimized
• Test to be completed
[Plot: 4 publishers, persistency on DB. Total number of messages/s (axis 0 to 600) versus number of subscribers (axis 0 to 3) for 1 IMS, 2 IMS and 4 IMS]
[Plot: total number of messages/s (axis 0 to 600) versus number of IMS servlets (axis 1 to 4)]

IMS hierarchical structure
– Performance tests done with the present prototypes:
– Commanding a cluster of DAQ applications fits the requirements
– Information and Monitor Service prototype needs further investigation
– Notification architecture
– Hierarchical structure
IMS hierarchical structure:
[Figure: a top-level IMS above several IMS proxies, each above a group of XDAQ applications]
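One way an IMS proxy could reduce the load on the top-level IMS is sketched below in Java. The slides only show the hierarchy, so the batching policy, the class name `ImsProxySketch` and the `batchSize` parameter are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/** Hypothetical IMS proxy: buffers messages from local applications and forwards them in batches. */
class ImsProxySketch {
    private final Consumer<List<String>> parent; // delivers one batch to the parent IMS
    private final int batchSize;
    private final List<String> buffer = new ArrayList<>();

    ImsProxySketch(Consumer<List<String>> parent, int batchSize) {
        if (batchSize < 1) {
            throw new IllegalArgumentException("batchSize must be at least 1");
        }
        this.parent = parent;
        this.batchSize = batchSize;
    }

    /** Called by a local application; forwards upward once a full batch is buffered. */
    synchronized void publish(String message) {
        buffer.add(message);
        if (buffer.size() >= batchSize) {
            flush();
        }
    }

    /** Sends whatever is buffered to the parent as one batch; does nothing if the buffer is empty. */
    synchronized void flush() {
        if (buffer.isEmpty()) {
            return;
        }
        parent.accept(new ArrayList<>(buffer)); // copy so the parent owns its batch
        buffer.clear();
    }
}
```

With a batch size of N, the parent sees roughly one request per N messages from each proxy; a real proxy would probably also flush on a timer so that messages are not delayed indefinitely.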

Future – OGSA???
• RCMS architecture is service and web oriented
• Web services development tools (Apache Axis, Glue) may help to deploy reliable services quickly
• Open Grid Services Architecture (OGSA) (http://www.globus.org/ogsa) is Web Services based
• An alpha release of the framework is now available
• First official release foreseen in a few months' time
• OGSA could be adopted for the RCMS services, providing several advantages:
• RCMS open to the Grid world
• Well supported and reliable framework
• Useful built-in services
• OGSA is under evaluation:
• The RCMS Resource Service has been successfully ported (Globus 3.0 alpha release)
• Functionality and performance tests in progress

Summary and Plans
• RCMS architecture defined
• Prototypes developed, aimed at:
– Control of small DAQs to be used in testbeam applications:
• Next May's testbeams (CMS Tracker and Muon DT) will provide important feedback on its functionality
– Demonstrators aimed at validating the architecture in terms of performance and scalability
• Further investigation needed, mainly on the IMS
• Open Grid Services Architecture (OGSA) under evaluation
• Problem Solver development in progress:
– Error detection and recovery
• Database studies and evaluation foreseen