GridPP Collaboration meeting - R.P.Middleton (RAL/PPD), 23-25th May 2001
UK Particle Physics
Grid Monitoring Services
Robin Middleton
RAL/PPD
24-May-01
Overview
• What is Monitoring?
• GGF Perf-WG
• DataGrid WP3
• Example : Netlogger
• Summary
Introduction
• Information Services part dealt with separately today
• DataGrid WorkPackage 3 (WP3)
  - UK leadership / responsibility
  - WP3 = Grid Monitoring AND Information Services
• Global Grid Forum - Perf Mon Workgroup
  - http://www-didc.lbl.gov/GridPerf/
What is Monitoring?
• Application performance
• Fabric availability
• Network availability / performance
• Event / Alert
• Archives
• Forecasting (e.g. NWS)
• Issues
  - update/read frequency
  - information streaming
  - hierarchical vs. relational
  - relaxed coherence; timestamps
  - scalable; non-invasive
  - non-repeatable
• Monitoring vs. Monitoring & Information?
Boundaries
[Diagram: Monitoring and its boundaries with Mass Storage, Computing Fabric, Network, Application, Workload Mgt, Data Man, End-Users and Sys/Grid-Admin]
GGF : Perf-WG
“The Grid Performance working group is focused on defining standards and best practices for the gathering, representation, storage, distribution, and query of performance information about Grid resources and applications.”
Four Projects (!)
1. Define a schema for data formats for performance monitoring. This would be a common interchange format that tools could use to interoperate.
2. Taxonomy / classification of performance monitoring and analysis tools.
3. Survey of existing tools classified by the above taxonomy.
4. Recommendations on the aspects of grid applications, services and resources that should be monitored.
5. The development of performance monitoring tools based upon the survey of tools.
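As an illustration of what a "common interchange format" (project 1) might look like, the sketch below encodes one monitoring event as a single line of KEY=value pairs and parses it back. This is a hypothetical encoding, not the schema the working group defined; the field names (DATE, HOST, PROG, EVNT), the timestamp layout and both helper functions are assumptions made for the example.

```python
from typing import Dict


def encode_event(fields: Dict[str, str]) -> str:
    """Serialise one monitoring event as space-separated KEY=value pairs."""
    for key, value in fields.items():
        # Keys and values must not contain whitespace; keys must not contain "=".
        if not key or "=" in key or any(ch.isspace() for ch in key + value):
            raise ValueError(f"unsupported character in {key!r}={value!r}")
    return " ".join(f"{key}={value}" for key, value in fields.items())


def decode_event(line: str) -> Dict[str, str]:
    """Parse a line produced by encode_event back into a dictionary."""
    return dict(pair.split("=", 1) for pair in line.split())


event = {
    "DATE": "20010524101500.123456",  # sortable timestamp string (assumed layout)
    "HOST": "hostA.example.org",      # hypothetical host name
    "PROG": "readout",
    "EVNT": "fetch_start",
}
line = encode_event(event)
print(line)
assert decode_event(line) == event  # the format round-trips
```

Any tool that agrees on the field names can read such a line without knowing which program wrote it, which is the interoperability point the project description makes.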
GGF Perf-WG : Use Cases
1: Instrumented library for performance measurement (e.g. I/O system)
2: Netlogger/DPSS monitoring streams to log file
3: JAMM (Java) sensors stream data to a GUI
4: JAMM/Port Monitor
5: Fault detection & analysis
6: Job progress monitoring
7: Distributed system performance analysis
8: Network-aware, self-tuning applications
9: Data replication (choice of “best” location)
10: Scheduling & prediction services
11: Auditing systems
12: Configuration monitoring
13: User application monitoring
14: Application self-tuning
15: Real-time adaptive simulation & presentation
DataGrid : WorkPackage 3
The aim of this workpackage is to specify, develop, integrate and test tools and infrastructure to enable end-user and administrator access to status and error information in a Grid environment and to provide an environment in which application monitoring can be carried out. This will permit both job performance optimisation as well as allowing for problem tracing and is crucial to facilitating high performance Grid computing.
Architecture (GGF : Perf-WG)
[Diagram: Sensors on Host A and Host B, each host with a Producer; a Consumer; a Directory Service; arrows labelled Publish, Subscribe and Discovery]
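The roles named on this slide (sensors, producers, a consumer and a directory service, linked by publish, discovery and subscribe) can be sketched in a few lines. This is a minimal in-process illustration under stated assumptions, not the GGF architecture itself: the class names, method names and the "cpu_load" metric are hypothetical, and a real deployment would run these components on separate hosts.

```python
from collections import defaultdict
from typing import Callable, Dict, List

Event = Dict[str, object]


class Producer:
    """Collects readings from sensors on one host and pushes them to subscribers."""

    def __init__(self, host: str) -> None:
        self.host = host
        self._subscribers: Dict[str, List[Callable[[Event], None]]] = defaultdict(list)

    def subscribe(self, metric: str, callback: Callable[[Event], None]) -> None:
        self._subscribers[metric].append(callback)

    def sensor_reading(self, metric: str, value: float) -> None:
        """Called by a sensor; forwards the event directly to each subscriber."""
        event: Event = {"host": self.host, "metric": metric, "value": value}
        for callback in self._subscribers[metric]:
            callback(event)


class DirectoryService:
    """Holds only metadata: which producers offer which metrics."""

    def __init__(self) -> None:
        self._entries: Dict[str, List[Producer]] = defaultdict(list)

    def publish(self, metric: str, producer: Producer) -> None:
        self._entries[metric].append(producer)

    def discover(self, metric: str) -> List[Producer]:
        return list(self._entries.get(metric, []))


directory = DirectoryService()
host_a, host_b = Producer("Host-A"), Producer("Host-B")
directory.publish("cpu_load", host_a)   # Publish
directory.publish("cpu_load", host_b)

received: List[Event] = []              # the consumer's inbox
for producer in directory.discover("cpu_load"):      # Discovery
    producer.subscribe("cpu_load", received.append)  # Subscribe

host_a.sensor_reading("cpu_load", 0.42)
host_b.sensor_reading("cpu_load", 0.97)
print(received)
```

The design choice worth noting is that the directory only answers "who has this data"; the monitoring data itself flows from producer to consumer without passing through the directory.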
WP3 : Tasks
Umbrellas
Task 3.1: Requirements & Design (month 1-12)
Task 3.2: Current Technology (month 1-12)
Task 3.3: Infrastructure (month 7-24)
Task 3.4: Analysis & Presentation (month 7-24)
Task 3.5: Test & Refinement (month 19-36)
WP3 : Deliverables (as in the TA)
D3.1 (Report) Month 12: Evaluation Report of current technology
D3.2 (Report) Month 9: Detailed architectural design report and evaluation criteria (also input to WP12 architecture deliverable)
D3.3 (Prototype) Month 9: Components and documentation for the First Project Release (see WP 6)
D3.4 (Prototype) Month 21: Components and documentation for the Second Project Release (see WP 6)
D3.5 (Prototype) Month 33: Components and documentation for the Final Project Release (see WP 6)
D3.6 (Report) Month 36: Final evaluation report
WP3 : Milestones (as in the TA)
M3.1 Month 6: Decide baseline architecture & technologies.
M3.2 Month 9: Provide requirements for collation by Project Architect
M3.3 Month 9: Prototype components integrated into First Project Release (see WP 6)
M3.4 Month 21: Interim components integrated into Second Project Release (see WP 6)
M3.5 Month 33: Final components integrated into Final Project Release (see WP 6)
WP3 : First Release (PM9)
• Information services based on a new version of the Globus MDS (soon to be in alpha release).
• Rudimentary implementation of a relational approach to information services.
• A set of APIs in support of both MDS and GMA approaches.
• Basic presentation of performance monitoring data based around Netlogger
WP3 : Effort
               Funded   Unfunded   Total
PPARC          3.0      1.83       4.83
SZTAKI (HU)    2.08     0.92       3.0
INFN (IT)      0.0      1.16       1.16
IBM-UK         1.0      0.0        1.0
Total          6.08     3.91       10.0
+ Trinity College Dublin
(NB : for both Monitoring and Information Services)
WP3 : Use Cases
(Bracketed numbers refer to the GGF Perf-WG use cases.)
• Fault Detection & Analysis, Heartbeats [5]
• Job Status & Progress Monitoring [6]
• Application Performance Monitoring [1,13]
• Performance Analysis of Distributed Systems [7]
• Scheduling Services and Self Tuning Applications [8,10,14,(15)]
• Data Replication Services [9]
• Accounting & Auditing [11]
• Configuration monitoring [12]
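The heartbeat approach to fault detection named in the first use case can be sketched as follows: each monitored source reports periodically, and a source is flagged once it has been silent for longer than a timeout. The class name, the 30 s timeout and the host names are assumptions chosen for the example, not WP3 design decisions.

```python
from typing import Dict, List


class HeartbeatMonitor:
    """Flags sources that have not sent a heartbeat within timeout_s seconds."""

    def __init__(self, timeout_s: float) -> None:
        self.timeout_s = timeout_s
        self._last_seen: Dict[str, float] = {}

    def beat(self, source: str, timestamp: float) -> None:
        """Record a heartbeat; keeps the latest timestamp seen per source."""
        previous = self._last_seen.get(source, float("-inf"))
        self._last_seen[source] = max(previous, timestamp)

    def suspects(self, now: float) -> List[str]:
        """Return the sources whose last heartbeat is older than the timeout."""
        return sorted(
            source
            for source, seen in self._last_seen.items()
            if now - seen > self.timeout_s
        )


monitor = HeartbeatMonitor(timeout_s=30.0)  # hypothetical timeout
monitor.beat("hostA", timestamp=0.0)
monitor.beat("hostB", timestamp=0.0)
monitor.beat("hostA", timestamp=25.0)

# At t = 40 s hostB has been silent for 40 s, hostA for only 15 s.
print(monitor.suspects(now=40.0))  # ['hostB']
```

A missed heartbeat only shows that something between the source and the monitor has failed; distinguishing a dead host from a network fault needs the further analysis the use case mentions.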
WP3 : Decisions (end 2000)
• Try to track standards & best practice from Global Grid Forum
  - evaluate, steer, adopt, …
• Other WPs should provide the majority of sensors
  - network, fabric, mass-storage
• WP3 will provide the instrumentation API
• Key deliverables will be
  - Performance Services
  - Error / Alert Services
  - Status / Parameter Services
  - Logging / Archival Services
  - (forecasting) - information to enable other WPs to do this
• WP3 subcontracts archival services (in terms of the data management aspects)?
Netlogger
[Diagram: Supervisor, Processing Node and Readout Buffer]
Acknowledgement : Weidong Li
Sequence Diagram
[Sequence diagram: Supervisor, Readout Buffer and Processing Node against a TIME axis; messages Request, Fetch Data, Return data and Result; points numbered 1-7]
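A sketch of how the numbered points in such a sequence could be instrumented: each component writes a timestamped record as a message passes a point, and the records are collected in one log. The mapping of point numbers to messages is an assumption (the diagram does not survive in text form), the three components are simulated as plain functions in one process, and the EventLog class is hypothetical and is not the Netlogger API.

```python
import threading
import time
from typing import List, Tuple

Record = Tuple[float, int, int]  # (timestamp, event_id, point)


class EventLog:
    """Thread-safe, append-only list of timestamped records."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._records: List[Record] = []

    def mark(self, event_id: int, point: int) -> None:
        stamp = time.time()
        with self._lock:
            self._records.append((stamp, event_id, point))

    def records(self) -> List[Record]:
        with self._lock:
            return list(self._records)


log = EventLog()


def readout_buffer_fetch(event_id: int) -> bytes:
    log.mark(event_id, 4)  # Readout Buffer serves "Fetch Data" (assumed point)
    return b"raw-data"


def processing_node_handle(event_id: int) -> str:
    log.mark(event_id, 2)  # "Request" received
    log.mark(event_id, 3)  # "Fetch Data" sent
    data = readout_buffer_fetch(event_id)
    log.mark(event_id, 5)  # "Return data" received
    result = f"processed {len(data)} bytes"
    log.mark(event_id, 6)  # "Result" sent
    return result


def supervisor_request(event_id: int) -> str:
    log.mark(event_id, 1)  # "Request" sent
    result = processing_node_handle(event_id)
    log.mark(event_id, 7)  # "Result" received
    return result


supervisor_request(event_id=1)
for stamp, event_id, point in log.records():
    print(f"{stamp:.6f} event={event_id} point={point}")
```

Differences between consecutive timestamps for the same event then give the time spent in each stage, which is the kind of quantity a per-point plot can display.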
Results
[Plots labelled 1-7; X : secs, Y : “count”]
Acknowledgement : Weidong Li
Netlogger Summary
• Example deployment
• Time resolution
  - NTP (~5 ms)
  - Custom h/w (~50 µs)
• Thread safety?
• Variety of visualisation methods
• “non-invasive”?
• Moving towards the GMA
  - e.g. integration of directory service
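The time-resolution point matters most when an interval is computed from timestamps taken on two different hosts. The sketch below is a worst-case model under stated assumptions: each host clock is taken to be within a fixed bound of true time (the ~5 ms NTP figure from the slide), and drift within one host is ignored. The function name and the model are illustrative only.

```python
from typing import Tuple


def interval_bounds(
    t_start: float, t_end: float, sync_error_s: float, same_host: bool
) -> Tuple[float, float]:
    """Return (low, high) bounds in seconds on the true elapsed time.

    On a single host the clock offset cancels in the difference (drift is
    ignored here), so the measured interval is returned unchanged.
    Across hosts each timestamp may be off by up to sync_error_s, so the
    worst-case error on the difference is twice that bound.
    """
    measured = t_end - t_start
    slack = 0.0 if same_host else 2.0 * sync_error_s
    return measured - slack, measured + slack


NTP_ERROR_S = 5e-3  # ~5 ms, the NTP figure quoted on the slide

# A 2 ms interval measured on one host versus across two hosts.
print(interval_bounds(10.000, 10.002, NTP_ERROR_S, same_host=True))
print(interval_bounds(10.000, 10.002, NTP_ERROR_S, same_host=False))
```

For the cross-host case the bounds straddle zero, so with NTP-level synchronisation a 2 ms interval between hosts cannot even be ordered reliably; this is the motivation for tighter synchronisation when fine-grained cross-host timing is needed.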
Summary
• Information Service is KEY to Monitoring
  - …and nature of service to be determined!
• Unified Information Architecture is important
  - …otherwise duplication and inconsistencies
• Align with Global Grid Forum for “standards”, etc.
• Starting point is Netlogger
• DataGrid deliverable details are testbed “driven”
• Cross-DataGrid WP - service to many areas