ceilometer lsf-integration-openstack-summit
TRANSCRIPT
Ceilometer
CERN use case:
● CERN delivers resources in the form of virtual machines and via traditional batch and Grid computing
● Individual batch nodes execute payloads from different users and communities
● Accounting should cover both use cases
● Interesting metrics include
– What is the resource usage of experiment A during December?
– What is the resource usage of user B last year?
● Accounting information has to be reported to Grid bodies (WLCG) by experiment
Facts:
● Details of users' jobs are already present in the batch accounting database
● It is a huge DB with around 400,000 records being added every day
Solution:
● Use ceilometer as the single source of truth for accounting data
● Batch data is put in the ceilometer database for accounting purposes
CERN's idea to use ceilometer
Ceilometer: Current Implementation
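[Diagram: two metering paths into one database. Batch-specific instances: the batch accounting database is read by a Ceilometer Agent Central with the batch plugin, which publishes through RabbitMQ-LSF to a Ceilometer Collector for batch data. IaaS-specific instances: Ceilometer Agent Compute and Ceilometer Agent Central publish through RabbitMQ to a Ceilometer Collector. Both collectors write to the Ceilometer Database (mongodb), which is exposed through the Ceilometer API.]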
Ceilometer: Current Implementation
● We have written a ceilometer-agent-central plugin that polls the batch accounting database for unpublished records
● The unpublished records are then pushed to the metering queue (RabbitMQ)
● The ceilometer-collector instance consumes the messages from the metering queue and inserts them into the ceilometer database (mongodb)
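The poll, publish and mark cycle described above can be sketched roughly as follows. This is only an illustration, not the CERN plugin: an in-memory SQLite table stands in for the batch accounting database, a `queue.Queue` stands in for RabbitMQ, and a plain list stands in for mongodb. The table layout (`jobs`, `username`, `cpu_seconds`, `published`) and the function names are hypothetical.

```python
import json
import queue
import sqlite3


def publish_unpublished(conn, metering_queue, batch_size=1000):
    """Push unpublished job records to the queue and mark them as published."""
    rows = conn.execute(
        "SELECT id, username, cpu_seconds FROM jobs "
        "WHERE published = 0 ORDER BY id LIMIT ?",
        (batch_size,),
    ).fetchall()
    for job_id, username, cpu_seconds in rows:
        message = {"job_id": job_id, "user": username, "cpu_seconds": cpu_seconds}
        metering_queue.put(json.dumps(message))  # stand-in for a RabbitMQ publish
        conn.execute("UPDATE jobs SET published = 1 WHERE id = ?", (job_id,))
    conn.commit()
    return len(rows)


def collect(metering_queue, store):
    """Drain the queue into a store (a list standing in for mongodb)."""
    consumed = 0
    while True:
        try:
            store.append(json.loads(metering_queue.get_nowait()))
        except queue.Empty:
            return consumed
        consumed += 1


def demo():
    """Run one publish cycle twice, then collect everything that was queued."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE jobs (id INTEGER PRIMARY KEY, username TEXT, "
        "cpu_seconds REAL, published INTEGER DEFAULT 0)"
    )
    conn.executemany(
        "INSERT INTO jobs (username, cpu_seconds) VALUES (?, ?)",
        [("alice", 120.0), ("bob", 30.5), ("alice", 7.0)],
    )
    metering_queue = queue.Queue()
    store = []
    first = publish_unpublished(conn, metering_queue)
    second = publish_unpublished(conn, metering_queue)  # nothing left to publish
    collected = collect(metering_queue, store)
    return first, second, collected, store
```

Calling `demo()` publishes the three sample records on the first pass and none on the second, because the `published` flag prevents a record from being pushed twice.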
Ceilometer: Current Implementation
● In order to decrease the load on the openstack messaging server, the batch data is being pushed to a different messaging server than the one to which other openstack messages (e.g. those from agent-compute) go.
● This means that there are dedicated instances of agent-central and collector for VM and batch metering
● Both collectors write the data into a single database
Ceilometer: LSF Data Statistics
● The batch plugin is run once per hour if the previous run has finished
● Most runs do not have any unpublished data as data in the batch accounting database arrives in bursts
● Most data of the day is published to the messaging server within 2 runs of around 200,000 job records each
● It takes around 5 hrs to complete one such run
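Since a single run can take about 5 hours while the plugin is triggered hourly, a trigger has to be skipped while the previous run is still going. One possible way to express that rule is a non-blocking lock, sketched below. The class name `HourlyRunner` and its `tick` method are hypothetical; how ceilometer itself schedules pollsters is not shown here.

```python
import threading


class HourlyRunner:
    """Runs a job on each tick unless the previous run is still in progress."""

    def __init__(self, job):
        self._job = job
        self._lock = threading.Lock()

    def tick(self):
        """Called once per hour; returns True if a run was executed."""
        if not self._lock.acquire(blocking=False):
            return False  # previous run has not finished, skip this tick
        try:
            self._job()
            return True
        finally:
            self._lock.release()
```

In a real deployment the ticks would come from a timer thread, so a long run would cause the next few hourly ticks to return `False` instead of starting overlapping runs.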
Ceilometer: Batch Data Statistics
● The average rate of record publishing to the batch rabbitmq server is 11 Hz (records per second). This includes
– the time to read unpublished records,
– pushing them to the rabbit-server and
– marking records in the batch accounting database as published
● Most of this time is spent in publishing the records
● The time for activities other than publishing is minuscule
● The growth rate of the mongodb database is about 2GB/day
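The quoted figures can be cross-checked with some quick arithmetic. The snippet below uses only numbers from the slides (200,000 records per run, about 5 hours per run, about 400,000 records per day, about 2GB/day of growth). Two assumptions are made that the slides do not state: 1 GB is taken as 10^9 bytes, and all database growth is attributed to batch records, so the per-record size is an upper bound.

```python
RECORDS_PER_RUN = 200_000
RUN_HOURS = 5
RECORDS_PER_DAY = 400_000
GROWTH_BYTES_PER_DAY = 2 * 10**9  # assumption: 1 GB = 10^9 bytes

# Publishing rate implied by one large run: records divided by seconds.
rate_hz = RECORDS_PER_RUN / (RUN_HOURS * 3600)

# Upper bound on storage per record, assuming all growth comes from batch data.
bytes_per_record = GROWTH_BYTES_PER_DAY / RECORDS_PER_DAY

# Hours per day spent publishing at that rate.
publish_hours_per_day = RECORDS_PER_DAY / rate_hz / 3600

print(f"{rate_hz:.1f} records/s")
print(f"{bytes_per_record:.0f} bytes/record")
print(f"{publish_hours_per_day:.1f} hours/day")
```

The implied rate of roughly 11 records per second agrees with the 11 Hz reported on the slide, and two such runs account for about 10 hours of publishing per day.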