ceilometer lsf-intergration-openstack-summit

Ceilometer: CERN use case

● CERN delivers resources in the form of virtual machines and via traditional batch and Grid computing

● Individual batch nodes execute payload from different users and communities

● Accounting should cover both use cases

● Interesting metrics include

– What is the resource usage of experiment A during December?

– What is the resource usage of user B last year?

● Accounting information has to be reported to Grid bodies (WLCG) by experiment

Facts:

● Details of users' jobs are already present in the batch accounting database

● It is a huge DB, with around 400,000 records added every day

Solution:

● Use ceilometer as the single source of truth for accounting data

● Batch data is put in the ceilometer database for accounting purposes

CERN's idea to use ceilometer

Ceilometer: Current Implementation

[Architecture diagram. Batch specific instances: batch accounting database, Ceilometer Agent Central with batch plugin, RabbitMQ-LSF, Ceilometer Collector for batch data. IaaS specific instances: Ceilometer Agent Compute, Ceilometer Agent Central, RabbitMQ, Ceilometer Collector. Shared: Ceilometer Database (mongodb), Ceilometer API.]


● We have written a ceilometer-agent-central plugin that polls the batch accounting database for unpublished records

● The unpublished records are then pushed to the metering queue (RabbitMQ)

● The ceilometer-collector instance consumes the messages from the metering queue and inserts them in the ceilometer database (mongodb)
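
The poll, publish and collect steps above can be sketched as follows. This is a minimal illustration and not the actual plugin code: an in-memory SQLite table stands in for the batch accounting database, a `deque` stands in for the RabbitMQ metering queue, and a plain list stands in for mongodb. The table layout, the `published` flag and all function names are assumptions made for the example.

```python
import json
import sqlite3
from collections import deque


def fetch_unpublished(conn, limit=1000):
    """Read job records that have not been published yet (hypothetical schema)."""
    cur = conn.execute(
        "SELECT job_id, username, experiment, cpu_seconds FROM jobs "
        "WHERE published = 0 ORDER BY job_id LIMIT ?",
        (limit,),
    )
    cols = [c[0] for c in cur.description]
    return [dict(zip(cols, row)) for row in cur.fetchall()]


def mark_published(conn, job_ids):
    """Flag records as published so the next poll skips them."""
    conn.executemany(
        "UPDATE jobs SET published = 1 WHERE job_id = ?",
        [(job_id,) for job_id in job_ids],
    )
    conn.commit()


def poll_once(conn, metering_queue):
    """One plugin run: read unpublished records, push them, mark them."""
    records = fetch_unpublished(conn)
    for record in records:
        metering_queue.append(json.dumps(record))  # stand-in for a RabbitMQ publish
    mark_published(conn, [r["job_id"] for r in records])
    return len(records)


def collect(metering_queue, database):
    """Collector side: consume messages and insert them into the database."""
    consumed = 0
    while metering_queue:
        database.append(json.loads(metering_queue.popleft()))
        consumed += 1
    return consumed


def demo():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE jobs (job_id INTEGER PRIMARY KEY, username TEXT, "
        "experiment TEXT, cpu_seconds REAL, published INTEGER DEFAULT 0)"
    )
    # Made-up sample rows, for illustration only.
    conn.executemany(
        "INSERT INTO jobs (job_id, username, experiment, cpu_seconds) "
        "VALUES (?, ?, ?, ?)",
        [(1, "alice", "expA", 120.0), (2, "bob", "expB", 30.5)],
    )
    metering_queue, database = deque(), []
    first = poll_once(conn, metering_queue)
    collect(metering_queue, database)
    second = poll_once(conn, metering_queue)  # nothing left to publish
    return first, second, database
```

Marking records only after they have been pushed means a crash mid-run can lead to records being published twice rather than lost; whether the real plugin makes the same trade-off is not stated in the slides.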


● To reduce the load on the OpenStack messaging server, the batch data is pushed to a separate messaging server, not the one that receives other OpenStack messages (e.g. those from agent-compute).

● This means that there are dedicated instances of agent-central and collector for VM and batch metering

● The collectors write the data into a single database
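
A rough sketch of this layout, assuming nothing about the real deployment beyond what is stated above: two independent brokers (modelled here as `queue.Queue` objects), one dedicated collector per broker, and a single shared database (modelled as a list). The `source` tag and all names are hypothetical.

```python
from queue import Empty, Queue


def drain(broker, database, source):
    """Dedicated collector: consume everything on one broker, write to the shared store."""
    count = 0
    while True:
        try:
            sample = broker.get_nowait()
        except Empty:
            return count
        database.append({"source": source, **sample})
        count += 1


def demo():
    iaas_broker = Queue()   # stand-in for the OpenStack RabbitMQ server
    batch_broker = Queue()  # stand-in for the separate batch (LSF) RabbitMQ server
    database = []           # stand-in for the single ceilometer database

    # Made-up samples, for illustration only.
    iaas_broker.put({"resource": "vm-1", "cpu_seconds": 10.0})
    batch_broker.put({"resource": "job-1", "cpu_seconds": 120.0})
    batch_broker.put({"resource": "job-2", "cpu_seconds": 45.0})

    iaas_count = drain(iaas_broker, database, "iaas")
    batch_count = drain(batch_broker, database, "batch")
    return iaas_count, batch_count, database
```

A burst on the batch broker only delays the batch collector; the IaaS broker and its collector are unaffected, while queries still see both kinds of data in one place.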

Ceilometer: LSF Data Statistics

● The batch plugin is run once per hour if the previous run has finished

● Most runs do not have any unpublished data as data in the batch accounting database arrives in bursts

● Most data of the day is published to the messaging server within 2 runs of around 200,000 job records each

● It takes around 5 hrs to complete one such run
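
The "once per hour if the previous run has finished" rule can be modelled as below. This is only a toy simulation of the scheduling behaviour described above, not how the ceilometer agent schedules its pollsters internally; the function name and the example durations are assumptions.

```python
def simulate_schedule(durations_hours, total_hours):
    """Return the hours at which a run starts.

    A run is attempted at every full hour and skipped while the
    previous run is still in progress. `durations_hours` gives the
    duration of each successive run that actually starts.
    """
    starts = []
    busy_until = 0
    durations = iter(durations_hours)
    for hour in range(total_hours):
        if hour < busy_until:
            continue  # previous run has not finished: skip this tick
        duration = next(durations, None)
        if duration is None:
            break  # no more runs to simulate
        starts.append(hour)
        busy_until = hour + duration
    return starts


# One 5-hour burst run, two near-empty runs, then another 5-hour burst run.
example_starts = simulate_schedule([5, 0.1, 0.1, 5], total_hours=24)
```

In this example the ticks at hours 1 to 4 are skipped because the first burst run is still publishing.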

Ceilometer: Batch Data Statistics

● The average rate of record publishing to the batch rabbitmq server is 11 Hz. This includes

– the time to read unpublished records,

– the time to push them to the rabbit-server and

– the time to mark records in the batch accounting database as published

● Most of this time is spent in records publishing only

● The time for activities other than publishing is minuscule

● The growth rate of the mongodb database is about 2GB/day
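
As a cross-check, the figures quoted in these slides are consistent with each other. The short calculation below uses only numbers from the slides (about 200,000 records per run, about 5 hrs per run, about 400,000 records per day, about 2GB/day). The per-record size is a rough upper bound under two assumptions: 1 GB is taken as 10^9 bytes, and all database growth is attributed to batch records.

```python
RECORDS_PER_RUN = 200_000           # from the slides
RUN_HOURS = 5                       # from the slides
RECORDS_PER_DAY = 400_000           # from the slides
GROWTH_BYTES_PER_DAY = 2 * 10**9    # 2GB/day, assuming 1 GB = 10^9 bytes


def publish_rate_hz(records, hours):
    """Average number of records published per second."""
    return records / (hours * 3600)


def bytes_per_record(growth_bytes, records):
    """Average database growth per record (assumes all growth is batch data)."""
    return growth_bytes / records


rate = publish_rate_hz(RECORDS_PER_RUN, RUN_HOURS)                     # about 11.1 records/s
per_record = bytes_per_record(GROWTH_BYTES_PER_DAY, RECORDS_PER_DAY)   # 5000.0 bytes
```

So 200,000 records in 5 hours matches the quoted 11 Hz, and each record accounts for at most roughly 5 kB of database growth.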
