CERN IT Department
CH-1211 Genève 23
Switzerland
www.cern.ch/it
IT Configuration Activities
Gavin McCance
Online Cross-experiment Meeting, 14 June 2012
Why?
• We’re changing the tools we use to manage the centre
• Ten years ago, we were big in compute
– There were no real IT ops tools at our scale, so we developed our own
– Our tools are becoming increasingly brittle and high-maintenance
– Inefficiencies exist, but the root cause cannot be easily identified
– The learning curve remains high
• We are about to expand to a new remote Tier-0
• Our needs are no longer special
Why?
• The last few years have seen an explosion in the IT operations tool space
– Configuration, management and monitoring
– Large, supportive user communities
• Our strategy is an absolute minimum of in-house development
– Other than involvement in upstream projects
Scaling challenges: hosts and people
• Currently we have 10k hosts
• We’ll add another 5k in the medium term and move to VMs
– 50k–300k “hosts”, depending on how we chop the CPUs up
• Many diverse applications (“clusters”) managed by different teams
• …and 700+ other “unmanaged” Linux nodes in VMs that could benefit from a simple configuration system
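As a rough worked example, and assuming (this is our assumption, not stated above) that the VM count is spread over all ~15k physical hosts, i.e. the current 10k plus the planned 5k, the 50k–300k range corresponds to roughly 3 to 20 VMs per hypervisor:

```latex
% Assumes ~15k physical hosts (10k now + 5k planned); not stated in the talk.
\frac{50\,000\ \text{VMs}}{15\,000\ \text{hosts}} \approx 3.3
\qquad\text{to}\qquad
\frac{300\,000\ \text{VMs}}{15\,000\ \text{hosts}} = 20\ \text{VMs per host}
```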
What’s the config stack?
• Based around the Puppet tool and ecosystem
– Declarative configuration tool
– Scales well
– Very active, wide community
– Very well integrated with other tools
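To illustrate what "declarative" means here, the following is a minimal Python sketch, not Puppet's implementation or API: each resource states the desired end state, and an apply step changes only what differs, so repeating a run is a no-op. The `FileResource` type, the `apply` function, and the file paths and contents are hypothetical; a plain dict stands in for a node's filesystem.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FileResource:
    """Hypothetical declarative resource: 'this path should have this content'."""
    path: str
    content: str


def apply(catalog, system):
    """Converge `system` (a dict standing in for a node's filesystem) towards
    the state declared in `catalog`. Returns the paths that actually changed."""
    changed = []
    for res in catalog:
        if system.get(res.path) != res.content:
            system[res.path] = res.content  # only touch what differs
            changed.append(res.path)
    return changed


# Hypothetical desired state for a node.
catalog = [
    FileResource("/etc/motd", "Managed by Puppet\n"),
    FileResource("/etc/ntp.conf", "server ntp.example.org iburst\n"),
]

node = {"/etc/motd": "old banner\n"}
print(apply(catalog, node))  # first run: ['/etc/motd', '/etc/ntp.conf']
print(apply(catalog, node))  # second run: [] (already converged)
```

The declared catalog, rather than a sequence of steps, determines the end state, which is what makes it safe to re-run the same configuration often across many nodes.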
Deployment status
• ~140 nodes in test with a single puppetmaster
– Soon expanding to 4k (virtual) nodes on a load-balanced Puppet setup
– Integrating with OpenStack for VMs
• Investigating and understanding the tools
– IT-internal “early adopters” starting (castor, lxbatch, lxplus, webservices, …)
• Foreman dashboard as the front-end and ENC (external node classifier)
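Puppet's external node classifier (ENC) contract is small: the puppetmaster runs an executable with the node's certname as its only argument and reads a YAML hash with `environment`, `classes` and `parameters` keys from its standard output. The Python sketch below shows only that contract; it is not Foreman's own ENC script, and the `NODES` table, host name, class names and parameters are hypothetical stand-ins for the data Foreman would hold.

```python
#!/usr/bin/env python3
"""Toy Puppet external node classifier (ENC), for illustration only."""
import sys

# Hypothetical node -> classification data (in practice, served by Foreman).
NODES = {
    "lxbatch001.example.org": {
        "environment": "production",
        "classes": ["base", "lxbatch"],
        "parameters": {"cluster": "lxbatch", "subcluster": "worker"},
    },
}


def render_yaml(node):
    """Emit the small, fixed-shape YAML document an ENC prints.
    Values are written unquoted, so they must be plain YAML scalars."""
    lines = ["---", f"environment: {node['environment']}", "classes:"]
    lines += [f"  - {cls}" for cls in node["classes"]]
    lines.append("parameters:")
    lines += [f"  {k}: {v}" for k, v in sorted(node["parameters"].items())]
    return "\n".join(lines) + "\n"


def main(argv):
    if len(argv) != 2:
        print("usage: enc.py <certname>", file=sys.stderr)
        return 1
    node = NODES.get(argv[1])
    if node is None:
        return 1  # non-zero exit: the node could not be classified
    sys.stdout.write(render_yaml(node))
    return 0


# Only act as an ENC when a certname is passed on the command line.
if __name__ == "__main__" and len(sys.argv) > 1:
    sys.exit(main(sys.argv))
```

On the puppetmaster, a script like this would be enabled through the `node_terminus = exec` and `external_nodes = /path/to/enc.py` settings in `puppet.conf`. The hand-written YAML emitter avoids a third-party dependency but only handles simple values like those in the table.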
Major bits
• Puppet and the Foreman dashboard, using git to version the templates
– We’re putting “useful to others” modules in https://github.com/cernops
– We’ve added integration of Puppet with the CERN CA
– Hiera for cluster-specific parameterisation, which should make modules more portable in the future
• Our software (and scripts) are built using Koji -> mash -> yum
• Automation: looking at Crucible for automated configuration code review
• Keeping Lemon for monitoring (for now), though alarms are changing to use messaging notifications
• mcollective for task orchestration
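Hiera's core idea, as used here for cluster-specific parameterisation, is an ordered hierarchy of data sources searched from most to least specific, with the first match winning (Hiera's default priority lookup). A minimal Python model of that lookup follows; it is not Hiera's API, and the hierarchy levels, keys and values are hypothetical examples.

```python
# Hypothetical hierarchy, most specific level first. "{hostname}" and
# "{cluster}" are filled in from the node's facts, much as Hiera
# interpolates fact values into its hierarchy.
HIERARCHY = ["node/{hostname}", "cluster/{cluster}", "common"]

# Hypothetical data sources (Hiera would read these from YAML files).
DATA = {
    "node/lxplus042": {"ssh_max_sessions": 50},
    "cluster/lxplus": {"ssh_max_sessions": 20, "afs_enabled": True},
    "common": {
        "ssh_max_sessions": 10,
        "afs_enabled": False,
        "ntp_servers": ["ntp.example.org"],
    },
}


def lookup(key, facts, hierarchy=HIERARCHY, data=DATA, default=None):
    """Return the value of `key` from the first (most specific) level that
    defines it; fall back to `default` if no level does."""
    for level in hierarchy:
        source = level.format(**facts)  # facts must supply every placeholder
        values = data.get(source, {})
        if key in values:
            return values[key]
    return default


facts = {"hostname": "lxplus042", "cluster": "lxplus"}
print(lookup("ssh_max_sessions", facts))  # 50, from the node level
print(lookup("afs_enabled", facts))       # True, from the cluster level
print(lookup("ntp_servers", facts))       # ['ntp.example.org'], from common
```

Because a module only asks for a key such as `ssh_max_sessions` and never hard-codes a site or cluster value, the same module can be reused elsewhere with different data, which is the portability benefit mentioned above.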
Current architecture
mcollective: task orchestration
• Broadcast
• Run
• Collect
• Very fast response
• Automatable
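The broadcast, run, collect pattern can be illustrated with a small Python sketch. It is not mcollective's API: mcollective publishes requests through a message broker and agents on each node reply asynchronously, whereas here a thread pool stands in for the broker and the network. The fleet, the fact-based filter and the `ping` action are hypothetical.

```python
import concurrent.futures as cf

# Hypothetical fleet: node name -> facts. Five batch nodes, three login nodes.
FLEET = {f"lxbatch{i:03d}": {"cluster": "lxbatch"} for i in range(1, 6)}
FLEET.update({f"lxplus{i:03d}": {"cluster": "lxplus"} for i in range(1, 4)})


def agent(node, facts, action, filters):
    """Runs 'on the node': ignore requests whose filter does not match this
    node's facts, otherwise run the (simulated) action and reply."""
    if any(facts.get(k) != v for k, v in filters.items()):
        return None  # request not addressed to this node
    if action == "ping":
        return node, "pong"
    return node, f"unknown action {action!r}"


def broadcast(action, filters, timeout=2.0):
    """Send one request to every node in parallel (broadcast), let matching
    nodes execute it (run), and gather the replies that arrive within
    `timeout` seconds (collect)."""
    replies = {}
    pool = cf.ThreadPoolExecutor(max_workers=16)
    futures = [pool.submit(agent, n, f, action, filters) for n, f in FLEET.items()]
    try:
        for fut in cf.as_completed(futures, timeout=timeout):
            reply = fut.result()
            if reply is not None:
                node, result = reply
                replies[node] = result
    except cf.TimeoutError:
        pass  # nodes that have not answered yet are left out of the result
    finally:
        pool.shutdown(wait=False)  # return without waiting for stragglers
    return replies


print(sorted(broadcast("ping", {"cluster": "lxplus"})))
# ['lxplus001', 'lxplus002', 'lxplus003']
```

The timeout is what keeps the response fast as the fleet grows: the caller gets whatever answers arrive within a bounded time rather than waiting on the slowest node.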
Interesting CERNish modules
• Will be putting things in https://github.com/cernops
• Modules
– AFS
– Keytab, Kerberos
– CVMFS
– SSO with Apache httpd
– SSL Apache load-balancer
– CERN auth with LDAP (SSSD)
– CERN Lemon
– + usual OS-level configurations
• OpenStack integration
• Cloud-init auto-registration into Puppet
Summary
• We’re moving to standard tools for configuration (and for VM management and monitoring)
• We’re gaining experience using Puppet and friends
– Internal IT early adopters now
– On track to move our IT services in 2013
• We’re interested in collaborating on this work