openstack @ cern · introduction cern cloud ... - iaas based on openstack “all servers shall be...


Post on 16-Oct-2020


OpenStack @ CERN

José Castro León, CERN Cloud Infrastructure


Outline

● Introduction

● CERN Cloud service

– Service operations

– Service automation

– Baremetal provisioning

– Storage Services

● Upcoming work

● Q & A


European Organization for Nuclear Research

● World's largest particle physics laboratory

● Founded in 1954

● 22 member states

● Fundamental research in physics

CERN Cloud Service

● Infrastructure as a Service

● Production since July 2013

● CentOS 7 based

● Geneva and Wigner computer centres

● Highly scalable architecture: more than 70 Nova cells

– 2 regions

● Currently running the Rocky release


Back in 2012

[Figure: chart spanning Run 1, Run 2, Run 3 and Run 4 (vertical axis 0 to 160), with series GRID, ATLAS, CMS, LHCb and ALICE]

● LHC computing and data requirements were increasing

● Constant team size

● Long Shutdown 1 (LS1) ahead; next window in 2019

● Other deployments have surpassed CERN

[Figure annotations: "we were there", "what we can afford"]

● 3 core areas:

– Centralized monitoring

– Configuration management

– IaaS based on OpenStack

“All servers shall be virtual!”


Situation now

● 300k-core cloud and growing

– Addition of new services

– Continuous improvements on existing ones

● No change in number of staff

● Follow technological trends

– Incorporate new use cases

– Integrate them into ecosystem

– Improve current infrastructure


CERN Cloud Infrastructure – initial offering

[Diagram: initial service offering, drawn in IaaS and IaaS+ layers]

● Compute: nova

● Storage: glance

● Identity: keystone

● Web UI: horizon


CERN Cloud Infrastructure - now

[Diagram: current service offering, drawn in IaaS and IaaS+ layers]

● Compute: nova

● Storage: cinder, glance

● Identity: keystone

● Web UI: horizon

● Network: neutron

● Orchestration: heat

● Container Orchestration: magnum

● Automation: mistral

● Key manager: barbican

● Optimization: watcher

● HA: masakari

● Test deploy: rally

● Also shown: ironic, manila


Service operations

● Availability techniques for users

– 3 availability zones in Meyrin, 2 in Wigner + critical area

● Eat our own dogfood (use same tools as the rest of IT)

● Automation “likes”

– Delegate some administrative tasks

– Detect and fix known issues

– Communicate with end users

● Several global campaigns:

– Consolidation to KVM, Spectre/Meltdown and L1TF


Patch the entire cloud

● Patching the cloud after Spectre/Meltdown/L1TF

– L1TF ~1100 servers rebooted (~11.5k VMs)

● Validated on QA environment

● Review steps

– Install the latest kernel and make sure it is the default

– Configure l1tf_full kernel boot option

– Reboot and wait

– Check hypervisors and VMs
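The final "check hypervisors" step can be illustrated with a small sketch. It assumes that the slide's `l1tf_full` refers to the upstream kernel parameter `l1tf=full`, and it reads the standard Linux files `/proc/cmdline` and `/sys/devices/system/cpu/vulnerabilities/l1tf`. The function names are hypothetical and this is not the tooling used in the campaign.

```python
from pathlib import Path

# Standard Linux paths (not CERN-specific).
CMDLINE = Path("/proc/cmdline")
L1TF_STATUS = Path("/sys/devices/system/cpu/vulnerabilities/l1tf")

# Assumption: the slide's "l1tf_full" means the upstream parameter l1tf=full.
EXPECTED_OPTION = "l1tf=full"


def has_boot_option(cmdline: str, option: str = EXPECTED_OPTION) -> bool:
    """True if the option is present as a whole token on the kernel command line."""
    return option in cmdline.split()


def l1tf_fully_mitigated(status: str) -> bool:
    """Interpret the status line the kernel reports in sysfs."""
    status = status.strip()
    if status == "Not affected":
        return True
    # Values such as "... SMT vulnerable" or "VMX: vulnerable" count as not done.
    return status.startswith("Mitigation") and "vulnerable" not in status.lower()


def check_hypervisor() -> dict:
    """Run both checks on the local host; None means the file is missing."""
    def read(path: Path):
        return path.read_text() if path.exists() else None

    cmdline, status = read(CMDLINE), read(L1TF_STATUS)
    return {
        "boot_option": None if cmdline is None else has_boot_option(cmdline),
        "mitigated": None if status is None else l1tf_fully_mitigated(status),
    }
```

Calling `check_hypervisor()` on a rebooted host returns a small report that a campaign script could aggregate across servers.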


Patch the entire cloud: feedback

● L1TF campaign announced upfront to the whole user community

● ASDF announcements and updates

● Updates via SSB (and Town Square in mattermost)

● Reachable by tickets

– Only 5 tickets from service teams during the L1TF campaign

● No serious issues found during campaign

– Performance impact after disabling SMT


Automation in the CERN Cloud

[Diagram: CERN Cloud service overview, as on the "CERN Cloud Infrastructure - now" slide]


Automation in the CERN Cloud - architecture

[Diagram: automation architecture; components shown: mistral, HR, Resources, cornerstone, collectd, grafana, GNI, rally]

Automation in the CERN Cloud - topics

● Resource lifecycle management

● Host and service monitoring

● Optimize resource availability

● Improve VM availability and performance


Host and Service Monitoring

● Monitor HW events with Collectd

● Collect service logs through Flume

● General Notification Infrastructure

– Support tickets for repairs

● Service alarms in Grafana

● Rundeck jobs

– Time-scheduled jobs to fix common issues

– Offload ticket handling

– Schedule interventions

Rundeck: Task delegation

● Rely on Rundeck for offloading tasks to different teams

– Procurement

– Repair Team

– Resource Coordinator

– Cloud Service operations

● Example: disk replacement

[Diagram: collectd, GNI, Repair Team]


Resource Lifecycle Management

● Types of projects

● Provisioning and cleanup in Mistral workflows

– Service inter-dependencies

Project type | Affiliation expired | User disabled | User deletion

Shared       | Promote             | -             | -

Personal     | -                   | Stop          | Delete
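The lifecycle table above can be read as a lookup from (project type, user event) to an action. The sketch below encodes exactly those three table entries. The enum and function names are hypothetical, and it does not reproduce the actual Mistral workflows.

```python
from enum import Enum
from typing import Optional


class ProjectType(Enum):
    SHARED = "shared"
    PERSONAL = "personal"


class UserEvent(Enum):
    AFFILIATION_EXPIRED = "affiliation_expired"
    USER_DISABLED = "user_disabled"
    USER_DELETION = "user_deletion"


# One entry per non-empty cell of the table; "-" cells are simply absent.
ACTIONS = {
    (ProjectType.SHARED, UserEvent.AFFILIATION_EXPIRED): "promote",
    (ProjectType.PERSONAL, UserEvent.USER_DISABLED): "stop",
    (ProjectType.PERSONAL, UserEvent.USER_DELETION): "delete",
}


def lifecycle_action(project: ProjectType, event: UserEvent) -> Optional[str]:
    """Return the action for this combination, or None when the table shows '-'."""
    return ACTIONS.get((project, event))
```

Keeping the policy in a single table-like mapping makes it easy to compare against the slide and to extend if further project types are introduced.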


Resource Lifecycle Management for end user

mistral


Optimize resource availability - Expiration

● Each VM in a personal project has an expiration date

● Set shortly after creation and evaluated daily

● Configured to 180 days and renewable

● Reminder mails starting 30 days before expiration

● Implemented as a workbook in Mistral

[Diagram: instance states ACTIVE and EXPIRED; events: Reminder, Expiration, Deletion]
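The expiration rules on this slide (180-day lifetime, reminders starting 30 days before expiration, daily evaluation) can be sketched as a pure date calculation. The function names and the three return labels are hypothetical, and treating the expiration date itself as already expired is an assumption. The timing of deletion after expiration is not stated on the slide, so it is left out.

```python
from datetime import date, timedelta

LIFETIME = timedelta(days=180)        # from the slide: configured to 180 days
REMINDER_WINDOW = timedelta(days=30)  # from the slide: reminders start 30 days before


def new_expiration(start: date) -> date:
    """Expiration date for a fresh or renewed instance."""
    return start + LIFETIME


def daily_check(expires_on: date, today: date) -> str:
    """Classify an instance during the daily evaluation."""
    if today >= expires_on:  # assumption: the expiration day counts as expired
        return "expired"
    if today >= expires_on - REMINDER_WINDOW:
        return "remind"
    return "active"
```

A renewal is then just `new_expiration(today)`, which moves the instance back into the "active" window.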


Expiration of Personal Instances


Improve Cloud utilization

[Diagram: hosts running user VMs alongside "pre" instances; component shown: aardvark]


Baremetal provisioning

[Diagram: CERN Cloud service overview, as on the "CERN Cloud Infrastructure - now" slide]


Why baremetal provisioning?

● VMs are not sensible or suitable for all of our use cases

– Storage nodes, HPC clusters

● Complete our service offering

– Physical nodes (in addition to VMs and containers)

– OpenStack as single pane of glass

● Simplify hardware provisioning workflows

● Consolidate accounting & bookkeeping

– Machine re-assignments will be easier to track


Baremetal as a Service

● Provision ‘physical’ instances

● Compute service manages physical servers as if they were virtual machines

● Users interface with Nova

– Quotas, scheduling, ...

● HW management via common interfaces

– PXE, IPMI

– Provides a unified interface to manage the whole machine park

[Diagram: a User requests a physical instance from Nova (API + Scheduler), which picks a Nova Compute with the Ironic driver; Ironic (API + Conductor) manages the physical servers, which an Admin enrolls; Glance and Neutron are also shown]
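Because users interface with Nova, requesting a physical instance can look the same as requesting a VM, with only the flavor differing. The sketch below assumes `conn` behaves like an openstacksdk `Connection` (for example from `openstack.connect(cloud=...)`). The function name, the one-hour timeout and any flavor, image or network names passed in are hypothetical.

```python
def boot_instance(conn, name, flavor_name, image_name, network_id, wait=3600):
    """Create a server through the compute API and wait until it is ACTIVE.

    `conn` is expected to behave like an openstacksdk Connection. Whether the
    result is a VM or a physical node depends only on the chosen flavor.
    """
    flavor = conn.compute.find_flavor(flavor_name, ignore_missing=False)
    image = conn.image.find_image(image_name, ignore_missing=False)
    server = conn.compute.create_server(
        name=name,
        flavor_id=flavor.id,
        image_id=image.id,
        networks=[{"uuid": network_id}],
    )
    # Assumption: physical nodes deploy more slowly than VMs, hence the long timeout.
    return conn.compute.wait_for_server(server, wait=wait)
```

Passing the connection in as an argument keeps the function independent of any particular cloud configuration and makes it straightforward to exercise with a stub.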


Ironic components

[Diagram: Ironic components: Admin, Nova, REST API, ironic-api, Message Queue, ironic-conductor, ironic-inspector, Ironic DB, Inspector DB, ironic-python-agent (IPA), Physical Servers]

● ironic-api: receives, authenticates, and handles requests (by RPC'ing the ironic-conductor)

● ironic-conductor: orchestrates node tasks: add, edit, delete, provision, deploy, clean, power, …

● ironic-inspector: can be used for in-band inspection (boot node into RAM disk, collect data and update DB)

● Ironic DB: database for service data (e.g. nodes/ports, conductors)

● Message Queue: message queue for inter-component communication

● Inspector DB: node inspection states


Ironic Service setup and status

● Users:

– Cloud

– HPC

– Windows

– DB

– ...


A new use case: Containers on Baremetal

● Put together OpenStack managed containers and baremetal

● General service offer: managed clusters

– Users get only K8s credentials

– Cloud team manages the cluster and the underlying infra

● Batch farm runs in VMs as well

– 3% performance overhead, 0% with containers

– Federated Kubernetes for hybrid cloud integration


Storage Services

[Diagram: CERN Cloud service overview, as on the "CERN Cloud Infrastructure - now" slide]


Block Storage as a Service

● Allows adding block devices to instances

● Connects to several Ceph clusters

● Volume types define QoS client capabilities and/or location

– standard, io1, cp1, cpio1, wig-cp1, wig-cpio1, hyperc

● Volume type mapped to single cluster => no availability zones

● Latest upstream contributions

– Deferred deletion

– Extension of RBD in-use volumes

cinder


File shares as a Service

● #1 user request

– Block devices ≠ file shares

● Share protocols

– CephFS

● Use cases

– High-Performance Computing

– Replacement of NFS Filers

● Ongoing work

– Enable NFS access through Ganesha

manila


Upcoming work

[Diagram: CERN Cloud service overview, as on the "CERN Cloud Infrastructure - now" slide, with "hyperconverged" added]


Hyperconverged Servers

● Compute + Storage Nodes

● Local Ceph pool

– Instances

– Volumes

● Easier management

● Low IO latency

● Increased Disk capacity

● Use cases:

– DB and Storage services


Get even more performance

● Hyperconverged servers

– Fixed CPU allocation for protecting IO operations

● Dynamically adjust CPU usage in the setup

– Keeping free resources for IO

– Avoid impact on compute

– Automatic live-migration

watcher


Improve resource availability

● Automatic recovery requests (masakari)


Container Orchestration Engines

● Creates clusters for container deployment

● Template based

– Kubernetes, docker-swarm, DCOS

● Integration into ecosystem

– CVMFS, Kerberos, CSI (CephFS)

● Ongoing work:

– Automation (upgrades and healing)

– Availability (Kubernetes multi master)

– Central logging

– Multitenancy

magnum


Here are the links

● https://gitlab.cern.ch/cloud-infrastructure/

– cinder, horizon, ironic, keystone, mistral, neutron and nova

– mistral-workflows

– mistral-radosgw-actions (python-radosgw-admin)

– hzrequestspanel

– cci-scripts

– cci-tools


Thank you


gitlab.cern.ch/cloud-infrastructure

techblog.web.cern.ch

[email protected]

@josecastroleon