Multi-Cell OpenStack: How to Evolve Your Cloud to Scale - November 2014
TRANSCRIPT
Multi-Cell OpenStack: How to Evolve Your Cloud to Scale
● Belmiro Moreira - CERN
● Matt Van Winkle - Rackspace
● Sam Morrison - NeCTAR, University of Melbourne
Cells: How we use them at NeCTAR
NeCTAR Research Cloud
● Started in 2011
● Funded by the Australian Government
● 8 institutions around the country
● Production early 2012 - OpenStack Diablo
● All federated to appear as one cloud from the user's point of view
● Put the compute near the data and tools
● 5000+ users
NeCTAR Sites
● University of Melbourne
● National Computational Infrastructure
● Monash University
● Queensland Cyber Infrastructure Foundation
● eResearch SA
● University of Tasmania
● Intersect, NSW
● iVEC, WA
Cells to build a Federation
● Use cells to federate geographically separated sites
● Different hardware/networks/people
● Parent cell run centrally at unimelb along with keystone/cinder/glance etc. (no neutron)
● Each site has 1 or more compute cells
● These roughly match up to availability zones from a user's perspective (cells are behind the scenes)
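The deck itself shows no configuration, so here is a minimal Python sketch of the mapping just described: compute cells are grouped by site and only the resulting availability zones are exposed to users, while the parent (api) cell stays behind the scenes. All cell and site names are invented for illustration and are not NeCTAR's actual cell names.

```python
# Illustrative sketch only: one user-visible availability zone per site,
# backed by that site's compute cells; the parent "api" cell is hidden.
from collections import defaultdict

CELLS = [
    {"name": "api",           "site": "melbourne", "type": "api"},      # parent cell, run centrally
    {"name": "melbourne-qh2", "site": "melbourne", "type": "compute"},
    {"name": "melbourne-np",  "site": "melbourne", "type": "compute"},  # second cell at the same site
    {"name": "monash-01",     "site": "monash",    "type": "compute"},
    {"name": "qld",           "site": "qld",       "type": "compute"},
]

def availability_zones(cells):
    """Group compute cells by site: one user-visible AZ per site."""
    zones = defaultdict(list)
    for cell in cells:
        if cell["type"] == "compute":
            zones[cell["site"]].append(cell["name"])
    return dict(zones)

for zone, members in sorted(availability_zones(CELLS).items()):
    print(f"AZ '{zone}' backed by cells: {', '.join(members)}")
```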
How big?
● Each site ~4000 cores, ~150 hypervisors
● 6 sites in production, 4600+ instances
● Last 2 sites in prod by end of year
● ~1000 hypervisors, 40k cores
● ~10 compute cells
● Some sites have multiple datacenters so have multiple cells
Pain points
● Cell scheduling isn't smart (see the sketch below)
● Broadcast calls rely on all cells being alive
● Not many people to share experiences with
● Upgrades, although Havana → Icehouse could happen in stages. Much easier!
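To make the first two pain points concrete, here is a small illustrative sketch (not nova's code) of the coarse decision the cells v1 scheduler makes: it picks a target cell on a single metric such as free RAM before normal host scheduling runs inside that cell, and a cell that is down drops out of scheduling while broadcast-style calls would still wait on it. All names and numbers are made up.

```python
# Minimal sketch of cell-level scheduling on a single coarse metric.
# This illustrates why "cell scheduling isn't smart" -- it ignores flavors,
# aggregates and per-cell quirks when choosing where an instance lands.

def pick_cell(cells, requested_ram_mb):
    """Return the live cell with the most free RAM that fits the request."""
    candidates = [c for c in cells if c["alive"] and c["free_ram_mb"] >= requested_ram_mb]
    if not candidates:
        raise RuntimeError("no cell can satisfy the request")
    return max(candidates, key=lambda c: c["free_ram_mb"])

cells = [
    {"name": "cell-a", "free_ram_mb": 512_000,   "alive": True},
    {"name": "cell-b", "free_ram_mb": 1_024_000, "alive": True},
    {"name": "cell-c", "free_ram_mb": 2_048_000, "alive": False},  # down: a broadcast call would still wait on it
]

print(pick_cell(cells, requested_ram_mb=8_192)["name"])  # -> cell-b
```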
Things we've added, not in trunk (yet)
● Security group syncing
● ec2 id mappings (needed for metadata)
● Availability zone / aggregate support
● Flavour management (see the sketch below)
*We assume a cell only has 1 parent
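As an example of the out-of-tree additions above, the following hedged sketch shows one way flavour management can work: flavors defined in the parent/API cell are pushed down to every compute cell so users see a consistent set. `CellClient` is a hypothetical stand-in for whatever per-cell API or database handle a deployment actually uses; it is not a nova or NeCTAR interface. Security group syncing follows the same pattern.

```python
# Hypothetical sketch: keep flavors consistent by pushing any flavor defined
# in the API cell down to every compute cell that is missing it.

class CellClient:
    """Stand-in for a per-cell API/database handle (invented for this sketch)."""
    def __init__(self, name, flavors=None):
        self.name = name
        self._flavors = dict(flavors or {})

    def list_flavors(self):
        return dict(self._flavors)

    def create_flavor(self, name, spec):
        self._flavors[name] = spec

def sync_flavors(api_cell, compute_cells):
    """Ensure every flavor defined in the API cell also exists in each compute cell."""
    source = api_cell.list_flavors()
    for cell in compute_cells:
        missing = set(source) - set(cell.list_flavors())
        for flavor_name in sorted(missing):
            cell.create_flavor(flavor_name, source[flavor_name])
            print(f"created {flavor_name} in cell {cell.name}")

api = CellClient("api", {"m1.small": {"vcpus": 1, "ram_mb": 4096, "disk_gb": 30}})
sync_flavors(api, [CellClient("melbourne-qh2"), CellClient("monash-01")])
```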
Cells: How we use them at CERN
Belmiro Moreira
email: belmiro.moreira@cern.ch
@belmiromoreira
CERN
● Conseil Européen pour la Recherche Nucléaire – aka European Organization for Nuclear Research
● Founded in 1954 with an international treaty
○ 21 member states; other countries contribute to experiments
○ Situated between Geneva and the Jura Mountains, straddling the Swiss-French border
● CERN's mission is to do fundamental research
● CERN provides particle accelerators and other infrastructure for high-energy physics research
CERN - Cloud Infrastructure
● In production since July 2013
● Performed two upgrades: Grizzly -> Havana -> Icehouse
○ Currently running: nova; glance; keystone; horizon; cinder w/ Ceph; ceilometer
● RDO distribution on SLC6; pip with Windows Server 2012 R2
● 2 geographically separated data centres
○ Geneva (Switzerland) and Budapest (Hungary)
● Numbers
○ ~3000 compute nodes (75k cores; 140 TB RAM)
■ ~2900 KVM; ~100 Hyper-V
○ ~8000 virtual machines
CERN - Cloud Infrastructure - Cells
● Why do we use cells?
○ Scale transparently between different data centres
○ Availability and resilience
○ Isolate different use-cases
● Today: 1 api cell and 8 compute cells
○ 2-level tree
○ Sizes range between ~100 and ~1600 compute nodes
○ 6 compute cells in Switzerland; 2 compute cells in Hungary
● “Shared” and “Private” cells
○ 3 availability zones available in “Shared” cells
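A minimal illustrative sketch (not CERN's code) of the “Shared” vs “Private” split: private cells are reserved for specific projects, while shared cells accept every project and back the availability zones. All cell and project names are invented.

```python
# Invented names only: which cells a given project may land in.
SHARED_CELLS = {"geneva-shared-01", "geneva-shared-02", "budapest-shared-01"}
PRIVATE_CELLS = {
    "geneva-batch":    {"lhc-batch"},
    "geneva-services": {"it-services"},
}

def candidate_cells(project):
    """All shared cells plus any private cells reserved for this project."""
    private = {cell for cell, projects in PRIVATE_CELLS.items() if project in projects}
    return SHARED_CELLS | private

print(sorted(candidate_cells("lhc-batch")))     # shared cells + geneva-batch
print(sorted(candidate_cells("generic-user")))  # shared cells only
```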
CERN - Cells Limitations
● Missing functionality
○ Security groups
○ Flavor propagation (api -> compute)
○ Managing aggregates on the api cell
○ Server groups
● Cell scheduler
● Ceilometer integration
CERN - Cells Challenges
● ~74,000 more cores by the beginning of 2015
○ How to organize and distribute nodes between different cells?
● Split current large cells into smaller cells of ~200 compute nodes each (rough arithmetic in the sketch below)
○ Expected to have 30+ cells by end of 2015
○ How to manage a large number of cells?
Cells at Rackspace
Cells: How to Evolve Your Cloud to Scale
Rackspace
• Managed Cloud company offering a suite of dedicated and cloud hosting products
• Founded in 1998 in San Antonio, TX
• Home of Fanatical Support
• More than 200,000 customers in 120 countries
Rackspace – Cloud Infrastructure
• In production since August 2012
– Currently running: Nova; Glance; Neutron; Ironic; Swift; Cinder
• Regular upgrades from trunk
– Package built on trunk pull from 10/21 in testing now
• Compute nodes are Debian based
– Run as VMs on hypervisors and managed via XAPI
• 6 geographic regions around the globe
– DFW; ORD; IAD; LON; SYD; HKG
• Numbers
– 10's of 1000's of hypervisors (over 330K cores, 1+ petabyte of RAM)
• All XenServer
– Over 150,000 virtual machines
Rackspace – Cloud Infrastructure - Cells
• Why do we use cells?
– Manage Multiple Flavor Classes (see the sketch below)
– Network resources (Public IPs, Private IPs, aggregation routers, etc.)
– Network Constraints
– Continual Supply Chain
• 1 Global API cell per region with multiple Compute cells (3 – 35+)
– 2-level tree
– Size between ~100 and ~600 hosts per cell
• Control infrastructure exists as instances in a small OpenStack deployment
• All cells available to all tenants
– Tested “dedicated” cells for potential large customers
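A small sketch (assumed names, not Rackspace's code) of the flavor-class point above: each compute cell is backed by hardware for one flavor class, and requests are routed only to cells that serve that class.

```python
# Invented cell names: route a requested flavor class to the cells backing it.
CELL_FLAVOR_CLASSES = {
    "ord-general-01": "general",
    "ord-general-02": "general",
    "ord-io-01":      "io-optimized",
    "ord-compute-01": "compute-optimized",
}

def cells_for_flavor_class(flavor_class):
    """All cells whose hardware backs the requested flavor class."""
    return [cell for cell, cls in CELL_FLAVOR_CLASSES.items() if cls == flavor_class]

print(cells_for_flavor_class("io-optimized"))  # -> ['ord-io-01']
```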
Rackspace – Cells Limitations
• Missing Functionality
– Security Groups
– Host aggregates
• Scheduler
– No “disable”
– Incomplete host statuses
• Other services are not cell aware
– Neutron is a prime example
Rackspace – Cells Challenges
• Increasing number of flavor classes
– Different hardware specs per class
– Sizing varies by average VM density
• Multiple vendor sources
– Subtle hardware differences in the same specs across different vendors
• Scaling global services with cell growth
– Still don't have the perfect ratios
Cells Feature Completion
• Nova dev team met this morning to discuss cells in a few sessions:
– Cells – Wednesday, November 5, 09:00
– Cells continued – Wednesday, November 5, 09:50
• Areas of discussion
– Feature completion
– No-op/single cell as default
– Cell awareness in APIs
• Recap from sessions
Thank You!
● Belmiro Moreira - CERN - [email protected]
● Matt Van Winkle - Rackspace - @mvanwink
● Sam Morrison - NeCTAR, University of Melbourne - sam.morrison@unimelb.edu.au
Questions?