Case Study: University of Alabama at Birmingham

Case Study: The University of Alabama at Birmingham: OpenStack, Ceph, Dell
Kamesh Pemmaraju, Dell; John-Paul Robinson, UAB
OpenStack Summit 2014, Atlanta, GA

Uploaded by redhatstorage on 25 May 2015

DESCRIPTION

The University of Alabama at Birmingham gives scientists and researchers a massive, on-demand, virtual storage cloud using OpenStack and Ceph for less than $0.41 per gigabyte. This session details how the university IT staff deployed a private storage cloud infrastructure using the Dell OpenStack cloud solution (Dell servers, storage, networking and OpenStack) and Inktank Ceph. After assessing a number of traditional storage scenarios, the University partnered with Dell and Inktank to architect a centralized cloud storage platform that could scale seamlessly and rapidly, was cost-effective, and could use a single hardware infrastructure for both the OpenStack compute and storage environments. Video presentation: http://bit.ly/1o85YKV

TRANSCRIPT

Page 1: Case Study: University Alabama-Birmingham

Case Study: The University of Alabama at Birmingham
OpenStack, Ceph, Dell

Kamesh Pemmaraju, Dell
John-Paul Robinson, UAB
OpenStack Summit 2014

Atlanta, GA

Page 2

An overview

• Dell – UAB backgrounder
• What we were doing before
• How the implementation went
• What we’ve been doing since
• Where we’re headed

Page 3

Dell – UAB background
• 900 researchers working on cancer and genomics projects
• Their growing data sets challenged available resources
– Research data was distributed across laptops, USB drives, local servers, and HPC clusters
– Transferring data sets to HPC clusters took too much time and clogged shared networks
– Distributed data management reduced researcher productivity and put data at risk
• They therefore needed a centralized data repository for researchers to ensure compliance with data-retention requirements

• They also wanted a cost-effective scale-out solution on hardware that could be repurposed for compute and storage

Page 4

Dell – UAB background (cont’d)

• Potential solutions investigated
– Traditional SAN
– Public cloud storage
– Hadoop

UAB chose Dell and Inktank to architect a platform that would be highly scalable, offer a low cost per GB, and provide the best of all worlds: compute and storage on the same hardware.

Page 5

A little background…

• We didn’t get here overnight
• 2000s-era High Performance Computing
• ROCKS-based compute cluster
• The Grid and proto-clouds
• GridWay meta-scheduler
• OpenNebula, an early entrant that connected grids with this thing called the cloud
• Virtualization through-and-through
• DevOps is US

Page 6

Challenges and Drivers

• Technology
– Many hypervisors
– Many clouds
– We have the technology…can we rebuild it here?

• Applications
– Researchers started shouting “Data!”: NextGen Sequencing, Research Data Repositories, Hadoop
– Researchers kept on shouting “Compute!”

Page 7

Data Intensive Scientific Computing

• We knew we needed storage and computing
• We knew we wanted to tie it together with an HPC commodity scale-out philosophy
• So in August 2012 we bought 10 Dell 720xd servers, each with:
– 16 cores
– 96GB RAM
– 36TB disk

• A 192-core, ~1TB RAM, 360TB expansion to our HPC fabric

• Now to integrate it…
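The RAM and disk totals on this slide follow directly from the per-server figures. A minimal Python sketch of that arithmetic, plus an estimate of usable space once Ceph replicates the data; the replication factor of 3 is an assumption, since the slides do not say how UAB configured its Ceph pools.

```python
# Per-server figures from the slide: 10 Dell 720xd servers.
SERVERS = 10
RAM_GB_PER_SERVER = 96
DISK_TB_PER_SERVER = 36

# Assumption: 3-way replication. The slides don't state the replication factor.
REPLICAS = 3


def cluster_totals(servers: int, ram_gb: int, disk_tb: int, replicas: int) -> dict:
    """Aggregate per-server resources and estimate usable replicated capacity."""
    raw_tb = servers * disk_tb
    return {
        "ram_gb": servers * ram_gb,
        "raw_tb": raw_tb,
        # Every object is stored `replicas` times, so usable space is raw / replicas
        # (ignoring filesystem overhead and the free space needed for rebalancing).
        "usable_tb": raw_tb / replicas,
    }


totals = cluster_totals(SERVERS, RAM_GB_PER_SERVER, DISK_TB_PER_SERVER, REPLICAS)
print(totals)  # {'ram_gb': 960, 'raw_tb': 360, 'usable_tb': 120.0}
```

The 960GB result is the "~1TB RAM" on the slide. The usable figure is only as good as the assumed replication factor, which is also what a per-gigabyte cost depends on: the same 360TB of raw disk yields a different usable capacity for each replication setting.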

Page 9

December 2012

• Bob said: Hearing good things about OpenStack and Ceph this week at Dell World. Simon Anderson, CEO of DreamHost, spoke highly of Dell, OpenStack, and Ceph today. He is also chair of the company that supports… He also spoke highly of the Dell Crowbar deployment tool.

• I said: Good to hear. I've been thinking a lot about Dell in this picture too. We have the building blocks in place. Might be a good way to speed the construction.

Page 10

Lesson 1:

Recognize when a partnership will help you achieve your goals.

Page 11

The 2013 Implementation

• The Timeline
– In January we started our discussions with Dell and Inktank
– By March we had committed to the fabric
– A week in April and we had our own cloud in place

• The Experience
– Vendors committed to their product
– Direct engagement through open communities
– Bright people who share your development ethic

Page 12

Next Step…Build Adoption

• Defined a new storage product based on the commodity scale-out fabric

• Able to focus on the strengths of Ceph to aggregate storage across servers
• Provision images of any size to provide flexible block storage

• Promote cloud adoption within IT and across the research community

• Demonstrate utility with applications

Page 13

Applications
• CrashPlan backup in the cloud
– A couple of hours to provision the VM resources
– An easy half-day deploy with the vendor because we controlled our resources (a.k.a. the firewall)
– Add storage containers on the fly as we grow…10TB in a few clicks

• GitLab hosting
– Start a VM spec’d according to the project site
– Work with the Omnibus install. Hey, it uses Chef!

• Research Storage
– 1TB storage containers for cluster users
– Uses Ceph RBD images and NFS
– The storage infrastructure part was easy
– Scaled provisioning: 100+ user containers (100TB) created in about 5 minutes
– Add storage servers as existing ones fill
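The research-storage bullets describe per-user 1TB RBD images created in bulk. A minimal sketch, assuming a hypothetical pool named `research`, hypothetical user IDs, and 1 TiB per image, that builds the `rbd create` commands rather than running them, so it works without a Ceph cluster. Mapping, formatting and NFS-exporting each image is left out.

```python
# Hypothetical naming: pool "research", one image per user named "home-<user>".
# The slides only say "1TB storage containers"; 1 TiB per image is assumed here.
SIZE_TIB = 1
MIB_PER_TIB = 1024 * 1024


def rbd_create_commands(users, pool="research", size_tib=SIZE_TIB):
    """Return one `rbd create` argument list per user.

    `rbd create --size` takes the size in megabytes (MiB), so 1 TiB = 1048576.
    """
    size_mib = size_tib * MIB_PER_TIB
    return [
        ["rbd", "create", "--size", str(size_mib), f"{pool}/home-{user}"]
        for user in users
    ]


users = [f"user{n:03d}" for n in range(100)]  # hypothetical user IDs
commands = rbd_create_commands(users)

# Nominal capacity handed out to users (not space consumed at creation time).
promised_tib = len(commands) * SIZE_TIB

# The slide reports ~100 containers in about 5 minutes, i.e. ~3 s per container.
seconds_per_container = 5 * 60 / len(commands)

print(" ".join(commands[0]))  # rbd create --size 1048576 research/home-user000
print(promised_tib, seconds_per_container)  # 100 3.0
```

On an admin node each argument list could be passed to `subprocess.run(cmd, check=True)`. RBD images are thin-provisioned, so creating 100 TiB of images does not consume 100 TiB of disk up front; space is used only as users write data, which is one reason bulk provisioning can be quick.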

Page 14

Ceph Rebalances as Storage Grows :)
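Ceph places data with its CRUSH algorithm (objects are first grouped into placement groups, which CRUSH maps to OSDs), and the sketch below is not CRUSH. It swaps in rendezvous (highest-random-weight) hashing, a simpler placement scheme, to illustrate the property behind this slide: when a server is added, only about 1/(n+1) of the data moves, and all of it moves onto the new server. Server names and object counts are hypothetical.

```python
import hashlib


def score(node: str, obj: str) -> int:
    """Deterministic pseudo-random weight for a (node, object) pair."""
    digest = hashlib.sha256(f"{node}/{obj}".encode()).digest()
    return int.from_bytes(digest[:8], "big")


def owner(obj: str, nodes: list) -> str:
    """Rendezvous hashing: the object lives on the node with the highest score."""
    return max(nodes, key=lambda node: score(node, obj))


def placement(objects, nodes):
    return {obj: owner(obj, nodes) for obj in objects}


objects = [f"obj-{i}" for i in range(10_000)]
old_nodes = [f"server-{n}" for n in range(10)]  # hypothetical: 10 storage servers
new_nodes = old_nodes + ["server-10"]           # add one server as the others fill

before = placement(objects, old_nodes)
after = placement(objects, new_nodes)
moved = [obj for obj in objects if before[obj] != after[obj]]

# Existing servers' scores don't change, so an object moves only if the new
# server outscores them all; expect roughly 1/11 of the objects (about 9%).
print(f"moved {len(moved)} of {len(objects)} objects "
      f"({len(moved) / len(objects):.1%}), all to server-10: "
      f"{all(after[obj] == 'server-10' for obj in moved)}")
```

A naive `hash(obj) % len(nodes)` scheme would instead reshuffle most objects when the node count changes, which is why scale-out stores favour placement functions with this incremental behaviour.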

Page 16

Lesson 2:

Use it! That’s what it’s for!

The sooner you start using the cloud, the sooner you start thinking like the cloud.

Page 17

How PoC Decisions Age Over Time

• Pick the environment you want when you are in operation…you’ll be there before you know it

• Simple networking is good
– But don’t go basic unless you are able to reinstall the fabric
– Class B ranges to match the campus fabric
– We chose a split admin range to coordinate with our HPC admin range
– We chose a collapsed admin/storage network due to a single switch…it probably would have been better to keep them separate and allow growth
– It’s OK to add non-provisioned interfacing nodes…know your net

• Avoid painting yourself into a corner
– Don’t let the Paranoid Folk box in your deployment
– An inaccessible fabric is an unusable fabric

• Fixed IP range mismatch with “fake” reservations
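The networking bullets argue for ranges that match the campus fabric and for keeping admin and storage traffic on separate networks with room to grow. A sketch using Python's standard `ipaddress` module, assuming a hypothetical 172.20.0.0/16 Class B block and one /20 per role; UAB's actual addressing is not in the slides, and the split-admin layout shown is one possible reading.

```python
import ipaddress

# Hypothetical Class B (/16) block; the slides don't give UAB's actual ranges.
CAMPUS_BLOCK = ipaddress.ip_network("172.20.0.0/16")


def plan_networks(block, roles=("admin", "storage", "public"), new_prefix=20):
    """Give each role its own subnet of the block, leaving the rest for growth."""
    return dict(zip(roles, block.subnets(new_prefix=new_prefix)))


plan = plan_networks(CAMPUS_BLOCK)
for role, net in plan.items():
    print(f"{role:8} {net}  ({net.num_addresses - 2} usable hosts)")

# A "split" admin range: one half for cloud nodes, the other lined up with an
# HPC admin range (hypothetical layout, not UAB's).
cloud_admin, hpc_admin = plan["admin"].subnets(prefixlen_diff=1)
print(cloud_admin, hpc_admin)  # 172.20.0.0/21 172.20.8.0/21

# "Know your net": check which segment an existing host already sits on.
print(ipaddress.ip_address("172.20.17.5") in plan["storage"])  # True
```

A /16 holds sixteen /20s, so this plan leaves thirteen unused for later growth, such as a storage network split off from a collapsed admin/storage segment.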

Page 18

Lesson 3:

The fabric is flexible. Let it help you solve your problems

Page 19

Problems will Arise

• The release version of the ixgbe driver in the Ubuntu 12.04.1 kernel didn’t perform well with our 10Gbit cards
– Open source has an upstream
– Use it as part of your debugging network
– Upgrading the drivers was a simple fix

• Sometimes when you fix something you break something else

• There are still a lot of moving parts, but each has a strong open source community

• Work methodically
– You will learn as you go
– Recognize that the stack is integrated and respect tool boundaries
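Confirming which driver version is actually loaded is a natural first check before and after an upgrade like the ixgbe fix on this slide. A sketch that parses the `key: value` output of `ethtool -i <interface>`; the sample output and the minimum version are hypothetical, not the versions UAB ran.

```python
# Hypothetical `ethtool -i <interface>` output; not captured from UAB's systems.
SAMPLE_ETHTOOL_I = """\
driver: ixgbe
version: 3.6.7-k
firmware-version: 0x18b30001
bus-info: 0000:03:00.0
"""

# Hypothetical minimum version to standardize on after testing an upstream driver.
MINIMUM_VERSION = (3, 15, 1)


def parse_ethtool_info(text: str) -> dict:
    """Parse 'key: value' lines, splitting on the first colon only (bus-info has more)."""
    info = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            info[key.strip()] = value.strip()
    return info


def version_tuple(version: str) -> tuple:
    """'3.6.7-k' -> (3, 6, 7): drop the suffix so versions compare numerically."""
    numeric = version.split("-", 1)[0]
    return tuple(int(part) for part in numeric.split("."))


info = parse_ethtool_info(SAMPLE_ETHTOOL_I)
needs_upgrade = (info["driver"] == "ixgbe"
                 and version_tuple(info["version"]) < MINIMUM_VERSION)
print(info["driver"], info["version"], "needs upgrade:", needs_upgrade)
```

Versions are compared as integer tuples because plain string comparison gets them wrong: as strings, "3.6.7" sorts after "3.15.1".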

Page 20

Sometimes a Problem is just a Problem

• Code ex

Page 22

Lesson 4:

The code *is* the documentation…and that’s a *good* thing

Page 23

Where we are today

• OpenStack plus Ceph are here to stay for our Research Computing System

• They give us the flexibility we need for an ever-expanding portfolio of research applications
– Move our UAB Galaxy NextGen Sequencing platform to our cloud
– Add Object Storage services
– Put the cloud in the hands of researchers

• The big question…

Page 24

…how far can we take it?

• The goal of process automation is scale
– Incompatible, non-repeatable, manual processes are a cost

• Success is in dual-use
– Satisfy your needs and customer demand
– Automating a process implies documenting it…great for compliance and repeatability

• Recognize the latent talent in your staff: today’s system admins are tomorrow’s systems developers

• Traditional infrastructure models are ripe for replacement

Page 25

Lesson 5?

You can learn from research and engage as a partner