
Upload: society-of-women-engineers

Post on 06-May-2015

DESCRIPTION

Presented by: Lorraine Herger

TRANSCRIPT

Page 1: The IBM Research Compute Cloud (RC2): Innovation, Best Practices and Lessons Learned on Delivering a Private Cloud for the Enterprise

© 2012 IBM Corporation, 11/2012

The IBM Research Compute Cloud (RC2) – Innovation, Best Practices and Lessons Learned on Delivering a Private Cloud for the Enterprise

Lorraine M. Herger

Director

Research Integrated Solutions

[email protected]

Page 2

Agenda

Setting the Context
– IBM Research
– Research Integrated Solutions (RIS)
– Research IS Innovation

History of the Research Compute Cloud (RC2) Project
– RC2 Journey
– RC2 Today
  • Architecture
  • Monitoring, Management, Statistics, Metrics
  • Applications

Lessons Learned During Transformation to Cloud
– Technical challenges
– Cultural challenges

Page 3

IBM Research: Globalization

[Figure: world map of IBM Research Labs, with a legend distinguishing existing labs, openings in 2011 and openings in 2012]

Labs shown: China, Watson, Almaden, Austin, Tokyo, Zurich, India, Dublin, Australia, Brazil, Africa, Haifa

Focus areas labeled on the map:
– Next Gen Public Sector, Water & transportation, Human Capacity Development
– Natural Resources, Disaster management, Healthcare/Life Sciences (60% funding from gov’t)
– Natural Resources, Smarter Devices, Human Systems/Events
– Industry Solutions
– Accessibility
– Internet of Things
– “Big Data” Analytics
– Security
– Smarter Cities
– Services, Mobile, Communications
– Semiconductors, Systems, Software & Services
– Semiconductors, Processors
– Analytics, Storage, Nanotech
– Healthcare
– Science, Nanotech, Materials

Page 4

Chemistry

Computer Science

Electrical Engineering

Materials Science

Mathematical Science Physics

Service Science

Behavioral Science

IBM Research – Who We Are

Business Innovation

Technology Innovation

Social Innovation

Demand Innovation

Science & Engineering

Business & Data Management

Social & Cognitive Sciences

Economics & Markets

3,000 engineers, scientists and technical professionals

Pushing the boundaries of science and technology to make the world work better

Helping clients, governments and universities apply scientific breakthroughs to solve challenges in business and society

Page 5

Defining research values provide the compass for an evolving technical agenda and business model

Defining values:

– “..people from several different disciplines trying to visualize … future potentials..”

– “..intellectual curiosity and the love of knowledge..”

– recognition that the “greatest progress seems to come from almost casual encounters..”

“… the long, curved front halls. With beautiful views on the outside, but with the sense as you walked along that you could not quite see what lay ahead, just around the curve. This surely reflected the adventure of research…”

- Gardiner Tucker, Lab Director, IBM Research 1963 - 1967

IBM Research – Our 2nd Home, Yorktown Heights, NY

Page 6

2012: Watson Research Computing Environment

30,000 sq ft / 7 data centers / 20 Labs in Yorktown, Hawthorne and Poughkeepsie

2,500 IBM P and X Servers

310 Blade Center Chassis / 3,200 Blade Servers

500+ non Cloud Virtual Servers

>1 petabyte Storage

Research Compute Cloud (RC2)
– 2,900 current Virtual Machines
– 3.2 petabytes storage
– Over 50,000 VMs created and used since Nov 2009

2.5 MW IT Energy Load

Yorktown Hawthorne Poughkeepsie

Page 7

Mission of Research Integrated Solutions

The worldwide Research Information Services community is a global team of IT professionals which participates in the Research Division mission as an innovation partner.

Provide leading edge Information Services to our users and serve as an agile “Living Laboratory” for Researchers to deploy and demonstrate experimental technology

Drive Research-wide teamwork and a common strategy

Increase IBM value through partnerships
– Main Partners: IBM Research, IBM Divisions and Clients
– Innovation partner with Research teams on key initiatives
– Be recognized as the best of breed IT organization in IBM and outside of IBM in the industry as a premier leadership IT organization

Research IS

Page 8

Research Living Lab Partnerships: Delivering Innovation

Innovation + Solution Delivery - Delivering Value to the World

DC Robot

Mobile Measurement Technology

Deep Thunder

Page 9

Agenda

Setting the Context
– IBM Research
– Research Integrated Solutions
– Research IS Innovation

History of the Research Compute Cloud (RC2) Project
– RC2 Journey
– RC2 Today
  • Architecture
  • Monitoring, Management, Statistics, Metrics
  • Applications

Lessons Learned During Transformation to Cloud
– Technical challenges
– Cultural challenges

Page 10

What is Cloud Computing?

Cloud computing is a new consumption and delivery model inspired by consumer internet services.

5 key characteristics:
1. On-demand self-service
2. Ubiquitous network access
3. Location independent resource pooling
4. Rapid elasticity
5. Flexible pricing models

[Diagram labels: Virtualization, Service Automation, Usage Tracking, Web 2.0, SOA, End User Focused]

An effective cloud deployment is built on an Integrated Service Management Platform and a Dynamic Infrastructure, and can be part of an overall data center transformation plan.

See What is Cloud Computing for more information: http://www.ibm.com/cloud-computing/us/en/what-is-cloud-computing.html

Page 11

Cloud – Our Goals – Innovation and Operational Excellence

2009

Deliver customer value through the Cloud in the areas of Security, Energy Efficiency, and Services Enablement.

Create a Cloud ecosystem through collaboration services, platform scaling technologies, and exploration of cloud aware middleware.

2010

Create a competitive Compute cloud offering and significant value through specialized clouds such as Storage Cloud, Test Cloud, Desktop Cloud and Industry Solution Specific clouds.

2011

Substantially contribute to the IBM Cloud product & service offerings.

Drive end-to-end differentiation through private, hybrid, and industry specific clouds.

Leverage structure aware image lifecycle management, fine-grained security, quality of service optimized Platform as a Service.

Optimally manage virtualized environments on the cloud.

Leverage High Scale Low Touch Cloud Operating Environment and single view of the hybrid cloud.

Include innovative Research contributions in the Common Cloud Management Platform.

Page 12

Research Compute Cloud - Objectives

Goal: Create a Research cross strategy initiative to establish an environment for innovation in Cloud Computing. Harness the Research ‘living lab’ for high growth areas. Harvest Cloud Computing technology for client facing opportunities.

Approach: Use Virtualization and ‘cloud’ as a technology enabler to deliver a Worldwide Research Computing Service as a means to unite I/T process and rapid technology delivery. RC2 defined as IBM Research intersection point.

Collaboration: Overlay on Research Computing Cloud leveraging existing major initiatives. Focusing on client-driven scenarios, and close partnership with IBM Services, software and systems to maximize IBM integrated value proposition. Intersect IBM Cloud Roadmap.

Value: Deliver a greater, more effective and efficient set of I/T capabilities in support of Research and IBM priorities. Showcase via real world application of cloud computing and virtualization technologies.

Future Capabilities: Datacenter Optimization & Management (Ensembles), Catalog-based I/T Services, Virtual Image management, Security Zones, Workload Optimization

Greater IT Capabilities

[Diagram labels: Web Server, Database Server, App Server and their virtual counterparts (Virtual Web Server, Virtual Database Server, Virtual App Server); Virtual Systems, Virtual Disks, Virtual LANs; sites: Watson, Australia, Haifa]

Virtual resources reduce cost and fulfill On-Demand promise
− Adjustable capacity
− Movable across Research Labs
− Rapid Provisioning
− Collaborative shared Images
− Standard Software Stacks
− Dense Asset utilization
− Simplified Systems Management

Page 13

Four Axes of Innovation Strategy for Research Cloud

Research Cloud: Software; Hardware; Services

1. Deliver Effective and Efficient I/T
• Virtual Machines
• Storage Cloud
• Fast provisioning
• Image Search
• Increased High Availability
[Chart: IT Cost (low to high) vs. Value (low to high), with a marked target]

2. Function as Cloud Big Bet Living Lab
• Execution based Innovation
• Best practices & Skills (IBM SW, Cloud SW)
• Client facing Resource
• Rapid tech. adoption amidst ‘quirks’

3. Lead Cloud Innovation for IBM
• Tivoli Product Enterprise ready
• Desktop Cloud (Security; Optimization; On Boarding)
• Platform as a service
• Workload placement & migration
• Storage Cloud (GPFS ++)
• Image Analytics and construction
• Patch Management

4. World Class Research Organization
• Client facing testimonials
• IBM Technology leadership
• Internal / External Cloud Standards participation
• Papers and Patents
• Internal Recognition and References

Page 14

Cloud – The Journey to Implementation

Page 15

RC2 Component Diagram

[Diagram labels: RC2 Virtual datacenters (Watson A, Watson B); RC2 Business Support Services; Users; Cloud Team; Use; Compute/Storage; Request; Portal; Manage; Support]

Page 16

RC2 Infrastructure Diagram

Page 17

Research Compute Environment: Transformation Journey

Timeline markers: 2003, 2004, 2006, 2009-2012

Phases: Research Lab Systems Engineering phase I; Research Lab Systems Engineering phase II; IT Consolidation: 2 New Datacenters (Watson, Pok); Research Compute Cloud

Decentralized environments
• Offices
• Labs
• Closets

Centralized model
• Two major locations
• Pool resources
• Share infrastructure
• Provide resource flexibility
• Leveraging skill sets
• Achieve economies of scale
• Centralized capital planning

• Virtualized resource
• Optimize power and cooling
• Dynamic resource allocation
• Standardize
• Consolidated more than 3500 sq ft
• Manage 6000+ assets

Page 18

IBM Research IT – Operational Pains – How Cloud Helps

Takes too long to create application infrastructures
– The average lead time to get a new application environment up and running is 4-6 weeks!
  • Approvals, procurement, shipment, HW installation, license procurement, OS installation, application installation, configuration
Cloud images can be created in a matter of minutes

Creating middleware infrastructures is a manual and error-prone process
– 30% of bugs are introduced by inconsistent configurations
  • These types of bugs are often the most difficult to detect
  • These bugs typically only emerge when moving between dev/test, QA, production
Cloud instances are built from pre-tested images, reducing infrastructure errors

Poorly utilized resources driving up hardware & labor costs
– Because it’s so expensive to set up an environment, there is an incentive to hold onto hw/sw resources, just in case they may be needed at a later date
– Future environments = new hardware, rather than recycling returned hardware
Cloud resources are always available, whenever a project requires resources, so no need to ‘hoard’ machines

Page 19

Current Status of Research Compute Cloud

Largest self service cloud in IBM

Solution for Research projects needing IT resources

Sunset 820 Physical Machines as part of Legacy Update

Dynamic usage patterns

Real time monitoring and alerts

Chargeback policies shape usage patterns

Significant contributions to IBM cloud product offerings

Cost efficiencies realized

(graph)

Page 20

Next Steps: RC2 Research Lab(s) Federated Services

[Diagram labels: IBM SCP/HSLT 1.2 IaaS (with RC2 Innovation); RC2 Global Pods: Watson A, Watson B, Australia, Haifa, India; RC2 Central Cloud Services; Users; Local Admins; Cloud Team; Request; Unified Portal; Local Compute/Storage; Manage; Support; Consume & Contribute]

Page 21

Monitoring, Management – Key to a reliable solution

Continuous monitoring of all infrastructure components

Track Availability

System Performance (real time and historical)

Capacity Tracking – foundation for analysis and forecasting

Events and Outage Tracking (maintenance windows, outages, failures)

Monitoring Services

Backup

Alerting
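Capacity tracking as a foundation for forecasting can be illustrated with a minimal sketch. It assumes a simple least-squares linear trend over equally spaced usage samples; the function names, the sample figures and the choice of a linear model are assumptions for illustration, since the deck does not say how RC2 forecasts capacity.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def periods_until_full(usage, capacity):
    """Estimate how many sampling periods remain before capacity is reached.

    usage: equally spaced samples (for example, terabytes used per month).
    Returns None when the trend is flat or shrinking.
    """
    if len(usage) < 2:
        raise ValueError("need at least two samples to fit a trend")
    xs = list(range(len(usage)))
    slope, intercept = fit_line(xs, usage)
    if slope <= 0:
        return None
    x_full = (capacity - intercept) / slope
    # Measure from the most recent sample; never report a negative horizon.
    return max(0.0, x_full - xs[-1])


if __name__ == "__main__":
    # Hypothetical monthly storage usage in terabytes.
    print(periods_until_full([100, 110, 120, 130], capacity=160))
```

A real capacity report would likely use longer histories and account for seasonality; the linear fit is only the simplest model that turns tracked samples into a forecast.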

Page 22

RC2 Health Dashboard

Page 23

RC2 Metrics Journey: Health Dashboard

[Timeline: Jan to Dec 2012]

Milestones:
• Launched RC2 Health Dashboard
• E2E Probes and Alerts
• UI Availability Metrics
• Instance Availability Metrics
• Capacity Metrics
• Usage Delta Reports

From:
• Reactive
• Minimal monitoring
• No alerting

To:
• Proactive
• Reduced Problems
• Improved Availability

Page 24

RC2 Metrics Journey: Smart Metrics

[Timeline: Jan to Dec 2012]

Milestones:
• Launched Smart Metrics
• Service Asset Metrics Updates
• Improved Problem and User Request Categories
• Problem Category Trending
• Launch of Maximo (Tivoli ticketing solution)

From:
• Inconsistent Methodologies
• Minimal metrics
• No Trending

To:
• Consistency
• Strategic Solutions
• Measures that Matter

Page 25

Compliance Automation: Automated Windows License Tracking

Daily check of available licenses vs. Windows instances

High water mark triggers ordering process

Real time reports
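The daily license check described on this slide could look roughly like the sketch below. The threshold value, the function and field names, and the sample counts are all hypothetical; the deck does not state the actual high water mark or how RC2 counts instances.

```python
from dataclasses import dataclass


@dataclass
class LicenseCheck:
    in_use: int          # Windows instances currently provisioned
    available: int       # licenses owned
    utilization: float   # in_use / available
    order_needed: bool   # True once the high water mark is reached


def check_windows_licenses(in_use: int, available: int,
                           high_water_mark: float = 0.9) -> LicenseCheck:
    """Compare Windows instances with owned licenses.

    high_water_mark is an assumed threshold (fraction of licenses in use)
    at which the ordering process would be triggered.
    """
    if available <= 0:
        raise ValueError("available must be positive")
    utilization = in_use / available
    return LicenseCheck(in_use, available, utilization,
                        utilization >= high_water_mark)


if __name__ == "__main__":
    # Hypothetical counts for one daily run.
    result = check_windows_licenses(in_use=460, available=500)
    print(f"{result.utilization:.0%} of licenses in use; "
          f"order needed: {result.order_needed}")
```

Triggering on a fraction rather than on exhaustion leaves lead time for procurement, which is the point of a high water mark.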

Page 26

Capacity Tracking

Page 27

Event and Outage Tracking

• Capture every outage, failure, service or maintenance window
• Atom feed for subscription
• Reports
• Integrated with Availability and Performance monitoring
• Service ticket correlation
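One way captured outage and maintenance windows can feed availability reporting is sketched below. The event labels ("outage", "failure", "maintenance"), the tuple layout, and the choice to exclude planned maintenance by default are assumptions for illustration, not the RC2 implementation.

```python
from datetime import datetime, timedelta


def availability(period_start, period_end, events, include_maintenance=False):
    """Return availability as a fraction of [period_start, period_end).

    events is a list of (start, end, kind) tuples. Overlapping windows
    are merged so that downtime is not counted twice.
    """
    windows = sorted(
        (max(start, period_start), min(end, period_end))
        for start, end, kind in events
        if include_maintenance or kind != "maintenance"
    )
    downtime = timedelta(0)
    current_start = current_end = None
    for start, end in windows:
        if start >= end:
            continue  # window lies outside the reporting period
        if current_end is None or start > current_end:
            if current_end is not None:
                downtime += current_end - current_start
            current_start, current_end = start, end
        else:
            current_end = max(current_end, end)
    if current_end is not None:
        downtime += current_end - current_start
    return 1 - downtime / (period_end - period_start)


if __name__ == "__main__":
    # Hypothetical events for a 30-day reporting period.
    events = [
        (datetime(2012, 11, 5, 0), datetime(2012, 11, 5, 6), "outage"),
        (datetime(2012, 11, 5, 4), datetime(2012, 11, 5, 12), "failure"),
        (datetime(2012, 11, 10, 0), datetime(2012, 11, 10, 12), "maintenance"),
    ]
    print(availability(datetime(2012, 11, 1), datetime(2012, 12, 1), events))
```

Keeping the event kind on each record lets the same log answer both questions: availability as users experienced it, and availability excluding planned windows.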

Page 28

Usage Statistics

Page 29

RC2 Chargeback

An experimental chargeback service to help recover costs, and to encourage efficient use of cloud resources.
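A minimal sketch of how a usage-based chargeback can encourage efficient use follows. The rate names and prices are hypothetical; the deck does not describe the RC2 pricing model.

```python
# Hypothetical per-hour rates; not taken from the deck.
RATES = {"vcpu_hour": 0.02, "gb_ram_hour": 0.01, "gb_disk_hour": 0.0001}


def instance_charge(vcpus, memory_gb, storage_gb, hours, rates=RATES):
    """Charge for one instance over the hours it was provisioned."""
    hourly = (vcpus * rates["vcpu_hour"]
              + memory_gb * rates["gb_ram_hour"]
              + storage_gb * rates["gb_disk_hour"])
    return round(hourly * hours, 2)


def department_bill(instances, rates=RATES):
    """Sum charges per department.

    instances: list of dicts with keys
    department, vcpus, memory_gb, storage_gb, hours.
    """
    bill = {}
    for inst in instances:
        charge = instance_charge(inst["vcpus"], inst["memory_gb"],
                                 inst["storage_gb"], inst["hours"], rates)
        bill[inst["department"]] = round(
            bill.get(inst["department"], 0.0) + charge, 2)
    return bill


if __name__ == "__main__":
    kept_all_month = instance_charge(2, 4, 100, hours=720)
    released_after_3_days = instance_charge(2, 4, 100, hours=72)
    print(kept_all_month, released_after_3_days)
```

Because the charge scales with provisioned hours, releasing an idle instance lowers the bill directly, which is the behavior a chargeback policy is meant to shape.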

Page 30

Why is automating image management important?

Virtualization + Cloud Automation + Scale-out Applications = Virtual Server Explosion

Linear scaling of maintenance cost is not good enough

Pandora’s box problem: rate of growth of virtual servers >> rate of growth of IT budget

Image Management

Page 31

IBM Virtual Image Library

• Centralizes storage of reference images

• Index image content

• Check-in/out supports distributed environments

• Version numbering

• Search and compare

• Deep analytics on content

IBM’s Common Cloud Stack

IBM Workload Deployer

Image Management

Page 32

RC2 Cloud Applications

• Rational Team Concert
• IBM Systems Director
• DevOps
• Rational Application Developer
• VLSI
• IBM Power9 Development
• IBM Systems Software
• Crunch Day

Page 33

Measuring Our Progress

RC2 Efficiencies over Physical Machine Management
• One time 65%
• Recurring 49%

[Chart: hours per task for Physical Machine (P) vs. Virtual Machine (V), grouped into One Time and Recurring tasks. Tasks shown: Order & Approval; Receiving/Delivery; Registration (MAD, eAMT); Install & Configure; Security Scan & Findings; Patch -> Rescan; KCO Selection; Physical Audit; Compliance Management Audit; Re-Image]
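The efficiency percentages on this slide can plausibly be read as hours saved relative to physical machine management. The sketch below assumes that definition (1 minus virtual hours divided by physical hours, summed over tasks); the deck does not define the metric, and the hours used here are made up for illustration rather than taken from the chart.

```python
def efficiency(physical_hours, virtual_hours):
    """Fraction of effort saved by managing a VM instead of a physical machine.

    Both arguments map task name -> hours. Assumed definition:
    1 - (total virtual hours / total physical hours).
    """
    total_physical = sum(physical_hours.values())
    total_virtual = sum(virtual_hours.values())
    if total_physical <= 0:
        raise ValueError("physical hours must sum to a positive value")
    return 1 - total_virtual / total_physical


if __name__ == "__main__":
    # Illustrative hours only; not the values behind the slide's chart.
    physical = {"Install & Configure": 8.0, "Security Scan & Findings": 2.0}
    virtual = {"Install & Configure": 3.0, "Security Scan & Findings": 1.0}
    print(f"{efficiency(physical, virtual):.0%}")
```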

Page 34

Agenda

Setting the Context
– IBM Research
– Research Integrated Solutions
– Research IS Innovation

History of the Research Compute Cloud (RC2) Project
– RC2 Journey
– RC2 Today
  • Architecture
  • Monitoring, Management, Statistics, Metrics
  • Applications

Lessons Learned During Transformation to Cloud
– Technical challenges
– Cultural challenges

Page 35

Cloud Lessons: Technical

Cloud management solutions are complex
– Do not assume these are easy to implement, unless using pre-packaged appliances.

Networks are the enabler as well as the inhibitor!
– Early verification of network viability for Cloud services delivery is vital, especially when Cloud spans beyond the data centre.

Build to the lowest denominator
– Build the basics starting with IaaS capability and move up the stack to PaaS and SaaS offerings.

Don’t forget the development and test environments
– Testing must be done, as for any new implementation

A top-down approach (green field infrastructure) will achieve greater benefits
– Higher levels of service standardisation possible by designing from top-down.
– Avoids “legacy” infrastructure & processes which may constrain the “purity” of Cloud services.

Take a workload based approach
– Understanding workloads and how these map to the cloud is key to a successful implementation

Development & Test workloads are ‘low hanging fruit’
– 50-60% of IT spend covers non-production systems, which suffer from low utilisation, high cost and many cycles of deployment. These attributes align extremely well with cloud.

New aspects of Cloud do need to be carefully planned
– (Cloud) Service definition, quality of service, evolution of the service, service catalogue, and service life cycle need to be well defined and designed.
– Clarity in use-cases, service catalogue and non-functional requirements is fundamental to success.

Page 36

Cloud Lessons: Cultural

Changing The Way People Work Takes Time and Patience!

• Centralize the IT capital management process
  – moved from department / projects control to IT control
  – fair & rapid exception process

• Centralize the IT staff
  – system administrators under a single group

• Education
  – brown bag lunch series
  – IT web / wiki pages explaining rationale
  – Department Outreach (Single Points of Contact for communication)

• Development Team Leadership
  – Early adopters self-identify, sign up early, help create the environment, become advocates
  – Commit to / execute process
  – Continuous socialization of successes
  – Rapid response to problems / requests for help

• Issue Kevlar Uniforms to Cloud Team
  – They are learning too, provide them cover!

Page 37

Fundamental Security Challenges

What is unique about cloud computing security?
– Loss of physical ownership – “technological, cultural and psychological issue”
  • Redefines boundaries of IT infrastructure, redefines “insider attacks”
– Scale; many VMs, few system administrators; mis-configuration
– Complexity of reasoning and optimization: multiple layers & constraints
  • Complexity implies the need for a framework to manage security
– Data loss risk: concentration of computing and data magnifies risk

Mission critical workloads and sensitive business data will not migrate to the cloud unless customers are convinced that the cloud offers security and compliance guarantees that are equivalent to or better than what they can provide with physical systems.

Security and, in particular, authentication, access control, isolation management, integrity management and image management are key enabling technologies for cloud computing.

Research Topic

Page 38

For More Information

IBM Research - http://www.research.ibm.com

Lorraine M. Herger

Director

Research Integrated Solutions

[email protected]