Middleware Evolution: from Grids to Clouds, a non-HEP Perspective. Dr. Sebastien Goasguen, Clemson University


DESCRIPTION

Plenary talk from CHEP 2012 conference.

TRANSCRIPT

Page 1: Chep2012

Middleware Evolution:

from Grids to Clouds, a non-HEP

Perspective.Dr. Sebastien Goasguen

Clemson University

Page 2:

A non-exhaustive Middleware timeline...

1999

Purdue University Network Computing Hubs (PUNCH)
Kapadia, Fortes, Lundstrom, Figueiredo et al.

• Powered nanoHUB
• Virtual file system
• Shadow accounts
• Access to batch queues
• Interactive applications via VNC

Page 3:

A Middleware timeline...

1999

Purdue University Network Computing Hubs (PUNCH)

• Powered nanoHUB
• Virtual file system
• Shadow accounts
• Access to batch queues
• Interactive applications via VNC

2001

The Grid

Page 4:

The Grid

Page 5:

“Anatomy of the Grid”

• “Why do we also consider application programming interfaces (APIs) and software development kits (SDKs)? There is, of course, more to VOs than interoperability, protocols, and services. Developers must be able to develop sophisticated applications in complex and dynamic execution environments. Users must be able to operate these applications. Application robustness, correctness, development costs, and maintenance costs are all important concerns. Standard abstractions, APIs, and SDKs can accelerate code development, enable code sharing, and enhance application portability. APIs and SDKs are an adjunct to, not an alternative to, protocols.”

• “In summary, our approach to Grid architecture emphasizes the identification and definition of protocols and services, first, and APIs and SDKs, second.”

• “The Anatomy of the Grid”, Foster, Kesselman, Tuecke, 2001

Page 6:

A Middleware timeline...

1999

PUNCH
• Powered nanoHUB
• Virtual file system
• Shadow accounts
• Access to batch queues
• Interactive applications via VNC

2001

The Grid

2003

InVIGO
• Virtual file system
• Virtual machines
• Overlay networks
• Access to batch queues
• Interactive applications via VNC

Page 7:

InVIGO

• Fortes and Figueiredo, circa 2004/2005

• Virtual machines, virtual file system, virtual networks

• In 2012: only ViNe and IPOP remain... maybe because InVIGO was created as a single system rather than a composition of services with multiple providers.

Page 8:

Virtualization

• Create an isolated and portable execution environment which:

  • Guarantees execution

  • Isolates users

  • Hides WAN complexities

  • Delivers the data where it is needed

  • Deploys applications on-demand

Page 9:

A Middleware timeline...

1999

PUNCH

2001

The Grid

2003

InVIGO

2004

Dynamic Virtual Environments
Kate Keahey, mother of Nimbus

Page 10:

A Middleware timeline...

1999

PUNCH

2001

The Grid

2003

InVIGO

2004

DVE

2008-2012

Eucalyptus (UCSB), Nimbus (UC), OpenNebula (Madrid), OpenStack, CloudStack

Page 11:

A Middleware timeline...

1999

PUNCH

2001

The Grid

2003

InVIGO

2004

DVE

2008-2012

Eucalyptus (UCSB), Nimbus (UC), OpenNebula (Madrid), OpenStack, CloudStack

Virtual Organization Clusters

Clusters of Virtual Machines provisioned on the grid to create a personal Condor cluster.
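A Virtual Organization Cluster boils down to submitting the VM itself as a batch job. A hedged sketch of generating an HTCondor VM-universe submit description; the function name and attribute values are illustrative, not taken from the talk:

```python
def vm_submit_description(image_path, memory_mb=1024, count=1):
    """Build a Condor VM-universe submit file that boots VMs as batch jobs.

    Illustrative sketch only: assumes KVM on the worker nodes and a qcow2
    image; a real VOC deployment would add contextualization, networking, etc.
    """
    return "\n".join([
        "universe = vm",               # run a virtual machine, not an executable
        "vm_type = kvm",               # hypervisor assumed on the worker nodes
        f"vm_memory = {memory_mb}",
        f"vm_disk = {image_path}:vda:w",
        f"queue {count}",              # one queued job per VM in the cluster
    ])

print(vm_submit_description("/images/voc-worker.qcow2", count=4))
```

Each queued "job" that starts then contributes one VM to the personal cluster.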


Page 13:

Google Trends

• Cloud computing trending down, while “Big Data” is booming. Virtualization remains “constant”.

VOCs

Page 14:

Careful, Head Winds Ahead

• Cloud Computing going down into the “Trough of Disillusionment”

• “Big Data” on the Technology Trigger

Page 15:

Clouds are in Production

• Amazon Web Services (AWS), reported to reach a $1B business in 2012.

• http://www.geekwire.com/2011/amazon-web-services-billiondollar-business/

• Zynga reportedly spent $100M per year on AWS, then moved to their own cloud (zCloud). They used to deploy on EC2 and do reverse cloud bursting; now they are “owning the base and renting the peak”. Zynga can add as many as 1,000 new servers to accommodate a surge of users in a 24-hour period, and the company’s servers can deliver a petabyte of data to users each day.

• http://www.wired.com/cloudline/2012/03/zynga-zcloud/
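The “owning the base and renting the peak” economics can be made concrete with back-of-the-envelope arithmetic; all prices and server counts below are invented for illustration:

```python
def monthly_cost(base_servers, peak_servers,
                 owned_cost=150.0,      # assumed $/month per owned server
                 rented_cost=400.0):    # assumed $/month per rented cloud server
    """Cost of owning the steady base load and renting only the peak burst."""
    burst = peak_servers - base_servers
    return base_servers * owned_cost + burst * rented_cost

# Rent all 1,000 servers vs. own 800 and rent the 200-server peak.
all_rented = 1000 * 400.0
hybrid = monthly_cost(base_servers=800, peak_servers=1000)
print(hybrid < all_rented)   # True under these assumed prices
```

Whether the hybrid wins depends entirely on the owned-vs-rented price ratio and how spiky the load is; the point is only that a steady base load amortizes well on owned hardware.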

Page 16:

Proven scalability

• CycleCloud provisioned 50,000 cores on EC2 (April 2012).

• LXCLOUD@CERN demonstrated management of 16,000 virtual machines using OpenNebula (summer 2010).

• CloudStack (now an Apache incubator project) planning scalability to 50,000 hypervisors by the end of 2012.

• Hadoop scales to 30 PB (at Facebook, ~March 2011).

• In Q1 2012, Amazon S3 held 905 billion objects, routinely accessed at 650,000 requests per second.

Page 17:

Yet, we do not seem to embrace it

• 50 hosts in LXCLOUD running VMs for batch processing

• FermiCloud, for internal use only, ~200 VMs

• Pales in comparison to the scale seen in industry

Thanks to Ulrich Schwickerath and Steve Timm for the figures. Disclaimer: the opinion is not theirs :)

Page 18:

A clue from Industry

• KPMG Survey, “Clarity in the Cloud: Business Adoption”

“I don’t believe everyone yet fully realizes how much this stimulates innovation, how many opportunities will be presented, how many new challenges will need to be addressed, and how much change is coming”

Pat Howard, VP Global Services, IBM Partner

Page 19:

SaaS: Globus Online (GO)

• 1.8 PB transferred in the last 6 months

• GO on ESnet: 643 TB in the last 6 months, 25 sites exceeded 3 Gbps.

• GO for XSEDE: 607 TB transferred, 4 sites exceeded 3 Gbps.

• Leveraged Cloud APIs to provide a new service

Thanks to Raj Kettimuthu and Lee Liming for the data

Page 20:

PaaS

• Azure

• Amazon Elastic Beanstalk

• Heroku, PaaS for Facebook applications

• OpenShift, now open source (May 2012)

PaaS has not really seen much success in the scientific community, but could be used to create new types of scalable applications.

Page 21:

A PaaS for personal Condor and GO

• Globus Provision by Borja Sotomayor

• http://www.globus.org/provision/

[general]
deploy: ec2
domains: simple

[domain-simple]
users: gp-user
gridftp: yes
nis: yes
filesystem: nfs
condor: yes
condor-nodes: 4
go-endpoint: go-user#gp-test
go-auth: go

[ec2]
keypair: gp-key
keyfile: ~/.ec2/gp-key.pem
username: ubuntu
ami: latest-32bit
instance-type: t1.micro

[globusonline]
ssh-key: ~/.ssh/id_rsa

This solves my problem from 1999, when I wanted a batch farm at hand... utility computing
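The slide's configuration is plain INI and can be inspected with Python's standard configparser; a minimal sketch, using only keys that appear on the slide:

```python
from configparser import ConfigParser

# Subset of the Globus Provision config shown on the slide.
GP_CONFIG = """
[general]
deploy: ec2
domains: simple

[domain-simple]
users: gp-user
condor: yes
condor-nodes: 4
go-endpoint: go-user#gp-test

[ec2]
keypair: gp-key
instance-type: t1.micro
"""

def parse_gp(text):
    # ConfigParser accepts "key: value" pairs out of the box; inline '#'
    # comments are disabled by default, so the '#' in go-endpoint survives.
    cp = ConfigParser()
    cp.read_string(text)
    return cp

cfg = parse_gp(GP_CONFIG)
print(cfg.get("general", "deploy"))                  # ec2
print(cfg.getint("domain-simple", "condor-nodes"))   # 4
```

One declarative file like this describes a whole on-demand Condor pool plus a GO endpoint, which is exactly the "batch farm at hand" point above.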

Page 22:

IaaS on OSG at Clemson

• Transform the Grid into a Cloud

• All sites deploy a hypervisor

• Provision VMs depending on jobs queued in batch

• Keep the normal grid workflow

• Move to Cloud by offering an “EC2” interface

VTDC08, PDP09, CCGRID09, ICAC10, JGC10, FGCS10
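The “provision VMs depending on jobs queued” policy above amounts to a sizing rule driven by queue depth; a minimal sketch, where the function name, parameters, and defaults are illustrative rather than from the talk:

```python
import math

def vms_to_boot(queued_jobs, running_vms, slots_per_vm=8, max_vms=100):
    """Decide how many new VMs to provision for jobs waiting in the batch queue.

    Illustrative policy: boot enough VMs to cover the queue, capped by a
    per-site VM limit, and never tear anything down here.
    """
    wanted = math.ceil(queued_jobs / slots_per_vm)  # VMs needed to drain the queue
    target = min(wanted, max_vms)                   # respect the site's cap
    return max(0, target - running_vms)             # only boot the shortfall

print(vms_to_boot(queued_jobs=100, running_vms=3))  # 13 wanted, 3 running -> 10
```

A real provisioner would also handle VM boot latency and scale-down, but the grid workflow stays unchanged: jobs still just enter the batch queue.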

Page 23:

First Gen IaaS@CU

• Campus firewall

• NATed cluster

• Fear of VMs: no bridged network, no NAT even, only userland networking on the hypervisors

• Meant:

  • Developed a pull-based task dispatcher (Kestrel, XMPP based, used by STAR)

  • Created images in a DMZ (see Tony Cass’s talk for VM exchange, building trust in VM provenance, HEPiX working group on virtualization)

  • Started VMs as regular batch jobs

  • No interactive access
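A pull-based dispatcher like Kestrel inverts the usual push model: workers ask for work when they are free, so no inbound connection to a worker is ever needed, which is what makes it friendly to NAT and firewalls. A minimal in-process sketch of the idea with invented names (the real Kestrel speaks XMPP between VMs, not an in-memory queue):

```python
import queue
import threading

def worker(tasks, results):
    # Each worker pulls its next task; nothing is ever pushed to it,
    # so it would work fine from behind NAT or a campus firewall.
    while True:
        task = tasks.get()
        if task is None:           # poison pill: shut down cleanly
            return
        results.put(task * 2)      # stand-in for running a real job

def run_jobs(jobs, n_workers=4):
    tasks, results = queue.Queue(), queue.Queue()
    threads = [threading.Thread(target=worker, args=(tasks, results))
               for _ in range(n_workers)]
    for t in threads:
        t.start()
    for j in jobs:
        tasks.put(j)
    for _ in threads:              # one poison pill per worker
        tasks.put(None)
    for t in threads:
        t.join()
    return sorted(results.queue)

print(run_jobs([1, 2, 3, 4]))      # [2, 4, 6, 8]
```

The design choice matters: the dispatcher never needs to know a worker's address, only the workers need to reach the dispatcher.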

Page 24:

STAR with Kestrel

• http://wiki.github.com/legastero/Kestrel/

• Built to deal with Clemson’s “adverse” networking environment

• Started as a student project based on the idea that XMPP (Jabber) was a scalable, production-proven messaging protocol.

• Run an IM client in each VM and send IM messages to manage jobs

• All VM instances are buddies on a Jabber server

Lots of French nonsense and then: “...To simulate the equivalent sample of 12.2 billion Monte-Carlo events, with ~10 million accepted by event triggering after full event reconstruction, we would have taken 3 years at BNL on 50 machines. This Monte-Carlo event generation would essentially not have been done. With the resources from the cloud, we took 3-4 weeks.”

– Jerome Lauret, BNL, STAR

Page 25:

Second Gen IaaS@CU

• Sub-interface created on all nodes (only one NIC per node)

• VLAN provisioned to isolate VM traffic

• Bridged networking enabled; VMs get addresses via DHCP

• Demonstrated scale of thousands of VMs.

• OpenNebula provisioning + Cumulus S3 storage to upload images.

Page 26:

OneCloud

• OpenNebula-based IaaS at Clemson, developed through the NSF ExTENCI project (OSG+XSEDE)

• Used by STAR (see ACAT 2010, CHEP 2010)

• CERNVM can be used as a client or a batch image.

• https://sites.google.com/site/cuonecloud/

Page 27:

Cloud back to Networking: OpenFlow

• OneCloud integrates OpenFlow to provide dynamic network services to solve NAT and firewall issues. Developed an implementation of Amazon Security Groups and Elastic IP using OpenFlow. This avoids complex and failure-prone network overlays; once rules are set, the switch operates at line rate.

• Software Defined Networking aims at putting the control plane of the network in the hands of developers. It opens the door for network-aware applications and dynamic network topologies according to load. A low-level API for the network: we can now program the network.

• Google announced at the Open Networking Summit (http://opennetsummit.org/) that their network runs using OpenFlow:

• http://www.eetimes.com/electronics-news/4371179/Google-describes-its-OpenFlow-network

• There is work to support MPLS with OpenFlow, so that any OpenFlow switch could be used as an MPLS switch. This means that OSCARS could be used to provision circuits on an OpenFlow network.
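The security-group idea mentioned above reduces to a default-deny rule table consulted per flow; matching flows get a forwarding rule, everything else is dropped. A toy Python sketch of the matching logic, with a rule format and names invented for illustration (this is not OneCloud's actual implementation):

```python
import ipaddress

# A security group as a list of allow rules: (protocol, port, source CIDR).
# Default policy is deny, as in EC2 security groups.
SSH_AND_WEB = [
    ("tcp", 22, "10.0.0.0/8"),     # SSH from the campus network only
    ("tcp", 80, "0.0.0.0/0"),      # HTTP from anywhere
]

def allowed(rules, proto, port, src_ip):
    """Return True if any allow rule matches this flow; otherwise drop."""
    src = ipaddress.ip_address(src_ip)
    for r_proto, r_port, r_cidr in rules:
        if (r_proto == proto and r_port == port
                and src in ipaddress.ip_network(r_cidr)):
            return True            # an OpenFlow rule would forward this flow
    return False                   # no match: drop (default deny)

print(allowed(SSH_AND_WEB, "tcp", 22, "10.1.2.3"))   # True
print(allowed(SSH_AND_WEB, "tcp", 22, "8.8.8.8"))    # False
```

In an OpenFlow deployment the same decision is compiled into flow-table entries on the switch, so after the first packet of a flow the matching happens in hardware at line rate.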

Page 28:

3rd Gen IaaS@CU

• Move OneCloud into the “ESnet” Science DMZ

• Deploy PaaS with OpenShift

• 100 Gbps link

• 10 Gbps to Amazon via the I2 commercial peering service

• Fully configurable via OpenFlow and maybe OSCARS...

• Provide on-demand resources and on-demand data paths.

See: ARCHSTONE + VNOD, DOE ASCR funded projects
Dimitrios Katramatos (BNL)

Page 29:

Conclusions

• Virtualization has matured to the point of seeing fruitful competition in Cloud IaaS solutions (both academically and in industry)

• Clouds (and APIs) give us great agility to create new services for the community, reduce the “time to use” of these services, and sustain scale.

• Clouds are probably fulfilling the true vision of Grids

• Advanced VM provisioning and network services mean that on-demand, elastic data centers are possible today.

• This work was possible through support from NSF OCI-0753335, OCI-1007115 and BMW

Page 30:

Questions?

[email protected]

• http://sites.google.com/site/runseb