chep2012
DESCRIPTION
Plenary talk from the CHEP 2012 conference.

TRANSCRIPT
Middleware Evolution: from Grids to Clouds, a non-HEP Perspective
Dr. Sebastien Goasguen
Clemson University
A non-exhaustive Middleware timeline...
1999
Purdue University Network Computing Hubs (PUNCH), Kapadia, Fortes, Lundstrom, Figueiredo et al.
• Powered nanoHUB
• Virtual file system
• Shadow accounts
• Access to batch queues
• Interactive applications via VNC
A Middleware timeline...
1999
PUNCH
2001
The Grid
The Grid
“Anatomy of the Grid”
• “Why do we also consider application programming interfaces (APIs) and software development kits (SDKs)? There is, of course, more to VOs than interoperability, protocols, and services. Developers must be able to develop sophisticated applications in complex and dynamic execution environments. Users must be able to operate these applications. Application robustness, correctness, development costs, and maintenance costs are all important concerns. Standard abstractions, APIs, and SDKs can accelerate code development, enable code sharing, and enhance application portability. APIs and SDKs are an adjunct to, not an alternative to, protocols.”
• “In summary, our approach to Grid architecture emphasizes the identification and definition of protocols and services, first, and APIs and SDKs, second.”
• “The Anatomy of the Grid”, Foster, Kesselman, and Tuecke, published in 2001
A Middleware timeline...
1999
PUNCH
2001
The Grid
2003
InVIGO
• Virtual file system
• Virtual machines
• Overlay networks
• Access to batch queues
• Interactive applications via VNC
InVIGO
• Fortes and Figueiredo, circa 2004/2005
• Virtual machines, virtual file system, virtual networks
• In 2012, only ViNe and IPOP remain... maybe because it was created as a single system rather than a composition of services with multiple providers.
Virtualization
• Create an isolated and portable execution environment which:
• Guarantees execution
• Isolates users
• Hides WAN complexities
• Delivers the data where it is needed
• Deploys applications on-demand
A Middleware timeline...
1999
PUNCH
2001
The Grid
2003
InVIGO
2004
Dynamic Virtual Environments (DVE)
Kate Keahey, mother of Nimbus
A Middleware timeline...
1999
PUNCH
2001
The Grid
2003
InVIGO
2004
DVE
2008-2012
Eucalyptus, UCSB
Nimbus, UC
OpenNebula, Madrid
OpenStack
CloudStack
A Middleware timeline...
1999
PUNCH
2001
The Grid
2003
InVIGO
2004
DVE
2008-2012
Eucalyptus, UCSB
Nimbus, UC
OpenNebula, Madrid
OpenStack
CloudStack
Virtual Organization Clusters
Clusters of virtual machines provisioned on the grid to create a personal Condor cluster.
Google Trends
• Cloud computing is trending down, while “Big Data” is booming. Virtualization remains “constant”.
VOCs
Careful, Head Winds Ahead
• Cloud computing is heading down into the “Trough of Disillusionment”
• “Big Data” is at the “Technology Trigger”
Clouds are in Production
• Amazon Web Services (AWS) is reported to reach $1B in revenue in 2012.
• http://www.geekwire.com/2011/amazon-web-services-billiondollar-business/
• Zynga reportedly spent $100M per year on AWS, then moved to its own cloud (zCloud). It used to deploy on EC2 and do reverse cloud bursting; now it is “owning the base and renting the peak”. Zynga can add as many as 1,000 new servers in a 24-hour period to accommodate a surge of users, and the company’s servers can deliver a petabyte of data to users each day.
• http://www.wired.com/cloudline/2012/03/zynga-zcloud/
Proven scalability
• CycleCloud provisioned 50,000 cores on EC2 (April 2012).
• LXCLOUD@CERN demonstrated management of 16,000 virtual machines using OpenNebula (summer 2010).
• CloudStack (now an Apache Incubator project) plans scalability to 50,000 hypervisors by the end of 2012.
• Hadoop scales to 30 PB (at Facebook, ~March 2011).
• In Q1 2012, Amazon S3 held 905 billion objects, routinely accessed at 650,000 requests per second.
Yet, we do not seem to embrace it
• 50 hosts in LXCLOUD running VMs for batch processing
• FermiCloud, for internal use only, ~200 VMs
• Pales in comparison to the scale seen in industry
Thanks to Ulrich Schwickerath and Steve Timm for the figures. Disclaimer: the opinion is not theirs :)
A clue from industry
• KPMG Survey, “Clarity in the Cloud: Business Adoption”
“I don’t believe everyone yet fully realizes how much this stimulates innovation, how many opportunities will be presented, how many new challenges will need to be addressed, and how much change is coming”
– Pat Howard, VP Global Services, IBM Partner
SaaS
• 1.8 PB transferred in the last 6 months
• GO (Globus Online) on ESnet: 643 TB in the last 6 months; 25 sites exceeded 3 Gbps.
• GO for XSEDE: 607 TB transferred; 4 sites exceeded 3 Gbps.
• Leveraged cloud APIs to provide a new service
Thanks to Raj Kettimuthu and Lee Liming for the data.
PaaS
• Azure
• AWS Elastic Beanstalk
• Heroku, PaaS for Facebook applications
• OpenShift, now open source (May 2012)
PaaS has not really seen much success in the scientific community, but could be used to create new types of scalable applications.
A PaaS for personal Condor and GO
• Globus Provision by Borja Sotomayor
• http://www.globus.org/provision/

[general]
deploy: ec2
domains: simple

[domain-simple]
users: gp-user
gridftp: yes
nis: yes
filesystem: nfs
condor: yes
condor-nodes: 4
go-endpoint: go-user#gp-test
go-auth: go

[ec2]
keypair: gp-key
keyfile: ~/.ec2/gp-key.pem
username: ubuntu
ami: latest-32bit
instance-type: t1.micro

[globusonline]
ssh-key: ~/.ssh/id_rsa
This solves my problem from 1999, when I wanted a batch farm at hand... utility computing.
IaaS on OSG at Clemson
• Transform the Grid into a Cloud
• All sites deploy a hypervisor
• Provision VMs depending on jobs queued in batch
• Keep the normal grid workflow
• Move to Cloud by offering an “EC2” interface
VTDC08, PDP09, CCGRID09, ICAC10, JGC10, FGCS10
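The "provision VMs depending on jobs queued in batch" idea above boils down to a simple control policy. The sketch below is an illustration only, not the actual Clemson implementation; the function name, the slots-per-VM ratio, and the cap are all assumptions.

```python
# Queue-driven provisioning policy (hypothetical sketch): given the batch
# queue depth and the VMs already running, decide how many VMs to start
# (positive result) or stop (negative result).

def vms_to_adjust(queued_jobs, running_vms, slots_per_vm=8, max_vms=100):
    """Return the number of VMs to start (>0) or stop (<0)."""
    needed = -(-queued_jobs // slots_per_vm)  # ceiling division
    target = min(needed, max_vms)             # never exceed the site cap
    return target - running_vms

# 50 queued jobs, 2 VMs running, 8 slots per VM -> ceil(50/8)=7 needed
print(vms_to_adjust(50, 2))   # 5: start five more VMs
print(vms_to_adjust(0, 3))    # -3: queue drained, stop all three
```

In the real system the result would drive calls to the site's hypervisor or cloud API; a cron job or daemon would re-evaluate this on every scheduling cycle.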
First Gen IaaS@CU
• Campus firewall
• NATed cluster
• Fear of VMs: no bridged network, not even NAT, only userland networking on the hypervisors
• Which meant:
• Developed a pull-based task dispatcher (Kestrel, XMPP-based, used by STAR)
• Created images in a DMZ (see Tony Cass’s talk for VM exchange, building trust in VM provenance, and the HEPiX working group on virtualization)
• Started VMs as regular batch jobs
• No interactive access
STAR with Kestrel
• http://wiki.github.com/legastero/Kestrel/
• Built to deal with Clemson’s “adverse” networking environment
• Started as a student project based on the idea that XMPP (Jabber) was a scalable, production-proven messaging protocol
• Run an IM client in the VM and send IM messages to manage jobs
• All VM instances are buddies on a Jabber server
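The "buddies on a Jabber server" model above can be captured in a few lines: workers advertise XMPP presence, and the dispatcher hands each available worker a queued task. This is a plain in-memory sketch of the pull-based pattern, not Kestrel's actual wire protocol; the class and method names are invented.

```python
# Hypothetical model of pull-based dispatch over XMPP presence:
# worker VMs that are "available" buddies get handed queued tasks.

from collections import deque

class Dispatcher:
    def __init__(self):
        self.tasks = deque()     # FIFO of pending job commands
        self.available = set()   # JIDs of workers whose presence is "available"

    def on_presence(self, jid, status):
        """Presence handler: track which worker VMs can take work."""
        if status == "available":
            self.available.add(jid)
        else:
            self.available.discard(jid)

    def submit(self, task):
        self.tasks.append(task)

    def dispatch(self):
        """Pair queued tasks with available workers; each pair would be
        sent as an XMPP message in the real system."""
        sent = []
        while self.tasks and self.available:
            jid = self.available.pop()  # worker is busy until it re-announces
            sent.append((jid, self.tasks.popleft()))
        return sent

d = Dispatcher()
d.on_presence("vm1@jabber.example", "available")
d.on_presence("vm2@jabber.example", "available")
d.submit("run_star_mc --events 1000")
d.submit("run_star_mc --events 2000")
print(d.dispatch())  # two (worker, task) pairs
```

The appeal of building on presence is that worker liveness comes for free: a VM that dies simply drops off the roster and stops receiving work.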
Lots of French nonsense, and then: “...To simulate the equivalent sample of 12.2 billion Monte Carlo events, with ~10 million accepted by event triggering after full event reconstruction, would have taken 3 years at BNL on 50 machines. This Monte Carlo event generation would essentially not have been done. With the resources from the cloud, we took 3-4 weeks.”
– Jerome Lauret, BNL, STAR
Second Gen IaaS@CU
• Sub-interface created on all nodes (only one NIC per node)
• VLAN provisioned to isolate VM traffic
• Bridged networking enabled; VMs get addresses via DHCP
• Demonstrated thousand-VM scale
• OpenNebula provisioning + Cumulus S3 storage to upload images
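The sub-interface plus VLAN plus bridge setup described above can be sketched with iproute2 commands. The interface name, VLAN ID, and bridge name here are assumptions for illustration, not the actual Clemson configuration.

```shell
# Hypothetical setup on a hypervisor with a single NIC (eth0),
# isolating VM traffic on VLAN 100 behind a bridge.
ip link add link eth0 name eth0.100 type vlan id 100  # VLAN sub-interface
ip link add br100 type bridge                         # bridge for VM traffic
ip link set eth0.100 master br100                     # enslave the sub-interface
ip link set eth0.100 up
ip link set br100 up
# VMs are then attached to br100 and pick up addresses via DHCP
# served on the VLAN.
```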
OneCloud
• OpenNebula-based IaaS at Clemson, developed through the NSF ExTENCI project (OSG+XSEDE)
• Used by STAR (see ACAT 2010, CHEP 2010)
• CERNVM can be used as a client or a batch image
• https://sites.google.com/site/cuonecloud/
Cloud back to Networking: OpenFlow
• OneCloud integrates OpenFlow to provide dynamic network services that solve NAT and firewall issues. We developed an implementation of Amazon Security Groups and Elastic IP using OpenFlow. This avoids complex and failure-prone network overlays; once the rules are set, the switch operates at line rate.
• Software Defined Networking aims to put the control plane of the network in the hands of developers. It opens the door to network-aware applications and network topologies that change dynamically with load. It is a low-level API for the network: we can now program the network.
• Google announced at the Open Networking Summit (http://opennetsummit.org/) that their network runs on OpenFlow:
• http://www.eetimes.com/electronics-news/4371179/Google-describes-its-OpenFlow-network
• There is work to support MPLS with OpenFlow, so that any OpenFlow switch could be used as an MPLS switch. This means that OSCARS could be used to provision circuits on an OpenFlow network.
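To make the security-groups point concrete, here is a sketch of how a group of inbound rules can be compiled into OpenFlow-style flow entries. The rule format, function, and field names (loosely following OpenFlow 1.0 match fields) are hypothetical; the actual OneCloud implementation is not shown in this talk.

```python
# Hypothetical compiler from security-group rules to OpenFlow-style flow
# entries: one "allow" flow per rule, then a low-priority drop-all flow
# (default-deny inbound). A controller would push these to the switch,
# after which packets are matched at line rate with no overlay in the path.

def security_group_to_flows(vm_ip, rules):
    flows = []
    for prio, rule in enumerate(rules, start=1):
        flows.append({
            "priority": 100 + prio,
            "match": {"nw_dst": vm_ip,
                      "nw_proto": rule["proto"],
                      "tp_dst": rule["port"],
                      "nw_src": rule.get("cidr", "0.0.0.0/0")},
            "actions": ["output:vm_port"],
        })
    # Catch-all: anything not explicitly allowed is dropped.
    flows.append({"priority": 1,
                  "match": {"nw_dst": vm_ip},
                  "actions": []})  # empty action list = drop
    return flows

web_sg = [{"proto": "tcp", "port": 22, "cidr": "10.0.0.0/8"},
          {"proto": "tcp", "port": 80}]
flows = security_group_to_flows("172.16.5.9", web_sg)
print(len(flows))  # 3: two allow rules plus the default drop
```

Because the allow/deny logic lives in the switch's flow table rather than in a userland overlay, failure modes shrink to the controller push itself.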
3rd Gen IaaS@CU
• Move OneCloud into the “ESnet” Science DMZ
• Deploy PaaS with OpenShift
• 100 Gbps link
• 10 Gbps to Amazon via the Internet2 commercial peering service
• Fully configurable via OpenFlow and maybe OSCARS...
• Provide on-demand resources and on-demand data paths
See: ARCHSTONE + VNOD, DOE ASCR-funded projects
Dimitrios Katramatos (BNL)
Conclusions
• Virtualization has matured to the point of fruitful competition among cloud IaaS solutions (both academic and industrial)
• Clouds (and their APIs) give us great agility to create new services for the community, reduce the “time to use” of these services, and sustain scale
• Clouds are probably fulfilling the true vision of Grids
• Advanced VM provisioning and network services mean that on-demand, elastic data centers are possible today
• This work was made possible through support from NSF OCI-0753335, OCI-1007115, and BMW