
Post on 19-Dec-2015


Page 1: Facultatea de Automatica si Calculatoare Universitatea “Politehnica“ din Bucuresti Current trends in Grid computing Dobre Ciprian Mihai cipsm {at} cs.pub.ro

Facultatea de Automatica si Calculatoare

Universitatea “Politehnica“ din Bucuresti

Current trends in Grid computing

Dobre Ciprian Mihai

cipsm {at} cs.pub.ro

“The Internet is about getting computers to talk together; grid computing is about getting computers to work together.”

(from IBM’s Grid definition)

Page 2

Outline of the presentation

• What is Grid computing – sorting out the alphabet soup.
• Impact of Grid computing to science.
• CERN as a driving force in Grid computing.
• Grids – where to?

Page 3

What is Grid ?

Many definitions of Grid computing

Term coined as analogy to electrical power grid

According to Ian Foster, the “father of grid computing”, the term grid has been hijacked to “embrace everything from advanced networking to artificial intelligence”

Marketers are applying grid labels to all sorts of products and services, adding to the confusion and hype

“From the wide ranging definitions of Grid, to the volume of standards bodies and organizations -- it can be a real challenge to distinguish the significant developments from the hype.” (Ian Foster, 2005)

               | Electrical Power Grid | Computing Grid
Pervasive      | Everywhere, wall socket | Everywhere, any “net-thing”
Transparent    | Power “just happens” | Power “just happens”
Infrastructure | Power stations, transformers, powerlines, transmission hubs | CPUs, servers, storage, networks, archives, middleware
Utility        | Pay-for-use, accounting, reporting, settlement | Pay-for-use, accounting, reporting, metering

Page 4

Ian Foster’s Evolving definitions

“A computational grid is a hardware and software infrastructure that provides dependable, pervasive, and inexpensive access to high-end computing capabilities”

Ian Foster and Carl Kesselman, editors, “The GRID: Blueprint for a New Computing Infrastructure”, Morgan Kaufmann Publishers, 1999.

“The grid infrastructure consists of protocols, application programming interfaces, and software development kits to provide authentication, authorization, and resource location/access”

2001: Foster, Kesselman, Tuecke: “The Anatomy of the Grid: Enabling Scalable Virtual Organizations”, http://www.globus.org/research/papers.html

“The grid integrates services across distributed, heterogeneous, dynamic ‘virtual organizations’ formed from the disparate resources within a single enterprise and/or from external resource sharing and service provider relationships in both e-business and e-science”

2002: Foster, Kesselman, Nick, Tuecke: “The Physiology of the Grid”, http://www.globus.org/research/papers/ogsa.pdf

GGF: “A system that is concerned with the integration, virtualization, and management of services and resources in a distributed, heterogeneous environment that supports collections of users and resources (virtual organizations) across traditional administrative and organizational domains (real organizations).”

CoreGRID: “A fully distributed, dynamically reconfigurable, scalable and autonomous infrastructure to provide location independent, pervasive, reliable, secure and efficient access to a coordinated set of services encapsulating and virtualizing resources (computing power, storage, instruments, data, etc.) in order to generate knowledge.”

IDC: “Set of independent computers combined into unified system through systems software and networking technologies”

Gartner: “a collection of resources owned by multiple organizations coordinated in such a way as to allow them to solve a single common problem.”

IBM: “ability, using a set of open standards and protocols, to gain access to applications and data, processing power, storage capacity, and a vast array of other computing resources over the Internet”

• Grid computing is a network of computation: tools and protocols for coordinated resource sharing and problem solving among pooled assets
• Application processing, distributed across multiple locations, and interconnected through a shared network such as the Internet

SUN: “A way of managing and dynamically sharing disparate sets of resources”

• A hardware and software infrastructure that connects distributed computers, storage devices, databases, and software applications through a network, and is managed by distributed resource management software
• A dependable, universal information infrastructure that builds on the power of the Internet and enables more efficient computation, collaboration, and communication

“A Grid is a large, heterogeneous system that allows sharing and coordinating resources in a dependable and pervasive manner”

“A Grid is a large, heterogeneous system that coordinates resources spread over wide areas”

“A Grid is a heterogeneous system that allows multiple entities to share and use resources, under various administrative policies, offering a transparent access to the user”

“A Grid is a heterogeneous system spread over a wide geographical area, which allows multiple entities to share and use resources, under various administrative policies, offering a transparent access to the user, through the use of consistent access protocols and interfaces”

Page 5

Why so many definitions?

Computer science and software engineering sometimes do not have definitions as strict as those in the fields of physics or mathematics – this “lack of definitions” leads to many Grid researchers or people working with Grid technology having different views on what a Grid is.

Hardware discrepancies: for some, a local cluster with a middleware system on top is a Grid, whereas others believe that a wide-area network connection has to be involved.

Software problems: What actually makes a piece of software a “Grid software”? Is any kind of middleware using Grid security already Grid software?

Due to the recent advances in Web and Grid service technologies, where should one draw the line between Web services and Grid services?

Page 6

So what is Grid after all?

In this soup of Grid definitions, two have been widely accepted by the community:

I. Foster, Research view: “A Grid is a system that (1) coordinates resources that are not subject to centralized control (2) using standard, open, general-purpose protocols and interfaces (3) to deliver nontrivial qualities of service”

A. Grimshaw, Industry view: “From a hardware perspective, a Grid is a collection of distributed resources connected by a network. From a user perspective a Grid gathers together resources and makes them accessible in a secure manner to users and applications”

Page 7

Describing the elephant

A Grid infrastructure must provide a set of technical capabilities:

1. Resource modeling – describes available resources, their capabilities, and the relationships between them to facilitate discovery, provisioning, and quality of service management.

2. Monitoring and notification – provides visibility into the state of resources to enable discovery and maintain quality of service. Logging of significant events and state transitions is also needed to support accounting and auditing functions.

3. Allocation – assures quality of service across an entire set of resources for the lifetime of their use by an application. This is enabled by negotiating the required level(s) of service and ensuring the availability of appropriate resources through some form of reservation; essentially, the dynamic creation of a service-level agreement.

4. Provisioning, life-cycle management, and decommissioning – enables an allocated resource to be configured automatically for application use, manages the resource for the duration of the task at hand, and restores the resource to its original state for future use.

5. Accounting and auditing – tracks the usage of shared resources and provides mechanisms for transferring cost among user communities and for charging for resource use by applications and users.

6. In addition, security is an important aspect.

Foster, Tuecke, “Describing the elephant: the different faces of IT as services”, ACM Queue, 2005.
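
The capabilities listed on this slide can be illustrated with a toy in-memory model. This is only a sketch: the class names (`Resource`, `MiniGrid`), the first-fit allocation rule and the CPU-hour charging scheme are assumptions made for illustration, not part of any Grid toolkit or of the cited article.

```python
from dataclasses import dataclass, field


@dataclass
class Resource:
    """Resource modeling: a named resource and its CPU capacity."""
    name: str
    cpus: int
    allocated: int = 0

    @property
    def free(self) -> int:
        return self.cpus - self.allocated


@dataclass
class MiniGrid:
    resources: dict = field(default_factory=dict)
    events: list = field(default_factory=list)   # monitoring: log of state transitions
    usage: dict = field(default_factory=dict)    # accounting: user -> CPU-hours

    def register(self, resource: Resource) -> None:
        self.resources[resource.name] = resource
        self.events.append(("registered", resource.name))

    def allocate(self, user: str, cpus: int) -> str:
        """Allocation: reserve CPUs on the first resource with enough free capacity."""
        for res in self.resources.values():
            if res.free >= cpus:
                res.allocated += cpus
                self.events.append(("allocated", res.name, user, cpus))
                return res.name
        raise RuntimeError("no resource can satisfy the request")

    def release(self, user: str, name: str, cpus: int, hours: float) -> None:
        """Decommissioning and accounting: free the CPUs and charge the user."""
        res = self.resources[name]
        res.allocated -= cpus
        self.usage[user] = self.usage.get(user, 0.0) + cpus * hours
        self.events.append(("released", name, user, cpus))


# Hypothetical resources and user, for illustration only.
grid = MiniGrid()
grid.register(Resource("cluster-a", cpus=4))
grid.register(Resource("cluster-b", cpus=16))
site = grid.allocate("alice", cpus=8)   # cluster-a is too small, so cluster-b is chosen
grid.release("alice", site, cpus=8, hours=2.0)
```

A real infrastructure would negotiate service levels and enforce security around each of these calls; the sketch only shows how the five capabilities relate to one another.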

Page 8

The two key Grid computing groups

The Globus Alliance (www.globus.org)
• Composed of people from: Argonne National Labs, University of Chicago, University of Southern California Information Sciences Institute, University of Edinburgh and others.
• OGSA/I standards initially proposed by the Globus Group
• Based on the papers “Anatomy of the Grid” & “Physiology of the Grid”

The Global Grid Forum (www.ggf.org)
• First meeting in June of 1999, based on the IETF charter
• Heavy involvement of Academic Groups and Industry (e.g. IBM Grid Computing, HP, United Devices, Oracle, UK e-Science Programme, US DOE, US NSF, Indiana University, and many others)
• Meets three times annually
• Solicits involvement from industry, research groups, and academics

Page 9

More on Grids

The Grid relies on advanced software, called middleware, which ensures seamless communication between different computers and different parts of the world.

The Grid search engine finds the data the scientist needs, but also the data processing techniques and the computing power to carry them out.

It then distributes the computing task to wherever in the world there is spare capacity, and sends the result to the scientist.
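
The idea of sending a task to wherever there is spare capacity can be sketched as a simple selection rule. This is a minimal illustration, not how any particular Grid scheduler works: the function `pick_site`, the site names and the load figures are all hypothetical.

```python
def pick_site(sites: dict[str, dict[str, int]], cpus_needed: int) -> str:
    """Return the site with the most idle CPUs among those that can fit the job."""
    candidates = {
        name: info["total"] - info["busy"]
        for name, info in sites.items()
        if info["total"] - info["busy"] >= cpus_needed
    }
    if not candidates:
        raise LookupError("no site has enough spare capacity")
    return max(candidates, key=candidates.get)


# Hypothetical site names and load figures, for illustration only.
sites = {
    "site-a": {"total": 100, "busy": 95},
    "site-b": {"total": 400, "busy": 150},
    "site-c": {"total": 200, "busy": 20},
}
chosen = pick_site(sites, cpus_needed=50)   # site-b has the most idle CPUs (250)
```

Real middleware also has to weigh data location, access policies and queue lengths; spare CPU count is only the simplest possible criterion.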

Why use the Grids?

Industrial and academic partners form an “extended enterprise” in which resources are intrinsically distributed, and only partially shared.

Partners may be prepared to share data, but not the hardware and proprietary software that produces the data.

Page 10

Why Grid computing now? Let us look at the evolution of ICT

Page 11

Grid-like Vision

In 1969, Leonard Kleinrock, one of the chief scientists of the original ARPA project which seeded the Internet, wrote: “As of now, computer networks are still in their infancy, but as they grow up and become sophisticated, we will probably see the spread of ‘computer utilities’, which, like present electric and telephone utilities, will service individual homes and offices across the country”

Despite major advances in hardware and software systems over the past 35 years, we are yet to realize this vision. How far are we still from delivering computing as a utility? Let us look into the ICT evolution and project the future.

Page 12

Computing and Communication Technologies Evolution: 1960-2010!

[Figure: timeline from 1960 to 2010 with two tracks, Computing and Communication, and a control axis running from centralised to decentralised. Milestones shown: Sputnik, ARPANET, Email, Ethernet, TCP/IP, IETF, Internet Era, WWW Era, Mosaic, HTML, XML, W3C, Web Services, Mainframes, Minicomputers, Crays, MPPs, PCs, Workstations, XEROX PARC worm, WS Clusters, PC Clusters, PDAs, HTC, P2P, Grids. Towards 2010: e-Science, e-Business, SocialNet, Computing as Utility.]

Page 13

Computing is Scaling: Towards Inter-Planetary Level

[Figure: computing platforms scale, with growing services and performance, from Personal Device through SMPs or SuperComputers, Local Cluster, Enterprise Cluster/Grid and Global Grid to Inter Planet Grid. Administrative barriers along the way: Individual, Group, Department, Campus, State, National, Globe, Inter Planet, Universe.]

Page 14

A little bit more…

Benefits for Science:
• More effective and seamless collaboration of dispersed communities, both scientific and commercial
• Ability to run large-scale applications comprising thousands of computers, for wide range of applications.
• Transparent access to distributed resources from your desktop, or even your mobile phone
• The term “e-Science” has been coined to express these benefits – the application domain “Science” of Grid & Web

Impact: e-Science

From the EPSRC e-Science web site:

"In the future, e-Science will refer to the large-scale science that will increasingly be carried out through distributed global collaborations enabled by the Internet. Typically, a feature of such collaborative scientific enterprises is that they will require access to very large data collections, very large scale computing resources and high performance visualisation back to the individual user scientists."

Page 15

Healthy, Wealthy, and Wise?

e-Health: electronic patient records, distributed and/or remote diagnosis, collaborative surgical planning.

e-Business: streamline, distribute, and enhance business processes.

e-Commerce: use the Grid as a marketplace for both traditional and innovative goods and services.

e-Learning: remove barriers to education and training.

Grid applications for Science:
• Medical/Healthcare (imaging, diagnosis and treatment).
• Bioinformatics (study of the human genome and proteome to understand genetic diseases).
• Nanotechnology (design of new materials from the molecular scale).
• Engineering (design optimization, simulation, failure analysis and remote instrument access and control).
• Natural Resources and the Environment (weather forecasting, earth observation, modeling and prediction of complex systems)

Page 16

Grid and Web Services Convergence

The definition of the Web Services Resource Framework (WSRF) makes an explicit distinction between the “service” and the stateful entities acted upon by that service, i.e. the resources

Means that Grid and Web communities can move forward on a common base!!!

[Figure: the Grid track (GT1, GT2, OGSI) and the Web track (HTTP, WSDL, SOAP, WS-*) started far apart in applications & technology and have been converging on WSRF. Other labels: XML, BPEL, WS-I Compliant Technology Stack.]

Page 17

Grid and Web Services

The Global Grid Forum (GGF) standard was (2004) divided into:

Open Grid Services Architecture (OGSA)
• Defines standard mechanisms for creating, naming, and discovering Grid service instances.
• Addresses architectural issues relating to interoperable Grid services.
• An open, service-oriented architecture (SOA): resources as first-class entities, dynamic service/resource creation and destruction
• Built on a Web service infrastructure
• Resource virtualization at the core
• Build grids from small number of standards-based components (replaceable, coarse-grained)
• Customizable: Support for dynamic, domain-specific content… within the same standardized framework
• Described in “The Physiology of the Grid” http://www.globus.org/research/papers/ogsa.pdf

Open Grid Services Infrastructure (OGSI)
• It was based upon the Grid Service specification. It specifies the way clients interact with a grid service (service invocation management, data interface, security interface, ...).

In the new draft (2005-06) some mandatory specifications of OGSI are merged with OGSA and the new WSRF is introduced (GT4)

WSRF : Web Services Resource Framework : defines a generic and open framework for modeling and accessing stateful resources using web services
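
The separation between a service and the stateful resources it acts upon can be sketched in a few lines. This is a conceptual illustration only, assuming an in-memory store: `CounterService` and its method names are hypothetical and are not the WSRF API, which is defined in terms of Web service messages rather than Python calls.

```python
import uuid


class CounterService:
    """Service operations carry no per-call state; each one names its resource by key."""

    def __init__(self) -> None:
        # Stand-in for the resource home: key -> resource properties.
        self._resources: dict[str, dict[str, int]] = {}

    def create(self) -> str:
        """Create a new stateful resource and return the key that identifies it."""
        key = str(uuid.uuid4())
        self._resources[key] = {"value": 0}
        return key

    def add(self, key: str, amount: int) -> int:
        """Update the resource named by key and return its new value."""
        self._resources[key]["value"] += amount
        return self._resources[key]["value"]

    def get_property(self, key: str, name: str) -> int:
        """Read one property of the resource named by key."""
        return self._resources[key][name]

    def destroy(self, key: str) -> None:
        """End the lifetime of the resource named by key."""
        del self._resources[key]


service = CounterService()
first = service.create()
second = service.create()
service.add(first, 5)
service.add(first, 2)
service.add(second, 10)
```

The same service serves both resources; what differs between calls is only the key, which plays the role that a resource identifier plays in a stateful Web service interaction.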

Page 18

The core elements of the Open Grid Services Architecture

This layer eliminated in recent version of standard

Page 19

Pre-GT4

Page 20

GT4

Page 21

Virtualizing Resources

[Figure: resources (Storage, Sensors, Applications, Information, Computers) sit behind resource-specific interfaces and are accessed through Web services that expose common interfaces and type-specific interfaces.]

Page 22

A Service-Oriented Grid

[Figure: virtualized resources and Grid middleware services. Boxes shown: Brokering Service, Registry Service, Data Service, CPU Resource, Printer Service, Job-Submit Service, Compute Service, Application Service. Arrows labelled: Notify, Advertise.]

Page 23

Global Grid Community

Page 24

CERN is the world's largest particle physics centre

Particle physics is about:
• elementary particles which all matter in the Universe is made of
• fundamental forces which hold matter together

Particle physics requires:
• special tools to create and study new particles

With its 27 km circumference, the LHC accelerator will be the largest superconducting installation in the world.

CERN?

CERN is:
• ~2500 staff scientists (physicists, engineers, …)
• Some 6500 visiting scientists (half of the world's particle physicists)

They come from 500 universities representing 80 nationalities.

Page 25

Computing @ CERN

The latest trend is to federate national Grids to achieve a global Grid infrastructure – High Energy Physics is a driving force behind this.

High-throughput computing based on reliable “commodity” technology

LHC Data Analysis requires a computing power equivalent to ~100,000 of today’s fastest PC processors!

CERN today:
• More than 2500 dual processor PCs
• About 3 million Gigabytes of data on disk and tapes

PROBLEM: nowhere near enough!
• SOLUTION: use the Grid to unite computing resources of particle physics institutes around the world.
• CERN leads two major global Grid projects:
  – WLCG: World-wide LHC Computing Grid Collaboration
  – EGEE: Enabling Grids for E-sciencE project for all sciences

WLCG: All the Institutions participating in the provision of the Worldwide LHC Computing Grid with a Tier-1 and/or Tier-2 Computing Centre form the WLCG Collaboration.

The LHC Computing Grid project launched a service with 12 sites in 2003. Today 200 sites in 30 countries with 16,000 PCs.
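
A back-of-the-envelope check of the gap described on this slide. The sketch makes two simplifying assumptions that are not in the slides: each processor in CERN's dual-processor PCs is counted as one of the “fastest PC processors” the analysis needs, and “more than 2500” is treated as exactly 2500.

```python
required_processors = 100_000   # from the slide: ~100,000 of today's fastest PC processors
cern_pcs = 2_500                # from the slide: more than 2500 dual-processor PCs
processors_per_pc = 2           # "dual processor"

available = cern_pcs * processors_per_pc          # processors on site
shortfall = required_processors - available       # processors still missing
coverage = available / required_processors        # fraction of the need met locally

print(f"available: {available}, shortfall: {shortfall}, coverage: {coverage:.0%}")
```

Under these assumptions the local farm covers about one twentieth of the requirement, which is the gap the Grid is meant to close by pooling resources from other institutes.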

Page 26

Computing @ CERN

The LCG architecture consists of an agreed set of services and applications running on the Grid infrastructures provided by the LCG partners.

These infrastructures at the present consist of those provided by the Enabling Grids for E-sciencE (EGEE) project in Europe, the Open Science Grid (OSG) project in the U.S.A. and the Nordic Data Grid Facility in the Nordic countries.

Grid3 was the start-up of OSG

The LCG Project builds and maintains computing infrastructure for LHC experiments

Original (’02) LCG plan: “The LCG is not a middleware project”
• Was to be delivered... too little, too late
• Feature set, performance, scalability disappointing

New (’04) plan: Middleware “re-engineering” as part of the LCG program, in collaboration with EGEE

Page 27

EGEE-II: Fast description of the project

EGEE launched in 2004, already supports 20 applications in six scientific domains (biomedicine, geophysics, quantum chemistry…)

EGEE brings together scientists and engineers of 90 institutions in over 30 countries worldwide
• To provide seamless GRID infrastructure for e-Science
• Available 24 h/day x 7 days/week
• Funded by EU (European Commission)
• Two original scientific fields: HEP and Life Sciences; but it integrates many other fields, from Geology to Computational Chemistry

Infrastructure: 30,000 CPUs, 5 PB of storage, 200 sites in 39 countries, 60 Virtual Organizations

Maintains 10,000 concurrent jobs on average

Page 28

Computing @ CERN

Page 29

Three Generations of Grid

Source: Charlie Catlett

Standardization is key for third-generation grids!

First generation:
• Local “metacomputers”
  – Distributed file systems
  – Site-wide single sign-on
• “Metacenters” explore inter-organizational integration
• Totally custom-made, top-to-bottom: proofs of concept

Second generation:
• Utilize software services and communications protocols developed by grid projects:
  – Condor, Globus, UNICORE, Legion, etc.
• Need significant customization to deliver complete solution
• Interoperability is still very difficult!

Third generation:
• Common interface specifications support interoperability of discrete, independently developed services
• Competition and interoperability among applications, toolkits, and implementations of key services

We are here!

Page 30

Grids – Where to ?

The commercial interest in Grid systems and related technologies is increasing.

Companies such as Sun Microsystems, IBM, Oracle, Intel, Microsoft and HP show particular interest in getting a piece of the $12 billion market predicted for 2007 (according to IDC).

[Figure: chart of the predicted market by year, 2003 to 2007; vertical axis in millions, from 0 to 14000.]

Page 31

Grids – Where to ?

After 2007, business adoption of Grid computing is expected to accelerate. In particular, financial services and ERP services are expected to account for a major share of the spending (Source: Insight Research Corp.)

[Figure: chart with values in billions.]

Page 32

Grids – Where to ?

An interesting prediction (the 451 Group analysts) is that grid technology will be slowly absorbed into enterprise fabrics…

One consequence for grid computing might be that the term grid computing "will become both more relevant and less used […] It will be more relevant as grids are used to support far more than HPC tasks, but less used as vendors seek to be associated with far more activity, and far higher up the stack, than grid computing."

IBM and Oracle could drop "grid" from their products in favour of a broader term, while Microsoft has made it very clear that it will not use the term “grid”.

In the new era of Grid computing grids must support automated data, storage and service activities just as capably as handling computational tasks.

These challenges are being addressed by a new paradigm called “Grid 2.0”

Page 33

Grids – Where to ?

Grid 1.0 – concerned with the virtualization, aggregation and sharing of compute resources

Grid 2.0 – focused on the virtualization, aggregation and sharing of all compute, storage, network and data resources

The key term is “virtualization” (encapsulation behind a common interface of diverse implementations), which is being driven by the need of various enterprises to create a virtual resource market that allocates resources based on business demand.

Virtualization introduces a layer of abstraction: instead of having to snoop out what resources are available and try to adapt a problem to use them, a user can describe a resource environment (virtual workspace) and expect it to be deployed on the grid. The mapping between the physical resources and the virtual workspace will be handled using virtual machines, virtual appliances, distributed storage facilities and network overlays (“virtual grids”).

The promise is that in Grid 2.0 the resources will be easier to define, test, install, transport and adjust on demand.
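
The mapping from a described workspace to physical resources can be sketched as follows. This is a minimal illustration under stated assumptions: the `Workspace` and `Host` types, the first-fit placement in `deploy` and the host figures are hypothetical, and real deployments would go through virtual machines, storage facilities and network overlays rather than simple counters.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Workspace:
    """What the user asks for, independent of any physical machine."""
    cpus: int
    memory_gb: int
    storage_gb: int


@dataclass
class Host:
    """A physical machine and the capacity it still has available."""
    name: str
    cpus: int
    memory_gb: int
    storage_gb: int

    def fits(self, ws: Workspace) -> bool:
        return (self.cpus >= ws.cpus
                and self.memory_gb >= ws.memory_gb
                and self.storage_gb >= ws.storage_gb)


def deploy(ws: Workspace, hosts: list[Host]) -> str:
    """Map the workspace onto the first host that fits and reserve its capacity."""
    for host in hosts:
        if host.fits(ws):
            host.cpus -= ws.cpus
            host.memory_gb -= ws.memory_gb
            host.storage_gb -= ws.storage_gb
            return host.name
    raise LookupError("no host can hold this workspace")


# Hypothetical hosts and request, for illustration only.
hosts = [Host("node-1", 4, 8, 100), Host("node-2", 16, 64, 1000)]
request = Workspace(cpus=8, memory_gb=16, storage_gb=200)
placed_on = deploy(request, hosts)   # node-1 is too small, so node-2 is used
```

The user only states the environment needed; which machine ends up hosting it is decided by the mapping layer, which is the abstraction the slide describes.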

Page 34

Web 2.0: Evolution Towards a Read/Write Platform

Web 1.0 (1993-2003): Pretty much HTML pages viewed through a browser

Web 2.0 (2003-beyond): Web pages, plus a lot of other “content” shared over the web, with more interactivity; more like an application than a “page”

Web 1.0         |                         | Web 2.0
“Read”          | Mode                    | “Write” & Contribute
“Page”          | Primary Unit of content | “Post / record”
“static”        | State                   | “dynamic”
Web browser     | Viewed through…         | Browsers, RSS Readers, anything
“Client Server” | Architecture            | “Web Services”
Web Coders      | Content Created by…     | Everyone
“geeks”         | Domain of…              | “mass amateurization”

Page 35

Web 2.0 By Example

Web 1.0            | Web 2.0
Napster            | Google
Britannica On Line | Wikipedia
Akamai             | BitTorrent
MP3.com            | iTUNES or Napster
Double Click       | Adsense
Content Management | Wikis

(Tim O’Reilly)

Page 36

Google Earth™ a Mega API for Web 2.0

Illustrates the Benefits of SOA and GRID with a Web 2.0 Delivery Model

• Distributed, re-usable core services on shared infrastructure

• Shared data

• Exposed interfaces

• Application is streamed to client and works offline

Page 37: Facultatea de Automatica si Calculatoare Universitatea “Politehnica“ din Bucuresti Current trends in Grid computing Dobre Ciprian Mihai cipsm {at} cs.pub.ro

Google Earth™ a Mega API for Web 2.0

Page 38: Facultatea de Automatica si Calculatoare Universitatea “Politehnica“ din Bucuresti Current trends in Grid computing Dobre Ciprian Mihai cipsm {at} cs.pub.ro

Wikipedia is a Collaborative Dictionary Being Edited in Realtime by Anyone

Page 39: Facultatea de Automatica si Calculatoare Universitatea “Politehnica“ din Bucuresti Current trends in Grid computing Dobre Ciprian Mihai cipsm {at} cs.pub.ro

Grid 2.0 Emerging

Grid 1.0: Compute Intensive Cycle Aggregation

SOA: Software Services with SLA & QoS Metrics

Virtualization: Consolidation of Resources

Grid 2.0*:

Virtualized Compute, Storage, Network, Data

Service Oriented

Policy Driven Automation

Distributed across firewalls

Parallel, stateless, stateful and transactional apps

*The 451 Group: 'grid 2.0' is focused on the virtualization, aggregation and sharing of all compute, storage, network and data resources. It is both service-oriented and automated.

Page 40: Facultatea de Automatica si Calculatoare Universitatea “Politehnica“ din Bucuresti Current trends in Grid computing Dobre Ciprian Mihai cipsm {at} cs.pub.ro

Virtualization

Virtualization covers both data (flat files, databases, etc.) and computing resources.

Grid as workflow virtualization: the Grid computing services are used to execute and manage processes across multiple compute platforms.

Data Grid as data virtualization: the management of shared collections independently of the remote storage systems where the data is stored.

Semantic Grid as information virtualization: the ability to reason on inferred attributes from multiple independent information repositories.

Name space virtualization: logical names for resources, users, files, and metadata that are independent of the name spaces used on the remote resource.

Trust virtualization: the ability to manage authentication and authorization independently of the remote resource.

Constraint virtualization: the ability to manage access controls independently of the remote resource.

Access virtualization: the ability to port an arbitrary access mechanism on top of the Grid middleware. For Data Grids, this is the ability to support access through multiple loadable libraries, Java, digital libraries, workflow actors, Web browsers, etc.

Network virtualization: the ability to manage transport in the presence of network devices such as firewalls, load levelers and virtual private networks. This typically requires multiple protocols to support client-initiated versus server-initiated I/O, and bulk operations versus single-file operations.

Latency management: the ability to minimize the number of messages sent over wide area networks. Examples include execution of procedures at the remote resource when the complexity (ratio of operations to bytes transmitted) is sufficiently small. The standard case is data filtering or sub-setting.

Federation: the ability to interoperate across multiple grid environments. This requires the ability to share logical name spaces, and Shibboleth-style authentication. Grids establish trust mechanisms to allow assertions about the authenticity of an individual to be verified from the “home” Grid.
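The latency management rule on this slide can be sketched as a small decision function: compute the complexity (operations per byte transmitted) and run the procedure at the remote resource when it falls below a threshold. The function names and the threshold value of 1.0 are assumptions chosen for illustration; the slide does not state what "sufficiently small" means in practice.

```python
def operations_per_byte(operations, bytes_transmitted):
    """Complexity as defined on the slide: ratio of operations to bytes transmitted."""
    return operations / bytes_transmitted


def run_at_remote_resource(operations, bytes_transmitted, threshold=1.0):
    """Return True when the procedure should execute where the data lives.

    The threshold is a hypothetical tuning value, not a figure from the slides.
    """
    return operations_per_byte(operations, bytes_transmitted) < threshold


def subset_remotely(records, predicate):
    """Stand-in for remote filtering: only the matching records would be sent back."""
    return [r for r in records if predicate(r)]


# Filtering 1000 records of 8 bytes each with one comparison per record:
# 1000 operations / 8000 bytes = 0.125 operations per byte, so filter remotely.
if run_at_remote_resource(operations=1000, bytes_transmitted=8000):
    print(subset_remotely(range(10), lambda r: r % 5 == 0))
```

Data filtering is the standard case because it does little work per byte while sharply reducing what has to cross the wide area network.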

Page 41: Facultatea de Automatica si Calculatoare Universitatea “Politehnica“ din Bucuresti Current trends in Grid computing Dobre Ciprian Mihai cipsm {at} cs.pub.ro

So, are we there yet ?

Will the Grid be available to all of you ? Hard to predict…

Jules Piccard, a professor at the University of Basel, installed the first telephone in the city, around 1880, between his home and his institute. He showed it proudly to other scientists and got the comment: “Looks very good, but I doubt it will ever have any practical use”.

"The world will only need five computers" (attributed to Thomas J. Watson, IBM)

"640 kilobytes is all the memory you will ever need" (attributed to Bill Gates, Microsoft)

"There is absolutely no need for a computer in the home" (attributed to Ken Olsen, DEC, once a leading minicomputer manufacturer)

Page 42: Facultatea de Automatica si Calculatoare Universitatea “Politehnica“ din Bucuresti Current trends in Grid computing Dobre Ciprian Mihai cipsm {at} cs.pub.ro

So, are we there yet ?

The complete success of the Grid hype depends on at least three conditions:

The Grid can be considered a success when there are no more “Grid papers”, but only a footnote in the work that states, “This work was achieved using the Grid”.

The Grid can be considered a success when supercomputer centres no longer give users the choice between using their machines and using the Grid; they just use the Grid.

The Grid can be considered a success when a SuperComputing demo can be run any time of the year.

We are not yet there…

Page 43: Facultatea de Automatica si Calculatoare Universitatea “Politehnica“ din Bucuresti Current trends in Grid computing Dobre Ciprian Mihai cipsm {at} cs.pub.ro

What’s holding us back?

Organizational politics act very much like a barrier to implementing Grid computing:

“server-hugging” – organizations have a sense of ownership over the resources bought or allocated for their use.

unrealistic expectations from Grid computing – marketing departments have run amuck and have marketed the grid “nirvana” and not the grid that exists and is possible today.

perceived loss of control or access over resources.

loss or reduction of budget dollars.

lack of data security among departments.

fear of external data leaks.

reduced priority of projects - sometimes users believe that they need dedicated IT resources to complete their work accurately and efficiently.

risks associated with enterprise-wide deployment - how do different geographies and cultures come together to agree on global priorities, configurations, standards, and policies.

Page 44: Facultatea de Automatica si Calculatoare Universitatea “Politehnica“ din Bucuresti Current trends in Grid computing Dobre Ciprian Mihai cipsm {at} cs.pub.ro

In the end…

One of the biggest fears for Grid computing is that it might be seen as today’s sexy technology that will quickly get replaced by tomorrow’s sexy technology.

Grid researchers and technologists have to start pointing to results and applications that use the Grid to solve problems, or that enable new applications that would have been unachievable without the Grid.

Contemporary Grid implementations are still far from the initially described vision and from being widely adopted.

Page 45: Facultatea de Automatica si Calculatoare Universitatea “Politehnica“ din Bucuresti Current trends in Grid computing Dobre Ciprian Mihai cipsm {at} cs.pub.ro

Grid computing in pictures

Thanks to GridCafe (http://gridcafe.web.cern.ch/gridcafe - I strongly recommend that you also visit this link), it is now MOVIE time.

Page 46: Facultatea de Automatica si Calculatoare Universitatea “Politehnica“ din Bucuresti Current trends in Grid computing Dobre Ciprian Mihai cipsm {at} cs.pub.ro

Thank you !

Questions? Observations?

Page 47: Facultatea de Automatica si Calculatoare Universitatea “Politehnica“ din Bucuresti Current trends in Grid computing Dobre Ciprian Mihai cipsm {at} cs.pub.ro

Additional slides

Page 48: Facultatea de Automatica si Calculatoare Universitatea “Politehnica“ din Bucuresti Current trends in Grid computing Dobre Ciprian Mihai cipsm {at} cs.pub.ro

Grid characteristics

Collaboration - Grid is sharing of resources in a distributed fashion. A Grid spans multiple administrative domains seamlessly.

Aggregation - A Grid is more than the sum of all parts. A Grid aggregates many resources and therefore combines the capacity of the individual resources into a higher capacity virtual resource. The capability of individual resources is preserved. As a consequence, from a global standpoint the Grid enables running larger applications faster (aggregation capacity), while from a local standpoint the Grid enables running new applications.

Virtualization – Grid services are often provided with a certain interface that hides the complexity of the underlying resources. Virtualization provides an abstract “layer” between clients and resources. Therefore, a Grid provides the ability to virtualize the sum of parts into a singular wide-area programming model.

Page 49: Facultatea de Automatica si Calculatoare Universitatea “Politehnica“ din Bucuresti Current trends in Grid computing Dobre Ciprian Mihai cipsm {at} cs.pub.ro

Grid characteristics

Service orientation - Grids provide services, following the concept of a service-oriented architecture. In the widest sense all large scale collections of services can be viewed as Grids.

Heterogeneity - A Grid typically consists of heterogeneous computing resources, i.e. there is a variety of different hardware and software components with different performance and latency characteristics.

Decentralized control - components are under the control of multiple entities, i.e. the key difficulties in Grids lie exactly in not having a single ‘owner’ of the whole system. One of the requirements of a Grid is the use of distributed control mechanisms.

Standardization and interoperability - A Grid promotes standard interface definitions for services that need to interoperate to create a general distributed infrastructure to fulfill users’ tasks and provide user level utilities. Grid is exposing the need for increased levels of integration of distinct technologies and for increased agreements in the standardization of services. The success of the implementation of the Grid very much depends on these aspects. Furthermore, the Grid should provide uniform access to heterogeneous resources through virtualization.

Page 50: Facultatea de Automatica si Calculatoare Universitatea “Politehnica“ din Bucuresti Current trends in Grid computing Dobre Ciprian Mihai cipsm {at} cs.pub.ro

Grid characteristics

Access transparency - The Grid should allow its users to access the computing infrastructure without having to be intimately aware of the underlying architecture or network topology. This is sometimes considered the most distinctive aspect of Grid Computing, that is, the levels of transparency provided for the end-user, through the virtualization of resources.

Scalability - Even if Grid implementations and infrastructures sometimes do not solve a new problem, it is often the scale of data, resources and users that contributes to the additional complexity of a Grid.

Reconfigurability - A Grid should be “dynamically reconfigurable” (CoreGRID definition).

Security - Grid security is one of the first things that real Grid users have to deal with and therefore is essential for any Grid software system that spans multiple administrative domains.

Application support – Applications should also be part of the Grid, and the whole Grid environment (where by environment I mean the hardware, middleware, and applications) should be data-driven. In particular, it should be able to react to changes in system and application behavior, as captured by application and system data.

Page 51: Facultatea de Automatica si Calculatoare Universitatea “Politehnica“ din Bucuresti Current trends in Grid computing Dobre Ciprian Mihai cipsm {at} cs.pub.ro

Grid characteristics

Computing model - A Grid supports several computational models (e.g., batch, interactive, distributed and parallel computing...).

Licensing model - Since Grids originate from the academic community, there is a global emphasis on open source software, which is also followed by several companies that are involved in Grid development.

Procedures and policies - Grid users and service providers interact with each other much as on an open market, where certain rules have to be followed. Therefore, procedures and policies need to be in place to allow for (coordinated) sharing of resources.

Auditing - Tracking the usage of shared resources and providing mechanisms for transferring cost among user communities and for charging for resource use by applications and users.

Page 52: Facultatea de Automatica si Calculatoare Universitatea “Politehnica“ din Bucuresti Current trends in Grid computing Dobre Ciprian Mihai cipsm {at} cs.pub.ro

Comparison of Middleware Technologies

Middleware compared: UNICORE, Globus, Legion, Gridbus

Focus
UNICORE: High level programming models
Globus: Low level services
Legion: High level programming models
Gridbus: Abstractions and market models

Category
UNICORE: Mainly uniform job submission and monitoring
Globus: Generic computational
Legion: Generic computational
Gridbus: Generic computational

Architecture
UNICORE: Vertical multi tiered system
Globus: Layered and modular toolkit
Legion: Vertically integrated system
Gridbus: Layered component and utility model

Implementation Model
UNICORE: Abstract Job Object
Globus: Hourglass model at system level
Legion: Object-oriented metasystem
Gridbus: Hourglass model at user level

Implementation Technologies
UNICORE: Java
Globus: C and Java
Legion: C++
Gridbus: C, Java, C# and Perl

Runtime Platform
UNICORE: Unix
Globus: Unix
Legion: Unix
Gridbus: Unix and Windows with .NET

Programming Environment
UNICORE: Workflow environment
Globus: Replacement libraries for Unix & C libraries. Special MPI library (MPICH-G), CoG (Commodity Grid) kits in Java, Python, CORBA, Matlab, Java Server Pages, Perl and Web Services
Legion: Legion Application Programming Interfaces (API). Command line utilities
Gridbus: Broker Java API, XML-based parameter-sweep language, Grid Thread model via Alchemi

Some Users and Applications
UNICORE: EuroGrid, Grid Interoperability Project, OpenMolGrid and Japanese NAREGI.
Globus: AppLeS, Ninf, Nimrod-G, NASA IPG, Condor-G, Gridbus Broker, UK eScience Project, GriPhyN, and EU Data Grid.
Legion: NPACI Testbed, Nimrod-L, and NCBioGrid. Additionally, it has been used in the study of axially symmetric steady flow and protein folding applications.
Gridbus: ePhysics, Belle Analysis Data Grid, NeuroGrid, Natural Language Engineering, HydroGrid, and Amsterdam Private Grid.

Page 53: Facultatea de Automatica si Calculatoare Universitatea “Politehnica“ din Bucuresti Current trends in Grid computing Dobre Ciprian Mihai cipsm {at} cs.pub.ro

Globus Toolkit Components

Page 54: Facultatea de Automatica si Calculatoare Universitatea “Politehnica“ din Bucuresti Current trends in Grid computing Dobre Ciprian Mihai cipsm {at} cs.pub.ro

Globus Common Runtime

Python Web Services Core: Allows one to create WSRF-compliant web services in Python

C Web Services Core: Allows one to create WSRF-compliant web services in C

Java Web Services Core: WSRF APIs in Java

C common libraries: C abstraction layer for Globus data types, libc, etc.

XIO (Extensible IO):

Superset of basic file I/O library (open/close/read/write/etc.)

Supports multiple wire protocols transparently: TCP/UDP/File/HTTP/GSI/GSSAPI_FTP/telnet/queuing

Page 55: Facultatea de Automatica si Calculatoare Universitatea “Politehnica“ din Bucuresti Current trends in Grid computing Dobre Ciprian Mihai cipsm {at} cs.pub.ro

Globus XIO

Open file for reading/writing: inFile = fopen(file, "w+");

Do some reading: fread(buffer, 1, sizeof(buffer), inFile);

Do some writing: fprintf(inFile, "%s\n", "HELLO!");

Close the file: fclose(inFile);

[Diagram: Disk Storage (Mountain View, CA) accessed from a Student Workstation (Denton, TX)]
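The idea behind this slide, one open/read/write/close interface with the transport chosen by a pluggable driver, can be sketched as follows. This is not the Globus XIO API: the `Handle`, `FileDriver` and `MemoryDriver` names are hypothetical, and the in-memory driver merely stands in for a network transport.

```python
import io
import os
import tempfile


class FileDriver:
    """Driver backed by a local file (opened for reading and writing)."""

    def open(self, target):
        return open(target, "w+b")


class MemoryDriver:
    """Driver backed by an in-memory buffer; stands in for a network transport."""

    def open(self, target):
        return io.BytesIO()


class Handle:
    """Offers the same calls regardless of which driver sits underneath."""

    def __init__(self, driver, target):
        self._stream = driver.open(target)

    def write(self, data):
        return self._stream.write(data)

    def read(self, size=-1):
        return self._stream.read(size)

    def rewind(self):
        self._stream.seek(0)

    def close(self):
        self._stream.close()


def round_trip(driver, target):
    """Write a greeting, rewind, and read it back through the given driver."""
    handle = Handle(driver, target)
    handle.write(b"HELLO!\n")
    handle.rewind()
    data = handle.read()
    handle.close()
    return data


with tempfile.TemporaryDirectory() as tmp:
    print(round_trip(FileDriver(), os.path.join(tmp, "demo.txt")))
print(round_trip(MemoryDriver(), "ignored"))
```

The calling code in `round_trip` is identical for both drivers, which is the property the slide attributes to an extensible I/O layer.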

Page 56: Facultatea de Automatica si Calculatoare Universitatea “Politehnica“ din Bucuresti Current trends in Grid computing Dobre Ciprian Mihai cipsm {at} cs.pub.ro

Globus Information Services

WebMDS: Allows one to view monitoring information about grid resources from a web browser.

Index: Collects monitoring and discovery information from grid resources. Publishes the information to a single point so other resources/people can discover resources.

Trigger: Collects various pieces of data from grid resources. Can be configured to perform actions. Ex: when a disk is 80% full, send an email to the administration staff.

Monitoring and Discovery [MDS2]: Provides a method to publish and discover resources on the grid. Also allows the collection of resource status and configuration information. Deprecated component.
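The trigger example on this slide (act when a disk is 80% full) can be sketched as a threshold check with a pluggable action. This is only an illustration of the rule, not the Globus Trigger service; the function names are hypothetical, and the notification is a callback (here `print`) rather than a real email.

```python
import shutil


def disk_usage_percent(path="."):
    """Return how full the filesystem holding path is, as a percentage."""
    usage = shutil.disk_usage(path)
    return 100.0 * usage.used / usage.total


def check_trigger(percent_full, threshold=80.0, notify=print):
    """Run the action when usage reaches the threshold; return True if it fired."""
    if percent_full >= threshold:
        notify(f"Disk is {percent_full:.0f}% full (threshold {threshold:.0f}%)")
        return True
    return False


# Evaluate the rule once against the local disk.
check_trigger(disk_usage_percent("."))
```

In a real deployment the `notify` callback would be replaced by whatever action the administrators configure, such as sending mail.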

Page 57: Facultatea de Automatica si Calculatoare Universitatea “Politehnica“ din Bucuresti Current trends in Grid computing Dobre Ciprian Mihai cipsm {at} cs.pub.ro

Globus Indexing Service

[Diagram: a resource query is sent to the Globus Index, which covers computational resources, database resources, archival resources and storage resources.]

Page 58: Facultatea de Automatica si Calculatoare Universitatea “Politehnica“ din Bucuresti Current trends in Grid computing Dobre Ciprian Mihai cipsm {at} cs.pub.ro

Globus Execution Management

Grid Resource Allocation and Management [GRAM]: WSRF compliant device to submit, monitor, and cancel jobs on grid computing resources. Not a scheduler, but rather communicates with other, local schedulers.

Pre-WS Grid Resource Allocation and Management: Same as above… not a web service.

Community Scheduler Framework: WSRF compliant meta-scheduler. Actually schedules jobs to other batch schedulers. Supports advanced reservations and advanced scheduling policies.

Grid Telecontrol Protocol: WSRF protocol for telecontrol. Ex: focusing an electron microscope remotely.

Workspace Management: Dynamically create and manage workspaces.

Page 59: Facultatea de Automatica si Calculatoare Universitatea “Politehnica“ din Bucuresti Current trends in Grid computing Dobre Ciprian Mihai cipsm {at} cs.pub.ro

Globus GRAM

[Diagram: a user job submission goes through GRAM to the local schedulers (PBS, SGE, LSF, Condor) at Sites A, B, C and D, which are geographically disparate computational resources.]
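A front end that is "not a scheduler" but hands jobs to local schedulers can be sketched as a lookup from site to submit command. The submit commands below (`qsub`, `bsub`, `condor_submit`) are the usual command-line entry points of PBS, SGE, LSF and Condor; which scheduler runs at which site is an assumption made for this example, the job file is assumed to already be in the format the local scheduler expects, and nothing here reflects how GRAM itself is implemented.

```python
# Submit command for each local scheduler.
SUBMIT_COMMANDS = {
    "pbs": ["qsub"],
    "sge": ["qsub"],
    "lsf": ["bsub"],
    "condor": ["condor_submit"],
}

# Hypothetical assignment of schedulers to sites.
SITE_SCHEDULERS = {
    "Site A": "pbs",
    "Site B": "sge",
    "Site C": "lsf",
    "Site D": "condor",
}


def build_submit_command(site, job_file):
    """Return the argument list that would submit job_file at the given site.

    The command is only constructed, never executed, so the sketch runs on a
    machine with no scheduler installed. An unknown site raises KeyError.
    """
    scheduler = SITE_SCHEDULERS[site]
    return SUBMIT_COMMANDS[scheduler] + [job_file]


for site in SITE_SCHEDULERS:
    print(site, build_submit_command(site, "job.sub"))
```

The user addresses one interface; the translation into each site's own scheduler vocabulary happens behind it.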

Page 60: Facultatea de Automatica si Calculatoare Universitatea “Politehnica“ din Bucuresti Current trends in Grid computing Dobre Ciprian Mihai cipsm {at} cs.pub.ro

Globus Data Management

Data Replication: Allows for the local replication of pertinent data across grid environments. Commonly used files are replicated locally to reduce transfer delays.

OGSA-DAI: Supports the exposure of data resources such as relational databases or XML databases onto the grid. Single point of query for multiple databases.

Reliable File Transfer: Handles third-party messages to control GridFTP transfers. Submission of transfer requests.

GridFTP: High performance, reliable data transfer protocol for high bandwidth, wide area networks. Used to perform the data transfer test that became the LAN Speed Record.

Replica Location: Allows discovery and registration of data replicas on the grid. Maintains the correlation between logical names and target names.
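The correlation between logical names and target names kept by a replica location service can be pictured as a one-to-many mapping. The sketch below is a minimal in-memory version; the `ReplicaCatalog` class, its method names and the example URLs are hypothetical and do not reproduce the Globus Replica Location Service interface.

```python
from collections import defaultdict


class ReplicaCatalog:
    """Maps each logical name to the target names of its registered replicas."""

    def __init__(self):
        self._replicas = defaultdict(list)

    def register(self, logical_name, target_name):
        """Record a replica; registering the same target twice has no effect."""
        if target_name not in self._replicas[logical_name]:
            self._replicas[logical_name].append(target_name)

    def lookup(self, logical_name):
        """Return a copy of the known targets, or an empty list if none exist."""
        return list(self._replicas.get(logical_name, []))


catalog = ReplicaCatalog()
catalog.register("run42/events.dat", "gsiftp://site-a.example.org/data/events.dat")
catalog.register("run42/events.dat", "gsiftp://site-b.example.org/cache/events.dat")
print(catalog.lookup("run42/events.dat"))
```

A client asks for the logical name only and can then choose whichever returned target is closest or least loaded.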

Page 61: Facultatea de Automatica si Calculatoare Universitatea “Politehnica“ din Bucuresti Current trends in Grid computing Dobre Ciprian Mihai cipsm {at} cs.pub.ro

Traditional FTP

Single Client/Server connection

Single data stream

Limited by computational resources and network bandwidth

Very inefficient

[Diagram: a single data channel over Gigabit Ethernet]

Page 62: Facultatea de Automatica si Calculatoare Universitatea “Politehnica“ din Bucuresti Current trends in Grid computing Dobre Ciprian Mihai cipsm {at} cs.pub.ro

Globus GridFTP

Single Server/Client

[Diagram: data channel over Gigabit Ethernet]

Multiple Server/Client [single file]

[Diagram: storage to storage; each machine transfers ¼ of the file]
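The "each machine transfers ¼ of the file" picture amounts to splitting the file into contiguous byte ranges, moving them in parallel and reassembling them in order. The sketch below shows that idea on an in-memory byte string using threads; it is an illustration only, with hypothetical function names, and does not use the GridFTP protocol.

```python
from concurrent.futures import ThreadPoolExecutor


def split_ranges(size, parts):
    """Return (offset, length) pairs that cover size bytes in contiguous pieces."""
    base, extra = divmod(size, parts)
    ranges, offset = [], 0
    for i in range(parts):
        # The first `extra` pieces take one additional byte each.
        length = base + (1 if i < extra else 0)
        ranges.append((offset, length))
        offset += length
    return ranges


def striped_copy(source, parts=4):
    """Copy a bytes object as parallel pieces and reassemble them in order."""

    def fetch(piece):
        offset, length = piece
        return source[offset:offset + length]

    with ThreadPoolExecutor(max_workers=parts) as pool:
        # map() yields results in the order of the inputs, so the pieces line up.
        chunks = list(pool.map(fetch, split_ranges(len(source), parts)))
    return b"".join(chunks)


data = bytes(range(256)) * 10
print(split_ranges(len(data), 4))
print(striped_copy(data) == data)
```

With real transfers each piece would travel over its own data channel, so the aggregate throughput is no longer bounded by a single stream.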

Page 63: Facultatea de Automatica si Calculatoare Universitatea “Politehnica“ din Bucuresti Current trends in Grid computing Dobre Ciprian Mihai cipsm {at} cs.pub.ro

Globus Security

Community Authorization: Allows a virtual organization to express policy regarding resources across sites. Despite the local authorization, granting and revoking access to resources is possible.

Delegation: Allows the sharing of a single credential across multiple invocations of services. “I need to submit multiple jobs, now I can use the same certificate for each one.”

Authentication/Authorization:

Message & Transport Level Security: SSL/TLS/X.509 encryption standard for message traffic

Authorization Framework: Provides multiple different authorization mechanisms: gridmap, SAML, NIS, PAM, LDAP

Pre-WS Authentication/Authorization

Credential Management:

SimpleCA: Simplified certificate authority

MyProxy: Online credential repository for X.509 proxy credentials
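Of the authorization mechanisms listed, gridmap is the simplest to picture: a text file that maps a certificate subject to a local account. The sketch below parses such a file and looks up a subject. It assumes the common layout of a quoted subject followed by one or more comma-separated account names per line; the function names and the example entries are hypothetical.

```python
import shlex


def parse_gridmap(text):
    """Parse gridmap-style lines into {certificate subject: [local accounts]}."""
    mapping = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        # shlex honours the double quotes around subjects that contain spaces.
        subject, accounts = shlex.split(line)
        mapping[subject] = accounts.split(",")
    return mapping


def local_account_for(mapping, subject):
    """Return the first local account for the subject, or None if unauthorized."""
    accounts = mapping.get(subject)
    return accounts[0] if accounts else None


EXAMPLE = '''
# subject                                  local account(s)
"/O=Grid/OU=Example/CN=Jane Doe" jdoe
"/O=Grid/OU=Example/CN=John Smith" jsmith,guest
'''

gridmap = parse_gridmap(EXAMPLE)
print(local_account_for(gridmap, "/O=Grid/OU=Example/CN=Jane Doe"))
print(local_account_for(gridmap, "/O=Grid/OU=Example/CN=Unknown"))
```

Because the mapping lives at each site, a site keeps the final say over which local account, if any, a grid identity may use.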

Page 64: Facultatea de Automatica si Calculatoare Universitatea “Politehnica“ din Bucuresti Current trends in Grid computing Dobre Ciprian Mihai cipsm {at} cs.pub.ro

Globus Authentication

[Diagram: user authentication passes through Globus Authentication, backed by a credential repository, to the site mechanisms NIS, LDAP and PAM. Sites shown: Site A: Washington DC; Site B: UNT LDAP Student Login; Site C: Custom Authentication.]

Page 65: Facultatea de Automatica si Calculatoare Universitatea “Politehnica“ din Bucuresti Current trends in Grid computing Dobre Ciprian Mihai cipsm {at} cs.pub.ro

[Diagram of service categories: Context Services, Information Services, Infrastructure Services, Security Services, Resource Management Services, Execution Management Services, Data Services and Self-Management Services. Components shown: Policy Mgmt, VO Mgmt, Access, Integration, Transfer, Replication, Boundary Traversal, Integrity, Authorization, Authentication, WSRF, WSN, WSDM, Event Mgmt, Monitoring, Discovery, Job Mgmt, Logging, Execution Planning, Workflow Mgmt, Workload Mgmt, Provisioning, Execution, Deployment, Configuration, Reservation, Naming, Heterogeneity Mgmt, Service Level Attainment, QoS Mgmt and Optimization.]

Page 66: Facultatea de Automatica si Calculatoare Universitatea “Politehnica“ din Bucuresti Current trends in Grid computing Dobre Ciprian Mihai cipsm {at} cs.pub.ro

CERN

Page 67: Facultatea de Automatica si Calculatoare Universitatea “Politehnica“ din Bucuresti Current trends in Grid computing Dobre Ciprian Mihai cipsm {at} cs.pub.ro

CERN