
Page 1: Grid Computing

1

GRID COMPUTING

Faisal N. Abu-Khzam

&

Michael A. Langston

University of Tennessee

Page 2: Grid Computing

2

Outline

Hour 1: Introduction

Break

Hour 2: Using the Grid

Break

Hour 3: Ongoing Research

Q&A Session

Page 3: Grid Computing

3

Hour 1: Introduction

What is Grid Computing?
Who Needs It?
An Illustrative Example
Grid Users
Current Grids

Page 4: Grid Computing

4

What is Grid Computing?

Computational Grids
– Homogeneous (e.g., clusters)
– Heterogeneous (e.g., with one-of-a-kind instruments)

Cousins of Grid Computing

Methods of Grid Computing

Page 5: Grid Computing

5

Computational Grids

A network of geographically distributed resources, including computers, peripherals, switches, instruments, and data.

Each user should have a single login account to access all resources.

Resources may be owned by diverse organizations.

Page 6: Grid Computing

6

Computational Grids

Grids are typically managed by gridware.

Gridware can be viewed as a special type of middleware that enables sharing and manages grid components based on user requirements and resource attributes (e.g., capacity, performance, availability, etc.).

Page 7: Grid Computing

7

Cousins of Grid Computing

Parallel Computing
Distributed Computing
Peer-to-Peer Computing
Many others: Cluster Computing, Network Computing, Client/Server Computing, Internet Computing, etc.

Page 8: Grid Computing

8

Distributed Computing

People often ask: is Grid Computing a fancy new name for the concept of distributed computing?

In general, the answer is "no." Distributed computing is most often concerned with distributing the load of a program across two or more processes.

Page 9: Grid Computing

9

Peer-to-Peer Computing

Sharing of computer resources and services by direct exchange between systems.

Computers can act as clients or servers, depending on which role is most efficient for the network.

Page 10: Grid Computing

10

Methods of Grid Computing

Distributed Supercomputing
High-Throughput Computing
On-Demand Computing
Data-Intensive Computing
Collaborative Computing
Logistical Networking

Page 11: Grid Computing

11

Distributed Supercomputing

Combining multiple high-capacity resources on a computational grid into a single, virtual distributed supercomputer.

Tackle problems that cannot be solved on a single system.

Page 12: Grid Computing

12

High-Throughput Computing

Uses the grid to schedule large numbers of loosely coupled or independent tasks, with the goal of putting unused processor cycles to work.

Page 13: Grid Computing

13

On-Demand Computing

Uses grid capabilities to meet short-term requirements for resources that are not locally accessible.

Models real-time computing demands.

Page 14: Grid Computing

14

Data-Intensive Computing

The focus is on synthesizing new information from data maintained in geographically distributed repositories, digital libraries, and databases.

Particularly useful for distributed data mining.

Page 15: Grid Computing

15

Collaborative Computing

Concerned primarily with enabling and enhancing human-to-human interactions.

Applications are often structured in terms of a virtual shared space.

Page 16: Grid Computing

16

Logistical Networking

Global scheduling and optimization of data movement.

Contrasts with traditional networking, which does not explicitly model storage resources in the network.

Called "logistical" because of the analogy it bears with the systems of warehouses, depots, and distribution channels.

Page 17: Grid Computing

17

Who Needs Grid Computing?

A chemist may utilize hundreds of processors to screen thousands of compounds per hour.

Teams of engineers worldwide pool resources to analyze terabytes of structural data.

Meteorologists seek to visualize and analyze petabytes of climate data with enormous computational demands.

Page 18: Grid Computing

18

An Illustrative Example

Tiffany Moisan, a NASA research scientist, collected microbiological samples in the tidewaters around Wallops Island, Virginia.

She needed the high-performance microscope located at the National Center for Microscopy and Imaging Research (NCMIR), University of California, San Diego.

Page 19: Grid Computing

19

Example (continued)

She sent the samples to San Diego and used NPACI's Telescience Grid and NASA's Information Power Grid (IPG) to view and control the output of the microscope from her desk on Wallops Island. Thus, in addition to viewing the samples, she could move the platform holding them and make adjustments to the microscope.

Page 20: Grid Computing

20

Example (continued)

The microscope produced a huge dataset of images.

This dataset was stored using a storage resource broker on NASA's IPG.

Moisan was able to run algorithms on this very dataset while watching the results in real time.

Page 21: Grid Computing

21

Grid Users

Grid developers
Tool developers
Application developers
End users
System administrators

Page 22: Grid Computing

22

Grid Developers

A very small group.

Implementers of a grid "protocol" who provide the basic services required to construct a grid.

Page 23: Grid Computing

23

Tool Developers

Implement the programming models used by application developers.

Implement basic services similar to conventional computing services:
– User authentication/authorization
– Process management
– Data access and communication

Page 24: Grid Computing

24

Tool Developers

Also implement new (grid) services such as:
– Resource location
– Fault detection
– Security
– Electronic payment

Page 25: Grid Computing

25

Application Developers

Construct grid-enabled applications for end users, who should be able to use these applications without concern for the underlying grid.

Provide programming models that are appropriate for grid environments and services that programmers can rely on when developing (higher-level) applications.

Page 26: Grid Computing

26

System Administrators

Balance local and global concerns.

Manage grid components and infrastructure.

Some tasks are still not well delineated due to the high degree of sharing required.

Page 27: Grid Computing

27

Some Highly Visible Grids

The NSF PACI/NCSA Alliance Grid.
The NSF PACI/SDSC NPACI Grid.
The NASA Information Power Grid (IPG).
The Distributed Terascale Facility (DTF) Project.

Page 28: Grid Computing

28

DTF

Currently being built by NSF's Partnerships for Advanced Computational Infrastructure (PACI).

A collaboration: NCSA, SDSC, Argonne, and Caltech will work in conjunction with IBM, Intel, Qwest Communications, Myricom, Sun Microsystems, and Oracle.

Page 29: Grid Computing

29

DTF Expectations

A 40-billion-bits-per-second optical network (called TeraGrid) is to link computers, visualization systems, and data at four sites.

Performs 11.6 trillion calculations per second.

Stores more than 450 trillion bytes of data.

Page 30: Grid Computing

30

GRID COMPUTING

BREAK

Page 31: Grid Computing

31

Hour 2: Using the Grid

Globus
Condor
Harness
Legion
IBP
NetSolve
Others

Page 32: Grid Computing

32

Globus

A collaboration of Argonne National Laboratory's Mathematics and Computer Science Division, the University of Southern California's Information Sciences Institute, and the University of Chicago's Distributed Systems Laboratory.

Started in 1996 and is gaining popularity year after year.

Page 33: Grid Computing

33

Globus

A project to develop the underlying technologies needed for the construction of computational grids.

Focuses on execution environments for integrating widely-distributed computational platforms, data resources, displays, special instruments and so forth.

Page 34: Grid Computing

34

The Globus Toolkit

The Globus Resource Allocation Manager (GRAM)
– Creates, monitors, and manages services.
– Maps requests to local schedulers and computers.

The Grid Security Infrastructure (GSI)
– Provides authentication services.

Page 35: Grid Computing

35

The Globus Toolkit

The Monitoring and Discovery Service (MDS)
– Provides information about system status, including server configurations, network status, locations of replicated datasets, etc.

Nexus and globus_io
– Provide communication services for heterogeneous environments.

Page 36: Grid Computing

36

The Globus Toolkit

Global Access to Secondary Storage (GASS)
– Provides data movement and access mechanisms that enable remote programs to manipulate local data.

Heartbeat Monitor (HBM)
– Used by both system administrators and ordinary users to detect failure of system components or processes.

Page 37: Grid Computing

37

Condor

The Condor project started in 1988 at the University of Wisconsin-Madison.

The main goal is to develop tools to support High Throughput Computing on large collections of distributively owned computing resources.

Page 38: Grid Computing

38

Condor

Runs on a cluster of workstations to glean wasted CPU cycles.

A "Condor pool" consists of any number of machines, of possibly different architectures and operating systems, that are connected by a network.

Condor pools can share resources through a feature of Condor called flocking.

Page 39: Grid Computing

39

The Condor Pool Software

Job management services:
– Supports requests about the job queue.
– Puts a job on hold.
– Enables the submission of new jobs.
– Provides information about jobs that are already finished.

A machine with job management installed is called a submit machine.

Page 40: Grid Computing

40

The Condor Pool Software

Resource management:
– Keeps track of available machines.
– Performs resource allocation and scheduling.

Machines with resource management installed are called execute machines.

A machine could be a "submit" and an "execute" machine simultaneously.
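To make the submit/execute split concrete, here is a minimal, self-contained C sketch of the matchmaking idea: queued jobs from submit machines are paired with idle execute machines whose advertised attributes satisfy the job's requirements. Condor itself expresses jobs and machines as ClassAds and ranks candidate matches; the structures, attribute names, and data below are invented for illustration.

/* Hypothetical sketch of pool matchmaking: pair queued jobs with idle
 * execute machines whose advertised attributes meet the job's needs.
 * Condor uses ClassAds for this; names and fields here are invented. */
#include <stdio.h>
#include <string.h>

struct machine { const char *name; const char *arch; long mem_mb; int idle; };
struct job     { int id; const char *req_arch; long req_mem_mb; };

static int satisfies(const struct machine *m, const struct job *j) {
    return m->idle && m->mem_mb >= j->req_mem_mb &&
           strcmp(m->arch, j->req_arch) == 0;
}

int main(void) {
    struct machine pool[] = {
        { "exec1", "INTEL/LINUX",   512,  1 },
        { "exec2", "SUN4u/SOLARIS", 1024, 1 },
        { "exec3", "INTEL/LINUX",   256,  0 },   /* busy: never matched */
    };
    struct job queue[] = {
        { 1, "INTEL/LINUX",   256 },
        { 2, "SUN4u/SOLARIS", 512 },
    };
    size_t nm = sizeof pool  / sizeof pool[0];
    size_t nj = sizeof queue / sizeof queue[0];

    for (size_t i = 0; i < nj; i++) {            /* one pass over the queue */
        for (size_t k = 0; k < nm; k++) {
            if (satisfies(&pool[k], &queue[i])) {
                printf("job %d -> %s\n", queue[i].id, pool[k].name);
                pool[k].idle = 0;                /* claim the machine */
                break;
            }
        }
    }
    return 0;
}

In the full system, both sides advertise themselves and a central matchmaker ranks the candidate pairings rather than taking the first fit, as this toy loop does.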

Page 41: Grid Computing

41

Condor-G

A version of Condor that uses Globus to submit jobs to remote resources.

Allows users to monitor jobs submitted through the Globus toolkit.

Can be installed on a single machine; thus there is no need to have a Condor pool installed.

Page 42: Grid Computing

42

Legion

An object-based metasystems software project designed at the University of Virginia to support millions of hosts and trillions of objects linked together by high-speed links.

Allows groups of users to construct shared virtual workspaces in which to collaborate on research and exchange information.

Page 43: Grid Computing

43

Legion

An open system designed to encourage third-party development of new or updated applications, run-time library implementations, and core components.

The key feature of Legion is its object-oriented approach.

Page 44: Grid Computing

44

Harness

A Heterogeneous Adaptable Reconfigurable Networked System.

A collaboration between Oak Ridge National Laboratory, the University of Tennessee, and Emory University.

Conceived as a natural successor of the PVM project.

Page 45: Grid Computing

45

Harness

An experimental system based on a highly customizable, distributed virtual machine (DVM) that can run on anything from a supercomputer to a PDA.

Built on three key areas of research: Parallel Plug-in Interface, Distributed Peer-to-Peer Control, and Multiple DVM Collaboration.

Page 46: Grid Computing

46

IBP

The Internet Backplane Protocol (IBP) is middleware for managing and using remote storage.

It was devised at the University of Tennessee to support Logistical Networking in large-scale distributed systems and applications.

Page 47: Grid Computing

47

IBP

So named because it was designed to enable applications to treat the Internet as if it were a processor backplane.

On a processor backplane, the user has access to memory and peripherals, and can direct communication between them with DMA.

Page 48: Grid Computing

48

IBP

IBP gives the user access to remote storage and standard Internet resources (e.g., content servers implemented with standard sockets), and the user can direct communication between them with the IBP API.

Page 49: Grid Computing

49

IBP

By providing a uniform, application-independent interface to storage in the network, IBP makes it possible for applications of all kinds to use logistical networking to exploit data locality and manage buffer resources more effectively.

Page 50: Grid Computing

50

NetSolve

A client-server-agent model.

Designed for solving complex scientific problems in a loosely coupled, heterogeneous environment.

Page 51: Grid Computing

51

The NetSolve Agent

A “resource broker” that represents the gateway to the NetSolve system

Maintains an index of the available computational resources and their characteristics, in addition to usage statistics.

Page 52: Grid Computing

52

The NetSolve Agent

Accepts requests for computational services from the client API and dispatches them to the best-suited server.

Runs on Linux and UNIX.
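The slides do not spell out how the agent picks the "best-suited" server, so the following self-contained C sketch is only a plausible illustration of the brokering step: among the servers in the agent's index that export the requested problem, choose the one with the lowest recorded load. The hostnames, problem names, and load figures are invented.

/* Hypothetical resource-broker sketch: choose the best-suited server
 * for a requested problem from an index of servers and their stats. */
#include <stdio.h>
#include <string.h>

struct server {
    const char *host;
    const char *problems;   /* comma-separated problems the server exports */
    double      load;       /* recent load estimate, lower is better */
};

static const struct server *choose(const struct server *s, size_t n,
                                   const char *problem) {
    const struct server *best = NULL;
    for (size_t i = 0; i < n; i++) {
        if (strstr(s[i].problems, problem) == NULL)
            continue;                       /* server cannot solve it */
        if (best == NULL || s[i].load < best->load)
            best = &s[i];                   /* lower load wins */
    }
    return best;
}

int main(void) {
    const struct server catalog[] = {       /* invented index entries */
        { "cypher.cs.utk.edu", "dgesv,fft", 0.80 },
        { "torc1.cs.utk.edu",  "fft,des",   0.25 },
        { "torc2.cs.utk.edu",  "fft",       0.60 },
    };
    const struct server *pick =
        choose(catalog, sizeof catalog / sizeof catalog[0], "fft");
    printf("dispatching request to %s\n", pick ? pick->host : "(none)");
    return 0;
}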

Page 53: Grid Computing

53

The NetSolve Client

Provides access to remote resources through simple and intuitive APIs.

Runs on a user's local system.

Contacts the NetSolve system through the agent, which in turn returns the server that can best service the request.

Runs on Linux, UNIX, and Windows.
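For a feel of the client side, the sketch below shows a blocking call shaped like NetSolve's C interface. The call name netsl(), the header name, and the argument convention (problem name first, then the problem's inputs and outputs) are recalled from NetSolve's documentation and should be treated as assumptions; the problem name and data are illustrative only.

/* Sketch of a blocking NetSolve client call (assumed interface).
 * The agent picks a server; the client blocks until the result returns. */
#include <stdio.h>
#include "netsolve.h"        /* header name assumed from the client library */

int main(void) {
    double x[4] = { 1.0, 2.0, 3.0, 4.0 };
    double y[4] = { 4.0, 3.0, 2.0, 1.0 };
    double dot  = 0.0;
    int n = 4;

    /* "ddot()" stands for whatever problem name the servers export;
     * argument order follows the problem's INPUT/OUTPUT specification. */
    int status = netsl("ddot()", n, x, y, &dot);
    if (status < 0) {
        fprintf(stderr, "NetSolve request failed (status %d)\n", status);
        return 1;
    }
    printf("dot product = %f\n", dot);
    return 0;
}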

Page 54: Grid Computing

54

The NetSolve Server

The computational backbone of the system.

A daemon process that awaits client requests.

Runs on a variety of platforms: a single workstation, a cluster of workstations, symmetric multiprocessors (SMPs), or massively parallel processors (MPPs).

Page 55: Grid Computing

55

The NetSolve Server

A key component of the server is the Problem Description File (PDF).

With the PDF, routines local to a given server are made available to clients throughout the NetSolve system.

Page 56: Grid Computing

56

The PDF Template

PROBLEM Program Name …

LIB Supporting Library Information …

INPUT specifications …

OUTPUT specifications …

CODE
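As a purely hypothetical illustration of the template above, a PDF for a vector dot-product routine might look roughly as follows. The keywords mirror the slide, while the routine, library path, and specifications are invented, and the exact PDF syntax should be taken from the NetSolve documentation.

PROBLEM ddot
  Dot product of two double-precision vectors of length n.

LIB /usr/local/lib/libblas.a

INPUT
  int n              (vector length)
  double x[n], y[n]  (input vectors)

OUTPUT
  double result      (the value of x . y)

CODE
  result = cblas_ddot(n, x, 1, y, 1);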

Page 57: Grid Computing

57

Network Weather Service

Supports grid technologies.

Uses sensor processes to monitor CPU loads and network traffic.

Uses statistical models on the collected data to generate a forecast of future behavior.

NetSolve is currently integrating NWS into its agent.
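Roughly speaking, NWS keeps a family of competing predictors and reports the one that has recently been most accurate. The self-contained C sketch below shows only the simplest member of such a family, an exponentially smoothed forecast of the next CPU-load measurement; the sample values and smoothing factor are invented.

/* Minimal forecasting sketch: exponentially smoothed prediction of the
 * next CPU-load measurement from a history of observations. */
#include <stdio.h>

int main(void) {
    /* hypothetical CPU-load samples reported by a sensor process */
    const double samples[] = { 0.20, 0.25, 0.22, 0.60, 0.55, 0.30, 0.28 };
    const size_t n = sizeof samples / sizeof samples[0];
    const double alpha = 0.3;        /* smoothing factor, 0 < alpha <= 1 */

    double forecast = samples[0];    /* seed with the first observation */
    for (size_t i = 1; i < n; i++) {
        /* each new sample pulls the forecast toward it by a factor alpha */
        forecast = alpha * samples[i] + (1.0 - alpha) * forecast;
    }
    printf("predicted next CPU load: %.3f\n", forecast);
    return 0;
}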

Page 58: Grid Computing

58

Gridware Collaborations

NetSolve is using Globus' "Heartbeat Monitor" to detect failed servers.

A NetSolve client that allows access to Globus is now in testing.

Legion has adopted NetSolve's client-user interface to leverage its metacomputing resources.

The NetSolve client uses Legion’s data-flow graphs to keep track of data dependencies.

Page 59: Grid Computing

59

Gridware Collaborations

NetSolve can access Condor pools among its computational resources.

IBP-enabled clients and servers allow NetSolve to allocate and schedule storage resources as part of its resource brokering. This improves fault tolerance.

Page 60: Grid Computing

60

GRID COMPUTING

BREAK

Page 61: Grid Computing

61

Hour 3: Ongoing Research

Motivation.

Special Projects.
– Ongoing work at Tennessee.

General Issues.
– Open questions of interest to the entire research community.

Page 62: Grid Computing

62

Motivation

Computer speed doubles every 18 months.

Network speed doubles every 9 months.

Graph from Scientific American (January 2001) by Cleo Vilett; source: Vinod Khosla, Kleiner Perkins Caufield & Byers.

Page 63: Grid Computing

63

Special Projects

The SInRG Project.
– Grid Service Clusters (GSCs)
– Data Switches

Incorporating Hardware Acceleration.

Unbridled Parallelism.
– SETI@home and Folding@home
– The Vertex Cover Solver

Security.

Page 64: Grid Computing

64

The SInRG Project

Page 65: Grid Computing

65

The Grid Service Cluster

The basic grid building block.

Each GSC will use the same software infrastructure as is now being deployed on the national Grid, but tuned to take advantage of the highly structured and controlled design of the cluster.

Some GSCs are general-purpose and some are special-purpose.

Page 66: Grid Computing

66

The Grid Service Cluster

Page 67: Grid Computing

67

An Advanced Data Switch

The components that make up a GSC must be able to access each other at very high speeds and with guaranteed Quality of Service (QoS).

Links of at least 1 Gbps assure QoS in many circumstances simply by overprovisioning.

Page 68: Grid Computing

68

Computational Ecology GSC

Collaboration between computer science and mathematical ecology.

8-processor Symmetric Multi-Processor (SMP).

Initial in-core memory (RAM) is approximately 4 gigabytes.

Out-of-core data storage unit provides a minimum of 450 gigabytes.

Page 69: Grid Computing

69

Medical Imaging GSC

Collaboration between computer science and the medical school.

High-end graphics workstations.

Distinguished by the need to have these workstations attached as directly as possible to the switch, to facilitate interactive manipulation of the reconstructed images.

Page 70: Grid Computing

70

Molecular Design GSC

Collaboration between computer science and chemical engineering.

Data visualization laboratory.
32 dual processors.
High-performance switch.

Page 71: Grid Computing

71

Machine Design GSC

Collaboration between computer science and electrical engineering.

12 Unix-based CAD workstations.
8 Linux boxes with Pilchard boards.

Investigating the potential of reconfigurable computing in grid environments.

Page 72: Grid Computing

72

Machine Design GSC

Page 73: Grid Computing

73

Types of Hardware

General-purpose hardware – can implement any function.

ASICs – hardware that can implement only a specific application.

FPGAs – reconfigurable hardware that can implement any function.

Page 74: Grid Computing

74

The FPGA

FPGAs offer reprogrammability.

Allows optimal logic design of each function to be implemented.

Hardware implementations offer acceleration over software implementations, which run on general-purpose processors.

Page 75: Grid Computing

75

The Pilchard Environment

Developed at the Chinese University of Hong Kong.

Plugs into a 133 MHz RAM DIMM slot and is an example of "programmable active memory."

Pilchard is accessed through memory read/write operations.

Higher bandwidth and lower latency than other environments.
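Because the board is driven by ordinary memory reads and writes, user code can be pictured as plain memory-mapped I/O. The C sketch below is a generic Linux mmap() example rather than the actual Pilchard driver interface: the device node, window size, and register layout are invented, and the real environment may provide its own access library.

/* Generic memory-mapped I/O sketch (hypothetical device node and offsets):
 * map the board's address window and talk to it with loads and stores. */
#include <fcntl.h>
#include <stdio.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

#define MAP_BYTES 4096           /* assumed size of the register window */

int main(void) {
    int fd = open("/dev/pilchard0", O_RDWR);     /* hypothetical node */
    if (fd < 0) { perror("open"); return 1; }

    void *win = mmap(NULL, MAP_BYTES, PROT_READ | PROT_WRITE,
                     MAP_SHARED, fd, 0);
    if (win == MAP_FAILED) { perror("mmap"); close(fd); return 1; }
    volatile uint32_t *regs = (volatile uint32_t *)win;

    regs[0] = 0x12345678u;       /* write an operand into the FPGA      */
    regs[1] = 1u;                /* hypothetical "start" register       */
    while ((regs[2] & 1u) == 0)  /* poll a hypothetical "done" flag     */
        ;
    printf("result from FPGA: 0x%08x\n", (unsigned)regs[3]);

    munmap(win, MAP_BYTES);
    close(fd);
    return 0;
}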

Page 76: Grid Computing

76

Objectives

Evaluate the utility of NetSolve gridware.

Determine the effectiveness of hardware acceleration in this environment.

Provide an interface for the remote use of FPGAs.

Allow users to experiment and gauge whether a given problem would benefit from hardware acceleration.

Page 77: Grid Computing

77

Sample Implementations

Fast Fourier Transform (FFT)
Data Encryption Standard algorithm (DES)
Image backprojection algorithm
A variety of combinatorial algorithms

Page 78: Grid Computing

78

Implementation Techniques

Two types of functions are implemented:
– Software version – runs on the PC's processor.
– Hardware version – runs in the FPGA.

To implement the hardware version of a function, VHDL code is needed.
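One way to picture the pairing is a thin dispatcher with a single entry point that routes each call to either the software routine or the FPGA-backed one. How the real servers choose between the two is not stated on the slides, so the selection rule and the hardware stub in this C sketch are invented.

/* Hypothetical dispatcher: one entry point, two implementations. */
#include <stdio.h>

/* plain software version, runs on the host CPU */
static unsigned long checksum_sw(const unsigned char *buf, size_t len) {
    unsigned long sum = 0;
    for (size_t i = 0; i < len; i++) sum += buf[i];
    return sum;
}

/* stand-in for the FPGA-accelerated version; a real one would hand the
 * buffer to the board (e.g., over the memory-mapped interface above) */
static unsigned long checksum_hw(const unsigned char *buf, size_t len) {
    return checksum_sw(buf, len);    /* stub: same result, fake "hardware" */
}

/* route small inputs to software, large ones to hardware (invented rule) */
static unsigned long checksum(const unsigned char *buf, size_t len) {
    return (len < 4096) ? checksum_sw(buf, len) : checksum_hw(buf, len);
}

int main(void) {
    unsigned char data[10000];
    for (size_t i = 0; i < sizeof data; i++) data[i] = (unsigned char)i;
    printf("checksum = %lu\n", checksum(data, sizeof data));
    return 0;
}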

Page 79: Grid Computing

79

The Hardware Function

Implemented in VHDL or some other hardware description language.

The VHDL code is then mapped onto the FPGA (synthesis).

CAD tools help make mapping decisions based on constraints such as chip area, I/O pin counts, routing resources and topologies, partitioning, and resource usage minimization.

Page 80: Grid Computing

80

The Hardware Function

The result of synthesis is a configuration file (bitstream).

This file defines how the FPGA is to be reprogrammed in order to implement the desired new functionality.

To run, a copy of the configuration file must be loaded onto the FPGA.

Page 81: Grid Computing

81

Behind the Scenes

[Workflow diagram: a VHDL programmer produces VHDL code, which synthesis turns into a configuration file; a software programmer supplies the software and hardware functions; the server administrator installs the PDFs and libraries on the NetSolve server; the client sends requests to the NetSolve server and receives the results.]

Page 82: Grid Computing

82

Conclusions

Hardware acceleration is offered to both local and remote users.

Resources are available through an efficient and easy-to-use interface.

A development environment is provided for devising and testing a wide variety of software, hardware, and hybrid solutions.

Page 83: Grid Computing

83

Unbridled Parallelism

Sometimes the overhead of gridware is unneeded. Well-known examples include SETI@home and Folding@home.

We’re currently building a Vertex Cover solver with multiple levels of acceleration.

Page 84: Grid Computing

84

A Naked SSH Approach

A bit of blasphemy: the anti-gridware paradigm (a minimal sketch appears below).

Our work raises several questions.

– When does it make sense?

– How much efficiency are we gaining?

– What are its limitations?
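A minimal reading of the "naked ssh" idea: fan independent pieces of work out to a list of machines with nothing but ssh and collect whatever they print. In this C sketch the hostnames, the remote solver command, and the use of popen() are all illustrative assumptions; a real harness would also stage inputs, overlap the remote runs, and retry failures.

/* Toy anti-gridware task farm: one ssh per host, one task per host. */
#include <stdio.h>

int main(void) {
    const char *hosts[] = { "node01", "node02", "node03" };  /* hypothetical */
    char cmd[256];

    for (size_t i = 0; i < sizeof hosts / sizeof hosts[0]; i++) {
        /* run one independent piece of work remotely; the command is a
         * placeholder standing in for the vertex cover solver */
        snprintf(cmd, sizeof cmd, "ssh %s ./vc_solver --part %zu",
                 hosts[i], i);
        FILE *out = popen(cmd, "r");
        if (out == NULL) { perror("popen"); continue; }

        char line[512];
        while (fgets(line, sizeof line, out) != NULL)
            printf("[%s] %s", hosts[i], line);   /* collect remote output */
        pclose(out);
    }
    return 0;
}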

Page 85: Grid Computing

85

Grid Security

Algorithm complexity theory
– Verifiability
– Concealment

Cryptography and checkpointing
– Corroboration
– Scalability

Voting and spot-checking (see the sketch below)
– Fault tolerance
– Reliability
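Voting is easiest to see in miniature: run the same work unit on several untrusted hosts and accept the answer a majority agrees on (spot-checking additionally plants units whose answers are already known). The self-contained C sketch below shows majority voting only; the reported results are invented.

/* Majority voting over redundant results returned for one work unit. */
#include <stdio.h>

int main(void) {
    /* answers reported by three hosts for the same task (invented data);
     * one host is faulty or malicious and reports a wrong value */
    const long results[] = { 42, 42, 17 };
    const size_t n = sizeof results / sizeof results[0];

    long   winner = results[0];
    size_t best   = 0;
    for (size_t i = 0; i < n; i++) {          /* count agreement per value */
        size_t votes = 0;
        for (size_t k = 0; k < n; k++)
            if (results[k] == results[i]) votes++;
        if (votes > best) { best = votes; winner = results[i]; }
    }

    if (2 * best > n)
        printf("accepted result %ld (%zu of %zu hosts agree)\n",
               winner, best, n);
    else
        printf("no majority: reschedule the work unit\n");
    return 0;
}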

Page 86: Grid Computing

86

Some General Issues

Grid architecture.
Resource management.
QoS mechanisms.
Performance monitoring.
Fault tolerance.

Page 87: Grid Computing

87

References

URL to these slides: http://www.cs.utk.edu/~abukhzam/grid-tutorial.htm

Condor: http://www.cs.wisc.edu/condor

Globus: http://www.globus.org

Page 88: Grid Computing

88

References

NetSolve: http://icl.cs.utk.edu/netsolve

Harness:

NWS: http://nws.cs.ucsb.edu/

SInRG: http://icl.cs.utk.edu/sinrg

Page 89: Grid Computing

89

GRID COMPUTING

END