grid computing

Post on 03-Jul-2015


Grid Computing

Krunal Siddhapathak

Electronics & Communication, Nirma University, Ahmedabad, India

[email protected]

Abstract: Grid computing is a service for sharing computer power and data storage capacity over the Internet. How is grid computing different from the World Wide Web? Simple. Grid computing uses the Internet to help us share computer power, while the Web uses the Internet to help us share information. Grid computing is making big contributions to scientific research, helping scientists around the world to analyze and store massive amounts of data.

Keywords: Networking, Memory, Storage

I. INTRODUCTION

Grid computing is a service for sharing computer power and data storage capacity over the Internet. How is grid computing different from the World Wide Web? Simple. Grid computing uses the Internet to help us share computer power, while the Web uses the Internet to help us share information. Grid computing is making big contributions to scientific research, helping scientists around the world to analyze and store massive amounts of data.

Grid computing systems work on the principle of pooled resources. Let's say you and a couple of friends decide to go on a camping trip. You own a large tent, so you've volunteered to share it with the others. One of your friends offers to bring food and another says he'll drive the whole group up in his SUV. Once on the trip, the three of you share your knowledge and skills to make the trip fun and comfortable. If you had made the trip on your own, you would need more time to assemble the resources you'd need and you probably would have had to work a lot harder on the trip itself.

A grid computing system uses that same concept: share the load across multiple computers to complete tasks more efficiently and quickly. Before going too much further, let's take a quick look at a computer's resources:

Central processing unit (CPU): A CPU is a microprocessor that performs mathematical operations and directs data to different memory locations. Computers can have more than one CPU.

Memory: In general, a computer's memory is a kind of temporary electronic storage. Memory keeps relevant data close at hand for the microprocessor. Without memory, the microprocessor would have to search and retrieve data from a more permanent storage device such as a hard disk drive.

Storage: In grid computing terms, storage refers to permanent data storage devices like hard disk drives or databases.

Normally, a computer can only operate within the limitations of its own resources. There's an upper limit to how fast it can complete an operation or how much information it can store. Most computers are upgradeable, which means it's possible to add more power or capacity to a single computer, but that's still just an incremental increase in performance.

Grid computing systems link computer resources together in a way that lets someone use one computer to access and leverage the collected power of all the computers in the system. To the individual user, it's as if the user's computer has transformed into a supercomputer.


II. GRID COMPUTING LEXICON

Cluster: A group of networked computers sharing the same set of resources.

Extensible Markup Language (XML): A computer language that describes other data and is readable by computers. Control nodes (a node is any device connected to a network that can transmit, receive and reroute data) rely on XML-based languages like the Web Services Description Language (WSDL). The information in these languages tells the control node how to handle data and applications.
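To make the idea of a control node reading instructions from XML more concrete, here is a minimal sketch using Python's standard xml.etree.ElementTree module. The job-description format (the job, task and input elements and their attributes) is hypothetical and invented for this illustration; it is not WSDL or any real grid schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical job description: element and attribute names are invented
# for this sketch and do not follow WSDL or any real grid schema.
JOB_XML = """
<job id="demo-1">
  <task name="analyze" priority="2">
    <input>chunk-001.dat</input>
  </task>
  <task name="archive" priority="5">
    <input>chunk-001.out</input>
  </task>
</job>
"""


def parse_job(xml_text):
    """Return the job id and a list of task dictionaries read from the XML."""
    root = ET.fromstring(xml_text.strip())
    tasks = []
    for task in root.findall("task"):
        tasks.append({
            "name": task.get("name"),
            "priority": int(task.get("priority")),
            "input": task.findtext("input"),
        })
    return root.get("id"), tasks


job_id, tasks = parse_job(JOB_XML)
for t in tasks:
    print(job_id, t["name"], t["priority"], t["input"])
```

Because the description is plain, self-describing text, any machine on the network that can parse XML could read the same document, regardless of its operating system.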

Hub: A point within a network where various devices connect with one another.

Integrated Development Environment (IDE): The tools and facilities computer programmers need to create applications for a platform. The term for an application testing ground is sandbox.

Interoperability: The ability for software to operate within completely different environments. For example, a computer network might include both PCs and Macintosh computers. Without interoperable software, these computers wouldn't be able to work together because of their different operating systems and architecture.

Open standards: A technique of creating publicly available standards. Unlike proprietary standards, which can belong exclusively to a single entity, anyone can adopt and use an open standard. Applications based on the same open standards are easier to integrate than ones built on different proprietary standards.

Parallel processing: Using multiple CPUs to solve a single computational problem. This is closely related to shared computing, which leverages untapped resources on a network to achieve a task.
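The split, compute and combine pattern behind parallel processing can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, and a thread pool stands in for separate CPUs or machines. In CPython, threads do not give a real multi-CPU speedup for arithmetic like this; concurrent.futures.ProcessPoolExecutor offers the same map interface when separate processes are wanted.

```python
from concurrent.futures import ThreadPoolExecutor


def split(data, parts):
    """Split data into `parts` contiguous slices of near-equal size."""
    size, extra = divmod(len(data), parts)
    chunks, start = [], 0
    for i in range(parts):
        end = start + size + (1 if i < extra else 0)
        chunks.append(data[start:end])
        start = end
    return chunks


def sum_of_squares(chunk):
    """The per-worker job: one piece of the overall computation."""
    return sum(x * x for x in chunk)


def parallel_sum_of_squares(data, workers=4):
    """Farm the chunks out to workers, then combine the partial results."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(sum_of_squares, split(data, workers)))
    return sum(partials)


numbers = list(range(1000))
# The combined answer should match the single-worker answer.
print(parallel_sum_of_squares(numbers) == sum(x * x for x in numbers))
```

The problem suits this treatment because each chunk can be processed without knowing anything about the others; only the final combining step needs all the partial results.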

Platform: The foundation upon which developers can create applications. A platform can be an operating system, a computer's architecture, a computer language or even a Web site.

Server farm: A cluster of servers used to perform tasks too complex for a single server.

Server virtualization: A technique in which a software application divides a single physical server into multiple exclusive server platforms (the virtual servers). Each virtual server can run its own operating system independently of the other virtual servers. The operating systems don't have to be the same: a single machine could have one virtual server acting as a Linux server and another running a Windows platform. It works because most of the time, servers aren't running anywhere close to full capacity. Grid computing systems need lots of servers to handle various tasks, and virtual servers help cut down on hardware costs.

Service: In grid computing, a service is any software system that allows computers to interact with one another over a network.

Simple Object Access Protocol (SOAP): A set of rules for exchanging messages written in XML across a network. The protocol was originally developed at Microsoft and was later standardized by the World Wide Web Consortium (W3C).
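As a rough illustration of what a SOAP message looks like, the sketch below builds and then reads back a SOAP 1.1 envelope with Python's standard library. The envelope namespace is the standard SOAP 1.1 one; the application namespace, the SubmitTask operation and its parameters are hypothetical and exist only for this example.

```python
import xml.etree.ElementTree as ET

SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace
APP_NS = "http://example.org/grid"  # hypothetical application namespace


def build_request(operation, params):
    """Wrap an operation and its parameters in a SOAP Envelope/Body."""
    envelope = ET.Element(f"{{{SOAP_ENV}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_ENV}}}Body")
    call = ET.SubElement(body, f"{{{APP_NS}}}{operation}")
    for name, value in params.items():
        ET.SubElement(call, f"{{{APP_NS}}}{name}").text = str(value)
    return ET.tostring(envelope, encoding="unicode")


def read_request(xml_text):
    """Extract the operation name and parameters from a SOAP request."""
    envelope = ET.fromstring(xml_text)
    body = envelope.find(f"{{{SOAP_ENV}}}Body")
    call = body[0]  # first child of Body is the operation element
    operation = call.tag.split("}", 1)[1]  # strip the "{namespace}" prefix
    params = {child.tag.split("}", 1)[1]: child.text for child in call}
    return operation, params


# "SubmitTask", "taskName" and "priority" are made-up names for illustration.
message = build_request("SubmitTask", {"taskName": "analyze", "priority": 2})
print(message)
print(read_request(message))
```

Note that parameter values come back as text; a real service description (for example, WSDL) is what tells the receiver how to interpret them.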

State: In the IT world, a state is any kind of persistent data. It's information that continues to exist in some form even after being used in an application. For example, when you select books to go into an Amazon.com shopping cart, the information is stateful: Amazon keeps track of your selection as you browse other areas of the Web site. Stateful services make it possible to create applications that have multiple steps but rely on the same core data.
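The difference between stateful and stateless behavior can be shown with a small Python sketch. The CartService class and price_with_tax function are hypothetical names chosen for illustration; they are not taken from any real shopping or grid system.

```python
class CartService:
    """Stateful: remembers each session's items between calls."""

    def __init__(self):
        self._carts = {}  # session id -> list of item names

    def add_item(self, session_id, item):
        self._carts.setdefault(session_id, []).append(item)

    def items(self, session_id):
        # Return a copy so callers cannot modify the stored state directly.
        return list(self._carts.get(session_id, []))


def price_with_tax(price, rate):
    """Stateless: the result depends only on the arguments passed in."""
    return round(price * (1 + rate), 2)


cart = CartService()
cart.add_item("session-42", "Grid Computing Basics")
cart.add_item("session-42", "Networking Handbook")
print(cart.items("session-42"))   # both books are still there on a later call
print(price_with_tax(10.0, 0.5))
```

The cart keeps its data across several separate requests, which is what lets a multi-step application (browse, add, check out) rely on the same core data.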


Transience: The ability to activate or deactivate a service across a network without affecting other operations.

III. SHARING RESOURCES

Several companies and organizations are working together to create a standardized set of rules, called protocols, to make it easier to set up grid computing environments. It's possible to create a grid computing system right now, and several already exist. What's missing is an agreed-upon approach. As a result, two different grid computing systems may not be compatible with one another, because each works with its own set of protocols and tools.

In general, a grid computing system requires:

At least one computer, usually a server, which handles all the administrative duties for the system. Many people refer to this kind of computer as a control node. Other application and Web servers (both physical and virtual) provide specific services to the system.

A network of computers running special grid computing network software. These computers act both as a point of interface for the user and as the resources the system will tap into for different applications. Grid computing systems can either include several computers of the same make running on the same operating system (called a homogeneous system) or a hodgepodge of different computers running on every operating system imaginable (a heterogeneous system). The network can be anything from a hardwired system where every computer connects to the system with physical wires to an open system where computers connect with each other over the Internet.

A collection of computer software called middleware. The purpose of middleware is to allow different computers to run a process or application across the entire network of machines. Middleware is the workhorse of the grid computing system. Without it, communication across the system would be impossible. Like software in general, there's no single format for middleware.

If middleware is the workhorse of the grid computing system, the control node is the dispatcher. The control node must prioritize and schedule tasks across the network. It's the control node's job to determine what resources each task will be able to access. The control node must also monitor the system to make sure that it doesn't become overloaded. It's also important that each user connected to the network doesn't experience a drop in his or her computer's performance. A grid computing system should tap into unused computer resources without impacting everything else.
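The dispatcher role described above can be sketched as a small priority scheduler. This is a toy model under stated assumptions, not how any particular grid middleware works: the ControlNode class, its method names and the per-node slot counts are all hypothetical, lower numbers mean higher priority, and at least one node is assumed to be registered.

```python
import heapq
import itertools


class ControlNode:
    """Toy dispatcher: queues tasks by priority and hands them to free nodes."""

    def __init__(self, node_capacity):
        # node name -> number of tasks it may still accept (hypothetical limits)
        self.free_slots = dict(node_capacity)
        self._queue = []
        self._counter = itertools.count()  # tie-breaker keeps FIFO order

    def submit(self, task_name, priority):
        """Queue a task; a lower priority number is served first."""
        heapq.heappush(self._queue, (priority, next(self._counter), task_name))

    def dispatch(self):
        """Assign queued tasks to nodes with free slots; return (task, node) pairs."""
        assignments = []
        while self._queue:
            # Pick the node with the most spare capacity to avoid overloading one.
            node = max(self.free_slots, key=self.free_slots.get)
            if self.free_slots[node] == 0:
                break  # every node is full; the remaining tasks stay queued
            _, _, task = heapq.heappop(self._queue)
            self.free_slots[node] -= 1
            assignments.append((task, node))
        return assignments

    def finish(self, node):
        """Record that a node completed a task and has a slot free again."""
        self.free_slots[node] += 1


control = ControlNode({"node-a": 2, "node-b": 1})
for name, priority in [("low", 5), ("urgent", 1), ("normal", 3), ("extra", 4)]:
    control.submit(name, priority)
print(control.dispatch())   # three tasks placed, highest priority first
control.finish("node-b")
print(control.dispatch())   # the remaining task runs once a slot frees up
```

Capping each node's slots is one simple way to model the requirement that the system not become overloaded; a real control node would also track actual CPU and memory use.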

The potential for grid computing applications is vast, provided everyone agrees on standardized protocols and tools. Without a standard format, third-party developers (independent programmers who want to create applications on the grid computing platform) often lack the ability to create applications that work on different systems. While it's possible to make different versions of the same application for different systems, it's time-consuming, and many developers don't want to do the same work twice. A standardized set of protocols means that developers could concentrate on one format while creating applications.

IV. GRID COMPUTING APPLICATIONS

There are several grid computing systems, though most of them only fit part of the definition of a true grid computing system. Academic and research organization projects account for many of the systems currently in operation. These systems take advantage of unused computer processing power. The most accurate term for such a network is a shared computing system.

The Search for Extraterrestrial Intelligence (SETI) project is one of the earliest grid computing systems to gain popular attention. The mission of the SETI project is to analyze data gathered by radio telescopes in search of evidence for intelligent alien communications. There's far too much information for a single computer to analyze effectively. The SETI project created a program called SETI@home, which networks computers together to form a virtual supercomputer instead.

A similar program is the Folding@home project administered by the Pande Group, a non-profit institution in Stanford University's chemistry department. The Pande Group is studying proteins. The research includes the way proteins take certain shapes, called folds, and how that relates to what proteins do. Scientists believe that protein "misfolding" could be the cause of diseases like Parkinson's or Alzheimer's. It's possible that by studying proteins, the Pande Group may discover new ways to treat or even cure these diseases.

There are dozens of similar active grid computing projects. Many of these projects aren't persistent, which means that once the respective project's goals are met, the system will dissolve. In some cases, a new, related project could take the place of the completed one.

While each of these projects has its own unique features, in general, the process of participation is the same. A user interested in participating downloads an application from the respective project's Web site. After installation, the application contacts the respective project's control node. The control node sends a chunk of data to the user's computer for analysis. The software analyzes the data, powered by untapped CPU resources. The project's software has a very low resource priority: if the user needs to activate a program that requires a lot of processing power, the project software shuts down temporarily. Once CPU usage returns to normal, the software begins analyzing data again.

Eventually, the user's computer will complete the requested data analysis. At that time, the project software sends the data back to the control node, which relays it to the proper database. Then the control node sends a new chunk of data to the user's computer, and the cycle repeats itself. If the project attracts enough users, it can complete ambitious goals in a relatively short time span.
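The fetch, analyze and report cycle described above can be simulated in a single Python process. Everything here is a hypothetical stand-in: ProjectServer plays the part of the control node, analyze is a placeholder for the project's real scientific code, and user_is_busy is an assumed callback that reports whether the user currently needs the CPU. No networking or real project protocol is involved.

```python
from collections import deque


class ProjectServer:
    """Stands in for the project's control node (no real networking here)."""

    def __init__(self, chunks):
        self.pending = deque(chunks)   # (chunk_id, data) pairs awaiting analysis
        self.results = {}              # chunk_id -> result, the "proper database"

    def next_chunk(self):
        return self.pending.popleft() if self.pending else None

    def report(self, chunk_id, result):
        self.results[chunk_id] = result


def analyze(data):
    """Placeholder analysis; a real project runs domain-specific code here."""
    return max(data)


def run_client(server, user_is_busy):
    """Fetch, analyze and report chunks until the server has no more work.

    `user_is_busy` is a callable; when it returns True the client gives up
    that cycle instead of working. Returns the number of cycles skipped.
    """
    skipped = 0
    while True:
        if user_is_busy():
            skipped += 1   # a real client would sleep and check again later
            continue
        chunk = server.next_chunk()
        if chunk is None:
            return skipped
        chunk_id, data = chunk
        server.report(chunk_id, analyze(data))


server = ProjectServer([("c1", [3, 9, 2]), ("c2", [7, 1]), ("c3", [4, 4, 8])])
busy_flags = iter([False, True, False, False])  # simulated user activity
skipped = run_client(server, lambda: next(busy_flags, False))
print(server.results, "cycles skipped:", skipped)
```

With many such clients pulling from the same queue, the pool of chunks drains faster, which is how a project with enough volunteers can finish a large analysis in a relatively short time.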

V. CONCLUSION

Grid computing is the collection of computer resources from multiple locations to reach a common goal. The grid can be thought of as a distributed system with non-interactive workloads that involve a large number of files. What distinguishes grid computing from conventional high-performance computing systems such as cluster computing is that grids tend to be more loosely coupled, heterogeneous, and geographically dispersed. Although a single grid can be dedicated to a particular application, commonly a grid is used for a variety of purposes. Grids are often constructed with general-purpose grid middleware software libraries.
