MyGrid: A User-Centric Approach for Grid Computing


Page 1: MyGrid: A User-Centric Approach for Grid Computing

MyGrid: A User-Centric Approach for Grid Computing

Walfredo Cirne Universidade Federal da Paraíba

Page 2: MyGrid: A User-Centric Approach for Grid Computing

High-Performance Computing

• High-Performance Computing means running faster than the typical machine du jour

• Unbeatable price/performance of microprocessors has killed specialized high-performance machines

• Therefore, parallelism is currently the way to do High-Performance Computing
  – Parallel supercomputers

Page 3: MyGrid: A User-Centric Approach for Grid Computing

Solving a Real Problem

• I had hundreds of thousands of independent simulations to run

• Parallel supercomputers are typically
  – hard to get access to
  – slow (too much time in the queue)

• Since my simulations were independent, I had the perfect application for the Computational Grid

Page 4: MyGrid: A User-Centric Approach for Grid Computing

Grid Computing

• Grid Computing aims to enable the execution of parallel applications over processors that are:
  – Geographically distributed
  – Under multiple administrative domains
  – Not dedicated

• The potential for resource gathering is enormous
  – “Let's run over the Internet”

Page 5: MyGrid: A User-Centric Approach for Grid Computing

Grid Applications

• Not all applications can benefit from the Grid

• Loosely coupled applications match the Grid characteristics much better than tightly coupled applications

Page 6: MyGrid: A User-Centric Approach for Grid Computing

State of the Art in Grid Computing

• Most services are provided by the Grid Infrastructure
  – Naming, remote execution/task control, security, etc.

• Scheduling is done at the application level

• Globus

• “Virtual Organizations”

Page 7: MyGrid: A User-Centric Approach for Grid Computing

Back to the Real Problem

• I had hundreds of thousands of independent simulations to run

• I was working in a top research lab in Grid Computing

• I could not manage to use the Grid

• It is hard to get the Grid Infrastructure Software installed everywhere

Page 8: MyGrid: A User-Centric Approach for Grid Computing

The Motivation for MyGrid

• Users of loosely coupled applications could benefit from the Grid now

• However, they don't run on the Grid today because the Grid Infrastructure is not widely deployed

• What if we build a solution at the user level? That is, a solution that does not depend upon installed infrastructure?

Page 9: MyGrid: A User-Centric Approach for Grid Computing

MyGrid

• MyGrid is a framework to build infrastructure-independent grid applications

• The user provides:
  – A description of her Grid
  – A way to do remote execution and file transfer
  – “The application”

• MyGrid provides:
  – Grid abstractions
  – Scheduling

Page 10: MyGrid: A User-Centric Approach for Grid Computing

MyGrid Goals

• open = do not require a particular infrastructure

• self-installable = do not require manual installation on a given machine

• extensible = simple to add refinements

• complete = cover the whole production cycle

Page 11: MyGrid: A User-Centric Approach for Grid Computing

MyGrid Concepts

• Job = set of independent tasks
  – Tasks have three pieces: init, remote and final

• Home machine vs. Grid machine

• Grid abstractions
  – remote execution
  – file transfer
  – playpen
  – mirroring
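The job and task concepts above can be sketched in a few lines of Python. This is only an illustration, not MyGrid's actual (Java) API: the names `Task`, `Job`, `run_task`, `run_home`, and `run_grid` are hypothetical, and the sketch assumes that init and final run on the home machine while remote runs on the grid machine, which the slides do not state explicitly.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Task:
    """One task: three command strings, mirroring init / remote / final."""
    init: str
    remote: str
    final: str


@dataclass
class Job:
    """A job is a set of independent tasks (no task depends on another)."""
    tasks: List[Task] = field(default_factory=list)


def run_task(task: Task,
             run_home: Callable[[str], None],
             run_grid: Callable[[str], None]) -> None:
    """Run the three pieces of a task in order.

    Assumption: init and final run on the home machine and remote runs on
    the grid machine; the slides do not spell this out.
    """
    run_home(task.init)     # e.g. stage input files into the playpen
    run_grid(task.remote)   # the actual computation
    run_home(task.final)    # e.g. collect the results


# Usage: record what would run where instead of executing anything.
log = []
job = Job([Task("init", "remote1", "collect"),
           Task("init", "remote2", "collect")])
for t in job.tasks:
    run_task(t,
             lambda cmd: log.append(("home", cmd)),
             lambda cmd: log.append(("grid", cmd)))
```

Passing the two runners in as callables keeps the sketch independent of any particular way of reaching a machine, in the spirit of the user supplying her own remote execution and file transfer mechanism.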

Page 12: MyGrid: A User-Centric Approach for Grid Computing

Defining My Personal Grid

bagre.dsc.ufpb.br
dsc, linux
ssh %machine %command
scp %localdir/%file %machine:%remotedir
scp %machine:%remotedir/%file %localdir

traira.dsc.ufpb.br
dsc, linux
ssh %machine %command
scp %localdir/%file %machine:%remotedir
scp %machine:%remotedir/%file %localdir

quidam.ucsd.edu
cse, linux
ssh %machine %command
scp %localdir/%file %machine:%remotedir
scp %machine:%remotedir/%file %localdir
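To make the role of the `%machine`, `%command`, `%localdir`, `%file`, and `%remotedir` placeholders concrete, here is a minimal Python sketch of how such templates might be expanded into real command lines. How MyGrid itself performs the substitution is not shown in the slides; the `expand` helper, the dictionary keys `exec`, `put`, and `get`, and the `/tmp/playpen` directory are all assumptions made for illustration.

```python
import re

# One entry of the grid description, with the templates copied from the slide.
# The keys "exec", "put" and "get" are hypothetical labels.
MACHINE = {
    "name": "bagre.dsc.ufpb.br",
    "exec": "ssh %machine %command",
    "put": "scp %localdir/%file %machine:%remotedir",
    "get": "scp %machine:%remotedir/%file %localdir",
}


def expand(template, **values):
    """Substitute %name placeholders in one pass; unknown ones are left as-is."""
    return re.sub(r"%(\w+)",
                  lambda m: values.get(m.group(1), m.group(0)),
                  template)


# "/tmp/playpen" is a made-up remote directory for the example.
put_cmd = expand(MACHINE["put"], localdir=".", file="Fat.class",
                 machine=MACHINE["name"], remotedir="/tmp/playpen")
exec_cmd = expand(MACHINE["exec"], machine=MACHINE["name"],
                  command="java Fat 3 18655 34789789798")

print(put_cmd)
print(exec_cmd)
```

Because the templates are plain strings, a user whose machines are reachable through some other tool would only need to change the three template lines for that machine.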

Page 13: MyGrid: A User-Centric Approach for Grid Computing

Factoring with MyGrid

• Fatora n generates tasks, init, remote_i, and collect

• User runs mygrid.ui.AddTask < tasks

• tasks

task:
init= init
remote= remote1
final= collect
processor= linux
playpensize= 0
cost = 1
task:
init= init
remote= remote2
…
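As a rough illustration of what a generator for the tasks file could look like, the Python sketch below renders one `task:` block per remote script, using the field names visible on the slide. The helper names `task_entry` and `tasks_file` are hypothetical, and the exact syntax that `mygrid.ui.AddTask` accepts (spacing, line breaks) is an assumption based on the slide alone.

```python
def task_entry(remote, init="init", final="collect",
               processor="linux", playpensize=0, cost=1):
    """Render one task block using the field names shown on the slide."""
    return "\n".join([
        "task:",
        f"init= {init}",
        f"remote= {remote}",
        f"final= {final}",
        f"processor= {processor}",
        f"playpensize= {playpensize}",
        f"cost = {cost}",
    ])


def tasks_file(n_tasks):
    """Render n_tasks blocks whose remote scripts are remote1..remoteN."""
    blocks = [task_entry(f"remote{i}") for i in range(1, n_tasks + 1)]
    return "\n".join(blocks) + "\n"


text = tasks_file(2)
print(text)
```

All tasks share the same init and collect scripts and differ only in the remote script, which matches the pattern in the slide's example.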

Page 14: MyGrid: A User-Centric Approach for Grid Computing

Factoring with MyGrid

• init
  java mygrid.ui.MyGridUI p $PROC ./Fat.class $PLAYPEN

• remote1
  java Fat 3 18655 34789789798 output-$TASK

• remote2
  java Fat 18655 37307 34789789798 output-$TASK

• collect
  java mygrid.ui.MyGridUI g $PROC "" $PLAYPEN saida-$TASK .
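The two remote scripts each cover a different slice of candidate divisors. One plausible way to produce such slices, sketched here in Python, is to split the range from 3 up to the integer square root of the number into equal contiguous chunks. This is an assumption about how Fatora works, not something the slides state; the function name `remote_commands` and the choice of 10 tasks are hypothetical, and the sketch makes no claim about whether `Fat` treats its upper bound as inclusive.

```python
import math


def remote_commands(number, n_tasks, start=3):
    """Split [start, isqrt(number)] into n_tasks contiguous ranges.

    Returns one "java Fat lo hi number output-$TASK" line per range.
    """
    limit = math.isqrt(number)
    # Integer ceiling division, so the ranges cover the whole interval.
    step = -(-(limit - start) // n_tasks)
    commands = []
    lo = start
    for _ in range(n_tasks):
        hi = min(lo + step, limit)
        commands.append(f"java Fat {lo} {hi} {number} output-$TASK")
        lo = hi
    return commands


cmds = remote_commands(34789789798, 10)
for line in cmds:
    print(line)
```

With 10 tasks, the first two generated lines coincide with the remote1 and remote2 commands on the slide, which suggests (but does not prove) that the original example was produced by an equal split of this kind.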

Page 15: MyGrid: A User-Centric Approach for Grid Computing

Running a MyGrid Task

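[Figure: running a MyGrid task. Home Machine: Task Manager and User Agent Server, holding the home's tasks. Grid Machine: User Agent Daemon, holding the grid's task. Labeled steps: add-task (1); (2); remote exec (3), made up of playpen, file xfer, and remote exec (3a), (3b), and (3c); task-done (4).]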

Page 16: MyGrid: A User-Centric Approach for Grid Computing

User Agent

• User Agent provides the grid abstractions

• User Agent Daemon runs on grid machines

• User Agent Server runs on home machines

• The Daemon and the Server rely upon public-key cryptography to authenticate each other

Page 17: MyGrid: A User-Centric Approach for Grid Computing

Self-Installation

• We are working on having MyGrid install and start up User Agents everywhere

• The user provides a way to do remote execution and file transfer to make that possible

Page 18: MyGrid: A User-Centric Approach for Grid Computing

Scheduling in MyGrid

• Grid scheduling is application dependent and effort intensive

• Most people don't want to spend months writing good schedulers for their applications

• MyGrid provides a sensible default scheduler
  – The user can of course replace the default scheduler

Page 19: MyGrid: A User-Centric Approach for Grid Computing

Default Scheduler

• How can we provide good performance with no knowledge about the application or the current state of the Grid?
  – The key is to avoid having the job wait for a task that runs on a slow/loaded machine

• Task replication is our answer to this problem
  – Task replication is only done when the job has no other tasks waiting to run
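The effect of replication can be illustrated with a small event-driven simulation in Python. This is a sketch under stated assumptions, not MyGrid's scheduler: all tasks are assumed to need one unit of work, machine speeds are fixed, the running task with the fewest copies is the one replicated, and the remaining copies are cancelled as soon as one copy finishes. The function name `simulate` and its parameters are hypothetical.

```python
import heapq


def simulate(n_tasks, speeds, replicate=True):
    """Return the makespan of a bag of equal tasks (1 unit of work each).

    speeds[i] is the work per time unit of machine i. The scheduler never
    looks at speeds; it only reacts to machines becoming idle.
    """
    pending = list(range(n_tasks))   # tasks not yet started anywhere
    running = {}                     # task -> set of machines running a copy
    done = set()
    events = []                      # heap of (finish_time, machine, task)
    idle = list(range(len(speeds)))
    now = 0.0

    def dispatch():
        while idle:
            if pending:
                task = pending.pop(0)
            elif replicate and running:
                # Replicate only when no unstarted task is left:
                # pick the running task with the fewest copies.
                task = min(running, key=lambda t: (len(running[t]), t))
            else:
                return
            machine = idle.pop(0)
            running.setdefault(task, set()).add(machine)
            heapq.heappush(events, (now + 1.0 / speeds[machine], machine, task))

    dispatch()
    while len(done) < n_tasks:
        now, machine, task = heapq.heappop(events)
        if task in done:
            continue                 # stale event of a cancelled copy
        done.add(task)
        # The first copy to finish wins; free every machine that ran a copy.
        idle.extend(sorted(running.pop(task)))
        dispatch()
    return now


speeds = [1, 1, 1, 0.1]              # three fast machines, one slow one
without = simulate(4, speeds, replicate=False)
with_replicas = simulate(4, speeds, replicate=True)
print(without, with_replicas)
```

In this toy setting the run without replication has to wait for the slow machine (makespan 10.0), while the run with replication finishes once a fast machine completes a copy of the straggling task (makespan 2.0). The price is extra work on machines that run copies which are later discarded.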

Page 20: MyGrid: A User-Centric Approach for Grid Computing

Preliminary Results

• During a 40-day period, we ran 600,000 simulations using 178 processors located in 6 different administrative domains spread widely across the USA

• MyGrid took 16.7 days to run the simulations

• My desktop machine would have taken 5.3 years to do so

• Speed-up is 115.8 for 178 processors
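The speed-up figure follows directly from the two running times. The short Python check below redoes the arithmetic and also derives the parallel efficiency, which the slide does not report. It assumes a 365-day year for converting 5.3 years into days; the slide does not say which year length was used.

```python
DAYS_PER_YEAR = 365        # assumption: year length used for the conversion

grid_days = 16.7           # time MyGrid took
desktop_days = 5.3 * DAYS_PER_YEAR
processors = 178

speedup = desktop_days / grid_days
efficiency = speedup / processors

print(f"speed-up   = {speedup:.1f}")
print(f"efficiency = {efficiency:.0%}")
```

Under that assumption the computed speed-up rounds to the 115.8 quoted on the slide, and the efficiency comes out at roughly 65% of the ideal 178-fold speed-up.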

Page 21: MyGrid: A User-Centric Approach for Grid Computing

Conclusions

• Running Grid Applications at the user level is a viable strategy

• Bag-of-tasks parallel applications can currently benefit from the Grid

• Is “upperware” the way to go for new middleware development?

Page 22: MyGrid: A User-Centric Approach for Grid Computing

Future Work

• Turn MyGrid into production-quality software

• Investigate the impact of task replication on resource consumption

• Develop a default scheduler for data-intensive applications
  – Such a scheduler should try to minimize data movement