Grid Computing Or how to make several ordinary computers stronger than a huge and giant supercomputer…

Post on 22-Dec-2015


Grid Computing

Or how to make several ordinary computers stronger than a huge and giant supercomputer…

First Part

Introduction to the Grid…

I) Supercomputers: several units in the same box

A supercomputer is a computer that leads the world in processing capacity, particularly speed of calculation. It works much like a classic computer, but its hardware is far more powerful…

Supercomputers are used for highly calculation-intensive tasks such as problems involving quantum mechanical physics, weather forecasting, climate research (including research into global warming), molecular modeling (computing the structures and properties of chemical compounds, biological macromolecules, polymers, and crystals), physical simulations (such as simulation of airplanes in wind tunnels, simulation of the detonation of nuclear weapons, and research into nuclear fusion), cryptanalysis, and the like. Major universities, military agencies and scientific research laboratories are heavy users.

The Columbia Supercomputer at NASA's Advanced Supercomputing Facility at Ames Research Center

II) What is the Grid, and how does it work?

How can we get enormous computing power without buying a large supercomputer?

To do their scientific work, some geniuses from a Californian university had an idea: make several small computers work together. The first « Grid » was born!

But why should they use only their own computers, when they could also use private PCs?

What is « Grid Computing »?

A computing grid exploits the computing power (processors, memory, …) of thousands of computers in order to give the illusion of a single, very powerful virtual computer. This model makes it possible to solve large computational problems that would require very long execution times in a “traditional” environment.

Grid Computing & Distributed Computing

Distributed computing differs from compute farms (cluster computing) in that the computers which carry out the calculation are typically not dedicated to distributed computation, whereas farms (clusters) mostly consist of specific hardware dedicated to this task.

In a company, for example, distributed computing can use the employees' workstations to carry out calculations: these computers are hardly used at night, so they can help the existing clusters. Calculations can also be carried out during the day if the program is launched with a low priority, so that it always lets other software use the CPU and the employees' workstations are not slowed down. Distributed computing is also very attractive in terms of cost, since it uses computing resources which would otherwise be wasted.
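As a rough illustration of launching a calculation with low priority, here is a minimal Python sketch. It assumes a Unix-like workstation where `os.nice` is available; the function names `lower_priority` and `idle_time_job` are hypothetical, and real grid clients apply their own scheduling policies.

```python
import os


def lower_priority(increment: int = 10) -> int:
    """Ask the OS to schedule this process after interactive programs.

    Returns the resulting niceness, or -1 where os.nice is unavailable
    (for example on Windows; that fallback is an assumption of this sketch).
    """
    if hasattr(os, "nice"):
        return os.nice(increment)  # higher niceness = lower priority
    return -1


def idle_time_job(limit: int) -> int:
    """Stand-in for a real work unit: sum the squares below `limit`."""
    return sum(i * i for i in range(limit))


if __name__ == "__main__":
    lower_priority()
    print(idle_time_job(1_000_000))
```

With a raised niceness, the operating system gives the CPU to the employee's own programs first, and the calculation only consumes cycles that would otherwise stay unused.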

Grid computing: how does it work?

Nowadays, there are many reasons to use such a technology: medical research, the search for extraterrestrial contact, meteorological research, physics research, mathematics, etc.

So its first purpose is research: the domain that needs the most powerful computers for its calculations.

The aim: dispatch the tasks among all the available computers. Some users say “OK, you can use a part of my computer”, so when this computer is online, it gets some homework to do and calculates some small things for the main server.

The main server finally combines those small jobs to find the answer to a really complex question…

For example: how can several computers calculate (2*4)+(3*6)?

The first computer calculates 2*4 and returns the answer to the main server.

The second one calculates 3*6 and returns its own answer.

Then the main server can calculate (2*4)+(3*6). If that is still too hard, it can always call on another computer to calculate it.
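The example above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: threads in a local pool stand in for the remote computers, and the names `multiply` and `main_server` are hypothetical, not part of any grid software.

```python
from concurrent.futures import ThreadPoolExecutor


def multiply(pair):
    """Work done by one volunteer computer: a single multiplication."""
    a, b = pair
    return a * b


def main_server(pairs, workers=2):
    """Hand one sub-task to each worker, then combine the partial answers."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partial_results = list(pool.map(multiply, pairs))
    return sum(partial_results)


if __name__ == "__main__":
    print(main_server([(2, 4), (3, 6)]))  # (2*4)+(3*6) = 26
```

The pattern is the same at any scale: the server splits the question into independent pieces, the workers answer each piece, and the server combines the answers.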

The secret is sharing the work. But why should we sacrifice our computers to work for others?

Today's projects use our computers when we do not need them, for example when the computer is idle.

Grid computing: how does it work? Technical explanations. Conceptual framework:

Grid computing reflects a conceptual framework rather than a physical resource. The Grid approach is utilized to provision a computational task with administratively-distant resources. The focus of Grid technology is associated with the issues and requirements of flexible computational provisioning beyond the local (home) administrative domain.

Grid computing: how does it work? Technical explanations. Virtual organization:

A Grid environment is created to address resource needs. The use of those resources (e.g. CPU cycles, disk storage, data, software programs, peripherals) is usually characterized by their availability outside of the context of the local administrative domain. This 'external provisioning' approach entails creating a new administrative domain referred to as a Virtual Organization (VO) with a distinct and separate set of administrative policies (home administration policies plus external resource administrative policies equal the VO, aka your Grid, administrative policies). The context for a Grid 'job execution' is distinguished by the requirements created when operating outside of the home administrative context. Grid technology (aka middleware) is employed to facilitate formalizing and complying with the Grid context associated with your application execution.

Grid computing: how does it work? Technical explanations. Resources:

One characteristic that currently distinguishes Grid computing from distributed computing is the abstraction of a 'distributed resource' into a Grid resource. One result of abstraction is that it allows resource substitution to be more easily accomplished. Some of the overhead associated with this flexibility is reflected in the middleware layer and the temporal latency associated with the access of a Grid (or any distributed) resource. This overhead, especially the temporal latency, must be evaluated in terms of the impact on computational performance when a Grid resource is employed.
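One simple way to reason about this overhead is a back-of-the-envelope model. The sketch below assumes a perfectly divisible job and a fixed latency that every resource pays once, in parallel; both the model and the names `grid_time` and `speedup` are illustrative assumptions, not a description of any particular middleware.

```python
def grid_time(compute_seconds: float, resources: int, latency_seconds: float) -> float:
    """Estimated wall-clock time: an equal share of the work plus one round of latency."""
    return compute_seconds / resources + latency_seconds


def speedup(compute_seconds: float, resources: int, latency_seconds: float) -> float:
    """Ratio of single-machine time to the estimated grid time."""
    return compute_seconds / grid_time(compute_seconds, resources, latency_seconds)


if __name__ == "__main__":
    # Hypothetical numbers: a 1000 s job spread over 10 resources.
    print(speedup(1000.0, 10, 0.0))    # 10.0 (no latency)
    print(speedup(1000.0, 10, 100.0))  # 5.0 (latency equals each share of work)
```

Under these assumptions, latency that is as large as each resource's share of the work halves the benefit, which is why the overhead has to be weighed against the size of the job.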

Second Part

Some famous Grids…

Pioneers: BOINC, from Berkeley

The Berkeley Open Infrastructure for Network Computing (BOINC) is a distributed computing infrastructure, originally developed out of the SETI@home project, but intended to be useful to fields beyond SETI.

This software platform is open in that it is free and open source software released under the GNU Lesser General Public License. Currently BOINC is being developed by a team based at the University of California, Berkeley led by David Anderson, the project director of SETI@home, a project which uses this software.

As a "quasi-supercomputing" platform, BOINC has over 475,000 active computers (hosts) worldwide processing on average 615 TFLOPS as of September 9, 2006.

SETI@home: alien contact research

SETI@home is a distributed computing project using Internet-connected computers, hosted by the Space Sciences Laboratory at the University of California, Berkeley, in the United States. SETI is an acronym for the Search for Extra-Terrestrial Intelligence. SETI@home was released to the public on May 17, 1999.

SETI @ Home : how does it work?

Anybody can participate in SETI@home by running a free program that downloads and analyzes radio telescope data.

Observational data is recorded on 35 Gigabyte tapes at the Arecibo Observatory in Puerto Rico, each holding 15.5 hours of observations, which are then mailed to Berkeley (Korpela et al. 2001). Once there, it is divided in both time and frequency domains into work units of 107 seconds of data (SETI@home 2001), or approximately 0.35 MB, which overlap in time but not in frequency (Korpela et al. 2001). These work units then get sent from the SETI@home server over the internet to people around the world to analyze. Arecibo does not have a high bandwidth internet connection, so data must go by postal mail to Berkeley at first.

The analysis software can search for signals with about one-tenth the strength of those sought in previous surveys, because it makes use of a computationally intensive algorithm called coherent integration that no one else has had the computing power to implement.

Data is merged into a database using SETI@home computers in Berkeley. Interference is rejected, and various pattern-detection algorithms are applied to search for the most interesting signals.
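The idea of cutting a recording into work units that overlap in time can be sketched as follows. This is a toy illustration, not the SETI@home splitter: the function name, the use of a plain Python list as the "recording", and the overlap value are all assumptions (the slides do not state how much the real work units overlap).

```python
def split_into_work_units(samples, unit_length, overlap):
    """Cut a sequence into fixed-length chunks that share `overlap` items.

    The last chunk may be shorter when the data does not divide evenly.
    """
    if not 0 <= overlap < unit_length:
        raise ValueError("overlap must be smaller than unit_length")
    step = unit_length - overlap
    units = []
    for start in range(0, len(samples) - overlap, step):
        units.append(samples[start:start + unit_length])
    return units


if __name__ == "__main__":
    recording = list(range(10))  # stand-in for time samples
    for unit in split_into_work_units(recording, unit_length=4, overlap=1):
        print(unit)
```

Because neighbouring units share a few samples, a signal that straddles a boundary still appears whole in at least one unit.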

SETI @ Home : Statistics

With over 5.2 million participants worldwide, the project is the distributed computing project with the most participants to date. Since its launch on May 17, 1999, the project has logged over two million years of aggregate computing time. On September 26, 2001, SETI@home had performed a total of 10^21 floating point operations. It is acknowledged by the Guinness World Records as the largest computation in history (Newport 2005). With over 1.2 million computers in the system, as of November 16, 2006, SETI@home has the ability to compute over 238 TeraFLOPS. For comparison, Blue Gene (currently the world's fastest supercomputer) computes 280 TFLOPS.

Physics & Astronomy: Einstein@Home

Einstein@Home is a distributed computing project running on the Berkeley Open Infrastructure for Network Computing (BOINC) software platform. It searches for unknown pulsars and gravitational waves.

Medical research

Helping medicine by calculating molecular reactions to drugs…

Rosetta@home

Rosetta@home is a distributed computing project, run by the Baker Laboratory at the University of Washington, aiming to solve the protein structure prediction problem.

Rosetta's goal is to develop computational methods that accurately predict and design protein structure and protein complexes. This computational endeavor may ultimately help researchers develop cures for human diseases such as HIV/AIDS, cancer, Alzheimer's disease, malaria and many other diseases.

FightAIDS@Home

FightAIDS@Home is a distributed computing project for Internet-connected home computers, operated by the Olson Laboratory at The Scripps Research Institute. It aims to use biomedical software simulation techniques to search for ways to cure or prevent the spread of AIDS and HIV.

Climateprediction.net

Climateprediction.net, or CPDN, is a distributed computing project to investigate and reduce uncertainties in climate modelling.

BBC Climate Change Experiment

Presented by the BBC, this program was a real success all over the world.

Mathematics and computing

Chess960@home is a distributed computing project that runs on the BOINC software platform. It aims to create a vast collection of Chess960 games and to publish it on the internet for public use.

In Chess960@home one half-move is one workunit. The deadline for workunits is short, only 24 hours. The workunits typically take from one to fifteen minutes to complete, depending on the computer.

Riesel Sieve: an idea to solve the Riesel problem

Riesel Sieve is a distributed computing project trying to prove the Riesel conjecture. This conjecture says that 509203 is the smallest Riesel number. Seventeen or Bust is a similar project for Sierpinski numbers. Riesel Sieve is running partially under the BOINC platform.

Proving the conjecture: to prove the Riesel conjecture we must find an n value for every odd k smaller than 509203 so that k*2^n-1 is a prime number. At the start of the project, 101 of these k's were left. Now we're down to 69 possible Riesel numbers. The numbers we're dealing with are huge. The largest prime found by this project is 26773*2^2465343-1. This is a 742,147 digit number.
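To make the search concrete, here is a small Python sketch of the test k*2^n-1 for tiny values of k, plus a check of the digit count quoted above. It is only an illustration: the function names are hypothetical, trial division is far too slow for the real project (which needs specialised sieving and primality tests), and the search limit `max_n` is an arbitrary assumption.

```python
import math


def is_prime(m: int) -> bool:
    """Trial division; only suitable for small numbers."""
    if m < 2:
        return False
    for d in range(2, math.isqrt(m) + 1):
        if m % d == 0:
            return False
    return True


def first_prime_exponent(k: int, max_n: int = 20):
    """Smallest n in 1..max_n with k*2^n-1 prime, or None if none is found."""
    for n in range(1, max_n + 1):
        if is_prime(k * 2**n - 1):
            return n
    return None


def decimal_digits(k: int, n: int) -> int:
    """Digit count of k*2^n-1, computed with logarithms.

    Subtracting 1 does not change the count because k*2^n is not a power of 10.
    """
    return math.floor(n * math.log10(2) + math.log10(k)) + 1


if __name__ == "__main__":
    for k in (3, 5, 13):
        print(k, first_prime_exponent(k))
    print(decimal_digits(26773, 2465343))  # 742147
```

For small k an exponent turns up almost immediately (for example 13*2^3-1 = 103 is prime), while the remaining candidates of the project require exponents in the millions, which is what makes distributing the work worthwhile.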

A crazy one: Electric Sheep

Electric Sheep is a distributed computing project, created by Scott Draves, for generating, downloading, and playing movies of fractal flames while the screen saver is running.

The parameters that generate these movies, called sheep, can be created in a few ways: they can be created and submitted by members of the electricsheep community, members of this community can download the parameters of existing sheep and tweak them, or sheep can be mated together automatically by the server or manually by server admins (nicknamed shepherds).

Users may vote on sheep that they like or dislike, and this voting is used for the genetic algorithm which generates new sheep. All sheep parameters and movies are distributed under the Creative Commons license and automatically downloaded by the screen saver. Each movie is a fractal flame with several of its parameters changed over time.

A crazy one: Electric Sheep

The phrase Electric Sheep is taken from the title of Philip K. Dick's novel "Do Androids Dream of Electric Sheep?", as the title mirrors the nature of the project: computers (Androids) which have started running the screensaver begin rendering (Dreaming) the fractal movies (Sheep).

Third Part

The limits of the Grids…

Most users are not ready for this sacrifice…

• Those programs are made to seem invisible. Users must not be disturbed, and their computers must not be affected in normal use.

• Most users are not ready to leave their machines working night and day without using them. So most jobs are done little by little.

Enterprises can’t accept any error!

• Enterprises which use a computing grid to solve problems can’t be sure that the resources will be available forever.

• Every user can quit his session without finishing a job he was given. Problems can’t be solved if users aren’t really involved in the research they help.

Nuclear weapons tests: share the results with the rest of the world?

• For evident reasons, military operations can’t be calculated by a Grid. That information can’t be shared without risk.

• For example, the French government recently bought a supercomputer in order to simulate nuclear weapons tests. The subject of this research must be kept secret: the threat posed by the French strike force, the basis of every military power, depends on this computer.