
Upload: joseph-powers

Post on 24-Dec-2015


Page 1: Grid Computing and Cloud Computing. “Cloud” Computing is 1+ yr old, Michael Sheehan’s GoGrid Blog, July 25, 2008

Grid Computing and Cloud Computing


“Cloud” Computing is 1+ yr old

Michael Sheehan’s GoGrid Blog, July 25, 2008 http://linux.sys-con.com/node/587717


Confused?

SaaS (Software as a Service)

Utility Computing

Virtualization

Grid Computing

Cluster Computing

Cloud Computing

P2P

One can categorize each component

[Diagram: Cloud Computing, SaaS, Grid Computing, Cluster Computing, Utility Computing, Virtualization, and P2P, each categorized as either a Usage Model or an Infrastructure]

Grid Computing


What is a Grid?

Enable “coordinated resource sharing & problem solving in dynamic, multi-institutional virtual organizations.”

(Source: “The Anatomy of the Grid”)


Virtual Organizations


TeraGrid


What is the TeraGrid?

Technology + Support = Science

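– NSF has invested US$246 million
– In production operation since October 2004; high-performance networks currently integrate 750 teraflops of computing capability, 30 PB of storage, and database resources from more than 100 disciplines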

TeraGrid’s 3-pronged strategy to further science

• DEEP Science: Enabling Terascale Science
  – Make science more productive through an integrated set of very-high capability resources
    • ASTA projects
• WIDE Impact: Empowering Communities
  – Bring TeraGrid capabilities to the broad science community
    • Science Gateways
• OPEN Infrastructure, OPEN Partnership
  – Provide a coordinated, general purpose, reliable set of services and resources
    • Grid interoperability working group


TeraGrid Used

TeraGrid PIs by Institution

[Map legend] Blue: 10 or more PIs; Red: 5-9 PIs; Yellow: 2-4 PIs; Green: 1 PI

TeraGrid Resources

100+ TF, 8 distinct architectures; 3 PB online disk; >100 data collections

Sites: ANL/UC, IU, NCSA, ORNL, PSC, Purdue, SDSC, TACC

Computational Resources: Itanium 2 (0.5 TF); IA-32 (0.5 TF); Itanium2 (0.2 TF); IA-32 (2.0 TF); Itanium2 (10.7 TF); SGI SMP (7.0 TF); Dell Xeon (17.2 TF); IBM p690 (2 TF); Condor Flock (1.1 TF); IA-32 (0.3 TF); XT3 (10 TF); TCS (6 TF); Marvel SMP (0.3 TF); Hetero (1.7 TF); IA-32 (11 TF) Opportunistic; Itanium2 (4.4 TF); Power4+ (15.6 TF); Blue Gene (5.7 TF); IA-32 (6.3 TF)

Online Storage: ANL/UC 20 TB; IU 32 TB; NCSA 1140 TB; ORNL 1 TB; PSC 300 TB; Purdue 26 TB; SDSC 1400 TB; TACC 50 TB

Mass Storage: 1.2 PB; 5 PB; 2.4 PB; 1.3 PB; 6 PB; 2 PB

Net (Gb/s, Hub): ANL/UC 30 CHI; IU 10 CHI; NCSA 30 CHI; ORNL 10 ATL; PSC 30 CHI; Purdue 10 CHI; SDSC 10 LA; TACC 10 CHI

Data Collections (# collections, approx. total size, access methods):
• 5 Col., >3.7 TB, URL/DB/GridFTP
• >30 Col., URL/SRB/DB/GridFTP
• 4 Col., 7 TB, SRB/Portal/OPeNDAP
• >70 Col., >1 PB, GFS/SRB/DB/GridFTP
• 4 Col., 2.35 TB, SRB/Web Services/URL

Instruments: Proteomics, X-ray Cryst.; SNS and HFIR Facilities

Visualization Resources (RI: Remote Interact, RB: Remote Batch, RC: RI/Collab):
• RI, RC, RB: IA-32, 96 GeForce 6600GT
• RB: SGI Prism, 32 graphics pipes; IA-32
• RI, RB: IA-32 + Quadro4 980 XGL
• RB: IA-32, 48 Nodes
• RB
• RI, RC, RB: UltraSPARC IV, 512 GB SMP, 16 gfx cards

Science Gateways: A new initiative for the TeraGrid

• Increasing investment by communities in their own cyberinfrastructure, but heterogeneous:
  • Resources
  • Users – from expert to K-12
  • Software stacks, policies
• Science Gateways
  – Provide “TeraGrid Inside” capabilities
  – Leverage community investment
• Three common forms:
  – Web-based Portals
  – Application programs running on users' machines but accessing services in TeraGrid
  – Coordinated access points enabling users to move seamlessly between TeraGrid and other grids

Workflow Composer

Gateways are growing in numbers

• 10 initial projects as part of TG proposal
• >20 Gateway projects today
• No limit on how many gateways can use TG resources
  – Prepare services and documentation so developers can work independently

• Open Science Grid (OSG)
• Special PRiority and Urgent Computing Environment (SPRUCE)
• National Virtual Observatory (NVO)
• Linked Environments for Atmospheric Discovery (LEAD)
• Computational Chemistry Grid (GridChem)
• Computational Science and Engineering Online (CSE-Online)
• GEON (GEOsciences Network)
• Network for Earthquake Engineering Simulation (NEES)
• SCEC Earthworks Project
• Network for Computational Nanotechnology and nanoHUB
• GIScience Gateway (GISolve)
• Biology and Biomedicine Science Gateway
• Open Life Sciences Gateway
• The Telescience Project
• Grid Analysis Environment (GAE)
• Neutron Science Instrument Gateway
• TeraGrid Visualization Gateway, ANL
• BIRN
• Gridblast Bioinformatics Gateway
• Earth Systems Grid
• Astrophysical Data Repository (Cornell)
• Many others interested
  – SID Grid
  – HASTAC


OSG(Open Science Grid)

Open Science Grid (OSG)

Origins:
– National Grid (iVDGL, GriPhyN, PPDG) and LHC Software & Computing Projects

Current Compute Resources:
– 61 Open Science Grid sites
– Connected via Inet2, NLR.... from 10 Gbps – 622 Mbps
– Compute & Storage Elements
– All are Linux clusters
– Most are shared
  • Campus grids
  • Local non-grid users
– More than 10,000 CPUs
  • A lot of opportunistic usage
  • Total computing capacity difficult to estimate
  • Same with Storage


96 Resources across production & integration infrastructures

20 Virtual Organizations +6 operations

Includes 25% non-physics.

~20,000 CPUs (from 30 to 4000)

~6 PB Tapes

~4 PB Shared Disk

Snapshot of Jobs on OSGs

Sustained through OSG submissions:

3,000-4,000 simultaneous jobs

~10K jobs/day

~50K CPU hours/day

Peak test jobs of 15K a day

Using production & research networks

OSG Snapshot

What is the Open Science Grid?

[Map of OSG sites: NERSC, BU, UNM, SDSC, UTA, OU, FNAL, ANL, WISC, BNL, VANDERBILT, PSU, UVA, CALTECH, IOWA STATE, PURDUE, IU, BUFFALO, TTU, CORNELL, ALBANY, UMICH, INDIANA, IUPUI, STANFORD, UWM, UNL, UFL, KU, UNI, WSU, MSU, LTU, LSU, CLEMSON, MCGILL, UMISS, UIUC, UCR, UCLA, LEHIGH, NSF, ORNL, HARVARD, UIC, SMU, UCHICAGO (+ Brazil, Mexico, Taiwan, UK)]

OSG Applications

Genome sequence analysis

STAR: 5 TB transfer (SRM, GridFTP)

Sloan Digital Sky Survey

Earth System Grid: O(100 TB) online data

Earth System Grid


EGEE(Enabling Grids for E-sciencE)


European Grid Initiative

June 2, 2008

Archeology, Astronomy, Astrophysics, Civil Protection, Comp. Chemistry, Earth Sciences, Finance, Fusion, Geophysics, High Energy Physics, Life Sciences, Multimedia, Material Sciences, …

>250 sites; 48 countries; >50,000 CPUs; >20 PetaBytes; >10,000 users; >150 VOs; >150,000 jobs/day


Users and resources distribution

EGEE workload in 2007

[Chart labels: Xfer, CPU, Storage]

CPU: 114 million hours

Data: 25 PB stored, 11 PB transferred

http://gridview.cern.ch/GRIDVIEW/same_index.php

http://calculator.s3.amazonaws.com/calc5.html? 17/05/08 $58,688,679.08
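The dollar figure on this slide sits next to a link to the Amazon S3 price calculator, so it is presumably an estimate of what the 2007 EGEE workload would have cost on Amazon's services. The sketch below shows how such an estimate can be put together. The S3 prices are the ones quoted on the Amazon S3 slide later in this deck; the EC2 price of $0.10 per CPU hour, decimal units, and the assumption that all 25 PB are stored for a full 12 months are illustrative assumptions that do not come from the slide, so the result is not expected to match the slide's figure exactly.

```python
# Rough cost sketch for the EGEE 2007 workload at 2008-era Amazon list prices.
# ASSUMPTIONS (not from the slide): EC2 at $0.10 per CPU hour, decimal units
# (1 PB = 1,000,000 GB), and all data stored for a full 12 months.

GB_PER_PB = 1_000_000


def estimate_cost(cpu_hours, stored_pb, transferred_pb,
                  ec2_per_hour=0.10,      # assumed EC2 price
                  s3_per_gb_month=0.15,   # S3 storage price quoted in this deck
                  transfer_per_gb=0.10,   # low end of the $.10 - $.18 range
                  months=12):
    """Return the cost of each component, and the total, in US dollars."""
    compute = cpu_hours * ec2_per_hour
    storage = stored_pb * GB_PER_PB * s3_per_gb_month * months
    transfer = transferred_pb * GB_PER_PB * transfer_per_gb
    return {"compute": compute, "storage": storage,
            "transfer": transfer, "total": compute + storage + transfer}


low = estimate_cost(114e6, 25, 11, transfer_per_gb=0.10)
high = estimate_cost(114e6, 25, 11, transfer_per_gb=0.18)
print(f"total: ${low['total']:,.0f} to ${high['total']:,.0f}")
```

Under these assumptions the total lands between roughly $57.5 million and $58.4 million, the same order of magnitude as the figure on the slide; storage dominates the bill.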


LCG(LHC Computing Grid)

04/19/23, Federico Calzolari, GRID Tutorial - How to use LCG

LHC - Large Hadron Collider

4 experiments: ATLAS, ALICE, CMS, LHCb

27 km long pipe

7+7 TeV

LCG - LHC Computing Grid

Currently integrates 140 computing centers in 33 countries.

It will execute 100 million computing jobs in 2008.

Proxy certificate

Get your proxy certificate, a temporary (usually 24h) certificate, depending on VO:

grid-proxy-init
voms-proxy-init -voms <VO>:/<VO>/Role=<role> -valid 1000:00

Certificate

Install your certificate on the User Interface:

Log in to the User Interface, copy there the file you exported, and create a directory where your certificate + private key will be stored:

mkdir ~/.globus

Convert the PKCS12 file (.p12) into the supported standard (.pem). This operation will split your mycert.p12 file into two files: the certificate (usercert.pem) and the private key (userkey.pem):

openssl pkcs12 -nocerts -in <mycert.p12> -out ~/.globus/userkey.pem
openssl pkcs12 -clcerts -nokeys -in <mycert.p12> -out ~/.globus/usercert.pem
chmod 0400 ~/.globus/userkey.pem
chmod 0600 ~/.globus/usercert.pem

At the end you should have something like:

[user@userinterface .globus]$ ls -al
-rw------- 1 user user 2008 Nov 13 16:50 usercert.pem
-r-------- 1 user user  963 Nov 13 16:50 userkey.pem
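The permission bits matter because the private key must not be readable by other users. As a minimal sketch, assuming the two file names and modes shown on this slide, a small Python check can confirm that the directory ended up in the expected state. The function name `check_globus_dir` is a hypothetical helper, not part of any grid toolkit.

```python
import os
import stat

# Expected permission bits, taken from the chmod commands on the slide.
EXPECTED_MODES = {"userkey.pem": 0o400, "usercert.pem": 0o600}


def check_globus_dir(globus_dir):
    """Return a list of problems found in the given .globus directory."""
    problems = []
    for name, expected in EXPECTED_MODES.items():
        path = os.path.join(globus_dir, name)
        if not os.path.isfile(path):
            problems.append(f"{name}: missing")
            continue
        # S_IMODE strips the file-type bits and keeps only the permissions.
        mode = stat.S_IMODE(os.stat(path).st_mode)
        if mode != expected:
            problems.append(f"{name}: mode {mode:04o}, expected {expected:04o}")
    return problems


if __name__ == "__main__":
    for problem in check_globus_dir(os.path.expanduser("~/.globus")):
        print(problem)
```

An empty result means both files exist with the modes set above.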

Register to a VO

for generic user

http://grid-it.cnaf.infn.it

JDL: Job Description Language

JOB overview:

• Components: JDL (job encapsulation), main script, executable program
• Steps: Creation, Submission, Status, Retrieval

JDL test.jdl

Executable = "script.sh";
StdOutput = "std.out";
StdError = "std.err";
InputSandbox = {"script.sh","exe.bin"}; # Input
OutputSandbox = {"std.out","std.err","out"}; # Output
VirtualOrganisation = "<VO>";
DataAccessProtocol = {"file","gsiftp","rfio","dcap"};
InputData = {"lfn:/grid/<VO>/<FILE>"};
OutputSE = "<SE>";

Requirements = Member("<SITE>", other.GlueHostApplicationSoftwareRunTimeEnvironment) && other.GlueCEName=="<QUEUE>";
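A JDL file is a flat list of `Name = value;` attributes, which makes it easy to generate from a script when many similar jobs are submitted. The following is a minimal sketch of that idea in Python; `to_jdl_value` and `build_jdl` are hypothetical helper names, and only the string and list forms that appear in test.jdl above are handled.

```python
def to_jdl_value(value):
    """Render a Python value as a JDL literal: strings are quoted, lists become {...}."""
    if isinstance(value, (list, tuple)):
        return "{" + ",".join(to_jdl_value(v) for v in value) + "}"
    return f'"{value}"'


def build_jdl(attributes, requirements=None):
    """Build JDL text from a dict of attributes and an optional Requirements expression."""
    lines = [f"{name} = {to_jdl_value(value)};" for name, value in attributes.items()]
    if requirements:
        # The expression is written verbatim, because it is not a quoted string.
        lines.append(f"Requirements = {requirements};")
    return "\n".join(lines) + "\n"


job = {
    "Executable": "script.sh",
    "StdOutput": "std.out",
    "StdError": "std.err",
    "InputSandbox": ["script.sh", "exe.bin"],
    "OutputSandbox": ["std.out", "std.err", "out"],
}
print(build_jdl(job, "other.GlueCEStateFreeCPUs>20"))
```

The generated text can be written to a file and passed to the submission command in place of a hand-written JDL.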

Main script script.sh

#!/bin/sh
# Environment
date >> out2
hostname >> out2

# Get data (the local destination must be an absolute path, so that tar finds it below)
lcg-cp [-v] --vo <VO> lfn:<file> file://$PWD/data.tgz

# Unpack input [data.tgz: src.cpp,...]
tar -zxvf data.tgz

# Compile source
g++ src.cpp -o exe.bin
chmod u+x exe.bin

# Exec program
./exe.bin > out

# Pack output
tar -zcvf out.tgz out out2

Submit a JOB

edg-job-submit -o ID <JDL> # save JOBid on file ID

Selected Virtual Organisation name (from JDL): cms
Connecting to host rb119.cern.ch, port 7772 # Resource Broker
Logging to host rb119.cern.ch, port 9002
*************************************************************
JOB SUBMIT OUTCOME
The job has been successfully submitted to the Network Server.
Use edg-job-status command to check job current status. Your job identifier (edg_jobId) is:
- https://rb119.cern.ch:9000/tG3Xp2jT_58IUeXoY1GoZQ # JOBid
*************************************************************

Control JOB status

edg-job-status <JOBid> [https://rb119.cern.ch:9000/tG3Xp2jT_58IUeXoY1GoZQ]

*************************************************************
BOOKKEEPING INFORMATION:
Status info for the Job : https://rb119.cern.ch:9000/tG3Xp2jT_58IUeXoY1GoZQ
Current Status: Waiting / Scheduled / Running / Done (Success/Abort)
Status Reason: Job successfully submitted to Globus
Destination: ce0001.m45.ihep.su:2119/jobmanager-lcgpbs-cms
reached on: Sat Nov 17 22:38:34 2007
*************************************************************

Get the output: JOB output retrieve

edg-job-get-output <JOBid> [https://rb119.cern.ch:9000/tG3Xp2jT_58IUeXoY1GoZQ]

Retrieving files from host: rb119.cern.ch (for https://rb119.cern.ch:9000/tG3Xp2jT_58IUeXoY1GoZQ)

*************************************************************
JOB GET OUTPUT OUTCOME
Output sandbox files for the job:
- https://rb119.cern.ch:9000/tG3Xp2jT_58IUeXoY1GoZQ
have been successfully retrieved and stored in the directory:
/tmp/jobOutput/<USER>_tG3Xp2jT_58IUeXoY1GoZQ
*************************************************************

ls -al /tmp/jobOutput/calzolar_tG3Xp2jT_58IUeXoY1GoZQ
-rw-r--r-- 1 calzolar cms  11 Nov 17 23:59 out
-rw-r--r-- 1 calzolar cms 133 Nov 17 23:59 std.err
-rw-r--r-- 1 calzolar cms   8 Nov 17 23:59 std.out

Job Requirements: JDL Requirements

everywhere
NO Requirements

at Pisa
Requirements=Member("INFN-PISA",other.GlueHostApplicationSoftwareRunTimeEnvironment);

on a queue at least 1 day long
Requirements=(other.GlueCEPolicyMaxCPUTime>60*24);

on a site with at least 20 free CPUs
Requirements=(other.GlueCEStateFreeCPUs>20);

on a site with at least 1 TB (unit: KB) of local disk available
Requirements=anyMatch(other.storage.CloseSEs,target.GlueSAStateAvailableSpace > 1000000000);

on a site with a given software locally installed
Requirements=Member("VO-<VO>-TAG",other.GlueHostApplicationSoftwareRunTimeEnvironment);
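Each Requirements expression is a predicate that the Resource Broker evaluates against the published attributes of every candidate site. The sketch below imitates that matchmaking step in plain Python. It is an illustration of the idea, not the broker's real evaluator: the site records and their values are made up, and only the attribute names are taken from the slide.

```python
# Hypothetical site records; attribute names follow the Glue names on the slide,
# the values are invented for illustration.
sites = [
    {"name": "site-a",
     "GlueCEStateFreeCPUs": 43,
     "GlueCEPolicyMaxCPUTime": 2880,   # minutes
     "GlueHostApplicationSoftwareRunTimeEnvironment": ["LCG-2_7_0", "INFN-PISA"]},
    {"name": "site-b",
     "GlueCEStateFreeCPUs": 5,
     "GlueCEPolicyMaxCPUTime": 720,
     "GlueHostApplicationSoftwareRunTimeEnvironment": ["LCG-2_7_0"]},
]


def member(tag, values):
    """Python stand-in for the JDL Member(tag, list) function."""
    return tag in values


# One predicate per example on the slide; `ce` plays the role of `other`.
requirements = {
    "at Pisa": lambda ce: member(
        "INFN-PISA", ce["GlueHostApplicationSoftwareRunTimeEnvironment"]),
    "queue at least 1 day long": lambda ce: ce["GlueCEPolicyMaxCPUTime"] > 60 * 24,
    "at least 20 free CPUs": lambda ce: ce["GlueCEStateFreeCPUs"] > 20,
}


def match(candidates, predicate):
    """Return the names of the candidate sites that satisfy the predicate."""
    return [ce["name"] for ce in candidates if predicate(ce)]


for label, predicate in requirements.items():
    print(label, "->", match(sites, predicate))
```

A job with no Requirements attribute corresponds to a predicate that is always true, so it can run everywhere.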

Requirements TAGs from SINICA

http://goc.grid.sinica.edu.tw/gstat/<SITE>/

GlueHostOperatingSystemName: Scientific Linux CERN
GlueHostOperatingSystemRelease: 4.5
GlueHostOperatingSystemVersion: Beryllium
GlueSubClusterPhysicalCPUs: 0
GlueSubClusterLogicalCPUs: 0
GlueHostApplicationSoftwareRunTimeEnvironment:
LCG-2 LCG-2_1_0
LCG-2_1_1 LCG-2_2_0
LCG-2_3_0 LCG-2_3_1
LCG-2_4_0 LCG-2_5_0
LCG-2_6_0 LCG-2_7_0
GLITE-3_0_0 R-GMA
INFN-PISA SI00MeanPerCPU_1800
SF00MeanPerCPU_2000 MPICH
MPI_HOME_NOTSHARED AFS
VO-atlas-cloud-IT VO-atlas-production-12.0.5
VO-atlas-production-12.0.6 VO-atlas-production-12.0.7
[…]

Resources search: Query CPU / Storage available per VO

lcg-infosites --vo <VO> ce

#CPU  Free  Total Jobs  Running  Waiting  ComputingElement
----------------------------------------------------------
 165     1           1        0        1  ce.phy.bg.ac.yu:2119/jobmanager-pbs-cms
 120    11           0        0        0  fangorn.man.poznan.pl:2119/jobmanager-pbs-cms
 192   110           0        0        0  gridce.atlantis.ugent.be:2119/jobmanager-pbs-cms
 212     0         529      146      383  gridce.iihe.ac.be:2119/jobmanager-pbs-cms
 227     5         312      222       90  ingrid.cism.ucl.ac.be:2119/jobmanager-lcgcondor-cms
  15    15           0        0        0  ce002.ipp.acad.bg:2119/jobmanager-lcgpbs-cms
  80    43           0        0        0  ce02.grid.acad.bg:2119/jobmanager-pbs-cms
  24    13           0        0        0  ce001.grid.uni-sofia.bg:2119/jobmanager-lcgpbs-cms

lcg-infosites --vo <VO> se

Avail Space(Kb)  Used Space(Kb)  Type  SEs
----------------------------------------------------------
97470000         n.a             n.a   dpm.phy.bg.ac.yu
395467659        779205896       n.a   cmsse01.ihep.ac.cn
27664924         59878772        n.a   se001.grid.uni-sofia.bg
149180000        n.a             n.a   se.hpc.iit.bme.hu
1                1               n.a   dcsrm.usatlas.bnl.gov
190040000        208             n.a   lxdpm101.cern.ch
1000000000000    500000000000    n.a   castorgrid.cern.ch
1000000000000    500000000000    n.a   srm.cern.ch
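The computing-element listing is plain whitespace-separated text, so it can be post-processed by a script, for example to find the element with the most free CPUs. The sketch below assumes the column order shown in the header on this slide (#CPU, Free, Total Jobs, Running, Waiting, ComputingElement) and uses a few of the slide's rows as sample input; `parse_ce_lines` is a hypothetical helper name.

```python
from typing import NamedTuple


class ComputingElement(NamedTuple):
    cpus: int
    free: int
    total_jobs: int
    running: int
    waiting: int
    ce_id: str


# Sample rows copied from the listing on the slide.
SAMPLE = """\
#CPU Free Total Jobs Running Waiting ComputingElement
----------------------------------------------------------
165 1 1 0 1 ce.phy.bg.ac.yu:2119/jobmanager-pbs-cms
120 11 0 0 0 fangorn.man.poznan.pl:2119/jobmanager-pbs-cms
212 0 529 146 383 gridce.iihe.ac.be:2119/jobmanager-pbs-cms
80 43 0 0 0 ce02.grid.acad.bg:2119/jobmanager-pbs-cms
"""


def parse_ce_lines(text):
    """Parse data rows into ComputingElement records, skipping header and separator lines."""
    records = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 6 or not fields[0].isdigit():
            continue
        *numbers, ce_id = fields
        records.append(ComputingElement(*map(int, numbers), ce_id))
    return records


ces = parse_ce_lines(SAMPLE)
best = max(ces, key=lambda ce: ce.free)
print(best.ce_id, best.free)
```

The same records could be filtered on `waiting` or `running` to avoid queues that are already congested.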

Resources search: Query available sites for my Job

edg-job-list-match <JDL>

Selected Virtual Organisation name (from JDL): cms
Connecting to host rb119.cern.ch, port 7772
*************************************************************
COMPUTING ELEMENT IDs LIST
The following CE(s) matching your job requirements have been found:

*CEId*
a01-004-128.gridka.de:2119/jobmanager-pbspro-cmsS
a01-004-128.gridka.de:2119/jobmanager-pbspro-cmsXS
ares02.cyf-kr.edu.pl:2119/jobmanager-pbs-cms
beagle14.ba.itb.cnr.it:2119/jobmanager-lcgpbs-cms
bogrid5.bo.infn.it:2119/jobmanager-lcgpbs-cms
ce-fzk.gridka.de:2119/jobmanager-pbspro-cmsL
ce-fzk.gridka.de:2119/jobmanager-pbspro-cmsS
ce-fzk.gridka.de:2119/jobmanager-pbspro-cmsXS
ce.bg.ktu.lt:2119/jobmanager-lcgpbs-cms
ce.cc.ncu.edu.tw:2119/jobmanager-lcgpbs-cms
[…]
gridce.ilc.cnr.it:2119/jobmanager-lcgpbs-cms
gridce2.pi.infn.it:2119/jobmanager-lcglsf-cms4
gridce.sns.it:2119/jobmanager-lcgpbs-cms

Grid Monitoring

GOC Sinica

GridICE INFN

Grid Monitoring

AOB

Cloud Computing

Cloud Computing

Cloud Computing

Definition: Cloud computing is a concept of using the internet to allow people to access technology-enabled services. It allows users to consume services without knowledge of, or control over, the technology infrastructure that supports them.

- Wikipedia


Enterprise IT spending challenge

[Chart: Global Annual IT Spending, estimated US$B, 1996-2010; y-axis from $0B to $300B. Series: New Server Spending; Server Mgt and Admin Costs; Power and Cooling Costs]

Source: IBM Corporate Strategy analysis of IDC data, Sept. 2007


Dream or Nightmare?


Seasonal Spikes


A Closer Look at Cloud Computing

Enterprise Cloud

Public Cloud

INNOVATIVE BUSINESS MODELS

End Users / Requestors

Government / Academics

Industry (Startups / SMB / Enterprise)

Consumers

• An “Elastic” pool of high performance virtualized compute resources

• Cloud applications enable the simplification of complex services

• A cloud computing platform combines modular components on a service oriented architecture with flexible pricing

• New combinations of services to form differentiating value propositions at lower costs in shorter time

• Internet protocol based convergence of networks and devices

SIMPLIFIED SERVICES

Source: Corporate Strategy


Examples of Different Types of Services

Cloud Computing

Service Catalog

Datacenter Infrastructure

Virtual Client service

Web Application Service

Compute Service

Database service

Storage service

Content Classification

Storage backup, archive… service

Job Scheduling Service

Collaboration Services

Google and Cloud Computing


User Centric

• Data stored in the “Cloud”

• Data follows you & your devices

• Data accessible anywhere

• Data can be shared with others

[Diagram labels: music, preferences, maps, news, contacts, messages, mailing lists, photo, e-mails, calendar, phone numbers, investments]

Google's three key technologies: Google File System (GFS), BigTable, MapReduce


Google File System(GFS)

GFS Architecture

• Files broken into chunks (typically 64 MB)
• Master manages metadata
• Data transfers happen directly between clients/chunkservers

[Diagram: several clients; a GFS master with replica masters; Chunkserver 1 holding C0, C1, C2, C5; Chunkserver 2 holding C1, C3, C5; Chunkserver N holding C0, C2, C5]

[Chart labels: Google 48%, Yahoo 33%, MSN 19%]
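The three bullets above imply a simple division of labour: the client turns a byte range into chunk indexes, asks the master only for metadata (which chunkservers hold those chunks), and then reads the data directly from the chunkservers. A minimal sketch of the metadata side follows. It is a toy model under stated assumptions, not GFS code: the file name, server names, and the `locate` helper are hypothetical, and only the 64 MB chunk size comes from the slide.

```python
CHUNK_SIZE = 64 * 1024 * 1024  # 64 MB, the typical chunk size named on the slide


def chunk_index(byte_offset):
    """Index of the chunk that contains the given byte offset."""
    return byte_offset // CHUNK_SIZE


def chunks_for_range(offset, length):
    """Chunk indexes touched by a read of `length` bytes starting at `offset`."""
    if length <= 0:
        return []
    first = chunk_index(offset)
    last = chunk_index(offset + length - 1)
    return list(range(first, last + 1))


# Toy master metadata: (file name, chunk index) -> chunkservers holding a replica.
chunk_locations = {
    ("/logs/web.log", 0): ["chunkserver-1", "chunkserver-N"],
    ("/logs/web.log", 1): ["chunkserver-1", "chunkserver-2"],
}


def locate(filename, offset, length):
    """What a client would ask the master: which servers hold the chunks it needs."""
    return {i: chunk_locations[(filename, i)]
            for i in chunks_for_range(offset, length)}


# A 2-byte read that straddles the boundary between chunk 0 and chunk 1.
print(locate("/logs/web.log", CHUNK_SIZE - 1, 2))
```

Because the master only answers lookups like this one, the bulk data never passes through it.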

GFS Usage @ Google

• 200+ clusters
• Filesystem clusters of up to 5000+ machines
• Pools of 10000+ clients
• 5+ Petabyte Filesystems
• All in the presence of frequent HW failure

Google's three key technologies: Google File System (GFS), BigTable, MapReduce

BigTable

• Data model: (row, column, timestamp) → cell contents
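The data model can be pictured as a sparse, versioned map keyed by row, column, and timestamp. The following is a minimal in-memory sketch of that mapping, assuming nothing about the real storage layout; the class name `SparseTable` and the sample row and column names are illustrative.

```python
from collections import defaultdict


class SparseTable:
    """Toy in-memory map from (row, column, timestamp) to cell contents."""

    def __init__(self):
        # row -> column -> {timestamp: value}; cells that were never written take no space
        self._rows = defaultdict(lambda: defaultdict(dict))

    def put(self, row, column, timestamp, value):
        self._rows[row][column][timestamp] = value

    def get(self, row, column, timestamp=None):
        """Return the value at `timestamp`, or the most recent version if omitted."""
        versions = self._rows.get(row, {}).get(column, {})
        if not versions:
            return None
        if timestamp is None:
            timestamp = max(versions)
        return versions.get(timestamp)


table = SparseTable()
table.put("com.example.www", "contents:", 1, "<html>v1</html>")
table.put("com.example.www", "contents:", 2, "<html>v2</html>")
print(table.get("com.example.www", "contents:"))      # latest version
print(table.get("com.example.www", "contents:", 1))   # a specific older version
```

Keeping several timestamped versions per cell is what lets a reader ask either for the newest value or for the value as of an earlier point in time.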

BigTable

• Distributed multi-level sparse map
  – Fault-tolerant, persistent
• Scalable
  – Thousands of servers
  – Terabytes of in-memory data
  – Petabytes of disk-based data
• Self-managing
  – Servers can be added/removed dynamically
  – Servers adjust to load imbalance

Why not just use commercial DB?

• Scale is too large or cost is too high for most commercial databases
• Low-level storage optimizations help performance significantly
  – Much harder to do when running on top of a database layer
  – Also fun and challenging to build large-scale systems

BigTable Summary

• Data model applicable to broad range of clients
  – Actively deployed in many of Google’s services
• System provides high-performance storage system on a large scale
  – Self-managing
  – Thousands of servers
  – Millions of ops/second
  – Multiple GB/s reading/writing
• Largest bigtable cell manages
  – 3 PB of data spread over several thousand machines

Google's three key technologies: Google File System (GFS), BigTable, MapReduce

MapReduce

• A simple programming model that applies to many data-intensive computing problems
• Hide messy details in MapReduce runtime library
  – Automatic parallelization
  – Load balancing
  – Network and disk transfer optimization
  – Handling of machine failures
  – Robustness
  – Easy to use

MapReduce Programming Model

• Borrowed from functional programming

map(f, [x1,…,xm,…]) = [f(x1),…,f(xm),…]

reduce(f, x1, [x2, x3,…])
  = reduce(f, f(x1, x2), [x3,…])
  = …
  (continue until the list is exhausted)

• Users implement two functions

map (in_key, in_value) → (key, value) list
reduce (key, [value1,…,valuem]) → f_value

[Diagram: map applies f to each list element independently; reduce chains applications of f from an initial value to the returned value]
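Python's built-in `map` and `functools.reduce` behave like the functional primitives defined above, so the two equations can be tried out directly. The sketch below also includes a hand-written `my_reduce` (a hypothetical name) that follows the slide's recurrence step by step, with `x1` playing the role of the initial value.

```python
from functools import reduce


def my_reduce(f, x1, xs):
    """reduce(f, x1, [x2, x3, ...]) = reduce(f, f(x1, x2), [x3, ...]), until xs is exhausted."""
    for x in xs:
        x1 = f(x1, x)
    return x1


# map(f, [x1, ..., xm]) = [f(x1), ..., f(xm)]
squares = list(map(lambda x: x * x, [1, 2, 3, 4]))

# Both reductions fold the list with addition, starting from 0.
total_builtin = reduce(lambda acc, x: acc + x, squares, 0)
total_manual = my_reduce(lambda acc, x: acc + x, 0, squares)

print(squares, total_builtin, total_manual)
```

Each call to `f` inside `map` is independent of the others, which is what allows the map phase to be spread across many machines.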

MapReduce – A New Model and System

• Two phases of data processing
  – Map: (in_key, in_value) → {(keyj, valuej) | j = 1…k}
  – Reduce: (key, [value1,…valuem]) → (key, f_value)

[Diagram: input key*value pairs from data store 1 … data store n feed the map tasks, each of which emits (key 1, values...), (key 2, values...), (key 3, values...). == Barrier ==: aggregates intermediate values by output key. The reduce tasks receive key 1, key 2, and key 3 with their intermediate values and produce the final key 1, key 2, and key 3 values.]


MapReduce Version of Pseudo Code

Example – WordCount (1/2)

• Input is files with one document per record
• Specify a map function that takes a key/value pair
  – key = document URL
  – value = document contents
• Output of map function is key/value pairs. In our case, output (w, "1") once per word in the document

Example – WordCount (2/2)

• MapReduce library gathers together all pairs with the same key (shuffle/sort)

• The reduce function combines the values for a key. In our case, compute the sum

• Output of reduce paired with key and saved
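The WordCount steps above can be sketched as a single-process Python program. This is an illustration of the programming model only: the real library distributes the work across machines, and the names `map_fn`, `reduce_fn`, and `run_mapreduce`, as well as the example URLs, are hypothetical. The map function emits the integer 1 rather than the string "1" so that the reduce step can sum directly.

```python
from collections import defaultdict


def map_fn(url, contents):
    """Map: key = document URL, value = document contents; emit (word, 1) per word."""
    for word in contents.split():
        yield word, 1


def reduce_fn(word, counts):
    """Reduce: combine all values for one key; here, compute the sum."""
    return word, sum(counts)


def run_mapreduce(documents):
    # Shuffle/sort: gather together all pairs with the same key.
    groups = defaultdict(list)
    for url, contents in documents.items():
        for key, value in map_fn(url, contents):
            groups[key].append(value)
    # Output of reduce is paired with its key and saved.
    return dict(reduce_fn(key, values) for key, values in sorted(groups.items()))


docs = {
    "http://example.org/a": "the grid and the cloud",
    "http://example.org/b": "the cloud",
}
print(run_mapreduce(docs))
```

Only `map_fn` and `reduce_fn` are specific to word counting; the grouping step in `run_mapreduce` is the part the library provides.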

MapReduce Framework

• For certain classes of problems, the MapReduce framework provides:
  – Automatic & efficient parallelization/distribution
  – I/O scheduling: run mapper close to input data
  – Fault-tolerance: restart failed mapper or reducer tasks on the same or different nodes
  – Robustness: tolerate even massive failures, e.g. large-scale network maintenance: once lost 1800 out of 2000 machines
  – Status/monitoring

Task Granularity And Pipelining

• Fine granularity tasks: many more map tasks than machines
  – Minimizes time for fault recovery
  – Can pipeline shuffling with map execution
  – Better dynamic load balancing
• Often use 200,000 map/500 reduce tasks with 2000 machines


MapReduce: Uses at Google

• Typical configuration: 200,000 mappers, 500 reducers on 2,000 nodes
• Broad applicability has been a pleasant surprise
  – Quality experiences, log analysis, machine translation, ad-hoc data processing
  – Production indexing system: rewritten with MapReduce
    • ~10 MapReductions, much simpler than old code

MapReduce Summary

• MapReduce has proven to be a useful abstraction
• Greatly simplifies large-scale computation at Google
• Fun to use: focus on problem, let library deal with messy details

A Data Playground

• MapReduce + BigTable + GFS = Data playground
  – Substantial fraction of internet available for processing
  – Easy-to-use teraflops/petabytes, quick turn-around
  – Cool problems, great colleagues


Amazon Web Services


Amazon Simple Storage Service

S3

Amazon Simple Storage Service

• Object-Based Storage
• 1 B – 5 GB / object
• Fast, Reliable, Scalable
• Redundant, Dispersed
• 99.99% Availability Goal
• Private or Public
• Per-object URLs & ACLs
• BitTorrent Support

$.15 per GB per month storage

$.10 - $.18 per GB data transfer

$.01 for 1000 to 10000 requests

Page 88: 网格计算与云计算. “Cloud” Computing is 1+ yr old Michael Sheehan’s GoGrid Blog, July 25, 2008

Amazon S3 Concepts

Objects:
• Opaque data to be stored (1 byte … 5 Gigabytes)
• Authentication and access controls

Buckets:
• Object container – any number of objects
• 100 buckets per account / buckets are “owned”

Keys:
• Unique object identifier within bucket
• Up to 1024 bytes long
• Flat object storage model

Standards-Based Interfaces:
• REST and SOAP
• URL-Addressability – every object has a URL
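
The concepts above (owned buckets, flat keys, size limits, one URL per object) can be sketched as a small in-memory model. This is not the S3 API: the `Bucket` class and its methods are hypothetical, the 5 GB limit is assumed to mean 5 × 1024³ bytes, and the path-style URL shape is used only to illustrate URL addressability.

```python
from dataclasses import dataclass, field
from urllib.parse import quote

MAX_OBJECT_BYTES = 5 * 1024**3  # assumed reading of "5 Gigabytes"
MAX_KEY_BYTES = 1024            # "Up to 1024 bytes long"


@dataclass
class Bucket:
    """Toy bucket: an owned container mapping flat keys to opaque bytes."""
    name: str
    owner: str
    objects: dict = field(default_factory=dict)

    def put(self, key: str, data: bytes) -> None:
        if len(key.encode("utf-8")) > MAX_KEY_BYTES:
            raise ValueError("key longer than 1024 bytes")
        if not 1 <= len(data) <= MAX_OBJECT_BYTES:
            raise ValueError("object must be between 1 byte and 5 GB")
        self.objects[key] = data

    def get(self, key: str) -> bytes:
        return self.objects[key]

    def list_keys(self, prefix: str = "") -> list:
        # Flat model: "/" in a key is just a character, not a directory.
        return sorted(k for k in self.objects if k.startswith(prefix))

    def url(self, key: str) -> str:
        # Illustrative path-style URL; every object is addressable.
        return f"http://s3.amazonaws.com/{self.name}/{quote(key)}"


bucket = Bucket(name="demo-bucket", owner="alice")
bucket.put("photos/2008/cat.jpg", b"meow")
print(bucket.list_keys("photos/"))
print(bucket.url("photos/2008/cat.jpg"))
```

Folder-like listings fall out of prefix matching on keys, which is why a flat namespace is enough.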


S3 SOAP/Query API

Service:
• ListAllMyBuckets

Buckets:
• CreateBucket
• DeleteBucket
• ListBucket
• GetBucketAccessControlPolicy
• SetBucketAccessControlPolicy
• GetBucketLoggingStatus
• SetBucketLoggingStatus

Objects:
• PutObject
• PutObjectInline
• GetObject
• GetObjectExtended
• DeleteObject
• GetObjectAccessControlPolicy
• SetObjectAccessControlPolicy


Amazon Simple Queue Service

SQS


Amazon Simple Queue Service

• Scalable Queuing
• Elastic Capacity
• Reliable, Simple, Secure

Inter-process messaging, data buffering, architecture component

$.10 per 1000 messages
$.10 - $.18 per GB data transfer


Amazon SQS Concepts

Queues:
• Named message container
• Persistent

Messages:
• Up to 256KB of data per message
• Peek / Lock access model

Scalable:
• Unlimited number of queues per account
• Unlimited number of messages per queue
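
The peek / lock access model can be illustrated with a toy in-memory queue. This is a sketch, not the SQS implementation: the `ToyQueue` class, its method names and the injectable `clock` parameter are all hypothetical. Receiving a message locks it (hides it from other consumers) for a visibility timeout; if the consumer does not delete it in time, it becomes visible again.

```python
import itertools
import time

MAX_MESSAGE_BYTES = 256 * 1024  # "Up to 256KB of data per message"


class ToyQueue:
    """Toy queue with a peek / lock (visibility timeout) access model."""

    def __init__(self, visibility_timeout=30.0, clock=time.monotonic):
        self.visibility_timeout = visibility_timeout
        self._clock = clock
        self._messages = {}      # message id -> body, in send order
        self._locked_until = {}  # message id -> time the lock expires
        self._ids = itertools.count(1)

    def send(self, body: str) -> int:
        if len(body.encode("utf-8")) > MAX_MESSAGE_BYTES:
            raise ValueError("message larger than 256KB")
        msg_id = next(self._ids)
        self._messages[msg_id] = body
        return msg_id

    def peek(self, msg_id: int) -> str:
        """Read a message without locking it."""
        return self._messages[msg_id]

    def receive(self):
        """Return (id, body) of the first visible message and lock it, else None."""
        now = self._clock()
        for msg_id, body in self._messages.items():
            if self._locked_until.get(msg_id, float("-inf")) <= now:
                self._locked_until[msg_id] = now + self.visibility_timeout
                return msg_id, body
        return None

    def delete(self, msg_id: int) -> None:
        """Remove a message once it has been processed."""
        self._messages.pop(msg_id, None)
        self._locked_until.pop(msg_id, None)


queue = ToyQueue(visibility_timeout=30.0)
job = queue.send("transcode video 42")
print(queue.receive())  # locks the message
print(queue.receive())  # nothing visible while the lock is held
queue.delete(job)       # consumer finished; message is gone for good
```

If a worker crashes after `receive` and never calls `delete`, the lock simply expires and another worker picks the message up, which is what makes the model suitable for buffering work between processes.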


SQS SOAP/Query API

Queues:
• ListQueues
• DeleteQueue
• SetVisibilityTimeout
• GetVisibilityTimeout

Messages:
• SendMessage
• ReceiveMessage
• DeleteMessage
• PeekMessage

Security:
• AddGrant
• ListGrants
• RemoveGrant


Amazon Elastic Compute Cloud

EC2


Amazon Elastic Compute Cloud

• Virtual Compute Cloud
• Elastic Capacity
• 1.7 GHz x86
• 1.7 GB RAM
• 160 GB Disk
• 250 MB/Second Network
• Network Security Model

Time or Traffic-based Scaling, Load testing, Simulation and Analysis, Rendering, Software as a Service Platform, Hosting

$.10 per server hour
$.10 - $.18 per GB data transfer


Amazon EC2 Concepts

Amazon Machine Image (AMI):
• Bootable root disk
• Pre-defined or user-built
• Catalog of user-built AMIs
• OS: Fedora, CentOS, Gentoo, Debian, Ubuntu, Windows Server
• App Stack: LAMP, mpiBLAST, Hadoop

Instance:
• Running copy of an AMI
• Launch in less than 2 minutes
• Start/stop programmatically

Network Security Model:
• Explicit access control
• Security groups

Inter-service bandwidth is free


Amazon EC2 At Work

Startups:
• Cruxy – Media transcoding
• GigaVox Media – Podcast Management

Fortune 500 clients:
• High-Impact, Short-Term Projects
• Development Host

Science / Research:
• Hadoop / MapReduce
• mpiBLAST

Load-Management and Load Balancing Tools:
• Pound
• Weogeo
• Rightscale


EC2 SOAP/Query API

Images:
• RegisterImage
• DescribeImages
• DeregisterImage

Instances:
• RunInstances
• DescribeInstances
• TerminateInstances
• GetConsoleOutput
• RebootInstances

Keypairs:
• CreateKeyPair
• DescribeKeyPairs
• DeleteKeyPair

Image Attributes:
• ModifyImageAttribute
• DescribeImageAttribute
• ResetImageAttribute

Security Groups:
• CreateSecurityGroup
• DescribeSecurityGroups
• DeleteSecurityGroup
• AuthorizeSecurityGroupIngress
• RevokeSecurityGroupIngress
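
The security-group operations listed above reflect an "explicit access control" model: inbound traffic is denied unless a rule authorizes it. The toy model below sketches that idea only. `ToySecurityGroup`, `IngressRule` and the method names are hypothetical and merely echo AuthorizeSecurityGroupIngress and RevokeSecurityGroupIngress; they are not the EC2 API or its request format.

```python
from dataclasses import dataclass, field
from ipaddress import ip_address, ip_network


@dataclass(frozen=True)
class IngressRule:
    """Allow one protocol and port range from one source CIDR block."""
    protocol: str
    from_port: int
    to_port: int
    cidr: str


@dataclass
class ToySecurityGroup:
    """Default-deny firewall: traffic passes only if some rule matches."""
    name: str
    rules: set = field(default_factory=set)

    def authorize_ingress(self, rule: IngressRule) -> None:
        self.rules.add(rule)

    def revoke_ingress(self, rule: IngressRule) -> None:
        self.rules.discard(rule)

    def allows(self, protocol: str, port: int, source_ip: str) -> bool:
        addr = ip_address(source_ip)
        return any(
            rule.protocol == protocol
            and rule.from_port <= port <= rule.to_port
            and addr in ip_network(rule.cidr)
            for rule in self.rules
        )


web = ToySecurityGroup("web")
http_from_anywhere = IngressRule("tcp", 80, 80, "0.0.0.0/0")
web.authorize_ingress(http_from_anywhere)
print(web.allows("tcp", 80, "203.0.113.7"))  # HTTP was explicitly opened
print(web.allows("tcp", 22, "203.0.113.7"))  # SSH was never authorized
```

Because nothing is allowed by default, a newly launched instance is unreachable until its group is given an explicit rule.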


Web-Scale Architecture


GigaVox Economics

Implemented Amazon S3, Amazon EC2 and Amazon SQS in November 2006

Created an infinitely scalable infrastructure for less than $100 - building the same infrastructure themselves would have cost thousands of dollars

Reduced staffing requirements - far less responsibility for 24x7 operations


Analysis and Outlook


The Rapid Growth of Networks

1986 to 2000:
• Computers: × 500
• Networks: × 340,000


The inevitable result of network development…


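Comparison of Grid Computing and Cloud Computing

Grid Computing:
• Heterogeneous resources
• Multiple institutions
• Virtual organizations
• Mainly scientific computing
• High-performance computers
• Tightly coupled problems
• Free of charge
• Standardized
• Scientific community

Cloud Computing:
• Homogeneous resources
• Single institution
• Virtual machines
• Mainly data processing
• Servers / PCs
• Loosely coupled problems
• Pay per use
• No standards yet
• Business world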


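Cloud computing is a kind of grid in the broad sense

“The grid is a set of emerging technologies built on the Internet. It integrates the high-speed Internet, high-performance computers, large databases, sensors, remote devices and more, providing scientists, engineers and ordinary people with more resources, capabilities and interactive services.”

Ian Foster, The Grid, 1998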


Science in the Next 10 Years

Science 2.0
Grid Computing


Business in the Next 10 Years

Business 2.0
Cloud Computing


Grid Books


http://www.chinagrid.net


http://www.china-cloud.net