D. Britton Preliminary Project Plan for GridPP3 David Britton 15/May/06

Post on 11-Dec-2015

Page 1: D. Britton Preliminary Project Plan for GridPP3 David Britton 15/May/06

D. Britton

Preliminary Project Plan for GridPP3

David Britton 15/May/06

Page 2: D. Britton Preliminary Project Plan for GridPP3 David Britton 15/May/06

15/May/2006 GridPP3 D. Britton

Boundary Conditions

Timeframe:
GridPP2+: Sep 07 to Mar 08
GridPP3: Apr 08 to Mar 11

Budget Line: Not known exactly. The scale is set by the exploitation review input (from both GridPP and the experiments).

Exploitation Review input from GridPP:
FY07: £7,343k
FY08-FY10: £29,302k
Total: £36,643k

Page 3: D. Britton Preliminary Project Plan for GridPP3 David Britton 15/May/06

LHC Hardware Requirements

GridPP Exploitation Review input: took the global hardware requirements and multiplied them by the UK authorship fraction.

ALICE 1%   ATLAS 10%   CMS 5%   LHCb 15%

Using "Authors" in the denominator is problematic when not all authors (globally) have an associated Tier-1. Such an algorithm applied globally would not result in sufficient hardware. GridPP has asked the experiments for their requirements, and their input (relative to global requirements) is:

ALICE ~1.3%   ATLAS ~13.7%   CMS ~10.5%   LHCb ~16.8%

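[Diagram of candidate scaling algorithms; the original layout is not recoverable. Surviving fragments: "?? (Global Requirements) X (Global T1 author frac.)"; "(Global Requirements), (Number of Tier1s)"; "~50% X (Global Requirements), (Number of Tier1s)"; "~ UK Authorship fraction".]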
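To see how far the experiments' own requests depart from the authorship-based scaling, the two sets of percentages quoted on this slide can be compared directly. This is only an illustrative sketch: the dictionary and function names are invented here, and the figures are the approximate values from the slide.

```python
# UK share of the global requirement, in percent, as quoted on the slide.
authorship_share = {"ALICE": 1.0, "ATLAS": 10.0, "CMS": 5.0, "LHCb": 15.0}
experiment_request = {"ALICE": 1.3, "ATLAS": 13.7, "CMS": 10.5, "LHCb": 16.8}


def request_to_authorship_ratio(requests, shares):
    """Ratio of each experiment's requested share to its authorship-based share."""
    return {name: requests[name] / shares[name] for name in shares}


ratios = request_to_authorship_ratio(experiment_request, authorship_share)
for name, ratio in ratios.items():
    print(f"{name}: {ratio:.2f}x the authorship-based share")
```

On these numbers CMS shows the largest departure, with a request roughly twice its authorship-based share.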

Page 4: D. Britton Preliminary Project Plan for GridPP3 David Britton 15/May/06

Proposed Hardware

The proposal from the User Board is that the hardware requirements in the GridPP3 proposal are:

• Those defined by the LHC experiments;

• plus those defined by BaBar (historically well understood);

• plus a 5% provision for “Other” experiments at the Tier-2s only.

Page 5: D. Britton Preliminary Project Plan for GridPP3 David Britton 15/May/06

Proposal

• We propose to use the UB input to define the Hardware request (and not include alternative scenarios).

• We will note that these hardware requirements are not very elastic. Strategic decisions on the UK obligations, roles, and priorities will need to be made if the Hardware is to be significantly reduced.

• (Internally, we should continue to discuss how to respond to lower funding scenarios).

Page 6: D. Britton Preliminary Project Plan for GridPP3 David Britton 15/May/06

Hardware Costs

Hardware costs are rather uncertain. We have previously quantified this uncertainty as 10% per year of extrapolation (10% in 2007, 20% in 2008, etc.). This translates to an uncertainty of about £3.8m in the proposal.

Capital Costs                     Jan-06  Jan-07  Jan-08  Jan-09  Jan-10
CPU cost [k£/KSI2K]                 0.49    0.39    0.31    0.25    0.17
CPU installation [k£/KSI2K]         0.03    0.02    0.02    0.01    0.01
Disk cost [k£/TB]                   1.38    0.95    0.67    0.48    0.34
Disk installation cost [k£/TB]      0.07    0.05    0.04    0.03    0.02
Tape media [k£/TB]                  0.26    0.18    0.16    0.13    0.07
Tape Infrastructure [k£/yr]          133     132     489     247     321
Tier-1 Infrastructure [k£/yr]        108     113     119     125     131

(Actual numbers here will be updated – these are a few months old)
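The "10% per year of extrapolation" rule can be written down as a small helper. This is a sketch only: the base year of 2006 is inferred from the slide's "10% in 2007, 20% in 2008", the function names are invented here, and the slide does not say how the £3.8m figure was aggregated, so the example does not attempt to reproduce it.

```python
BASE_YEAR = 2006  # assumption: extrapolation is measured from the Jan-06 prices


def uncertainty_percent(year):
    """Uncertainty grows by 10 percentage points per year of extrapolation."""
    return 10 * (year - BASE_YEAR)


def price_band(price, year):
    """Return the (low, high) price range implied by the uncertainty for that year."""
    spread = price * uncertainty_percent(year) / 100
    return price - spread, price + spread


# Example: the Jan-08 CPU cost of 0.31 k£/KSI2K from the capital cost table.
low, high = price_band(0.31, 2008)
print(f"2008 uncertainty: {uncertainty_percent(2008)}%, band {low:.3f} to {high:.3f} k£/KSI2K")
```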

Years: 2007 2008 2009 2010 2011 2012

CAPACITY MODEL
Required capacity: 816 2538 4808 7682 9753 12085
Actual CASTOR Capacity: 544 2538 4808 10516 10129 12085

9940 Media
Existing 9940 Slot Count: 1948 0 0 0 0 0
Media Capacity (9940): 0.182 0.182 0.182 0.182 0.182 0.182
Existing 9940 Capacity: 324 0.000 0.000 0.000 0.000 0.000

T10/20K Media
Total Required Tape Capacity April (TB): 816 2538 4808 7682 9753 12085
Tapes phased out in March: 0 0 0 0 430 778
Total Tapes Available in March: 430 1208 5639 10684 11254 10476
Total Storage Capacity (March): 194 544 2538 9616 10129 9429
Additional TB Required for April: 350 1994 2270 0 0 2656
Additional Tapes Purchased: 778 4432 5045 1000 0 2951
Used Slots April (T10/20K): 1208 5639 10684 11684 11254 13428
T10/20K Media Cost: 0.08 0.07 0.06 0.06 0.06 0.06
Media Capacity: 0.45 0.45 0.45 0.9 0.9 0.9
Spent on Media: 62 310 303 60 0 177

Spent on new Robot Infrastructure: 250 50 50
New Slots Purchased: 6000 2000 2000
Maximum Slot Count Available: 5000 11000 11000 13000 13000 15000
Total Used Slots: 3156 5639 10684 11684 11254 13428

BANDWIDTH MODEL
Estimated rate to Fill (6 months): 32 114 151 191 137 155
In beam Double Fill Rate: 228 301 381 275 309
In beam Media Conversion (6 months): 17 319
In beam reprocessing: 114 151 191 137 155
Out of beam Reprocessing Read Rate (4 months?): 252 478 764 970 1202
Drive deadtime on writes: 25% 25% 25% 25% 25% 25%
Drive deadtime on reads: 25% 25% 25% 25% 25% 25%

In beam write capacity required: 327 401 933 366 412
Out beam write capacity required: 0 0 0 0 0
In beam read capacity required: 174 201 679 183 206
Out beam read capacity required: 337 638 1019 1294 1603

In beam total required bandwidth: 501 602 1613 549 619
Out beam total required bandwidth: 337 638 1019 1294 1603
Total available CASTOR bandwidth: 555 640 720 1680 1320 1680

9940B Drives: 6 3 0 0 0 0
9940B Maintenance Cost/drive: 3.0 3.3 3.6 4.0 4.4 4.8
Spent on 9940B Maintenance: 18 9.9 0 0 0 0
Bandwidth per brick (MB/s): 25 25 25 25 25 25
9940B Bandwidth: 150 75 0 0 0 0

Cost of Storage Brick (T10): 19.15 19.15 19.15 19.15 19.15 19.15
T10K Maintenance Cost/drive: 2.3 2.3 2.3 2.3 2.3 2.3
New T10K Server Bricks: 3 2 1 0
Total T10K Server Bricks: 6 8 9 9 0 0
Bandwidth per brick (MB/s): 80 80 80 80 80 80
Spent on Server bricks: 57.45 38.3 19.15 0 0 0
Spent on T10K Maintenance: 6.9 13.8 18.4 20.7
Total T10K Bandwidth: 480 640 720 720 0 0

Cost of Storage Brick (T20): 19.15 19.15 19.15
T20K Maintenance Cost/drive: 2.3 2.3 2.3
New T20K Server Bricks: 8 3 3
Total T20K Server Bricks: 8 11 14
Bandwidth per brick (MB/s): 120 120 120
Spent on Server bricks: 0 0 0 153.2 57.45 57.45
Spent on T20K Maintenance: 0 0 0 0 18.4 25.3
Total T20K Bandwidth: 0 0 0 960 1320 1680

Spent on ADS Maintenance: 10 10 0 0 0 0
Spent on Minor Parts: 10 10 10 10 10 10
Spent on Robot 1 M&O: 30 30 30 30 30 30
Spent on Robot 2 M&O: 50 50 55 55 60

Summary
Spent on Media: 62 310 303 60 0 177
Spent on Bandwidth and Operation: 132 412 128 319 189 258
Spent Total: 195 722 430 379 189 435
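Two of the relationships in the tape model can be checked against the quoted rows: storage capacity as tapes available times media capacity, and media spend as tapes purchased times media cost. The sketch below assumes that "Media Capacity" is in TB per tape and "T10/20K Media Cost" is in k£ per tape; those units are not stated on the slide, and the variable names are invented here. Because the slide shows rounded figures, agreement is only expected to within about one unit.

```python
# Six-value rows copied from the tape model (2007 to 2012).
tapes_available = [430, 1208, 5639, 10684, 11254, 10476]
media_capacity = [0.45, 0.45, 0.45, 0.9, 0.9, 0.9]      # assumed TB per tape
quoted_capacity = [194, 544, 2538, 9616, 10129, 9429]   # "Total Storage Capacity (March)"

tapes_purchased = [778, 4432, 5045, 1000, 0, 2951]
media_cost = [0.08, 0.07, 0.06, 0.06, 0.06, 0.06]       # assumed k£ per tape
quoted_spend = [62, 310, 303, 60, 0, 177]               # "Spent on Media"

# Recompute each row as count times unit value.
capacity = [n * c for n, c in zip(tapes_available, media_capacity)]
spend = [n * c for n, c in zip(tapes_purchased, media_cost)]

# Largest gap between the recomputed and quoted figures.
capacity_gap = max(abs(a - b) for a, b in zip(capacity, quoted_capacity))
spend_gap = max(abs(a - b) for a, b in zip(spend, quoted_spend))
print(f"largest capacity gap: {capacity_gap:.2f} TB, largest spend gap: {spend_gap:.2f} k£")
```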

Page 7: D. Britton Preliminary Project Plan for GridPP3 David Britton 15/May/06

Tier-1 Hardware

(Work in progress: numbers are still evolving!)

Capacity, available in:   Apr-08  Apr-09  Apr-10  Apr-11

CPU Capacity
Purchase                    3485    3573    6252    7101
Phase Out                  201.7  152.35  456.65     332
Net Total                   6590   10011   15806   22575
Target +5%                  6590   10011   15806   22575
(unlabelled row)               0       0       0       0

Disk Capacity
Purchase                    2383    1637    3088    3158
Phase Out                  40.56     117     162     357
Net Total                   3604    5124    8050   10851
Target +5%                  3604    5124    8050   10851
(unlabelled row)               0       0       0       0

Tape Capacity
Purchase                    3167    2732       0     558
Phase Out                    330       0       0     620
Net Total                   3787    6519   13039   12977
Target                      3787    6519    9326   12977

SPEND [K£]                  FY07       FY08       FY09       FY10
                        (in Jan-08) (in Jan-09) (in Jan-10) (in Jan-11)
CPU                         1143        929       1150        923
Disk                        1692        822       1096        793
Tape                         494        363          0         37
Tape Infra.                  132        489        247        321
Tier1 Infra.                 113        119        125        131
Running Cost                   0        617        787       1059

Total                       3575       3340       3405       3264
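The CPU and disk "Net Total" rows are consistent with a simple roll-forward: each year's total is the previous total plus purchases minus phase-outs. The sketch below checks that reading against the quoted figures. The function name is invented here, and the Apr-08 totals are taken as given because the earlier balance is not shown on the slide. The tape row does not follow this rule in Apr-10, so it is left out.

```python
def roll_forward(opening, purchases, phase_outs):
    """Running capacity: previous total + purchased - phased out, year by year."""
    totals = [opening]
    for bought, retired in zip(purchases, phase_outs):
        totals.append(totals[-1] + bought - retired)
    return totals


# Apr-08 net totals are the starting point; later columns are Apr-09 to Apr-11.
cpu = roll_forward(6590, [3573, 6252, 7101], [152.35, 456.65, 332])
disk = roll_forward(3604, [1637, 3088, 3158], [117, 162, 357])

print([round(x) for x in cpu])
print([round(x) for x in disk])
```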

Page 8: D. Britton Preliminary Project Plan for GridPP3 David Britton 15/May/06

Running Costs

(Work in progress)

Running Costs: CPU      2007     2008     2009     2010
New Systems              166      761      404      473
New Racks                  5       24       13       15
Phased out racks           4        3        5        0
Rack Count                18       39       47       61
KW/New System           0.26     0.26     0.27     0.29
New KW                            198      110      136
Phased Out KW                      18       51        0
Total Load (KW)          151      330      390      525
Cost Per KW                   0.00008  0.00008  0.00009
Cost                     £0k    £347k    £430k    £609k

Running Costs: Disk     2007     2008     2009     2010
New Systems              101      201       82      134
New Racks                 14       29       12       19
Phased out racks           3        4        0       10
Rack Count                32       57       69       78
KW/New System          0.735     0.77     0.81     0.85
New KW                            155       66      114
Phased Out KW                      14        0       49
Total Load (KW)          116      257      323      388
Cost Per KW                   0.00008  0.00008  0.00009
Cost                     £0k    £270k    £357k    £450k
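The disk power figures appear to follow from two steps: new load is the number of new systems times the kW per new system, and total load is the previous load plus new load minus phased-out load. The sketch below is a reading of the flattened table rather than a stated method; it assumes the disk rows carry the same labels as the CPU rows and that the three-value rows refer to 2008 to 2010. The variable names are invented here.

```python
# Disk block values as read from the running-costs table (assumed row labels).
new_systems = {2008: 201, 2009: 82, 2010: 134}
kw_per_new_system = {2008: 0.77, 2009: 0.81, 2010: 0.85}
phased_out_kw = {2008: 14, 2009: 0, 2010: 49}

total_load_kw = {2007: 116}  # 2007 load is taken as given
for year in (2008, 2009, 2010):
    # New load added by this year's purchases, rounded to whole kW as on the slide.
    new_kw = round(new_systems[year] * kw_per_new_system[year])
    total_load_kw[year] = total_load_kw[year - 1] + new_kw - phased_out_kw[year]

print(total_load_kw)
```

The CPU block follows the same pattern but only agrees with the slide to within about 1 kW, presumably because the slide shows rounded intermediate values.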

Page 9: D. Britton Preliminary Project Plan for GridPP3 David Britton 15/May/06

Running Costs

• Running costs traditionally charged indirectly (at institutes and CCLRC). Normally averaged over larger communities which tends to be to the advantage of particle physics.

• We hope this continues as long as possible.

• Exploitation review input contained ~£1.8m running costs split between Tier-1 and Tier2 which is only 50% of the current estimate.

• Should we avoid explicitly including running costs in the GridPP3 proposal (on the basis that it is not known how these will be charged)? Instead, include a footnote pointing out the assumption that running costs are funded by other mechanisms (SLA, FEC).

Page 10: D. Britton Preliminary Project Plan for GridPP3 David Britton 15/May/06

Tier-2 Resources

In GridPP2 we paid for staff in return for provision of hardware, which is not a sustainable model. Need a transition to a sustainable model that generates sufficient (but not excessive) hardware, which institutes will buy into.

Such a model should acknowledge:

• That we are building a Grid (not a computer centre).
• That historically Tier-2s have allowed us to lever resources/funding.
• That Tier-2s are designed to provide different functions and different levels of service from the Tier-1.
• That dual funding opportunities may continue for a while.
• That institutes may have strategic gain by continuing to be part of the "World's largest Grid".

Page 11: D. Britton Preliminary Project Plan for GridPP3 David Britton 15/May/06

Tier-2 Resources

A possible model:

- GridPP funds ~15 FTE at the Tier-2s (same as Tier-1).
- Tier-2 hardware requirements are defined by the UB request.
- GridPP pays the cost of purchasing hardware to satisfy the following year's requirements at the current-year price, divided by the nominal hardware lifetime (4 years for disk; 5 years for CPU).

E.g. 2253 TB of disk is required in 2008. In January 2007, this would cost ~1.0 k£/TB. With a lifetime of 4 years, the 1-year "value" is 2253/4 = £563k.

Note: This does not necessarily reimburse the full cost of the hardware because, in subsequent years, the money GridPP pays depreciates with the falling cost of hardware, whereas the Tier-2s that actually made a purchase have been locked into a cost determined by the purchase date. However, GridPP does pay the cost up to one year before the actual purchase date, and institutes which already own resources can delay the spend further.
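The payment rule in this possible model is a single division, shown here as a sketch with the disk example from the slide. The function and constant names are invented for illustration; the lifetimes and the ~1.0 k£/TB price are the ones quoted above.

```python
DISK_LIFETIME_YEARS = 4
CPU_LIFETIME_YEARS = 5


def annual_hardware_payment(required_capacity, unit_price_k_gbp, lifetime_years):
    """One-year "value" in k£: cost at the current-year price spread over the lifetime."""
    return required_capacity * unit_price_k_gbp / lifetime_years


# Slide example: 2253 TB of disk needed in 2008, priced in January 2007 at ~1.0 k£/TB.
disk_payment = annual_hardware_payment(2253, 1.0, DISK_LIFETIME_YEARS)
print(f"Disk payment: {disk_payment:.0f} k£")
```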

Page 12: D. Britton Preliminary Project Plan for GridPP3 David Britton 15/May/06

Tier-2 Resources

Sanity Checks:

1) Can apply the model and compare cost of hardware at the Tier-1 and Tier-2 integrated over the lifetime of the project:

                         Tier-1   Tier-2
CPU (K£/KSI2K-year):      0.071    0.043
DISK (K£/TB-year):        0.142    0.107
TAPE (K£/TB-year):        0.05

2) Total cost of ownership: Can compare total cost of the Tier-2 facilities with the cost of placing the same hardware at the Tier-1 (estimate that doubling the Tier-1 hardware requires a 35% increase in staff).

Including staff and hardware, the cost of the Tier-2 facilities is 80% of the cost of an enlarged Tier-1.

Question: Would institutes be prepared to participate at this level?

Page 13: D. Britton Preliminary Project Plan for GridPP3 David Britton 15/May/06

Staff Effort

Currently using the GridPP input to the exploitation review as the baseline (with the addition of Dissemination + Industrial Liaison).

Top Down Plan              2007      2008    2009    2010    2011    2012
                       GridPP2+   GridPP3
                          Scn-1     Scn-1   Scn-1   Scn-1   Scn-1   Scn-1
Tier-1 Staff               8.75     15.00   15.00   15.00   15.00   15.00
Tier-2 Staff               8.75     15.00   15.00   15.00   15.00   15.00
Grid Support Staff        12.25     21.00   17.00   15.00   15.00   15.00
Grid Operations Staff      0.58      8.00    8.00    8.00    8.00    8.00
Management                 1.98      2.50    2.50    2.50    2.50    2.50
Dissem. + Indus.Lias.      0.88      1.50    1.50    1.50    1.50    1.50
Total Staff               33.19     63.00   59.00   57.00   57.00   57.00

Page 14: D. Britton Preliminary Project Plan for GridPP3 David Britton 15/May/06

Staff Costs

University Posts               2007    2008    2009    2010    2011    2012
Inflation Factor              2.50%   2.50%   2.50%   2.50%   2.50%   2.50%
Salary Progression Factor     4.00%   4.00%   4.00%   4.00%   4.00%   4.00%
Average Salary [k£]            36.0    38.3    40.8    43.5    46.3    49.3
Average Indirect Costs [k£]    33.0    33.8    34.7    35.5    36.4    37.3
Average Estate Costs [k£]      11.0    11.3    11.6    11.8    12.1    12.4
FEC Fraction                    80%    100%    100%    100%    100%    100%
Total FTE Cost                 71.2    83.4    87.1    90.9    94.9    99.1

CCLRC Posts                    2007    2008    2009    2010    2011    2012
Inflation Factor              2.50%   2.50%   2.50%   2.50%   2.50%   2.50%
Salary Progression Factor     1.00%   1.00%   1.00%   1.00%   1.00%   1.00%
Average Salary [k£]            74.3    76.9    79.6    82.4    85.3    88.2
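The University Posts figures can be reproduced with a simple projection. The rule used below is inferred from the numbers rather than stated on the slide: salary grows each year by inflation plus salary progression (added, not compounded together), indirect and estate costs grow by inflation only, and the total is salary plus the FEC fraction of indirect and estate costs. The function name is invented here.

```python
def project_fte_cost(salary, indirect, estates, inflation, progression, fec_fractions):
    """Yearly cost per FTE in k£, one entry per FEC fraction supplied."""
    costs = []
    for fec in fec_fractions:
        costs.append(salary + fec * (indirect + estates))
        # Assumed growth rule, inferred from the table.
        salary *= 1 + inflation + progression
        indirect *= 1 + inflation
        estates *= 1 + inflation
    return costs


# 2007 starting values and FEC fractions from the University Posts table.
university = project_fte_cost(36.0, 33.0, 11.0, 0.025, 0.04, [0.8, 1, 1, 1, 1, 1])
print([round(cost, 1) for cost in university])
```

The same salary growth rule with a 1% progression factor also matches the CCLRC average salary row.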

Page 15: D. Britton Preliminary Project Plan for GridPP3 David Britton 15/May/06

Tier-1 Staff

The staff required will be 15 FTE to run and operate the CPU, disk, tape, networking and core services as well as provide Tier-1 operations, deployment and experiments support managed in an effective manner. Support will be during daytime working hours (08:30-17:00 Monday to Friday) with on call cover outside this period. CCLRC may provide additional effort to underpin the service. In order to provide staff present on-site for 24x7 (weekend) cover a further 5 FTE (2 FTE) would be needed.

9 FTE in GridPP1; 13.5 FTE in GridPP2

Management Service

Disk Service

Tape Service

File System Service

CPU Service

Deployment Service

Experiment Support

Middleware Support

Core Services

Operations Service

Security Service

Network Service

Other Service

(Exploitation review input)

Page 16: D. Britton Preliminary Project Plan for GridPP3 David Britton 15/May/06

Tier-2 Staff

Currently GridPP provides 9.0 FTE of effort for hardware support at the Tier-2s (London 2.5, NorthGrid 4.5, ScotGrid 1.0 and SouthGrid 1.0). This is acknowledged to be too low and operating the Tier-2s is a significant drain on rolling-grant funded System Managers and Physicist Programmers. Large facilities require at least one FTE per site, whereas smaller sites need at least a half FTE. On the basis of currently available hardware an allocation for HEP computing would be 4 FTE to London (5 sites), 6 FTE to NorthGrid (4 Sites), 2 to ScotGrid (3 sites) and 3 to SouthGrid (5 sites) making a total of 15 FTE.

(Exploitation review input)

Page 17: D. Britton Preliminary Project Plan for GridPP3 David Britton 15/May/06

Grid Support Staff

From the middleware support side, at least one FTE is required for each of the following areas: security support; storage systems; workload management; networking; underlying file transfer and data management systems; and information systems and monitoring (where an additional FTE of effort is anticipated ensuring that our main contribution to EGEE is supported in the longer term). It would be inappropriate to reduce to this level of effort abruptly at precisely the time that LHC is expected to start producing data in 2007. Rather it is advised to phase the reduction to this level over FY08 and FY09 thereby sustaining a necessary and appropriate level of support at this critical time.

… there will remain core Grid application interfaces supporting the experiment applications that will continue into the LHC running period. These stand to some extent independent of the experiment-specific programmes, although they serve them. A total of 7 FTEs is required for these common application interface support tasks. It should be noted that the proposed effort in this combined area is a significant reduction from the current effort in these Grid developments of more than 30 FTEs.

(Exploitation review input)

Data Management
  Development: File placement, Replica location, auto file transfer
  Bug-fixing:
  Support: FTS, GridFTP, LFC

Storage Management
  Development: Disk usage monitoring; Castor dev
  Bug-fixing: Castor/SRM-2, dCache
  Support: Castor, SRM, dCache, DPM

Generic Metadata
  Development: Security support in databases
  Bug-fixing:
  Support: AMGA

InfoMon
  Development:
  Bug-fixing: R-GMA
  Support: BDII, R-GMA, GLUE, APEL

WMS
  Development:
  Bug-fixing: SGE support
  Support: RB, CEmon, etc

Security
  Development: Policy
  Bug-fixing: Vulnerability, GridSite support
  Support: Operational security, VOMS

Networking
  Development:
  Bug-fixing:
  Support: Provisioning (LHCOPN, …), monitoring

Application Interfaces: User Interfaces, Portals
Documentation: Documentation
User Support: User Support

Page 18: D. Britton Preliminary Project Plan for GridPP3 David Britton 15/May/06

Grid Operations

(Exploitation review input)

In order to operate and monitor the deployment of such a Grid, a further 8 FTEs of effort is needed, corresponding to the Production Manager, 4 Tier-2 Regional Coordinators and 3 members of the Grid Operations Centre.

Area / Task / Comment (numeric columns include Scenario 1)
Operations Management: 1, 1. Deployment and Operations Manager
Tier-2 Co-ordinators: 0, 4. Tier-2 Coordinators
"GOC" Posts (Need Details): 0, 3. More information required

Page 19: D. Britton Preliminary Project Plan for GridPP3 David Britton 15/May/06

Future Management

GridPP2                                           Beyond GridPP
Project Leader            Tony Doyle       0.7    Project Leader            ~0.7
Project Manager           Dave Britton     0.9    Project Manager           ~0.9
CB/Tier-2 Chair           Steve Lloyd      0.5    "Deployment Supervisor"   ~0.4
Deployment Board Chair    Dave Kelsey      0.3
Applications Coordinator  Roger Jones      0.5    "Technical Supervisor"    ~0.4
Middleware Coordinator    Robin Middleton  0.5
Total                                      3.4                              2.5

• Project Leader appointed by CB search Committee
• Others by Project Leader?
• 2.5 → 1.5 over time?
• What about CB itself?
• What about Dissemination?

(Last CB – Steve's slide)

Page 20: D. Britton Preliminary Project Plan for GridPP3 David Britton 15/May/06

Dissemination

4. The bid(s) should: a) show how developments build upon PPARC's existing investment in e-Science and IT investment, leverage investment by the e-science Core programme and demonstrate close collaboration with other science and industry and with key international partners such as CERN. It is expected that a plan for collaboration with industry will be presented or justification if such a plan is not appropriate.

For exploitation review it was assumed dissemination was absorbed by PPARC. Unlikely at this point! Presently we have effectively 1.5 FTE working on dissemination alone (Sarah Pearce plus events officer). Want to maintain a significant dissemination activity (insurance policy) so adding in industrial liaison suggests maintaining the level at 1.5 FTE.

Page 21: D. Britton Preliminary Project Plan for GridPP3 David Britton 15/May/06

Full Proposal (work-in-progress)

Cost Table in K£              FY07      FY08     FY09     FY10
                          GridPP2+   GridPP3  GridPP3  GridPP3
Tier-1 Staff                   650      1154     1194     1236
Tier-1 Hardware               3575      2722     2619     2205
Tier-2 Staff                   623      1252     1306     1363
Tier-2 Hardware                660      1259     1344     1247
Grid Support Staff             891      1684     1417     1299
Grid Operation Staff            43       641      667      693
Management Staff               144       200      208      217
Dissem. + Indus.Lias.           62       125      131      136
Travel and Operations          133       252      236      228

Total                         6781      9289     9121     8624
Grand Total                  33815

Compares with exploitation review input of £36,643k which included £1,800k running costs excluded above.

Page 22: D. Britton Preliminary Project Plan for GridPP3 David Britton 15/May/06

Proposed Balance (work in progress)

GridPP3 Proposal (pie chart, share of total cost)

Tier-1 Staff             13%
Tier-1 Hardware          33%
Tier-2 Staff             13%
Tier-2 Hardware          13%
Grid Support Staff       16%
Grid Operation Staff      6%
Management Staff          2%
Dissem. + Indus.Lias.     1%
Travel and Operations     3%
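The percentages in the Proposed Balance chart can be reproduced from the cost table on the previous page by summing each row over FY07 to FY10 and dividing by the overall total. The sketch below does that with the figures as printed; the variable names are invented here. Summing the rounded entries gives 9122 for FY09 and 33816 overall, one unit above the slide's 9121 and 33815, presumably because the slide rounds after summing.

```python
# Cost table in k£: FY07, FY08, FY09, FY10.
cost_table = {
    "Tier-1 Staff": [650, 1154, 1194, 1236],
    "Tier-1 Hardware": [3575, 2722, 2619, 2205],
    "Tier-2 Staff": [623, 1252, 1306, 1363],
    "Tier-2 Hardware": [660, 1259, 1344, 1247],
    "Grid Support Staff": [891, 1684, 1417, 1299],
    "Grid Operation Staff": [43, 641, 667, 693],
    "Management Staff": [144, 200, 208, 217],
    "Dissem. + Indus.Lias.": [62, 125, 131, 136],
    "Travel and Operations": [133, 252, 236, 228],
}

# Per-year totals (columns) and per-category totals (rows).
column_totals = [sum(column) for column in zip(*cost_table.values())]
row_totals = {name: sum(values) for name, values in cost_table.items()}
grand_total = sum(row_totals.values())

# Each category's share of the four-year total, in whole percent.
shares = {name: round(100 * total / grand_total) for name, total in row_totals.items()}

for name, share in shares.items():
    print(f"{name:<24}{share:>3}%")
```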