NEC2013 Varna, M. Lokajicek
Tier 2 Prague, Institute of Physics AS CR
Status and Outlook
J. Chudoba, M. Elias, L. Fiala, J. Horky, T. Kouba, J. Kundrat, M. Lokajicek, J. Svec, P. Tylka
13 September 2013
Outline
• Institute of Physics AS CR (FZU)
• Computing cluster
• Networking
• LHCONE
• Looking for new resources
– CESNET National Storage Facility
– IT4I supercomputing project
• Outlook
11 September 2013
Institute of Physics AS CR (FZU)
• Institute of Physics of the Academy of Sciences of the Czech Republic
• 2 locations in Prague, 1 in Olomouc
– In 2012: 786 employees (281 researchers + 78 doctoral students)
– 6 divisions:
• Division of Elementary Particle Physics
• Division of Condensed Matter Physics
• Division of Solid State Physics
• Division of Optics
• Division of High Power Systems
• ELI Beamlines Project Division
• Department of Networking and Computing Techniques (SAVT)
FZU - SAVT
• Institute’s networking and computing service department
– Several server rooms
– Computing clusters
• Golias – particle physics, Tier-2
– A few nodes already before EDG
– WLCG iMoU 4 July 2003 (interim)
– New server room 1 November 2004
– WLCG MoU from 28 April 2008
• Dorje – solid state, condensed matter
• Luna, Thsun – smaller group clusters
Main server room
• Main server room (in FZU, Na Slovance)
– 62 m², ~20 racks, 350 kVA motor generator, 200 + 2 × 100 kVA UPS, 108 kW air cooling, 176 kW water cooling
– Continuous changes
– Hosts computing servers and central services
Cluster Golias
• Upgraded every year – several (9) sub-clusters of identical HW
• 3800 cores, 30 700 HS06
• 2 PB disk space
• Tapes used only for local backups (125 LTO4, max 500 cassettes)
• Serving: ATLAS, ALICE, D0 (NOvA), Auger, STAR, …
• WLCG Tier-2 = Golias@FZU + xrootd servers@REZ (NPI)
Utilization
• Very high average utilization
– Several different projects, different tools for production
– D0 – production submitted locally by 1 user
– ATLAS – PanDA, Ganga, local users; DPM
– ALICE – VO box; xrootd
[Plot: running jobs by project (D0, ATLAS, ALICE), peaking around 3.5 k]
„RAW“ capacities

Year / share   HEPSPEC2006    %    TB disk            %
2009           10 340              186
2010           19 064        100   427               100
2011           23 484        100   1 714             100
2012           29 660        100   2 521             100
  D0            9 993         34   35                  1
  ATLAS        12 127         41   1 880 (+16 MFF)    74
  ALICE         7 540         25   606 (+100 Řež)     24
2013           29 660        100   2 521             100
  D0            9 993         34   35                  1
  ATLAS        12 127         41   1 880 (+16 MFF)    74
  ALICE         7 540         25   606 (+140 Řež)     24
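As a quick cross-check of the CPU columns above (an illustrative sketch, not part of the slides): the three experiment allocations sum exactly to the 2012/2013 total, and the quoted percentages follow by rounding.

```python
# Sanity check of the 2012/2013 CPU rows of the capacity table.
total_hs06 = 29660
shares = {"D0": 9993, "ATLAS": 12127, "ALICE": 7540}

# The experiment allocations add up to the full cluster capacity.
assert sum(shares.values()) == total_hs06

# Reproduce the percentage column by rounding each share.
for exp, hs06 in shares.items():
    print(exp, round(100 * hs06 / total_hs06), "%")
# D0 34 %, ATLAS 41 %, ALICE 25 %
```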
2012 D0, ATLAS and ALICE usage
• ATLAS
– 2.2 M tasks
– 90 M HEPSPEC06 hours
– 1.9 PB disk space
– Data transfer: 1.2 PB to farm, 0.9 PB from farm
– 2% contribution to ATLAS
• ALICE
– 2 M simulation tasks
– 60 M HEPSPEC06 hours
– Data transfer: 4.7 PB to farm, 0.5 PB from farm
– 5% contribution to ALICE task processing
– 140 TB disk space in INF (Tier-3)
[Plot: 2012 data transfers inside the farm – monthly means to (incoming) and from (outgoing) working nodes in TB, 2012-01 to 2012-11]
• D0
– 290 M tasks
– 90 M HEPSPEC06 hours
– 13% contribution to D0
Network - CESNET, z. s. p. o.
• FZU Tier-2 network connections
– 10 Gbps LHCONE (GEANT), from 18 July 2013
– 10 Gbps to KIT from 1 September 2013
– 1 Gbps to FNAL, BNL, Taipei
– 10 Gbps to the commodity network
– 1-10 Gbps to collaborating Tier-3 institutes
• http://netreport.cesnet.cz/netreport/hep-cesnet-experimental-facility2/
LHCONE - Network transition
• The link to KIT was saturated at the 1 Gbps end-to-end line
• LHCONE in production from 18 July 2013 over 10 Gbps infrastructure
• Also relieves the commodity network
ATLAS tests
• Testing upload speed of files > 1 GB to all Tier-1 centres
• After the LHCONE connection, only 2 sites remain below 5 MB/s
• Prague Tier-2 is ready for validation as a T2D site
LHCONE – trying to understand monitoring
• Prague – DESY: very asymmetric throughput
• DESY – Prague: LHCONE optical line cut at 4:00; one-way latency improved afterwards
International contribution of the Prague centre to ATLAS + ALICE WLCG Tier-2 centres
• http://accounting.egi.eu/
• Grid + local tasks
• Long-term slide down until we received regular financing in 2008
• The original 3% target is not achievable with current financial resources
• Necessary to look for other resources
[Plot: Prague share of ATLAS + ALICE Tier-2 jobs (%) and CPU (%), 2005-2012, range 0-6%]
Remote storage
• CESNET – Czech NREN + other services
• New project: National storage facility
– Three distributed HSM-based storage sites
– Designed for the research and science community
– 100 TB offered for both the ATLAS and Auger experiments
– Implemented as a remote Storage Element with dCache
– disk <-> tape migration
• FZU Tier-2 in Prague <-> CESNET storage site in Pilsen: ~100 km, 10 Gbit link with ~3.5 ms latency
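For intuition (a back-of-envelope sketch, not from the slides): a single TCP stream can only fill the Prague-Pilsen link if its window covers the bandwidth-delay product. Assuming the quoted ~3.5 ms is the round-trip time:

```python
# Bandwidth-delay product for the FZU <-> Pilsen link.
# Assumption: the ~3.5 ms quoted latency is the round-trip time.
link_bps = 10e9          # 10 Gbit/s link
rtt_s = 3.5e-3           # ~3.5 ms RTT (assumption)

bdp_bytes = link_bps * rtt_s / 8
print(f"BDP ~ {bdp_bytes / 1e6:.1f} MB")  # ~4.4 MB: TCP buffers must be at least this large
```

So per-stream TCP buffers of a few megabytes (or several parallel streams, as dCache transfers typically use) are needed to saturate the link.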
Remote storage
remote/local   Method   TTreeCache   events/s (%)   Bytes transferred   CPU efficiency
local          rfio     ON           100%           117%                98.9%
local          rfio     OFF           74%           100%                72.7%
remote         dCap     ON            75%           101%                73.5%
remote         dCap     OFF           46%           100%                46.9%
• TTreeCache in ROOT helps a lot, both for local and for remote transfers
• TTreeCached remote jobs are faster than local ones without the cache
Influence of distributing a Tier-2 data storage on physics analysis
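A quick sanity check of the table's headline claim (illustrative arithmetic only, with the cached local rate normalized to 100%): the cache buys roughly a 1.35x speedup locally and 1.63x remotely, and a cached remote job indeed outruns an uncached local one.

```python
# Event rates from the table, relative to local rfio with TTreeCache ON = 100%.
rates = {
    ("local", "cache_on"): 1.00,
    ("local", "cache_off"): 0.74,
    ("remote", "cache_on"): 0.75,
    ("remote", "cache_off"): 0.46,
}

speedup_local = rates[("local", "cache_on")] / rates[("local", "cache_off")]
speedup_remote = rates[("remote", "cache_on")] / rates[("remote", "cache_off")]
print(f"cache speedup: local {speedup_local:.2f}x, remote {speedup_remote:.2f}x")

# The slide's point: a cached remote job beats an uncached local one.
assert rates[("remote", "cache_on")] > rates[("local", "cache_off")]
```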
Outlook
• In 2015, after LHC start-up:
– Higher data production
– Flat financing not sufficient
– Computing can become an item of M&O A (Maintenance & Operations category A)
• Search for new financial resources or new unpaid capacities is necessary
– CESNET
• Free delivery of network infrastructure is crucial
• Unpaid external storage – for how long?
– IT4I, Czech supercomputing project
• Search for computing capacities (free cycles), relying on other projects to find a way to use them
• 16th International Workshop on Advanced Computing and Analysis Techniques in Physics Research (ACAT)
• http://www.particle.cz/acat2014
• Topics:
– Computing Technology for Physics Research
– Data Analysis – Algorithms and Tools
– Computations in Theoretical Physics: Techniques and Methods
Backup