Proposal: LHC_07 R&D for ATLAS Grid computing
DESCRIPTION
Proposal: LHC_07 "R&D for ATLAS Grid computing". Tetsuro Mashimo, International Center for Elementary Particle Physics (ICEPP), The University of Tokyo, on behalf of the LHC_07 project team. 2013 Joint Workshop of FKPPL and FJPPL(TYL), June 4, 2013 @ Yonsei University, Seoul.
TRANSCRIPT
Proposal: LHC_07 R&D for ATLAS Grid computing
Tetsuro Mashimo, International Center for Elementary Particle Physics (ICEPP), The University of Tokyo, on behalf of the LHC_07 project team
2013 Joint Workshop of FKPPL and FJPPL(TYL), June 4, 2013 @ Yonsei University, Seoul
LHC_07 "R&D for ATLAS Grid computing"
• Cooperation between French and Japanese teams in R&D on ATLAS distributed computing, in order to face the important challenges of the coming years (in preparation for the LHC 14 TeV runs from 2015 on)
• Important challenges of the coming years: new computing model, hardware, software, and networking issues
• Partners: the International Center for Elementary Particle Physics (ICEPP), The University of Tokyo (the WLCG Japanese Tier-2 center) and French Tier-2 centers
LHC_07: members
French group    Lab.    Japanese group    Lab.
E. Lançon*      Irfu    T. Mashimo*       ICEPP
L. Poggioli     IN2P3   I. Ueda           ICEPP
P.-E. Macchi    IN2P3   T. Nakamura       ICEPP
M. Jouvin       IN2P3   N. Matsui         ICEPP
S. Jézéquel     IN2P3   H. Sakamoto       ICEPP
E. Fede         IN2P3   T. Kawamoto       ICEPP
J.-P. Meyer     Irfu
(* leader)
LHC_07 "R&D for ATLAS Grid computing"
Successor of the project LHC_02 "ATLAS computing" (2006-2012)
• The LHC_02 project started as a collaboration between the computer center of IN2P3 in Lyon, France (a Tier-1 center) and the ICEPP Tier-2 center (associated with the Lyon Tier-1 in the ATLAS "cloud" computing model)
• LHC_02 carried out various R&D studies, in particular on how to exploit efficiently the available bandwidth of the long-distance international network connection
Network between Lyon and Tokyo
[Diagram: the Lyon-Tokyo path runs via New York over SINET, GEANT, and RENATER; other endpoints shown: BNL (USA, Long Island), TRIUMF (Canada, Vancouver), ASGC (Taiwan). Link: 10 Gb/s, RTT = 300 ms.]
Exploiting the bandwidth is not a trivial matter: packet loss at various places, directional asymmetry in transfer performance, performance changes over time, etc.
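To see why such a long fat pipe is hard to fill: with a 300 ms round-trip time, a single TCP stream needs a window of roughly the bandwidth-delay product to saturate a 10 Gb/s link. A minimal back-of-the-envelope check in Python (link figures from the diagram above; the ~4 MB default window is an illustrative assumption):

```python
# Bandwidth-delay product for the Lyon-Tokyo link (10 Gb/s, RTT = 300 ms).
# A single TCP stream must keep this many bytes in flight to fill the pipe.
bandwidth_bps = 10e9   # 10 Gb/s
rtt_s = 0.300          # 300 ms round-trip time

bdp_bytes = bandwidth_bps * rtt_s / 8
print(f"BDP = {bdp_bytes / 1e6:.0f} MB")  # -> BDP = 375 MB

# With an (assumed) default TCP window of ~4 MB, one stream reaches only:
window_bytes = 4e6
throughput_bps = window_bytes * 8 / rtt_s
print(f"single-stream ceiling = {throughput_bps / 1e6:.0f} Mb/s")  # ~107 Mb/s
```

This is why the transfers use window tuning and many parallel streams, and why even small packet-loss rates are so damaging at this RTT.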
It is difficult to track down the source of network problems:
• Sites: security policy
• Network providers: support area
• Projects, VOs: computing model
[Plots: iperf throughput (Mbps) and GridFTP transfer rates (MB/sec), LYON→TOKYO and TOKYO→LYON, measured May 28, 2011, for large, medium, and small files.]
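Measurements like these can be reproduced with standard tools. A hedged sketch of a two-direction probe in Python (the hostname is hypothetical; assumes iperf3 is installed and an iperf3 server is running on the remote host):

```python
import json
import subprocess

REMOTE = "iperf.example-lyon.fr"   # hypothetical far-end iperf3 server

def iperf3_throughput(reverse: bool) -> float:
    """Run a 30 s iperf3 test; reverse=True measures remote -> local."""
    cmd = ["iperf3", "-c", REMOTE, "-t", "30", "-J"]
    if reverse:
        cmd.append("-R")           # -R: server sends, i.e. remote -> local
    out = subprocess.run(cmd, capture_output=True, text=True, check=True).stdout
    return json.loads(out)["end"]["sum_received"]["bits_per_second"] / 1e6

out_mbps = iperf3_throughput(reverse=False)  # local -> remote
in_mbps = iperf3_throughput(reverse=True)    # remote -> local
print(f"out: {out_mbps:.0f} Mb/s, in: {in_mbps:.0f} Mb/s")
if min(out_mbps, in_mbps) < 0.5 * max(out_mbps, in_mbps):
    print("large directional asymmetry - suspect packet loss on the slow path")
```

Comparing memory-to-memory iperf results against GridFTP disk-to-disk rates, as in the plots, helps separate network problems from storage bottlenecks.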
Case with CC-IN2P3
[Plots: LYON→TOKYO and TOKYO→LYON transfer rates measured on Nov. 17, 2011 and Jan. 21, 2012; the packet loss was cleared.]
The cause was a misconfiguration related to QoS handling of LBE (less-than-best-effort) packets; the situation was improved by a LAN reconfiguration at CC-IN2P3.
LHC_07 "R&D for ATLAS Grid computing"
• The LHC_02 project was successful
• Continue the collaboration to face the important challenges of the coming years: new ATLAS computing model, hardware, software, and networking issues
ATLAS Computing Model - Tiers
Implementation of the ATLAS computing model: tiers and clouds
• Hierarchical tier organization based on the MONARC network topology
• Sites are grouped into clouds for organizational reasons
• Possible communications:
  – Optical Private Network: T0-T1, T1-T1
  – National networks: intra-cloud T1-T2
• Restricted communications (general public network):
  – Inter-cloud T1-T2
  – Inter-cloud T2-T2
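A toy illustration of these rules, as a sketch only (this is not ATLAS code; the site-to-cloud assignments below are hypothetical examples):

```python
# Toy model of the MONARC-style rules above: T0-T1 and T1-T1 over the
# optical private network, T1-T2 only within the same cloud; inter-cloud
# T1-T2 and T2-T2 are restricted.
SITES = {
    "CERN": ("T0", None),
    "CC-IN2P3": ("T1", "FR"),
    "TOKYO": ("T2", "FR"),     # ICEPP attached to the FR cloud
    "GRIF": ("T2", "FR"),
    "BNL": ("T1", "US"),
    "MWT2": ("T2", "US"),
}

def transfer_allowed(src: str, dst: str) -> bool:
    (src_tier, src_cloud), (dst_tier, dst_cloud) = SITES[src], SITES[dst]
    tiers = {src_tier, dst_tier}
    if tiers <= {"T0", "T1"}:
        return True                    # OPN: T0-T1 and T1-T1
    if "T1" in tiers and "T2" in tiers:
        return src_cloud == dst_cloud  # intra-cloud T1-T2 only
    return False                       # inter-cloud T2 traffic restricted

print(transfer_allowed("CC-IN2P3", "TOKYO"))  # True  (same cloud)
print(transfer_allowed("BNL", "TOKYO"))       # False (inter-cloud)
```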
Detector Data Distribution
– RAW and reconstructed data are generated at CERN and dispatched to T1s.
– Reconstructed data are further replicated downstream to T2s of the SAME cloud.
[Diagram: Tier-0 fans out to Tier-1s, and each Tier-1 to the Tier-2s of its cloud; O(2-4 GB) files (with exceptions).]
Data distribution after Reprocessing and Monte Carlo Reconstruction
– RAW data are re-processed at T1s to produce a new version of the derived data
– Derived data are replicated to T2s of the same cloud
– Derived data are replicated to a few other T1s (or CERN)
  • and, from there, to other T2s of the same cloud
[Diagram: Tier-0, Tier-1s and their Tier-2s, showing the replication flow; O(2-4 GB) files (with exceptions).]
Monte Carlo production
– Simulation (and some reconstruction) runs at T2s
– Input data hosted at T1s are transferred to (and cached at) T2s
– Output data are copied and stored back at T1s
– For reconstruction, derived data are
  • replicated to a few other T1s (or CERN)
  • and, from there, to other T2s of the same cloud
[Diagram: INPUT flows from Tier-1s to Tier-2s; OUTPUT flows back from Tier-2s to Tier-1s.]
Analysis
• The paradigm is "jobs go to data", i.e.
  – Jobs are brokered to sites where data have been pre-placed
  – Jobs access data only from the local storage of the site where they run
  – Jobs store their output in the storage of the site where they run
• No WAN involved.
(by Simone Campana, ATLAS TIM, Tokyo, May 2013)
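A minimal sketch of this "jobs go to data" brokering (my own illustration, not the PanDA brokering code; dataset names, sites, and queue depths are hypothetical):

```python
# "Jobs go to data": a job may only be brokered to a site that already
# hosts a replica of its input dataset; it reads and writes locally.
REPLICA_CATALOG = {
    "data12_8TeV.AOD.r1234": {"TOKYO", "CC-IN2P3"},
    "mc12_8TeV.AOD.s5678": {"GRIF"},
}

def broker(dataset: str, site_queues: dict) -> str:
    """Pick the least-loaded site that holds the input dataset."""
    candidates = REPLICA_CATALOG.get(dataset, set()) & site_queues.keys()
    if not candidates:
        raise LookupError(f"no site holds {dataset}; replicate it first")
    return min(candidates, key=site_queues.__getitem__)

queues = {"TOKYO": 120, "CC-IN2P3": 450, "GRIF": 30}
print(broker("data12_8TeV.AOD.r1234", queues))  # -> TOKYO (least loaded)
```

The LookupError branch is exactly where the two issues on the next slides arise: the data are not where you want to run.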
Issues - I
• You need data at some T2 (normally "your" T2)
• The inputs are at some other T2 in a different cloud
• Examples:
  – Outputs of analysis jobs
  – Replication of particular samples on demand
[Diagram: according to the model, the data should be routed T2 → T1 → T1 → T2, via the Tier-1s of the two clouds.]
(by Simone Campana, ATLAS TIM, Tokyo, May 2013)
Issues - II
• You need to process data available only at a given T1
• All sites of that cloud are very busy
• You assign jobs to some T2 of a different cloud
[Diagram: according to the model, the INPUT and OUTPUT between the Tier-1 and a Tier-2 of a different cloud should be routed via the Tier-1s.]
(by Simone Campana, ATLAS TIM, Tokyo, May 2013)
Evolution of the ATLAS computing model
• ATLAS decided to relax the MONARC model
  – Allow T1-T2 and T2-T2 traffic between different clouds (thanks to the growth of network bandwidth)
• Any site can exchange data with any site if the system believes it is convenient
• So far ATLAS asked (large) T2s:
  – To be well connected to their T1
  – To be well connected to the T2s of their cloud
• Now ATLAS is asking large T2s:
  – To be well connected to all T1s
  – To foresee non-negligible traffic from/to other (large) T2s
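Continuing the toy model from the earlier sketch, the relaxation amounts to whitelisting well-connected large T2s for inter-cloud traffic (again my own illustration; the whitelist is hypothetical):

```python
# Relaxed model: a "well-connected" large T2 may exchange data with any
# T1 and with other well-connected T2s, regardless of cloud.
WELL_CONNECTED_T2 = {"TOKYO", "MWT2"}   # hypothetical whitelist

def transfer_allowed_relaxed(src: str, dst: str) -> bool:
    if transfer_allowed(src, dst):       # everything MONARC allowed
        return True                      # is still allowed
    ok = {s for s, (tier, _) in SITES.items()
          if tier in ("T0", "T1") or s in WELL_CONNECTED_T2}
    return src in ok and dst in ok       # plus inter-cloud T1/large-T2 links

print(transfer_allowed_relaxed("BNL", "TOKYO"))   # True: inter-cloud T1-T2
print(transfer_allowed_relaxed("TOKYO", "MWT2"))  # True: inter-cloud T2-T2
```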
Evolution of the model
[Diagrams: multi-cloud Monte Carlo production and analysis output, with Tier-2s exchanging data directly with Tier-1s and Tier-2s of other clouds.]
(by Simone Campana, ATLAS TIM, Tokyo, May 2013)
LHC_07: R&D for ATLAS Grid computing
• Networking therefore remains a very important issue
• Other topics addressed by the collaboration:
  – Use of virtual machines for operating WLCG services
  – Improvement of the reliability of the storage middleware
  – Performance of data access from analysis jobs through various protocols
  – Investigation of federated Xrootd storage
  – Optimization and monitoring of data transfers between remote sites
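For the federated Xrootd item, the point is that a job can fall back to reading a file through a federation redirector when the local replica is missing. A hedged sketch (both hostnames and the file path are hypothetical; assumes the XRootD client tools, i.e. xrdcp, are installed):

```python
import subprocess

# Hypothetical endpoints: local storage element and federation redirector.
LOCAL = "root://se.icepp.example.jp//atlas/data12/AOD.pool.root"
FEDERATION = "root://redirector.example.org//atlas/data12/AOD.pool.root"

def fetch(dest: str = "/tmp/AOD.pool.root") -> str:
    """Try the local replica first, then fall back to the federation."""
    for url in (LOCAL, FEDERATION):
        if subprocess.run(["xrdcp", "-f", url, dest]).returncode == 0:
            return url               # report where the file came from
    raise RuntimeError("file unreachable via local SE and federation")

print("read from", fetch())
```

The same fallback makes WAN data access performance, one protocol topic above, directly relevant to analysis job efficiency.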
WAN for TOKYO
[Diagram: WAN connectivity for TOKYO. Pacific routes to LA and WIX (10 Gbps each, plus an additional new 10 Gbps line since the end of March 2013) toward BNL, TRIUMF, and ASGC; Atlantic routes to Amsterdam and Geneva (40, 40, 20, and 10 Gbps segments, including a dedicated line via OSAKA) toward NDGF, RAL, CC-IN2P3, CERN, CNAF, PIC, SARA, and NIKHEF.]
LHCONE: a new dedicated (virtual) network for Tier-2 centers, etc. The "perfSONAR" tool has been put in place for network monitoring.
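As a flavor of what perfSONAR monitoring enables, measurement archives can be queried over HTTP. This sketch assumes an esmond-style REST endpoint; the URL, query parameters, and JSON field names here are illustrative assumptions, not a documented contract:

```python
import json
import urllib.request

# Hypothetical perfSONAR measurement-archive host and query.
URL = ("http://ps-archive.example.jp/esmond/perfsonar/archive/"
       "?event-type=throughput"
       "&source=tokyo-ps.example.jp&destination=lyon-ps.example.fr")

with urllib.request.urlopen(URL, timeout=30) as resp:
    series = json.load(resp)

# Each entry describes one measurement series between a host pair.
for entry in series:
    print(entry.get("source"), "->", entry.get("destination"))
```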
Budget plan in the year 2013
French side (Euro):
Item      Unit cost  Number  Subtotal  Supported by
Travel    1,000      3       3,000     IN2P3
Per-diem  230        15      3,450     IN2P3
Travel    1,000      1       1,000     Irfu
Per-diem  230        5       1,150     Irfu
Total                        8,600

Japanese side (k Yen):
Item      Unit cost  Number  Subtotal  Supported by
Travel    160        3       480       ICEPP
Per-diem  22.7       12      272       ICEPP
Total                        752
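A quick arithmetic check of the totals (plain Python; all figures are from the table above):

```python
# Line totals are unit cost x count, as in the budget table.
french_eur = 1000 * 3 + 230 * 15 + 1000 * 1 + 230 * 5
japanese_kyen = 160 * 3 + round(22.7 * 12)

print(french_eur)     # 8600 EUR
print(japanese_kyen)  # 752 k JPY (22.7 * 12 = 272.4, listed as 272)
```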
Cost of the project
• The project uses the existing computing facilities at the Tier-1 and Tier-2 centers in France and Japan, and the existing network infrastructure provided by the NRENs, GEANT, etc. No hardware costs are therefore required for this project.
• Communication between the members relies mainly on e-mail and TV conferences, but a face-to-face meeting (a small workshop) is usually necessary once per year, hence the costs for travel and per-diem.