erlangen regional computing center 08.09.2015 1st international workshop on fault tolerant systems,...

17
ERLANGEN REGIONAL COMPUTING CENTER 08.09.2015 1st International Workshop on Fault Tolerant Systems, IEEE Cluster `15 Building a fault tolerant application using the GASPI communication layer

Upload: steven-sharp

Post on 17-Jan-2016

216 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: ERLANGEN REGIONAL COMPUTING CENTER 08.09.2015 1st International Workshop on Fault Tolerant Systems, IEEE Cluster `15 Building a fault tolerant application

ERLANGEN REGIONAL COMPUTING CENTER

08.09.20151st International Workshop on Fault Tolerant Systems,IEEE Cluster `15

Building a fault tolerant application using the GASPI communication layer

Page 2: ERLANGEN REGIONAL COMPUTING CENTER 08.09.2015 1st International Workshop on Fault Tolerant Systems, IEEE Cluster `15 Building a fault tolerant application

2

Motivation

Nowadays, the increasing computational capacity is mainly due to extreme level of hardware parallelism.

With future machines, the Mean time to failure is expected to be in minutes and hours.

Absence of fault tolerant environment will put precious data at risk.

The lack of well-defined fault tolerant environment is the first big challenge in the development of fault tolerant application.

Building a fault tolerant application using the GASPI communication layer

Page 3: ERLANGEN REGIONAL COMPUTING CENTER 08.09.2015 1st International Workshop on Fault Tolerant Systems, IEEE Cluster `15 Building a fault tolerant application

3

Automatic Fault Tolerance Application: Approaching the problem

1. Failure detection:

i. Who detects the failure?

ii. Failure information propagation

iii. Consensus about failed processes

2. Processes and communicator recovery

i. Shrink

ii. Spawn

iii. Spare

3. Lost data recovery

Building a fault tolerant application using the GASPI communication layer

Page 4: ERLANGEN REGIONAL COMPUTING CENTER 08.09.2015 1st International Workshop on Fault Tolerant Systems, IEEE Cluster `15 Building a fault tolerant application

4

Failure detection approaches:

1. Ping based all-to-all• Within each iteration, the health of all procecsses is

check.

2. Ping based neighbor-level:• After neighbor failure detection -> check all-to-all health

3. Unsuccessful communication• After failure detection -> check all-to-all health

4. Dedicated failure detection process(es)• Pings all other processes• Global view of processes healths• Propagates the failure info to remaining processes

Building a fault tolerant application using the GASPI communication layer

Page 5: ERLANGEN REGIONAL COMPUTING CENTER 08.09.2015 1st International Workshop on Fault Tolerant Systems, IEEE Cluster `15 Building a fault tolerant application

5

Automatic Fault Tolerance Application: Approaching the problem

1. Failure detection: Fault-detector process

2. Processes and communicator recovery Spare nodes

3. Lost data recovery „Neighbor“ node level Checkpoint/Restart

Building a fault tolerant application using the GASPI communication layer

4 5 6 7 8 9

Worker communicator

1 2 3Spare nodes

0 Fault -detector process

Page 6: ERLANGEN REGIONAL COMPUTING CENTER 08.09.2015 1st International Workshop on Fault Tolerant Systems, IEEE Cluster `15 Building a fault tolerant application

6

Fault tolerance in GASPI: Introduction (I)

GASPI – Developed by Fraunhofer IWTM, Kaiserslautern, Germany Based on PGAS programming model Two memory parts

• Local: only local to the GASPI process (and its threads)

• Global: Available to other processes for reading and writing.

Enables fault tolerance:

• In case of single node failure, rest of the nodes stay up and running

• Provides TIMEOUT for every communication call. Return values:

GASPI_SUCCESS, GASPI_TIMEOUT, GASPI_ERROR

Building a fault tolerant application using the GASPI communication layer

Page 7: ERLANGEN REGIONAL COMPUTING CENTER 08.09.2015 1st International Workshop on Fault Tolerant Systems, IEEE Cluster `15 Building a fault tolerant application

7

Fault tolerance in GASPI: Introduction (II)

What GASPI provides:• Gaspi_proc_ping(): A process can check the state of a process by

pinging any specific process.

• The return value of ping can either be 0 or 1 (Healthy or dead).

User side:• Deletion of old comm., creation of new comm., new communication

structure, (checkpoint/restart) -> user‘s responsibility

Building a fault tolerant application using the GASPI communication layer

Page 8: ERLANGEN REGIONAL COMPUTING CENTER 08.09.2015 1st International Workshop on Fault Tolerant Systems, IEEE Cluster `15 Building a fault tolerant application

8

Failure detector (I):

0

1 2 3

4 5 6 7 8 9

Worker communicator

Idle processes

gaspi_proc_ping()return_val = gaspi_wait()

return_val:1) GASPI_SUCCESS2) GASPI_TIMEOUT3) GASPI_ERROR

gaspi_proc_ping()return_val = gaspi_wait()

Fault detector process

Building a fault tolerant application using the GASPI communication layer

Page 9: ERLANGEN REGIONAL COMPUTING CENTER 08.09.2015 1st International Workshop on Fault Tolerant Systems, IEEE Cluster `15 Building a fault tolerant application

9

Failure detector (II):

0

3

4 5 6 7 8 9

Worker communicator

Idle processes

gaspi_proc_ping()return_val = gaspi_wait()

GASPI_ERROR Failed Proc(s) IDs

Rescue Proc(s) IDs

6, 7 1, 2

Failure detector process

Detector processes informs every process about failure details via gaspi_write().

1 2 return_val:1) GASPI_SUCCESS2) GASPI_TIMEOUT3) GASPI_ERROR

Building a fault tolerant application using the GASPI communication layer

Page 10: ERLANGEN REGIONAL COMPUTING CENTER 08.09.2015 1st International Workshop on Fault Tolerant Systems, IEEE Cluster `15 Building a fault tolerant application

10

Automatic Fault Tolerance Application

Program flow:

Building a fault tolerant application using the GASPI communication layer

Page 11: ERLANGEN REGIONAL COMPUTING CENTER 08.09.2015 1st International Workshop on Fault Tolerant Systems, IEEE Cluster `15 Building a fault tolerant application

11

Asynchronous in-memory checkpointing

Building a fault tolerant application using the GASPI communication layer

Page 12: ERLANGEN REGIONAL COMPUTING CENTER 08.09.2015 1st International Workshop on Fault Tolerant Systems, IEEE Cluster `15 Building a fault tolerant application

12

Benchmarks (I): Test bed

Lanczos algorithm:

Checkpoint data structure: After startup: Every process once stores matrix communication

data structure. Two recent Lanczos vectors are stored at each checkpoint

iteration. Recently calculated eigenvalues.

Test cluster: LiMa – RRZE, Erlangen: 500 nodes, Xeon 5650 "Westmere" chips (12 cores

+ SMT), 2.66 GHz, 24 GB RAM, QDR Infiniband

Building a fault tolerant application using the GASPI communication layer

Checkpoint data: vj, vj+1 metadata

Page 13: ERLANGEN REGIONAL COMPUTING CENTER 08.09.2015 1st International Workshop on Fault Tolerant Systems, IEEE Cluster `15 Building a fault tolerant application

13

Benchmark (II):

Average ping time per process ~ 5-6 µs

Failure-Detector Process: Weak scaling of ping scan, failure detection and ack. time.

Building a fault tolerant application using the GASPI communication layer

Page 14: ERLANGEN REGIONAL COMPUTING CENTER 08.09.2015 1st International Workshop on Fault Tolerant Systems, IEEE Cluster `15 Building a fault tolerant application

14

Benchmarks (III):

64s

Fai

lure

det

ecti

on

+

re-i

nit

+ r

edo

-wo

rk

Co

mp

uta

tio

nC

om

pu

tati

on

Num. of nodes = 256, threads-per-process = 12

Failure detection + acknowledgement

+

Re-init

= 11 sec.

Building a fault tolerant application using the GASPI communication layer

# iters. = 3500

Chpt. freq = 500

Page 15: ERLANGEN REGIONAL COMPUTING CENTER 08.09.2015 1st International Workshop on Fault Tolerant Systems, IEEE Cluster `15 Building a fault tolerant application

15

Remarks:

Worker processes remain undisturbed in failure-free application run.

Overhead only in case of worker failure(s).

Redo-Work after failure recovery Checkpoint Frequency.

Building a fault tolerant application using the GASPI communication layer

Page 16: ERLANGEN REGIONAL COMPUTING CENTER 08.09.2015 1st International Workshop on Fault Tolerant Systems, IEEE Cluster `15 Building a fault tolerant application

16 Building a fault tolerant application using the GASPI communication layer

Outlook:

Related work: FT communication: › MPICH-V

› User-level Failure Mitigation - MPI (ULFM)

› Fault tolerance Messaging Interface FMI

Node-level checkpoint/restart: › Fault Tolerance Interface (FTI)

› Scalable Checkpoint/Restart (SCR)

Future work: Having multiple failure detector processes. Adding Redundancy for failure detector processes Compartive study: ULFM, SCR

Page 17: ERLANGEN REGIONAL COMPUTING CENTER 08.09.2015 1st International Workshop on Fault Tolerant Systems, IEEE Cluster `15 Building a fault tolerant application

17

Thank you! Questions?

Building a fault tolerant application using the GASPI communication layer

http://blogs.fau.de/essex/

https://bitbucket.org/essex/ghost