
Garching 2007-07-18 ScicomP 13

High Performance Global File Systems Easy Data Management in Supercomputer Grids

Andreas Schott


Overview

Motivation / Choices

GPFS / MC-GPFS

DEISA’s Implementation and Status


Motivation for Global File Systems

Advantages

• Simple access

• Standard commands

• No special data preparation

• No re-writing of jobs and binaries

• Everything everywhere at any time

Issues

• Network stability

• Latency

• Performance

• Availability


Available Choices

• (Open)AFS

• GFS

• PVFS

• OCFS

• NFS

• NFS4

• Lustre

• MC-GPFS


General Concepts of MC-GPFS

MC-GPFS = Multiple Cluster General Parallel File System

available for all HPC architectures in DEISA

servers available for AIX and Linux

Principal Structure

distributed – shared – striped

kernel add-on for file system

block oriented data transfer

Features achieved

shared and high performance access

safe and secure data

high administrative flexibility
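The "distributed – shared – striped", block-oriented principle can be illustrated with a small model. The sketch below maps byte offsets of a file onto fixed-size blocks placed round-robin over the disks of one file system. The block size, the disk count and strict round-robin placement are all simplifying assumptions. GPFS's real allocation policy is more elaborate, so read this as an illustration of the idea, not of GPFS internals.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BlockLocation:
    block: int            # index of the block within the file
    disk: int             # disk holding that block
    offset_in_block: int  # byte position inside the block


def locate(offset: int, block_size: int, n_disks: int) -> BlockLocation:
    """Map a byte offset to its block and disk under plain round-robin striping (assumed model)."""
    if offset < 0 or block_size <= 0 or n_disks <= 0:
        raise ValueError("offset must be >= 0; block_size and n_disks must be > 0")
    block = offset // block_size
    return BlockLocation(block, block % n_disks, offset % block_size)


def disks_touched(offset: int, length: int, block_size: int, n_disks: int) -> set[int]:
    """Disks that serve a request of `length` bytes starting at `offset`."""
    if length <= 0:
        return set()
    first = offset // block_size
    last = (offset + length - 1) // block_size
    return {b % n_disks for b in range(first, last + 1)}


MiB = 1024 * 1024
print(locate(5 * MiB + 10, block_size=MiB, n_disks=4))  # BlockLocation(block=5, disk=1, offset_in_block=10)
print(sorted(disks_touched(0, 8 * MiB, block_size=MiB, n_disks=4)))  # [0, 1, 2, 3]
```

A large sequential request touches every disk. That is where the "inherent parallelism" claimed for striped file systems comes from.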


General Concepts of MC-GPFS

Technical Aspects

each site can run its own servers

local disk space locally administered

scalability and high performance access by inherent parallelism

easily extensible

file consistency by sophisticated token management

high recoverability and increased data availability

simplified storage management

storage pools, file sets

simplified administration

globally acting commands
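"File consistency by sophisticated token management" can be pictured with a toy model. Here a node must hold a byte-range token before it accesses a range, and a conflicting request forces the manager to revoke tokens held by other nodes. The class and its rules are hypothetical and heavily simplified: no lease handling, no token splitting, no distribution across managers. This is not GPFS's actual protocol.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Token:
    node: str
    start: int   # first byte covered (inclusive)
    end: int     # end of range (exclusive)
    write: bool


def conflicts(a: Token, b: Token) -> bool:
    """Tokens conflict if different nodes hold overlapping ranges and at least one is a write token."""
    overlap = a.start < b.end and b.start < a.end
    return a.node != b.node and overlap and (a.write or b.write)


class ToyTokenManager:
    """Hypothetical central byte-range token manager, for illustration only."""

    def __init__(self) -> None:
        self.granted: list[Token] = []

    def acquire(self, req: Token) -> list[Token]:
        """Grant `req`, first revoking every conflicting token; return the revoked tokens."""
        revoked = [t for t in self.granted if conflicts(t, req)]
        self.granted = [t for t in self.granted if t not in revoked]
        self.granted.append(req)
        return revoked


tm = ToyTokenManager()
tm.acquire(Token("nodeA", 0, 100, write=False))
tm.acquire(Token("nodeB", 50, 150, write=False))   # shared reads: nothing revoked
revoked = tm.acquire(Token("nodeC", 120, 200, write=True))
print(revoked)  # [Token(node='nodeB', start=50, end=150, write=False)]
```

Only nodeB's token overlaps the written range, so nodeA keeps reading bytes 0-99 undisturbed. Because tokens are fine-grained, nodes that do not actually conflict can work in parallel.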


General Concepts of MC-GPFS

Security Aspects

separate network for administrative communication possible

remote security

authenticated remote access for servers

mount and/or data traffic secured with SSL keys

easy root-mapping

easy no-suid functionality

userid mapping for remote access via interfaces
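The root-mapping, no-suid and userid-mapping features boil down to simple translation rules applied to requests from a remote cluster. The sketch models those rules in a generic way. The cluster name, the uid values, the mapping table and the choice of 65534 as the "nobody" uid are hypothetical. The sketch does not reproduce GPFS's configuration syntax or commands.

```python
import stat

NOBODY = 65534  # conventional unprivileged uid; an assumption, sites may differ


def map_uid(remote_cluster: str, remote_uid: int,
            uid_map: dict[tuple[str, int], int],
            trusted_root_clusters: frozenset[str] = frozenset()) -> int:
    """Translate a (remote cluster, remote uid) pair into a local uid.

    Remote root is squashed to NOBODY unless its cluster is explicitly trusted;
    any other uid must appear in uid_map, otherwise it is also mapped to NOBODY.
    """
    if remote_uid == 0:
        return 0 if remote_cluster in trusted_root_clusters else NOBODY
    return uid_map.get((remote_cluster, remote_uid), NOBODY)


def effective_mode(mode: int, nosuid: bool) -> int:
    """With no-suid in effect, setuid/setgid bits are ignored on remotely mounted files."""
    return mode & ~(stat.S_ISUID | stat.S_ISGID) if nosuid else mode


# Hypothetical mapping: uid 5001 at cluster "cineca" is known locally as uid 41001
UID_MAP = {("cineca", 5001): 41001}
print(map_uid("cineca", 5001, UID_MAP))          # 41001
print(map_uid("cineca", 0, UID_MAP))             # 65534 (root squashed)
print(oct(effective_mode(0o4755, nosuid=True)))  # 0o755
```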


General Concepts of MC-GPFS

Access and Availability

transparent access

no special data transfer commands required

global visibility inside DEISA

extended access rights

no single point of failure in communication

delegated locking and other communication


Summary of MC-GPFS

Local and Remote High Performance Access

high parallelism in data and file access

very large file and file system support

High Availability

each site with its own servers

redundant access path

simply extensible and scalable

striped data

parallel access path
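A "redundant access path" means a client can still reach its data when one file server fails, because other servers can also reach the same disks. A minimal sketch of client-side failover follows. The server names, the `fetch` callback and the simulated outage are hypothetical stand-ins for real network I/O.

```python
from typing import Callable, Sequence


def read_block(block: int, servers: Sequence[str],
               fetch: Callable[[str, int], bytes]) -> tuple[str, bytes]:
    """Try each server on the access path in order and return the first successful read."""
    errors: list[str] = []
    for server in servers:
        try:
            return server, fetch(server, block)
        except ConnectionError as exc:
            errors.append(f"{server}: {exc}")
    raise ConnectionError("no access path available (" + "; ".join(errors) + ")")


def fake_fetch(server: str, block: int) -> bytes:
    """Stand-in for a network read; file server 'fs1' is simulated as down."""
    if server == "fs1":
        raise ConnectionError("server down")
    return f"{server}:{block}".encode()


print(read_block(7, ["fs1", "fs2"], fake_fetch))  # ('fs2', b'fs2:7')
```

The read succeeds through the second server, so the application never sees the failure of the first.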


Advantages of GPFS (admin)

• Easy Management

• Easy Extensibility

• High Performance

• Security Features

• Add-On Features like HSM Functionality


Advantages of GPFS (user)

• Standard Access Methods

Transparent Access

• Data globally visible

No special actions for data transfer required

• Simplicity

• Extended Access Right Features

• Add-On Features like HSM Functionality


Local GPFS File Servers

[Figure: file servers 1, 2, ..., N attached to the network and, through an FC switch, to disk systems 1, 2, ..., M]


Local GPFS Access

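[Figure: compute servers 1 ... N and file servers 1 ... N connected over the network; the file servers reach disk systems 1 ... M through an FC switch. Compute servers and file servers may form separate clusters or one cluster.]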


Remote GPFS Access

[Figure: two sites, A and B, each with its own network, file servers 1 ... N, FC switch, disk systems 1 ... M and compute servers 1 ... N; the two sites are linked over a WAN]


DEISA Partners


Aims of DEISA

Providing HPC resources to the scientific community

Offering added value to local facilities

optimal hardware selection

easy usability

transparent data access

Achievement of these aims

common network structure

using internal features of job schedulers

additional middleware for easy access (e.g. UNICORE)

global file system in a network of trust


MC-LoadLeveler in DEISA

Implementation

• Environment Variables for DATA

• Modules

• Local Home Directories

• Job Movement (Filters)

Caveats

• Path Unification

• Treatment of HSM

• Data Availability

Pre- and Post-processing


[Figure: UNICORE job submission in DEISA: a CINECA user's job enters through the CINECA gateway and the CINECA NJS (IBM P5, with IDB and UUDB); the batch systems shown are AIX LL-MC, AIX LL, Super-UX NQS II, Linux LSF, Linux PBS Pro and Linux LL]


[Figure: the complete DEISA UNICORE setup: one gateway per site (BSC, CINECA, CSC, ECMWF, FZJ, HLRS, HPCX, IDRIS, LRZ, RZG, SARA), each followed by an NJS with IDB and UUDB in front of the local system (BSC: IBM PPC; CINECA, ECMWF, HPCX: IBM P5; CSC, FZJ, IDRIS, RZG: IBM P4; LRZ, SARA: SGI Altix; HLRS: NEC SX8) and its batch system (AIX LL-MC, AIX LL, Super-UX NQS II, Linux LSF, Linux PBS Pro, Linux LL); a CINECA user's job is shown entering through the CINECA gateway]


Globus Installation at RZG

[Figure: Globus installation at RZG spanning internet, DMZ and intranet, with Linux and AIX nodes connected by a high-performance switch]

grid gateway gg.rzg.mpg.de (job submission host)
globus container on port 8443; DMZ firewall inbound ports 8443 and 20000-25000
gridftp frontend on port 2811 (user mode) and gridftp backend (root)
fork and LRMS client
GPFS available
grid-mapfile: DN → D-GRID username

head node (e.g., for code development and testing)
gsisshd on port 2222
LRMS client
full DEISA CPE available

LRMS (master): node hosting the LoadLeveler master of the Local Resource Management System (IBM LoadLeveler)

cluster compute nodes (IBM P5)

further components: postgres DB, RFT, IO node, p5io3.rzg.mpg.de, GPFS, disk system

D-GRID user: GLOBUS client tools grid-proxy-init, globusrun-ws, globus-url-copy and gsissh, used over the internet
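The grid-mapfile entry "DN → D-GRID username" is what turns a certificate identity into a local account on the gateway. The file lists one quoted certificate DN per line, followed by one or more comma-separated local user names. A minimal parser sketch follows. The DNs and user names in the sample are hypothetical.

```python
import shlex


def parse_grid_mapfile(text: str) -> dict[str, list[str]]:
    """Parse lines of the form  "<certificate DN>" user[,user...]  into DN -> local user names."""
    mapping: dict[str, list[str]] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = shlex.split(line)  # honours the double quotes around the DN
        if len(fields) != 2:
            raise ValueError(f"malformed grid-mapfile line: {line!r}")
        dn, users = fields
        mapping[dn] = users.split(",")
    return mapping


# Hypothetical entries; real DNs and user names differ
SAMPLE = '''
# DN -> D-GRID username
"/C=DE/O=GermanGrid/OU=RZG/CN=Jane Doe" jdoe
"/C=DE/O=GermanGrid/OU=FZJ/CN=John Roe" jroe,dgrid01
'''

accounts = parse_grid_mapfile(SAMPLE)
print(accounts["/C=DE/O=GermanGrid/OU=RZG/CN=Jane Doe"])  # ['jdoe']
```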


GPFS Configuration in DEISA

Each AIX site provides its own servers

Some non-AIX sites will provide Linux-based servers

RZG hosts disk space for non-AIX sites without their own servers

RZG provides HSM functionality on GPFS

locally: GPFS disk space performs like local disk space

total of more than 30 TB

wide area network connections at 10 Gbit/s (mostly)

remotely: access no longer limited by the network
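Some back-of-the-envelope arithmetic shows why the WAN upgrade matters. The sketch assumes decimal units, an ideal link with no protocol overhead, and an illustrative round-trip time of 20 ms between sites. It computes how long moving the full 30 TB would take at 10 Gbit/s versus 1 Gbit/s. It also computes the bandwidth-delay product, which is the amount of data that must be in flight to keep a long-distance link busy and is the reason latency appears as an issue on the motivation slide.

```python
def transfer_hours(terabytes: float, gbit_per_s: float, efficiency: float = 1.0) -> float:
    """Time to move `terabytes` (decimal TB) over a link of `gbit_per_s` at the given efficiency."""
    bits = terabytes * 1e12 * 8
    return bits / (gbit_per_s * 1e9 * efficiency) / 3600


def bdp_megabytes(gbit_per_s: float, rtt_ms: float) -> float:
    """Bandwidth-delay product: megabytes in flight needed to keep the link full."""
    return gbit_per_s * 1e9 * (rtt_ms / 1e3) / 8 / 1e6


print(f"{transfer_hours(30, 10):.1f} h at 10 Gbit/s")            # 6.7 h at 10 Gbit/s
print(f"{transfer_hours(30, 1):.1f} h at 1 Gbit/s")              # 66.7 h at 1 Gbit/s
print(f"{bdp_megabytes(10, 20):.0f} MB in flight at 20 ms RTT")  # 25 MB in flight at 20 ms RTT
```

Real throughput is lower than these ideal figures. The tenfold ratio between the two links still holds.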


DEISA "proof of concept" phase

[Figure: the national research networks DFN, RENATER and GARR interconnected via GÉANT at 1 Gb/s, using Premium IP, IP priority and LSPs]


Evolution of GPFS in DEISA (October 2004)

RZG (DE): Power4, AIX
FZJ (DE): Power4, AIX
IDRIS (FR): Power4, AIX
CINECA (IT): Power5, AIX


DEISA – TeraGrid Connection (Super Computing 2005)

[Figure: network map linking the DEISA sites FZJ (Jülich), RZG (Munich), CINECA (Bologna) and IDRIS (Orsay) through their national research networks DFN (Germany), GARR (Italy) and RENATER (France) and GÉANT to TeraGrid and SDSC via Amsterdam, New York, Chicago and Internet2/Abilene; nodes also shown at Frankfurt, Milano and Paris; link types: 1 Gb/s Premium IP, 1 Gb/s LSP, 10 Gb/s and 30-40 Gb/s]


DEISA 1 Gb/s network infrastructure

[Figure: 1 Gb/s LSPs over GÉANT connecting the national research networks RENATER, FUNET, SURFnet, DFN, GARR, UKERNA and RedIris]


Evolution of GPFS in DEISA (July 2006)

RZG (DE): Power4, AIX
FZJ (DE): Power4, AIX
IDRIS (FR): Power4, AIX
CINECA (IT): Power5, AIX
BSC (ES): PowerPC, Linux
CSC (FI): Power4, AIX
SARA (NL): SGI-Altix, Linux


Upgrade of Multiple Cluster GPFS

Problems with GPFS 2.3

Multi-cluster functionality initially not inherently integrated

Each-to-any communication required

Limited number of participating nodes

Advantages of GPFS 3.1

Better Multi-Cluster Support

Better Encapsulation by possible use of private addresses

Higher Independence between sites

Higher Stability

Better Performance


Evolution of GPFS in DEISA (February 2007)

RZG (DE): Power4, AIX
FZJ (DE): Power4, AIX
IDRIS (FR): Power4, AIX
CINECA (IT): Power5, AIX
LRZ (DE): SGI-Altix, Linux
CSC (FI): Power4, AIX
ECMWF (GB): Power5+, AIX


Status of Multiple Cluster GPFS

Site    File servers  Storage  Compute CPUs              TFlops  Memory
CINECA  2             2 TB     480 Power5 (1.9 GHz)      2.6     1152 GB
FZJ     2             4 TB     1288 Power4 (1.7 GHz)     8.9     5152 GB
IDRIS   2             2 TB     1024 Power4 (1.3 GHz)     6.7     3136 GB
RZG     2             10 TB    928 Power4 (1.3 GHz)      4.6     2368 GB
CSC     2             2 TB     512 Power4 (1.1 GHz)      2.2     672 GB
LRZ     (RZG)         0 TB     9728 Montecito (1.6 GHz)  62.3    39064 GB
ECMWF   2             1 TB     2640 Power5+ (1.9 GHz)    20.1    2250 GB

(RZG) = LRZ has no file servers of its own and is served by RZG.
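The TFlops column can be sanity-checked with the usual theoretical-peak formula: CPUs × clock rate × floating-point operations per cycle. The sketch assumes 4 operations per cycle (two fused multiply-add units), which holds for Power4/Power5 and Itanium 2 (Montecito) cores. This estimate reproduces the LRZ and ECMWF entries. The slide does not say how the other TFlops values were derived, and not all of them follow from this simple formula.

```python
def peak_tflops(cpus: int, ghz: float, flops_per_cycle: int = 4) -> float:
    """Theoretical peak in TFlops: CPUs x clock (GHz) x floating-point operations per cycle."""
    return cpus * ghz * flops_per_cycle / 1000


print(round(peak_tflops(9728, 1.6), 1))  # LRZ: 62.3
print(round(peak_tflops(2640, 1.9), 1))  # ECMWF: 20.1
```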


DEISA – Network (estimated Q3 / 2007)

[Figure: the national research networks SURFnet, UKERNA, FUNET, RedIris, GARR, RENATER and DFN connected through GÉANT2 by dedicated 10 Gb/s wavelengths (one shown as potential) and by 1 Gb/s LSPs over GÉANT; DFN/GÉANT node in Frankfurt]


Evolution of GPFS in DEISA

RZG (DE): Power4, AIX
FZJ (DE): Power4, AIX
IDRIS (FR): Power4, AIX
CINECA (IT): Power5, AIX
LRZ (DE): SGI-Altix, Linux
BSC (ES): PowerPC, Linux
HLRS (DE): NEC-SX8, Super-UX
CSC (FI): Power4, AIX
SARA (NL): SGI-Altix, Linux
EPCC (GB): Power4, AIX
ECMWF (GB): Power5+, AIX
CSC (FI): Cray XT4, Linux
SARA (NL): Power5, Linux

/deisa/<site>/home/<group>/<user>

/deisa/<site>/data/<group>/<user>
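Under this naming scheme, a job can compute its home or data directory from site, group and user alone. The site component also tells you which site's servers hold the data. The helper below is a hypothetical sketch of that convention. The lowercase site name and the group and user values in the example are assumptions, not actual DEISA identifiers.

```python
from pathlib import PurePosixPath

DEISA_ROOT = PurePosixPath("/deisa")
AREAS = {"home", "data"}


def deisa_path(site: str, area: str, group: str, user: str) -> PurePosixPath:
    """Build /deisa/<site>/<area>/<group>/<user> following the naming scheme on the slide."""
    if area not in AREAS:
        raise ValueError(f"area must be one of {sorted(AREAS)}")
    for part in (site, group, user):
        if not part or "/" in part:
            raise ValueError(f"invalid path component: {part!r}")
    return DEISA_ROOT / site / area / group / user


def owning_site(path: str) -> str:
    """Return the <site> component, i.e. the site whose servers hold the data."""
    parts = PurePosixPath(path).parts  # ('/', 'deisa', site, ...)
    if len(parts) < 3 or parts[:2] != ("/", "deisa"):
        raise ValueError(f"not a DEISA path: {path}")
    return parts[2]


p = deisa_path("rzg", "data", "plasma", "jdoe")  # hypothetical site, group and user
print(p)               # /deisa/rzg/data/plasma/jdoe
print(owning_site(str(p)))  # rzg
```

Because every site mounts the same tree, the same path string works unchanged in a job script wherever the job runs.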


Discussion

Questions?