
A Comparative Experimental Study of Parallel File Systems for Large-Scale Data Processing

Z. Sebepou, K. Magoutis, M. Marazakis, A. Bilas

Institute of Computer Science (ICS)

Foundation for Research and Technology Hellas (FORTH)

Heraklion, Crete, Greece

First USENIX LASCO Workshop, 2008

Evolution

• Distributed file systems

– NFS versions 2, 3, 4; AFS; etc.

• Shared-disk parallel file systems

– Frangipani/Petal; GPFS; GFS; etc.

• Separating data/metadata paths to object storage

– NASD; pNFS; Panasas; Lustre; PVFS; etc.


NASD-style Parallel File Systems

[Diagram: clients send Open, Lookup, and other metadata operations to the metadata server; Read and Write requests go directly to the object storage servers]
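To make the split concrete, here is a minimal C sketch of a NASD-style read path (this is not code from Lustre or PVFS; the layout struct and the mds_lookup()/ost_read() helpers are stubs invented for illustration): the client resolves the file layout once through the metadata server, then computes which object server holds each round-robin stripe and reads from it directly, so bulk data never passes through the metadata server.

    #include <stdint.h>
    #include <stdio.h>
    #include <string.h>

    struct layout { int num_objects; uint64_t stripe_size; };

    /* Stub "RPCs" so the sketch is self-contained; a real client would issue
     * network requests to the metadata server and object servers here. */
    static struct layout mds_lookup(const char *path) {
        (void)path;
        struct layout l = { 4, 64 * 1024 };   /* 4 objects, 64KB stripes (made up) */
        return l;
    }
    static size_t ost_read(int obj, uint64_t obj_off, void *buf, size_t len) {
        memset(buf, 0, len);                  /* pretend the object server returned data */
        printf("read %zu bytes from object %d at object offset %llu\n",
               len, obj, (unsigned long long)obj_off);
        return len;
    }

    /* Read 'len' bytes starting at 'offset'; assumes the request does not
     * cross a stripe boundary, to keep the example short. */
    static size_t nasd_read(const char *path, uint64_t offset, void *buf, size_t len) {
        struct layout l = mds_lookup(path);                 /* metadata path: MDS */
        uint64_t stripe = offset / l.stripe_size;
        int obj = (int)(stripe % (uint64_t)l.num_objects);  /* round-robin placement */
        uint64_t obj_off = (stripe / l.num_objects) * l.stripe_size
                         + offset % l.stripe_size;
        return ost_read(obj, obj_off, buf, len);            /* data path: object server */
    }

    int main(void) {
        char buf[4096];
        nasd_read("/pfs/somefile", 3 * 64 * 1024 + 100, buf, sizeof buf);
        return 0;
    }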


Lustre, PVFS

• Open-source systems, following the NASD paradigm

– Lustre: Cluster File Systems, Inc.; acquired (2007) by Sun

– PVFS: Clemson University, ANL, OSC

• Targeted for large-scale data processing

– LLNL, ORNL, ANL, CERN

• Representative of different approaches to filesystem design

– Client caching

– Statelessness

– Consistency and file access semantics

– Portability


Lustre Architecture

• Backend store: modified Linux ext3 on the metadata and object servers

• Intent-based locking; POSIX semantics

• Pre-allocation of metadata objects

• Metadata server: active/backup configuration

• Data striping across object servers

• Client-side caching of data and metadata

• Network transports: TCP/IP, RDMA, etc.

• Kernel-based client structure


PVFS2 Architecture

• Backend store: DBPF (“database plus files”) over any Linux file system

• No client data caching

• No file locking; “non-conflicting writes” semantics (illustrated in the sketch below)

• Multiple active metadata servers

• Data striping across object servers

• Network transports: TCP/IP, RDMA, etc.

• User-level client structure
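A small sketch of what “non-conflicting writes” means in practice (illustrative POSIX code, not PVFS2 internals; the file name, region size, and launch convention are made up): clients that write disjoint byte ranges of a shared file need no locks for the result to be well defined, whereas overlapping writes may be resolved in either order.

    #include <fcntl.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <unistd.h>

    #define REGION (1 << 20)   /* 1MB region per client, illustrative */

    int main(int argc, char **argv)
    {
        /* Run one copy per client with a distinct rank: "./writer 0", "./writer 1", ... */
        int rank = (argc > 1) ? atoi(argv[1]) : 0;
        char buf[4096];
        memset(buf, 'A' + rank, sizeof buf);

        int fd = open("shared.dat", O_WRONLY | O_CREAT, 0644);
        if (fd < 0) { perror("open"); return 1; }

        /* Non-conflicting: each client writes only within its own disjoint
         * byte range, so no locking is needed for the outcome to be well defined. */
        off_t base = (off_t)rank * REGION;
        for (off_t off = 0; off < REGION; off += (off_t)sizeof buf)
            pwrite(fd, buf, sizeof buf, base + off);

        close(fd);
        return 0;
    }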


Benchmarks

• Streaming: one client, many servers

– IOZone

• Streaming: many clients, many servers

– Parallel I/O (MPI)

• Metadata-intensive

– Cluster PostMark

• Near-random I/O, optionally with data overlap

– Tile I/O (MPI)

• User-perceived response time

– ls -lR on a Linux kernel tree


Experimental Testbed

• PVFS 2.6.3, Lustre 1.6.0.1, MPICH2

• One metadata server and multiple object servers (12 servers in the experiments shown)

• Backend storage: Linux ext3 (PVFS2) or modified ext3 (Lustre)

• Lustre stripe sizes: 64KB, 256KB, 1MB (default)

• PVFS2 stripe size: 64KB (default)


Streaming: One Client, Many Servers

(IOZone Benchmark)

• Testbed as on the previous slide (PVFS 2.6.3, Lustre 1.6.0.1)

• 1 and 4 client threads

• Block sizes: 1KB, 64KB, 1MB, 4MB


Streaming: One Client, Many Servers

(IOZone Benchmark) – Lustre

[Charts: Lustre streaming bandwidth with 1 and 4 threads; 1 client, 12 servers; client CPU utilization annotations of 70% and 95%]


Streaming: One Client, Many Servers

(IOZone Benchmark) – PVFS

[Charts: PVFS streaming bandwidth with 1 and 4 threads; 1 client, 12 servers; client CPU utilization annotations of 20% and 50%]


Streaming: Many Clients, Many Servers

(Parallel I/O Benchmark)

• Testbed as on the previous slides (PVFS 2.6.3, Lustre 1.6.0.1, MPICH2)

• Block sizes: 64KB to 16MB

• Three access patterns (the first two are sketched in MPI-IO below):

1. Writes to separate files

2. Reads from a single shared file

3. Read-modify-write to a single shared file
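A minimal MPI-IO sketch of the first two access patterns (illustrative only, not the benchmark's source; the file names, block size, and block count are assumptions, and shared.dat is presumed to exist before the read phase): each process streams to its own file, then each reads from its own disjoint region of one shared file.

    #include <mpi.h>
    #include <stdio.h>
    #include <string.h>

    #define BLOCK   (1 << 20)          /* 1MB per I/O call (illustrative) */
    #define NBLOCKS 64

    int main(int argc, char **argv)
    {
        int rank, i;
        static char buf[BLOCK];
        MPI_File fh;
        char name[64];

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        memset(buf, 'x', BLOCK);

        /* Pattern 1: each client streams to its own separate file. */
        snprintf(name, sizeof name, "out.%d", rank);
        MPI_File_open(MPI_COMM_SELF, name,
                      MPI_MODE_CREATE | MPI_MODE_WRONLY, MPI_INFO_NULL, &fh);
        for (i = 0; i < NBLOCKS; i++)
            MPI_File_write(fh, buf, BLOCK, MPI_BYTE, MPI_STATUS_IGNORE);
        MPI_File_close(&fh);

        /* Pattern 2: each client reads from its own disjoint region of one
         * shared file (assumes shared.dat was written in an earlier phase). */
        MPI_File_open(MPI_COMM_WORLD, "shared.dat",
                      MPI_MODE_RDONLY, MPI_INFO_NULL, &fh);
        MPI_File_read_at(fh, (MPI_Offset)rank * NBLOCKS * BLOCK,
                         buf, BLOCK, MPI_BYTE, MPI_STATUS_IGNORE);
        MPI_File_close(&fh);

        MPI_Finalize();
        return 0;
    }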


Streaming: Many Clients, Many Servers

(Parallel I/O Benchmark)

[Charts: Lustre and PVFS2 bandwidth for writes to separate files; 12 clients, 12 servers]


Streaming: Many Clients, Many Servers

(Parallel I/O Benchmark)

[Charts: Lustre and PVFS2 bandwidth for reads from a single shared file; 12 clients, 12 servers]


Cluster PostMark

• Three configurations:

(a) Few (100) large files (10-100MB)

(b) A moderate number (800) of medium-size files (1-10MB)

(c) Many (8000) small files (4-128KB)
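To make “metadata-intensive” concrete, here is a hedged single-client sketch of a PostMark-style transaction loop (not the Cluster PostMark source; the file count, sizes, and transaction mix are illustrative, and the cluster version presumably runs one such instance per client in its own directory): each transaction creates, deletes, reads, or appends to a small file, so the servers see many metadata operations per byte transferred.

    #include <fcntl.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>

    #define NFILES 8000          /* "many small files" configuration */
    #define NTRANS 20000
    #define BLOCK  4096

    int main(void)
    {
        static char buf[BLOCK];
        char name[64];
        int i;

        srand(42);
        /* Create the initial file set. */
        for (i = 0; i < NFILES; i++) {
            snprintf(name, sizeof name, "pm.%d", i);
            int fd = open(name, O_WRONLY | O_CREAT, 0644);
            if (fd >= 0) { write(fd, buf, BLOCK); close(fd); }
        }
        /* Transaction phase: small creates, deletes, reads, and appends on
         * random files, so the load is dominated by metadata operations. */
        for (i = 0; i < NTRANS; i++) {
            snprintf(name, sizeof name, "pm.%d", rand() % NFILES);
            switch (rand() % 4) {
            case 0: {                              /* create (or truncate) */
                int fd = open(name, O_WRONLY | O_CREAT | O_TRUNC, 0644);
                if (fd >= 0) { write(fd, buf, BLOCK); close(fd); }
                break;
            }
            case 1: unlink(name); break;           /* delete */
            case 2: {                              /* read */
                int fd = open(name, O_RDONLY);
                if (fd >= 0) { read(fd, buf, BLOCK); close(fd); }
                break;
            }
            default: {                             /* append */
                int fd = open(name, O_WRONLY | O_APPEND);
                if (fd >= 0) { write(fd, buf, BLOCK); close(fd); }
                break;
            }
            }
        }
        return 0;
    }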


Cluster PostMark – Bandwidth

[Chart: Cluster PostMark bandwidth vs. block size; 12 clients, 12 servers]


Cluster PostMark – Transactions

[Chart: Cluster PostMark transaction rate vs. block size; 12 clients, 12 servers]


Near-random I/O, optionally with data overlap

(Tile I/O)

[Diagram: blocks, elements, and tiles within the two-dimensional dataset]

• two-dimensional logical structure overlaid on a single file

• each tile assigned to a separate client process (see the MPI subarray sketch below)
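A hedged MPI-IO sketch of the tile decomposition (not the mpi-tile-io source; the grid dimensions, tile size, element type, and file name are assumptions, and it presumes the job runs with a multiple of four processes): each process builds a subarray type for its own tile and sets a file view so its accesses land only in that tile's near-random, non-overlapping offsets within the shared file.

    #include <mpi.h>
    #include <stdlib.h>

    int main(int argc, char **argv)
    {
        int rank, nprocs;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

        /* Illustrative decomposition: a grid of 4 x (nprocs/4) tiles,
         * assuming the job was launched with a multiple of 4 processes. */
        int tiles_x = 4, tiles_y = nprocs / 4;
        int tile_w = 256, tile_h = 256;                            /* elements per tile */
        int gsizes[2] = { tiles_y * tile_h, tiles_x * tile_w };    /* full dataset */
        int lsizes[2] = { tile_h, tile_w };                        /* this tile */
        int starts[2] = { (rank / tiles_x) * tile_h,               /* tile origin */
                          (rank % tiles_x) * tile_w };

        MPI_Datatype tile;
        MPI_Type_create_subarray(2, gsizes, lsizes, starts,
                                 MPI_ORDER_C, MPI_DOUBLE, &tile);
        MPI_Type_commit(&tile);

        double *buf = calloc((size_t)tile_w * tile_h, sizeof(double));

        MPI_File fh;
        MPI_File_open(MPI_COMM_WORLD, "tile.dat",
                      MPI_MODE_CREATE | MPI_MODE_RDWR, MPI_INFO_NULL, &fh);
        /* The file view restricts each process to its own tile, so writes from
         * different processes hit scattered but non-overlapping file regions. */
        MPI_File_set_view(fh, 0, MPI_DOUBLE, tile, "native", MPI_INFO_NULL);
        MPI_File_write_all(fh, buf, tile_w * tile_h, MPI_DOUBLE, MPI_STATUS_IGNORE);
        MPI_File_close(&fh);

        MPI_Type_free(&tile);
        free(buf);
        MPI_Finalize();
        return 0;
    }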


Tile I/O - Reads


Tile I/O - Writes


User-Perceived Response Time

File System           Response Time (sec)
PVFS2                 80
Lustre                58
Linux ext3 (local)    5.5

ls -lR on a Linux 2.6.12 kernel tree (~25,000 files)


Conclusions

• Scalable I/O bandwidth is achievable through parallel I/O paths to the file servers

• Lustre’s efficient metadata management is critical for metadata-intensive applications

• Lustre’s consistency semantics are useful to some applications, but cause unnecessary overhead for others that do not require them


Thank You!
