Distributed File Systems for Exascale Computing




FusionFS Overview

Goal
Develop both theoretical and practical aspects of building distributed file systems that scale to exascale, supporting millions of nodes and billions of concurrent I/O requests.

Acknowledgement
This work is supported by NSF grant OCI-1054974.

Building Blocks
•  ZHT: distributed metadata management system (see the sketch after this list)
•  HyCache: SSD/HDD caching
•  IDAStore: GPU-based coding for data redundancy
•  FFSNET: lightweight data transfer protocol
•  PAFS: provenance-aware distributed file system
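As a concrete illustration of the ZHT building block, here is a minimal C sketch, assuming a ZHT-style design in which file metadata is a value keyed by the file path and a hash of the path selects the owning server. meta_entry, path_to_server, kv_put, and kv_get are hypothetical names with local stubs, not ZHT's actual API.

    #include <stdint.h>
    #include <stdio.h>
    #include <string.h>

    #define NUM_SERVERS 512

    typedef struct {
        uint64_t size;      /* file size in bytes */
        uint32_t mode;      /* permission bits */
        uint32_t home_node; /* node that holds the file data */
    } meta_entry;

    /* FNV-1a hash of the path picks the metadata server that owns the entry. */
    static uint32_t path_to_server(const char *path)
    {
        uint32_t h = 2166136261u;
        for (; *path; path++)
            h = (h ^ (uint8_t)*path) * 16777619u;
        return h % NUM_SERVERS;
    }

    /* Local stubs standing in for the network put/get against that server. */
    static int kv_put(uint32_t srv, const char *key, const meta_entry *v)
    {
        printf("put %s on server %u (size=%llu)\n",
               key, srv, (unsigned long long)v->size);
        return 0;
    }

    static int kv_get(uint32_t srv, const char *key, meta_entry *v)
    {
        (void)srv; (void)key;
        memset(v, 0, sizeof *v); /* a real lookup would fill this from the DHT */
        return 0;
    }

    int meta_create(const char *path, const meta_entry *m)
    {
        return kv_put(path_to_server(path), path, m);
    }

    int meta_stat(const char *path, meta_entry *m)
    {
        return kv_get(path_to_server(path), path, m);
    }

Because every path hashes to exactly one server, a lookup needs a single network hop and the metadata load spreads evenly across nodes, which is what removes the centralized-metadata bottleneck.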

Current Status
•  A FusionFS prototype with a POSIX interface has been developed
•  FusionFS has been deployed on:
   o  Linux cluster (512 cores)
   o  IBM Blue Gene/P (2,048 cores)
•  Benchmarks tested:
   o  IOZone and IOR
   o  Metadata: excellent scalability (4 ms at 1 node → 7 ms at 512 nodes)

Next Release (before 2013)
•  Support asynchronous file writes
•  Test with real scientific applications
•  Scale to 32K cores

Long-Term Plan (before 2016)
•  Scale to 1 million nodes
•  Support fault tolerance with replication and erasure coding
•  Improve FUSE performance and/or develop a FusionFS kernel module

References
[1] FusionFS project website: http://datasys.cs.iit.edu/projects/FusionFS/index.html
[2] Ioan Raicu, Ian Foster, and Pete Beckman. Making a Case for Distributed File Systems at Exascale. ACM Workshop on Large-scale System and Application Performance (LSAP), 2011.

Dongfang Zhao, Department of Computer Science, Illinois Institute of Technology, [email protected]
Ioan Raicu, Department of Computer Science, Illinois Institute of Technology, [email protected]

Motivation
The current architecture, in which compute nodes are remotely connected to storage nodes, is unlikely to scale well at exascale.

Proposed Architecture
•  Distribute data into local persistent storage to exploit data locality for computation (see the sketch below)
•  Distribute metadata into local persistent storage to remove the bottleneck of centralized metadata management
•  Coexist with remote parallel file systems
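A minimal C sketch of the data-placement idea, assuming a hypothetical node-local root (LOCAL_ROOT) and a stand-in meta_publish call rather than FusionFS's real interfaces: the file's bytes stay on the writing node, and only a small location record goes to the distributed metadata so later tasks can be scheduled where the data already resides.

    #include <stdio.h>

    #define LOCAL_ROOT "/local/ssd/fusionfs" /* hypothetical node-local storage */

    /* Stand-in for publishing "this node holds this file" in the
     * distributed metadata (e.g., a ZHT-style key-value store). */
    static void meta_publish(const char *path, int node_id)
    {
        printf("metadata: %s -> node %d\n", path, node_id);
    }

    /* The write lands on this node's own persistent storage; only the
     * small metadata record travels over the network. */
    int create_local(const char *path, const char *data, int node_id)
    {
        char local[512];
        snprintf(local, sizeof local, "%s%s", LOCAL_ROOT, path);

        FILE *f = fopen(local, "w");
        if (!f)
            return -1;
        fputs(data, f);
        fclose(f);

        meta_publish(path, node_id);
        return 0;
    }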


Features
•  Distributed Metadata Management
•  Distributed Data Management
•  Data Indexing
•  Relaxed Semantics
•  Data Locality
•  Overlapping I/O with Computations (see the sketch below)
•  POSIX
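The sketch below illustrates the "Overlapping I/O with Computations" feature in generic terms, using a plain pthreads writer thread: the buffer is flushed in the background while the main thread keeps computing. FusionFS performs this overlap behind its POSIX interface, so this is only an illustration of the technique, not FusionFS internals.

    #include <pthread.h>
    #include <stdio.h>
    #include <string.h>

    struct write_job {
        const char *path;
        const char *buf;
        size_t      len;
    };

    /* Background thread: flush the buffer to storage. */
    static void *writer(void *arg)
    {
        struct write_job *job = arg;
        FILE *f = fopen(job->path, "w");
        if (f) {
            fwrite(job->buf, 1, job->len, f);
            fclose(f);
        }
        return NULL;
    }

    int main(void)
    {
        static char data[1 << 20];                 /* 1 MiB of results */
        memset(data, 'x', sizeof data);
        struct write_job job = { "/tmp/checkpoint.dat", data, sizeof data };

        pthread_t tid;
        pthread_create(&tid, NULL, writer, &job);  /* I/O runs in the background */

        double acc = 0.0;                          /* next computation phase */
        for (long i = 1; i < 10000000; i++)
            acc += 1.0 / (double)i;

        pthread_join(tid, NULL);                   /* wait before reusing the buffer */
        printf("computed %f while %zu bytes were written\n", acc, job.len);
        return 0;
    }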

Access Patterns
•  1-many read (all processes read the same file, which is not modified)
•  many-many read/write (each process reads/writes a unique file; see the sketch below)
•  write-once read-many (files are not modified after they are written)
•  append-only (files can only be modified by appending at the end)
•  metadata (metadata is created, modified, and/or destroyed at a high rate)
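For example, the many-many read/write pattern can be sketched as below: each process writes a unique file named by its process id (an MPI rank in a real job), so no shared-file locking or coordination is required. The /tmp path is illustrative; under FusionFS the file would live below the FusionFS mount point.

    #include <stdio.h>
    #include <unistd.h>

    int main(void)
    {
        char path[256];

        /* One file per process: the name embeds the process id, so no two
         * writers ever touch the same file. */
        snprintf(path, sizeof path, "/tmp/output.%d", (int)getpid());

        FILE *f = fopen(path, "w");
        if (!f)
            return 1;
        fprintf(f, "partial result from this process\n");
        fclose(f);
        return 0;
    }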

Figure: Metadata performance on IBM Blue Gene/P