February, 2015 - Bill Loewe - HPC Advisory Council


February, 2015

Bill Loewe

Seagate Confidential

Agenda

• File System Metadata, a growing issue
• Parallel File System - Lustre Overview
• Metadata and Distributed Namespace
• Test setup and implementation for metadata testing
• Scaling Metadata Servers
• High Availability

Metadata Performance

File system performance is typically viewed in terms of bandwidth.

The bandwidth problem has largely been addressed, but metadata is a growing issue.

We see this in workloads with high numbers of files to access and process:

• Genome processing
• CPU chip manufacturing
• Video compositing/rendering

Lustre Parallel File System

Lustre is an open source, distributed parallel file system.

Its object-based design provides extreme scalability.

Compute clients interact directly with storage servers.

It is composed of:

• Clients
• Metadata Servers and Targets
• Storage Servers and Targets

Lustre Distributed NamespacE (DNE)

Distributed NamespacE (DNE) is a new feature available in Lustre 2.5 that allows multiple MDS / MDT components to participate in a single file system.

DNE allows the namespace to be divided across multiple metadata servers.

This enables the size of the namespace and the metadata throughput to be scaled with the number of servers.

The Lustre DNE project consists of two phases.

Phase 1, Lustre 2.5 Release

Remote Directories: Lustre sub-directories are distributed over multiple metadata targets (MDTs). Sub-directory distribution is defined by an administrator.

[Diagram: Remote Directories. A directory tree under Root with dir a, dir b/dir b2, dir c/dir c2, dir d/dir d2, and dir e/dir e2, each containing a File.]
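As a conceptual illustration of Phase 1, the following Python sketch models administrator-defined placement: each remote directory is pinned to an MDT index, and any path beneath it is served by the MDT of its nearest pinned ancestor. The class and method names (`RemoteDirMap`, `assign`, `mdt_for`) and the example layout are hypothetical; this is a toy model of the idea, not Lustre code. On a real system an administrator would typically create such a directory with a command along the lines of `lfs mkdir -i <mdt_index> <path>`.

```python
from pathlib import PurePosixPath


class RemoteDirMap:
    """Toy model of DNE Phase 1: directories pinned to MDTs by an administrator."""

    def __init__(self):
        # Assumption for this model: the root of the namespace lives on MDT0.
        self._placement = {PurePosixPath("/"): 0}

    def assign(self, directory, mdt_index):
        """Pin a directory (and everything below it) to the given MDT."""
        self._placement[PurePosixPath(directory)] = mdt_index

    def mdt_for(self, path):
        """Return the MDT index of the nearest pinned ancestor of path."""
        p = PurePosixPath(path)
        for candidate in [p, *p.parents]:
            if candidate in self._placement:
                return self._placement[candidate]
        raise ValueError(f"not an absolute path: {path}")


namespace = RemoteDirMap()
namespace.assign("/b", 1)  # hypothetical layout: dir b on MDT1
namespace.assign("/c", 2)  # hypothetical layout: dir c on MDT2

for path in ["/a/file", "/b/b2/file", "/c/c2/file"]:
    print(path, "-> MDT", namespace.mdt_for(path))
```

In this model, everything not explicitly assigned falls back to MDT0, which mirrors the idea that placement is a deliberate administrative decision rather than something automatic.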

Phase 2, Lustre 2.7

Striped Directories: The contents of a given directory are distributed over multiple MDTs.

[Diagram: Striped Directories. The files of dir c2 and dir e2 are spread across MDTs as a striped directory.]
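One way to picture a striped directory is as a hash table spread over several MDTs: the name of each entry selects the MDT that stores it. The sketch below illustrates that idea in Python. The function names and the stripe layout are hypothetical, and the FNV-1a hash is used here only as an illustrative choice; it should not be read as a description of the hash or layout that Lustre itself applies.

```python
def fnv1a_64(data: bytes) -> int:
    """64-bit FNV-1a hash."""
    h = 0xCBF29CE484222325  # FNV offset basis
    for byte in data:
        h ^= byte
        h = (h * 0x100000001B3) & 0xFFFFFFFFFFFFFFFF  # FNV prime, keep 64 bits
    return h


def stripe_for(name: str, mdt_indices: list) -> int:
    """Pick the MDT that holds the entry `name` in a striped directory."""
    return mdt_indices[fnv1a_64(name.encode()) % len(mdt_indices)]


stripes = [1, 2, 3, 4]  # hypothetical: one directory striped over MDT1..MDT4
counts = {mdt: 0 for mdt in stripes}
for i in range(10_000):
    counts[stripe_for(f"file.{i}", stripes)] += 1

# Number of entries that landed on each MDT
print(counts)
```

Because placement depends only on the entry name, a client can work out which MDT to ask without consulting a central table, and creates in a single large directory are spread over several servers.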

Engineered Storage Solutions for HPC, Big Data & Cloud

ClusterStor: Architected, Integrated, Optimized, Qualified, Supported

• Parallel file system/Object
• Data protection
• Linux OS
• Flash optimization
• BIOS/IPMI
• GEM diagnostics
• Custom x86 embedded server
• Seagate storage platforms
• High availability
• File system (Ext4)
• High speed networking (IB/40GbE)
• Seagate Storage Devices

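Lustre Components

[Diagram: Clients, one MDS, and three OSS nodes, with the traffic between them labeled as follows.]

• Clients to MDS: directory operations, file open/close, metadata, and concurrency
• MDS to OSS: file creation, file status, and recovery
• Clients to OSS: file I/O and locking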

ClusterStor Management Unit (CMU): Management and Metadata (MDS/MDT)

CSM Manager and MDS/MGS Nodes

• 2RU 4-node Sandy Bridge Servers
  – Server 1: CSM Mgmt
  – Server 2: Boot
  – Server 3: MGS
  – Server 4: MDS
• Fault Tolerance (active/passive)
• Serviceability

2U24 JBOD – MDT

• SAS JBOD for MDS/MGS/Management
• Disk Configuration
  – Qty 4 Lustre Management (MGS)
  – Qty 4 ClusterStor Management and NFS
  – Qty 2 Global Hot spares
  – Qty 14 Drives for MDT

Scalable Storage Unit (SSU)

• 5U84 Enclosure
• 2 Object Storage Servers per SSU
• Two (2) trays of 42 HDDs each for Object Storage Targets
• H/A on each SSU
• InfiniBand QDR/FDR and 40Gb Ethernet data network connectivity

ClusterStor & Lustre 2.5 DNE Hardware

DNE is available in ClusterStor v2.0

• MDT0 is master and default in DNE environment

DNE Servers are configured in active / active pairs

• Seagate 2U24 with 2 MDS embedded server modules

Scale Metadata Capacity / Performance with DNE Server pairs

[Diagram: Base MDS shown with the directory tree from the Remote Directories slide (Root, dir a through dir e, dir b2 through dir e2, each with a File).]

ClusterStor Hardware and the Lustre File System

• Meta Data and Management Servers: 2U x 4 Servers
• Meta Data Target: Seagate 2U24 JBOD
• Object Storage Server: Seagate Embedded Application Server
• Object Storage Target: Seagate 5U84 Storage Bay Bridge Enclosure

Client and File:

1) Where is file?
2) File is at….
3) Single File (3,072 KB)
4) File is broken into block stripe segments (1,024 KB)
5a) File block stripe 1 of 3 (1,024 KB)
5b) File block stripe 2 of 3 (1,024 KB)
5c) File block stripe 3 of 3 (1,024 KB)
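The arithmetic behind the striping steps can be sketched in a few lines of Python. The function name `stripe_segments` and the round-robin assignment of segments to targets are assumptions made for illustration; the sizes are the ones from the slide (a 3,072 KB file and 1,024 KB stripe segments across three targets).

```python
def stripe_segments(file_size_kb, stripe_size_kb, stripe_count):
    """Split a file into round-robin stripe segments.

    Returns a list of (target_index, file_offset_kb, length_kb) tuples.
    """
    segments = []
    offset = 0
    while offset < file_size_kb:
        # The last segment may be shorter than a full stripe.
        length = min(stripe_size_kb, file_size_kb - offset)
        # Segments are dealt out to targets in turn, wrapping around.
        target = (offset // stripe_size_kb) % stripe_count
        segments.append((target, offset, length))
        offset += length
    return segments


for target, offset, length in stripe_segments(3072, 1024, 3):
    print(f"target {target}: offset {offset} KB, length {length} KB")
```

With these inputs the file yields exactly three 1,024 KB segments, one per target, so all three targets can be read or written in parallel.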

Scaling MDS and DNEs

• MDS + 4 DNE Servers (2 ADUs)
• mdtest create/stat/del
• Mean of 5 iterations

[Chart: mdtest scaling MDS + 4 DNEs. Y-axis: Op/s, 0 to 600,000. Series: Mean Create, Mean Stat, Mean Remove.]
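To make the measurement concrete, here is a minimal Python sketch of a create/stat/remove microbenchmark that averages its rates over five iterations. It is not mdtest: mdtest is an MPI-based parallel benchmark run from many clients, whereas this sketch is a single process working in a local temporary directory. The function names and the file count are hypothetical, and the numbers it prints say nothing about Lustre or ClusterStor performance.

```python
import os
import tempfile
import time
from statistics import mean


def run_iteration(directory, num_files):
    """Create, stat, and remove num_files empty files; return ops/s per phase."""
    paths = [os.path.join(directory, f"file.{i}") for i in range(num_files)]

    def timed(operation):
        start = time.perf_counter()
        for path in paths:
            operation(path)
        elapsed = time.perf_counter() - start
        return num_files / elapsed if elapsed > 0 else float("inf")

    # Phases run in order: files must exist before they are stat-ed or removed.
    return {
        "create": timed(lambda p: open(p, "w").close()),
        "stat": timed(os.stat),
        "remove": timed(os.remove),
    }


def benchmark(num_files=1000, iterations=5):
    """Average the per-phase rates over several iterations."""
    results = []
    for _ in range(iterations):
        with tempfile.TemporaryDirectory() as directory:
            results.append(run_iteration(directory, num_files))
    return {phase: mean(r[phase] for r in results) for phase in results[0]}


rates = benchmark()
for phase, ops in rates.items():
    print(f"mean {phase}: {ops:,.0f} op/s")
```

Each phase touches only metadata (no file data is written), which is why this kind of test stresses the metadata servers rather than the storage bandwidth.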

Metadata High Availability

• MDT failover ensures that the Lustre file system remains available in the face of MDS node failure
• Based on the existing OSS pair failover model
• Failover is graceful, quick, and non-disruptive
• Failback is automatic and non-disruptive

[Chart: High Availability and Performance. Y-axis: Op/s, 0 to 140,000. X-axis: Before Failover, Failed Over, After Failover. Series: Mean Create, Mean Stat, Mean Remove.]
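The pair failover model can be pictured with a small Python sketch: two servers each serve their own target, the survivor takes over both targets when its partner fails, and targets return home when the failed node recovers. The class `HAPair` and the node and target names are hypothetical; this is a toy state model under those assumptions and does not represent the actual ClusterStor or Lustre failover mechanism.

```python
class HAPair:
    """Toy model of an active/active server pair, each node serving one target."""

    def __init__(self, node_a, node_b, target_a, target_b):
        self.home = {target_a: node_a, target_b: node_b}  # preferred node per target
        self.partner = {node_a: node_b, node_b: node_a}
        self.serving = dict(self.home)  # node currently serving each target

    def fail(self, node):
        """Failover: the partner takes over every target the failed node served."""
        for target, server in self.serving.items():
            if server == node:
                self.serving[target] = self.partner[node]

    def recover(self, node):
        """Failback: targets return to their home node once it is healthy again."""
        for target, home in self.home.items():
            if home == node:
                self.serving[target] = node


pair = HAPair("mds1", "mds2", "MDT1", "MDT2")  # hypothetical names
pair.fail("mds1")
print("after failover:", pair.serving)
pair.recover("mds1")
print("after failback:", pair.serving)
```

While one node carries both targets the file system stays available, but that node is doing the work of two, which is the situation the "Failed Over" bars in the chart measure.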

Green Machine: Environmentally-Aware Cold Storage Solution

• Space: light weight, small footprint, cold storage optimized design
• Green: recyclable chassis, reduced metal, responsible disposal of old chassis
• Cooling: zero heat emission, ambient cooling/no fans, high operating temp. tolerant HDDs
• Power: dynamic power management, low power servers

Aggressive TCO goals

• Lowest operating cost
• Reduced carbon footprint

“Best for the Planet”

Typical Use cases

• Retrieve content, photographs etc. from deep archive while maintaining consistent user experience

• Online pictures/Social media store use cases

• Pictures >45 days in cold storage

• Retrieve MRIs/X-rays of a patient

• Use cases leveraging Tape-based solutions

Thank you!