Overview of Lustre
ECE, U of MN
Changjin Hong (Prof. Tewfik’s group)
hongcj92@ece.umn.edu
Monday, Aug. 19, 2002

Outline

• Reference
• Lustre Cluster
• Lustre System Components
• Distributed Lock Manager
• Object Based Storage
• Conclusion (security issues)

Reference

• Lustre: A SAN File System for Linux – http://www.lustre.org/docs/lustre/luswhite.pdf
• Several presentations by Dr. Peter J. Braam

A Lustre Cluster

[Figure: a Lustre cluster; the diagram labels node counts of 10,000’s, 1,000’s, and 10’s of nodes]

Key Design Issue: Scalability

• I/O throughput
  – How to avoid bottlenecks
• Metadata scalability
  – How can 10,000’s of nodes work on files in the same folder?
• Cluster recovery
  – If something fails, how can recovery happen transparently?
• Management
  – Adding, removing, and replacing systems; data migration and backup

System Components

Interaction between systems

[Figure: interactions among the Client, the MDS, and the OST]
• Client ↔ MDS: CMD protocol ((directory) metadata handling, inode updates, concurrency)
• MDS ↔ OST: pre-allocation, file creation, recovery, file status
• Client ↔ OST: OS protocol (file I/O, allocation of blocks, striping, security enforcement)
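The OS protocol lists striping among its duties, but the slide does not give the layout rule. As a minimal sketch, assuming plain round-robin (RAID-0 style) striping of a file across `stripe_count` objects in `stripe_size` chunks, a client could locate the bytes of a request as follows; `map_offset`, `split_extent`, and both parameter names are hypothetical.

```python
from typing import List, Tuple

# Hypothetical helpers: assumes simple round-robin striping, not Lustre's actual code.
def map_offset(file_offset: int, stripe_size: int, stripe_count: int) -> Tuple[int, int]:
    """Return (object index, offset inside that object) for a byte offset in the file."""
    if stripe_size <= 0 or stripe_count <= 0:
        raise ValueError("stripe_size and stripe_count must be positive")
    chunk = file_offset // stripe_size        # which stripe-sized chunk of the file
    obj_index = chunk % stripe_count          # chunks go round-robin over the objects
    full_rounds = chunk // stripe_count       # earlier chunks already stored on this object
    obj_offset = full_rounds * stripe_size + file_offset % stripe_size
    return obj_index, obj_offset


def split_extent(offset: int, length: int, stripe_size: int,
                 stripe_count: int) -> List[Tuple[int, int, int]]:
    """Split a byte range into (object index, object offset, byte count) pieces."""
    pieces = []
    end = offset + length
    while offset < end:
        obj_index, obj_offset = map_offset(offset, stripe_size, stripe_count)
        # Stop at the chunk boundary: the next chunk may live on another object.
        count = min(stripe_size - offset % stripe_size, end - offset)
        pieces.append((obj_index, obj_offset, count))
        offset += count
    return pieces


# 1 MiB stripes over 4 objects; a 200-byte read straddling a chunk boundary.
print(split_extent(1048576 - 100, 200, 1048576, 4))  # [(0, 1048476, 100), (1, 0, 100)]
```

In this sketch each client computes the mapping locally, so once it knows a file's layout its data transfers can go straight to the objects without passing through the MDS.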

Client File System

• A directory tree, subdivided into filesets, with cluster-wide Unix file-sharing semantics
• CMD protocol
  – Transaction-based
  – Authenticated access
  – Write-behind caching for MD updates with strict data/metadata coherency

Metadata Service (MDS)

• All access to a file is governed by the MDS, which authorizes it directly or indirectly.
• Controls the namespace and manages inodes
• Load-balanced cluster service for scalability (a well-balanced API, a stackable framework for logical MDS, replicated MDS)
• Journaled, batched metadata updates
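Journaled, batched metadata updates can be illustrated with a toy write-ahead journal: updates are buffered, written to the journal as one record per batch, and only then applied, so recovery replays complete batches and drops anything uncommitted. This is a minimal in-memory sketch under those assumptions; `BatchedJournal` and its methods are hypothetical, not the MDS implementation.

```python
import json
from typing import Any, Dict, List, Tuple

Update = Tuple[int, str, Any]   # (inode number, attribute name, new value)


class BatchedJournal:
    """Hypothetical toy metadata store with a write-ahead journal committed in batches."""

    def __init__(self) -> None:
        self.journal: List[str] = []                 # committed batches, oldest first
        self.pending: List[Update] = []              # buffered, not yet durable
        self.inodes: Dict[int, Dict[str, Any]] = {}  # applied metadata state

    def update(self, inode: int, attr: str, value: Any) -> None:
        self.pending.append((inode, attr, value))

    def commit(self) -> None:
        if not self.pending:
            return
        # One journal record per batch: many updates, one (simulated) disk write.
        self.journal.append(json.dumps(self.pending))
        # Apply to the metadata store only after the batch is journaled.
        self._apply(self.pending)
        self.pending = []

    def _apply(self, batch) -> None:
        for inode, attr, value in batch:
            self.inodes.setdefault(inode, {})[attr] = value

    @classmethod
    def recover(cls, journal: List[str]) -> "BatchedJournal":
        """Rebuild state after a crash by replaying every committed batch."""
        store = cls()
        store.journal = list(journal)
        for record in journal:
            store._apply(json.loads(record))
        return store
```

Batching amortizes the cost of each journal write over many updates; the price is that updates still sitting in `pending` at crash time are lost.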

Object Storage Targets (OST)

• Keeps file data objects
• File I/O service: access to the objects
• Block allocation for data objects, which makes allocation distributed and scalable
• OST software modules
  – OBD server, lock server
  – Object storage driver, OBD filter
  – Portal API

Adapted from the VAXcluster DLM

Distributed Lock Manager

• Provides a generic, rich lock service
• Lock resources: resource database
  – Resources are organized in trees
• High performance
  – The node that acquires a resource manages its tree
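Because the DLM derives from the VAXcluster DLM, it inherits that design's six lock modes and fixed compatibility matrix. The sketch below encodes the classic VMS matrix together with a simple resource tree in which, as an assumption for illustration, the first node to lock anything in a tree becomes that tree's manager. `Resource`, `try_lock`, and the mastering rule are hypothetical; request queuing and lock conversion are omitted.

```python
from enum import IntEnum
from typing import Dict, List, Optional


class Mode(IntEnum):
    NL = 0  # null
    CR = 1  # concurrent read
    CW = 2  # concurrent write
    PR = 3  # protected read
    PW = 4  # protected write
    EX = 5  # exclusive


# Classic VMS DLM compatibility matrix, indexed COMPAT[held][requested].
COMPAT = [
    #  NL     CR     CW     PR     PW     EX
    [True,  True,  True,  True,  True,  True],   # NL
    [True,  True,  True,  True,  True,  False],  # CR
    [True,  True,  True,  False, False, False],  # CW
    [True,  True,  False, True,  False, False],  # PR
    [True,  True,  False, False, False, False],  # PW
    [True,  False, False, False, False, False],  # EX
]


class Resource:
    """Hypothetical lockable resource; resources form trees (e.g., a directory and its files)."""

    def __init__(self, name: str, parent: Optional["Resource"] = None) -> None:
        self.name = name
        self.parent = parent
        self.children: Dict[str, "Resource"] = {}
        self.granted: List[Mode] = []
        self.master: Optional[str] = None    # only meaningful on the root
        if parent is not None:
            parent.children[name] = self

    def root(self) -> "Resource":
        node = self
        while node.parent is not None:
            node = node.parent
        return node

    def try_lock(self, node: str, mode: Mode) -> bool:
        """Grant `mode` to `node` if it is compatible with every granted lock."""
        tree = self.root()
        if tree.master is None:
            tree.master = node        # assumption: the first acquirer manages the tree
        if all(COMPAT[held][mode] for held in self.granted):
            self.granted.append(mode)
            return True
        return False                  # a real DLM would queue the request instead
```

Two protected reads coexist, but a later exclusive request on the same resource is refused until the readers release their locks.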

Big Picture

[Figure: resource tree and namespace. Labels: namespace entries Name1 to Name4; objects Obj.1 to Obj.4; resource manager; resources (R); distributed resource directory / hash function (LDWV) / lock directory; Apps.]

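Mechanism in the Resource Database

• Hash the resource name (a binary string) modulo N, the number of slots in the lock directory weight vector (LDWV), to get h
• Look up slot h in the LDWV: LDWV[h] gives system K
• Systems
  – may occupy 0, 1, or more slots in the LDWV
  – The number of slots a system occupies is its lock directory weight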
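The directory lookup reduces to two steps, h = hash(name) mod N and K = LDWV[h]. A minimal sketch follows; CRC-32 stands in for the real hash function, which the slide does not name, and `build_ldwv` and `directory_system` are hypothetical names.

```python
import zlib
from typing import Dict, List


def build_ldwv(weights: Dict[str, int]) -> List[str]:
    """Give each system as many slots as its lock directory weight (0 = no slots)."""
    ldwv: List[str] = []
    for system in sorted(weights):        # same order on every node -> same vector
        ldwv.extend([system] * weights[system])
    if not ldwv:
        raise ValueError("at least one system needs a nonzero weight")
    return ldwv


def directory_system(resource_name: bytes, ldwv: List[str]) -> str:
    """Return system K, the directory system for this resource name."""
    h = zlib.crc32(resource_name) % len(ldwv)   # h = hash(binary string) mod N
    return ldwv[h]                              # K = LDWV[h]


ldwv = build_ldwv({"node-a": 2, "node-b": 1, "node-c": 0})
print(ldwv)  # ['node-a', 'node-a', 'node-b']
print(directory_system(b"/lustre/home/user/file", ldwv))
```

Because every node builds the same vector from the same weights, any node can compute K locally without sending a message; a system with weight 2 becomes the directory for roughly twice the share of resources that a weight-1 system gets, and a weight-0 system for none.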

Lustre DLM features

• Low concurrency
  – Want write-back caching
• High concurrency
  – Want load balancing in the cluster
  – Subdivide directories, etc., with hashes
  – Want the server of a request to limit lock revocations → operations run on the MD cluster in a client-server RPC model
• Deadlock detection
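Deadlock detection is commonly done by searching a wait-for graph for a cycle. The slide does not say which method the Lustre DLM uses, so the depth-first search below is a generic sketch, with lock owners as graph nodes; `find_deadlock` is a hypothetical helper.

```python
from typing import Dict, List, Optional, Set


def find_deadlock(waits_for: Dict[str, Set[str]]) -> Optional[List[str]]:
    """Return one cycle in a wait-for graph, or None if there is no deadlock.

    waits_for[a] = {b, ...} means lock owner a is blocked on a lock held by b.
    """
    WHITE, GREY, BLACK = 0, 1, 2          # unvisited, on current path, finished
    color: Dict[str, int] = {}
    stack: List[str] = []

    def visit(node: str) -> Optional[List[str]]:
        color[node] = GREY
        stack.append(node)
        for nxt in sorted(waits_for.get(node, ())):
            state = color.get(nxt, WHITE)
            if state == GREY:
                # Back edge: the owners from nxt to the top of the stack form a cycle.
                return stack[stack.index(nxt):] + [nxt]
            if state == WHITE:
                cycle = visit(nxt)
                if cycle:
                    return cycle
        stack.pop()
        color[node] = BLACK
        return None

    for start in sorted(waits_for):
        if color.get(start, WHITE) == WHITE:
            cycle = visit(start)
            if cycle:
                return cycle
    return None


print(find_deadlock({"client1": {"client2"}, "client2": {"client1"}}))
# ['client1', 'client2', 'client1']
```

Once a cycle is found, a lock manager typically breaks it by failing or revoking the request of one owner on the cycle.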

Object Based Storage

Object Based Storage

• Object Based Storage Device
  – More intelligent than a block device
• Speaks storage at the “inode level”
  – create, unlink, read, write, getattr, setattr…
  – Iterators, security, almost arbitrary processing
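To make storage at the “inode level” concrete, here is a toy in-memory object storage device whose interface is the operation list above (create, unlink, read, write, getattr, setattr) rather than block reads and writes. The class, its method signatures, and the attribute names are hypothetical; iterators and security are left out.

```python
from typing import Dict


class ObjectStorageDevice:
    """Hypothetical in-memory OSD: callers name objects, never disk blocks."""

    def __init__(self) -> None:
        self._data: Dict[int, bytearray] = {}
        self._attrs: Dict[int, Dict[str, int]] = {}
        self._next_id = 1

    def create(self) -> int:
        oid = self._next_id
        self._next_id += 1
        self._data[oid] = bytearray()
        self._attrs[oid] = {"size": 0}
        return oid

    def unlink(self, oid: int) -> None:
        del self._data[oid]
        del self._attrs[oid]

    def write(self, oid: int, offset: int, buf: bytes) -> int:
        data = self._data[oid]
        if offset > len(data):
            data.extend(b"\0" * (offset - len(data)))  # gap reads back as zeros
        data[offset:offset + len(buf)] = buf           # the device picks the layout
        self._attrs[oid]["size"] = len(data)
        return len(buf)

    def read(self, oid: int, offset: int, length: int) -> bytes:
        return bytes(self._data[oid][offset:offset + length])

    def getattr(self, oid: int) -> Dict[str, int]:
        return dict(self._attrs[oid])

    def setattr(self, oid: int, **attrs: int) -> None:
        # A "size" change truncates or zero-extends the object, like truncate(2).
        if "size" in attrs:
            data = self._data[oid]
            new_size = attrs["size"]
            if new_size < len(data):
                del data[new_size:]
            else:
                data.extend(b"\0" * (new_size - len(data)))
        self._attrs[oid].update(attrs)
```

Block allocation happens entirely inside the device, which is what lets each OST allocate independently of the others.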

Components of OB Storage

• Storage Object Device Drivers
  – Class drivers: attach a driver to an interface
  – Targets, clients: remote access
  – Direct drivers: manage physical storage
  – Logical drivers: intelligence and storage management
• Object Storage Applications (OSA)
  – (Cluster) file systems
  – Advanced storage: parallel I/O, snapshots
  – Specialized apps: caches, databases, file servers

System Interface

• Modules
  – Load the kernel modules to get drivers of a certain type
  – Name devices to be of a certain type
  – Build stacks of devices with assigned types
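Stacking works because every driver exposes the same object interface: a direct driver owns the storage, and a logical driver wraps lower devices to add policy. The sketch below uses a hypothetical mirroring logical driver and a small registry standing in for “load modules, name devices, build stacks”; none of these names come from the Lustre sources.

```python
from typing import Callable, Dict, List, Tuple


class DirectDriver:
    """Bottom of a stack: owns the (here, in-memory) object store."""

    def __init__(self) -> None:
        self.objects: Dict[int, bytes] = {}

    def write(self, oid: int, data: bytes) -> None:
        self.objects[oid] = data

    def read(self, oid: int) -> bytes:
        return self.objects[oid]


class MirrorDriver:
    """Hypothetical logical driver: same interface, adds replication on top."""

    def __init__(self, children: List) -> None:
        self.children = children

    def write(self, oid: int, data: bytes) -> None:
        for child in self.children:      # every lower device gets a copy
            child.write(oid, data)

    def read(self, oid: int) -> bytes:
        return self.children[0].read(oid)


# Hypothetical registry playing the role of "load the module to get a driver type".
DRIVER_TYPES: Dict[str, Callable] = {"direct": DirectDriver, "mirror": MirrorDriver}


def build_stack(config: Dict[str, Tuple[str, List[str]]]) -> Dict[str, object]:
    """Create named devices; config maps name -> (driver type, lower device names)."""
    devices: Dict[str, object] = {}

    def make(name: str):
        if name not in devices:
            dtype, lower = config[name]
            if lower:
                devices[name] = DRIVER_TYPES[dtype]([make(n) for n in lower])
            else:
                devices[name] = DRIVER_TYPES[dtype]()
        return devices[name]

    for name in config:
        make(name)
    return devices


devs = build_stack({
    "disk0": ("direct", []),
    "disk1": ("direct", []),
    "mir": ("mirror", ["disk0", "disk1"]),
})
devs["mir"].write(7, b"x")
print(devs["disk1"].read(7))  # b'x'
```

Since the mirror speaks the same interface as a disk, another logical driver could be stacked on top of it without either layer knowing.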

Layering of Object Drivers

Interaction of Object Storage Software Modules

Benefits: Clustering/SM

• Suitable for use in a SAN file system
• Shared at the level of an individual block
• Object namespace: divided into object groups. Being able to create objects with given object IDs is very advantageous, especially for snapshots.
• Hot file migration
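Creating objects with given IDs helps snapshots because a snapshot group can hold copies under the same object IDs, so a file's object references remain valid inside the snapshot. A minimal sketch, assuming objects addressed by (group, object ID) and an eager copy (a real system would more likely copy on write); `GroupedObjectStore` is hypothetical.

```python
from typing import Dict, Tuple


class GroupedObjectStore:
    """Hypothetical store with objects addressed by (object group, object id)."""

    def __init__(self) -> None:
        self.objects: Dict[Tuple[int, int], bytes] = {}

    def create(self, group: int, oid: int, data: bytes = b"") -> None:
        """Create an object with a caller-chosen id inside a group."""
        key = (group, oid)
        if key in self.objects:
            raise FileExistsError(f"object {key} already exists")
        self.objects[key] = data

    def write(self, group: int, oid: int, data: bytes) -> None:
        key = (group, oid)
        if key not in self.objects:
            raise KeyError(key)
        self.objects[key] = data

    def read(self, group: int, oid: int) -> bytes:
        return self.objects[(group, oid)]

    def snapshot(self, src_group: int, dst_group: int) -> None:
        """Copy every object in src_group into dst_group under the same id."""
        for (group, oid), data in list(self.objects.items()):
            if group == src_group:
                self.create(dst_group, oid, data)


store = GroupedObjectStore()
store.create(0, 42, b"v1")
store.snapshot(0, 1)          # group 1 now holds the snapshot
store.write(0, 42, b"v2")     # the live object changes...
print(store.read(1, 42))      # ...the snapshot still reads b'v1'
```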

Conclusion

• Object Based Storage
  – Processes disk operations at the higher level of individual files and file inodes, rather than at the low-level hardware disk block level
• Security Issues
  – Auxiliary services in the cluster: LDAP, PKI, Kerberos
  – Purpose: the CFS, MDS, and OST authenticate to each other and set up session keys

Etc.

• GSS-API for authentication and integrity checks
• Remote DMA
  – A layer that NEVER bypasses security processing
  – Requests are checked for authentication by a higher-level layer in the networking stack
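The security points call for mutual authentication, session keys, and integrity checks. As a stand-in for a GSS-API integrity token (this is not GSS-API), the sketch below tags each message with an HMAC-SHA256 computed under a shared session key and rejects any message whose tag does not verify. Key exchange via Kerberos or PKI is assumed to have already happened; `new_session_key`, `seal`, and `unseal` are hypothetical names.

```python
import hashlib
import hmac
import secrets

TAG_LEN = 32  # bytes in an HMAC-SHA256 tag


def new_session_key() -> bytes:
    # Stand-in for a key agreed on after mutual authentication (e.g., via Kerberos).
    return secrets.token_bytes(32)


def seal(key: bytes, message: bytes) -> bytes:
    """Append an HMAC-SHA256 tag so the receiver can check integrity."""
    tag = hmac.new(key, message, hashlib.sha256).digest()
    return message + tag


def unseal(key: bytes, packet: bytes) -> bytes:
    """Verify and strip the tag; raise ValueError if the message was altered."""
    if len(packet) < TAG_LEN:
        raise ValueError("packet too short")
    message, tag = packet[:-TAG_LEN], packet[-TAG_LEN:]
    expected = hmac.new(key, message, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):   # constant-time comparison
        raise ValueError("integrity check failed")
    return message


key = new_session_key()
packet = seal(key, b"OST_WRITE obj=7")    # hypothetical request text
print(unseal(key, packet))               # b'OST_WRITE obj=7'
```

In the spirit of the Remote DMA point, a receiver would call `unseal` on every request before acting on it, with no fast path that skips the check.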