
Self Stabilizing Distributed File System

Shlomi Dolev and Ronen I. Kat

Department of Computer Science, Ben-Gurion University

Research Sponsored by IBM

DFS Motivation

• Performance

• Fault tolerance

• Placing files closer to users

Related Work

• File systems

– NFS: network file system protocol

– AFS: Andrew file system, CMU (1988)

– Coda: CMU (1998)

– Intermezzo: Peter J. Braam, CMU

• Peer to peer (2000)

– Global storage: OceanStore, Berkeley

– Serverless: Microsoft Farsite

Talk Overview

• Self-stabilization

• Design

• Algorithms

• File system implementation

• Future work

Self Stabilization

• Self-healing

• Adaptiveness

• Automatic recovery

• Autonomic computing

Self Stabilization: Dijkstra 1974

Self Stabilization

A self-stabilizing system is a system that can automatically recover following the occurrence of (transient) faults.

The idea is to design a system that can be started in an arbitrary state and still converge to the desired behaviour.

See, e.g., S. Dolev, Self-Stabilization.
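Dijkstra's 1974 paper introduced self-stabilization with a K-state token ring, and a short simulation makes "start anywhere, still converge" concrete. The sketch below is that classic example, not part of the file system. Machines 0..n-1 form a ring; machine 0 holds a privilege when its value equals that of machine n-1, and every other machine holds one when its value differs from its left neighbour's. A central scheduler lets one privileged machine move at a time; here it is assumed to pick at random. Starting from arbitrary values, the ring reaches a state with exactly one privilege and stays there, provided K ≥ n. The function names are illustrative.

```python
import random


def privileged(x: list[int]) -> list[int]:
    """Return the indices of machines that currently hold a privilege."""
    n = len(x)
    tokens = [0] if x[0] == x[n - 1] else []
    tokens += [i for i in range(1, n) if x[i] != x[i - 1]]
    return tokens


def move(x: list[int], i: int, k: int) -> None:
    """Let privileged machine i take its step."""
    if i == 0:
        x[0] = (x[0] + 1) % k  # bottom machine: advance its counter
    else:
        x[i] = x[i - 1]        # other machines: copy the left neighbour


def stabilize(x: list[int], k: int, rng: random.Random, max_moves: int = 10_000) -> int:
    """Run a random central scheduler until exactly one privilege remains."""
    for moves in range(max_moves):
        tokens = privileged(x)
        if len(tokens) == 1:
            return moves
        move(x, rng.choice(tokens), k)
    raise RuntimeError("no convergence within max_moves")


rng = random.Random(1974)
ring = [rng.randrange(7) for _ in range(6)]  # arbitrary start: n = 6, K = 7
stabilize(ring, 7, rng)
print(privileged(ring))  # a list with exactly one index
```

Once a single privilege remains, every further move only passes it to the next machine. This closure property, together with convergence, is what the definition of self-stabilization requires.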

Self Stabilization Motivation

• The combination and types of faults cannot be fully anticipated in ongoing systems

• Any ongoing system must be self-stabilizing (or manually monitored)

• A self-stabilizing algorithm can recover from any arbitrary state reached due to the occurrence of faults

Design


• Replication servers are joined into a spanning tree

• A spanning tree is constructed

• File updates are propagated using a self-stabilizing -synchronizer

Design (cont.)

• Clients join the replication tree and form a caching tree

• File leases

• Global locking

Algorithms – Self Stabilizing

• Electing a leader (leader election)

• Collecting connectivity information

• Optimising communication costs

• -Synchronizer for file consistency

Leader Election

• A single leader coordinates construction

• If none exists, a server becomes the leader

• If more than one exists, exactly one survives

• Messages are periodically broadcast

Leader Election Algorithm

• Every T1 do:
    If (p = leader) then
        send-multicast(‘I’m a leader’)
        Leader-exists = true

• Every T1+Td do:
    If (not leader-exists) then leader = p
    Leader-exists = false

• Upon arrival of message do:
    If (p.volume = volume) then
        If (p = leader) then leader = min(leader, sender)
        Else leader = sender
        Leader-exists = true
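The rules above can be simulated in a few lines to watch them converge from a corrupted state. This is a minimal sketch under stated assumptions. One call to the hypothetical `run_period` stands for one T1 broadcast period, followed by the T1+Td timeout check. Every multicast is assumed to be delivered within the same period. The `p.volume = volume` test is read as "the message carries this server's volume". `Leader-exists = true` in the T1 rule is read as belonging to the leader branch, since setting it unconditionally would keep the timeout from ever firing.

```python
import random
from dataclasses import dataclass


@dataclass
class Server:
    pid: int
    volume: str
    leader: int                 # may start with any value (arbitrary state)
    leader_exists: bool = False


def run_period(servers: list[Server]) -> None:
    """One broadcast period (T1) followed by one timeout check (T1+Td)."""
    # Every T1: a server that believes it is the leader multicasts 'I'm a leader'.
    inbox = []
    for s in servers:
        if s.leader == s.pid:
            inbox.append((s.pid, s.volume))
            s.leader_exists = True
    # Upon arrival of a message (assumed delivered within the same period).
    for s in servers:
        for sender, volume in inbox:
            if sender == s.pid or volume != s.volume:
                continue  # skip our own multicast and other volumes
            if s.leader == s.pid:
                s.leader = min(s.leader, sender)  # competing leaders: smaller ID wins
            else:
                s.leader = sender
            s.leader_exists = True
    # Every T1+Td: nobody announced leadership, so become the leader.
    for s in servers:
        if not s.leader_exists:
            s.leader = s.pid
        s.leader_exists = False


rng = random.Random(3)
servers = [Server(pid, "vol0", leader=rng.randrange(100), leader_exists=rng.random() < 0.5)
           for pid in rng.sample(range(100), 8)]
for _ in range(4):
    run_period(servers)
print({s.leader for s in servers})  # a set containing a single leader ID
```

Under these delivery assumptions, four periods are always enough in this model. If no leader exists, every server times out and claims leadership. In the next period the smallest claimant silences the rest, and one period later everyone follows it.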


Collecting Connectivity Information

Induced Graph Example

Update Algorithm

• Collect routing tables from all neighbours in the induced graph

• Elect a manager (local leader) for the tree: the server with the minimal ID

• Build a distributed BFS spanning tree

• The algorithm converges
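The manager election and the BFS tree can be built together by one rule, which every server reapplies from its neighbours' current values. The sketch below uses a standard self-stabilizing BFS construction that assumes synchronous rounds. It is not necessarily the exact update algorithm of the system. Each server keeps `(root, dist, parent)`. It adopts the lexicographically smallest pair among its own `(id, 0)` and its neighbours' `(root, dist + 1)`. Claims at distance `n_max` or more are discarded, so IDs left over from a corrupted state (for example, of servers that no longer exist) die out. `n_max` is an assumed upper bound on the number of servers. All names are illustrative.

```python
def bfs_round(graph: dict[int, set[int]], state: dict, n_max: int) -> dict:
    """One synchronous round: every server recomputes (root, dist, parent)
    using only its neighbours' current values."""
    new_state = {}
    for v in graph:
        best = (v, 0, None)  # default candidate: be my own root
        for u in sorted(graph[v]):
            root, dist, _ = state[u]
            # Claims at distance >= n_max cannot be real and are dropped;
            # this is what flushes IDs that no live server owns.
            if dist + 1 < n_max and (root, dist + 1) < best[:2]:
                best = (root, dist + 1, u)
        new_state[v] = best
    return new_state


def stabilize_bfs(graph, state, n_max=None, max_rounds=None):
    """Repeat rounds until a fixed point; return (state, rounds used)."""
    n_max = n_max or len(graph)
    max_rounds = max_rounds or 4 * n_max
    for rounds in range(max_rounds):
        new_state = bfs_round(graph, state, n_max)
        if new_state == state:
            return state, rounds
        state = new_state
    raise RuntimeError("did not reach a fixed point")


graph = {3: {5, 8}, 5: {3, 9}, 8: {3, 9}, 9: {5, 8}}
garbage = {3: (1, 0, None), 5: (7, 2, 9), 8: (9, 0, None), 9: (0, 1, 5)}  # fake IDs 0, 1, 7
final, _ = stabilize_bfs(graph, garbage)
print(final)  # {3: (3, 0, None), 5: (3, 1, 3), 8: (3, 1, 3), 9: (3, 2, 5)}
```

The fixed point is the BFS tree rooted at the minimal ID, whatever the starting values. The distance bound is what turns an ordinary BFS into a self-stabilizing one.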



Optimising Communication Costs

• Goal: find the minimal radius that keeps connectivity

• Increase by a factor of 2

• Run a 2nd instance of update with <

• Searching for using binary search
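The search pattern on this slide, doubling followed by binary search, can be sketched centrally. Because the slide's symbols were lost, this sketch rests on several assumptions. The radius is taken to be an integer communication range, and servers are taken to be points, with a link between two servers whenever they are at most that far apart. Connectivity is assumed to be monotone, so a larger radius never disconnects the graph. `keeps_connectivity` checks connectivity directly; in the system, each probe would instead be a second instance of the update algorithm. The function names and coordinates are hypothetical.

```python
import math


def keeps_connectivity(points, radius):
    """True if every server is reachable when links exist only between
    servers at most `radius` apart (depth-first search over the induced graph)."""
    seen, stack = {0}, [0]
    while stack:
        i = stack.pop()
        for j in range(len(points)):
            if j not in seen and math.dist(points[i], points[j]) <= radius:
                seen.add(j)
                stack.append(j)
    return len(seen) == len(points)


def minimal_radius(connected, start=1):
    """Smallest integer radius r >= 1 with connected(r) True, assuming the
    predicate is monotone in r."""
    lo, hi = 0, start          # lo is 0 or a radius known to fail
    while not connected(hi):   # phase 1: double the radius until connected
        lo, hi = hi, hi * 2
    while hi - lo > 1:         # phase 2: binary search in (lo, hi]
        mid = (lo + hi) // 2
        if connected(mid):
            hi = mid
        else:
            lo = mid
    return hi


servers = [(0, 0), (3, 0), (3, 4), (10, 4)]
print(minimal_radius(lambda r: keeps_connectivity(servers, r)))  # 7
```

Doubling finds an upper bound in O(log r) probes, and the binary search then needs O(log r) more. This matters when each probe is an expensive distributed run rather than a local function call.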

Tree Structure

Caching Tree

• Extends the replication tree

• The update algorithm constructs both

• Servers execute two instances

• Caches execute one instance

Combined Spanning Tree



Synchronization Mechanism

• Provide reliable command and timing

• Propagate commands between servers

• Collect and distribute information

Replication Consistency

• Verifies signatures

• Multiple signatures indicate a conflict

• Conflict resolution

• Broadcast the resolved signature
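Signature verification over the tree follows a collect-then-distribute pattern, which a short sketch can show. Each server reports its file signature up the tree (a convergecast), and the root checks whether more than one distinct signature arrived. On a conflict, it resolves one and broadcasts it back down. This is a minimal sketch under assumptions: a signature is taken to be a SHA-256 digest of the file contents, the tree is a dict from parent to children, and the resolution rule is hypothetical, since the slides do not specify one. The most widely held signature wins, with ties broken by the smaller digest.

```python
import hashlib
from collections import Counter


def signature(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def convergecast(tree, node, sigs):
    """Count how many servers in node's subtree report each signature."""
    counts = Counter({sigs[node]: 1})
    for child in tree.get(node, []):
        counts += convergecast(tree, child, sigs)
    return counts


def broadcast(tree, node, sigs, resolved):
    """Push the resolved signature down the tree."""
    sigs[node] = resolved
    for child in tree.get(node, []):
        broadcast(tree, child, sigs, resolved)


def verify(tree, root, sigs):
    """Return (signature, had_conflict), resolving with a hypothetical
    majority rule when more than one signature is reported."""
    counts = convergecast(tree, root, sigs)
    if len(counts) == 1:
        return next(iter(counts)), False
    resolved = min(counts, key=lambda s: (-counts[s], s))
    broadcast(tree, root, sigs, resolved)
    return resolved, True


tree = {0: [1, 2], 2: [3]}
sigs = {0: signature(b"v2"), 1: signature(b"v2"), 2: signature(b"v1"), 3: signature(b"v2")}
resolved, conflict = verify(tree, 0, sigs)
print(conflict, resolved == signature(b"v2"))  # True True
```

Only signatures travel through the tree. A server whose signature lost would still need to fetch the winning file contents, which this sketch leaves out.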

Locking Table

• A (unified) global lock table

• Locks are requested

• The leader resolves multiple requests for the same lock

• Locks are removed by cancelling the lock request
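A unified lock table can be sketched as a small data structure. It records lock requests, lets the leader grant each contended file to exactly one requester, and treats cancelling a request as the way to release the lock. This is a minimal, single-process sketch. The class and method names are hypothetical, and the rule that the smallest client ID wins is an assumption, since the slides do not give one. `resolve` also drops grants that no longer match any request, so a corrupted table repairs itself on the next resolution, in keeping with the self-stabilizing design.

```python
from collections import defaultdict


class GlobalLockTable:
    """A single (unified) lock table, maintained by the leader."""

    def __init__(self):
        self.requests = defaultdict(set)  # path -> IDs of clients requesting it
        self.holder = {}                  # path -> ID the leader granted it to

    def request(self, path, client):
        self.requests[path].add(client)

    def cancel(self, path, client):
        """Cancelling the request is also how a held lock is released."""
        self.requests[path].discard(client)
        if not self.requests[path]:
            del self.requests[path]
        if self.holder.get(path) == client:
            del self.holder[path]

    def resolve(self):
        """Run by the leader: drop grants with no matching request, then grant
        each requested path to one requester (smallest ID, hypothetically)."""
        for path in list(self.holder):
            if self.holder[path] not in self.requests.get(path, ()):
                del self.holder[path]
        for path, clients in self.requests.items():
            if path not in self.holder:
                self.holder[path] = min(clients)
        return dict(self.holder)


table = GlobalLockTable()
table.request("/a", 7)
table.request("/a", 3)
table.request("/b", 5)
print(table.resolve())  # {'/a': 3, '/b': 5}
table.cancel("/a", 3)
print(table.resolve())  # {'/b': 5, '/a': 7}
```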

File System Implementation

Accessing a File

• Lock file → Get signature → Cached?

• Cached? No → Get a copy

• Cached? Yes → Update? Yes → Get a copy; No → Use local copy

Closing a File

• Update? Yes → Send new signature

• Confirm signature
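The two flowcharts for opening and closing a file can be read as client-side logic. In this sketch, "Update?" on open is read as "the cached copy's signature no longer matches the server's", and "Update?" on close as "the client changed the file". `ReplicaService` is a hypothetical single-object stand-in for the replication tree, and the mapping of each flowchart box to a method call is an assumption. The comments name the box each step corresponds to. Signatures are assumed to be SHA-256 digests.

```python
import hashlib
from typing import Optional


def sig(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


class ReplicaService:
    """Hypothetical stand-in for the replication servers."""

    def __init__(self, files: dict[str, bytes]):
        self.files = dict(files)
        self.locked: set[str] = set()
        self.copies_sent = 0

    def lock(self, path: str) -> None:
        if path in self.locked:
            raise RuntimeError(f"{path} is locked by another client")
        self.locked.add(path)

    def unlock(self, path: str) -> None:
        self.locked.discard(path)

    def signature(self, path: str) -> str:
        return sig(self.files[path])

    def get_copy(self, path: str) -> bytes:
        self.copies_sent += 1
        return self.files[path]

    def confirm(self, path: str, data: bytes, new_sig: str) -> bool:
        """Accept new contents only if they match the signature sent."""
        if sig(data) != new_sig:
            return False
        self.files[path] = data
        return True


class CachingClient:
    def __init__(self, service: ReplicaService):
        self.service = service
        self.cache: dict[str, bytes] = {}

    def open(self, path: str) -> bytes:
        self.service.lock(path)                         # Lock file
        current = self.service.signature(path)          # Get signature
        cached = self.cache.get(path)
        if cached is None or sig(cached) != current:    # Cached? / Update?
            self.cache[path] = self.service.get_copy(path)  # Get a copy
        return self.cache[path]                         # else: use local copy

    def close(self, path: str, data: Optional[bytes] = None) -> None:
        try:
            if data is not None and data != self.cache.get(path):  # Update?
                new_sig = sig(data)                                 # Send new signature
                if not self.service.confirm(path, data, new_sig):   # Confirm signature
                    raise RuntimeError("signature was not confirmed")
                self.cache[path] = data
        finally:
            self.service.unlock(path)


svc = ReplicaService({"/doc": b"v1"})
a, b = CachingClient(svc), CachingClient(svc)
a.open("/doc")
a.close("/doc", b"v2")   # a modifies the file; its signature is sent and confirmed
print(b.open("/doc"))    # b'v2'
b.close("/doc")
```

Comparing signatures instead of contents lets a cache decide whether its copy is current without transferring the file.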

Meta Access

• Globally processed

• Blocked until a lock is obtained

Lock file → Execute command → Wait for confirmation
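The meta-access sequence (lock, execute everywhere, wait for confirmation) can be sketched with threads. In this minimal sketch, `Replica` and `MetaCoordinator` are hypothetical. Each replica tracks only a directory namespace. A per-path `threading.Lock` plays the role of the global lock: a caller blocks until it obtains the lock, runs the command on every replica, and reports success only if all replicas confirm. Locking only the target path, not its parent, is a simplification.

```python
import threading
from collections import defaultdict


class Replica:
    """Hypothetical replication server tracking only a directory namespace."""

    def __init__(self):
        self.dirs = {"/"}

    def execute(self, op: str, path: str) -> bool:
        parent = path.rsplit("/", 1)[0] or "/"
        if op == "mkdir" and parent in self.dirs and path not in self.dirs:
            self.dirs.add(path)
            return True
        if (op == "rmdir" and path in self.dirs and path != "/"
                and not any(d.startswith(path + "/") for d in self.dirs)):
            self.dirs.remove(path)
            return True
        return False  # refused: a negative confirmation


class MetaCoordinator:
    def __init__(self, replicas):
        self.replicas = replicas
        self._locks = defaultdict(threading.Lock)  # one lock per path
        self._guard = threading.Lock()             # protects the lock table itself

    def meta_access(self, op: str, path: str) -> bool:
        with self._guard:
            lock = self._locks[path]
        with lock:                                 # Lock file (blocks until obtained)
            confirmations = [r.execute(op, path) for r in self.replicas]  # Execute command
        return all(confirmations)                  # Wait for confirmation


meta = MetaCoordinator([Replica() for _ in range(3)])
print(meta.meta_access("mkdir", "/a"), meta.meta_access("mkdir", "/a"))  # True False
```

Serializing meta operations per path keeps every replica applying them in the same order, so concurrent requests cannot leave replicas with different namespaces.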

Linux Based bgRFS

• Application (user level): Linux system calls

• System calls, new implementation: open, close, lstat, mkdir, etc.

• SyncDaemon: cache manager and server, reached through up calls

• Network communication

Future Work

• Kernel VFS module

• Communication improvements:

– Reducing update messages

– Using timers with -synchronizer

• Performance enhancements

• Integrating disconnected operations

• Conflict resolution algorithms

Credits

Undergraduate students: Amir Livneh, Granik, Lansky, Shmuel, Shish, Erlich, Chohen, Biran, Fridman, Bernard, Ferents, Feintuch, Shalev, Kraim, Hayuit

Faculty: Prof. Shlomi Dolev

Graduate students: Ronen I. Kat

System engineer: Albina Budker

Visit us at www.cs.bgu.ac.il/~bgrfs
