
Page 1: CMPT 401 2008, Dr. Alexandra Fedorova, Lecture VII: Distributed File Systems

Page 2: Last Weekend I Could Not Check My E-mail

• I use pine for e-mail
• The pine binary lives on a file system accessed via NFS
• The NFS server wasn't working

Page 3: Then I Finally Started The E-mail Client

• I'd read a couple of messages
• But after a while my e-mail application froze, and the following message appeared on the screen:

    NFS server fas5310c-h2 not responding still trying

• This was a result of a distributed file system failure
• Distributed file systems (DFS) are the topic of today's lecture

Page 4: Outline

• Overview of a distributed file system
• DFS design considerations
• DFS usage patterns that drive design choices
• Case studies
  – AFS
  – NFS

Page 5: A Distributed File System

[Figure: clients access files on a server over the network; the file system supports file sharing and replication]

Page 6: How To Design A DFS?

• Some design considerations:
  – Providing location transparency
  – Stateful or stateless server?
  – Failure handling
  – Caching
  – Cache consistency

Page 7: Location Transparency

• Location transparency: the client is unaware of the server's location
• How do you name a remote file?
• Is the name structure different from that of a local file?
• Do you include the name of the server in the file name?
• Existing file systems have used both designs
• Pros of location transparency:
  – The client need not change if the server name changes
  – Facilitates replication of the file service
• Cons of location transparency:
  – The client cannot specify the server of choice (e.g., to recover from failures or to get better performance)

Page 8: Failures

• Server crash (fail-stop failures):
  – All data in the server's memory can be lost
  – The server may write to the local disk and send an acknowledgement to the client, but the data might not have made it to disk, because it was still in the OS buffer or the disk buffer
• Message loss (omission failures):
  – Usually taken care of by the underlying communication protocol (e.g., RPC)

Page 9: Stateless vs. Stateful Server

• Stateless server: loses its state in the event of a crash
  – A crashed stateless server looks to the client like a slow server
  – Simple server design
  – Quick recovery after reboot
  – The client needs to maintain state: if the server crashed, the client does not know whether the operation succeeded, so it must retry
  – File operations must be idempotent (see the sketch below)
• Stateful server: remembers its state and recovers
  – Complex server design, yet simple client design
  – Longer server recovery time
  – Permits non-idempotent operations (e.g., file append)
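
To make the idempotency requirement concrete, here is a minimal sketch (Python, all names hypothetical; not from the lecture) contrasting an idempotent write-at-offset with a non-idempotent append. A client that retries after a lost reply is safe with the first and duplicates data with the second.

    # Hypothetical sketch: idempotent vs. non-idempotent file operations.

    def write_at(blocks, offset, payload):
        # Idempotent: writing the same bytes at the same offset twice
        # leaves the file in the same state as writing them once.
        blocks[offset] = payload

    def append(log, payload):
        # Non-idempotent: a retry after a lost reply appends the data twice.
        log.append(payload)

    blocks, log = {}, []
    write_at(blocks, 0, b"hello")
    write_at(blocks, 0, b"hello")      # client retry: same final state
    assert blocks == {0: b"hello"}

    append(log, b"hello")
    append(log, b"hello")              # client retry: duplicated data
    assert log == [b"hello", b"hello"]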

Page 10: Caching

• File accesses exhibit temporal locality
  – If a file has been accessed, it will likely be accessed again soon
• Caching: keeping a recently accessed copy of a file (or part of a file) close to where it is accessed
• Caching can be done at various points in a DFS:
  – Caching in the client's memory
  – Caching on the client's disk
  – Caching in server memory
  – Caching inside the network (proxy caches)
• Caches anticipate future accesses by doing read-ahead

Page 11: Caching and Consistency

• Consistency is about ensuring that a copy of the file is up to date
• Caching makes consistency an issue: the file might be modified in the cache, but not at its true source
• A DFS should maintain consistency to prevent data loss
  – When a client modifies the file in its cache, the client copy becomes different from the server copy
  – We care about consistency because if the client crashes, the data will be lost
• A DFS should maintain consistency to facilitate file sharing
  – Clients A and B share a file
  – Client A modifies the file in its local cache; client B sees an outdated copy of the file

Page 12: Consistency Protocols

• The consistency protocol determines when modified (dirty) data is propagated to its source
• Write-through: instant propagation
  – A client propagates dirty data to the server as soon as the data is written
  – Reduces the risk of data loss on a crash
  – May result in a large number of protocol messages
• Write-back: delayed (lazy) propagation
  – A client propagates dirty file data when the file is closed or after a delay (e.g., 30 seconds)
  – Higher risk of data loss
  – Smaller number of protocol messages (a sketch contrasting the two policies follows)
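
As a rough illustration (Python; the classes are hypothetical, not drawn from any real DFS implementation), the two policies differ only in when the client sends its writes to the server:

    # Hypothetical sketch: write-through vs. write-back client caching.

    class Server:
        def __init__(self):
            self.store = {}
        def put(self, blkno, data):
            self.store[blkno] = data

    class WriteThroughCache:
        # Every write goes to the server immediately: safe but chatty.
        def __init__(self, server):
            self.server, self.cache = server, {}
        def write(self, blkno, data):
            self.cache[blkno] = data
            self.server.put(blkno, data)    # one message per write

    class WriteBackCache:
        # Writes stay dirty locally and are flushed later: fewer messages,
        # but dirty data is lost if the client crashes before flushing.
        def __init__(self, server):
            self.server, self.cache, self.dirty = server, {}, set()
        def write(self, blkno, data):
            self.cache[blkno] = data
            self.dirty.add(blkno)           # no message yet
        def flush(self):                    # on close, sync, or a timer
            for blkno in self.dirty:
                self.server.put(blkno, self.cache[blkno])
            self.dirty.clear()

AFS flushes on close; NFS flushes on close, on sync, or from a flush daemon, as the case studies later in the lecture show.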

Page 13: Consistency Protocols and File Sharing

• When files are shared among clients, one client may modify the data, causing cached copies to become inconsistent
• Cached data should therefore be validated
• Approaches: client validation, server validation
• Client validation:
  – The client contacts the server to validate its cached copy
• Server validation:
  – The server notifies the client when the cached data is stale

Page 14: Granularity of Data Access

• Block granularity: the file is transferred block by block
  – If you use only a small part of the file, you don't waste time transferring the whole file
  – Cache consistency is maintained block by block: the consistency protocol may generate many messages
• File granularity: the file is transferred as a whole
  – If you have a large file and do not use all of it, you waste resources transferring the entire file
  – Cache consistency is maintained at whole-file granularity, so there are fewer consistency messages

Page 15: How to Design a Good DFS?

• There are many design considerations
• There are many design choices
• Design choices should be driven by usage patterns, i.e., how clients use the file system
• So we will look at some common usage patterns
• We will let these usage patterns drive our design choices

Page 16: DFS Usage Patterns

• Most files are small
• Read operations are much more frequent than write operations
• Most accesses are sequential; random access is rare
• Files are usually read in their entirety

Page 17: DFS Usage Patterns (cont.)

• Data in files tends to be overwritten often
• Most files are read and written by one user
• When users share a file, typically only one user modifies the file
• Fine-grained read/write sharing is rare (in research/academic environments)
• File references show substantial temporal locality

Page 18: Designing a Good DFS

Design considerations:
• Stateless/stateful
• Caching
• Cache consistency
• File sharing
• Data transfer granularity

DFS usage patterns:
• Temporal locality
• Most files are small
• Little read/write sharing
• Files are accessed in their entirety

Good DFS design: usage patterns drive the design

Page 19: Exercise in DFS Design

• Usage pattern #1: Files are accessed in their entirety
• Block or whole-file transfer granularity?
• Whole-file transfer

• Usage pattern #2: Most files are small
• Block or whole-file granularity?
• Either whole-file or block transfer (small files will fit in a single block)

Page 20: Exercise in DFS Design (cont.)

• Usage pattern #3: Read operations are more frequent than write operations
• Client or server caching?
• Cache data on the client

Page 21: Exercise in DFS Design (cont.)

• Usage pattern #4: Data in files tends to be overwritten often
• Instant or delayed propagation of writes?
• Delayed propagation of dirty data

• Usage pattern #5: Most files are read/written by one user
• Instant or delayed propagation of writes?
• No need for instant propagation

Page 22: Exercise in DFS Design (cont.)

• Usage pattern #6: Fine-grained read/write sharing is rare
• Instant or delayed propagation of writes?
• No need for instant propagation

• Usage pattern #7: Fine-grained read/write sharing is rare
• Server or client validation?
• Client validation

Page 23: Outline

• Overview of a distributed file system
• DFS design considerations
• DFS usage patterns that drive design choices
• Case studies
  – AFS
  – NFS

Page 24: Introduction to AFS

• Andrew File System
• Design started in 1983 as a joint project between Carnegie Mellon University (CMU) and IBM
• The design team included prominent scientists in the area of distributed systems: M. Satyanarayanan, John Howard
• Goal of AFS: scalability, support for a large number of users
• AFS is widely used all over the world

Page 25: Overview of AFS

• Venus – FS client
• Vice – FS server
• Unix compatibility – access and location transparency

[Figure © Pearson Education 2001]

Page 26: Location and Access Transparency in AFS

[Two figures: location transparency and access transparency. © Pearson Education 2001]

Page 27: Key Features of AFS

• Caching of files on the client's local disk
• Whole-file transfer
• Whole-file caching
• Delayed propagation of updates (on file close)
• Server validation

Page 28: Whole File Transfer and Caching

• When the client opens a file, the whole file is transferred from the server to the client
• Venus writes a copy of the file to the local disk
• Cache state survives client crashes and reboots
• This reduces the number of server accesses and addresses the scalability requirement

Page 29: Whole File Transfer and Caching vs. Usage Patterns

• How is the decision to transfer and cache whole files driven by usage patterns?
• Most files are small
• Most files are accessed in their entirety – so it pays to transfer the whole file
• File accesses exhibit high temporal locality, so caching makes sense

Page 30: Delayed Update Propagation

• Delayed update propagation: the client propagates file updates on close
• The server invalidates cached file copies on other clients
• On opening a file, the client asks the server for a callback promise – the server promises to tell the client when the file is updated
• A callback promise can be:
  – valid – means that the cached copy is valid
  – cancelled – means that the cached copy is invalid
• When a client propagates changes to the file, the server sends callbacks to all clients who hold callback promises
• Those clients set their callback promises to "cancelled"
• The server must remember the clients to whom it promised callbacks – the server is stateful (a sketch of this protocol follows)
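
A minimal sketch of the callback-promise mechanism described above (Python; an illustration of the idea, not the actual Vice/Venus code):

    # Hypothetical sketch of AFS-style callback promises.

    class Server:
        def __init__(self):
            self.files = {}         # filename -> contents
            self.promises = {}      # filename -> clients holding promises

        def fetch(self, filename, client):
            # Client opens a file: record a callback promise, return the data.
            self.promises.setdefault(filename, set()).add(client)
            return self.files.get(filename, b"")

        def store(self, filename, data, writer):
            # Client closes a modified file: update, break other promises.
            self.files[filename] = data
            for client in self.promises.get(filename, set()) - {writer}:
                client.callback(filename)   # "your cached copy is stale"
            self.promises[filename] = {writer}

    class Client:
        def __init__(self, server):
            self.server = server
            self.cache = {}          # filename -> contents
            self.promise_valid = {}  # filename -> True/False

        def open(self, filename):
            if self.promise_valid.get(filename):  # valid promise: use cache
                return self.cache[filename]
            data = self.server.fetch(filename, self)
            self.cache[filename] = data
            self.promise_valid[filename] = True
            return data

        def close(self, filename, data):
            self.cache[filename] = data
            self.server.store(filename, data, self)

        def callback(self, filename):
            self.promise_valid[filename] = False  # cancel the promise

    server = Server()
    a, b = Client(server), Client(server)
    a.open("f"); b.open("f")          # both hold callback promises on "f"
    a.close("f", b"new data")         # server cancels B's promise
    assert b.promise_valid["f"] is False   # B must refetch on next open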

Page 31: Delayed Update Propagation vs. Usage Patterns

• Most files are read and written by one client – delayed update propagation is acceptable
• Fine-grained data sharing is rare – delayed update propagation is acceptable
• Note: a callback cancellation message can be lost due to a server or client crash
• In this case the client will see an inconsistent copy of the data
• Such weak consistency semantics were deemed acceptable due to infrequent fine-grained sharing

Page 32: Server Propagation and Scalability

• In AFS-1, the client asked the server whether the file was still valid on every open
• This generated a lot of client/server traffic and limited system scalability
• In AFS-2 it was decided to use the callback system

Page 33: Client Opens a File

User process: open file
UNIX kernel: if the file is shared, pass the request to Venus
Venus: check whether the file is in the cache. If it is, and the callback promise is valid, open the local copy and return the file descriptor. If not, request the file from the server.
Vice: transfer the file to Venus
Venus: place the file in the local disk cache, open it, and return the file descriptor to the user process

Page 34: Client Reads a File

User process: read file
UNIX kernel: perform a normal Unix read operation on the local copy
(Venus and Vice are not involved)

Page 35: Client Writes a File

User process: write file
UNIX kernel: perform a normal Unix write operation on the local copy
(Venus and Vice are not involved)

Page 36: Client Closes a File

User process: close file
UNIX kernel: close the local copy and notify Venus that the file has been closed
Venus: if the file has been changed, send a copy to the Vice server
Vice: replace the file contents and send a callback to all other clients holding callback promises on the file

Page 37: Summary: AFS

• The goal is scalability: this drove the design
• Whole-file caching reduces the number of server accesses
• Server validation (based on callbacks) reduces the number of server accesses
• Callbacks require keeping state on the server (the server has to remember the list of clients with callback promises)
• The client is also stateful in some sense (client cache state survives crashes)

Page 38: Introduction to NFS

• NFS – Network File System
• In widespread use in many organizations
• Developed by Sun, implemented over Sun RPC; can use either TCP or UDP
• Key features:
  – Access and location transparency (even inside the kernel!)
  – Block granularity of file access and caching
  – Delayed update propagation
  – Client validation
  – Stateless server
  – Weak consistency semantics

Page 39: Access Transparency in NFS

VFS is a software layer that redirects file-related system calls to the right file system (such as NFS)

VFS provides access transparency at user level and inside the kernel

[Figure © Pearson Education 2001]

Page 40: VFS and vnodes

• VFS – virtual file system
• A layer of software in the kernel
• Contains a set of vnode data structures
• vnodes represent files and directories
• A vnode contains (among other things):
  – The name of the file
  – Function pointers that are called when the file is operated upon
  – A vnode representing an NFS file is set up to call into the NFS client
  – The NFS client then redirects file operations to the server (a sketch of this dispatch follows)
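
A loose sketch of vnode-style dispatch (Python for brevity; a real kernel does this in C with structures of function pointers, and all names here are hypothetical):

    # Hypothetical sketch: vnode-style dispatch via per-file function pointers.

    class Vnode:
        def __init__(self, name, read_fn):
            self.name = name
            self.read_fn = read_fn     # called for read() on this file

    def local_read(vnode):
        return "local FS read of %s" % vnode.name

    def nfs_read(vnode):
        return "NFS client RPC: read %s from the server" % vnode.name

    def vfs_read(vnode):
        # VFS layer: redirect the system call to the right file system.
        return vnode.read_fn(vnode)

    local_file = Vnode("/tmp/a", local_read)
    remote_file = Vnode("/users/jon/f", nfs_read)
    print(vfs_read(local_file))    # handled by the local file system
    print(vfs_read(remote_file))   # handled by the NFS client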

Page 41: Location Transparency in NFS

• The client sees a directory structure that looks the same as a local file system
• A remote directory is mounted at a mount point
• A mount point is the name of a local directory that is mirrored remotely
• Mounting sets up the directory's vnode to call into the NFS client on operations on that directory

[Figure © Pearson Education 2001]

Page 42: Hard vs. Soft Mounts

• Hard mount:
  – When the server crashes, the client blocks, waiting for the server to start responding
• Soft mount:
  – The NFS client times out and returns an error to the application
• Most applications are not written to handle file-access errors
• As a result, NFS systems usually use hard mounts (a sketch of the two behaviours follows)

Page 43: Block Granularity of File Access and Caching

• NFS uses the VFS's cache
• VFS caching is at block granularity, so NFS files are accessed and cached at block granularity
• The typical block size is 8 KB
• Files are cached on the client only in memory, not on disk
• Unlike in AFS, cache state does not survive client crashes

Page 44: Block Granularity? But Most Files Are Accessed In Their Entirety!

• NFS does pre-fetching to anticipate future accesses from the client (recall: most files are accessed sequentially)
• Pre-fetching is modest at the start (1 or 2 blocks at a time)
• But it gets more aggressive if the client shows sequential access patterns, i.e., if pre-fetched data is actually used (a sketch of this policy follows the list)
• So why doesn't NFS do whole-file caching, like AFS?
  – Block-granularity caching allows for access transparency within the kernel and unified VFS cache management
  – Unified VFS cache management facilitates more efficient use of memory
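
A sketch of an adaptive read-ahead policy in this spirit (Python; the window-doubling heuristic is an assumption for illustration, not NFS's documented algorithm):

    # Hypothetical sketch: read-ahead that grows when prefetched data is used.

    class AdaptivePrefetcher:
        def __init__(self, fetch_block, max_window=8):
            self.fetch_block = fetch_block  # function: block number -> bytes
            self.max_window = max_window
            self.window = 1                 # start modestly: 1 block ahead
            self.cache = {}
            self.prefetched = set()         # blocks fetched ahead of demand

        def read(self, blkno):
            if blkno in self.prefetched:    # prefetch paid off:
                self.prefetched.discard(blkno)
                self.window = min(self.window * 2, self.max_window)
            if blkno not in self.cache:
                self.cache[blkno] = self.fetch_block(blkno)
            for b in range(blkno + 1, blkno + 1 + self.window):
                if b not in self.cache:     # read ahead of the client
                    self.cache[b] = self.fetch_block(b)
                    self.prefetched.add(b)
            return self.cache[blkno]

    pf = AdaptivePrefetcher(fetch_block=lambda b: b"block-%d" % b)
    pf.read(0)     # fetches block 0, prefetches block 1
    pf.read(1)     # prefetch was used: the window grows to 2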

Page 45: Client Caching in NFS

• Delayed propagation
• Updates to cached file blocks are not propagated to the server immediately; they are propagated when:
  – A file is closed
  – An application calls "sync"
  – A flush daemon writes the dirty data back to the server
• Therefore, clients accessing the same file may see inconsistent copies
• To maintain consistency, NFS uses client polling

Page 46: Consistency and Client Polling

• Polling is based on two timestamps
• Tc – the timestamp when the cache entry was last validated
• Tm – the timestamp when the data was last modified on the server
• The client polls the server at interval t (between 3 and 30 seconds)
• A cache entry is valid at time T if (see the sketch below):
  – T - Tc < t (fewer than t seconds have elapsed since the last validation), or
  – Tm_client = Tm_server (the data has not been modified on the server since the client requested it)
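
The freshness condition can be written down directly (Python; a sketch, with get_server_tm standing in for the attribute request a real NFS client would send to the server):

    # Hypothetical sketch of the NFS-style freshness check described above.
    import time

    class CacheEntry:
        def __init__(self, data, tm_server):
            self.data = data
            self.tm_client = tm_server  # Tm as last seen by the client
            self.tc = time.time()       # Tc: when the entry was last validated

    def is_valid(entry, get_server_tm, t=3.0):
        # Valid if recently validated, or unchanged on the server.
        now = time.time()
        if now - entry.tc < t:           # T - Tc < t: trust without asking
            return True
        tm_server = get_server_tm()      # poll the server for Tm
        entry.tc = now                   # record the validation
        if entry.tm_client == tm_server: # Tm_client = Tm_server: still fresh
            return True
        return False                     # stale: the data must be refetched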

Page 47: Reducing Polling Overhead

• Client polling can result in significant overhead (this is why it was decided not to use it in AFS)
• Measures to reduce polling overhead in NFS:
  – When a client receives a new Tm value from the server, it applies it to all blocks from the same file
  – Tm for file F is piggybacked on all server responses for all operations on file F
  – The polling interval t is set adaptively for each file, depending on the frequency of updates to that file (a sketch of one such policy follows)
  – t for directories is larger: 30-60 seconds

Page 48: Server Caching and Failure Modes

• In addition to client caching, there is caching on the server
• The NFS server caches files in its local VFS cache
• Writes to the local file system use delayed propagation:
  – Data is written to the VFS cache
  – A flush daemon flushes dirty data to disk every 30 seconds or so
• How should the NFS client and server interoperate in the face of server caching?

Page 49: NFS Client/Server Interaction in the Presence of Server Caching

• Option #1: The client writes the data to the server. The server sends an acknowledgement to the client after writing the data to its local file cache, not to disk
  – Advantage: fast response to the client
  – Disadvantage: if the server crashes before the data is written to disk, the data is lost
• Option #2: The client writes the data to the server. The server syncs the data to disk, then responds to the client
  – Advantage: the data is not lost if the server crashes
  – Disadvantage: each write operation takes longer to complete, which can limit the server's scalability

Page 50: NFS Client/Server Interaction in the Presence of Server Caching (cont.)

• Older versions of NFS used write-through (option #2)
• This was recognized as a performance problem
• NFSv3 introduced a commit operation
• The client can ask the server to flush the data to disk by sending a commit (a sketch of this pattern follows)
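
A minimal sketch of the resulting write/commit pattern (Python; Nfsv3Server and its methods are hypothetical stand-ins for NFSv3's unstable WRITE and COMMIT procedures):

    # Hypothetical sketch: unstable writes land in the server's cache;
    # commit forces them to stable storage.

    class Nfsv3Server:
        def __init__(self):
            self.cache = {}              # server memory (lost on a crash)
            self.disk = {}               # stable storage

        def write_unstable(self, blkno, data):
            self.cache[blkno] = data     # fast: acked before hitting disk

        def commit(self):
            self.disk.update(self.cache) # flush dirty data to disk
            return "committed"           # now safe against a server crash

    server = Nfsv3Server()
    for i, block in enumerate([b"aa", b"bb", b"cc"]):
        server.write_unstable(i, block)  # several fast unstable writes...
    server.commit()                      # ...then one commit for durability

In the real protocol the client also keeps its copy of the written data until the commit succeeds, so it can resend the writes if the server crashed in the meantime.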

Page 51: NFS Statelessness

• The NFS server is stateless
• It forgets its state when it crashes; the local FS recovers the local file system state if it was made inconsistent by the crash
• The client keeps retrying operations until the server reboots
• All file operations must be idempotent
  – Reading/writing file blocks is idempotent
  – Creating/deleting files is idempotent – the server's local FS will not allow the same file to be created/deleted twice

Page 52: NFS vs. AFS

• Access granularity:
  – AFS: whole file
  – NFS: block
• Server statefulness:
  – AFS: stateful
  – NFS: stateless
• Client caching:
  – AFS: whole file, on local disk
  – NFS: blocks, in memory only

Page 53: NFS vs. AFS (cont.)

• Cache validation:
  – AFS: server validation with callbacks
  – NFS: client validation using polling
• Delayed propagation:
  – AFS: on file close
  – NFS: on file close, on sync, or by the flush daemon
• Consistency semantics:
  – AFS: weak consistency
  – NFS: weak consistency

Page 54: Comparing Scalability: NFS vs. AFS

[Figure: benchmark results comparing NFS and AFS under increasing load; see the explanation on the next slide]

Page 55: Explanation of Results

• The benchmark performed operations such as:
  – Scan the directory: read the attributes of every file in a recursive directory traversal
  – Read lots of files from the server (every byte of every file)
  – Make (compile a large number of files residing on the remote server)
• AFS scaled better: its performance degraded at a smaller rate as the load increased
• NFS performance suffered during directory scans, reading many files, and compilation – these actions involve many "file open" operations
• NFS needs to check with the server on each "file open" operation, while AFS does not

Page 56: Which Is Better: NFS or AFS?

• AFS showed greater scalability
• But NFS is in widespread use
• Why?
• There are good things about NFS too:
  – NFS is stateless, so the server design is simpler
  – AFS requires caching on the local disk – this takes space on the client's disk
  – Networks have become much faster. What if we re-evaluated the NFS/AFS comparison now? Would AFS still scale better?

Page 57: Weak Consistency Semantics in a DFS?

• Both AFS and NFS provide weak consistency semantics for cached files
• This makes fine-grained file sharing impossible
• Isn't this a bad design decision? What do you think?
• What guided this design decision:
  – Most applications do not perform fine-grained sharing
  – Supporting strong consistency is hard
  – It's a bad idea to optimize the system for the uncommon case, especially if the optimization is so hard to implement
  – For fine-grained sharing, users should use database systems

Page 58: Summary

• Considerations in DFS design:
  – Access granularity (whole-file vs. block)
  – Client caching (disk or memory, whole-file or block)
  – How to provide access and location transparency
  – Update propagation (immediate vs. delayed)
  – Validation (client polling vs. server callbacks)

Page 59: Summary (cont.)

• Usage patterns that drive design choices:
  – Most files are small
  – Most files are accessed in their entirety
  – Most accesses are sequential; random access is rare
  – File references exhibit strong temporal locality
  – Most files are read and written by one user
  – Fine-grained file sharing is rare
  – Users are comfortable with weak consistency semantics

Page 60: Other DFS Architectures (preview)

• There are many other DFS architectures
• Some DFSs allow replication in the face of concurrent updates: multiple clients write data at the same time, and the data is kept consistent on multiple servers
• Some DFSs allow automatic failover: when one server fails, another automatically starts serving files to clients
• Some DFSs allow disconnected operation
• Some DFSs are designed to operate in low-network-bandwidth conditions
• There are file systems with transactional semantics (and you will design one!)
• There are serverless (peer-to-peer) file systems
• We will look at some of these DFSs in future lectures