distributed file systems - brown universitycs.brown.edu/courses/csci1380/s17/lectures/15dfs.pdf ·...
TRANSCRIPT
CS 138 XV–2 Copyright © 2017 Thomas W. Doeppner. All rights reserved.
Outline
• Failure • Basic concepts • NFS version 2 • CIFS • DCE DFS • NFS version 4
CS 138 XV–3 Copyright © 2017 Thomas W. Doeppner. All rights reserved.
Synchronous vs. Asynchronous
• Execution speed – synchronous: bounded – asynchronous: unbounded
• Message transmission delays – synchronous: bounded – asynchronous: unbounded
• Local clock drift rate: – synchronous: bounded – asynchronous: unbounded
CS 138 XV–4 Copyright © 2017 Thomas W. Doeppner. All rights reserved.
Failures
• Omission failures – something doesn’t happen
- process crashes - data lost in transmission - etc.
• Byzantine (arbitrary) failures – something bad happens
- message is modified - message received twice - etc.
• Timing failures – something takes too long
CS 138 XV–5 Copyright © 2017 Thomas W. Doeppner. All rights reserved.
Detecting Crashes
• Synchronous systems – timeouts
• Asynchronous systems – ?
• Fail-stop – an oracle lets us know
CS 138 XV–6 Copyright © 2017 Thomas W. Doeppner. All rights reserved.
Gossip Scenario
Client
Client
Client Client
Replica Manager
Replica Manager
Replica Manager
CS 138 XV–7 Copyright © 2017 Thomas W. Doeppner. All rights reserved.
Data Server
Data Server
Data Server
Data Server
DFS Scenario
Client
Client
Client Client File Server
Cache
Cache
Cac
he C
ache
CS 138 XV–8 Copyright © 2017 Thomas W. Doeppner. All rights reserved.
DFS Components
• Data state – file contents
• Attribute state – size, access-control info, modification time,
etc. • Open-file state
– which files are in use (open) – lock state
CS 138 XV–9 Copyright © 2017 Thomas W. Doeppner. All rights reserved.
Possible Locations data
cache
attr cache
open-file state
Client
data cache
attr cache
open-file state
Client
data cache
attr cache
Server
local file system
open-file state
CS 138 XV–10 Copyright © 2017 Thomas W. Doeppner. All rights reserved.
In Practice …
• Data state – NFS
- weakly consistent - less weak if program uses locks
– CIFS and DFS - strictly consistent
• Lock state – must be strictly consistent
CS 138 XV–11 Copyright © 2017 Thomas W. Doeppner. All rights reserved.
Thursday morning, November 17th At 7:00 a.m.
Maytag, the department’s central file server, will be taken down to kick off a filesystem consistency check.
Linux machines will hang. All Windows users should log off.
Normal operation will resume by 8:30 a.m. if all goes well. All windows users should log off before this time.
Questions/concerns to [email protected]
CS 138 XV–12 Copyright © 2017 Thomas W. Doeppner. All rights reserved.
Failures in a Local File System
On-Disk File System
Cache
0 1 2 3
.
.
.
n–1
1 rw 0
Open-File State Server
Client
Client Client
Client
On-Disk File System
Cache
0 1 2 3
.
.
.
n–1
1 rw 0
Open-File State Server
Client
Client Client
Client
CS 138 XV–13 Copyright © 2017 Thomas W. Doeppner. All rights reserved.
Distributed Failure
On-Disk File System
Cache
0 1 2 3
.
.
.
n–1
1 rw 0
Open-File State Server
Client
Client Client
Client
On-Disk File System
Cache
0 1 2 3
.
.
.
n–1
1 rw 0
Open-File State Server
Client
CS 138 XV–14 Copyright © 2017 Thomas W. Doeppner. All rights reserved.
In Practice … • NFS version 2
– relaxed approach to consistency – handles failures well
• CIFS – strictly consistent – intolerant of failures
• DCE DFS – strictly consistent – sort of tolerant of failures
• NFS version 4 – either relaxed or strictly-consistent – handles failures very well
CS 138 XV–15 Copyright © 2017 Thomas W. Doeppner. All rights reserved.
NFS Version 2
• Released in mid 1980s • Three protocols in one
– file protocol – mount protocol – network lock manager protocol
Basic NFS Extended NFS
CS 138 XV–16 Copyright © 2017 Thomas W. Doeppner. All rights reserved.
Distribution of Components data
cache
attr cache
open-file state
NFSv2 client
data cache
attr cache
open-file state
NFSv2 client
data cache
attr cache
NFSv2 server
local file system
CS 138 XV–17 Copyright © 2017 Thomas W. Doeppner. All rights reserved.
Consistency in Basic NFSv2
file x block 1
file x block 5
file y block 2
file y block 17
Data cache
file x attrs
file y attrs
Attribute cache
validity period
validity period
CS 138 XV–18 Copyright © 2017 Thomas W. Doeppner. All rights reserved.
More …
• All write RPC requests must be handled synchronously on the server
• Close-to-Open consistency – client writes back all changes on close – flushes cached file info on open
CS 138 XV–19 Copyright © 2017 Thomas W. Doeppner. All rights reserved.
Client Crash Recovery
Data Cache
Server
Data Cache
Process A
Data Cache
Process B
CS 138 XV–20 Copyright © 2017 Thomas W. Doeppner. All rights reserved.
Server Crash Recovery
Data Cache
Server
Data Cache
Process A
Data Cache
Process B
CS 138 XV–21 Copyright © 2017 Thomas W. Doeppner. All rights reserved.
File Locking
• State is required on the server! – recovery must take place in the event of client
and server crashes • Locking Protocol is independent of the File
Protocol – locking is advisory – one can lock a file and ask if a file is locked – not required to honor locks
- may read/write a file locked by others!
CS 138 XV–22 Copyright © 2017 Thomas W. Doeppner. All rights reserved.
Network Lock Manager Protocol
Data Cache
Server
Data Cache
Process A
Data Cache
Process B
lockd statd
lockd statd
lockd statd
CS 138 XV–23 Copyright © 2017 Thomas W. Doeppner. All rights reserved.
NFS Version 3
• In use at Brown and in most of the rest of the world
• Basically the same as NFSv2 – improved handling of attributes – commit operation for writes – various other things
CS 138 XV–24 Copyright © 2017 Thomas W. Doeppner. All rights reserved.
CIFS
• Common Internet File System – Microsoft’s distributed file system
• Features – strictly consistent
• Not featured … – depends on reliability of transport protocol – loss of connection == loss of session
CS 138 XV–25 Copyright © 2017 Thomas W. Doeppner. All rights reserved.
History
• Originally a simple means for sharing files – developed by IBM and called server message
block protocol (SMB) – ran on top of NetBIOS
• Microsoft took over – renamed CIFS in late 1990s – uses SMB as RPC-like communication protocol
- runs on NetBIOS - usually layered on TCP - sometimes no NetBIOS, just TCP
CS 138 XV–26 Copyright © 2017 Thomas W. Doeppner. All rights reserved.
Consistency vs. Performance
• Strict consistency is easy … – … if all operations take place on server – no client caching
• Performance is good … – … if all operations take place on client – everything is cached on client
• Put the two together …
ø – or you can do opportunistic locking
CS 138 XV–27 Copyright © 2017 Thomas W. Doeppner. All rights reserved.
Opportunistic Locks
Server
Open A
OK, Op Lock
Client 1 Client 2
Open A
Revoke Op Lock
OK, changes
OK
CS 138 XV–28 Copyright © 2017 Thomas W. Doeppner. All rights reserved.
Back to NFS
• File system name space – how is distributed file system perceived on
clients? • Cross-computer links
– how are files on other computers referred to?
CS 138 XV–29 Copyright © 2017 Thomas W. Doeppner. All rights reserved.
NFS Mount Protocol
Server
Client
Approved List
CS 138 XV–30 Copyright © 2017 Thomas W. Doeppner. All rights reserved.
File Handles
• Servers provide opaque file handles to clients to refer to files
– contents mean nothing to clients – identify files on server
• Clients contact server via mount protocol to obtain file handles of roots of exported file systems
CS 138 XV–31 Copyright © 2017 Thomas W. Doeppner. All rights reserved.
File Handle Contents
• File-System ID – which server file system
• File ID – which file within file system
• Generation # – guards against inode reuse
File-System ID File ID Generation #
CS 138 XV–32 Copyright © 2017 Thomas W. Doeppner. All rights reserved.
Server File Systems
/
B A D E C
H I
Q P O N
U
3 2
T
1 Z
F G
M L K J
S
Y X
R
W V
CS 138 XV–33 Copyright © 2017 Thomas W. Doeppner. All rights reserved.
Client vs. Server Mount Points (1)
/
B A D E C
H I
Q P O N
U
3 2
T
1 Z
F G
M L K J
S
Y X
R
W V
/
C1 C2
mount server:/B /C2
CS 138 XV–34 Copyright © 2017 Thomas W. Doeppner. All rights reserved.
Client vs. Server Mount Points (2)
/
B A D E C
H I
Q P O N
U
3 2
T
1 Z
F G
M L K J
S
Y X
R
W V
/
C1 C2
mount server:/B /C2
mount server:/B/F/K /C2/F/K
CS 138 XV–35 Copyright © 2017 Thomas W. Doeppner. All rights reserved.
Local vs. Global Namespace
• Local namespace – each host configures its own file-system
namespace – NFS clients each mount the appropriate remote
file systems • Global namespace
– all hosts share the same namespace – not done in early NFS
CS 138 XV–36 Copyright © 2017 Thomas W. Doeppner. All rights reserved.
Mount Protocol Problems
• Local namespaces don’t work • Achieve global name space by having each
client mount everything consistently • giving each client a table listing all possible
mounts is administratively difficult • performing all possible mounts is time
consuming • mounting is a “heavyweight” operation
CS 138 XV–37 Copyright © 2017 Thomas W. Doeppner. All rights reserved.
Rather than this …
home etc dev
carlos rohil rodrigo max louisa twd atty haris ishan
CS 138 XV–38 Copyright © 2017 Thomas W. Doeppner. All rights reserved.
… this
home etc dev
Autofs
automount database
CS 138 XV–39 Copyright © 2017 Thomas W. Doeppner. All rights reserved.
Automounting: 2000
• Maintain description of global namespace in global database: NIS
• Do mounts only when needed • Automount times out after period of unuse
CS 138 XV–40 Copyright © 2017 Thomas W. Doeppner. All rights reserved.
Automounting: 2017
• Global namespace maintained in LDAP database
– lightweight directory access protocol - vendor neutral
– everything mounted at boottime - fewer, but larger, file systems
– no timeout
CS 138 XV–41 Copyright © 2017 Thomas W. Doeppner. All rights reserved.
DCE’s DFS fs
system users projects
sol7 osf1
bin usr bin
bin
twd motif dce
Client
Client
Client
Client
Client
Client
FLDB
FLDB
FLDB
Server
Server
Server
Server
CS 138 XV–42 Copyright © 2017 Thomas W. Doeppner. All rights reserved.
DFS Mount Points fs
system users projects
sol7 osf1
bin usr bin
bin
twd motif dce
Client
Client
Client
Client
Client
Client
FLDB
FLDB
FLDB
Server
Server
Server
Server
bin.sol7 bin.osf1 users.twd proj.motif proj.dce
root.cell
bin.sol7 bin.osf1 users.twd proj.motif proj.dce
CS 138 XV–43 Copyright © 2017 Thomas W. Doeppner. All rights reserved.
Strict Consistency in DFS fs
system users projects
sol7 osf1
bin usr bin
bin
twd motif dce
Client
Client
Client
Client
Client
Client
FLDB
FLDB
FLDB
Server
Server
Server
Server
CS 138 XV–44 Copyright © 2017 Thomas W. Doeppner. All rights reserved.
DFS Tokens (1) fs
system users projects
sol7 osf1
bin usr bin
bin
twd motif dce
Client
FLDB
FLDB
FLDB
Server
Server
Server
Server
Client
File A: Read:0-4095
File B: Write:0-512
File A: Read:0-4095
File B: Write:513-4096
CS 138 XV–45 Copyright © 2017 Thomas W. Doeppner. All rights reserved.
DFS Tokens (2) fs
system users projects
sol7 osf1
bin usr bin
bin
twd motif dce
Client
FLDB
FLDB
FLDB
Server
Server
Server
Server
Client
File A: Read:0-4095
File A: Read:0-4095
Write(A, 0-4095)
CS 138 XV–46 Copyright © 2017 Thomas W. Doeppner. All rights reserved.
DFS Tokens (3) fs
system users projects
sol7 osf1
bin usr bin
bin
twd motif dce
Client
FLDB
FLDB
FLDB
Server
Server
Server
Server
Client
File A: Read:0-4095
File A: Read:0-4095
Revoke(A, Read, 0-4095)
CS 138 XV–47 Copyright © 2017 Thomas W. Doeppner. All rights reserved.
DFS Tokens (4) fs
system users projects
sol7 osf1
bin usr bin
bin
twd motif dce
Client
FLDB
FLDB
FLDB
Server
Server
Server
Server
Client File A: Read:0-4095
Grant(A, write, 0-4095)
File A: Write:0-4095
CS 138 XV–48 Copyright © 2017 Thomas W. Doeppner. All rights reserved.
DFS Tokens (5) fs
system users projects
sol7 osf1
bin usr bin
bin
twd motif dce
Client
FLDB
FLDB
FLDB
Server
Server
Server
Server
Client File A: Read:0-4095
File A: Write:0-4095
Read(A, 0-4095)
CS 138 XV–49 Copyright © 2017 Thomas W. Doeppner. All rights reserved.
DFS Tokens (6) fs
system users projects
sol7 osf1
bin usr bin
bin
twd motif dce
Client
FLDB
FLDB
FLDB
Server
Server
Server
Server
Client File A: Read:0-4095
File A: Write:0-4095
CS 138 XV–50 Copyright © 2017 Thomas W. Doeppner. All rights reserved.
DFS Tokens (7) fs
system users projects
sol7 osf1
bin usr bin
bin
twd motif dce
Client
FLDB
FLDB
FLDB
Server
Server
Server
Server
Client File A: Read:0-4095
File A: Read:0-4095
Grant(A, read, 0-4095)
CS 138 XV–51 Copyright © 2017 Thomas W. Doeppner. All rights reserved.
DFS Crash Recovery
Server
Client
Client
CS 138 XV–52 Copyright © 2017 Thomas W. Doeppner. All rights reserved.
DFS Crash Recovery (notes continued)
Server
Client
Client
CS 138 XV–53 Copyright © 2017 Thomas W. Doeppner. All rights reserved.
DFS Recovery Problems
• Client application must participate! – must recognize that operation returns “timed-
out” error – must retry
• Due to semantics of tokens, it isn’t feasible to provide NFS-style hard mount
CS 138 XV–54 Copyright © 2017 Thomas W. Doeppner. All rights reserved.
NFS Version 4
• Better than … – NFS version 2 – NFS version 3 – CIFS – DFS – (why aren’t we running it?)
CS 138 XV–55 Copyright © 2017 Thomas W. Doeppner. All rights reserved.
NFSv4: Why?
• Problems with NFSv3 – doesn’t provide exact Unix file semantics – doesn’t support mandatory locks – doesn’t cope with byzantine failures
CS 138 XV–56 Copyright © 2017 Thomas W. Doeppner. All rights reserved.
Server State
• It’s required! – exact Unix semantics – mandatory locks
CS 138 XV–57 Copyright © 2017 Thomas W. Doeppner. All rights reserved.
State Recovery
• Server crash recovery – clients reclaim state on server
- grace period after crash during which no new state may be established
• Client crash recovery – server detects crash and nullifies client state
information on server
CS 138 XV–58 Copyright © 2017 Thomas W. Doeppner. All rights reserved.
Coping with Non-Responsiveness
• Leases – locks are granted for a fixed period of time
- server-specified lease – if lease not renewed before expiration, server
may (unilaterally) revoke locks and share reservations
- most client RPCs renew leases – clients must contact server periodically
- if clientid is rejected as stale, then server has restarted
- server’s grace period is equal to lease period
CS 138 XV–59 Copyright © 2017 Thomas W. Doeppner. All rights reserved.
Pathological Network Problems
1) Client 1 obtains a lock on a portion of a file 2) There’s a network partition such that client 1
and server can no longer communicate 3) The server crashes and restarts 4) Client 2 obtains a lock on the same portion
of the same file, modifies the file, and then releases the lock
5) The server crashes and restarts and the network partition is repaired
6) Client 1 recontacts the server and reclaims its lock
CS 138 XV–60 Copyright © 2017 Thomas W. Doeppner. All rights reserved.
Coping …
• Possibilities 1) server keeps all client state in non-volatile
storage 2) server keeps all client state in volatile storage
and refuses all reclaim requests (effectively emulating CIFS)
3) something in between …
CS 138 XV–61 Copyright © 2017 Thomas W. Doeppner. All rights reserved.
Compromise
• Keep enough client state in non-volatile memory to know which clients were active at time of crash
– will honor reclaim requests from these clients – will refuse reclaim requests from others
• What to keep: – client ID – the time of the client’s first acquisition of a share
reservation or lock after a server reboot or client lease expiration
– a flag indicating whether the client’s most recent state was revoked because of a lease expiration