NFS & Distributed Systems Issues

Vivek Pai
Dec 6, 2001

Mechanics
• A few words about Project 5
  • It’s not just another webserver project

The Next Project
• Behavioral spec
• Implementation up to you
• Can assume max of 128 procs/threads
• Use a simple counter to implement simple counts
• I may release a tool to make testing easier

Behavioral Spec
• The following behavioral spec is important
• If there aren’t enough free processes/threads, the server should spawn one per second
• If there are too many free, one should be killed per second
• This should not depend on any other activity in the system
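The spawn/kill rule above can be sketched as a once-per-second decision. This is only an illustration, not the project's required design: the thresholds `min_free` and `max_free` and both function names are assumptions, since the slide does not say what counts as "enough" or "too many".

```python
def pool_adjustment(free_workers, min_free, max_free):
    """Return +1 to spawn one worker, -1 to kill one, 0 to leave the pool alone."""
    if free_workers < min_free:
        return 1
    if free_workers > max_free:
        return -1
    return 0


def simulate(free_workers, min_free, max_free, seconds):
    """Apply one adjustment per simulated second and return the free counts seen."""
    history = []
    for _ in range(seconds):
        free_workers += pool_adjustment(free_workers, min_free, max_free)
        history.append(free_workers)
    return history
```

In a real server the once-per-second call would presumably be driven by a timer rather than by request handling, so that it does not depend on other activity in the system.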

Caching Mmap
• Always use mmap
• Keep cache of active & inactive maps
• Total cache size in KB should be limited by command-line argument
• Can only exceed this limit if all mappings are active
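One way to picture the cache rules above is as bookkeeping over mapping sizes. The sketch below tracks sizes only and never calls mmap itself. The class `MapCache`, its method names, and the oldest-first eviction order are assumptions made for illustration; the slide does not specify an eviction policy.

```python
from collections import OrderedDict


class MapCache:
    """Bookkeeping for a size-limited cache of active and inactive mappings."""

    def __init__(self, limit_kb):
        self.limit_kb = limit_kb
        self.active = {}               # path -> [size_kb, reference count]
        self.inactive = OrderedDict()  # path -> size_kb, oldest first

    def total_kb(self):
        active_kb = sum(size for size, _ in self.active.values())
        return active_kb + sum(self.inactive.values())

    def acquire(self, path, size_kb):
        """Mark a mapping as in use, reusing a cached one when possible."""
        if path in self.active:
            self.active[path][1] += 1
        else:
            size_kb = self.inactive.pop(path, size_kb)
            self.active[path] = [size_kb, 1]
            self._evict()

    def release(self, path):
        """Drop one reference; an unreferenced mapping becomes inactive."""
        entry = self.active[path]
        entry[1] -= 1
        if entry[1] == 0:
            del self.active[path]
            self.inactive[path] = entry[0]
            self._evict()

    def _evict(self):
        # Only inactive mappings may be dropped (a real server would munmap here),
        # so the limit can be exceeded only while every mapping is active.
        while self.total_kb() > self.limit_kb and self.inactive:
            self.inactive.popitem(last=False)
```

With a 10 KB limit, two active 6 KB mappings are allowed to total 12 KB; releasing one of them makes it inactive and it is evicted at once.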

Man Pages You May Like
• mmap, munmap
• man -k pthread
• flock
• sleep
• signal
• alarm

Being A Good User
• Do not fork wildly
• Try to test on a non-shared system

Imagine The Following
• Everyone has a desktop machine
• Each machine has a user
• Each user has a home directory
• What problems arise?
  • Can’t move between machines
  • Can’t easily share files with others
  • How does this data get backed up?

Was It Always Like This?
• No
• Think mainframes:
  • Big, centralized box
  • All disks attached
  • Programs ran on box
  • Only terminals/monitors on each desk

How Did We Get Here?
• Mainframe killers advocated little boxes
• Lots of little boxes are a distributed system
• Distributed systems introduce new problems

Why Use Little Boxes?
• Little boxes are cheap
  • Easier to order a PC than a mainframe
• Little boxes are disposable
  • No need for a maintenance contract
• Economy of scale
  • Design cost amortized over more units

Were Minis Immune?
• Minicomputers were “department”-sized versus “company”-sized
• Most information not shared among everyone
• Administrator per department OK
• Shared resources only within department OK

Why Not Just Shared Disk?
• Centralized storage
  • Easier administration/backup
  • Better use of capacity
  • Easier to build large filesystem cache
  • Easier to provide AC/power
• Problem: compare bandwidth
  • 10 Mbit/sec Ethernet at the time
  • Switched versus shared irrelevant
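A rough calculation makes the bandwidth problem concrete. The sketch below ignores protocol overhead, and the client count is a made-up number. One reading of "switched versus shared irrelevant" is that every request still crosses the server's single link, which is the case modeled here.

```python
def ethernet_bytes_per_sec(mbit_per_sec=10):
    """Convert a link speed in Mbit/sec to bytes/sec, ignoring protocol overhead."""
    return mbit_per_sec * 1_000_000 / 8


def per_client_share(clients, mbit_per_sec=10):
    """Best-case share when every client pulls data through the server's one link."""
    return ethernet_bytes_per_sec(mbit_per_sec) / clients
```

At 10 Mbit/sec the link carries at most 1.25 million bytes per second, so ten busy clients (a hypothetical count) would get at most 125,000 bytes per second each.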

New Problem
• Single point of failure
  • Means everything depends on this item
  • In other cases, duplication helps
• Common failures = reboot
  • But all information (state) lost
  • All clients would have to be told
  • We’d need to keep track of all clients
    • On stable storage!

Toward Statelessness
• Make server as dumb as possible
• Shift burdens to client-side
• Client failure only harms that client
• Each operation is self-contained
• Repeating operations permissible
  • Idempotent: repeating an operation gives the same result as doing it once
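The "self-contained operation" idea above can be shown with a toy model. This is not the NFS protocol: `StatelessFileServer` and `Client` are invented names, and a dict stands in for the disk. The point is that the server keeps no per-client position, so a reboot loses nothing the client needs.

```python
class StatelessFileServer:
    """Toy server: every request names the file, the offset, and the byte count."""

    def __init__(self, storage):
        self.storage = storage  # dict of name -> bytes, standing in for the disk

    def read(self, name, offset, count):
        return self.storage[name][offset:offset + count]


class Client:
    """The client, not the server, remembers how far it has read."""

    def __init__(self, server, name):
        self.server = server
        self.name = name
        self.offset = 0

    def read(self, count):
        data = self.server.read(self.name, self.offset, count)
        self.offset += len(data)
        return data
```

Replacing `client.server` with a fresh `StatelessFileServer` over the same storage models a server reboot; the client's next `read` continues from where it left off.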

Idempotency
• Regular Unix system call
  • write(fd, buf, size)
  • Writes size bytes at current position, moves position forward by size
• Idempotent version
  • pwrite(fd, buf, size, offset)
• Idempotent operations in NFS hidden from user programs
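The difference between the two calls can be tried locally. The sketch below uses Python's `os.write` and `os.pwrite` wrappers on a temporary file and assumes a Unix-like system; the helper names `repeat_write` and `repeat_pwrite` are made up for the example.

```python
import os
import tempfile


def repeat_write(times):
    """Issue the same os.write() call `times` times; return the file contents."""
    fd, path = tempfile.mkstemp()
    try:
        for _ in range(times):
            os.write(fd, b"abc")      # file position advances on every call
        with open(path, "rb") as f:
            return f.read()
    finally:
        os.close(fd)
        os.remove(path)


def repeat_pwrite(times):
    """Issue the same os.pwrite() call `times` times; return the file contents."""
    fd, path = tempfile.mkstemp()
    try:
        for _ in range(times):
            os.pwrite(fd, b"abc", 0)  # offset is explicit, so repeats overwrite
        with open(path, "rb") as f:
            return f.read()
    finally:
        os.close(fd)
        os.remove(path)
```

Repeating the `write` grows the file each time, while repeating the `pwrite` leaves the same three bytes, which is why a retried request is harmless in the second case.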

Distributed Caching
• Local filesystems have caches
• Use caches to offload network traffic
  • Same object replicated in many caches
  • No problem for reads
• What happens on write/update?
  • Multiple different copies of data?
  • What happens if it’s metadata?
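The write/update question can be demonstrated with two toy clients that cache what they read. `CachingClient` is a made-up class, a dict stands in for the server, and the write-through behavior is an assumption chosen to keep the example short.

```python
class CachingClient:
    """Toy client that caches blocks it has read and never revalidates them."""

    def __init__(self, server):
        self.server = server  # dict of block id -> bytes, standing in for the server
        self.cache = {}

    def read(self, block):
        if block not in self.cache:
            self.cache[block] = self.server[block]
        return self.cache[block]

    def write(self, block, data):
        self.cache[block] = data
        self.server[block] = data  # write-through, but other caches are not told
```

If clients A and B both read a block and A then writes it, B keeps returning the old contents from its cache even though the server already holds the new data.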

Distributed Write Problem
• Possible approaches
  • Disallow caching on writes
    • What about emacs?
  • Disallow caching of shared files
    • What happens for really big files?
  • Disallow caching of metadata writes
• What disk blocks does OS care about?

Sun’s Write Philosophy
• File block write sharing not an issue
• Very few programs do it
• Correctness depends on program
• Reduce window of opportunity
  • Flush dirty blocks periodically
  • Flush can be asynchronous
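Periodic flushing can be sketched as a client cache that holds dirty blocks until a timer fires. `WriteBackCache`, the `tick` method, and the 30-second default are assumptions for illustration; the slide gives no interval. The flush here is synchronous for simplicity, although the slide notes it can be asynchronous.

```python
class WriteBackCache:
    """Toy client cache that flushes dirty blocks at most once per interval."""

    def __init__(self, server, interval_sec=30):
        self.server = server          # dict of block id -> bytes
        self.interval_sec = interval_sec
        self.dirty = {}
        self.last_flush = 0

    def write(self, block, data):
        self.dirty[block] = data      # stays local until the next flush

    def tick(self, now):
        """Called periodically; pushes dirty blocks once the interval has passed."""
        if now - self.last_flush >= self.interval_sec and self.dirty:
            self.server.update(self.dirty)
            self.dirty.clear()
            self.last_flush = now
```

The interval bounds how long other clients can miss an update, which is the "window of opportunity" being reduced.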

Metadata Operations
• Performed synchronously at server
• Must be reflected to disk
  • Why: stability
  • Overhead: disk op + network
• Can we speed up synchronous ops?
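As a local analogy for "must be reflected to disk", the sketch below creates a file and returns only after forcing both the file and its directory entry to stable storage. It is not how an NFS server is implemented. The function name `create_durably` is made up, and syncing a directory through a read-only descriptor is assumed to work as it does on Linux.

```python
import os


def create_durably(directory, name, data=b""):
    """Create a file and return only after data and directory entry are synced."""
    path = os.path.join(directory, name)
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o644)
    try:
        os.write(fd, data)
        os.fsync(fd)       # file contents and inode
    finally:
        os.close(fd)
    dir_fd = os.open(directory, os.O_RDONLY)
    try:
        os.fsync(dir_fd)   # the new directory entry
    finally:
        os.close(dir_fd)
    return path
```

Each `fsync` waits for the disk, which is the "disk op" part of the overhead; a remote client would also pay a network round trip on top of it.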

New Statelessness Problems
• Stale file handle problem
  • cd ~vivek/temp1/temp in window A
  • rm -r ~vivek/temp1 in window B
  • “ls” in window A
• Stale inode problem
  • Machine A gets file for read
  • Filesystem reformatted by admin
  • Machine A modifies file, tries to write
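Both stale-handle cases can be modeled with a toy server that checks each handle before using it. The sketch assumes handles are (inode number, generation) pairs; the generation counter is an assumption added for illustration and is not stated on the slide. `HandleServer` and its methods are invented names.

```python
import errno


class HandleServer:
    """Toy server whose file handles are (inode number, generation) pairs."""

    def __init__(self):
        self.generation = {}  # inode number -> times this inode has been allocated
        self.data = {}        # inode number -> contents of the live file

    def create(self, inode, contents):
        self.generation[inode] = self.generation.get(inode, 0) + 1
        self.data[inode] = contents
        return (inode, self.generation[inode])  # the handle given to clients

    def remove(self, inode):
        del self.data[inode]

    def read(self, handle):
        inode, generation = handle
        if inode not in self.data or self.generation[inode] != generation:
            raise OSError(errno.ESTALE, "stale file handle")
        return self.data[inode]
```

A handle obtained before `remove` fails with `ESTALE` afterwards, and it keeps failing even if the same inode number is later reused for a new file, because the generation no longer matches.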

What Slows Down Servers
• Network overhead
  • Disk DMA in 4KB pieces
  • Network processing in 1500 byte packets + manipulation
  • Multiple CPUs
• Synchronous operations
  • Nonvolatile memory + recovery
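The mismatch between disk and network units can be put in numbers. The sketch below takes 4KB as 4096 bytes and treats all 1500 bytes of a packet as payload, which is an optimistic simplification since real packets also carry headers.

```python
import math


def packets_per_block(block_bytes=4096, packet_bytes=1500):
    """Number of packets needed to carry one disk block, ignoring header bytes."""
    return math.ceil(block_bytes / packet_bytes)
```

Under these assumptions, one 4KB disk transfer becomes three packets, each of which needs its own per-packet processing.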