![Page 1: CS 162 Section](https://reader036.vdocuments.net/reader036/viewer/2022081515/56816100550346895dd040b9/html5/thumbnails/1.jpg)
CS 162 Section
Lecture 8
![Page 2: CS 162 Section](https://reader036.vdocuments.net/reader036/viewer/2022081515/56816100550346895dd040b9/html5/thumbnails/2.jpg)
What happens when you issue a read() or write() request?
![Page 3: CS 162 Section](https://reader036.vdocuments.net/reader036/viewer/2022081515/56816100550346895dd040b9/html5/thumbnails/3.jpg)
Life Cycle of An I/O Request
[Figure: layers an I/O request passes through: User Program, Kernel I/O Subsystem, Device Driver Top Half, Device Driver Bottom Half, Device Hardware]
![Page 4: CS 162 Section](https://reader036.vdocuments.net/reader036/viewer/2022081515/56816100550346895dd040b9/html5/thumbnails/4.jpg)
When should you return from the read()/write() call?
![Page 5: CS 162 Section](https://reader036.vdocuments.net/reader036/viewer/2022081515/56816100550346895dd040b9/html5/thumbnails/5.jpg)
Interface Timing
• Blocking Interface: “Wait”
– When requesting data (e.g., read() system call), put process to sleep until data is ready
– When writing data (e.g., write() system call), put process to sleep until device is ready for data
• Non-blocking Interface: “Don’t Wait”
– Returns quickly from read or write request with count of bytes successfully transferred to kernel
– Read may return nothing, write may write nothing
• Asynchronous Interface: “Tell Me Later”
– When requesting data, take pointer to user’s buffer, return immediately; later kernel fills buffer and notifies user
– When sending data, take pointer to user’s buffer, return immediately; later kernel takes data and notifies user
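The difference between the blocking and non-blocking interfaces can be sketched with a pipe. This is a minimal illustration, assuming a POSIX system; the variable names are made up for the example, and Python reports "nothing available" on a non-blocking read by raising BlockingIOError rather than returning a zero count.

```python
import os

# Create a pipe: r is the read end, w is the write end.
r, w = os.pipe()

# Non-blocking interface: a read on an empty pipe returns immediately
# (Python signals "no data yet" with BlockingIOError).
os.set_blocking(r, False)
try:
    os.read(r, 16)
    nonblocking_result = "data"
except BlockingIOError:
    nonblocking_result = "would block"

# Blocking interface: once data is in the pipe, read() returns it.
# On an empty pipe this same call would put the process to sleep.
os.write(w, b"hello")
os.set_blocking(r, True)
blocking_result = os.read(r, 16)

os.close(r)
os.close(w)
print(nonblocking_result, blocking_result)
```

The asynchronous interface is not shown here; it would hand the kernel a buffer and receive a completion notification later.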
![Page 6: CS 162 Section](https://reader036.vdocuments.net/reader036/viewer/2022081515/56816100550346895dd040b9/html5/thumbnails/6.jpg)
Magnetic Disk Characteristics
• Cylinder: all the tracks under the head at a given point on all surfaces
• Read/write data is a three-stage process:
– Seek time: position the head/arm over the proper track (into proper cylinder)
– Rotational latency: wait for the desired sector to rotate under the read/write head
– Transfer time: transfer a block of bits (sector) under the read/write head
• Disk Latency = Queuing Time + Controller Time + Seek Time + Rotation Time + Xfer Time
• Highest Bandwidth:
– Transfer large group of blocks sequentially from one track
[Figure: disk geometry labeled with Sector, Track, Cylinder, Head, Platter]
[Figure: Request → Software Queue (Device Driver) → Hardware Controller → Media Time (Seek+Rot+Xfer) → Result]
![Page 7: CS 162 Section](https://reader036.vdocuments.net/reader036/viewer/2022081515/56816100550346895dd040b9/html5/thumbnails/7.jpg)
We have a disk with the following parameters:
• 1TB in size
• 7200 RPM, data transfer rate of 40 Mbytes/s (40 × 10^6 bytes/sec)
• Average seek time of 6ms
• ATA Controller with 2ms controller initiation time
• A block size of 4Kbytes (4096 bytes)
What is the average time to read a random block from the disk?
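One way to work this out is to plug the parameters into the disk latency formula. The sketch below assumes the request queue is empty (queuing time of 0) and that, on average, the desired sector is half a rotation away; the constant names are chosen for the example.

```python
# Parameters from the problem statement above.
RPM = 7200
TRANSFER_RATE = 40e6      # bytes per second
SEEK_MS = 6.0
CONTROLLER_MS = 2.0
BLOCK_BYTES = 4096

# Assumption: no other requests are pending, so queuing time is 0.
QUEUE_MS = 0.0

# One full rotation takes 60,000 ms / RPM; on average the desired
# sector is half a rotation away from the head.
rotation_ms = 0.5 * 60_000 / RPM

# Time for one block to pass under the head.
transfer_ms = BLOCK_BYTES / TRANSFER_RATE * 1000

total_ms = QUEUE_MS + CONTROLLER_MS + SEEK_MS + rotation_ms + transfer_ms
print(f"rotation {rotation_ms:.2f} ms, transfer {transfer_ms:.4f} ms, total {total_ms:.2f} ms")
```

Under these assumptions the total comes to roughly 12.27 ms, dominated by seek and rotation; the transfer itself takes only about 0.1 ms.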
![Page 8: CS 162 Section](https://reader036.vdocuments.net/reader036/viewer/2022081515/56816100550346895dd040b9/html5/thumbnails/8.jpg)
SSD
– No penalty for random access
– Rule of thumb: writes 10x more expensive than reads, and erases 10x more expensive than writes (read 25μs)
– Limited drive lifespan
– Controller maintains pool of empty pages by coalescing used sectors (read, erase, write), also reserves some % of capacity
– Controller uses ECC, performs wear leveling
– OS may provide TRIM information about “deleted” sectors (normally only file system knows about unallocated blocks, not the disk drive)
![Page 9: CS 162 Section](https://reader036.vdocuments.net/reader036/viewer/2022081515/56816100550346895dd040b9/html5/thumbnails/9.jpg)
How will you allocate space on disk?
![Page 10: CS 162 Section](https://reader036.vdocuments.net/reader036/viewer/2022081515/56816100550346895dd040b9/html5/thumbnails/10.jpg)
![Page 11: CS 162 Section](https://reader036.vdocuments.net/reader036/viewer/2022081515/56816100550346895dd040b9/html5/thumbnails/11.jpg)
What is the purpose of a File System?
![Page 12: CS 162 Section](https://reader036.vdocuments.net/reader036/viewer/2022081515/56816100550346895dd040b9/html5/thumbnails/12.jpg)
File System
• Transforms blocks into Files and Directories
• Optimize for access and usage patterns
• Maximize sequential access, allow efficient random access
![Page 13: CS 162 Section](https://reader036.vdocuments.net/reader036/viewer/2022081515/56816100550346895dd040b9/html5/thumbnails/13.jpg)
Linked Allocation: File-Allocation Table (FAT)
![Page 14: CS 162 Section](https://reader036.vdocuments.net/reader036/viewer/2022081515/56816100550346895dd040b9/html5/thumbnails/14.jpg)
If entry size is 16 bits
What is the max size of the FAT?
![Page 15: CS 162 Section](https://reader036.vdocuments.net/reader036/viewer/2022081515/56816100550346895dd040b9/html5/thumbnails/15.jpg)
Given a 512 byte block, what is the max size of the FS?
![Page 16: CS 162 Section](https://reader036.vdocuments.net/reader036/viewer/2022081515/56816100550346895dd040b9/html5/thumbnails/16.jpg)
What is the space overhead of FAT?
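The three FAT questions (16-bit entries, 512-byte blocks) can be answered with a few lines of arithmetic. This is a sketch under a simplifying assumption: every 16-bit value names a usable block, whereas real FAT16 reserves a few values for markers such as end-of-chain.

```python
ENTRY_BITS = 16
BLOCK_BYTES = 512

# Assumption: every 16-bit value names a usable block; real FAT16
# reserves a few values (e.g. end-of-chain markers).
num_entries = 2 ** ENTRY_BITS                 # one FAT entry per disk block
entry_bytes = ENTRY_BITS // 8

fat_bytes = num_entries * entry_bytes         # max size of the FAT itself
fs_bytes = num_entries * BLOCK_BYTES          # max size of the file system
overhead = entry_bytes / BLOCK_BYTES          # FAT bytes per byte of data

print(fat_bytes // 1024, "KB FAT")
print(fs_bytes // 2**20, "MB file system")
print(f"{overhead:.2%} space overhead")
```

With these numbers the FAT is at most 128 KB, the file system at most 32 MB, and the table costs 2 bytes per 512-byte block, or about 0.39%.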
![Page 17: CS 162 Section](https://reader036.vdocuments.net/reader036/viewer/2022081515/56816100550346895dd040b9/html5/thumbnails/17.jpg)
Multilevel Indexed Files (UNIX 4.1)
![Page 18: CS 162 Section](https://reader036.vdocuments.net/reader036/viewer/2022081515/56816100550346895dd040b9/html5/thumbnails/18.jpg)
Where are the i-nodes stored?
![Page 19: CS 162 Section](https://reader036.vdocuments.net/reader036/viewer/2022081515/56816100550346895dd040b9/html5/thumbnails/19.jpg)
![Page 20: CS 162 Section](https://reader036.vdocuments.net/reader036/viewer/2022081515/56816100550346895dd040b9/html5/thumbnails/20.jpg)
What are problems with multi-level indexed files?
![Page 21: CS 162 Section](https://reader036.vdocuments.net/reader036/viewer/2022081515/56816100550346895dd040b9/html5/thumbnails/21.jpg)
Directory Structure
![Page 22: CS 162 Section](https://reader036.vdocuments.net/reader036/viewer/2022081515/56816100550346895dd040b9/html5/thumbnails/22.jpg)
What can the FS do to improve performance?
![Page 23: CS 162 Section](https://reader036.vdocuments.net/reader036/viewer/2022081515/56816100550346895dd040b9/html5/thumbnails/23.jpg)
Bitmap of free blocks
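A free-block bitmap keeps one bit per disk block. The sketch below is a toy version, assuming 1 means free and 0 means in use; the class and method names are hypothetical and the first-fit scan is only one possible allocation policy.

```python
class FreeBlockBitmap:
    """One bit per disk block: 1 = free, 0 = in use (hypothetical layout)."""

    def __init__(self, num_blocks):
        self.num_blocks = num_blocks
        # All blocks start free.
        self.bits = bytearray(b"\xff" * ((num_blocks + 7) // 8))

    def is_free(self, block):
        return bool(self.bits[block // 8] & (1 << (block % 8)))

    def allocate(self):
        """Return the lowest-numbered free block and mark it used, or None."""
        for block in range(self.num_blocks):
            if self.is_free(block):
                self.bits[block // 8] &= ~(1 << (block % 8)) & 0xFF
                return block
        return None

    def free(self, block):
        self.bits[block // 8] |= 1 << (block % 8)


bitmap = FreeBlockBitmap(16)
first, second = bitmap.allocate(), bitmap.allocate()
bitmap.free(first)
third = bitmap.allocate()   # reuses the block that was just freed
```

A real file system would scan for runs of free bits near the file's existing blocks to keep allocation sequential, which a flat bitmap makes cheap to do.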
![Page 24: CS 162 Section](https://reader036.vdocuments.net/reader036/viewer/2022081515/56816100550346895dd040b9/html5/thumbnails/24.jpg)
Variable sized splits
![Page 25: CS 162 Section](https://reader036.vdocuments.net/reader036/viewer/2022081515/56816100550346895dd040b9/html5/thumbnails/25.jpg)
Cylinder Groups
![Page 26: CS 162 Section](https://reader036.vdocuments.net/reader036/viewer/2022081515/56816100550346895dd040b9/html5/thumbnails/26.jpg)
File System Caching
• Optimizations for sequential access:
– Try to store consecutive blocks of a file near each other
– Store inode near data blocks
– Try to locate directory near the inodes it points to
• Buffer cache used to increase file system performance
– Read Ahead Prefetching and Delayed Writes
• Key Idea: Exploit locality by caching data in memory
– Name translations: Mapping from paths → inodes
– Disk blocks: Mapping from block address → disk content
• Buffer Cache: Memory used to cache kernel resources, including disk blocks and name translations
– Can contain “dirty” blocks (blocks not yet on disk)
– Size: adjust boundary dynamically so that the disk access rates for paging and file access are balanced
![Page 27: CS 162 Section](https://reader036.vdocuments.net/reader036/viewer/2022081515/56816100550346895dd040b9/html5/thumbnails/27.jpg)
File System Caching (cont’d)
• Delayed Writes: Writes to files not immediately sent out to disk
– Instead, write() copies data from user space buffer to kernel buffer (in cache)
» Enabled by presence of buffer cache: can leave written file blocks in cache for a while
» If some other application tries to read data before it is written to disk, file system will read from cache
– Flushed to disk periodically (e.g., in UNIX, every 30 sec)
– Advantages:
» Disk scheduler can efficiently order lots of requests
» Disk allocation algorithm can be run with correct size value for a file
» Some files need never get written to disk! (e.g., temporary scratch files written to /tmp often don’t exist for 30 sec)
– Disadvantages:
» What if system crashes before file has been written out?
» Worse yet, what if system crashes before a directory file has been written out? (lose pointer to inode!)
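Delayed writes can be illustrated with a toy write-back cache. This is a sketch, not how any particular kernel does it: a dict stands in for the disk, the class and method names are hypothetical, and sorting dirty blocks by number is a stand-in for the disk scheduler.

```python
class WriteBackCache:
    """Toy buffer cache: block number -> bytes, with a set of dirty blocks."""

    def __init__(self, disk):
        self.disk = disk          # dict standing in for the real device
        self.cache = {}
        self.dirty = set()

    def write(self, block, data):
        # Delayed write: update only the in-memory copy.
        self.cache[block] = data
        self.dirty.add(block)

    def read(self, block):
        # Readers see cached (possibly dirty) data before it reaches disk.
        if block not in self.cache:
            self.cache[block] = self.disk[block]
        return self.cache[block]

    def flush(self):
        # Periodic flush; sorting by block number stands in for the disk scheduler.
        for block in sorted(self.dirty):
            self.disk[block] = self.cache[block]
        self.dirty.clear()


disk = {7: b"old"}
cache = WriteBackCache(disk)
cache.write(7, b"new")
seen_before_flush = (cache.read(7), disk[7])   # cache has new data, disk still old
cache.flush()
```

The window between write() and flush() is exactly where a crash loses data, which is the disadvantage listed above.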
![Page 28: CS 162 Section](https://reader036.vdocuments.net/reader036/viewer/2022081515/56816100550346895dd040b9/html5/thumbnails/28.jpg)
Log Structured and Journaled File Systems
• Better reliability through use of log
– All changes are treated as transactions
– A transaction is committed once it is written to the log
» Data forced to disk for reliability
» Process can be accelerated with NVRAM
– Although file system may not be updated immediately, data preserved in the log
• Difference between “Log Structured” and “Journaled”
– In a Log Structured file system, data stays in log form
– In a Journaled file system, log used for recovery
• For Journaled system:
– Log used to asynchronously update filesystem
» Log entries removed after use
– After crash:
» Remaining transactions in the log performed (“Redo”)
» Modifications done in way that can survive crashes
• Examples of Journaled File Systems:
– Ext3 (Linux), XFS (Unix), HFS+ (Mac), NTFS (Windows), etc.
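The commit, asynchronous update, and redo-after-crash steps for a journaled system can be sketched as follows. This is a toy model with hypothetical names: Python lists and dicts stand in for on-disk structures, and each logged update overwrites a whole value so that redoing it twice is harmless.

```python
class JournaledStore:
    """Toy journaled store: `log` and `data` both stand in for on-disk state."""

    def __init__(self):
        self.log = []     # committed transactions not yet applied
        self.data = {}    # the "file system" proper

    def commit(self, updates):
        # A transaction is committed once it is in the log.
        self.log.append(dict(updates))

    def checkpoint(self):
        # Asynchronously apply logged transactions, then remove them.
        while self.log:
            self.data.update(self.log[0])
            self.log.pop(0)

    def recover(self):
        # After a crash, redo whatever is still in the log. Redo is
        # idempotent here because each update overwrites a whole value.
        self.checkpoint()


store = JournaledStore()
store.commit({"inode 5": "size=10"})
store.commit({"inode 5": "size=20", "dir /tmp": "a -> inode 5"})
# Crash happens here, before any checkpoint: data is still empty.
state_at_crash = dict(store.data)
store.recover()
```

Because an entry is removed from the log only after it has been applied, a crash in the middle of checkpointing just causes that entry to be redone on recovery.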
![Page 29: CS 162 Section](https://reader036.vdocuments.net/reader036/viewer/2022081515/56816100550346895dd040b9/html5/thumbnails/29.jpg)
Key Value Store
• Very large scale storage systems
• Two operations
– put(key, value)
– value = get(key)
• Challenges
– Fault Tolerance → replication
– Scalability → serve get()’s in parallel; replicate/cache hot tuples
– Consistency → quorum consensus to improve put() performance
![Page 30: CS 162 Section](https://reader036.vdocuments.net/reader036/viewer/2022081515/56816100550346895dd040b9/html5/thumbnails/30.jpg)
Key Value Store
• Also called a Distributed Hash Table (DHT)
• Main idea: partition set of key-values across many machines
[Figure: (key, value) pairs spread across many machines]
![Page 31: CS 162 Section](https://reader036.vdocuments.net/reader036/viewer/2022081515/56816100550346895dd040b9/html5/thumbnails/31.jpg)
Chord Lookup
• Each node maintains pointer to its successor
• Route packet (Key, Value) to the node responsible for ID using successor pointers
• E.g., node=4 looks up the node responsible for Key=37
[Figure: Chord ring with nodes 4, 8, 15, 20, 32, 35, 44, 58; node 4 issues lookup(37)]
node=44 is responsible for Key=37
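The successor-pointer lookup on this ring (nodes 4, 8, 15, 20, 32, 35, 44, 58) can be sketched as below. It assumes the usual Chord rule that a key belongs to the first node whose ID is greater than or equal to the key, wrapping around the ring, which matches node 44 owning key 37; the function names are made up for the example.

```python
# Node IDs from the ring in the example above.
NODES = sorted([4, 8, 15, 20, 32, 35, 44, 58])
SUCCESSOR = {n: NODES[(i + 1) % len(NODES)] for i, n in enumerate(NODES)}


def in_interval(key, start, end):
    """True if key lies in the ring interval (start, end], with wraparound."""
    if start < end:
        return start < key <= end
    return key > start or key <= end


def lookup(start_node, key):
    """Follow successor pointers until reaching the node responsible for key."""
    node, path = start_node, [start_node]
    while not in_interval(key, node, SUCCESSOR[node]):
        node = SUCCESSOR[node]
        path.append(node)
    owner = SUCCESSOR[node]
    path.append(owner)
    return owner, path


owner, path = lookup(4, 37)
```

With only successor pointers the request visits every node between 4 and 44, so the cost grows linearly with the number of nodes; Chord's additional routing state is what brings this down to a logarithmic number of steps.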
![Page 32: CS 162 Section](https://reader036.vdocuments.net/reader036/viewer/2022081515/56816100550346895dd040b9/html5/thumbnails/32.jpg)
Chord
• Highly scalable distributed lookup protocol
• Each node needs to know about O(log(M)) other nodes, where M is the total number of nodes
• Guarantees that a tuple is found in O(log(M)) steps
• Highly resilient: works with high probability even if half of nodes fail