CS9222 Advanced Operating Systems
TRANSCRIPT
Unit - V
Dr. A. Kathirvel
Professor & Head/IT - VCEW
Structures – Design Issues – Threads – Process Synchronization – Processor Scheduling – Memory Management – Reliability / Fault Tolerance; Database Operating Systems – Introduction – Concurrency Control – Distributed Database Systems – Concurrency Control Algorithms.
Motivation for Multiprocessors
Enhanced Performance -
Concurrent execution of tasks for increased throughput (between processes)
Exploit Concurrency in Tasks (Parallelism within process)
Fault Tolerance -
graceful degradation in face of failures
Basic MP Architectures
Single Instruction Single Data (SISD) - conventional uniprocessor designs.
Single Instruction Multiple Data (SIMD) - Vector and Array Processors
Multiple Instruction Single Data (MISD) - Not Implemented.
Multiple Instruction Multiple Data (MIMD) - conventional MP designs
MIMD Classifications
Tightly Coupled System - all processors share the same global memory and have the same address spaces (Typical SMP system).
Main memory for IPC and Synchronization.
Loosely Coupled System - memory is partitioned and attached to each processor. Hypercube, Clusters (Multi-Computer).
Message passing for IPC and synchronization.
MP Block Diagram
[Diagram: four CPU modules, each with its own cache and MMU, connected through an interconnection network to multiple main-memory modules (MM).]
Memory Access Schemes
• Uniform Memory Access (UMA)
– Centrally located
– All processors are equidistant (access times)
• Non-Uniform Memory Access (NUMA)
– physically partitioned but accessible by all
– processors have the same address space
• NO Remote Memory Access (NORMA)
– physically partitioned, not accessible by all
– processors have own address space
Other Details of MP
Interconnection technology
Bus
Cross-Bar switch
Multistage Interconnect Network
Caching - Cache Coherence Problem!
Write-update
Write-invalidate
bus snooping
MP OS Structure - 1
Separate Supervisor -
all processors have their own copy of the kernel.
Some share data for interaction
dedicated I/O devices and file systems
good fault tolerance
bad for concurrency
• Master/Slave Configuration
– master monitors the status and assigns work to other processors (slaves)
– Slaves are a schedulable pool of resources for the master
– master can be bottleneck
– poor fault tolerance
MP OS Structure - 2
Symmetric Configuration - Most Flexible.
all processors are autonomous, treated equal
one copy of the kernel executed concurrently across all processors
Synchronize access to shared data structures:
Lock entire OS - Floating Master
Mitigated by dividing OS into segments that normally have little interaction
multithread kernel and control access to resources (continuum)
MP OS Structure - 3
MP Overview
MultiProcessor
SIMD MIMD
Shared Memory
(tightly coupled) Distributed Memory
(loosely coupled)
Master/Slave Symmetric
(SMP)
Clusters
SMP OS Design Issues
Threads - effectiveness of parallelism depends on performance of primitives used to express and control concurrency.
Process Synchronization - disabling interrupts is not sufficient.
Process Scheduling - efficient, policy-controlled task scheduling (processes/threads); global versus per-CPU scheduling
Task affinity for a particular CPU
resource accounting and intra-task thread dependencies
Memory Management - complicated since main memory is shared by possibly many processors. Each processor must maintain its own map tables for each process
cache coherence
memory access synchronization
balancing overhead with increased concurrency
Reliability and fault Tolerance - degrade gracefully in the event of failures
SMP OS design issues - 2
Typical SMP System
[Diagram: four CPU/cache/MMU modules on a 500MHz system/memory bus, with main memory (about 50ns access) and a bridge to the I/O subsystem; typical I/O bus: 33MHz/32-bit (132MB/s) or 66MHz/64-bit (528MB/s), carrying ether, scsi and video devices plus system functions (timer, BIOS, reset) and interrupts.
Issues: memory contention, limited bus bandwidth, I/O contention, cache coherence.]
Some Definitions
Parallelism: degree to which a multiprocessor application achieves parallel execution
Concurrency: Maximum parallelism an application can achieve with unlimited processors
System Concurrency: kernel recognizes multiple threads of control in a program
User Concurrency: User space threads (coroutines) provide a natural programming model for concurrent applications. Concurrency not supported by system.
Process and Threads
Process: encompasses
set of threads (computational entities)
collection of resources
Thread: Dynamic object representing an execution path and computational state.
threads have their own computational state: PC, stack, user registers and private data
Remaining resources are shared amongst threads in a process
Threads
Effectiveness of parallel computing depends on the performance of the primitives used to express and control parallelism
Threads separate the notion of execution from the Process abstraction
Useful for expressing the intrinsic concurrency of a program regardless of resulting performance
Three types: User threads, kernel threads and Light Weight Processes (LWP)
User Level Threads
User level threads - supported by user level (thread) library
Benefits:
no modifications required to kernel
flexible and low cost
Drawbacks:
cannot block without blocking the entire process
no parallelism (not recognized by kernel)
Kernel Level Threads
Kernel level threads - kernel directly supports multiple threads of control in a process. The thread is the basic scheduling entity.
Benefits:
coordination between scheduling and synchronization
less overhead than a process
suitable for parallel application
Drawbacks:
more expensive than user-level threads
generality leads to greater overhead
Light Weight Processes (LWP)
Kernel supported user thread
Each LWP is bound to one kernel thread.
a kernel thread may not be bound to an LWP
LWP is scheduled by kernel
User threads scheduled by library onto LWPs
Multiple LWPs per process
Thread operations in user space:
create, destroy, synch, context switch
kernel threads implement a virtual processor
Coarse-grain scheduling in the kernel - preemptive scheduling
Communication between kernel and threads library
shared data structures.
Software interrupts (user upcalls or signals). Example, for scheduling decisions and preemption warnings.
Kernel scheduler interface - allows dissimilar thread packages to coordinate.
First Class threads (Psyche OS)
Scheduler Activations
An activation:
serves as execution context for running thread
notifies thread of kernel events (upcall)
space for kernel to save processor context of current user thread when stopped by kernel
kernel is responsible for processor allocation => preemption by kernel.
Thread package responsible for scheduling threads on available processors (activations)
Support for Threading
• BSD: process model only; 4.4BSD enhancements.
• Solaris: provides user threads, kernel threads and LWPs.
• Mach: supports kernel threads and tasks; thread libraries provide the semantics of user threads, LWPs and kernel threads.
• Digital UNIX: extends Mach to provide the usual UNIX semantics; Pthreads library.
Process Synchronization: Motivation
Sequential execution runs correctly but concurrent execution (of the same program) runs incorrectly.
Concurrent access to shared data may result in data inconsistency
Maintaining data consistency requires mechanisms to ensure the orderly execution of cooperating processes
Let’s look at an example: consumer-producer problem.
Producer-Consumer Problem
Producer:
    while (true) {
        /* produce an item and put in nextProduced */
        while (count == BUFFER_SIZE)
            ;   // do nothing
        buffer[in] = nextProduced;
        in = (in + 1) % BUFFER_SIZE;
        count++;
    }
count: the number of items in the buffer (initialized to 0)
Consumer:
    while (true) {
        while (count == 0)
            ;   // do nothing
        nextConsumed = buffer[out];
        out = (out + 1) % BUFFER_SIZE;
        count--;
        // consume the item in nextConsumed
    }
What can go wrong in concurrent execution?
Race Condition
count++ could be implemented as
    register1 = count
    register1 = register1 + 1
    count = register1
count-- could be implemented as
    register2 = count
    register2 = register2 - 1
    count = register2
Consider this execution interleaving with "count = 5" initially:
    S0: producer executes register1 = count          {register1 = 5}
    S1: producer executes register1 = register1 + 1  {register1 = 6}
    S2: consumer executes register2 = count          {register2 = 5}
    S3: consumer executes register2 = register2 - 1  {register2 = 4}
    S4: producer executes count = register1          {count = 6}
    S5: consumer executes count = register2          {count = 4}
What are all possible values from concurrent execution?
How to prevent race conditions? Define a critical section in each process around reads and writes of the common variables, and make sure that only one process can execute in its critical section at a time.
What synchronization code goes into the entry and exit sections to prevent race conditions?
    do {
        entry section
        critical section
        exit section
        remainder section
    } while (TRUE);
Solution to Critical-Section Problem
1. Mutual Exclusion - If process Pi is executing in its critical section, then no other processes can be executing in their critical sections
2. Progress - If no process is executing in its critical section and there exist some processes that wish to enter their critical section, then the selection of the processes that will enter the critical section next cannot be postponed indefinitely
3. Bounded Waiting - A bound must exist on the number of times that other processes are allowed to enter their critical sections after a process has made a request to enter its critical section and before that request is granted
What is the difference between
Progress and Bounded Waiting?
Peterson’s Solution
Simple 2-process solution
Assume that the LOAD and STORE instructions are atomic; that is, cannot be interrupted.
The two processes share two variables:
int turn;
Boolean flag[2]
The variable turn indicates whose turn it is to enter the critical section.
The flag array is used to indicate if a process is ready to enter the critical section. flag[i] = true implies that process Pi is ready!
Algorithm for Process Pi:
    while (true) {
        flag[i] = TRUE;             // entry section
        turn = j;
        while (flag[j] && turn == j)
            ;
        CRITICAL SECTION
        flag[i] = FALSE;            // exit section
        REMAINDER SECTION
    }
Mutual exclusion
Only one process enters critical section at a time.
Proof: can both processes pass the while loop (and enter critical section) at the same time?
Progress
Selection of a process waiting to enter the critical section is not postponed indefinitely.
Proof: can Pi wait at the while loop forever (after Pj leaves critical section)?
Bounded Waiting
Limited time in waiting for other processes.
Proof: can Pj win the critical section twice while Pi waits?
Algorithm for Process Pi:
    while (true) {
        flag[i] = TRUE;             // entry section
        turn = j;
        while (flag[j] && turn == j)
            ;
        CRITICAL SECTION
        flag[i] = FALSE;            // exit section
        REMAINDER SECTION
    }
Algorithm for Process Pj:
    while (true) {
        flag[j] = TRUE;             // entry section
        turn = i;
        while (flag[i] && turn == i)
            ;
        CRITICAL SECTION
        flag[j] = FALSE;            // exit section
        REMAINDER SECTION
    }
Synchronization Hardware
Many systems provide hardware support for critical section code
Uniprocessors – could disable interrupts
Currently running code would execute without preemption
Generally too inefficient on multiprocessor systems
Operating systems using this not broadly scalable
Modern machines provide special atomic hardware instructions
Atomic = non-interruptable
TestAndSet(target): atomically test a memory word and set its value
Swap(a, b): atomically swap the contents of two memory words
TestAndSet Instruction
• Definition:
boolean TestAndSet (boolean *target) {
    boolean rv = *target;
    *target = TRUE;
    return rv;
}
Solution using TestAndSet
Shared boolean variable lock, initialized to false.
Solution:
    while (true) {
        while (TestAndSet(&lock))
            ;                      /* do nothing: entry section */
        // critical section
        lock = FALSE;              // exit section
        // remainder section
    }
Does it satisfy mutual exclusion?
How about progress and bounded waiting?
How to fix this?
Bounded-Waiting TestAndSet
• Shared variables:
    boolean waiting[n];   // initialized to FALSE
    boolean lock;         // initialized to FALSE
• Solution:
    do {
        waiting[i] = TRUE;
        while (waiting[i] && TestAndSet(&lock))
            ;                         // entry section
        waiting[i] = FALSE;
        // critical section
        j = (i + 1) % n;
        while ((j != i) && !waiting[j])
            j = (j + 1) % n;
        if (j == i)
            lock = FALSE;             // no one waiting: release the lock
        else
            waiting[j] = FALSE;       // pass the critical section to Pj
        // remainder section
    } while (TRUE);
Mutual exclusion
Proof: can two processes pass the while loop (and enter critical section) at the same time?
Bounded Waiting
Limited time in waiting for other processes.
What is waiting[] for? When is waiting[i] set to FALSE?
Proof: how long does Pi wait until waiting[i] becomes FALSE?
Progress
Proof: the exit section either unblocks at least one process's waiting[] entry or sets the lock to FALSE.
Swap Instruction
• Definition:
void Swap (boolean *a, boolean *b) {
    boolean temp = *a;
    *a = *b;
    *b = temp;
}
Solution using Swap
Shared Boolean variable lock, initialized to FALSE. Each process has a local Boolean variable key.
Solution:
while (true) {
key = TRUE;
while ( key == TRUE)
Swap (&lock, &key );
// critical section
lock = FALSE;
// remainder section
}
Mutual exclusion? Progress and Bounded Waiting?
Notice a performance problem with Swap & TestAndSet solutions?
Processor Scheduling
Processor scheduling (PS): ready tasks are assigned to the processors so that performance is maximized.
Because tasks cooperate and communicate through shared variables or message passing, processor scheduling in a multiprocessor system is a difficult problem.
PS is very critical to the performance of multiprocessor systems because a naive scheduler can degrade performance substantially.
Issues in Processor Scheduling
Three major causes of performance degradation are:
Preemption inside spinlock-controlled critical sections
    This situation occurs when a task is preempted inside a critical section while other tasks are spinning on the lock to enter the same critical section.
Cache corruption
    The big chunk of data needed by the previous task must be purged from the cache and new data must be brought in, so a processor switched to another task sees a very high miss ratio.
Context switching overheads
    Execution of a large number of instructions to save and restore the registers, to initialize the registers, to switch address spaces, etc.
Co-Scheduling in the Medusa OS
Co-scheduling was proposed by Ousterhout for the Medusa OS on Cm*.
All runnable tasks of an application are scheduled on the processors simultaneously.
Context switching occurs between applications rather than between tasks of several different applications.
Problem: tasks waste resources in lock-spinning while they wait for a preempted task to release the critical section.
Smart Scheduling
Proposed by Zahorjan et al. - two nice features:
It avoids preempting a task when the task is inside its critical section.
It avoids rescheduling tasks that were busy-waiting at the time of their preemption until the task that is executing the corresponding critical section releases it.
This eliminates the resource waste due to a processor spinning on a lock.
It does nothing, however, to reduce the overhead due to context switching or the performance degradation due to cache corruption.
Scheduling in the NYU Ultracomputer
Proposed by Edler et al.; it combines the strategies of the previous two scheduling techniques.
Tasks can be formed into groups and scheduled in any of the following ways:
A task is scheduled or preempted in the normal manner.
All tasks in a group are scheduled or preempted simultaneously.
Tasks in a group are never preempted.
Memory Management: The Mach Operating System
The virtual memory management of the Mach OS, developed at CMU.
Design Issues:
    Portability
    Data sharing
    Protection
    Efficiency
The Mach Kernel: the basic primitives necessary for building parallel and distributed applications.
The Mach Kernel
[Diagram: user processes run in user space on top of OS emulators (4.3 BSD, System V, HP/UX, others), which form a software emulation layer; beneath them the Mach microkernel runs in kernel space.]
The kernel manages five principal abstractions:
1. Processes.
2. Threads.
3. Memory objects.
4. Ports.
5. Messages.
Process Management in Mach
[Diagram: a Mach process contains threads and an address space, and communicates with the kernel through ports: a process port, a bootstrap port, an exception port, and registered ports.]
The process port is used to communicate with the kernel.
The bootstrap port is used for initialization when a process starts up.
The exception port is used to report exceptions caused by the process. Typical exceptions are division by zero and illegal instruction executed.
The registered ports are normally used to provide a way for the process to communicate with standard system servers.
Process States
A process can be runnable or blocked.
If a process is runnable, those threads that are also runnable can be scheduled and run.
If a process is blocked, its threads may not run, no matter what state they are in.
Process Management Primitives
Create - Create a new process, inheriting certain properties
Terminate - Kill a specified process
Suspend - Increment the suspend counter
Resume - Decrement the suspend counter; if it reaches 0, unblock the process
Priority - Set the priority for current or future threads
Assign - Tell which processor new threads should run on
Info - Return information about execution time, memory usage, etc.
Threads - Return a list of the process' threads
Threads
Mach threads are managed by the kernel. Thread creation and destruction are done by the kernel.
Fork - Create a new thread running the same code as the parent thread
Exit - Terminate the calling thread
Join - Suspend the caller until a specified thread exits
Detach - Announce that the thread will never be joined (waited for)
Yield - Give up the CPU voluntarily
Self - Return the calling thread's identity to it
Scheduling algorithm
When a thread blocks, exits, or uses up its quantum, the CPU it is running on first looks on its local run queue to see if there are any active threads.
If the local run queue's count is nonzero, run the highest-priority thread, starting the search at the queue specified by the hint.
If the local run queue is empty, the same algorithm is applied to the global run queue; the global queue must be locked first.
Scheduling
[Diagram: each processor set has a global run queue of 32 priority levels (0 = high, 31 = low), with a count of queued threads and a hint to the highest occupied priority; e.g. processor set 1's queue is free with count 6 and hint 2, while processor set 2's is busy with count 7 and hint 4.]
Memory Management in Mach
Mach has a powerful, elaborate, and highly flexible memory management system based on paging.
The code of Mach’s memory management is split into three parts. The first part is the pmap module, which runs in the kernel and is concerned with managing the MMU.
The second part, the machine-independent kernel code, is concerned with processing page faults, managing address maps, and replacing pages.
The third part of the memory management code runs as a user process called a memory manager. It handles the logical part of the memory management system, primarily management of the backing store (disk).
Virtual Memory
The conceptual model of memory that Mach user processes see is a large, linear virtual address space. The address space is supported by paging.
A key concept relating to the use of virtual address space is the memory object. A memory object can be a page or a set of pages, but it can also be a file or other, more specialized data structure.
An address space with allocated regions, mapped objects, and unused addresses:
[Diagram: text region, data region, stack region, and a mapped "file xyz" region, separated by unused address ranges.]
System calls for virtual address space manipulation
Allocate Make a region of virtual address space usable
Deallocate Invalidate a region of virtual address space
Map Map a memory object into the virtual address space
Copy Make a copy of a region at another virtual address
Inherit Set the inheritance attribute for a region
Read Read data from another process’ virtual address
space
Write Write data to another process’ virtual address space
Memory Sharing
[Diagram: processes 1, 2 and 3 map the same file into their address spaces, sharing the mapped pages.]
Operation of Copy-on-Write
[Diagram: just after process creation, the prototype's address space and the child's address space (pages 0-7) map the same physical pages; the shared mappings are marked read-only (RO) so that the first write to any page can be trapped, while unshared pages remain read-write (RW).]
Operation of Copy-on-Write
[Diagram: after the child writes to page 7, the kernel copies it to a new physical page (page 8) and maps the copy read-write into the child's address space; the remaining pages are still shared read-only.]
Advantages of Copy-on-write
1. some pages are read-only, so there is no need to copy them.
2. other pages may never be referenced, so they do not have to be copied.
3. still other pages may be writable, but the child may deallocate them rather than using them.
Disadvantages of Copy-on-write
1. the administration is more complicated.
2. requires multiple kernel traps, one for each page that is ultimately written.
3. does not work over a network.
External Memory Managers
Each memory object that is mapped in a process’ address space must have an external memory manager that controls it. Different classes of memory objects are handled by different memory managers.
Three ports are needed to do the job.
The object port is created by the memory manager and will later be used by the kernel to inform the memory manager about page faults and other events relating to the object.
The control port is created by the kernel itself so that the memory manager can respond to these events.
The name port is used as a kind of name to identify the object.
Distributed Shared Memory in Mach
The idea is to have a single, linear, virtual address space that is shared among processes running on computers that do not have any physical shared memory. When a thread references a page that it does not have, it causes a page fault. Eventually, the page is located and shipped to the faulting machine, where it is installed so that the thread can continue executing.
Communication in Mach
The basis of all communication in Mach is a kernel data structure called a port.
When a thread in one process wants to communicate with a thread in another process, the sending thread writes the message to the port and the receiving thread takes it out.
Each port is protected to ensure that only authorized processes can send to it and receive from it.
Ports support unidirectional communication. A port that can be used to send a request from a client to a server cannot also be used to send the reply back from the server to the client. A second port is needed for the reply.
A Mach port
Message queue
Current message count
Maximum messages
Port set this port belongs to
Counts of outstanding capabilities
Capabilities to use for error reporting
Queue of threads blocked on this port
Pointer to the process holding the RECEIVE capability
Index of this port in the receiver’s capability list
Pointer to the kernel object
Miscellaneous items
Message passing via a port
[Diagram: a sending thread performs a send on a port and a receiving thread performs a receive; the port and its message queue are managed by the kernel.]
Capabilities
[Diagram: processes A and B each have a capability list (slots 1-4) maintained in the kernel; A holds a capability with the SEND right on one port and B holds a capability with the RECEIVE right on another (ports X and Y).]
Primitives for Managing Ports
Allocate Create a port and insert its capability in the capability list
Destroy Destroy a port and remove its capability from the list
Deallocate Remove a capability from the capability list
Extract_right Extract the n-th capability from another process
Insert_right Insert a capability in another process’ capability list
Move_member Move a capability into a capability set
Set_qlimit Set the number of messages a port can hold
Sending and Receiving Messages
mach_msg(&hdr, options, send_size, rcv_size, rcv_port, timeout, notify_port);
The first parameter, hdr, is a pointer to the message to be sent or to the place where the incoming message is put, or both.
The second parameter, options, contains a bit specifying that a message is to be sent, and another one specifying that a message is to be received. Another bit enables a timeout, given by the timeout parameter. Other bits in options allow a SEND that cannot complete immediately to return control anyway, with a status report being sent to notify_port later.
The send_size and rcv_size parameters tell how large the outgoing message is and how many bytes are available for storing the incoming message, respectively.
Rcv_port is used for receiving messages. It is the capability name of the port or port set being listened to.
The Mach message format
[Diagram: a Mach message consists of a header - message size, capability index for the destination port, capability index for the reply port, message kind, function code, plus destination-rights, reply-rights and complex/simple bits - followed by a message body of descriptor/data-field pairs (descriptor 1, data field 1, descriptor 2, data field 2, ...) that is not examined by the kernel.]
Complex message field descriptor
[Diagram: a 32-bit descriptor packing the data field size in bits (12 bits), the data field type (8 bits: bit, byte, unstructured word, 8/16/32-bit integer, character, 32 booleans, floating point, string, capability), the number of items in the data field (8 bits), and four flag bits: out-of-line data present or not, short or long form descriptor, and whether the sender keeps or deallocates out-of-line data.]
Reliability/Fault Tolerance: the SEQUOIA System
The Sequoia system - a tightly coupled, fault-tolerant multiprocessor system.
Attains a high level of fault tolerance by performing fault detection in hardware and fault recovery in the OS.
Design Issues:
    Fault detection and isolation
    Fault recovery
    Efficiency
The Sequoia Architecture
Reliability/Fault Tolerance: the SEQUOIA System
Fault detection:
    Error-detecting codes
    Comparison of duplicated operations
    Protocol monitoring
Fault recovery:
    Recovery from processor failures
    Recovery from main memory failures
    Recovery from I/O failures
Database Operating Systems
Database systems have traditionally been implemented as applications on top of a general-purpose OS.
Requirements of a DBOS:
    Transaction management
    Support for complex, persistent data
    Buffer management
Concurrency Control
CC is the process of controlling concurrent access to a database to ensure that the correctness of the database is maintained.
Database systems
Set of shared data objects that can be accessed by users.
Transactions
A transaction consists of a sequence of read, compute, and write steps that refer to the data objects of a database.
Conflicts
Transactions conflict if they access the same data objects.
Transaction processing
A transaction is executed by executing its actions one by one from the beginning to the end.
A concurrency control model of DBS
Three software modules:
Transaction manager (TM)
    Supervises the execution of transactions
Scheduler
    Responsible for enforcing concurrency control
Data manager (DM)
    Manages access to the stored database
Distributed Database System
A distributed database is a database in which storage devices are not all attached to a common processing unit such as the CPU.
It may be stored in multiple computers, located in the same physical location; or may be dispersed over a network of interconnected computers.
Unlike parallel systems, in which the processors are tightly coupled and constitute a single database system, a distributed database system consists of loosely coupled sites that share no physical components.
Model of Distributed Database System
Distributed Database System
Motivations: a DDBS offers several advantages over a centralized database system, such as:
    Sharing
    Higher system availability (reliability)
    Improved performance
    Easy expandability
    Large databases
Transaction Processing Model
Serializability condition in DDBS
Data replication
Complications due to Data replication
Fully Replicated Database Systems:
    1. Enhanced reliability
    2. Improved responsiveness
    3. No directory management
    4. Easier load balancing
Concurrency Control Algorithms
It controls the interleaving of conflicting actions of transactions so that the integrity of a database is maintained, i.e., their net effect is a serial execution.
Basic synchronization primitives
Locks
    A transaction can request, hold, or release the lock on a data object.
    A data object can be locked in two modes: exclusive and shared.
Timestamps
    A unique number is assigned to a transaction or a data object, chosen from a monotonically increasing sequence.
    Commonly generated using Lamport's scheme.
Lock based algorithms
Static locking
Two Phase Locking (2PL)
Problems with 2PL: Price for Higher concurrency
2PL in DDBS
Timestamp Based locking
Conflict Resolution
Wait, Restart, Die, Wound
Non-two-phase locking
Timestamp Based Algorithms
Basic timestamp ordering algorithm
Thomas Write Rule (TWR)
Multiversion timestamp ordering algorithm
Conservative timestamp ordering algorithm
Thank You