gr3ans
TRANSCRIPT
-
8/2/2019 GR3ANS
1/15
1. What are the implications of DSM for page replacement policies?
Explain. [8]
When there is no free space in memory, a page may need to be
replaced. Traditionally, we use Least Recently Used (LRU) replacement. In DSM, LRU may need to be modified, since data may be accessed in different modes such as
shared, private, read-only, and writable. Private pages may be replaced before shared
pages, as shared pages would have to be moved over the network from their
owner. Read-only pages can simply be deleted, as their owners have a copy.
Once a page is selected for replacement, the DSM must ensure that the page is
not lost forever. One option is to swap the page onto disk. Another option is to
use reserved memory, wherein each node is responsible for certain portions of
the global virtual space and reserves memory space for those portions.
In a DSM system, as in any system using virtual memory, it can happen
that a page is needed but that there is no free page frame in memory to hold it.
When this situation occurs, a page must be evicted from memory to make room
for the needed page. Two subproblems immediately arise: which page to evict
and where to put it.
To a large extent, the choice of which page to evict can be made using
traditional virtual memory algorithms, such as some approximation to the least
recently used algorithm. As with conventional algorithms, it is worth keeping
track of which pages are 'clean' and which are 'dirty'. In the context of DSM, a
replicated page that another process owns is always a prime candidate to evict
because it is known that another copy exists. Consequently, the page does not
have to be saved anywhere. If a directory scheme is being used to keep track of copies, however, the owner or page manager must be informed of this decision.
When a kernel wishes to replace a page belonging to a DSM segment, it can choose between pages that are read-only, pages that are read-only but which
the kernel owns, and pages that the kernel has write access to (and has modified).
Of these options, the least cost is associated with deleting an unowned read-only
page (which the kernel can always obtain again if necessary); if the kernel
deletes a read-only page that it owns, then it loses a potential advantage if
write access is soon required; and if it deletes a modified page, then it must first
transfer it elsewhere, over the network or onto a local disk. So the kernel would
prefer to delete pages in the order given. Of course, it can discriminate between pages of equal status by choosing, for example, the least recently accessed.
2. Explain in which respects DSM is suitable or unsuitable for client-server
systems. [4]
DSM is in general less suitable for client-server systems, where clients
normally view server-held resources as abstract data and access them by request
(for reasons of modularity and protection). However, servers can provide DSM
that is shared between clients. For example, memory-mapped files that are
shared, and for which some degree of consistency is maintained, are forms of DSM. (Mapped files were introduced with the MULTICS operating system.)
DSM may be suitable for client-server systems in some application
domains, e.g. where a set of clients share server responses.
DSM is unsuitable for client-server systems in that it is not conducive to
heterogeneous working. Furthermore, for security we would need a shared region
per client, which would be expensive.
3. Write short notes on
a. Thrashing in DSM:
Thrashing is said to occur when the system spends a large amount of time
transferring shared data blocks from one node to another, compared to the time
spent doing the useful work of executing application processes. Thrashing may
occur in the following situations:
When interleaved data accesses made by processes on two or more nodes
causes a data block to move back and forth from one node to another in
quick succession (a ping-pong effect)
When blocks with read only permissions are repeatedly invalidated soon
after they are replicated.
If not properly handled, thrashing degrades system performance considerably.
Therefore, steps must be taken to solve this problem. The following methods may
be used to solve the thrashing problem in DSM systems:
1. Providing application-controlled locks: Locking data to prevent other
nodes from accessing that data for a short period of time can reduce
thrashing. An application-controlled lock can be associated with each data
block to implement this method.
2. Nailing a block to a node for a minimum amount of time: Another
method to reduce thrashing is to disallow a block from being taken away from a
node until a minimum amount of time t elapses after its allocation to that
node. The time t can either be fixed statically or be tuned dynamically on
the basis of access patterns.
3. Tailoring the coherence algorithm to the shared-data usage patterns:
Thrashing can also be minimized by using different coherence protocols
for shared data having different characteristics.
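The "nailing" method above (method 2) amounts to a simple timer check before a block may migrate. The sketch below is illustrative only; the class name, the fixed hold time and the explicit `now` parameter are assumptions made for the example.

```python
# Minimal sketch of "nailing" a block to a node: a block may not be
# migrated away until a minimum hold time t has elapsed since it was
# allocated to the node.

class NailedBlock:
    def __init__(self, min_hold):
        self.min_hold = min_hold      # minimum time t the block stays put
        self.acquired_at = None

    def allocate(self, now):
        # Record when the block arrived at this node.
        self.acquired_at = now

    def may_migrate(self, now):
        # Refuse migration requests until t has elapsed since allocation.
        return now - self.acquired_at >= self.min_hold

blk = NailedBlock(min_hold=0.5)
blk.allocate(now=10.0)
early = blk.may_migrate(now=10.2)   # still nailed: request refused
late = blk.may_migrate(now=10.6)    # hold time elapsed: may migrate
```

As the text notes, t could also be tuned dynamically from observed access patterns rather than fixed as here.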
b. Reliability and Ordering of Multicast:
Reliable Multicast:
Reliable multicast satisfies criteria for validity, integrity and agreement; it
provides operations R-multicast and R-deliver.
Integrity - a correct process p delivers m at most once; furthermore, p ∈ group(m) and m was supplied to a multicast operation by sender(m).
Validity - if a correct process multicasts m, it will eventually deliver m.
Agreement - if a correct process delivers m, then all correct processes in group(m) will eventually deliver m.
The integrity property is analogous to that for reliable one-to-one
communication. The validity property guarantees liveness for the sender. It
may seem an unusual property, because it is asymmetric (it mentions delivery only at the sender itself). The agreement condition is
related to atomicity: the "all or nothing" property applied to the delivery of a message
to a group.
Implementing reliable multicast over B-multicast:
The following algorithm implements reliable multicast with the
primitives R-multicast and R-deliver, and allows processes to belong to
several closed groups simultaneously. To R-multicast a message, a process B-multicasts the message to the processes in the destination group (including itself). When the
message is B-delivered, the recipient in turn B-multicasts the message (if it is not the original sender) and then R-delivers it. This algorithm clearly satisfies
validity, since a correct process will eventually B-deliver the message to itself.
By the integrity property of the underlying communication channels used in B-multicast, the algorithm also satisfies the integrity property.
Fig. Reliable Multicast algorithm
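The algorithm in the figure can be sketched as below. The simulation scaffolding (the Process class, direct method calls standing in for the network, string message ids) is an assumption of the sketch; the key step from the algorithm is that a process relays the message before R-delivering it.

```python
# Sketch of R-multicast built on B-multicast: on first B-delivery of a
# message, a process re-B-multicasts it to the group before R-delivering,
# so that if the sender crashes part-way through its sends, some correct
# process still completes the delivery (agreement).

class Process:
    def __init__(self, pid, group):
        self.pid = pid
        self.group = group          # list of all processes, incl. self
        self.received = set()       # message ids already B-delivered
        self.r_delivered = []       # messages handed to the application

    def b_multicast(self, msg_id, payload):
        for q in self.group:        # basic multicast: one-to-one sends
            q.b_deliver(msg_id, payload)

    def r_multicast(self, msg_id, payload):
        self.b_multicast(msg_id, payload)

    def b_deliver(self, msg_id, payload):
        if msg_id in self.received:
            return                  # duplicate: deliver at most once
        self.received.add(msg_id)
        # Relay before delivering, so every correct process sees the
        # message even if the original sender crashed mid-multicast.
        self.b_multicast(msg_id, payload)
        self.r_delivered.append(payload)

group = []
procs = [Process(i, group) for i in range(3)]
group.extend(procs)
procs[0].r_multicast("m1", "hello")
```

The duplicate check on `received` is what makes the relaying safe: each message is R-delivered exactly once per process despite being B-multicast by every member.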
Reliable multicast over IP multicast:
An alternative realization of R-multicast is to use a combination of IP
multicast, piggybacked acknowledgements and negative acknowledgements. This
R-multicast protocol is based on the observation that IP multicast communication
is often successful. The hold-back queue is not strictly necessary for reliability, as in the
implementation using IP multicast, but it simplifies the protocol by allowing sequence numbers to represent sets of messages. Hold-back queues are also used
for the ordering protocols.
[Diagram: incoming messages pass through message processing into a hold-back queue; when delivery guarantees are met, they move to the delivery queue and are delivered.]
Fig. The hold-back queue for arriving multicast messages
Ordered Multicast:
The basic multicast algorithm delivers messages to processes in an
arbitrary order, due to arbitrary delays in the underlying one-to-one send operations.
Common ordering requirements:
FIFO ordering
Causal ordering
Total ordering
FIFO ordering: if a correct process issues multicast(g, m) and then multicast(g, m'), then every correct process that
delivers m' will deliver m before m'. This is a partial ordering.
Fig. FIFO Ordering
Causal ordering: if multicast(g, m) → multicast(g, m'), where → is the
happened-before relation induced only by messages sent between the members of
g, then any correct process that delivers m' will deliver m before m'. This is a partial
ordering.
Fig. Causal Ordering
Total ordering: if a correct process delivers message m before it delivers m',
then any other correct process that delivers m' will deliver m before m'.
Fig. Total Ordering
Example: Bulletin Board
Reliable multicast is required if every user is to receive every posting
eventually.
Consider an application in which users post messages to bulletin boards.
Each user runs a bulletin-board application process, and every topic of discussion has its own process group.
When a user posts a message to a bulletin board, the application multicasts
the user's posting to the corresponding group.
Each user's process is a member of the group for the topic he/she is
interested in, so the user will receive just the postings concerning that topic.
The following figure shows the postings as they appear to a particular user. FIFO
ordering is desirable, since then every posting from a given user - A.Hanlon, say -
will be received in the same order. Causal ordering is needed to guarantee that a reply such as Re: Microkernels (25) or Re: Mach (27) is delivered after the posting it refers to. If multicast delivery were totally ordered, then the numbering would be consistent between users (users could refer unambiguously, for example, to message 24).
Fig. Display from bulletin board program
Implementing FIFO Ordering:
FIFO-ordered multicast, with operations FO-multicast and FO-deliver, for
non-overlapping groups can be implemented on top of any basic multicast.
Each process p holds:
S(p,g): a count of the messages p has sent to g, and
R(q,g): the sequence number of the latest message to g that p has delivered from q.
For p to FO-multicast a message to g, it piggybacks S(p,g) on the message,
B-multicasts it and increments S(p,g) by 1.
On receipt of a message from q with sequence number S, p checks
Bulletin board: os.interesting
Item  From         Subject
23    A.Hanlon     Mach
24    G.Joseph     Microkernels
25    A.Hanlon     Re: Microkernels
26    T.LHeureux   RPC performance
27    M.Walker     Re: Mach
end
whether S = R(q,g) + 1. If so, it FO-delivers the message and sets R(q,g) := S.
If S > R(q,g) + 1, then p places the message in the hold-back queue until the intervening
messages have been delivered. (Note that B-multicast does eventually
deliver messages unless the sender crashes.)
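The receive-side check above can be sketched as follows. The class, the per-sender dictionaries and the direct method calls standing in for B-deliver are assumptions of the sketch.

```python
# Sketch of the FO-multicast/FO-deliver bookkeeping described above.
# The sender piggybacks its count S(p,g); the receiver keeps R(q,g),
# the sequence number of the latest message delivered from each sender q.
# Messages that arrive early wait in a hold-back queue.

from collections import defaultdict

class FifoProcess:
    def __init__(self):
        self.S = 0                          # S(p,g): messages I have sent to g
        self.R = defaultdict(int)           # R(q,g): last seq delivered per sender
        self.holdback = defaultdict(dict)   # sender -> {seq: message}
        self.delivered = []                 # FO-delivered messages, in order

    def fo_multicast(self, msg):
        self.S += 1
        return (self.S, msg)                # piggyback S(p,g), then B-multicast

    def on_b_deliver(self, sender, seq, msg):
        self.holdback[sender][seq] = msg
        # FO-deliver every consecutive message now available from this sender.
        while self.R[sender] + 1 in self.holdback[sender]:
            self.R[sender] += 1
            self.delivered.append(self.holdback[sender].pop(self.R[sender]))

p = FifoProcess()
p.on_b_deliver("q", 2, "second")   # S > R(q,g) + 1: held back
p.on_b_deliver("q", 1, "first")    # S = R(q,g) + 1: releases both, in order
```

Note that the sequence numbers are per (sender, group) pair, which is why this construction only gives FIFO ordering, not total ordering across senders.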
Implementing Total Ordering:
The basic approach to implementing total ordering is to assign totally
ordered identifiers to multicast messages, so that each process makes the same
ordering decision based upon these identifiers.
There are two methods for assigning identifiers to messages.
1. Total Ordering Using a Sequencer
2. The ISIS Algorithm for Total Ordering
1. Total Ordering Using a Sequencer:
The first method is for a process called a sequencer to assign sequence numbers to messages, as shown in the following algorithm. A process wishing to TO-multicast a message m to g attaches a unique
identifier, id(m), and sends the message to the sequencer for g, sequencer(g), as well as to the members of g. The
process sequencer(g) maintains a group-specific sequence number s(g), which it uses to
assign increasing and consecutive sequence numbers to the messages that it B-delivers. It announces the sequence numbers by B-multicasting order messages to g.
1. Algorithm for group member p
2. Algorithm for sequencer of g
Fig. Total ordering using a sequencer
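The two roles in the figure can be sketched as below. The class names, the id/order bookkeeping and the direct calls standing in for B-multicast are assumptions of the sketch; the essential point is that members TO-deliver a message only when its agreed sequence number is the next one expected.

```python
# Sketch of total ordering using a sequencer: senders send <m, id> to the
# group and the sequencer; the sequencer B-multicasts <"order", id, s_g>;
# members deliver messages in the sequencer's announced order.

class Sequencer:
    def __init__(self):
        self.s = 0                    # group-specific sequence number s(g)

    def order(self, msg_id):
        self.s += 1
        return (msg_id, self.s)       # announce <id, s(g)> to the group

class Member:
    def __init__(self):
        self.pending = {}             # id -> message, awaiting an order
        self.orders = {}              # seq -> id
        self.next_seq = 1
        self.delivered = []

    def on_message(self, msg_id, msg):
        self.pending[msg_id] = msg
        self._try_deliver()

    def on_order(self, msg_id, seq):
        self.orders[seq] = msg_id
        self._try_deliver()

    def _try_deliver(self):
        # TO-deliver while the next-expected sequence number has both an
        # order announcement and the corresponding message.
        while (self.next_seq in self.orders
               and self.orders[self.next_seq] in self.pending):
            msg_id = self.orders.pop(self.next_seq)
            self.delivered.append(self.pending.pop(msg_id))
            self.next_seq += 1

seq = Sequencer()
m = Member()
m.on_message("a", "msg A")
m.on_message("b", "msg B")
# The sequencer happens to order b before a; every member follows suit.
m.on_order(*seq.order("b"))
m.on_order(*seq.order("a"))
```

Because every member obeys the same order announcements, all members deliver in the same (total) order regardless of the order in which the messages themselves arrived.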
2. The ISIS Algorithm for Total Ordering:
The processes collectively agree on the assignment of sequence numbers
to messages in a distributed fashion. The following figure shows the ISIS
algorithm for total ordering.
1. The process P1 B-multicasts a message to the members of the group;
2. the receiving processes propose sequence numbers and return them to the sender; and
3. the sender uses the proposed numbers to generate an agreed number.
Fig. The ISIS algorithm for total ordering
The algorithm for process p to multicast a message m to group g is as follows.
Each process q keeps:
A(q,g): the largest agreed sequence number it has seen for g, and
P(q,g): its own largest proposed sequence number.
1. Process p B-multicasts <m, i> to g, where i is a unique identifier for m.
2. Each process q replies to the sender p with a proposal for the message's agreed
sequence number:
P(q,g) := Max(A(q,g), P(q,g)) + 1
Each process provisionally assigns the proposed sequence number to the message and places it in its
hold-back queue.
3. p collects all the proposed sequence numbers and selects the largest as the next agreed sequence number, a.
It then B-multicasts <i, a> to g. Each recipient sets A(q,g) := Max(A(q,g), a), attaches a to the
message and re-orders its hold-back queue.
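The propose/agree exchange of steps 2 and 3 can be sketched as below; the member class and the way one member is seeded with a larger A(q,g) are assumptions made purely to illustrate the max-based agreement.

```python
# Sketch of the ISIS agreed-sequence-number exchange described above.
# Each member proposes Max(A(q,g), P(q,g)) + 1; the sender picks the
# largest proposal as the agreed number a; members then set
# A(q,g) := Max(A(q,g), a).

class IsisMember:
    def __init__(self):
        self.A = 0   # largest agreed sequence number seen
        self.P = 0   # largest sequence number this member has proposed

    def propose(self):
        self.P = max(self.A, self.P) + 1
        return self.P

    def agree(self, a):
        self.A = max(self.A, a)

members = [IsisMember() for _ in range(3)]
members[1].A = 4                       # this member has seen more traffic
proposals = [m.propose() for m in members]
agreed = max(proposals)                # sender picks the largest proposal
for m in members:
    m.agree(agreed)
```

Taking the maximum of all proposals guarantees the agreed number is at least as large as every member's proposal, so no member can have already delivered a message under a larger number.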
Implementing Causal Ordering:
An algorithm by Birman (1991) provides causally ordered multicast in non-overlapping, closed groups. It uses the happened-before relation (on multicast
messages only) and vector timestamps, which count the number of multicast
messages from each process that happened before the next message to be
multicast. The algorithm provides the causally ordered multicast operations
CO-multicast and CO-deliver. Each process p(i) (i = 1, 2, ..., N) maintains its own
vector timestamp. To CO-multicast a message m to group g, a process adds 1 to its own
entry in the vector timestamp and B-multicasts m together with the vector timestamp.
When a process B-delivers m, it places it in a hold-back queue before it can CO-deliver it, until all messages earlier in the causal ordering (i.e. any
message that causally preceded it) have been delivered.
Fig. Causal ordering using vector timestamps
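The hold-back test in the figure can be sketched as below. The scaffolding (class, packet tuples, fixed N) is an assumption of the sketch; the delivery condition used is the standard one for this algorithm: a message with timestamp V from process j is deliverable when V[j] equals the local entry for j plus one and every other entry of V is no greater than the local one.

```python
# Sketch of CO-multicast with vector timestamps: a sender increments its
# own entry and piggybacks its vector; a receiver holds <m, V> from
# process j back until V[j] == local[j] + 1 and V[k] <= local[k] for k != j.

N = 3  # number of processes in the (closed) group

class CoProcess:
    def __init__(self, i):
        self.i = i
        self.V = [0] * N          # this process's vector timestamp
        self.holdback = []
        self.delivered = []

    def co_multicast(self, msg):
        self.V[self.i] += 1
        return (self.i, list(self.V), msg)   # B-multicast (sender, V, m)

    def on_b_deliver(self, packet):
        self.holdback.append(packet)
        self._try_deliver()

    def _deliverable(self, j, Vm):
        return (Vm[j] == self.V[j] + 1 and
                all(Vm[k] <= self.V[k] for k in range(N) if k != j))

    def _try_deliver(self):
        # Keep sweeping the hold-back queue until no message is deliverable.
        progress = True
        while progress:
            progress = False
            for pkt in list(self.holdback):
                j, Vm, msg = pkt
                if self._deliverable(j, Vm):
                    self.holdback.remove(pkt)
                    self.delivered.append(msg)
                    self.V[j] = Vm[j]
                    progress = True

p0, p2 = CoProcess(0), CoProcess(2)
m1 = p0.co_multicast("first")         # timestamp [1, 0, 0]
m2 = p0.co_multicast("second")        # timestamp [2, 0, 0]
p2.on_b_deliver(m2)                   # held back: "first" not yet seen
p2.on_b_deliver(m1)                   # releases both in causal order
```

Here p2 receives the two messages in the wrong order, but the vector-timestamp test forces "first" to be CO-delivered before "second".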
c. Consistency models:
A consistency model basically refers to the degree of consistency that has
to be maintained for the shared-memory data for a certain set of applications. It is defined as a set of rules that applications must obey if they
want the DSM system to provide the degree of consistency guaranteed by the
consistency model. The various consistency models available are listed below:
1. Strict Consistency Model
2. Sequential Consistency Model
3. Causal Consistency Model
4. Pipelined Random-Access Memory (PRAM) Consistency Model
5. Processor Consistency Model
6. Weak Consistency Model
7. Release Consistency Model
1. Strict Consistency Model:
The strict consistency model is the strongest form of memory
coherence, having the most stringent consistency requirements. A shared-memory system is said to support the strict consistency model if the value returned by
a read operation on a memory address is always the same as the value written by the
most recent write operation to that address, irrespective of the locations of the
processes performing the read and write operations. Implementation of the strict
consistency model requires the existence of an absolute global time, so that
memory read/write operations can be correctly ordered to make the meaning of
"most recent" clear.
2. Sequential Consistency Model:
The sequential consistency model was proposed by Lamport. A shared-memory system is said to support the sequential consistency model if all the processes see the same order of all memory access operations on the shared
memory. The exact order in which the memory accesses are interleaved does not
matter. A DSM system supporting the sequential consistency model can be
implemented by ensuring that no memory operation is started until all the
previous ones have been completed.
3. Causal Consistency Model:
The causal consistency model was proposed by Hutto and Ahamad. A shared-memory system is said to support the causal consistency model if all causally related write operations are seen by all processes in the same (correct) order.
To implement a shared memory supporting the causal consistency model,
there is a need to keep track of which memory reference operation is dependent
on which other memory reference operations.
4. Pipelined Random-Access Memory (PRAM) Consistency Model:
The pipelined random-access memory (PRAM) consistency model was
proposed by Lipton and Sandberg. The PRAM consistency model is simple
and easy to implement and also has good performance. It can be implemented by
simply sequencing the write operations performed at each node independently of
those performed on other nodes; the write operations performed by a single process are seen by other processes as if they are in a pipeline.
5. Processor Consistency Model:
The processor consistency model, proposed by Goodman, is very similar to
the PRAM consistency model, with the additional restriction of memory
coherence. That is, processor-consistent memory is both coherent and adheres to
the PRAM consistency model. Memory coherence means that for any memory
location, all processes agree on the same order of all write operations to that location. In effect, the processor consistency model ensures
that all the write operations performed on the same memory location are seen by
all processes in the same order.
6. Weak Consistency Model:
The weak consistency model, proposed by the Dubois et al. ,is designed
to take advantage of the following two characteristics common to many
applications:
It is not necessary to show the change in memory done by every write operation
to the processor. The result of the several write operation can be combined and
sent to other processor only when they need.
Isolated access to shared variables is rare. That is, in many applications a process
makes several accesses to a set of shared variable and then no access at all to the
variables in this set for long time .
A DSM system that supports the weak consistency model uses a special
variable called a synchronization variable. For supporting the weak consistency
model, the following requirements must be met:
1. All accesses to synchronization variables must obey sequential consistency semantics.
2. All previous write operations must be completed everywhere before an access
to a synchronization variable is allowed.
3. All previous accesses to synchronization variables must be completed before
access to a non-synchronization variable is allowed.
7. Release Consistency Model:
The release consistency model provides a mechanism to tell the
system clearly whether a process is entering or exiting a critical
section, so that the system can perform only the first or the
second operation when a synchronization variable is accessed by a process. This is
achieved by using two synchronization variables, called acquire and release.
An acquire is used by a process to tell the system that it is about to enter a critical
section. A release is used by a process to tell the system that it has just exited a
critical section.
For supporting the release consistency model, the following requirements must be
met:
1. All accesses to the acquire and release synchronization variables obey processor
consistency semantics.
2. All previous acquires performed by a process must be completed successfully
before the process is allowed to perform a data access operation on the memory.
3. All previous data access operations performed by a process must be completed successfully before a release access by the process is allowed.
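The acquire/release discipline above can be illustrated with a small sketch in which writes made inside a critical section are buffered locally and only propagated on release. The shared store, class and variable names are assumptions made for the illustration; no real DSM runtime works exactly this way.

```python
# Illustrative sketch of release consistency: writes inside an
# acquire/release pair are buffered and become visible in the (simulated)
# shared store only when the release completes, matching requirement 3.

shared_store = {}

class ReleaseConsistentNode:
    def __init__(self):
        self.buffer = {}
        self.in_cs = False

    def acquire(self):
        # Tell the system we are about to enter a critical section.
        self.in_cs = True

    def write(self, var, value):
        assert self.in_cs, "writes must happen inside acquire/release"
        self.buffer[var] = value        # buffered, not yet visible

    def release(self):
        # Tell the system we have exited: flush buffered writes so all
        # previous data accesses complete before the release finishes.
        shared_store.update(self.buffer)
        self.buffer.clear()
        self.in_cs = False

node = ReleaseConsistentNode()
node.acquire()
node.write("x", 42)
visible_before = "x" in shared_store    # not yet released
node.release()
visible_after = shared_store.get("x")   # visible after release
```

The point of the model is exactly this batching: other nodes need not see each individual write, only the combined result at the release.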
d. Access matrix:
An access matrix is large and sparse. Most domains have no access at all
to most objects, that is, most of the entries are empty. Therefore, a direct
implementation of an access matrix as a two-dimensional matrix would be very inefficient and expensive.
The two most commonly used methods that have gained popularity in
contemporary distributed systems for implementing access matrix are Access
Control Lists (ACLs) and capabilities. These two methods are described below.
Access Control Lists: In this method, the access matrix is decomposed by
columns, and each column of the matrix is implemented as an access list
for the object corresponding to that column. The empty entries of the
matrix are not stored in the access list. Therefore, for each object, a list of ordered pairs (domain, rights) is maintained, which defines all domains
with a nonempty set of access rights for that object.
Capabilities: Rather than decomposing the access matrix by columns,
in this method the access matrix is decomposed by rows, and each row is
associated with its domain. Obviously, the empty entries are discarded.
Therefore, for each domain, a list of ordered pairs (object, rights) is
maintained, which defines all objects for which the domain possesses
some access rights. Each (object, rights) pair is called a capability, and the list associated with a domain is called a capability list. A capability is
used for the following two purposes:
1. To uniquely identify an object
2. To allow its holder to access the object it identifies in one or
more permission modes.
A capability therefore has two basic parts:
I. an object identifier, and
II. rights information.
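The two decompositions described above can be sketched as follows: the same sparse access matrix stored column-wise as ACLs (one list per object) and row-wise as capability lists (one list per domain). The domain, object and right names are made-up examples.

```python
# Sketch of ACLs vs. capability lists as two views of one sparse
# access matrix: (domain, object) -> set of rights. Empty entries are
# simply absent, so neither representation stores them.

from collections import defaultdict

matrix = {
    ("alice", "file1"): {"read", "write"},
    ("bob",   "file1"): {"read"},
    ("alice", "file2"): {"read"},
}

# Column-wise: one access control list per object,
# holding (domain, rights) pairs.
acls = defaultdict(dict)
for (domain, obj), rights in matrix.items():
    acls[obj][domain] = rights

# Row-wise: one capability list per domain,
# holding (object, rights) capabilities.
caps = defaultdict(dict)
for (domain, obj), rights in matrix.items():
    caps[domain][obj] = rights
```

An access check by object ("who may touch file1?") is cheap with ACLs, while a check by domain ("what may alice touch?") is cheap with capability lists; both avoid storing the empty entries of the full matrix.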
4. Discuss whether message passing or DSM is preferable for fault-tolerant
applications. [6]
Consider two processes executing at failure-independent computers. In a
message-passing system, if one process has a bug that leads it to send spurious
messages, the other may protect itself to a certain extent by validating the
messages it receives. If a process fails part-way through a multi-message operation, then transactional techniques can be used to ensure that data are left in
a consistent state. Now consider that the processes share memory (DSM),
whether it is physically shared memory or page-based DSM. Then one of them
may adversely affect the other if it fails, because now one process may update a
shared variable without the knowledge of the other.
For example, it could incorrectly update shared variables due to a bug. It could fail after starting but not completing an update to several variables. If
processes use middleware-based DSM, then they may have some protection against
aberrant processes.
For example, processes using the Linda programming primitives must
explicitly request items (tuples) from the shared memory. They can validate
these, just as a process may validate messages.
There is no definitive answer as to whether DSM or message passing is
preferable for fault-tolerant applications.
5. Why should we want to implement page-based DSM largely at user-
level, and what is required to achieve this? [8]
The basic model to be considered is one in which a collection of
processes shares a segment of DSM. The segment is mapped to the same range
of addresses in each process, so that meaningful pointer values can be stored in
the segment. The processes execute at computers equipped with a paged memory
management unit. We shall assume that there is only one process per computer
that accesses the DSM segment. There may in reality be several such processes
at a computer. However, these could then share DSM pages directly (the same
page frame can be used in the page tables used by the different processes). The
only complication would be to coordinate fetching and propagating updates to a page when two or more local processes access it. The following figure shows the
system model for the page-based DSM.
Fig. System model for page-based DSM
The page-based approach has the advantage of imposing no particular
structure on the DSM, which appears as a sequence of bytes. In principle, it
enables programs designed for a shared-memory multiprocessor to run on computers without shared memory, with little or no adaptation. Microkernels such
as Mach and Chorus provide native support for DSM (and other memory
abstractions; the Mach virtual memory facilities are described at
www.cdk4.net/mach). Page-based DSM is more usually implemented largely at
user level, to take advantage of the flexibility that this provides.
A page-based DSM implementation at user level facilitates:
1. application-specific memory (consistency) models, and
2. protocol options.
The implementation utilizes kernel support for user-level page fault
handlers. UNIX and some variants of Windows provide this facility.
Microprocessors with 64-bit address spaces widen the scope for page-based
DSM by relaxing constraints on address space management.
To achieve this, we require the kernel to export interfaces for:
(a) handling page faults from user level (in UNIX, as a signal), and
(b) setting page protections from user level (see the UNIX memory map system
calls).
Q.6 Explain why thrashing is an important issue in DSM systems and what
methods are available for dealing with it? [8]
Ans: Thrashing is an important issue in DSM systems because:
In a DSM system, data blocks migrate between nodes on demand.
Therefore, if two nodes compete for write access to a single data item, the
corresponding data block may be transferred back and forth at such a high
rate that no real work can get done.
The problem of thrashing may occur when data items in the same data
block are being updated by multiple nodes at the same time, causing large
numbers of data block transfers among the nodes without much progress
in the execution of the application. While a thrashing problem may occur
with any block size, it is more likely with large block sizes, as different
regions in the same block may be updated by processes on different
nodes, causing data block transfers that are not necessary with smaller
block sizes.
In DSM systems, the following methods are available for dealing with thrashing:
1. Providing application-controlled locks: Locking data to prevent other nodes from accessing that data for a short period of time can reduce
thrashing. An application-controlled lock can be associated with each data
block to implement this method.
2. Nailing a block to a node for a minimum amount of time: Another
method to reduce thrashing is to disallow a block from being taken away from a
node until a minimum amount of time t elapses after its allocation to that
node. The time t can either be fixed statically or be tuned dynamically on
the basis of access patterns.
3. Tailoring the coherence algorithm to the shared-data usage patterns:
Thrashing can also be minimized by using different coherence protocols for shared data having different characteristics.
11. Explain how to deal with the problem of differing data representations
for a middleware based implementation of DSM on heterogeneous
computers? [4]
The middleware calls can include marshalling and unmarshalling procedures. In a page-based implementation, pages would have to be marshalled
and unmarshalled by the kernels that send and receive them. This implies
maintaining a description of the layout and types of the data in the DSM
segment, which can be converted to and from the local representation. A
machine that takes a page fault needs to describe which page it needs in a way
that is independent of the machine architecture. Different page sizes will create
problems here, as will data items that straddle page boundaries, or items that
straddle page boundaries when unmarshalled. A solution would be to use a
virtual page as the unit of transfer, whose size is the maximum of the page
sizes of all the architectures supported. Data items would be laid out so that the
same set of items occurs in each virtual page for all architectures. Pointers can
also be marshalled, as long as the kernels know the layout of the data and can
express pointers as pointing to an object with a description of the form "offset o
in data item i", where o and i are expressed symbolically rather than physically.
This activity implies huge overheads.