gr3ans
TRANSCRIPT
-
8/2/2019 GR3ANS
1/15
1. What are the implications of DSM for page replacement policies?
Explain. [8]
When there is no free space in memory, a page may need to be
replaced. Traditionally, we use Least Recently Used (LRU) replacement. In DSM, LRU may need to be modified, since data may be accessed in different modes such as
shared, private, read-only, and writable. Private pages may be replaced before shared
pages, as shared pages would have to be moved over the network from their
owner. Read-only pages can simply be deleted, as their owners have a copy.
Once a page is selected for replacement, the DSM must ensure that the page is
not lost forever. One option is to swap the page onto disk. Another option is to
use reserved memory, wherein each node is responsible for certain portions of
the global virtual space and reserves memory space for those portions.
In a DSM system, as in any system using virtual memory, it can happen
that a page is needed but that there is no free page frame in memory to hold it.
When this situation occurs, a page must be evicted from memory to make room
for the needed page. Two subproblems immediately arise: which page to evict
and where to put it.
To a large extent, the choice of which page to evict can be made using
traditional virtual memory algorithms, such as some approximation to the least
recently used algorithm. As with conventional algorithms, it is worth keeping
track of which pages are 'clean' and which are 'dirty'. In the context of DSM, a
replicated page that another process owns is always a prime candidate to evict
because it is known that another copy exists. Consequently, the page does not
have to be saved anywhere. If a directory scheme is being used to keep track of copies, however, the owner or page manager must be informed of this decision.
When a kernel wishes to replace a page belonging to a DSM segment, it can choose between pages that are read-only, pages that are read-only but which
the kernel owns, and pages that the kernel has write access to (and has modified).
Of these options, the least cost is associated with deleting an unowned read-only
page (which the kernel can always obtain again if necessary); if the kernel
deletes a read-only page that it owns, then it loses a potential advantage if
write access is soon required; and if it deletes a modified page, then it must first
transfer it elsewhere, over the network or onto a local disk. So the kernel would
prefer to delete pages in the order given. Of course, it can discriminate between pages of equal status by choosing, for example, the least recently accessed.
2. Explain in which respects DSM is suitable or unsuitable for client-server
systems. [4]
DSM is in general less suitable for client-server systems, where clients
normally view server-held resources as abstract data and access them by request
(for reasons of modularity and protection). However, servers can provide DSM
that is shared between clients. For example, memory-mapped files that are
shared, and for which some degree of consistency is maintained, are forms of DSM. (Mapped files were introduced with the MULTICS operating system.)
DSM may be suitable for client-server systems in some application
domains, e.g. where a set of clients share server responses.
DSM is unsuitable for client-server systems in that it is not conducive to
heterogeneous working. Furthermore, for security we would need a shared region
per client, which would be expensive.
3. Write short notes on
a. Thrashing in DSM:
Thrashing is said to occur when the system spends a large amount of time
transferring shared data blocks from one node to another, compared to the time
spent doing the useful work of executing application processes. Thrashing may
occur in the following situations:
When interleaved data accesses made by processes on two or more nodes
causes a data block to move back and forth from one node to another in
quick succession (a ping-pong effect)
When blocks with read only permissions are repeatedly invalidated soon
after they are replicated.
If not properly handled, thrashing degrades system performance considerably.
Therefore, steps must be taken to solve this problem. The following methods may
be used to solve the thrashing problem in DSM systems:
1. Providing application-controlled locks: Locking data to prevent other
nodes from accessing that data for a short period of time can reduce
thrashing. An application-controlled lock can be associated with each data
block to implement this method.
2. Nailing a block to a node for a minimum amount of time: Another
method to reduce thrashing is to disallow a block from being taken away from a
node until a minimum amount of time t elapses after its allocation to that
node. The time t can either be fixed statically or be tuned dynamically on
the basis of access patterns.
3. Tailoring the coherence algorithm to the shared-data usage patterns:
Thrashing can also be minimized by using different coherence protocols
for shared data having different characteristics.
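The "nailing" method above (method 2) amounts to a simple timer check before a block may migrate. The sketch below is illustrative only; the class name, the fixed hold time and the explicit `now` parameter are assumptions made for the example.

```python
# Minimal sketch of "nailing" a block to a node: a block may not be
# migrated away until a minimum hold time t has elapsed since it was
# allocated to the node.

class NailedBlock:
    def __init__(self, min_hold):
        self.min_hold = min_hold      # minimum time t the block stays put
        self.acquired_at = None

    def allocate(self, now):
        # Record when the block arrived at this node.
        self.acquired_at = now

    def may_migrate(self, now):
        # Refuse migration requests until t has elapsed since allocation.
        return now - self.acquired_at >= self.min_hold

blk = NailedBlock(min_hold=0.5)
blk.allocate(now=10.0)
early = blk.may_migrate(now=10.2)   # still nailed: request refused
late = blk.may_migrate(now=10.6)    # hold time elapsed: may migrate
```

As the text notes, t could also be tuned dynamically from observed access patterns rather than fixed as here.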
b. Reliability and Ordering of Multicast:
Reliable Multicast:
Reliable multicast satisfies criteria for validity, integrity and agreement; it
provides operations R-multicast and R-deliver.
Integrity - a correct process p delivers m at most once; furthermore, p ∈ group(m) and m was supplied to a multicast operation by sender(m).
Validity - if a correct process multicasts m, it will eventually deliver m.
Agreement - if a correct process delivers m, then all correct processes in group(m) will eventually deliver m.
The integrity property is analogous to that for reliable one-to-one
communication. The validity property guarantees liveness for the sender. It
may seem an unusual property, because it is asymmetric (it mentions delivery only at the sender itself). The agreement condition is
related to atomicity: the "all or nothing" property applied to the delivery of a message
to a group.
Implementing reliable multicast over B-multicast:
The following algorithm implements reliable multicast with the
primitives R-multicast and R-deliver, and allows processes to belong to
several closed groups simultaneously. To R-multicast a message, a process B-multicasts the message to the processes in the destination group (including itself). When the
message is B-delivered, the recipient in turn B-multicasts the message (if it is not the original sender) and then R-delivers it. This algorithm clearly satisfies
validity, since a correct process will eventually B-deliver the message to itself.
By the integrity property of the underlying communication channels used in B-multicast, the algorithm also satisfies the integrity property.
Fig. Reliable Multicast algorithm
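The algorithm in the figure can be sketched as below. The simulation scaffolding (the Process class, direct method calls standing in for the network, string message ids) is an assumption of the sketch; the key step from the algorithm is that a process relays the message before R-delivering it.

```python
# Sketch of R-multicast built on B-multicast: on first B-delivery of a
# message, a process re-B-multicasts it to the group before R-delivering,
# so that if the sender crashes part-way through its sends, some correct
# process still completes the delivery (agreement).

class Process:
    def __init__(self, pid, group):
        self.pid = pid
        self.group = group          # list of all processes, incl. self
        self.received = set()       # message ids already B-delivered
        self.r_delivered = []       # messages handed to the application

    def b_multicast(self, msg_id, payload):
        for q in self.group:        # basic multicast: one-to-one sends
            q.b_deliver(msg_id, payload)

    def r_multicast(self, msg_id, payload):
        self.b_multicast(msg_id, payload)

    def b_deliver(self, msg_id, payload):
        if msg_id in self.received:
            return                  # duplicate: deliver at most once
        self.received.add(msg_id)
        # Relay before delivering, so every correct process sees the
        # message even if the original sender crashed mid-multicast.
        self.b_multicast(msg_id, payload)
        self.r_delivered.append(payload)

group = []
procs = [Process(i, group) for i in range(3)]
group.extend(procs)
procs[0].r_multicast("m1", "hello")
```

The duplicate check on `received` is what makes the relaying safe: each message is R-delivered exactly once per process despite being B-multicast by every member.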
Reliable multicast over IP multicast:
An alternative realization of R-multicast is to use a combination of IP
multicast, piggybacked acknowledgements and negative acknowledgements. This
R-multicast protocol is based on the observation that IP multicast communication
is often successful. The hold-back queue is not strictly necessary for reliability, as in the
implementation using IP multicast, but it simplifies the protocol by allowing sequence numbers to represent sets of messages. Hold-back queues are also used
for the ordering protocols.
[Diagram: incoming messages pass through message processing into a hold-back queue; when delivery guarantees are met, they move to the delivery queue and are delivered.]
Fig. The hold-back queue for arriving multicast messages
Ordered Multicast:
The basic multicast algorithm delivers messages to processes in an
arbitrary order, due to arbitrary delays in the underlying one-to-one send operations.
Common ordering requirements:
FIFO ordering
Causal ordering
Total ordering
FIFO ordering: if a correct process issues multicast(g, m) and then multicast(g, m'), then every correct process that
delivers m' will deliver m before m'. This is a partial ordering.
Fig. FIFO Ordering
Causal ordering: if multicast(g, m) → multicast(g, m'), where → is the
happened-before relation induced only by messages sent between the members of
g, then any correct process that delivers m' will deliver m before m'. This is a partial
ordering.
Fig. Causal Ordering
Total ordering: if a correct process delivers message m before it delivers m',
then any other correct process that delivers m' will deliver m before m'.
Fig. Total Ordering
Example: Bulletin Board
Reliable multicast is required if every user is to receive every posting
eventually.
Consider an application in which users post messages to bulletin boards.
Each user runs a bulletin-board application process, and every topic of discussion has its own process group.
When a user posts a message to a bulletin board, the application multicasts
the user's posting to the corresponding group.
Each user's process is a member of the group for the topic he/she is
interested in, so the user will receive just the postings concerning that topic.
The following figure shows the postings as they appear to a particular user. FIFO
ordering is desirable, since then every posting from a given user - A.Hanlon, say -
will be received in the same order. Causal ordering is needed to guarantee that a reply such as Re: Microkernels (25) or Re: Mach (27) is delivered after the posting it refers to. If multicast delivery were totally ordered, then the numbering would be consistent between users (users could refer unambiguously, for example, to message 24).
Fig. Display from bulletin board program
Implementing FIFO Ordering:
FIFO-ordered multicast, with operations FO-multicast and FO-deliver, for
non-overlapping groups can be implemented on top of any basic multicast.
Each process p holds:
S(p,g): a count of the messages p has sent to g, and
R(q,g): the sequence number of the latest message to g that p has delivered from q.
For p to FO-multicast a message to g, it piggybacks S(p,g) on the message,
B-multicasts it and increments S(p,g) by 1.
On receipt of a message from q with sequence number S, p checks
Bulletin board: os.interesting
Item  From         Subject
23    A.Hanlon     Mach
24    G.Joseph     Microkernels
25    A.Hanlon     Re: Microkernels
26    T.LHeureux   RPC performance
27    M.Walker     Re: Mach
end
whether S = R(q,g) + 1. If so, it FO-delivers the message and sets R(q,g) := S.
If S > R(q,g) + 1, then p places the message in the hold-back queue until the intervening
messages have been delivered. (Note that B-multicast does eventually
deliver messages unless the sender crashes.)
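The receive-side check above can be sketched as follows. The class, the per-sender dictionaries and the direct method calls standing in for B-deliver are assumptions of the sketch.

```python
# Sketch of the FO-multicast/FO-deliver bookkeeping described above.
# The sender piggybacks its count S(p,g); the receiver keeps R(q,g),
# the sequence number of the latest message delivered from each sender q.
# Messages that arrive early wait in a hold-back queue.

from collections import defaultdict

class FifoProcess:
    def __init__(self):
        self.S = 0                          # S(p,g): messages I have sent to g
        self.R = defaultdict(int)           # R(q,g): last seq delivered per sender
        self.holdback = defaultdict(dict)   # sender -> {seq: message}
        self.delivered = []                 # FO-delivered messages, in order

    def fo_multicast(self, msg):
        self.S += 1
        return (self.S, msg)                # piggyback S(p,g), then B-multicast

    def on_b_deliver(self, sender, seq, msg):
        self.holdback[sender][seq] = msg
        # FO-deliver every consecutive message now available from this sender.
        while self.R[sender] + 1 in self.holdback[sender]:
            self.R[sender] += 1
            self.delivered.append(self.holdback[sender].pop(self.R[sender]))

p = FifoProcess()
p.on_b_deliver("q", 2, "second")   # S > R(q,g) + 1: held back
p.on_b_deliver("q", 1, "first")    # S = R(q,g) + 1: releases both, in order
```

Note that the sequence numbers are per (sender, group) pair, which is why this construction only gives FIFO ordering, not total ordering across senders.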
Implementing Total Ordering:
The basic approach to implementing total ordering is to assign totally
ordered identifiers to multicast messages, so that each process makes the same
ordering decision based upon these identifiers.
There are two methods for assigning identifiers to messages.
1. Total Ordering Using a Sequencer
2. The ISIS Algorithm for Total Ordering
1. Total Ordering Using a Sequencer:
The first method is for a process called a sequencer to assign sequence numbers to messages, as shown in the following algorithm. A process wishing to TO-multicast a message m to g attaches a unique
identifier, id(m), and sends the message to the sequencer for g, sequencer(g), as well as to the members of g. The
process sequencer(g) maintains a group-specific sequence number s(g), which it uses to
assign increasing and consecutive sequence numbers to the messages that it B-delivers. It announces the sequence numbers by B-multicasting order messages to g.
1. Algorithm for group member p
2. Algorithm for sequencer of g
Fig. Total ordering using a sequencer
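The two roles in the figure can be sketched as below. The class names, the id/order bookkeeping and the direct calls standing in for B-multicast are assumptions of the sketch; the essential point is that members TO-deliver a message only when its agreed sequence number is the next one expected.

```python
# Sketch of total ordering using a sequencer: senders send <m, id> to the
# group and the sequencer; the sequencer B-multicasts <"order", id, s_g>;
# members deliver messages in the sequencer's announced order.

class Sequencer:
    def __init__(self):
        self.s = 0                    # group-specific sequence number s(g)

    def order(self, msg_id):
        self.s += 1
        return (msg_id, self.s)       # announce <id, s(g)> to the group

class Member:
    def __init__(self):
        self.pending = {}             # id -> message, awaiting an order
        self.orders = {}              # seq -> id
        self.next_seq = 1
        self.delivered = []

    def on_message(self, msg_id, msg):
        self.pending[msg_id] = msg
        self._try_deliver()

    def on_order(self, msg_id, seq):
        self.orders[seq] = msg_id
        self._try_deliver()

    def _try_deliver(self):
        # TO-deliver while the next-expected sequence number has both an
        # order announcement and the corresponding message.
        while (self.next_seq in self.orders
               and self.orders[self.next_seq] in self.pending):
            msg_id = self.orders.pop(self.next_seq)
            self.delivered.append(self.pending.pop(msg_id))
            self.next_seq += 1

seq = Sequencer()
m = Member()
m.on_message("a", "msg A")
m.on_message("b", "msg B")
# The sequencer happens to order b before a; every member follows suit.
m.on_order(*seq.order("b"))
m.on_order(*seq.order("a"))
```

Because every member obeys the same order announcements, all members deliver in the same (total) order regardless of the order in which the messages themselves arrived.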
2. The ISIS Algorithm for Total Ordering:
The processes collectively agree on the assignment of sequence numbers
to messages in a distributed fashion. The following figure shows the ISIS
algorithm for total ordering.
1. The process P1 B-multicasts a message to the members of the group;
2. the receiving processes propose sequence numbers and return them to the sender; and
3. the sender uses the proposed numbers to generate an agreed number.
Fig. The ISIS algorithm for total ordering
The algorithm for process p to multicast a message m to group g is as follows.
Each process q keeps:
A(q,g): the largest agreed sequence number it has seen for g, and
P(q,g): its own largest proposed sequence number.
1. Process p B-multicasts <m, i> to g, where i is a unique identifier for m.
2. Each process q replies to the sender p with a proposal for the message's agreed
sequence number:
P(q,g) := Max(A(q,g), P(q,g)) + 1
Each process provisionally assigns the proposed sequence number to the message and places it in its
hold-back queue.
3. p collects all the proposed sequence numbers and selects the largest as the next agreed sequence number, a.
It then B-multicasts <i, a> to g. Each recipient sets A(q,g) := Max(A(q,g), a), attaches a to the
message and re-orders its hold-back queue.
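The propose/agree exchange of steps 2 and 3 can be sketched as below; the member class and the way one member is seeded with a larger A(q,g) are assumptions made purely to illustrate the max-based agreement.

```python
# Sketch of the ISIS agreed-sequence-number exchange described above.
# Each member proposes Max(A(q,g), P(q,g)) + 1; the sender picks the
# largest proposal as the agreed number a; members then set
# A(q,g) := Max(A(q,g), a).

class IsisMember:
    def __init__(self):
        self.A = 0   # largest agreed sequence number seen
        self.P = 0   # largest sequence number this member has proposed

    def propose(self):
        self.P = max(self.A, self.P) + 1
        return self.P

    def agree(self, a):
        self.A = max(self.A, a)

members = [IsisMember() for _ in range(3)]
members[1].A = 4                       # this member has seen more traffic
proposals = [m.propose() for m in members]
agreed = max(proposals)                # sender picks the largest proposal
for m in members:
    m.agree(agreed)
```

Taking the maximum of all proposals guarantees the agreed number is at least as large as every member's proposal, so no member can have already delivered a message under a larger number.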
Implementing Causal Ordering:
An algorithm by Birman (1991) provides causally ordered multicast in non-overlapping, closed groups. It uses the happened-before relation (on multicast
messages only) and vector timestamps, which count the number of multicast
messages from each process that happened before the next message to be
multicast. The algorithm provides the causally ordered multicast operations
CO-multicast and CO-deliver. Each process p(i) (i = 1, 2, ..., N) maintains its own
vector timestamp. To CO-multicast a message m to group g, a process adds 1 to its own
entry in the vector timestamp and B-multicasts m together with the vector timestamp.
When a process B-delivers m, it places it in a hold-back queue before it can CO-deliver it, until all messages earlier in the causal ordering (i.e. any
message that causally preceded it) have been delivered.
Fig. Causal ordering using vector timestamps
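The hold-back test in the figure can be sketched as below. The scaffolding (class, packet tuples, fixed N) is an assumption of the sketch; the delivery condition used is the standard one for this algorithm: a message with timestamp V from process j is deliverable when V[j] equals the local entry for j plus one and every other entry of V is no greater than the local one.

```python
# Sketch of CO-multicast with vector timestamps: a sender increments its
# own entry and piggybacks its vector; a receiver holds <m, V> from
# process j back until V[j] == local[j] + 1 and V[k] <= local[k] for k != j.

N = 3  # number of processes in the (closed) group

class CoProcess:
    def __init__(self, i):
        self.i = i
        self.V = [0] * N          # this process's vector timestamp
        self.holdback = []
        self.delivered = []

    def co_multicast(self, msg):
        self.V[self.i] += 1
        return (self.i, list(self.V), msg)   # B-multicast (sender, V, m)

    def on_b_deliver(self, packet):
        self.holdback.append(packet)
        self._try_deliver()

    def _deliverable(self, j, Vm):
        return (Vm[j] == self.V[j] + 1 and
                all(Vm[k] <= self.V[k] for k in range(N) if k != j))

    def _try_deliver(self):
        # Keep sweeping the hold-back queue until no message is deliverable.
        progress = True
        while progress:
            progress = False
            for pkt in list(self.holdback):
                j, Vm, msg = pkt
                if self._deliverable(j, Vm):
                    self.holdback.remove(pkt)
                    self.delivered.append(msg)
                    self.V[j] = Vm[j]
                    progress = True

p0, p2 = CoProcess(0), CoProcess(2)
m1 = p0.co_multicast("first")         # timestamp [1, 0, 0]
m2 = p0.co_multicast("second")        # timestamp [2, 0, 0]
p2.on_b_deliver(m2)                   # held back: "first" not yet seen
p2.on_b_deliver(m1)                   # releases both in causal order
```

Here p2 receives the two messages in the wrong order, but the vector-timestamp test forces "first" to be CO-delivered before "second".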
c. Consistency models:
A consistency model basically refers to the degree of consistency that has
to be maintained for the shared-memory data for a certain set of applications. It is defined as a set of rules that applications must obey if they
want the DSM system to provide the degree of consistency guaranteed by the
consistency model. The various consistency models available are listed below:
1. Strict Consistency Model
2. Sequential Consistency Model
3. Causal Consistency Model
4. Pipelined Random-Access Memory (PRAM) Consistency Model
5. Processor Consistency Model
6. Weak Consistency Model
7. Release Consistency Model
1. Strict Consistency Model:
The strict consistency model is the strongest form of memory
coherence, having the most stringent consistency requirements. A shared-memory system is said to support the strict consistency model if the value returned by
a read operation on a memory address is always the same as the value written by the
most recent write operation to that address, irrespective of the locations of the
processes performing the read and write operations. Implementation of the strict
consistency model requires the existence of an absolute global time, so that
memory read/write operations can be correctly ordered to make the meaning of
"most recent" clear.
2. Sequential Consistency Model:
The sequential consistency model was proposed by Lamport. A shared-memory system is said to support the sequential consistency model if all the processes see the same order of all memory access operations on the shared
memory. The exact order in which the memory accesses are interleaved does not
matter. A DSM system supporting the sequential consistency model can be
implemented by ensuring that no memory operation is started until all the
previous ones have been completed.
3. Causal Consistency Model:
The causal consistency model was proposed by Hutto and Ahamad. A shared-memory system is said to support the causal consistency model if all causally related write operations are seen by all processes in the same (correct) order.
To implement a shared memory supporting the causal consistency model,
there is a need to keep track of which memory reference operation is dependent
on which other memory reference operations.
4. Pipelined Random-Access Memory (PRAM) Consistency Model:
The pipelined random-access memory (PRAM) consistency model was
proposed by Lipton and Sandberg. The PRAM consistency model is simple
and easy to implement and also has good performance. It can be implemented by
simply sequencing the write operations performed at each node independently of
those performed on other nodes; the write operations performed by a single process are seen by other processes as if they are in a pipeline.
5. Processor Consistency Model:
The processor consistency model, proposed by Goodman, is very similar to
the PRAM consistency model, with the additional restriction of memory
coherence. That is, processor-consistent memory is both coherent and adheres to
the PRAM consistency model. Memory coherence means that for any memory
location, all processes agree on the same order of all write operations to that location. In effect, the processor consistency model ensures
that all the write operations performed on the same memory location are seen by
all processes in the same order.
6. Weak Consistency Model:
The weak consistency model, proposed by the Dubois et al. ,is designed
to take advantage of the following two characteristics common to many
applications:
It is not necessary to show the change in memory done by every write operation
to the processor. The result of the several write operation can be combined and
sent to other processor only when they need.
Isolated access to shared variables is rare. That is, in many applications a process
makes several accesses to a set of shared variable and then no access at all to the
variables in this set for long time .
A DSM system that supports the weak consistency model uses a special
variable called a synchronization variable. For supporting the weak consistency
model, the following requirements must be met:
1. All accesses to synchronization variables must obey sequential consistency semantics.
2. All previous write operations must be completed everywhere before an access
to a synchronization variable is allowed.
3. All previous accesses to synchronization variables must be completed before
access to a non-synchronization variable is allowed.
7. Release Consistency Model:
The release consistency model provides a mechanism to tell the
system clearly whether a process is entering or exiting a critical
section, so that the system can perform only the first or the
second operation when a synchronization variable is accessed by a process. This is
achieved by using two synchronization variables, called acquire and release.
An acquire is used by a process to tell the system that it is about to enter a critical
section. A release is used by a process to tell the system that it has just exited a
critical section.
For supporting the release consistency model, the following requirements must be
met:
1. All accesses to the acquire and release synchronization variables obey processor
consistency semantics.
2. All previous acquires performed by a process must be completed successfully
before the process is allowed to perform a data access operation on the memory.
3. All previous data access operations performed by a process must be completed successfully before a release access by the process is allowed.
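The acquire/release discipline above can be illustrated with a small sketch in which writes made inside a critical section are buffered locally and only propagated on release. The shared store, class and variable names are assumptions made for the illustration; no real DSM runtime works exactly this way.

```python
# Illustrative sketch of release consistency: writes inside an
# acquire/release pair are buffered and become visible in the (simulated)
# shared store only when the release completes, matching requirement 3.

shared_store = {}

class ReleaseConsistentNode:
    def __init__(self):
        self.buffer = {}
        self.in_cs = False

    def acquire(self):
        # Tell the system we are about to enter a critical section.
        self.in_cs = True

    def write(self, var, value):
        assert self.in_cs, "writes must happen inside acquire/release"
        self.buffer[var] = value        # buffered, not yet visible

    def release(self):
        # Tell the system we have exited: flush buffered writes so all
        # previous data accesses complete before the release finishes.
        shared_store.update(self.buffer)
        self.buffer.clear()
        self.in_cs = False

node = ReleaseConsistentNode()
node.acquire()
node.write("x", 42)
visible_before = "x" in shared_store    # not yet released
node.release()
visible_after = shared_store.get("x")   # visible after release
```

The point of the model is exactly this batching: other nodes need not see each individual write, only the combined result at the release.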
d. Access matrix:
An access matrix is large and sparse. Most domains have no access at all
to most objects, that is, most of the entries are empty. Therefore, a direct
implementation of an access matrix as a two-dimensional matrix would be very inefficient and expensive.
The two most commonly used methods that have gained popularity in
contemporary distributed systems for implementing access matrix are Access
Control Lists (ACLs) and capabilities. These two methods are described below.
Access Control Lists: In this method, the access matrix is decomposed by
columns, and each column of the matrix is implemented as an access list
for the object corresponding to that column. The empty entries of the
matrix are not stored in the access list. Therefore, for each object, a list of ordered pairs (domain, rights) is maintained, which defines all domains
with a nonempty set of access rights for that object.
Capabilities: Rather than decomposing the access matrix by columns,
in this method the access matrix is decomposed by rows, and each row is
associated with its domain. Obviously, the empty entries are discarded.
Therefore, for each domain, a list of ordered pairs (object, rights) is
maintained, which defines all objects for which the domain possesses
some access rights. Each (object, rights) pair is called a capability, and the list associated with a domain is called a capability list. A capability is
used for the following two purposes:
1. To uniquely identify an object
2. To allow its holder to access the object it identifies in one or
more permission modes.
A capability therefore has two basic parts:
I. an object identifier, and
II. rights information.
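The two decompositions described above can be sketched as follows: the same sparse access matrix stored column-wise as ACLs (one list per object) and row-wise as capability lists (one list per domain). The domain, object and right names are made-up examples.

```python
# Sketch of ACLs vs. capability lists as two views of one sparse
# access matrix: (domain, object) -> set of rights. Empty entries are
# simply absent, so neither representation stores them.

from collections import defaultdict

matrix = {
    ("alice", "file1"): {"read", "write"},
    ("bob",   "file1"): {"read"},
    ("alice", "file2"): {"read"},
}

# Column-wise: one access control list per object,
# holding (domain, rights) pairs.
acls = defaultdict(dict)
for (domain, obj), rights in matrix.items():
    acls[obj][domain] = rights

# Row-wise: one capability list per domain,
# holding (object, rights) capabilities.
caps = defaultdict(dict)
for (domain, obj), rights in matrix.items():
    caps[domain][obj] = rights
```

An access check by object ("who may touch file1?") is cheap with ACLs, while a check by domain ("what may alice touch?") is cheap with capability lists; both avoid storing the empty entries of the full matrix.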
4. Discuss whether message passing or DSM is preferable for fault-tolerant
applications. [6]
Consider two processes executing at failure-independent computers. In a
message-passing system, if one process has a bug that leads it to send spurious
messages, the other may protect itself to a certain extent by validating the
messages it receives. If a process fails part-way through a multi-message operation, then transactional techniques can be used to ensure that data are left in
a consistent state. Now consider that the processes share memory (DSM),
whether it is physically shared memory or page-based DSM. Then one of them
may adversely affect the other if it fails, because now one process may update a
shared variable without the knowledge of the other.
For example, it could incorrectly update shared variables due to a bug. It could fail after starting but not completing an update to several variables. If
processes use middleware-based DSM, then they may have some protection against
aberrant processes.
For example, processes using the Linda programming primitives must
explicitly request items (tuples) from the shared memory. They can validate
these, just as a process may validate messages.
There is no definitive answer as to whether DSM or message passing is
preferable for fault-tolerant applications.
5. Why should we want to implement page-based DSM largely at user-
level, and what is required to achieve this? [8]
The basic model to be considered is one in which a collection of
processes shares a segment of DSM. The segment is mapped to the same range
of addresses in each process, so that meaningful pointer values can be stored in
the segment. The processes execute at computers equipped with a paged memory
management unit. We shall assume that there is only one process per computer
that accesses the DSM segment. There may in reality be several such processes
at a computer. However, these could then share DSM pages directly (the same
page frame can be used in the page tables used by the different processes). The
only complication would be to coordinate fetching and propagating updates to a page when two or more local processes access it. The following figure shows the
system model for the page-based DSM.
Fig. System model for page-based DSM
The page-based approach has the advantage of imposing no particular
structure on the DSM, which appears as a sequence of bytes. In principle, it
enables programs designed for a shared-memory multiprocessor to run on computers without shared memory, with little or no adaptation. Microkernels such
as Mach and Chorus provide native support for DSM (and other memory
abstractions; the Mach virtual memory facilities are described at
www.cdk4.net/mach). Page-based DSM is more usually implemented largely at
user level, to take advantage of the flexibility that this provides.
A page-based DSM implementation at user level facilitates:
1. application-specific memory (consistency) models, and
2. protocol options.
The implementation utilizes kernel support for user-level page fault
handlers. UNIX and some variants of Windows provide this facility.
Microprocessors with 64-bit address spaces widen the scope for page-based
DSM by relaxing constraints on address space management.
To achieve this, we require the kernel to export interfaces for:
(a) handling page faults from user level (in UNIX, as a signal), and
(b) setting page protections from user level (see the UNIX memory map system
calls).
Q.6 Explain why thrashing is an important issue in DSM systems and what
methods are available for dealing with it? [8]
Ans: Thrashing is an important issue in DSM systems because:
In a DSM system, data blocks migrate between nodes on demand.
Therefore, if two nodes compete for write access to a single data item, the
corresponding data block may be transferred back and forth at such a high
rate that no real work can get done.
The problem of thrashing may occur when data items in the same data
block are being updated by multiple nodes at the same time, causing large
numbers of data block transfers among the nodes without much progress
in the execution of the application. While a thrashing problem may occur
with any block size, it is more likely with large block sizes, as different
regions in the same block may be updated by processes on different
nodes, causing data block transfers that are not necessary with smaller
block sizes.
In DSM systems, the following methods are available for dealing with thrashing:
1. Providing application-controlled locks: Locking data to prevent other nodes from accessing that data for a short period of time can reduce
thrashing. An application-controlled lock can be associated with each data
block to implement this method.
2. Nailing a block to a node for a minimum amount of time: Another
method to reduce thrashing is to disallow a block from being taken away from a
node until a minimum amount of time t elapses after its allocation to that
node. The time t can either be fixed statically or be tuned dynamically on
the basis of access patterns.
3. Tailoring the coherence algorithm to the shared-data usage patterns:
Thrashing can also be minimized by using different coherence protocols for shared data having different characteristics.
11. Explain how to deal with the problem of differing data representations
for a middleware based implementation of DSM on heterogeneous
computers? [4]
The middleware calls can include marshalling and unmarshalling procedures. In a page-based implementation, pages would have to be marshalled
and unmarshalled by the kernels that send and receive them. This implies
maintaining a description of the layout and types of the data in the DSM
segment, which can be converted to and from the local representation. A
machine that takes a page fault needs to describe which page it needs in a way
that is independent of the machine architecture. Different page sizes will create
problems here, as will data items that straddle page boundaries, or items that
straddle page boundaries when unmarshalled. A solution would be to use a
virtual page as the unit of transfer, whose size is the maximum of the page
sizes of all the architectures supported. Data items would be laid out so that the
same set of items occurs in each virtual page for all architectures. Pointers can
also be marshalled, as long as the kernels know the layout of the data and can
express pointers as pointing to an object with a description of the form "offset o
in data item i", where o and i are expressed symbolically rather than physically.
This activity implies huge overheads.