distributed deadlock

DISTRIBUTED DEADLOCK

1 | P a g e

Abstract:

A deadlock is a condition in a system where a process cannot proceed because it needs to obtain a resource held by another process but it itself is holding a resource that the other process needs. In a system of processes which communicate only with a single central agent, deadlock can be detected easily because the central agent has complete information about every process. Deadlock detection is more difficult in systems where there is no such central agent and processes may communicate directly with one another. If we could assume that message communication is instantaneous, or if we could place certain restrictions on message delays, deadlock detection would become simpler. However, the only realistic general assumption we can make is that message delays are arbitrary but finite. Deadlock is a fundamental problem in distributed systems. A process may request resources in any order, which may not be known a priori and a process can request resource while holding others. If the sequence of the allocations of resources to the processes is not controlled, deadlocks can occur. Moreover, a deadlock is a state where a set of processes request resources that are held by other processes in the set.

Types of Deadlock:

Two types of deadlock can be considered: Communication Deadlock Resource Deadlock

Communication deadlock occurs when process A is trying to send a message to process B, which is trying to send a message to process C which is trying to send a message to A. Resource deadlock occurs when processes are trying to get exclusive access to devices, files, locks, servers, or other resources. We will not differentiate between these types of deadlock since we can consider communication channels to be resources without loss of generality.

Conditions for Deadlock: Four conditions have to be met for a deadlock to occur in a system: 1. Mutual exclusion A resource can be held by at most one process. 2. Hold and wait Processes that already hold resources can wait for another resource.


2 | P a g e

3. Non-preemption A resource, once granted, cannot be taken away. 4. Circular wait Two or more processes are waiting for resources held by one of the other processes. Resource allocation can be represented by directed graphs: P1 R1 means that resource R1 is allocated to process P1. P1 R1 means that resource R1 is requested by process P1. Deadlock is present when the graph has cycles. An example is shown in Figure 1.

Figure 1: Deadlock.

Wait for Graph: The state of the system can be modeled by directed graph, called a wait for graph (WFG). In a WFG, nodes are processes and there is a directed edge from node P1 to mode P2 if P1 is blocked and is waiting for P2 to release some resource. A system is deadlocked if and only if there exists a directed cycle or knot in the WFG.A Figure 1 shows a WFG, where process P11 of site 1 has an edge to process P21 of site 1 and P32 of site 2 is waiting for a resource which is currently held by process P21. At the same time process P32 is waiting on process P33 to release a resource. If P21 is waiting on process P11, then processes P11, P32 and P21 form a cycle and all the four processes are involved in a deadlock depending upon the request model.


3 | P a g e

Figure 2: A Wait-for-Graph.

Handling Deadlocks in Distributed Systems: Deadlocks in distributed systems are similar to deadlocks in centralized systems. In centralized systems, we have one operating system that can oversee resource allocation and know whether deadlocks are present. With distributed processes and resources it becomes harder to detect, avoid, and prevent deadlocks. Several strategies can be used to handle deadlocks: Ignore: We can ignore the problem. This is one of the most popular solutions. Detect: We can allow deadlocks to occur, then detect that we have a deadlock in the system, and then deal with the deadlock. Prevent: We can place constraints on resource allocation to make deadlocks impossible. Avoid: We can choose resource allocation carefully and make deadlocks impossible.


4 | P a g e

Deadlock avoidance is never used (either in distributed or centralized systems). The problem with deadlock avoidance is that the algorithm will need to know resource usage requirements in advance so as to schedule them properly. Whereas, the first of these is trivially simple. The other two are described in details:

Deadlock Detection:

General methods for preventing or avoiding deadlocks can be difficult to find. Detecting a deadlock condition is generally easier. When a deadlock is detected, it has to be broken. This is traditionally done by killing one or more processes that contribute to the deadlock. Unfortunately, this can lead to annoyed users. When a deadlock is detected in a system that is based on atomic transactions, it is resolved by aborting one or more transactions. But transactions have been designed to withstand being aborted. Consequences of killing a process in a transactional system are less severe.

Centralized: Centralized deadlock detection attempts to imitate the nondistributed algorithm through a central coordinator. Each machine is responsible for maintaining a resource graph for its processes and resources. A central coordinator maintains the resource utilization graph for the entire system. This graph is the union of the individual graphs. If this coordinator detects a cycle, it kills off one process to break the deadlock. In the non-distributed case, all the information on resource usage lives on one system and the graph may be constructed on that system. In the distributed case, the individual sub graphs have to be propagated to a central coordinator. A message can be sent each time an arc is added or deleted. If optimization is needed, a list of added or deleted arcs can be sent periodically to reduce the overall number of messages sent. Here is an example. Suppose machine A has a process P0, which holds the resource S and wants resource R, which is held by P1. The local graph on A is shown in Figure 2. Another machine, machine B has a process P2, which is holding resource T and wants resource S. Its local graph is shown in Figure 3. Both of these machines send their graphs to the central coordinator, which maintains the union (Figure 3). All is well. There are no cycles and hence no deadlocks. Now two events occur. Process P1 releases resource R and asks machine B for resource T. Two messages are sent to the coordinator: Message 1 (from machine A): “releasing R” Message 2 (from machine B): “waiting for T” This should cause no problems (no deadlock). However, if message 2 arrives first, the coordinator would then construct the graph in Figure 4 and detect a deadlock. Such a condition is known as false deadlock. A way to fix this is to use Lamport’s algorithm to impose global time ordering on all machines. Alternatively, if the coordinator suspects deadlock, it can send a reliable message to every machine asking whether it has any release


5 | P a g e

messages. Each machine will then respond with either a release message or a negative acknowledgement to acknowledge receipt of the message.

Figure 3: Centralized Deadlock Detection.

Figure 4: False Deadlock.


6 | P a g e

Distributed: An algorithm for detecting deadlocks in a distributed system was proposed by Chaudy, Misra, and Haas in 1983. It allows that processes to request multiple resources at once (this speeds up the growing phase). Some processes may wait for resources (either local or remote). Cross-machine arcs make looking for cycles (detecting deadlock) hard. The algorithm works this way: When a process has to wait for a resource, a probe message is sent to the process holding the resource. The probe message contains three components: the process that blocked, the process that is sending the request, and the destination. Initially, the first two components will be the same. When a process receives the probe: if the process itself is waiting on a resource, it updates the sending and destination fields of the message and forwards it to the resource holder. If it is waiting on multiple resources, a message is sent to each process holding the resources. This process continues as long as processes are waiting for resources. If the originator gets a message and sees its own process number in the blocked field of the message, it knows that a cycle has been taken and deadlock exists. In this case, some process (transaction) will have to die. The sender may choose to commit suicide or a ring election algorithm may be used to determine an alternate victim (e.g., youngest process, oldest process).

Deadlock prevention An alternative to detecting deadlocks is to design a system so that deadlock is impossible. One way of accomplishing this is to obtain a global timestamp for every transaction (so that no two transactions get the same timestamp). When one process is about to block waiting for a resource that another process is using, check which of the two processes has a younger timestamp and give priority to the older process. If a younger process is using the resource, then the older process (that wants the resource) waits. If an older process is holding the resource, the younger process (that wants the resource) kills itself. This forces the resource utilization graph to be directed from older to younger processes, making cycles impossible. This algorithm is known as the wait-die algorithm (Figure 5). An alternative method by which resource request cycles may be avoided is to have an old process preempt (kill) the younger process that holds a resource. If a younger process wants a resource that an older one is using, then it waits until the old process is done. In this case, the graph flows from young to old and cycles are again impossible. This variant is called the wound-wait algorithm (Figure 6).


7 | P a g e

Figure 5: Wait-die Algorithm.

Figure 6: Wound-wait Algorithm.


8 | P a g e

Conclusion: Deadlock is the state of permanent blocking of a set of processes each of which is waiting for an event that only another process in the set can cause. However, handling of deadlocks in distributed systems is more complex than in centralized systems because the resources, the processes and other relevant information are scattered on different nodes of the system. ……………………………………………………………….X……………………………………………………………………

distributed deadlock

Technology

types of deadlock

distributed processes

resource r1

system of processes

resource usagerequirements

resource allocationand

process p33

distributed systems