Busy Waiting



Goals for this lecture : Click   here   for these notes in postscript.

Introduction

And now we try and write a successful solution in Java to the MEP. Before trying to write any protocols let's first assure ourselves that they are actually needed. In other words let's write an ME program mutex0 which has no protocols and see what happens ( test0 ). We see that one process can enter its critical section before another process has left its critical section. Thus this program is unsafe according to the correctness property of mutual exclusion discussed in Slide 5 of the MEP lecture.

First Attempt at Mutual Exclusion

We are going to try five different algorithms in this lecture to attempt to solve the MEP, but only the last of these will producce a satisfactory solution. Starting with naive ideas we shall soon appreciate their limitations and so look for better approaches. Our first attempt mutex1 is inspired by the four by 100 meteres relay race ( test1 ). To prevent the four atheletes all running at the same time a baton is used, of which only the holder has permission to run. In other words, only the holder of the baton is allowed to be in their critical section. Note that a random number generator is used to keep processes busy and our concurrent system probably fair. The baton is represented by an integer variable turn whose value is the identity number of the only process currently allowed to be in its critical section. Although this algorithm is safe in the sense of ME it still has a potentially serious flaw. Suppose that the processes are distributed among many machines, and that the one running the process currently in its critical section crashes. The remaining processes each become livelocked in their busy wait loop. Alternatively, suppose that the current holder of the baton becomes livelocked in its critical section, then everyone will soon be livelocked ( try coding up this situation in Exercise 1 ). The troube is that turn is a centralised approach to the MEP, and consequently if one process fails all will fail. Now try Exercise 2.

The Second Attempt

Now we try a distributed approach, the exact opposite of the first attempt which is centralised. The algorithm is that each process should first wait for all other processes to clear their critical sections, then declare their own intention to enter (the intention being to exclude all other processes from entering), then enter, then relinquish their intention after clearing the critical section. Our second attempt is mutex2. Test output test2 would appear to suggest that this algorithm is perfectly safe. However, such test runs do not often help us reveal the slight (but nonetheless existant!) possibility of an unsafe interleaving. When run on two processes the following unsafe sequence of events for mutex2 is possible.
  1. Process 0 checks and finds that process 1 is not in its critical section.
  2. Process 1 checks and finds that process 0 is not in its critical section.
  3. Process 0 declares intention to enter its critical section.
  4. Process 1 declares intention to enter its critical section.
  5. Process 0 enters its critical section.
  6. Process 1 enters its critical section.
The problem is that after waiting for all other processes to clear their critical sections a process may not be able to get into theirs before another process does so. In other words, the waiting and declaring intention is `critical' in the sense that it must not be intefered with. Thus, resource mutex2 is unsafe.

The Third Attempt

So what is the cause of our problem with mutex2? The pre_protocol (as described in Slide 4 of the MEP lecture) must not be interfered with during its execution, or else its intended affect may not be achieved. In other words the pre_protocol is just as critical as the critical section itself. In our third attempt mutex3 we try to recognise this fact by making the waiting after the declaration of intention, rather than have it before. As can only be exemplified in test3 it can be proven that mutex3 is safe in the sense that it has the mutual exclusion property. However, we have gone too far and made our program too safe. It is now possible for no process to enter its critical section even though nobody else is in theirs. For example, in the following sequence of events for mutex3 run on two processes both parties end up livelocked in their waiting loops.
  1. Process 0 sets its flag to be true.
  2. Process 1 sets its flag to be true.
  3. Process 0 is now waiting.
  4. Process 1 is now waiting.
The problem with mutex3 is that it is so overly cautiously safe that sometimes nobody is allowed into their critical section.

The Fourth Attempt

When in mutex3 a process declares its intention to enter its critical section there is no way this can be relinquished until after that critical section has been entered. Thus livelock occurs when all processes insist on entering their critical sections. In our fourth attempt mutex4 (tested in test4 ) we try to maintain safety while removing the ability of any process to insist that they enter their critical section. While waiting a process should not be eternally insistent, but rather graciously give others the opportunity to enter their critical section first. In each preprotocol we thus introduce the loop,

do { flag[id] = false ; flag[id] = true ; } while (wait(id)) ;

in which the intention is repeatedly temporarily relinquished in order to avoid being uncompromisingly insistent. But does this idea really work ? For any interleaving in which the two assignments in the above loop are always executed immediately after each other we can see that livelock could occur just as before.
  1. Process 0 sets its flag to be true outside of its loop.
  2. Process 1 sets its flag to be true outside of its loop.
  3. Process 0 is now waiting.
  4. Process 1 is now waiting.
  5. Process 0 sets its flag to be true inside of its loop.
  6. Process 1 sets its flag to be true inside of its loop.
However, if it is any consolation, we can argue that even though livelock still exists it is less likely.

So, where does all this leave us ? We have established a safe algorithm using flags of intention, but have yet to resolve how to remove the possbility of livelock. That is, we still have yet to find a mechanism to resolve the contention for entry to critical sections which can arise.

Dekker's Algorithm

Dekker's algorithm combines two ideas. First, we use the flag of intention approach from the fourth attempt in order to ensure safety (i.e. mutual exclusion). Secondly, we introduce a global authority who will arbitrate when contention for entry to critical sections occurs. That authority is represented by an integer variable insist. If insist has the value i then only process i is to have the right to enter its critical section. This gives us mutex5 tested in test5. The variable insist is analogus to the global variable turn used in our the first attempt. Now try Exercise 3. Although Dekker's Algorithm resolves the problem of livelock discussed in the fourth attempts it does not resolve the kind of livelock/crashing discussed in the first attempt. Try Exercise 4 and see what you can do to improve Dekker's Algorithm.

Further study

For more on the MEP and its solutions why not do some further study.