Discussion questions: monitors

Please work out these problems before your next discussion section. The GSIs and IAs will go over these problems during the discussion section.

1. Restroom access

U-M decides to save money on the construction of the new CSE building by building only unisex restrooms (as opposed to redundant men's and women's rooms on each floor). For modesty's sake, however, they impose the rule that only people of one gender may occupy a given restroom at the same time.

Your task is to write a program which models access to a restroom, using Mesa monitor primitives.

Write the following procedures: woman_wants_to_enter(), man_wants_to_enter(), woman_leaves(), man_leaves().

Use the monitor primitives lock(), unlock(), signal(), wait(), and broadcast() in these functions to control access to the restroom. You may assume there is no limit on the number of people of the same gender who may occupy the restroom at a given time.
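As a starting point, here is one possible sketch of the first (starvation-prone) version. It assumes std::mutex and std::condition_variable as stand-ins for the course's lock(), unlock(), wait(), and broadcast() primitives, since this handout does not name a library. The variable names are illustrative, and the sketch uses a single condition variable for simplicity; other designs are equally valid.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>

// Hypothetical monitor state; these names are not part of the problem statement.
std::mutex restroom_lock;             // stands in for lock()/unlock()
std::condition_variable restroom_cv;  // stands in for wait()/broadcast()
int women_inside = 0;
int men_inside = 0;

void woman_wants_to_enter() {
  std::unique_lock<std::mutex> guard(restroom_lock);  // lock()
  // Mesa semantics: a woken thread must re-check its condition.
  while (men_inside > 0) {
    restroom_cv.wait(guard);                          // wait()
  }
  women_inside++;
  assert(men_inside == 0);  // invariant: one gender at a time
}                           // unlock() happens when guard goes out of scope

void woman_leaves() {
  std::unique_lock<std::mutex> guard(restroom_lock);
  women_inside--;
  if (women_inside == 0) {
    restroom_cv.notify_all();                         // broadcast()
  }
}

void man_wants_to_enter() {
  std::unique_lock<std::mutex> guard(restroom_lock);
  while (women_inside > 0) {
    restroom_cv.wait(guard);
  }
  men_inside++;
  assert(women_inside == 0);
}

void man_leaves() {
  std::unique_lock<std::mutex> guard(restroom_lock);
  men_inside--;
  if (men_inside == 0) {
    restroom_cv.notify_all();
  }
}
```

Note that nothing in this sketch stops a steady stream of women from keeping men out forever, which is exactly the starvation problem addressed next.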

Next, we want to modify our solution in order to prevent starvation. In other words, a person should not have to wait indefinitely to enter, assuming that each person already in the restroom stays there for some finite time period. We will try to ensure fairness using a bool called TURN. TURN can be set to either MEN or WOMEN. Write a solution that uses the TURN variable to alternate priorities between men and women when both groups are waiting. For example, if there are women in the bathroom and men waiting for the bathroom and a woman arrives, the woman should only be allowed in if TURN = WOMEN. Flip the TURN variable when appropriate.

Notice that we can change the TURN variable in either the woman_leaves() function, or the woman_wants_to_enter() function.

2. Locks

Assume n threads are accessing m independent shared objects (e.g., shared variables). How many locks are required to provide maximum concurrency within the threads? Does more concurrency imply better performance?

3. File access

Several threads are accessing a file. Each thread has a unique priority number. The file can be accessed simultaneously by several threads, subject to the following constraint: the sum of all unique priority numbers of the threads currently accessing the file must be less than n.
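To make the constraint concrete, here is a minimal sketch of a monitor that enforces it. The class FileMonitor and its methods start_access(), end_access(), and current_sum() are hypothetical names chosen for illustration, and std::mutex and std::condition_variable are assumed as stand-ins for the course's monitor primitives.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>

// Hypothetical monitor; the class and method names are not from the handout.
class FileMonitor {
 public:
  explicit FileMonitor(int n) : limit_(n) {}

  // Block until admitting this thread keeps the priority sum below n.
  void start_access(int priority) {
    std::unique_lock<std::mutex> guard(lock_);  // lock()
    while (sum_ + priority >= limit_) {         // Mesa-style re-check
      cv_.wait(guard);                          // wait()
    }
    sum_ += priority;
    assert(sum_ < limit_);  // the constraint from the problem statement
  }

  // Release this thread's share and let waiters re-check.
  void end_access(int priority) {
    std::unique_lock<std::mutex> guard(lock_);
    sum_ -= priority;
    cv_.notify_all();                           // broadcast()
  }

  // Sum of the priorities of the threads currently accessing the file.
  int current_sum() {
    std::unique_lock<std::mutex> guard(lock_);
    return sum_;
  }

 private:
  std::mutex lock_;
  std::condition_variable cv_;
  const int limit_;
  int sum_ = 0;
};
```

The sketch uses broadcast rather than signal because different waiters need different amounts of headroom, so the one thread that a signal happens to wake may not be one that can proceed. A thread whose own priority is n or more can never be admitted under this constraint and would wait forever.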

4. Barriers

(This problem has optional background material. The background reading tries to give you a sense of how you might find parallelism and use monitors in a real world problem, and attempts to motivate the usefulness of barriers. If you'd like to read the background material, click here. It is not necessary to understand the background material to do the problem.)

Certain classes of partial differential equations can be solved by an iterative method called relaxation. One such PDE is the one-dimensional form of Poisson's equation: ∂²V/∂x² = f(x). If you create a vector of doubles representing the values of V at SIZE + 1 discretized points, you can calculate an approximate solution through iteration. At each point i from 1 to SIZE-1, you can compute the next value of V[i] = (V[i+1] + V[i-1] - f(i))/2. Eventually, after enough iterations, V will converge to the solution of the PDE.
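For readers wondering where the update rule comes from, it follows from the standard central-difference approximation of the second derivative. The grid spacing h is a symbol introduced here for illustration; the formula above corresponds to h = 1.

```latex
\[
\left.\frac{\partial^2 V}{\partial x^2}\right|_{x_i}
  \approx \frac{V_{i+1} - 2V_i + V_{i-1}}{h^2} = f(x_i)
\quad\Longrightarrow\quad
V_i = \frac{V_{i+1} + V_{i-1} - h^2 f(x_i)}{2}
\]
```

For example, with h = 1, neighbors V[i-1] = 2 and V[i+1] = 4, and f(i) = 0, the next value of V[i] is (4 + 2 - 0)/2 = 3, the average of its neighbors.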

So, you write the following code:

void relaxation(int * V, int (*f)(int), int size, int count) {
  // A temporary buffer to hold alternate iterations
  int * tmp = new int[size]; 
  // A pointer pointing at the next iteration array
  int * p = tmp;
  // A pointer pointing at the old iteration array
  int * p_prime = V + 1;

  for (int j = 0; j < count; j++) {
    for (int i = 0; i < size; i++) {
      //Computing the next iteration
      p[i] = (p_prime[i-1] + p_prime[i+1] - f(i))/2;
    }   
    // Swap the two pointers, so that the old array can be
    // reused for the next iteration.
    swap(p, p_prime);
  }

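  // After the final swap the newest values are in p_prime. If that
  // is not the input array, copy them back there.
  if (p_prime != V + 1) {
    // memcpy takes a byte count, not an element count.
    memcpy(V + 1, p_prime, size * sizeof(int));
  }

  // tmp was allocated with new[], so release it with delete[].
  delete[] tmp;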
}

You run the code and are satisfied with its correctness, but because you make SIZE large and f is difficult to compute, this function is slow. You decide that you could divide up the array and give part of the array to multiple threads.

struct relaxation_config {
  int * V;
  int * p;
  int * p_prime;
  int (*f)(int);
  int size;
  int count;
};

void relaxation_thread(void * arg) {
  // Cast the input to a relaxation_config struct pointer.
  relaxation_config * config = static_cast<relaxation_config *>(arg);

  // Compute our portion of the iteration.
  for (int j = 0; j < config->count; j++) {
      // For each cell in our range, evaluate the expression.
      for (int i = 0; i < config->size; i++) {
         config->p[i] = (config->p_prime[i-1] + 
           config->p_prime[i+1] - config->f(i))/2;
      }   
      swap(config->p, config->p_prime);
  }

  // After the final swap the newest values are in p_prime.
  if (config->p_prime != config->V) {
    memcpy(config->V, config->p_prime, config->size * sizeof(int));
  }

  delete config;
}

void relaxation(int * V, int (*f)(int), int size, int count) {
  int * tmp = new int[size]; 
  int * p = tmp;
  int * p_prime = V + 1;
  thread * threads[NUM_THREADS];

  // Each thread will do thread_size entries in the matrix.
  int thread_size = size/NUM_THREADS;

  for (int i = 0; i < NUM_THREADS; i++) {
    // Configure the child thread
    relaxation_config * config = new relaxation_config;
    // Each thread gets the segment of thread_size cells beginning at
    // i * thread_size.
    config->V = V + 1 + i * thread_size;
    config->p = p + i * thread_size;
    config->p_prime = p_prime + i * thread_size;
    config->f = f;
    config->size = thread_size;
    config->count = count;
    // Create the thread
    threads[i] = new thread(relaxation_thread, (void*) config);
  }

  // Join with all the threads. Function cannot return until 
  // all threads have finished!
  for (int i = 0; i < NUM_THREADS; i++) {
    threads[i]->join();
    delete threads[i];
  }

  delete[] tmp;
}

But this code has a bug! Each thread computes all of its cells, but what happens at the edges? When a thread computes the i=0 cell in its j'th iteration of the outer loop, it depends on a neighboring thread having completed the i=thread_size-1 cell of its (j-1)'th iteration! We must enforce this restriction.

So, you consult your EECS 482 textbook and see that the solution is to use a barrier. But alas, the simple EECS 482 thread library doesn't provide them. So, you decide to implement barriers with monitors.

A barrier is a tool that provides a way for various threads to synchronize their progress. Threads "check in" or wait at a barrier and are only allowed to proceed past the barrier when a certain number of threads have "checked in" to the barrier. This tool allows a program to run phases in parallel (particularly useful in matrix operations).

Your job is to implement barriers using monitors. Implement the constructor and wait function adding private variables as needed.
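One possible shape for such a class is sketched below. It assumes std::mutex and std::condition_variable in place of the course's monitor primitives, and the private member names and the barrier_demo() helper are hypothetical. The generation counter is one common way to make the barrier reusable across iterations; it is a design choice, not the only correct answer.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>
#include <vector>

class Barrier {
 public:
  explicit Barrier(int num_threads) : threshold_(num_threads) {
    assert(num_threads > 0);
  }

  // Blocks until threshold_ threads have checked in for the current phase.
  void wait() {
    std::unique_lock<std::mutex> guard(lock_);  // lock()
    int my_generation = generation_;
    arrived_++;
    if (arrived_ == threshold_) {
      // Last thread to check in: reset for the next phase and wake everyone.
      arrived_ = 0;
      generation_++;
      cv_.notify_all();                         // broadcast()
    } else {
      // Mesa semantics: loop until the phase we joined has really ended.
      while (generation_ == my_generation) {
        cv_.wait(guard);                        // wait()
      }
    }
  }

 private:
  std::mutex lock_;
  std::condition_variable cv_;
  const int threshold_;
  int arrived_ = 0;     // threads checked in during the current phase
  int generation_ = 0;  // incremented each time the barrier opens
};

// Hypothetical demo: each thread records its arrival in a phase, checks in,
// then verifies that every thread had arrived before anyone moved on.
bool barrier_demo(int num_threads, int phases) {
  Barrier barrier(num_threads);
  std::mutex m;
  std::vector<int> done(phases, 0);
  bool ok = true;
  std::vector<std::thread> threads;
  for (int t = 0; t < num_threads; t++) {
    threads.emplace_back([&] {
      for (int p = 0; p < phases; p++) {
        { std::lock_guard<std::mutex> g(m); done[p]++; }
        barrier.wait();
        { std::lock_guard<std::mutex> g(m); if (done[p] != num_threads) ok = false; }
      }
    });
  }
  for (auto & th : threads) th.join();
  return ok;
}
```

Comparing against the generation number rather than the arrival count matters: a fast thread can re-enter wait() for the next phase before slow threads from the previous phase have woken up, and a check based on the count alone could leave those slow threads stuck.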

Now, armed with your new Barrier class, you fix up your code so that it works correctly in parallel by having each thread check in to the barrier before continuing on to its next iteration of the outer loop.

struct relaxation_config {
  int * V;
  int * p;
  int * p_prime;
  int (*f)(int);
  int size;
  int count;
  Barrier * barrier;
};

void relaxation_thread(void * arg) {
  relaxation_config * config = static_cast<relaxation_config *>(arg);
  for (int j = 0; j < config->count; j++) {
      for (int i = 1; i <= config->size; i++) {
         config->p[i] = (config->p_prime[i-1] + 
           config->p_prime[i+1] - config->f(i))/2;
      }   
      swap(config->p, config->p_prime);

      // Wait at the barrier until all threads have completed
      // the j'th iteration.
      config->barrier->wait();
  }

  // After the final swap the newest values are in p_prime.
  if (config->p_prime != config->V) {
    memcpy(config->V, config->p_prime, config->size * sizeof(int));
  }
  delete config;
}

void relaxation(int * V, int (*f)(int), int size, int count) {
  int * tmp = new int[size]; 
  int * p = tmp;
  int * p_prime = V;
  thread * threads[NUM_THREADS];

  // Create a new barrier
  Barrier * barrier = new Barrier(NUM_THREADS);
  int thread_size = size/NUM_THREADS;

  for (int i = 0; i < NUM_THREADS; i++) {
    relaxation_config * config = new relaxation_config;
    config->V = V + i * thread_size;
    config->p = p + i * thread_size;
    config->p_prime = p_prime + i * thread_size;
    config->f = f;
    config->size = thread_size;
    config->count = count;
    config->barrier = barrier;
    threads[i] = new thread(relaxation_thread, (void*) config);
  }

  for (int i = 0; i < NUM_THREADS; i++) {
    threads[i]->join();
    delete threads[i];
  }

  delete barrier;
  delete[] tmp;
}