CS330 Intro to Threads


Highlights of this lab:


Lab Code

To get the sample and exercise code, please use the following commands in your cs330 directory:
  curl https://www.labs.cs.uregina.ca/330/Threads/Lab5.zip -O -s
  unzip Lab5.zip


Preamble

Sometimes it is more efficient to divide a problem into smaller problems that can be solved at the same time. One way to do this in modern operating systems is to fork the program into multiple programs that run in parallel and use files, pipes, and interprocess communications to coordinate their activities. However, creating a whole process for a sub-problem and coordinating communication between them can be resource intensive and may eliminate any benefits gained from running the processes in parallel. This is where threads come in. The following is from Interprocess Communications in Linux:

Aware of such limitations, the designers of modern versions of UNIX anticipated the need for constructs that would facilitate concurrent solutions but not be as system-intensive as separate processes. ... [W]hy not allow individual processes the ability to simultaneously take different execution paths through their process space? This idea led to a new abstraction called a thread. Conceptually, a thread is a distinct sequence of execution steps performed within a process.

For a thread to be able to run on its own it needs its own program counter, register set, call stack, and the ability to create thread-specific variables. Everything else needed for a process is shared between its various threads, including instructions, files, and virtual memory. This means there is less need to communicate via pipes, files, or IPC shared memory. However, because files, global variables, and static variables can be altered by any thread, there is a greater need to synchronize access to shared resources. Because of this, most threading systems also include a simplified binary semaphore-like construct called a mutex, or mutual exclusion lock.
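
The difference between shared and thread-specific data can be seen in a small sketch. It assumes a C++11 compiler for the thread_local keyword, and it uses pthread_create() and pthread_join(), which are covered later in this lab. The names shared_counter, private_counter, bump_counters, and run_helper_thread are made up for illustration.

```cpp
#include <pthread.h>
#include <cstddef>

int shared_counter = 0;                // global: one copy, visible to every thread
thread_local int private_counter = 0;  // thread-specific: each thread gets its own copy

// Hypothetical start routine: bumps both counters from inside the new thread.
extern "C" void *bump_counters(void *)
{
   shared_counter += 1;    // changes the single shared copy
   private_counter += 1;   // changes only this thread's copy
   return NULL;
}

// Runs one helper thread to completion.
// Returns 0 on success, or the pthread error number on failure.
int run_helper_thread()
{
   pthread_t tid;
   int rc = pthread_create(&tid, NULL, bump_counters, NULL);
   if (rc != 0)
      return rc;
   return pthread_join(tid, NULL);
}
```

After run_helper_thread() returns 0, the calling thread sees shared_counter equal to 1 while its own private_counter is still 0, because the helper only changed its own copy.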


POSIX Threads Commands

In this lab you will see how to do the following with POSIX threads (pthreads):

Since threads are a deep topic and we only have one lab to cover them, we will cover them superficially. We will survey the basic forms of the following calls:

We will not cover how to create anything other than the default thread and mutex types.  You should explore the references section if you want to learn more.

To use these commands you must include pthread.h.

In Linux, you can request different versions of POSIX compliance. Consult the man page for feature_test_macros and search for POSIX to see what options are available.

Also in Linux, you need to compile with -pthread to link against the pthread library and enable necessary pthread features in some headers:

g++ sample.cpp -pthread

Instead of using -pthread you can also define _REENTRANT and link against libpthread (-lpthread), but this is discouraged (see man pthreads for more).

If you are interested in writing POSIX threads programs on a Mac, you will be happy to know that you don't even need to specify -pthread – Mac programs are heavily threaded, and threading is enabled by default.

(What's reentrant, you ask? Learn more from Wikipedia - Reentrant. Reentrancy is part of being thread-safe. You should look that up too: Wikipedia - Thread Safe. The basic idea is that, when you are using threads and a function can be interrupted at any point, changes to shared global and static variables can cause all sorts of trouble. You are already familiar with one function that is not reentrant and not thread-safe: strtok(). You will find a full list of C library functions that are not thread-safe in man pthreads in the section "Thread-safe functions".)
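
As a rough illustration of the reentrancy point, here is a sketch that tokenizes a string with strtok_r(), the reentrant POSIX counterpart of strtok(). The helper name split_words is hypothetical. The key difference is that the parsing position is kept in a caller-supplied variable instead of hidden static storage, so two threads can each tokenize their own string at the same time.

```cpp
#include <cstring>
#include <string>
#include <vector>

// Hypothetical helper: splits text on any of the characters in delims.
// strtok_r() keeps its position in save_ptr, a local variable, so nothing
// is shared between threads that call this function concurrently.
std::vector<std::string> split_words(const std::string &text, const char *delims)
{
   std::vector<std::string> words;

   // strtok_r() modifies its input, so work on a private, writable copy
   std::vector<char> buffer(text.begin(), text.end());
   buffer.push_back('\0');

   char *save_ptr = NULL;
   for (char *tok = strtok_r(buffer.data(), delims, &save_ptr);
        tok != NULL;
        tok = strtok_r(NULL, delims, &save_ptr))
   {
      words.push_back(tok);
   }
   return words;
}
```

For example, split_words("one two  three", " ") yields the three words "one", "two", and "three"; runs of delimiters are skipped just as with strtok().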



Creating Threads

pthread_create()

    #include <pthread.h>
    int pthread_create(pthread_t *tid,
                       const pthread_attr_t *attr,
                       void *(*start_routine)(void *),
                       void *arg);

Notes:

When pthread_create() is called, the resulting operation is very system dependent. Some operating systems implement user-level controls for managing threads. Some operating systems create lightweight kernel-level processes to manage threads. On some systems a process and its threads will only execute on one processor. On others they can execute on multiple processors. Modern Linux kernels support threads with the NPTL (Native POSIX Threads Library), which, according to the pthreads man page, is "a so-called 1:1 implementation, meaning that each thread maps to a kernel scheduling entity", meaning threads can be scheduled on any processor on a correctly configured system.

The new POSIX thread will share some attributes with all other threads in the same process:

It will also have some distinct attributes:

pthread_self()

    #include <pthread.h>
    pthread_t pthread_self(void);

Notes:

The following is an example of pthread_create():

//
// Creating threads
//
// Based on p11.1.cxx from Interprocess Communications in Linux
// By: John Shapley Gray
// Adapted for CS330 by Alex Clarke
//

#include <iostream>
#include <cstdlib>
#include <cstdio>
#include <pthread.h>
#include <sys/types.h>
#include <unistd.h>

using namespace std;

extern "C" 
{
   void * say_it( void * );
}

int main(int argc, char *argv[])
{
   int num_threads;
   pthread_t *thread_ids;

   //Use unbuffered output on stdout
   setvbuf(stdout, (char *) NULL, _IONBF, 0);

   cout << "How many threads? ";
   cin >> num_threads;
   thread_ids = new pthread_t[num_threads];

   cout << "Making Threads" << endl;

   // generate threads 
   for (int i = 0; i < num_threads; i++)
   {
      // pthread_create returns 0 on success and an error number on failure
      if( pthread_create(&thread_ids[i],NULL,say_it,&thread_ids[i]) != 0)
      {
         cerr << "pthread_create failure" << endl;
         return 2;
      }
   }

   // wait a bit
   cout << "Making Changes in a Moment" << endl;
   sleep(1);

   // modify contents of arguments to threads
   for (int i = 0; i < num_threads; i++)
   {
      thread_ids[i] = i;
   }

   //wait a bit more
   sleep(2);

   system("bash -c 'read -sn 1 -p \"Press any key to quit...\" ' ");
   cout << endl;
   delete [] thread_ids;
   return 0;
}

// Print out the thread number twice
void * say_it(void *num)
{
   cout << "I am thread #" << *(unsigned int *)(num) << "." << endl;
   sleep (2);
   cout << "I am thread #" << *(unsigned int *)(num) << "." << endl;
   return NULL;
}

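This program creates a user-specified number of threads, passing each one a pointer to its own entry in the thread_ids array. Each thread prints the value it finds there twice. Because main() overwrites the array while the threads are sleeping, the second line printed by each thread shows the changed value, which shows that a thread's argument lives in memory shared with the rest of the process.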
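
The sample above never calls pthread_self(), so here is a small sketch of it, paired with pthread_equal(), the portable way to compare two thread ids. The names is_owner and compare_ids are hypothetical. The same routine is run once directly by the calling thread and once in a new thread, and each run checks whether it is executing in the thread whose id was passed in.

```cpp
#include <pthread.h>
#include <cstddef>

// Hypothetical start routine: returns a non-NULL pointer if the calling
// thread is the one whose id is stored in *arg, and NULL otherwise.
extern "C" void *is_owner(void *arg)
{
   pthread_t *owner = (pthread_t *)arg;
   return pthread_equal(pthread_self(), *owner) ? arg : NULL;
}

// Runs is_owner() once directly and once in a new thread.
// Fills in the two flags and returns 0, or returns a pthread error number.
int compare_ids(bool &direct_call_matches, bool &new_thread_matches)
{
   pthread_t owner = pthread_self();   // id of the calling thread

   // Called as an ordinary function, the routine runs in the owner thread
   direct_call_matches = (is_owner(&owner) != NULL);

   // Run as a start routine, it sees a different id from pthread_self()
   pthread_t tid;
   int rc = pthread_create(&tid, NULL, is_owner, &owner);
   if (rc != 0)
      return rc;

   void *result = NULL;
   rc = pthread_join(tid, &result);
   if (rc != 0)
      return rc;

   new_thread_matches = (result != NULL);
   return 0;
}
```

On success, direct_call_matches is true and new_thread_matches is false. Comparing ids with == happens to work where pthread_t is an integer type, but pthread_equal() does not rely on that.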



Managing Threads

If you do not detach or join a thread, then when it exits it will become a zombie thread and consume some system resources. This can really cause trouble, so make sure you take care of the zombies before they take care of you. They will eat your computer's brains.

All threads return a void * when they are done. This can happen at the end of the start routine, or when pthread_exit() is called. The value returned can be a pointer to anything in process memory, which makes a thread's returned value much more powerful than that of a process. This value, along with the thread's state, will be accepted and freed by another thread that is waiting on it with pthread_join(), discarded if the thread is detached, or freed when the process quits if neither of the other two conditions is met.

pthread_exit()

    #include <pthread.h>
    void pthread_exit(void *retval);

Notes:

pthread_join()

    #include <pthread.h>
    int pthread_join(pthread_t tid, void **retval);

Notes:

pthread_detach()

    #include <pthread.h>
    int pthread_detach(pthread_t tid);

Notes:

The following demonstrates a sample use of pthread_exit() and pthread_join():

//
// Joining threads and interpreting exit values
//
// Based on p11.1.cxx from Interprocess Communications in Linux
// By: John Shapley Gray
// Adapted for CS330 by Alex Clarke
//

#include <iostream>
#include <cstdlib>
#include <cstdio>
#include <pthread.h>
#include <sys/types.h>
#include <unistd.h>

using namespace std;

//Thread start 
extern "C" 
{
   void * say_it( void * );
}


int main(int argc, char *argv[])
{
   int num_threads;
   pthread_t *thread_ids;
   void  *p_status;

   //Use unbuffered output on stdout
   setvbuf(stdout, (char *) NULL, _IONBF, 0);

   cout << "How many threads? ";
   cin >> num_threads;
   thread_ids = new pthread_t[num_threads];

   cout << "Displaying" << endl;

   // generate threads 
   for (int i = 0; i < num_threads; i++)
   {
      int *arg = new int;
      *arg = i;
      // pthread_create returns 0 on success and an error number on failure
      if( pthread_create(&thread_ids[i],NULL,say_it,arg) != 0)
      {
         perror("creating thread:");
         return 2;
      }
   }

   // join threads and print their return values
   for (int i = 0; i < num_threads; i++)
   {
      if (pthread_join(thread_ids[i], &p_status) != 0)
      {
         perror("trouble joining thread: ");
         return 3;
      }
      cout << "Thread " << i << ": " << (char *)p_status << endl;

      delete [] (char *)p_status;
   }

   delete [] thread_ids;

   return 0;
}

// Build a message and return it at exit
void * say_it(void *num)
{
   int t_num = *(int *)num;
   delete (int *)num;   // the argument was allocated with new in main()
   char *msg = new char[255];
   cout << "Building message for thread " << t_num << endl;
   sleep(1);
   if (t_num == 5)
   {
      // pthread_exit() ends the thread here; msg becomes its return value
      snprintf(msg, 255, "I am not %lX. I am #%d. I. AM. ALIVE.",
               (unsigned long)pthread_self(), t_num);
      pthread_exit(msg);
   }
   snprintf(msg, 255, "My thread id was %lX. Goodbye...",
            (unsigned long)pthread_self());
   return msg;
}
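
The sample above joins every thread. For contrast, here is a minimal sketch of a detached thread, one that nobody joins and whose return value is discarded. Since there is no join to wait on, the worker reports completion through a flag guarded by a mutex (mutexes are covered in the next section). All names here (done_lock, worker_done, background_work, start_detached_worker, wait_for_worker) are made up for illustration, and the polling loop is only a stand-in for whatever a real program would do while the worker runs.

```cpp
#include <pthread.h>
#include <unistd.h>
#include <cstddef>

pthread_mutex_t done_lock = PTHREAD_MUTEX_INITIALIZER; // statically initialized mutex
bool worker_done = false;                              // protected by done_lock

// Hypothetical start routine: does its "work", then sets the flag.
extern "C" void *background_work(void *)
{
   pthread_mutex_lock(&done_lock);
   worker_done = true;
   pthread_mutex_unlock(&done_lock);
   return NULL;   // discarded, because the thread is detached
}

// Starts the worker and detaches it so its resources are released
// automatically when it exits. Returns 0 or a pthread error number.
int start_detached_worker()
{
   pthread_t tid;
   int rc = pthread_create(&tid, NULL, background_work, NULL);
   if (rc != 0)
      return rc;
   return pthread_detach(tid);
}

// Checks the flag up to max_checks times, sleeping 10 ms between checks.
// Returns true if the worker finished within that time.
bool wait_for_worker(int max_checks)
{
   for (int i = 0; i < max_checks; i++)
   {
      pthread_mutex_lock(&done_lock);
      bool done = worker_done;
      pthread_mutex_unlock(&done_lock);
      if (done)
         return true;
      usleep(10000);
   }
   return false;
}
```

A caller would check that start_detached_worker() returned 0 and then use something like wait_for_worker(500) before exiting, because returning from main() ends the whole process, detached threads included.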

 


Synchronizing Threads

Just as with forked processes, and perhaps more so, it is important to synchronize access to resources between threads. You need to worry about access to global variables now as well. You can't do this with the semaphores you have already learned because they are not thread-safe. Fortunately, the POSIX threads API offers many methods to synchronize access to resources. This week we will focus on the mutex. A mutex is like a single binary semaphore with a couple of small differences:

Other synchronization methods supported by the POSIX threads API include condition variables, read/write locks, and multithread semaphores.
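
To see why global variables need protection, consider several threads incrementing one shared counter. The sketch below uses the mutex calls described in the rest of this section; the names counter, counter_lock, add_to_counter, and count_with_threads, and the two constants, are chosen for illustration only. Without the lock and unlock calls, the increments from different threads could overlap and some would be lost.

```cpp
#include <pthread.h>
#include <cstddef>

const int NUM_WORKERS = 4;
const int INCREMENTS_PER_WORKER = 100000;

long counter = 0;               // shared by every thread
pthread_mutex_t counter_lock;   // guards counter

// Hypothetical start routine: adds INCREMENTS_PER_WORKER to the counter.
extern "C" void *add_to_counter(void *)
{
   for (int i = 0; i < INCREMENTS_PER_WORKER; i++)
   {
      pthread_mutex_lock(&counter_lock);
      counter++;   // read-modify-write on shared data, done by one thread at a time
      pthread_mutex_unlock(&counter_lock);
   }
   return NULL;
}

// Runs the workers and returns the final count,
// or -1 if any pthread call failed (error handling is kept minimal here).
long count_with_threads()
{
   pthread_t workers[NUM_WORKERS];
   counter = 0;

   if (pthread_mutex_init(&counter_lock, NULL) != 0)
      return -1;

   for (int i = 0; i < NUM_WORKERS; i++)
      if (pthread_create(&workers[i], NULL, add_to_counter, NULL) != 0)
         return -1;

   for (int i = 0; i < NUM_WORKERS; i++)
      if (pthread_join(workers[i], NULL) != 0)
         return -1;

   // Every worker has been joined, so nobody can still be using the mutex
   pthread_mutex_destroy(&counter_lock);
   return counter;
}
```

With the lock in place, count_with_threads() returns NUM_WORKERS * INCREMENTS_PER_WORKER, which is 400000 for the constants above.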

pthread_mutex_init()

    #include <pthread.h>
    int pthread_mutex_init(pthread_mutex_t *mutex, const pthread_mutexattr_t *attr);

pthread_mutex_lock()
pthread_mutex_trylock()

    #include <pthread.h>
    int pthread_mutex_lock(pthread_mutex_t *mutex);
    int pthread_mutex_trylock(pthread_mutex_t *mutex);

pthread_mutex_unlock()

    #include <pthread.h>
    int pthread_mutex_unlock(pthread_mutex_t *mutex);

pthread_mutex_destroy()

    #include <pthread.h>
    int pthread_mutex_destroy(pthread_mutex_t *mutex);

Mutex Example - Using Mutexes to Control Output

You may have noticed that the output of the previous examples is a bit... messy. This program adds a mutex that controls access to stdout. Now our eyes rejoice as only one thread at a time may have access to this precious resource.

//
// Controlling output with mutexes.
//
// Based on p11.1.cxx from Interprocess Communications in Linux
// By: John Shapley Gray
// Adapted for CS330 by Alex Clarke
//

#include <iostream>
#include <cstdlib>
#include <cstdio>
#include <pthread.h>
#include <sys/types.h>
#include <unistd.h>

using namespace std;

pthread_mutex_t output_lock;

void * say_it( void * );

int main(int argc, char *argv[])
{
   int num_threads;
   pthread_t *thread_ids;
   void  *p_status;

   //Use unbuffered output on stdout
   setvbuf(stdout, (char *) NULL, _IONBF, 0);

   //Set up an output lock so that threads wait their turn to speak.
   if (pthread_mutex_init(&output_lock, NULL)!=0)
   {
      perror("Could not create mutex for output: ");
      return 1;
   }

   cout << "How many threads? ";
   cin >> num_threads;
   thread_ids = new pthread_t[num_threads];

   cout << "Displaying" << endl;

   // generate threads 
   for (int i = 0; i < num_threads; i++)
   {
      int *arg = new int;
      *arg = i;
      // pthread_create returns 0 on success and an error number on failure
      if( pthread_create(&thread_ids[i],NULL,say_it,arg) != 0)
      {
         perror("creating thread:");
         return 2;
      }
   }

   // join threads and print their return values
   for (int i = 0; i < num_threads; i++)
   {
      if (pthread_join(thread_ids[i], &p_status) != 0)
      {
         perror("trouble joining thread: ");
         return 3;
      }

      //Threads may still be building their return, so lock stdout
      if (pthread_mutex_lock(&output_lock) != 0)
      {
          perror("Could not lock output: ");
          return 4;
      }
      cout << "Thread " << i << ": " << (char *)p_status << endl;
      if (pthread_mutex_unlock(&output_lock) != 0)
      {
          perror("Could not unlock output: ");
          return 5;
      }

      delete [] (char *)p_status;
   }

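   delete [] thread_ids;

   //Every thread has been joined, so the mutex is no longer needed
   pthread_mutex_destroy(&output_lock);

   return 0;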
}

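// Build a message and return it at exit
void * say_it(void *num)
{
   int t_num = *(int *)num;
   delete (int *)num;   // the argument was allocated with new in main()
   char *msg = new char[255];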

   if (pthread_mutex_lock(&output_lock) != 0)
   {
       perror("Could not lock output: ");
       exit(4); //something horrible happened - exit whole program with error
   }
   cout << "Building message for thread " << t_num << endl;
   if (pthread_mutex_unlock(&output_lock) != 0)
   {
       perror("Could not unlock output: ");
       exit(5); //something horrible happened - exit whole program with error
   }

   if (t_num == 6)
   {
      snprintf(msg, 255, "My thread id is %lX, but I am so much more. I. AM. ALIVE.",
              pthread_self());
      pthread_exit(msg);
   }
   snprintf(msg, 255, "My thread id was %lX. Goodbye...", pthread_self());
   return msg;
}
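
The example above only uses pthread_mutex_lock(), which blocks until the mutex is free. As a sketch of the non-blocking variant, the code below has a second thread call pthread_mutex_trylock() on a default mutex, either while the creating thread holds it or while it is free. The names busy_lock, try_once, and trylock_from_thread are hypothetical.

```cpp
#include <pthread.h>
#include <cerrno>
#include <cstddef>

pthread_mutex_t busy_lock = PTHREAD_MUTEX_INITIALIZER; // statically initialized mutex

// Hypothetical start routine: tries once to take busy_lock without
// blocking and stores the result code in the int that arg points to.
extern "C" void *try_once(void *arg)
{
   int *result = (int *)arg;
   *result = pthread_mutex_trylock(&busy_lock);
   if (*result == 0)
   {
      // Got the lock: a real program would use the protected resource here
      pthread_mutex_unlock(&busy_lock);
   }
   return NULL;
}

// Runs try_once() in a new thread, optionally while the caller holds the lock.
// Returns the trylock result code (0 or EBUSY), or -1 if thread setup failed.
int trylock_from_thread(bool hold_lock)
{
   int result = -1;
   pthread_t tid;

   if (hold_lock)
      pthread_mutex_lock(&busy_lock);

   int rc = pthread_create(&tid, NULL, try_once, &result);
   if (rc == 0)
      rc = pthread_join(tid, NULL);

   if (hold_lock)
      pthread_mutex_unlock(&busy_lock);

   return (rc == 0) ? result : -1;
}
```

trylock_from_thread(true) returns EBUSY because the mutex is already held, and trylock_from_thread(false) returns 0. A thread that gets EBUSY is free to do other work and try again later instead of waiting.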

References and Related Materials