CS210 Lab: Computational Complexity and Performance Measurement

Prelab Questions:

For a review of relevant topics click here.

Highlights of This Lab:

Computational Complexity and Performance Measurement
Creating a Library and Using it

Lab Exercise:

Click the little computer above for a detailed description.
In the exercise, the students can add the library, which has been created in this lab, to existing code to test the runtime of three different search algorithms and three different sorting algorithms.

1. Computational Complexity and Performance Measurement

A routine's performance can be judged in many ways. In CS210, you have been/will be learning Big O. The idea is that you can estimate the routine's projected execution times as a function of the number of data items (N) that it manipulates as it performs its task. The results are estimates of the form O(N), O(LogN) and so on.

Big O allows us to group routines based on their projected performance under different conditions (best case, worst case, etc). Remember that Big O is only an estimate. It does not take into account factors specific to a particular environment, such the specific implementation, the type of computer system we are using, and the data that we are processing.

To determine how well or poorly a routine will perform in a particular environment, we need to evaluate the routine in that environment.

Big O classifications

Big O notation is typically divided into predefined classifications that group functions based on them having similar running speed. For example, we have:

O(logn) - typically has a higher start up cost, but increases slowly over time
O(n) - has a linear run time. Increases to n increase the runtime in a one to one ratio, making a straight line
O(nlogn) - typically start off with a small curve, but will grow faster over time than O(n)
O(n^2) - One of the more expensive run times. Is parabolic in shape

In practice, the lines will never exactly match these ones, but you can still identify the classification by examining the general shape

Big O and constants

When we consider Big O notation, we do not really care about the specific numbers involved - we simply want to measure the rate something is increasing. This means that when you are using Big O notation, you simplify out the constants. A linear function of O(n) will still be considered to have the same rate of growth as O(10n), as both will be lines, even if one is steeper. All of the lines below would still be considered to have the same rate of growth of O(n).
Three lines, one multiplied by five, one multiplied by 10. Despite the difference in steepness, they are all O of n

In "Part 1" of the laboratory exercise, you will measure the performance of three search routines:

binarySearch()
linearSearch()
unknownSearch()

The first two search routines were discussed in CS115. For a review, click here

In summary, for the binary search, you are given a sorted array. The idea is to divide the array into two equal parts and narrow down the range of the array to be searched either to the low part of the array or high part of the array. We will continue narrowing down the range until the value is located, or until we realize that the value searched for is not in the array.

The linear search is also known as the sequential search. The idea is start with the first element, and sequentially search through the array until we find a match.

Before we can measure the performance of these routines, we must first develop a set of tools that allow us to measure the execution time.

The general approach will be the following:

Get the current system time (startTime)
Execute the routine
Get the current system time (stopTime)
Calculate the routine's execution time=stopTime-startTime

If the routine executes very rapidly, then the difference between startTime and stopTime may be too small for your computer system to measure. If that is the case, we can execute the routine several times (say n) and then divide the resulting time by the number of repetitions (n) as follows:

Get the current system time (startTime)
Execute the routine n times
Get the current system time (stopTime)
Calculate the routine's execution time = (stopTime-startTime)/n

To capture the startTime and stopTime, we will use a class called Timer which has the following member functions:

start()
stop()
getElapsedTime()

To add some complexity to this lab, we will turn this Timer class into a library and use this library for the lab exercise at the end.

2. Creating a Library and Using it (not part of the lab assginment in replit)

Our first step will be to download the Timer class and a sample program just to try it out

2.1 Download the C++ programs

Download three files. Get the files by right click on the file name and save as:

complexity.cpp: 
      	  This is the main program. 
   
    
timer.cpp: 
        This is the implementation of the Timer module. 
        It can be compiled into a library once it is bug-free.
  
    
timer.h: 
        This is the header file of the Timer module

Store the three files where you can easily find them.

2.2 Compile the C++ programs

Using the procedures discussed in Lab 1 do the following:

Set your .replit configuration file to compile and run the complexity and timer cpp files
Run the project and observe the results

Let's have a look at this code:

timer.h and timer.cpp are useful for calculating running time measurements. Because we may want to use the class Timer over and over again, we may want to put it in a library.

Our second step will be to create a library

2.3 What is a library?

A library is a collection of object files - .obj or .o files which you've been using in Unix. . Object files can contain individual functions, collections of functions or an entire class implementation. Each object file is stored individually. When your program references a function or class contained in a library, it links the object it belongs to and adds its code to your program. This way you can have one big file to store related things but only the pieces that you actually use in your program are added to the executable file.

2.4 What are header files?

Each function or class defined in the C or C++ standard library has a header file associated with it. The header files that relate to the functions that you use in your programs should be included (using #include) in your program.
There are two reasons for this:

to get the data types that work with many functions or classes in the standard library. Your program must have access to these data types that are defined in the header file related to each function or class
to obtain the prototypes for the standard library functions

Interestingly one library may have more than one header file.

2.5 Why create your own Library

Libraries can be regarded as pre-compiled functions or classes that do not need to be re-compiled if other parts of your project change. If you have a "bug-free" module (a piece of self-contained code) that could be reused with other projects, you can compile this module to a library. After you link this library to your new project, you will then be able to use any of the functions or classes in this library as if they are defined locally in your project. One major advantage is that the functions or classes in the library will not be recompiled, which saves time compiling your entire project.

In our example, "timer" is a perfect candidate: it does not contain any bugs and we can use it again in other projects.

To use "timer" in a new project, we will have to first compile the C++ programs into a library, and then link this library to the new project.

2.6 Compile the C++ programs using a library.

In this program, timer.h and timer.cpp define and implement a time measuring class Timer. This class is independent of the kind of program it is measuring. In other words, we can reuse this Timer class to measure the running time of other programs. It naturally follows that once we make this class bug-free, we can compile it into a library. To use this time-measuring class, we simply include the header file (i.e.timer.h) in the program and link the library to this program.

The following shows you how we do this:

Step 1. Create the object file

In the command line, write the following to create a timer object file:

	g++ -c timer.cpp -o timer.o

Step 2. Create the archive (static library)

next, compile the object(s) into an archive file (.a):

	ar rcs timerlib.a timer.o

ar stands for archive
r inserts the file with replacement, deleting old files with the same name
c creates the archive
s creates an object index within the archive

Step 3. Test the library

To test the library:

delete the timer.cpp file to make sure you aren't compiling using it - make sure you keep the timer.h file, you still need this

write:

	g++ complexity.cpp timerlib.a 
	
	./a.out

Lab Exercise

Part 1

You will analyse the execution times of three searching routines.

Get the files:

 timesrch.cpp , search.h , and sort.h

There are three files used in this program:
- search.h contains the implementation of a set of searching routines.
- sort.h contains the implementation of a set of sorting routines (used to sort the array so that binary search will work).
- timesrch.cpp — the main program. Outputs the runtime of the three searching routines.
Your primary tasks for this exercise are:
Steps include:
Either use command line or create a .replit configuration file to use the timer library:
```
	language = "c++"

	run = "g++ timesrch.cpp timerlib.a -o output; ./output"
```

Compile and run the project. You will notice that the values are printed in the following format:


Enter the number of keys (must be at least 1000) : 1000


    Search    | Number of|  Total Time  |   Time  per
     Type     | Searches |     (s)      |   Search(s)
--------------|----------|--------------|--------------
linearSearch  |   1400000|    3.898e+000|    2.784e-006
binarySearch  |  12600000|    1.813e+000|    1.439e-007
unknownSearch |    700000|    3.866e+000|    5.523e-006

The 2.087e-004 is equivalent to 2.087*10^-4 or 0.0002087.

Answer the questions on the Worksheet_PartOne.md in Replit.
You can use the following four sets of (Time per Search) data

Part 2

You will analyse the execution times of three sorting routines.

Get the files:
```
 timesort.cpp  
and sort.h  
```
Extract all of the files. There are two files used in this program:
- sort.h contains the implementation of a set of sorting routines (including selectionSort(), quickSort(), and unknownSort())
- timesort.cpp — the main program. Outputs the runtime of the three sorting routines.
Your primary tasks for this exercise are:
Answer the questions on the Worksheet_PartTwo.md in Replit.
You can use the following four sets of (Time per Sort) data

When you are finished you should have:

Three questions about search routines answered on Worksheet_PartOne.md
A graph of the search routines. Make sure to add this file to your Replit
Three questions about sorting routines answered on Worksheet_PartTwo.md
A graph of the sort routines. Make sure to add this file to your Replit

4. Postlab Exercises

For postlab exercises, click here.