CS330 Process Memory Addresses

Highlights of this lab:

Focus on Unix Commands
Some Differences Between C and C++
Preamble on Process Memory
User Versus Kernel Space
Three Segments of User Space
References

Focus on Unix Commands

To get the sample code, please use the following command:

wget www.labs.cs.uregina.ca/330/ProcessMem/Lab1.zip

Although you can do a lot with a graphical user interface (GUI), it is often necessary to execute commands on the command-line. The basic format is:

prompt$ command-name options arguments

For example:

titan[1]% g++ -c main.cpp

a037094[7]% ls -l

A summary of a few commands is provided in tabular format here:

Command	Example	Comments
`ssh`	`ssh smithj@titan.cs.uregina.ca`	Secure Shell (ssh) lets you log in, execute commands, and run applications on a remote system. SSH encrypts all communication between the remote user and the system.
`scp`	`scp input.txt smithj@titan.cs.uregina.ca:CS330lab1/`	Secure Copy (scp) copies files between your host and a remote host. scp uses ssh to transfer data and employs the same authentication and encryption. This example copies the file input.txt from the user's current directory to user smithj's CS330lab1 directory on the titan host.
`scp`	`scp -r CS330 smithj@titan.cs.uregina.ca:classes/`	The -r option allows whole directories to be copied, so this copies the entire CS330 directory to user smithj's classes directory.
`touch`	`touch myfile`	Updates the access and modification times of "myfile". If "myfile" does not exist, creates it.
`mkdir`	`mkdir reports`	Creates a directory named "reports".
`rmdir`	`rmdir letters`	Removes the directory named "letters" (rmdir only removes empty directories).
`rm`	`rm myfile`	Removes the file named "myfile".
`rm`	`rm -r mydir`	Removes the directory "mydir" and all of its contents and subdirectories.
`rm`	`rm -rf mydir2`	Removes the directory "mydir2" and all of its contents and subdirectories. The f stands for force: rm does not prompt, even for write-protected files.
`ls`	`ls -F`	Lists the working directory with trailing characters that indicate file types. The most common are / for directories and * for executables.
`ls`	`ls -R`	Lists the working directory as well as all subdirectories.
`ls`	`ls -a`	Lists all files, including "hidden" files.
`ls`	`ls -l`	Lists files with permissions, owner, group, and time stamp.
`ls`	`ls -i`	Lists each file's inode number, a unique number the system uses to identify a specific file.
`ls`	`ls -t`	Lists files sorted by time last modified.
`cd`	`cd reports`	Changes to the "reports" directory, making it the working directory.
`cd`	`cd`	Changes back to your home directory.
`cd`	`cd ..`	Moves up one directory level.
`pwd`	`pwd`	Print Working Directory: prints the full path of the current directory.
`cp`	`cp lab1 mylab`	Copies the file "lab1" to the file "mylab".
`cp`	`cp lab1 mydirectory`	Copies "lab1" from the working directory into "mydirectory".
`cp`	`cp lab1 mydirectory/mylab`	Copies "lab1" into "mydirectory" and names the copy "mylab".
`cp`	`cp -r mydirectory dirname`	Copies "mydirectory" and all its contents into "dirname".
`mv`	`mv lab1 lab2`	Renames "lab1" to "lab2".
`mv`	`mv lab1 labdirectory`	Moves "lab1" into "labdirectory".
`mv`	`mv lab1 labdirectory/newfile`	Moves "lab1" into "labdirectory" and renames it "newfile".
`mv`	`mv labdirectory newdirectory`	Renames the whole directory "labdirectory" to "newdirectory".
`g++`	`g++ -o prog_run main.cpp`	Compiles and links "main.cpp", naming the executable "prog_run".
`g++`	`g++ -c main.cpp`	Compiles "main.cpp" only, creating the object file "main.o".
`g++`	`g++ -o prog_run main.o part1.o part2.o`	Links several object files (from separately compiled source files), naming the executable "prog_run".

Some Differences Between C and C++

C++ is NOT C.

Here are some differences that might mess you up:

The #include directives are different (see the last code examples below).

You compile using gcc instead of g++.

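C has no reference parameters, so pass by reference is simulated with pointers: the caller passes the addresses of its variables, and the function dereferences those pointers. Consider the following example that swaps two integer values: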

C++ Style

// function for passing by reference in C++
void swap (int& x, int& y)
{
   int temp;

   temp = x;
   x = y;
   y = temp;
   return;
} // end swap

//In main, the call will look something like this
   swap(a,b);

C Style

/* function for passing by reference using pointers */
void swap (int* x, int* y)
{
   int temp;

   temp = *x;
   *x = *y;
   *y = temp;
   return;
} /* end swap */

/* In main, the call will look something like this */
   swap(&a,&b);

Dynamic allocation is done using malloc or calloc instead of using new. To free up memory, you use (SURPRISE!) free instead of delete. The following is an example of dynamically allocating and freeing an array of integers:

C++ Style

int *i4;            // create an array of 4 integers
i4 = new int[4];
// ...
delete[] i4;

C Style

int *ic4;           /* create an array of 4 integers */
ic4 = (int *) malloc (sizeof (int) * 4);
/* ... */
free(ic4);

Instead of cout and cin, you call the functions printf and scanf, respectively. Both take a "format string" as their first argument. For instance, to print an integer preceded by a description, the format string would look something like "The integer value is %i", where %i is a placeholder for an integer that is passed as an additional argument. The following compares printing output and reading input from the user:

C++ Style

#include <iostream>
using namespace std;

int main()
{
    int a;
    cout << "Please enter an integer: ";
    cin >> a;
    cout << "The number is: " << a << endl;

    return 0;
}

C Style

#include <stdio.h>

int main()
{
    int a;
    printf("Please enter an integer: ");
    scanf("%i", &a);
    printf("The number is: %i\n", a);

    return 0;
}

For more of these differences, see Dr. Hilderman's notes: Differences Between C and C++

For a couple of references on the format specifiers of printf and scanf:

printf
scanf

Preamble on Process Memory

"Every linux process has its own dedicated memory address space. The kernel maps these "virtual" addresses in the process address space to actual physical memory addresses as necessary. In other words, the data stored at address 0xDEADBEEF in one process is not necessarily the same as the data stored at the same address in another process. When an attempt is made to access the data, the kernel translates the virtual address into an appropriate physical address in order to retrieve the data from memory... we are not really concerned with physical addresses. Rather, we are investigating exactly how the operating system allocates more virtual addresses to a process." (from: http://www.linux-mag.com/2001-07/compile_01.html)

User Versus Kernel Space

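A process may operate in 'user' mode or 'system' mode (kernel mode). The programs we write switch between these two modes as they run: for example, each system call moves the process from user mode into kernel mode and back again. Such a change is called a mode switch. It is not the same as a context switch, in which the kernel stops running one process and starts running another.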

To summarize these two modes:

User Mode

application processes.
These are executed in an isolated environment so that multiple processes cannot interfere with each other's resources.

Kernel Mode

when the kernel acts on behalf of the process.
This would be when a system call is made, or when an exception or interrupt occurs.
Processes running in kernel mode are privileged and have access to key system data structures.

When a program is loaded as a process, it is allocated a section of virtual memory which is known as user space.

By contrast, there is another section of memory which is known as the kernel space. This is where the kernel executes and provides its services.

The remainder of these notes focus on user space.

Three Segments of User Space

The user context of a process is made up of the portions of the address space that are accessible to the process while it is running in user mode.

There are a few definitions that come in handy when we are talking about memory:

global variable: one declared outside of any function
local static variable: one declared inside a function with the static keyword. Such a variable keeps its value across function calls. For instance, if it has the value 57 when the function returns, it will still be 57 the next time that function is called.
local automatic variable: any variable declared inside a function without the static keyword
heap: dynamically allocated space
- In C, you use malloc or calloc
- In C++, you use new

The portions of address space are: text, data, and stack. They are described in further detail below:

Text

sometimes called the instruction or code segment
contains executable program code and constant data
is read only
multiple processes can share this segment if a second copy of the program is to be executed concurrently

Data

contiguous (in a virtual sense) with the text segment
subdivided into three sections:
1. initialized data (static or global variables)
2. uninitialized data (static or global variables)
3. heap (dynamically allocated)
addresses increase as the heap grows

Stack

used for function call information including:
- address to return to
- the values or addresses of all parameters
- local automatic variables
addresses decrease as the stack grows

Notice that the stack grows downward, toward the heap, while the heap grows upward, toward the stack.

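The uninitialized data section is commonly called bss (Block Started by Symbol, not "bs") for historical reasons. Because it only has to hold zeros, the executable file records just its size, and the system fills it with zeros when the program is loaded.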

Memory references to Text, Data and Stack in a user space program are done with virtual addresses. The kernel translates these virtual addresses to physical memory addresses.

In working with these virtual addresses, you have access to three external symbols defined by the linker. Only their addresses are meaningful:

etext: the first address past the end of the text segment
edata: the first address past the end of the initialized data segment
end: the first address past the end of the uninitialized data segment

Program 1.4, in Interprocess Communications in UNIX: The Nooks & Crannies, provides an example of displaying these external variables. The program also displays the address of some key identifiers to verify that the identifiers (variables) are put into the correct segments.

// Program 1.4 in Interprocess Communications in UNIX: The Nooks & Crannies.
#include <stdio.h>
#include <stdlib.h>   //needed for exit() and malloc()
#include <string.h>
#include <sys/types.h>
#include <unistd.h>   //needed for write()

// %p expects a void pointer, so the address is cast before printing
#define SHOW_ADDRESS(ID, I) printf("The id %s \t is at:%p\n", ID, (void *)&I)

// Defined by the linker (see end(3)); only their addresses are meaningful
extern int etext, edata, end;

char *cptr = "Hello World\n"; // static by placement
char buffer1[25];

int main(void) 
{
    void showit(char *); // function prototype
    int i=0; //automatic variable, display segment adr


    printf("Adr etext: %p\t Adr edata: %p\t Adr end: %p\n\n",
            (void *)&etext, (void *)&edata, (void *)&end);

    // display some addresses
    SHOW_ADDRESS("main", main); 
    SHOW_ADDRESS("showit", showit);
    SHOW_ADDRESS("cptr", cptr);
    SHOW_ADDRESS("buffer1", buffer1);
    SHOW_ADDRESS("i", i);
    strcpy(buffer1, "A demonstration\n");   // library function
    fflush(stdout);                         // keep printf output ahead of write() if stdout is redirected
    write(1, buffer1, strlen(buffer1));     // system call (the terminating '\0' is not written)
    for (; i<1; ++i)
        showit(cptr); /* function call */

    return 0;
}


void showit(char *p) 
{
    char *buffer2;
    SHOW_ADDRESS("buffer2", buffer2); // address of the pointer itself, which is on the stack
    buffer2 = (char *)malloc(strlen(p)+1);
    if (buffer2 != NULL) 
    {
        strcpy(buffer2, p);    // copy the string
        printf("%s", buffer2); // display the string
        free(buffer2);         // release location
    }
    else 
    {
        printf("Allocation error.\n");
        exit(1);
    }
}

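A sample run on a Linux machine produced the output below. In it, main and showit lie below etext (text segment); cptr lies between etext and edata (initialized data); buffer1 lies between edata and end (uninitialized data); and i and buffer2 are at much higher addresses on the stack, with buffer2 lower than i because showit's stack frame is placed below main's. On systems that randomize the address space layout, the exact addresses are likely to differ from run to run.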

Adr etext: 0x56032bc00b6d    Adr edata: 0x56032be01018   Adr end: 0x56032be01050

The id main      is at:0x56032bc008ba
The id showit    is at:0x56032bc00a22
The id cptr      is at:0x56032be01010
The id buffer1   is at:0x56032be01030
The id i     is at:0x7fff24a142e4
A demonstration
The id buffer2   is at:0x7fff24a142c0
Hello World

References

Interprocess Communications in UNIX, The Nooks & Crannies by John Shapley Gray (pages 11 to 15)
Unix Network Programming by W. Richard Stevens (pages 19 to 22)
For more on User versus Kernel Mode: http://www.linuxgazette.com/issue23/flower/intro.html
For more about Virtual Addressing and BSS: http://www.cs.duke.edu/courses/spring01/cps110/slides/process.pdf
For more on the Three Segments of Process memory (Dr. Hamilton's notes): http://www2.cs.uregina.ca/~hamilton/courses/330/notes/memory/memmngment.html
For more on Dynamic Memory Allocation: http://www.linux-mag.com/2001-07/compile_01.html