CS330 Unix, Linux, and Tokenizing Strings


Highlights of this lab:


Lab Code

To get the sample and exercise code, please use the following commands in the directory that you have created for this lab:
    wget www.labs.cs.uregina.ca/330/Linux/Lab3.zip
    unzip Lab3.zip
    

Unix Introduction

Brief History

Advantages of Unix

Unix Flavours

There are many different implementations of Unix, and they all differ subtly in the way they operate. Most modern Unixes try to comply with the Single UNIX Specification (SUS), and most commercial UNIXes are officially registered as SUS compliant.

A few commercial implementations available in the CS Department are:

Linux comes in many different versions, called distributions or distros. Some are free; for others, the user must pay for a support contract. Many smaller distributions are based on one of the major distributions. No Linux distros are officially SUS compliant because of the costs involved, but the Linux Standard Base (LSB) includes SUS standards.

Here's a list of some Linux distributions:

By the way, to see the current version of Linux running on a machine, you can try this command: lsb_release -a


Parts of Unix

Unix is organized at three levels:

  • kernel

“The UNIX kernel is built specifically for a machine when it is installed. It has a record of all the pieces of hardware it needs to talk to and knows what languages they speak (how to turn switches on and off to get a desired result).” http://www.extropia.com/tutorials/unix/kernel.html

  • shell

The Unix shell provides a user interface. “The most basic UNIX shell provides a 'command line' which allows you to type in commands which are translated by the shell into kernel speak and sent off to the kernel.” http://www.extropia.com/tutorials/unix/shells.html

  • tools and applications

These provide additional functionality to the operating system. To see some of the tools that you have access to, list the contents of /bin or /usr/bin.


More on the Unix Shell

There are several different shells, each with its own advantages and disadvantages. For instance, some allow auto completion using the tab key; others don't.

A few common shells are the following:

For more on these shells, click here.

The following has a side by side comparison of some shells https://hyperpolyglot.org/unix-shells

To see what shells exist on your current Unix system, try the following command:

$ cat /etc/shells

How do you get your shell?

Before you get a shell, you identify yourself with a login name and password. That login name is looked up in a file called /etc/passwd, which describes each user's account. More specifically, it tells your unique numeric ID, principal group ID, general information, home directory, and shell.

Try the following command:

$ cat /etc/passwd | grep yourusername

The format of the data in the /etc/passwd file is

Name:Password:UserID:PrincipalGroup:Gecos:HomeDirectory:Shell

Note that the attributes are separated by a : (colon). The last attribute on the line is your shell.

If you are feeling daring you can change your shell with the chsh command.

More information about /etc/passwd can be found here.


Focus on Permissions

Each file and directory in Unix contains a set of permissions that determine who can access it and how. There are three levels of access to set:

  1. You can restrict access to yourself alone (user)
  2. You can allow users in a predesignated group to have access (group)
  3. You can permit anyone on your system to have access (world)

How do you view permissions?

The ls command with the -l option allows you to view a file's permissions (among other information).

 $ ls -l mydata
 -rw-r--r-- 1 chris weather 207 Feb 20 11:55 mydata
 
The breakdown of this information is as follows:
Field                        Value
File Type                    -
Permissions                  rw-r--r--
Number of Links              1
Owner Name                   chris
Group Name                   weather
Size of File in Bytes        207
Date and Time Last Modified  Feb 20 11:55
File Name                    mydata

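Right now, the owner of mydata has read and write permissions, and the group and the world have read permission only. How do I know? The permissions are organized in groups of three: the first three characters apply to the user (owner), the next three to the group, and the last three to the world. Within each group, r means read, w means write, x means execute, and - means that the permission is not granted.

In addition, the character in front of the permissions gives the file type: - for a regular file and d for a directory.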

What would the following permissions represent?

  1. -rwxr--r--
  2. drwxr-xr-x
  3. -rwxrw-r--

How does Unix determine who has permissions to access files?

Again, it comes down to the /etc/passwd file. In this file, you have a unique numeric ID and a principal group ID (also numeric). When you create a file, your numeric user ID and principal group ID are assigned to that file. When you later access a file, Unix compares your IDs with the file's: if your user ID matches the file's owner, the user permissions apply; otherwise, if one of your groups matches the file's group, the group permissions apply; otherwise, the world permissions apply.

You have a principal group ID, but you may also belong to other groups that are not your principal group. To find out which groups you belong to, try the following command:

$ groups

This command gets its information from the /etc/group file, together with your principal group ID from /etc/passwd.

How do I set permissions?

To set permissions, you use chmod. There are two main usages of chmod:

Symbolic Permission Mode:

The general format for using the symbolic permission mode is the following:
chmod 'access class' operator 'access type' filename
For example, this would add executable access for the user:
$ chmod u+x testfile
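The following summarizes the values of "access class", "operator", and "access type" in the above syntax:
  1. Access Class: u (user), g (group), o (other, that is, world), a (all three)
  2. Operator: + (add the permission), - (remove the permission), = (set exactly these permissions)
  3. Access Type (Permissions): r (read), w (write), x (execute)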

Given a base permission of -rw------- for a file called "myfile", what would the resulting permission be after each of the following chmod calls?

  1. chmod u+x myfile
  2. chmod a+x myfile
  3. chmod g+r myfile

Absolute Permission Mode

Another way to change permissions is by using a numeric (octal) code. Typically, you will use three octal numbers: one for the user, one for the group and one for other (world).

The syntax for using chmod in absolute permission mode is:

chmod 'octal permissions' filename
For example:
$ chmod 744 myfile
Each of the three octal digits represents the read, write, and execute permissions for the user, group, and world respectively.

The following table summarizes the octal digits and how the permissions are affected:

Octal Binary Permissions
0 000 ---
1 001 --x
2 010 -w-
3 011 -wx
4 100 r--
5 101 r-x
6 110 rw-
7 111 rwx

What would the permissions look like on "myfile" after the following chmod calls?

  1. chmod 755 myfile
  2. chmod 644 myfile
  3. chmod 711 myfile

For more on chmod click here


Review of Strings and C Strings

                     C++ Strings                          C Strings
general              dynamic length; can change length    fixed length determined when declared;
                     during the program                   ends in '\0'
#include             #include <string>                    #include <cstring>
declaring            string theString;                    char cString[100];
copying              theString = theString2;              strncpy(cString, cString2, 100);
getting a line       getline(cin, theString);             cin.getline(cString, 100);
determining length   theString.length();                  strlen(cString);
comparing            if (theString == theString2)         if (!strncmp(cString, cString2, 100))

A handy thing to know is how to convert a String into a C String (for copying, perhaps?). The syntax is:

strncpy(cString,theString.c_str(),100);

You may also need a review of using getline to read lines until the end of a file. The following is meant as a refresher:

#include <iostream>
#include <fstream>
#include <string>
using namespace std;

int main()
{
    ifstream inFile("test.txt");
    string strOneLine;

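    // getline returns the stream, which tests false once a read fails,
    // so the body runs only for lines that were actually read
    while (getline(inFile, strOneLine))
    {
       cout << strOneLine << endl;
    }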

    inFile.close();

    return 0;
} 
Note:

Dynamically Allocating C Strings

The following is meant as a review of how to dynamically allocate and free up space.

C++ Style:

char *s4;

//determine size + 1 for null
s4=new char[strlen("hello") + 1];   
//Copy the strings
strncpy(s4,"hello",6);  

//... 

delete[] s4;

C Style:

char *s4;

//determine size + 1 for null
s4=(char*)malloc(strlen("hello") + 1);   
//Copy the strings
strncpy(s4,"hello",6);  

//...

free(s4);

Well, maybe it's not quite a review. Some of you may not have worked with malloc and free. The reason for introducing it now is that you can reduce the above code to the following:

char *s4;
   
s4=strdup("hello");   //make copy of "hello"

//...

free(s4);

strdup has long been part of the POSIX standards rather than the C or C++ standards (it was only added to C in C23). If you are programming on a POSIX-compliant OS such as Linux (the lab), Solaris, or Mac OS X, you can use strdup:

  • strdup(const char *s)
returns a pointer to a new string that is a duplicate of the string pointed to by s. The returned pointer should be released with free() because the space for the new string is obtained using malloc. If the new string cannot be created, a null pointer is returned.

Notes:


Splitting C Strings into Tokens

Sometimes you may want to split a line into tokens or words. To do that, there is a C String function called strtok. The prototype is:
char * strtok (char * str, const char * delimiters)
where str is the line (or C string) that you want to split into tokens or words, and delimiters is a C string in which any one of the characters delimits, or marks the boundary between, words.

The following is an example of using strtok:

#include <iostream>
#include <cstring>
using namespace std;

int main(int argc, char *argv[])
{
   char cstr1[]="This is a sample string. Is it working?";
   char delim[]=" ,.-;!?";
   char *token;

   cout << "cstr1 before being tokenized: " << cstr1 << endl << endl;

   //In the first call to strtok, the first argument is the line to be tokenized
   token=strtok(cstr1, delim);
   cout << token << endl;

   //In subsequent calls to strtok, the first argument is NULL
   while((token=strtok(NULL, delim))!=NULL)
   {
         cout << token << endl;
   }
}

The output:

cstr1 before being tokenized: This is a sample string. Is it working?

This
is
a
sample
string
Is
it
working

There are a couple of "catches" with strtok:

  1. In the first call to strtok, the first argument is the line or C string to be tokenized; in subsequent calls to strtok, the first argument is NULL. Notice the two calls in the code above:
       token=strtok(cstr1, delim);
       token=strtok(NULL, delim)
  2. The original C string is modified when it is tokenized: the delimiter that ends each token is replaced by a null terminator ('\0'). In the sample code, for instance, the space after "This" and the period after "string" are each overwritten with '\0', while the space that follows the period is skipped and left unchanged.

 


Dynamic Arrays of C Strings

Sometimes you want to have a dynamically created array of C Strings. The following code demonstrates this:

#include <iostream>
#include <iomanip>   //for setw
#include <cstring>
#include <cstdlib>   //for free
using namespace std;

int main ()
{
  char **words;
  char tempWord[100];
  char endWord[]="330!";

  words = new char *[3]; //allocate pointers to three words
  //OR words = (char **) malloc (sizeof(char *) * 3);

  //--------------
  //get two words from the user input--use strdup to dynamically allocate space 
  //setw(100) stops cin from writing more than 99 characters plus '\0' into tempWord
  cout << "Please input a word (less than 100 characters): ";
  cin >> setw(100) >> tempWord;
  words[0]=strdup(tempWord);

  cout << "Please input a second word (less than 100 characters): ";
  cin >> setw(100) >> tempWord;
  words[1]=strdup(tempWord);

  //--------------
  //the third one hard code copy of "330!" (endWord)
  words[2]=strdup(endWord);
  
  //--------------
  //print and clean up individual words as you go
  for (int i=0; i<3; i++)
  {
     cout << words[i] << endl;
     free(words[i]);   //remember that space was set aside by strdup 

  }
  
  //--------------
  //Clean up the array of words
  delete [] words;     // cleans up words = new char *[3];
  //OR if you used malloc: free (words);

}

References and More Info

Extra Info: