CS330 Pipes and Filters

Highlights of this lab:

Command Line Pipes
Command Line Filters
Related System Calls and Pipes Preamble
Unnamed Pipes
Named Pipes
References

Lab Code

To get the sample and exercise code, please use the following commands in your cs330 directory:

   curl -O -s https://www.labs.cs.uregina.ca/330/Pipe/Lab8.zip
   unzip Lab8.zip

Command Line Pipes

The purpose of this lab is to introduce you to a way that you can construct powerful Unix commands by chaining together several Unix commands.

Unix commands alone are powerful, but when you combine them together, you can accomplish complex tasks easily. One way you can combine Unix commands is through using pipes and filters on the command line.

Background on Input and Output

Programs normally have some input and some output. In other words, some data has to be entered and some kind of analysis will come out afterwards.
The input to a program might come from a file, or might be typed into the program by the user.
The output might appear on the screen, be sent to a file or to a printer.
Programs for UNIX are usually written as though input is typed in by the user, and output appears on the screen.

Using Pipes and Redirection

The symbol | is the Unix pipe symbol that is used on the command line.

What it means is that the standard output of the command to the left of the pipe gets sent as standard input of the command to the right of the pipe.

Example 1:

There is a program (or UNIX command) in UNIX which reports who is logged onto the system: who
If I wanted to print out the list of who is on my system, I would type:

$ who | lpr -Pcl115

The "|" is a pipe, and this type of pipe sends the stream of data to another program, in this case, a program called lpr which sends all incoming data to the printer in CL115.

Example 2:

$ cat weather.txt
input
string shell
signal
$ cat weather.txt | wc 
      3       4      26

In this example, at the first shell prompt, the contents of the file weather.txt are displayed.

In the next shell prompt, the cat command is used to display the contents of the weather.txt file, but the display is not sent to the screen; it goes through a pipe to the wc (word count) command.

The wc command then does its job and counts the lines, words, and characters of what it got as input.

If I wanted to store the information from the who command in a file, I could redirect standard output to a file:

$ who > current_users

The ">" takes standard output and redirects it to a file.
In this case the file is called "current_users".
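It's important to note that ">" overwrites an existing file unless your shell's noclobber option is set. With noclobber set, you can not use ">" with an existing file; you will get an error message such as "File exists".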
If you want to append to the end of an existing file, you can use ">>"

Another type of redirection takes data from a file and puts it into a program (as standard input):

$ grep "smithp" < current_users

Here the contents of the file current_users are given to a program called "grep", which selects the lines in its input that contain a particular string of characters.

Command Line Filters

A filter is a Unix command that does some manipulation of the text of a file. In this section, we will talk about three popular Unix filters: sed, awk, and grep.

sed

Here is a simple way to use the sed command to alter the contents of the weather.txt file:

$ cat weather.txt
input
string shell
signal
$ cat weather.txt | sed -e "s/string/signal/g"
input
signal shell
signal                     
$ cat weather.txt | sed -e "s/i/WWW/"
WWWnput
strWWWng shell
sWWWgnal             
$

In this example, the first shell prompt displays the contents of the weather.txt file.

The second shell prompt uses the cat command to display the contents of the weather.txt file, and sends that display through a pipe to the sed command.

The sed command changed every occurrence of the word "string" to the word "signal".
The sed command took as input the information it got through the pipe.
The sed command displayed its output to the screen.

The third shell prompt uses the cat command on the weather.txt file and pipes the output to the sed command to change the first occurrence of an "i" on each line to "WWW".

It is important to note that, in this example, the weather.txt file itself was not changed. Only the display of its contents changed.

awk

The Unix command awk is another powerful filter. You can use awk to manipulate the contents of a file.

Here is an example:

$ cat basket.txt
Layer1 = cloth
Layer2 = strawberries
Layer3 = fish
Layer4 = chocolate
Layer5 = punch cards
$ cat basket.txt | awk -F= '{print $1}'
Layer1
Layer2
Layer3
Layer4
Layer5                   
$ cat basket.txt | awk -F= '{print "HAS: " $2}'
HAS:  cloth
HAS:  strawberries
HAS:  fish
HAS:  chocolate
HAS:  punch cards              
$

First, the contents of the basket.txt file are displayed.
Then, the contents are sent through a pipe to the awk command.
- The awk command displays the first field on each line, that is, the text that comes before the = sign.
Third, the display of basket.txt is sent through a pipe to awk again.
- This time, HAS: is printed at the beginning of every output line, followed by the second field on each line of basket.txt (everything after the = sign, including the leading space).
- The -F= option makes "=" the field delimiter. $1 represents the first field and $2 represents the second field.

grep

The Unix grep command helps you search for strings in a file.
Here is how I can find the lines that contain the string "jewel" and display those lines to the standard output:

$ cat apple.txt
core
worm seed
jewel
$ grep jewel apple.txt
jewel                                    
$

Related System Calls and Pipes Preamble

The main system calls that will be needed for this lab are:

read()
write()
close()
pipe()
dup2()

First, we will talk a little bit about this concept of pipes:

Preamble

Think of a pipe as a special file that can store a limited amount of data in a first-in-first-out (FIFO) manner.

There are two kinds of pipes:

named pipes
unnamed pipes

Unnamed pipes can only be used with related processes (e.g. parent/child, or child/child) and exist only as long as the processes using them.

Named pipes exist as directory entries that have file access permissions. They can, therefore, be used with unrelated processes.

These notes will focus on the unnamed pipes, which use the pipe system call.

First we will review the read(), write(), and close() system calls:

read():

   #include <unistd.h>
   ssize_t read(int fildes, void *buf, size_t nbyte);

Data is read from the pipe using the unbuffered I/O read() system call.

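The read() system call will read up to nbyte bytes from the open file associated with the file descriptor fildes into the buffer referenced by buf.

If the read call is successful, the number of bytes actually read is returned, which may be fewer than nbyte. A return value of 0 indicates end-of-file, and -1 indicates an error.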

NOTE: On a pipe, all reads are initiated from the current position (i.e. seeking is not supported).

write():

   #include <unistd.h>
   ssize_t write(int fildes, const void *buf, size_t nbyte);

Data is written to the pipe using the unbuffered I/O write() system call.

Using the file descriptor specified by fildes, the write() system call will attempt to write nbyte bytes from the buffer referenced by buf. It returns the number of bytes actually written, or -1 on error.

close():

   #include <unistd.h>
   int close(int fildes);

close() closes the file indicated by the file descriptor fildes.

Unnamed Pipes

An unnamed pipe is constructed using the pipe system call.

   #include <unistd.h>
   int pipe(int filedes[2]);

If successful, the pipe system call creates a pair of file descriptors, pointing to a pipe inode, and places them in the array pointed to by filedes.

The file descriptors reference two data streams.

filedes[0] is for reading
filedes[1] is for writing

Example (halfpipe.cpp):

#include <stdio.h>
#include <unistd.h>
#include <stdlib.h>
#include <string.h>

#define BUFSIZE 50

int main(int argc, char *argv[])
{
   int f_des[2];
   static char message[BUFSIZE];

   // Print usage if wrong number of arguments
   if (argc!=2)
   {
      fprintf(stderr, "Usage: %s message\n", *argv);
      exit(1);
   }

   // Open a pipe and report error if it fails
   if (pipe(f_des)==-1)
   {
      perror("Pipe");
      exit(2);
   }

   // Use switch for fork, because parent doesn't need child's pid.
   switch (fork())
   {
      case -1:  // Error
	 perror("Fork");
	 exit(3);

      case 0:   // Child
         //Close pipe out and read from pipe. Report errors if any.
	 close(f_des[1]);
	 // Read at most BUFSIZE-1 bytes so the zero-filled buffer stays NUL-terminated.
	 if (read(f_des[0], message, BUFSIZE-1)!=-1)
	 {
	    printf("Message received by child: [%s]\n", message);
	    fflush(stdout);
	 }
	 else
	 {
	    perror("Read");
	    exit(4);
	 }
	 break;

      default:  // Parent
         //Close pipe in and write to pipe. Report errors if any.
	 close(f_des[0]);
	 if (write(f_des[1], argv[1], strlen(argv[1])) !=-1)
	 {
	    printf("Message sent by parent: [%s]\n", argv[1]);
	    fflush(stdout);
	 }	
	 else
	 {
	    perror("Write");
	    exit(5);
	 }
   }

   exit (0);
}

Sample Run:

% a.out HELLO
Message sent by parent: [HELLO]
Message received by child: [HELLO]

In the parent process, the pipe file descriptor f_des[0] is closed and the message (the string referenced by argv[1]) is written to the pipe file descriptor f_des[1].

In the child process, the pipe file descriptor f_des[1] is closed and pipe file descriptor f_des[0] is read to obtain the message.

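Closing the unused pipe file descriptors is not strictly required for this example to work, but it is good practice, and it matters whenever a reader relies on end-of-file.

Remember that read() on a pipe returns as soon as some data is available, possibly fewer bytes than requested. If the pipe is empty, read() blocks until data arrives or until all the write file descriptors for the pipe have been closed, in which case it returns 0 to signal end-of-file.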

The pipe file descriptors f_des[0] in the child and f_des[1] in the parent will be closed when each process exits.

Sometimes we may want to "tie" standard output and/or input to either end of the pipe. This is so that we can emulate things such as:

% last | sort

To do that, we can use the dup2 system call.

dup2()

   #include <unistd.h>
   int dup2(int fd1, int fd2);

After successful return of dup or dup2, the [file descriptors (fd1 and fd2)] may be used interchangeably. They share locks, file position pointers and flags; for example, if the file position is modified by using lseek on one of the descriptors, the position is also changed for the other. (modified from the Linux man pages)

dup2 copies the file descriptor table entry for fd1 into fd2, closing the fd2 entry first if necessary.

Example (pipeline.cpp):

#include <stdio.h>
#include <unistd.h>
#include <stdlib.h>

int main(void)
{
        int f_des[2];

        if (pipe(f_des)==-1)
        {
                perror("Pipe");
                exit(1);
        }
        switch (fork())
        {
                case -1:
                        perror("Fork");
                        exit(2);
                case 0:         /*In the child*/
                        dup2(f_des[1], fileno(stdout));
                        close(f_des[0]);
                        close(f_des[1]);
                        execlp("last", "last", NULL);
                        exit(3);
                default:      /*In the parent*/
                        dup2(f_des[0], fileno(stdin));
                        close(f_des[0]);
                        close(f_des[1]);
                        execlp("sort", "sort", NULL);
                        exit(4);
        }
}

Named Pipes

For completeness, named pipes are mentioned here. The following paragraph is taken from page 132 of Interprocess Communications in UNIX:

UNIX provides for a second type of pipe called a named pipe or FIFO (we will use the terms interchangeably). Named pipes are similar in spirit to unnamed pipes but have additional benefits. When generated, named pipes have a directory entry. With the directory entry are file access permissions and the capability for unrelated processes to use the pipe file. Named pipes can be created at the shell level (on the command line) or within a program.

Example:

[1]% mknod PIPE p
[2]% ls -l PIPE
prw-------    1 me     csfac           0 Oct 21 10:16 PIPE
[3]% cat lab7.txt >> PIPE &
   [1] 9735
[4]% cat < PIPE

Line 1 creates a named pipe called PIPE (the p as an argument specifies that a named pipe will be created)
Line 2 shows the pipe as a directory entry (the "p" at the start of the permission string shows that the file called PIPE is a named pipe)
Line 3 causes the display of the contents of lab7.txt to be redirected to the named pipe PIPE. The command is run in the background (&) because it blocks until another process opens the pipe for reading.
Line 4 will read its input from the named pipe PIPE and display its output to the screen.

You can create named pipes in your program using the mknod() system call. See man 2 mknod for more details.

References

Interprocess Communications in UNIX, The Nooks & Crannies by John Shapley Gray
Man pages on UNIX