Section 7: Pipelines and Filters

CONCEPT: Unix allows you to connect processes by letting the standard output of one process feed into the standard input of another. That mechanism is called a pipe.

Connecting simple processes in a pipeline allows you to perform complex tasks without writing complex programs.

EXAMPLE: Using the more command and a pipe, you can page through the output of a command one screenful at a time. Examine the contents of the /etc directory by typing

ls -l /etc | more

to the shell.

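If you type "q" to exit the more command, where does the remaining output of the ls command go? The answer lies in the way pipelined processes communicate. Every process starts with three file descriptors: standard input, standard output, and standard error. When the shell sets up the pipeline, the kernel creates a pipe, a small fixed-size buffer in kernel memory with a write end and a read end. The shell connects the standard output of ls to the write end and the standard input of more to the read end. ls writes into the buffer and more reads from it; when the buffer is full, ls is paused until more reads some of the data, so ls never runs far ahead of what you have seen. When you quit more, the read end of the pipe is closed. The next time ls tries to write, the kernel sends it the SIGPIPE signal, which by default terminates it, since its output has nowhere to go. Whatever was still in the buffer is discarded, and the rest of the listing is never produced.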
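A small experiment makes this visible. The sketch below (POSIX sh, assuming the standard yes and head utilities) uses yes, which writes lines of "y" forever, in place of ls so that the writer clearly outlives its reader. The subshell reports how yes ended after head has quit:

```sh
# yes writes "y" forever; head prints 3 lines and exits,
# which closes the read end of the pipe.
# The subshell then reports the exit status yes ended with (on stderr).
(yes; echo "yes exited with status $?" >&2) | head -n 3
```

A status above 128 means the process was killed by a signal whose number is the status minus 128. SIGPIPE is signal 13, so you will typically see 141. If SIGPIPE happens to be ignored in your environment, yes gets a write error instead and exits with a different nonzero status.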

EXERCISE: How could you use head and tail in a pipeline to display lines 25 through 75 of a file?

ANSWER: The command

cat file | head -n 75 | tail -n 51

would work. The cat command feeds the file into the pipeline. The head command passes the first 75 lines of the file down the pipeline to tail. The tail command then filters out all but the last 51 lines of the input it received from head, which are exactly lines 25 through 75 (75 - 25 + 1 = 51; tail -n 50 would start at line 26). It is important to note that tail never receives the original file; it sees only the 75 lines that head passed to it.
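You can check the arithmetic on a file whose line numbers are easy to see. This sketch assumes the common (but non-POSIX) seq utility and a hypothetical file name, sample.txt:

```sh
# Build a 100-line sample file: line N contains the number N.
seq 1 100 > sample.txt
# head passes lines 1-75 along; tail keeps the last 51 of those, lines 25-75.
cat sample.txt | head -n 75 | tail -n 51
```

The output starts at 25 and ends at 75, 51 lines in all.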

It is easy for beginners to confuse the input/output redirection symbols < and > with the pipe symbol |. Remember that input/output redirection connects a process with a file, while a pipe connects a process with another process.
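The difference is easiest to see side by side. In this sketch, listing.txt is a made-up file name; the two wc commands print the same count, but only the first one involves a file:

```sh
# Redirection connects a process with a FILE.
ls /etc > listing.txt     # stdout of ls is written into listing.txt
wc -l < listing.txt       # stdin of wc is read from listing.txt
# A pipe connects a process with another PROCESS; no file is created.
ls /etc | wc -l           # stdout of ls feeds stdin of wc directly
# Classic slip: `ls /etc > more` never runs more at all; it creates
# (or overwrites) a file named "more" in the current directory.
```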

Grep

The grep utility is one of the most useful filters in Unix. Grep searches its input line by line for a specified pattern and outputs every line that matches. The basic syntax for the grep command is grep [-options] pattern [file...]. If no file argument is given, grep reads from standard input, which is what lets it work as a filter in a pipeline. It is best to enclose the pattern in single quotes, so that the shell passes it to grep unchanged.
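A quick sketch of both ways of feeding grep, using a hypothetical three-line file named fruit.txt. The -v option (print the lines that do not match) is included only to show where options go:

```sh
# A three-line sample file.
printf '%s\n' apple banana cherry > fruit.txt
grep 'an' fruit.txt          # pattern plus file argument: prints banana
cat fruit.txt | grep 'an'    # no file argument, so grep reads stdin: banana
grep -v 'an' fruit.txt       # -v inverts the match: prints apple and cherry
```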

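The grep utility recognizes a variety of patterns, called regular expressions. Its pattern syntax comes from the ed line editor (the name grep comes from the ed command g/re/p, "globally search for a regular expression and print"), and vi uses the same style of pattern for its searches. Here are some of the characters you can use to build grep expressions: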

The caret (^) matches the beginning of a line.
The dollar sign ($) matches the end of a line.
The period (.) matches any single character.
The asterisk (*) matches zero or more occurrences of the previous character.
The expression [a-b] matches any single character in the range a through b; for example, [0-9] matches any one digit.
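The sketch below tries each of these characters on five made-up words stored in a hypothetical file, words.txt. The comment beside each command shows the lines it prints:

```sh
# Five sample lines to search.
printf '%s\n' cat cart scat ct c9t > words.txt

grep '^c'      words.txt   # starts with c:                  cat cart ct c9t
grep 'at$'     words.txt   # ends with at:                   cat scat
grep 'c.t'     words.txt   # c, any one character, t:        cat scat c9t
grep '^ca*t$'  words.txt   # whole line is c, zero+ a's, t:  cat ct
grep 'c[0-9]t' words.txt   # c, one digit, t:                c9t
```

Note that ct matches '^ca*t$' because * allows zero occurrences of a.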

Note that some of these pattern-matching characters, such as *, [ and $, are also shell metacharacters. When a pattern contains one of them, enclose it in single quotes so that the shell does not interpret it before grep sees it.
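To see what the quotes protect against, this sketch creates two empty files with made-up names, apple and avocado, in a scratch directory, and uses echo to display the arguments grep would receive. Unquoted, the shell replaces a* with the matching file names before grep ever runs:

```sh
# Scratch directory holding two empty files whose names start with "a".
cd "$(mktemp -d)" || exit 1
touch apple avocado
# echo prints the argument list exactly as the shell would hand it to grep.
echo grep a* notes.txt      # unquoted: grep apple avocado notes.txt
echo grep 'a*' notes.txt    # quoted:   grep a* notes.txt
```

In the unquoted case grep would treat apple as the pattern and avocado as a file to search, which is almost certainly not what was intended.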

EXAMPLE: Type the command

grep 'jon' /etc/passwd

to search the /etc/passwd file for any lines containing the string "jon".

EXAMPLE: Type the command

grep '^jon' /etc/passwd

to see the lines in /etc/passwd that begin with the character string "jon".
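The difference between the two patterns shows up clearly on a few invented lines in /etc/passwd format. The file passwd.sample and the users in it are hypothetical; adding the colon that follows the user name narrows the match to a single account:

```sh
# Made-up entries in passwd format: name:password:uid:gid:comment:home:shell
cat > passwd.sample <<'EOF'
jon:x:1001:1001:Jon Smith:/home/jon:/bin/bash
jonas:x:1002:1002:Jonas Berg:/home/jonas:/bin/bash
amy:x:1003:1003:Amy Lee (works with jon):/home/amy:/bin/sh
EOF
grep 'jon' passwd.sample     # all three lines contain "jon" somewhere
grep '^jon' passwd.sample    # jon and jonas: the line must START with "jon"
grep '^jon:' passwd.sample   # only jon: the user name field is exactly "jon"
```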

EXERCISE: List all the files in the /tmp directory owned by the user root.

ANSWER: The command

ls -l /tmp | grep 'root'

would show the lines of the long listing of /tmp that contain the string "root" anywhere. Note that this also matches files owned by another user but belonging to the group root, or files with "root" somewhere in their name, so some extra lines may appear in the output. Even so, the grep filter cuts down the number of lines you have to look at.
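Since the owner is the third column of ls -l output, the anchors from the list above can tie "root" to that column. This sketch assumes the usual ls -l layout (permissions, link count, owner, group, ...) with owners shown by name. It also uses [^ ], a bracket expression beginning with ^, which matches any single character except a space:

```sh
# Pattern pieces, left to right:
#   ^[^ ]*        the permission string at the start of the line (no spaces)
#   a space, " *" one or more spaces (basic grep patterns have no "+")
#   [0-9][0-9]*   the link count, one or more digits
#   a space, " *" one or more spaces
#   root          the owner name, followed by a space
ls -l /tmp | grep '^[^ ]*  *[0-9][0-9]*  *root '
```

With this pattern, a file owned by another user but in group root, or a file named rootfile, no longer slips into the output.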