Ultimately the shell's purpose is to take a user command and put it in the form Unix requires for starting execution of new programs: execl( PROGFILE, ARG0, ARG1, ARG2, ..., 0 ). For example, if your command were "nroff -ms myfile", the shell's job would be to execl( "/usr/bin/nroff", "nroff", "-ms", "myfile", 0 ), where "/usr/bin/nroff" tells Unix in which file to find the nroff program. In this case the shell had very little work to do. If your next command were "!! | lpr ; wc * > ~/wcout", the shell would have much more work to do and would end up with 3 execl calls bearing little resemblance to your command. This is important because what the shell winds up sending to execl as arguments is what the programs involved really see.
A program that is executing, as opposed to one that is stored in a file, is called a process. When you log in, Unix finds the C shell program in the file "/bin/csh" and starts it running as a process on your terminal. The same happens to everyone else when they log in, but each of the resulting processes is independent and has no knowledge of any other processes except those it might create. Thus you have your own shell when you log in, and can in fact personalize it to some extent.
In a little greater detail than before, here is what the C shell does with a command. To illustrate this suppose you enter the command
% nroff -ms chap* > outfile
Your shell process ...
[1] reads the command and breaks it into separate command words:
"nroff", "-ms", "chap*", ">", "outfile";
[2] makes new command words if necessary: in this case replaces
the command word "chap*" by all filenames beginning with "chap",
for example, "chapintro", "chapter1", "chapter2";
[3] finds a file (assumed to contain the program) named by the
first command word: "/usr/bin/nroff";
[4] makes a copy of itself -- a child process -- which will later
be transformed into the nroff process.
Here the child and parent processes do different things.
[5] The child sets up input and output, removing command words
which indicate redirection: in this case opens a file called
"outfile" to which all future output from this child process will
be written instead of the terminal and removes the words ">" and
"outfile" from your command;
[6] the child transforms itself into the program found in step 3
above using execl: execl( "/usr/bin/nroff", "nroff", "-ms",
"chapintro", "chapter1", "chapter2", 0 );
[7] the child dies, either because it is done or there was an
error, at which point the Unix kernel removes all traces of it and
sends a signal of this event to the parent process;
[8] the parent process meanwhile literally waits idly for the
child process to finish, and then issues a prompt for another
command.
Each of these steps has interesting and important ramifications. Some are
explained below, others are mentioned below and explained elsewhere.
[1] Reads the command and breaks it into separate command words.
This step (lexical analysis) is needed to get the command words
(arguments) into the execl format. It gives the typist some flexibility
while imposing some restrictions. In particular, the shell breaks the
line into separate words at blanks and tabs, treating multiple blanks and
tabs as if they were one blank. So, for example, if you accidentally type
extra blanks at the beginning or end of the command, or between words, the
shell will probably do what you had in mind. On the other hand, if you
leave out blanks between two adjacent arguments, it will go ahead and
bundle them up as one word. For example, the shell considers the command
% nroff-ms myfile
as having only two words, the name of the command being "nroff-ms", then
tries unsuccessfully to locate the program (step 3) in a file of that
name and responds with
nroff-ms: Command not found.
The last argument would have been correctly interpreted as "myfile". To
add another twist, the command
% nroff -ms-o1,5 myfile
would be execl'd successfully (step 6) but would provoke an error message
from nroff.
One additional rule says that any one of the characters &|;<>() is considered a separate word, except when one of &|<> appears doubled, in which case the doubled character is one word. For example, the commands
% neqn<paper|nroff -ms>>outfile&
% neqn < paper | nroff -ms >> outfile &
are interpreted identically, each consisting of 9 words.
On the other hand, if you want a blank, tab, or one of &|;<>() to be considered part of another word, you must surround it with quote marks of the type ", `, or ', or precede it with a \ (use of \ is also termed quoting). If you want a carriage-return (newline) to be part of a word, you must surround it with quote marks AND precede it with a \, since preceding it with a \ and not using quote marks is treated as a blank.
Beware. Strictly speaking, quoting prevents the shell from interpreting the quoted characters according to its usual practice, and this discussion only mentions how the usual practice is suspended with respect to word separation. There are other much more profound side-effects of quoting depending on both the quoted and the quoting characters. The documentation is perhaps more unyielding, incomplete, and confusing on this issue than on any other.
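The word-breaking rules of step 1 can be sketched in a few lines of Python. This is only an illustration of the rules as stated above, not the C shell's actual lexical analyzer: the name split_words is made up for this example, and quoting is deliberately not handled.

```python
SPECIAL = "&|;<>()"   # each of these is a word by itself
DOUBLABLE = "&|<>"    # these form a single word when doubled


def split_words(line):
    """Split a command line into words (sketch only; quoting is NOT handled)."""
    words = []
    current = ""
    i = 0
    while i < len(line):
        ch = line[i]
        if ch in " \t":
            # Any run of blanks and tabs just ends the current word.
            if current:
                words.append(current)
                current = ""
            i += 1
        elif ch in SPECIAL:
            # A special character ends the current word and is a word itself.
            if current:
                words.append(current)
                current = ""
            if ch in DOUBLABLE and line[i + 1:i + 2] == ch:
                words.append(ch * 2)   # doubled character, e.g. ">>"
                i += 2
            else:
                words.append(ch)
                i += 1
        else:
            current += ch
            i += 1
    if current:
        words.append(current)
    return words
```

Under these assumptions, split_words("neqn < paper | nroff -ms >> outfile &") yields the same nine words with or without the blanks around the special characters, while "nroff-ms myfile" yields only two words.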
[2] Makes new command words if necessary.
The C Shell recognizes a large variety of characters and constructs as having
special meanings and substitutes other words in their place. This means that if
your command line contains any of them, as in "!! | lpr ; wc * > ~/wcout" from
before, the resulting call (or calls) to execl (step 6) may be the result of
sweeping changes made in this step. Note that the programs being called never
see your original command and never have to know anything about the special
characters. Consequently, the same substitution rules apply to ALL programs
called from the shell (for example, "lpr", "vi", "nroff", etc.).
Substitutions are classified by type and are applied in a definite order. The shell scans command words for characters or constructs of the first type, making substitutions if it finds any. Then it takes the resulting command words and scans them to find and make substitutions of the second type, if any, and so forth. Here is a list of substitution types in order with an indication of the kinds of special characters that will trigger them.
Type          Triggered By              Typical Uses
-------------------------------------------------------------------
History       !event, ^old^new          re-use earlier commands
Alias         first command word        re-name commands
Variable      $var, $#var, $var[n]      scripts, personalized shell
Command       `shell command`           use command output as args
Filename      *, ?, [], {}, ~           abbreviate groups of files
Input/Output  <, >, |, <<, >>, $<       re-route input and output
Expressions   ( x <>=!~+-*/()&|^ y )    arithmetic and branching
In the hands of a sober, well-informed user, substitutions are very
useful: (1) they can save tremendous amounts of typing, (2) they need only
be learned for the shell, since all programs called by users have to go
through the shell, and (3) they make it possible to write programs
consisting of shell commands.
In the wrong hands, however, substitutions can be tricky. To help you practice, the shell provides a way for you to see exactly what it comes up with just before it calls execl. The command "set echo" will cause it to print your command after all substitutions have been made, just before calling execl. To avoid the danger of executing a possibly incorrect command, you can test whether a construct will end up the way you think just by entering it as an argument to the "echo" command. The "echo" command does nothing more than print its arguments on the terminal and like all commands is subject to substitutions. So, for example, "echo *" prints the words that would result, on any command line, from substituting for * (which lists all your files).
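To make the idea of "making new command words" concrete, here is a rough Python sketch of the filename substitution type only. It leans on Python's glob module, which understands *, ? and [] but not {} or ~, so it is an approximation of the shell's behavior rather than a copy of it. The function name substitute_filenames and the wording of the error are assumptions made for this example.

```python
import glob


def substitute_filenames(words):
    """Replace each word containing *, ? or [ with the sorted matching file names.

    Sketch only: {} and ~ are not handled, and raising an error when nothing
    matches is an assumption standing in for the shell's own complaint.
    """
    result = []
    for word in words:
        if any(ch in word for ch in "*?["):
            matches = sorted(glob.glob(word))
            if not matches:
                raise FileNotFoundError("No match: " + word)
            result.extend(matches)   # one pattern may become many words
        else:
            result.append(word)      # ordinary words pass through unchanged
    return result
```

In a directory holding "chapintro", "chapter1", and "chapter2", the words "nroff", "-ms", "chap*" come back as five words, which is what the program eventually sees as its arguments.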
[3] Finds a file named by the first command word.
The whole point of the shell is to run programs other than itself, such as
"vi", "cc", "troff", etc. Occasionally there is a need for a command that
the shell can perform internally, that is, without locating a program file
or creating another process. So in this step the shell usually tries to
locate a file containing the program named by the first command word, but
not before checking to see if it belongs to the set of commands built-in
to itself.
If a command is non-built-in, the shell scans a list of directories called the searchpath, which may be personalized for each user. It appends the first command word to the first directory on the list and checks to see if the resulting file name exists. If not, it checks the second directory in a similar fashion, and so forth, until a file is found, and that file name is used when execl is called in step 6. In the case that no file is found, the shell reports this and prompts for another command.
If your searchpath becomes garbled, usually because you were experimenting with it, the shell may not find some or all of the usual non-built-in commands. Besides panicking, there are two things you can do. First, the command to correct the searchpath is built-in and can still be used, but only if you recognize that the searchpath is the problem. Second, if the first command word begins with a /, the shell considers it to be the name of the program file to execute; for example, the command "/usr/ucb/vi .cshrc" would still work.
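The search described above amounts to a short loop. The following Python sketch shows the idea; the name find_program is invented for this example, and the extra check that the file is executable is an assumption that goes slightly beyond "checks to see if the resulting file name exists".

```python
import os


def find_program(name, searchpath):
    """Return the program file for a command word, or None if none is found."""
    if name.startswith("/"):
        # A word beginning with / already names the program file: no search.
        return name
    for directory in searchpath:
        candidate = os.path.join(directory, name)
        # Assumption: only a regular, executable file counts as a hit.
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return candidate
    return None   # the shell would report "Command not found."
```

With a garbled searchpath (say, an empty list) every ordinary command word returns None, yet a full name such as "/usr/ucb/vi" is still passed through untouched.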
If a command is built-in, the shell bypasses steps 4, 6, 7, and 8, which reduces run time greatly, and performs the command in its own way. For the sake of efficiency, a built-in command is preferred to a non-built-in command if they perform the same function, and that is why some of the built-in commands were created. Other commands were built-in because they would not have worked otherwise, due to the way that processes disappear completely in step 7; in particular, if a command is needed to change the behavior of your shell from that point on, a non-built-in command would only be able to change the characteristics of a child process of your shell, not of the shell process that will read your next command, and when the child dies it leaves no trace of the change.
The "echo" command, for example, is built-in to the C shell because it is used so often. A quick and ugly way to list the files in your directory, without using the "ls" command, is to type "echo *". A very quick way to create a one line file, without "vi", is "echo This is a one line file. > oneliner". Some commands that have to be built-in are "cd", "set", "alias", and "history". Unfortunately, most built-in commands do not have separate manual sections, so the command "man set" will yield nothing, while "man csh" will tell you about "cd" after printing the first 9 pages or so. Ironically, "man echo" will display a manual page because users of the Bourne shell do not have a built-in "echo" command.
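Why "cd" has to be built-in can be seen in a small experiment. The Python sketch below uses fork (the subject of step 4) to let a child process change its directory, then looks at the parent's directory afterwards. The function name cwd_after_child_chdir is made up for this illustration.

```python
import os


def cwd_after_child_chdir(target):
    """Let a child process change directory; return the parent's directory
    before and after the child has come and gone."""
    before = os.getcwd()
    pid = os.fork()
    if pid == 0:
        # Child: the change applies only to this copy of the process.
        try:
            os.chdir(target)
        finally:
            os._exit(0)          # the child dies, taking the change with it
    os.waitpid(pid, 0)           # parent waits for the child to finish
    return before, os.getcwd()
```

The two directories returned are the same: the child's change left no trace in the parent, which is exactly why a non-built-in "cd" would be useless.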
[4] Makes a copy of itself -- a child process.
The Unix kernel requires the C shell -- in fact, requires all programs
that run other programs -- to use execl. Unfortunately, that causes the
process running the new program to die when it is done. Your shell
therefore has to create a new process to do the execl in order that the
old process survive to prompt you for the next command. The only way to
create a new process on Unix, though, is for an existing process to make a
copy of itself by executing a program statement called fork. The new and
old processes are identical except that one knows it is a parent and the
other knows it is a child, and the internal code of the program for both
processes can take different branches on the basis of this information.
This step is time-consuming, and the documentation sometimes mentions
useful ways to avoid having to fork new processes, for instance, by using
built-in commands.
[5] The child sets up input and output.
In this step, the command words are scanned for special input or output
redirection constructs. When these constructs have been interpreted, they
are removed from the list of command words. Any output file specified is
created if it does not already exist. If the file or directory does not
have the correct permissions, or an input file does not exist, the shell,
not the program named by the first command word, issues an error message
and prompts for another command. The program to be run has no knowledge
that its inputs and outputs have been changed.
In the presence of a pipe between commands, the shell removes the pipe constructs from the command line after first breaking it up into separate subcommands. Each of these subcommands is processed like any other command, with a separate fork and execl for each. The main difference is that the parent sets up input and output between processes and has them all started up before beginning to wait on any of them.
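Here is a hedged Python sketch of how redirection and pipes might be wired up, assuming the subcommands have already been split apart and the redirection words removed. It is not the C shell's code. The name run_pipeline is invented, Python's os.execvp stands in for the search of step 3 plus the execl of step 6, and the sketch relies on Python marking the descriptors it creates as close-on-exec so that stray pipe ends vanish when the new program starts.

```python
import os


def run_pipeline(commands, outfile):
    """Run commands (each a list of words) joined by pipes, sending the last
    command's output to outfile. Returns the exit code of each child."""
    pids = []
    prev_read = None                      # read end of the previous pipe
    for index, argv in enumerate(commands):
        is_last = index == len(commands) - 1
        # The parent creates the pipe that will feed the next subcommand.
        next_read, next_write = (None, None) if is_last else os.pipe()
        pid = os.fork()
        if pid == 0:
            # Child: rearrange input and output, then become the program.
            try:
                if prev_read is not None:
                    os.dup2(prev_read, 0)         # input comes from the pipe
                if is_last:
                    next_write = os.open(
                        outfile, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
                os.dup2(next_write, 1)            # output goes to pipe or file
                os.execvp(argv[0], argv)
            finally:
                os._exit(127)                     # reached only if exec failed
        # Parent: remember the child and close the ends it no longer needs.
        pids.append(pid)
        if prev_read is not None:
            os.close(prev_read)
        if next_write is not None:
            os.close(next_write)
        prev_read = next_read
    # Only after every child has been started does the parent wait.
    return [os.WEXITSTATUS(os.waitpid(pid, 0)[1]) for pid in pids]
```

Notice that neither program is told anything about the rearrangement; each simply reads its standard input and writes its standard output as usual.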
[6] The child transforms itself into the program found in step 3.
This is where the child does the execl, but not precisely. For simplicity I did
not mention that the actual call is of the form: execve(
PROGFILE, ARG0, ARG1, ARG2, ..., 0 , ENV0, ENV1, ENV2, ..., 0 ). The new
arguments (after the first 0) contain definitions of all the current process's
environment variables. These may contain any information the user may choose to
store in them using the built-in command "setenv" and have the property that besides
input/output redirection, the current directory, and a handful of other data,
they are some of the very few things that can be inherited by the new program
after execl.
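The inheritance of environment variables can be demonstrated with Python's os.execve, which takes the program file, the argument list, and the environment as three separate values. This is a sketch under assumptions: the helper name run_with_env and the variable GREETING are invented for the example, and "/bin/sh" is assumed to exist.

```python
import os


def run_with_env(progfile, argv, env):
    """Fork, run progfile with the given arguments and environment in the
    child, and return whatever the child wrote to its standard output."""
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        try:
            os.dup2(write_fd, 1)              # child's output goes into the pipe
            os.execve(progfile, argv, env)    # env is all the new program inherits
        finally:
            os._exit(127)                     # reached only if exec failed
    os.close(write_fd)                        # parent keeps only the read end
    chunks = []
    while True:
        chunk = os.read(read_fd, 4096)
        if not chunk:                         # empty read: the child closed its end
            break
        chunks.append(chunk)
    os.close(read_fd)
    os.waitpid(pid, 0)
    return b"".join(chunks).decode()
```

For instance, run_with_env("/bin/sh", ["sh", "-c", "echo $GREETING"], {"GREETING": "hello"}) shows the new program reading a variable it never set itself. Note also that the program file and ARG0 are given separately, just as in the execl form.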
[7] The child dies.
Processes can finish normally or abnormally, but all of them die
eventually. For example, when you leave "vi" by typing ZZ, or when
"nroff" stops because of a macro/diversion overflow, then the associated
processes die. Your shell itself is a process which dies when you
log out.
When the child process running the new program dies, the Unix kernel sends a signal to the parent process (your shell) notifying it of the event.
[8] The parent waits for the child to die, then prompts the user.
In the meantime, the parent process has executed a program statement
called wait which just puts it on hold until Unix sends a signal notifying
the shell that the child has died. If you had entered an & at the end of
the original command, your shell would not wait for notification of the
child's death but would print the child's process number and then prompt
you for the next command. That procedure is called backgrounding a
process.
While the C shell is waiting for the child (only on 4.1 or 4.2 BSD Unix) you can type ^Z to wake up the parent and freeze the child for the time being. At that point you could enter other commands to the shell, and at a later time you could issue commands to resume execution of the child, kill it altogether, or resume its execution in the background. This useful feature is called job control.
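The difference between waiting for a child and backgrounding it comes down to whether the parent calls wait at once. A minimal Python sketch follows, with an invented function name (run) and no attempt at job control; os.execvp again stands in for the program search and the execl.

```python
import os


def run(argv, background=False):
    """Fork and exec argv. Returns (pid, exit_code); exit_code is None for a
    background command, because the parent did not wait for it."""
    pid = os.fork()
    if pid == 0:
        try:
            os.execvp(argv[0], argv)   # child becomes the new program
        finally:
            os._exit(127)              # reached only if exec failed
    if background:
        # Like a trailing "&": report the process number and return at once.
        return pid, None
    _, status = os.waitpid(pid, 0)     # parent is on hold until the child dies
    return pid, os.WEXITSTATUS(status)
```

A caller that backgrounds a command should still collect the child eventually, for example with os.waitpid(pid, 0), much as the shell takes note when a background process finishes.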