Survey of Operating Systems:
§ 5: UNIX Control Features
Instructor: M.S. Schmalz
Reading Assignments and Exercises
UNIX has a scripting language that has several interesting control features. These control structures can cause various UNIX commands or programs to execute, and this provides the user with a powerful high-level control capability.
This section reviews basic control structures available in UNIX that support the building of scripts and high-level UNIX command structures. This section is organized as follows:
5.1. Regular Expressions
5.2. Iteration and Control Statements
5.3. Piping and I/O Redirection and Control
Information in this section was compiled from a variety of text- and Web-based sources, and is not to be used for any commercial purpose.
5.1. Regular Expressions
A regular expression is a concise way of describing a pattern of characters, possibly built up from smaller sub-expressions. Regular expressions are constructed by combining ordinary characters with one or more metacharacters, which are characters that have special meaning to the program that interprets the pattern (e.g., grep, or the Bourne, Korn, or C shell that your UNIX implementation supports).
Regular expressions can be used in conjunction with UNIX programs to:
- Specify filenames
- Search file contents
- Change file contents
Each UNIX command differs in terms of the types of regular expressions that command supports. One way to find out what regular expressions are supported is to check the manual page for the given command. In this section, we will discuss regular expressions in greater detail, with examples of command usage.
5.1.1. Practical Aspects of Regular Expressions
Regular expressions (REs) often describe patterns within text. The use of REs extends to configuration files, mail filters, text editors, and numerous programming languages. A UNIX application that manipulates text can use regular expressions.
A regular expression evaluates input data and returns an answer of true or false, similar to a relational operator in a high-level programming language. For instance, a regular expression might be configured to recognize a string S, for which a new string S' might be substituted.
Since regular expressions can be used by many applications, the result of applying a regular expression depends in part on the application. For example, after recognizing a string S, an application might substitute new text in place of S, save S in a buffer for later use, or execute a UNIX program with S as one of the arguments of the program.
For example, the UNIX grep utility searches a file for a particular text string. The grep program can accept a search string specification as a string, a quoted string, or a regular expression.
Example. Suppose one searches for all the <title> tags in a directory of HTML files. The grep command would look like this:
grep -i '<title>' *.html
Here, grep evaluates whether or not each line in each *.html file matches the description <title>. If the line is a match, then grep prints out the file name and the matching line. When applied to this directory of HTML files, the result is:
flea:92% grep -i '<title>' *.html
OpSysOvw.html: <TITLE>Survey of Operating Systems: Overview</TITLE>
Top-Level.html:<TITLE>Class Notes: COP 3610</TITLE>
UNIX-cmd.html:<TITLE>Survey of Operating Systems: Overview;</TITLE>
UNIX-cmp.html:<TITLE>Survey of Operating Systems: Overview;</TITLE>
In this example, the regular expression is '<title>', which is a quoted character string.
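The search above can be reproduced on a small scale. The following is a minimal sketch, assuming a throwaway directory and two made-up files (a.html and b.html are hypothetical names, not files from these notes). It also shows that grep reports a true/false answer through its exit status.

```shell
# Build a temporary directory holding two small HTML files (hypothetical names).
demo_dir=$(mktemp -d)
printf '<HTML>\n<TITLE>Class Notes</TITLE>\n</HTML>\n' > "$demo_dir/a.html"
printf '<html>\n<p>No heading here</p>\n</html>\n' > "$demo_dir/b.html"

# -i ignores case, so the lowercase pattern matches the uppercase tag.
# With several files on the command line, grep prefixes each match with the file name.
matches=$(cd "$demo_dir" && grep -i '<title>' *.html)
echo "$matches"

# -q suppresses output; the exit status alone says whether a match was found.
if grep -qi '<title>' "$demo_dir/b.html"
then
    found_in_b=yes
else
    found_in_b=no
fi
echo "title tag in b.html: $found_in_b"

rm -r "$demo_dir"
```

Only a.html contains a title tag, so a single line is printed for it and the test on b.html takes the else branch.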
Occasionally a more involved search of text data is required, using constraints that represent restrictions, qualifications, or abstractions. To do this, one needs to use regular expressions with descriptive "metacharacters", as follows.
5.1.2. Placeholders and Repetition.
For purposes of illustration, assume that an HTML directory has 225 files and 400 <title> tags. To avoid searching manually through 400 tags to find one on a specific subject (e.g., "commands"), a regular expression can be employed.
Example. To find only titles that reference commands, we use the following invocation of grep:
grep -i '<title>.*commands' *.html
This example has two metacharacters. The period (.) says that any single character can occur in that place. The asterisk (*) means that zero or more instances of the preceding character can occur there. In plain English, this means "match any line that contains a <title> HTML tag followed by any number of characters, as long as the word commands appears before the end of the line".
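A quick way to see the period and asterisk working together is to feed grep a few lines directly. This is only a sketch; the four sample lines are invented for illustration.

```shell
# Four invented input lines: two titles that mention commands, one that does not,
# and one line that mentions commands outside a title tag.
sample='<title>UNIX commands overview</title>
<TITLE>Shell Commands</TITLE>
<title>Editors</title>
<p>commands</p>'

# ".*" allows any run of characters between the tag and the word "commands".
printf '%s\n' "$sample" | grep -i '<title>.*commands'

# -c counts matching lines instead of printing them.
hits=$(printf '%s\n' "$sample" | grep -ic '<title>.*commands')
echo "matching lines: $hits"
```

The first two lines match. The third has a title tag but no "commands", and the fourth has "commands" but no title tag.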
In the preceding example, the period plays a very important role, as shown below.
Example. If the command line in the preceding example had been:
grep -i '<title>*commands' *.html
then the * character would direct grep to search for zero or more instances of the character > immediately before the word commands. This pattern matches only lines such as
<titlecommands
<title>commands
<title>>>>>>>>>>>>>commands
Such lines are rare, so most of the titles that mention commands would not be found.
5.1.3. Range Specifications
Occasionally, a regular expression should be generalized by including higher-level abstractions. One way of doing this is by using the range delimiters ([ ]), which can be used to specify a character set.
Example. To match the digits 0-9, use the range specification [0123456789], which can be abbreviated as [0-9].
A peculiarity of the range brackets is their treatment of metacharacters and backslashes.
Example. Within the range delimiters, the period (.) and the asterisk (*) are treated as ordinary punctuation characters, so [.*] matches either (.) or (*). Some tools also accept the backslash form [\.\*], but in POSIX utilities such as grep a backslash inside the brackets is itself taken literally, so that form additionally matches (\).
Note: Outside the range brackets, putting backslashes before dots and stars in order to turn off their behavior as special characters is called escaping the characters.
One can negate the range function to match anything but the specified characters, by preceding the range match with the caret (^).
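The behavior of character sets, including the negated form, can be checked with a few invented lines of input. This is a minimal sketch that assumes a POSIX grep; the sample text is made up.

```shell
# Invented sample: one line with a digit, one with an asterisk,
# one with a period, and one with lowercase letters only.
sample='version 2
price*
end.
plain'

# [0-9] is shorthand for [0123456789].
digits=$(printf '%s\n' "$sample" | grep -c '[0-9]')

# Inside the brackets, . and * are ordinary characters.
punct=$(printf '%s\n' "$sample" | grep -c '[.*]')

# A leading caret negates the set: match any character that is
# neither a lowercase letter nor a space.
other=$(printf '%s\n' "$sample" | grep -c '[^a-z ]')

echo "lines with a digit: $digits"
echo "lines with . or *: $punct"
echo "lines with something other than a-z or space: $other"
```

Only the last sample line consists entirely of lowercase letters, so the negated set matches the other three.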
Example. In [^1234], the caret inside the range operator means match any character other than 1 through 4.
Example. Suppose we want to find all the href codes that point to URLs that (mistakenly) have a space in the URL pathname. The UNIX egrep program (extended grep) is used with negated range expressions, as follows:
egrep -i 'href="[^"]* [^"]*"' *.html
The regular expression "[^"]* [^"]*" helps egrep find all the href lines that have a space between the begin quote and end quote. The range operator is used in a clever way, i.e., to signify any character other than a quote.
5.1.4. Determining Position
In UNIX regular expressions, two characters restrict matching to a specific location within the string. Because grep and egrep examine their input one line at a time, the beginning of each line is specified with (^), and the end with ($).
Example. To find the HTML tags that are not closed before a line break, we use egrep as follows:
egrep '<[^>]*$' *.html
where the regular expression '<[^>]*$' instructs egrep to (1) find the start of an HTML tag (all such tags start with <), then (2) require that every character from there to the end of the line be something other than a (>) sign.
The preceding examples are designed to provide some idea of how regular expressions are used in UNIX. UNIX programs differ in the regular expression syntax they accept, and many have their own regular expression processor. To find out more about UNIX commands and libraries that use regular expressions, type
man -k regular
at the UNIX prompt.
5.2. Iteration and Control Statements
Various shells handle control statements (e.g., if..then..else, for, and while loops) in different ways. A new version of the Korn shell handles control structures more elegantly than the existing C shell, as summarized below.
Example. Suppose one wants to determine the maximum string size within a list of strings, for example, to determine the initial number of columns in a multi-column display. This could also be used to determine the maximum width for a column of entries. A typical shell implementation would customarily be given as:
(( max_stringSize = 0 ))
for fileName in *
do
  if (( max_stringSize < ${#fileName} ))
  then
    (( max_stringSize = ${#fileName} ))
  fi
done
where if..then..fi are the if-statement keywords.
The Korn shell also provides for function definitions using either of the following formats:
function name { body }
name() { body }
With this technique, one can define a function that has as its body a segment of UNIX scripting language.
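As an illustration, the maximum-length loop shown earlier can be wrapped in a function. This is a sketch only: the name max_length and the sample words are invented, and the code assumes a shell that accepts Korn-style syntax (ksh93 or bash), since plain POSIX sh lacks the (( )) command.

```shell
# Hypothetical helper: print the length of the longest argument.
function max_length {
    typeset longest=0 word      # typeset keeps these variables local to the function
    for word in "$@"
    do
        if (( longest < ${#word} ))
        then
            (( longest = ${#word} ))
        fi
    done
    echo "$longest"
}

# Capture the function's output with command substitution.
width=$(max_length alpha be gamma-delta)
echo "widest entry: $width characters"
```

Passing the strings as arguments, rather than looping over * inside the function, lets the same function measure any list of words.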
Variables declared with typeset inside a Korn shell function have local scope, which means that the variable definition holds only within the function in which the variable is defined. (In ksh93, this applies to functions defined with the function keyword.)
Example. Let a local variable v be defined such that, inside the function, it has precedence over the global variable v.
typeset v=5      # Global definition
function bar {
  typeset v=6    # Local definition
}
Within bar, the variable v has the value 6. Once bar returns, the local definition disappears and v again has the global value 5.
Korn shell statements of the format (( expression )) denote arithmetic commands, which return True when the value of the enclosed expression is non-zero, and False when the expression evaluates to zero. The construct $((expression)) can be used as a word or part of a word, and is replaced by the value of expression.
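The two arithmetic forms can be tried out directly. The following sketch assumes ksh93 or bash; the variable names are invented for illustration.

```shell
count=0

# (( count )) is False here because the expression evaluates to zero.
if (( count )); then zero_case=true; else zero_case=false; fi

# An assignment inside (( )) updates the variable.
(( count = count + 5 ))

# A comparison yields 1 (True) or 0 (False).
if (( count > 3 )); then big=yes; else big=no; fi

# $(( )) is replaced by the value of the expression, even inside a longer word.
padded="width-$(( count + 3 ))"

echo "$zero_case $big $padded"
```

Note the difference in role: (( )) is a command whose exit status is tested, while $(( )) is an expansion that produces text.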
Example. Consider the code fragment
(( .sh.value = $(strlenList ${entries[@]}) + 3 ))
The Korn shell evaluates the expression, which includes an assignment to the .sh.value variable. Note that the expression
$(strlenList ${entries[@]})
invokes the strlenList function (which is not a standard Korn shell built-in, so it is presumably defined elsewhere in the script) and returns the maximum width, as an integer value, of the strings in the entries[ ] array. The preceding code fragment then adds 3 to the maximum width value (e.g., for formatting purposes).
A conditional command in a Korn shell evaluates a test-expression and returns either True or False. For example, conditional commands can be used as part of an or list, an and list, or an if-elif-else command. Conditional commands have the format:
[[ test-expression ]]
When a conditional command is used in an and list, the Korn shell evaluates the test-expression and executes the command that follows && only if the test-expression evaluates to True. In the following example, the return statement is executed only if the first element of entries is an asterisk:
[[ ${entries[0]} == $'*' ]] && return 2
Iteration control in the Korn shell has two formats, namely, traditional and arithmetic-for.
Example. The traditional format iterates over each word in a list:
for variableName [ in word-list ]
do
  compound-list
done
An arithmetic-for command is also provided, which is very similar to the for statement of the C programming language. The format is given as:
for (( initExpression ; condition ; loopExpression ))
do
  compound-list
done
The initExpression is evaluated once by the Korn shell, before the first iteration. The condition is then evaluated prior to each iteration of compound-list. If the condition is nonzero, then the Korn shell executes the compound-list. The loopExpression is evaluated at the end of each iteration.
5.3. Piping and I/O Redirection and Control
Piping is a mechanism in UNIX for directing the data that one program produces to another program that consumes it. Input/Output redirection facilitates storing the output of a program in a file (called output redirection) or using the contents of a file as input to a process (input redirection). UNIX has two I/O ports called stdin (standard input) and stdout (standard output) that function like IOCS (I/O Control) buffers in selected operating systems. In practice, piping and redirection allow the user to specify a target to which stdout will be written, or a source from which stdin will be read.
5.3.1. Piping
Piping of information is accomplished with the ("|") symbol.
Example. To pipe output from program1 into the input of program2, the following syntax would be employed:
program1 | program2
The command line is scanned from left to right (the standard order employed in Western writing). When the UNIX shell interpreter (command line processor) sees the pipe symbol (|), it creates a pipe and starts both programs, so that the output of program1, which is written to stdout, arrives at the stdin of program2. The two programs run concurrently, with program2 consuming data as program1 produces it.
Here follows a concrete example to help you understand the use of piping.
Example. Suppose you have a program, which we will call program1, that produces hundreds of lines of screen output. Further assume that program1 has command-line options, which we will denote as -options.
Recall from our previous discussion that the more command can be used to display a file or input stream in UNIX one screen at a time. By using the piping command:
program1 -options | more
the output of program1 (under constraint of whatever options are specified) is sent to the more program, which displays this information pagewise.
For example, suppose you have a directory with many files, and you want to display the directory contents in detail (i.e., using the ls -l command). Instead of having to use the scrollbar to get through the directory listing, it is often more efficient to type:
ls -l | more
which will display the contents of the current directory pagewise.
A similar feature is available in MS-DOS (not by coincidence).
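Pipes can also be chained, with each program reading the output of the one before it. The sketch below uses the standard sort, head, and wc utilities on invented input; it is meant only to illustrate the mechanism.

```shell
# Three stages: printf produces three lines, sort orders them,
# and head keeps only the first two.
first_two=$(printf 'pear\napple\nfig\n' | sort | head -n 2)
echo "$first_two"

# Count the entries in the current directory by piping ls into wc.
# The number printed depends on where the command is run.
ls | wc -l
```

No intermediate files are needed: each stage starts working as soon as data arrives on its stdin.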
5.3.2. I/O Redirection
Suppose one is running a program that writes to stdout, and it is desired that the output go to a file instead. UNIX provides a feature for implementing this, called output redirection. A symmetric case exists for implementing file input to a program that otherwise receives its input from stdin.
UNIX has two types of output redirection, namely, creation/replacement and appending. In the former case, program output is directed to a file that is opened as new (if it did not exist before) or overwrites an older version (if it exists on disk). Creation or replacement uses the operator (">"), appending to output uses the operator (">>"), and input redirection uses the operator ("<"). Note that appending something to input is a meaningless concept in UNIX.
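The difference between replacement and appending is easy to observe with a scratch file. This is a minimal sketch that assumes a temporary file created with mktemp; the messages are invented.

```shell
out=$(mktemp)

echo "first run"  > "$out"     # > creates the file or replaces its contents
echo "second run" > "$out"     # replaces the contents again
echo "third run" >> "$out"     # >> adds to the end of the file

# < feeds the file to a program through stdin.
line_count=$(wc -l < "$out")
last_line=$(tail -n 1 < "$out")
echo "lines: $line_count, last: $last_line"

rm "$out"
```

The file ends up with two lines, because the second echo overwrote the first and only the third was appended.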
Example. To redirect output from program1 into a file out1.txt, the following syntax would be employed:
program1 > out1.txt
Example. If program1 takes input from a file input1.txt, then the following syntax would be employed:
program1 < input1.txt
Two concrete examples follow, which will help illustrate this powerful capability of UNIX.
Example. Suppose you have a program (program1) that has options denoted by -options, and you want that program to take input from a file data.in and write output to data.out. The following command could be used:
program1 -options < data.in > data.out
The program name comes first on the command line, followed by its options and the redirections. Before starting program1, the UNIX shell command processor opens data.in and connects it to stdin, then opens (or creates) data.out and connects it to stdout. As program1 produces output, this output will be written to data.out.
Example. Suppose you want to redirect the output of a detailed directory listing into a file directory.txt. The following command line would be employed:
ls -l > directory.txt
The form
ls -l | cat > directory.txt
produces the same file, but the pipe to the cat command is not required, since cat merely copies its stdin to its stdout unchanged.
Assume that a program (prog1) inputs data from stdin and writes output to stdout. Suppose we want to run prog1 repeatedly, to build up an output file that consists of many instances of running prog1. The UNIX appending operator (>>) is useful in this respect, as shown in the following example.
Example. Suppose you have a program (prog1) that computes the mean of a list of numbers from stdin, then writes the filename and the mean to stdout. Further assume that you want to produce a report that portrays the results of applying this program to many different lists of numbers. The following command line could be employed:
prog1 < input.dat >> output.rpt
where input.dat denotes one instance of the input file, and output.rpt denotes the accumulated record of running prog1 on many different instances of the input file.
The preceding command lines are built from a small set of operators (|, <, >, and >>). There are other UNIX operators that also allow different functions to occur on the command line, which will be discussed in Section 6 of these notes.
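The report-building pattern can be sketched as a loop. Everything here is an assumption made for illustration: prog1 is replaced by a small awk-based stand-in that prints only the mean, the loop itself writes the file name, and the input files are invented.

```shell
# Hypothetical stand-in for prog1: read numbers from stdin, print their mean.
prog1() {
    awk '{ sum += $1; n++ } END { if (n > 0) print sum / n }'
}

# Invented input files in a temporary directory.
work=$(mktemp -d)
printf '1\n2\n3\n' > "$work/input1.dat"
printf '10\n20\n'  > "$work/input2.dat"

: > "$work/output.rpt"          # start with an empty report
for f in "$work"/input*.dat
do
    # Each pass appends one line: the file name, then the mean from prog1.
    { printf '%s: ' "$(basename "$f")"; prog1 < "$f"; } >> "$work/output.rpt"
done

report=$(cat "$work/output.rpt")
echo "$report"
rm -r "$work"
```

Using >> inside the loop is what lets the report accumulate; with > each pass would overwrite the result of the previous one.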
This concludes our overview of basic UNIX control structures. We next discuss the software development process with a UNIX operating system.
References