Unix Shells

Ksh93 and Bash Shells for Unix System Administrators

“The lyfe so short, the craft so long to lerne,”

This collection of links is oriented on students (initially it was provided as a reference material to my shell programming university course) and is designed to emphasize usage of  advanced shell constructs and pipes in shell programming (mainly in the context of ksh93 and bash 3.2+ which have good support for those constructs).  An introductory paper  Slightly Skeptical View on Shell discusses the shell as a scripting language and as one of the earliest examples of very high level languages.  The page might also be useful for system administrators who constitute the considerable percentage of  shell users and lion part of shell programmers.

This page is the main page to a set of sub-pages devoted to shell that collectively are known as Shellorama. The most important are:

  • Shell Papers,E-books and Tutorials
  • Language page – describes some of the exotic shell contracts and provides links to web resources about them.  Bash 3.x added several useful extensions. Among them (Bash Reference Manual ):
    • Arithmetic expansion allows the evaluation of an arithmetic expression and the substitution of the result. The format for arithmetic expansion is:
           $(( expression ))

      Older (( … )) construct borrowed from ksh93 is also supported.

    • Process substitution. It takes the form of  <(list) or   >(list). See Process substitution
    • =~ operator. An additional binary operator, ‘=~’, is available, with the same precedence as ‘==’ and ‘!=’. When it is used, the string to the right of the operator is considered an extended regular expression and matched accordingly (as in regex3)). The return value is 0 if the string matches the pattern, and 1 otherwise. If the regular expression is syntactically incorrect, the conditional expression’s return value is 2. If the shell option nocasematch (see the description of shopt in Bash Builtins) is enabled, the match is performed without regard to the case of alphabetic characters. Substrings matched by parenthesized sub-expressions within the regular expression are saved in the array variable BASH_REMATCH. The element of BASH_REMATCH with index 0 is the portion of the string matching the entire regular expression. The element of BASH_REMATCH with index n is the portion of the string matching the nth parenthesized sub-expression.
    • C-style for loop. Bash implements the for (( expr1 ; expr2 ; expr3 )) loop, similar to the C language (see Looping Constructs).
    • Brace expansion — a mechanism for generation of a sequence of similar strings (see also Filename Expansion), but the file names generated need not exist. Patterns to be brace expanded take the form of an optional preamble, followed by either a series of comma-separated strings or a sequnce expression between a pair of braces, followed by an optional postscript. Brace expansions may be nested.  For example,
           bash$ echo a{d,c,b}e
           ade ace abe
    • Tilde notation (csh an tcsh feature). Content of DIRSTACK array can be referenced positionally:
      • ~N  — The string that would be displayed by ‘dirs +N
      • ~+N — The string that would be displayed by ‘dirs +N
      •  ~-N  — The string that would be displayed by ‘dirs –N

      which is rather convenient for implementing “directory favorites” concept with “push/pop/dirs” troika

  • Debugging Current bash has the best debugger and from this point of view represents the best shell.  Until  Actually for ksh93 absence of the debugger is more a weakness, it is a blunder and it is strange that such talented person David Korn did not realize this.
  • Command history reuse which is devoted to one of the most important command line features
  • Dotfiles
  • Command completion
  • pushd/pops/dirs troika
  • Advanced Unix filesystem navigation
  • History Substitution
  • Functions in Shell
  • BASH Debugging
  • Vi editing mode
  • Bash Built-in Variables
  • Arithmetic Expressions in BASH
  • bash Control Structures
  • Shell Input and Output Redirection
  • Readline and inputrc

I strongly recommend getting a so-called orthodox file manager (OFM). This tool can immensely simplify Unix filesystem navigation and file operations (Midnight Commander  while defective in handling command line can be tried first as this is an active project and it provides fpt and sftp virtual filesystem in remote hosts)

Actually filesystem navigation in shell is an area of great concern as there are several serious problems with the current tools for Unix filesystem navigation.  I would say that usage of cd command (the most common method) is conceptually broken and deprives people from the full understanding of Unix filesystem; I doubt that it can be fixed within the shell paradigm (C-shell made an attempt to compensate for this deficiency by introducing history and popd/pushd/dirs troika, but this proved to be neither necessary nor sufficient for compensating problems with the in-depth understanding of the classical Unix hierarchical filesystem  inherent in purely command line navigation ;-). Paradoxically sysadmins who use OFMs usually have much better understanding of the power and flexibility of the Unix filesystem then people who use command line.   All-in-all usage of OFM is system administration represents Eastern European school of administration and it might be a better way to administer system that a typical “North American Way”.

The second indispensable tool for shell programmer is Expect. This is a very flexible application that can be used for automation of interactive sessions as well as automation of testing of applications.

Usually people who know shell and awk and/or Perl well are usually considered to be advanced Unix system administrators (this is another way to say the system administrators who does not know shall/awk/Perl troika well are essentially a various flavors of entry-level system administrators no matter how many years of experience they have). I would argue that no system administrator can consider himself to be a senior Unix system administrator without in-depth knowledge of both one of the OFMs and Expect.

 

No system administrator can consider himself to be a senior Unix system administrator without in-depth knowledge of both one of the OFMs and Expect.

An OFM tends to educate the user about the Unix filesystem in some subtle, but definitely  psychologically superior way. Widespread use of OFMs in Europe, especially in Germany and Eastern Europe, tend to produce specialists with substantially greater skills at handling Unix (and Windows) file systems than users that only have experience with a more primitive command line based navigational tools.

And yes, cd navigation is conceptually broken. This is not a bizarre opinion of the author,  this is a fact:  when you do not even suspect that a particular part of the tree exists something is conceptually broken.  People using command line know only fragments of the file system structure like blinds know only the parts of the elephant.  Current Unix file system with, say,  13K directories for a regular Solaris installation, are just unsuitable for the “cd way of navigation”; 1K directories was probably OK. But when there are over 10K of directories you need something else. Here quantity turns into quality. That’s my point.

The page provides rather long quotes as web pages as web pages are notoriously unreliable medium and can disappear without trace.  That makes this page somewhat difficult to browse, but it’s not designed for browsing; it’s designed as a supplementary material to the university shell course and for self-education.

 

Note: A highly recommended shell site is SHELLdorado by Heiner Steven.
This is a really excellent site with the good coding practice section,
some interesting example scripts and tips and tricks

A complementary page with Best Shell Books Reviews is also available. Although the best book selection is to a certain extent individual, the selection of a bad book is not: so this page might at least help you to avoid most common bad books (often the book recommended by a particular university are either weak or boring or both;  Unix Shell by Example is one such example ;-). Still the shell literature is substantial (over a hundred of books) and that mean that you can find a suitable textbook. Please be aware of the fact that that few authors of shell programming books have a broad understanding of Unix necessary for writing a comprehensive shell book.

IMHO the first edition of O’Reilly Learning Korn Shell is probably one of the best and contains nice set of  examples (the second edition is more up to date but generally is weaker). Also the first edition has advantage of being available in HTML form too (O’Reilly Unix CD). It does not cover ksh93 but it presents ksh in a unique way that no other book does. Some useful examples can also be found in UNIX Power Tools Book( see Archive of all shell scripts (684 KB); the book is available in HTML from one of O’Reilly CD bookshelf collections).

Still one needs to understand that Unix shells are pretty archaic languages which were designed with compatibility with dinosaur shells in mind (and Borne is a dinosaur shell by any definition). Designers even such strong designers as David Korn were hampered by compatibility problems from the very beginning (in a way it is amazing how much ingenuity they demonstrate in enhancing  Borne shell; I am really amazed how David Korn managed to extend borne shell into something much more usable and much loser to “normal” scripting language. In this sense ksh93 stands like a real pinnacle of shell compatibility and the the testament of the art of shell language extension).

That means that outside of interactive usage and small one page scripts they generally outlived their usefulness. That’s why for more or less complex tasks Perl is usually used (and should be used) instead of shells. While shells continued to improve since the original C-shell and Korn shell, the shell syntax is frozen in space and time and now looks completely archaic.  There are a large number of problems with this syntax as it does not cleanly separate lexical analysis from syntax analysis.  Bash 3.2 actually made some progress of overcoming most archaic features of old shells but still it has it own share of warts (for example last stage of the pipe does not run in on the same level as encompassing the pipe script)

Some syntax features in shell are idiosyncratic as Steve Bourne played with Algol 68 before starting work on the shell. In a way, he proved to be the most influential bad language designer, the designer who has the most lasting influence on Unix environment (that does not exonerate the subsequent designers which probably can take a more aggressive stance on the elimination of initial shell design blunders by marking them as “legacy”).

For example there is very little logic in how  different types of blocks are delimitated in shell scripts. Conditional statements end with (broken) classic Algor-68 the reverse keyword syntax: 'if condition; then echo yes; else echo no; fi', but loops are structured like perverted version of PL/1 (loop prefix do; … done;) , individual case branches blocks ends with  ‘;;’ . Functions have C-style bracketing “{“, “}”.  M. D. McIlroy as Steve Borne manager should be ashamed. After all at this time the level of compiler construction knowledge was pretty sufficient to avoid such blunders (David Gries book was published in 1971) and Bell Labs staff were not a bunch of enthusiasts ;-).

Also the original Bourne shell was a almost pure macro language. It performed variable substitution, tokenization and other operations on one line at a time without understanding the underlying syntax. This results in many unexpected side effects: Consider a simple command

rm $file

If variable $file is accidentally contains space that will lead to treating it as two separate augments to the rm command with possible nasty side effects.  To fix this, the user has to make sure every use of a variable in enclosed in quotes, like in rm "$file".

Variable assignments in Bourne shell are whitespace sensitive. 'foo=bar' is an assignment, but 'foo = bar' is not. It is a function call with "= "and "bar" as two arguments. This is another strange idiosyncrasy.

There is also an overlap between aliases and functions. Aliases are positional macros that are recognized only as the first word of the command like in classic  alias ll='ls -l'.  Because of this, aliases have several limitations:

  • You can only redirect input/output to the last command in the alias.
  • You can only specify arguments to the last command in the alias.
  • Alias definitions are a single text string, this means complex functions are nearly impossible to create.

Functions are not positional and can in most cases can emulate aliases functionality:

ll() { ls -l $*; }

The curly brackets are some sort of pseudo-commands, so skipping the semicolon in the example above results in a syntax error. As there is no clean separation between lexical analysis and syntax analysis  removing the whitespace between the opening bracket and ‘ls’ will also result in a syntax error.

Since the use of variables as commands is allowed, it is impossible to reliably check the syntax of a script as substitution can accidentally result in key word as in example that I found in the paper about fish (not that I like or recommend fish):

if true; then if [ $RANDOM -lt 1024 ]; then END=fi; else END=true; fi; $END

Both bash and zsh try to determine if the command in the current buffer is finished when the user presses the return key, but because of issues like this, they will sometimes fail.

Dr. Nikolai Bezroukov

Leave a Comment