Go to the first, previous, next, last section, table of contents.


Process Startup and Termination

Processes are the primitive units for allocation of system resources. Each process has its own address space and (usually) one thread of control. A process executes a program; you can have multiple processes executing the same program, but each process has its own copy of the program within its own address space and executes it independently of the other copies.

This chapter explains what your program should do to handle the startup of a process, to terminate its process, and to receive information (arguments and the environment) from the parent process.

Program Arguments

The system starts a C program by calling the function main. It is up to you to write a function named main---otherwise, you won't even be able to link your program without errors.

In ISO C you can define main either to take no arguments, or to take two arguments that represent the command line arguments to the program, like this:

int main (int argc, char *argv[])

The command line arguments are the whitespace-separated tokens given in the shell command used to invoke the program; thus, in `cat foo bar', the arguments are `foo' and `bar'. The only way a program can look at its command line arguments is via the arguments of main. If main doesn't take arguments, then you cannot get at the command line.

The value of the argc argument is the number of command line arguments. The argv argument is a vector of C strings; its elements are the individual command line argument strings. The file name of the program being run is also included in the vector as the first element; the value of argc counts this element. A null pointer always follows the last element: argv[argc] is this null pointer.

For the command `cat foo bar', argc is 3 and argv has three elements, "cat", "foo" and "bar".

If the syntax for the command line arguments to your program is simple enough, you can simply pick the arguments off from argv by hand. But unless your program takes a fixed number of arguments, or all of the arguments are interpreted in the same way (as file names, for example), you are usually better off using getopt to do the parsing.

In Unix systems you can define main a third way, using three arguments:

int main (int argc, char *argv[], char *envp)

The first two arguments are just the same. The third argument envp gives the process's environment; it is the same as the value of environ. See section Environment Variables. POSIX.1 does not allow this three-argument form, so to be portable it is best to write main to take two arguments, and use the value of environ.

Program Argument Syntax Conventions

POSIX recommends these conventions for command line arguments. getopt (see section Parsing Program Options) makes it easy to implement them.

GNU adds long options to these conventions. Long options consist of `--' followed by a name made of alphanumeric characters and dashes. Option names are typically one to three words long, with hyphens to separate words. Users can abbreviate the option names as long as the abbreviations are unique.

To specify an argument for a long option, write `--name=value'. This syntax enables a long option to accept an argument that is itself optional.

Eventually, the GNU system will provide completion for long option names in the shell.

Parsing Program Options

Here are the details about how to call the getopt function. To use this facility, your program must include the header file `unistd.h'.

Variable: int opterr
If the value of this variable is nonzero, then getopt prints an error message to the standard error stream if it encounters an unknown option character or an option with a missing required argument. This is the default behavior. If you set this variable to zero, getopt does not print any messages, but it still returns the character ? to indicate an error.

Variable: int optopt
When getopt encounters an unknown option character or an option with a missing required argument, it stores that option character in this variable. You can use this for providing your own diagnostic messages.

Variable: int optind
This variable is set by getopt to the index of the next element of the argv array to be processed. Once getopt has found all of the option arguments, you can use this variable to determine where the remaining non-option arguments begin. The initial value of this variable is 1.

Variable: char * optarg
This variable is set by getopt to point at the value of the option argument, for those options that accept arguments.

Function: int getopt (int argc, char **argv, const char *options)
The getopt function gets the next option argument from the argument list specified by the argv and argc arguments. Normally these values come directly from the arguments received by main.

The options argument is a string that specifies the option characters that are valid for this program. An option character in this string can be followed by a colon (`:') to indicate that it takes a required argument.

If the options argument string begins with a hyphen (`-'), this is treated specially. It permits arguments that are not options to be returned as if they were associated with option character `\0'.

The getopt function returns the option character for the next command line option. When no more option arguments are available, it returns -1. There may still be more non-option arguments; you must compare the external variable optind against the argc parameter to check this.

If the option has an argument, getopt returns the argument by storing it in the variable optarg. You don't ordinarily need to copy the optarg string, since it is a pointer into the original argv array, not into a static area that might be overwritten.

If getopt finds an option character in argv that was not included in options, or a missing option argument, it returns `?' and sets the external variable optopt to the actual option character. If the first character of options is a colon (`:'), then getopt returns `:' instead of `?' to indicate a missing option argument. In addition, if the external variable opterr is nonzero (which is the default), getopt prints an error message.

Example of Parsing Arguments with getopt

Here is an example showing how getopt is typically used. The key points to notice are:

#include <unistd.h>
#include <stdio.h>

int 
main (int argc, char **argv)
{
  int aflag = 0;
  int bflag = 0;
  char *cvalue = NULL;
  int index;
  int c;

  opterr = 0;

  while ((c = getopt (argc, argv, "abc:")) != -1)
    switch (c)
      {
      case 'a':
        aflag = 1;
        break;
      case 'b':
        bflag = 1;
        break;
      case 'c':
        cvalue = optarg;
        break;
      case '?':
        if (isprint (optopt))
          fprintf (stderr, "Unknown option `-%c'.\n", optopt);
        else
          fprintf (stderr,
                   "Unknown option character `\\x%x'.\n",
                   optopt);
        return 1;
      default:
        abort ();
      }

  printf ("aflag = %d, bflag = %d, cvalue = %s\n", aflag, bflag, cvalue);

  for (index = optind; index < argc; index++)
    printf ("Non-option argument %s\n", argv[index]);
  return 0;
}

Here are some examples showing what this program prints with different combinations of arguments:

% testopt
aflag = 0, bflag = 0, cvalue = (null)

% testopt -a -b
aflag = 1, bflag = 1, cvalue = (null)

% testopt -ab
aflag = 1, bflag = 1, cvalue = (null)

% testopt -c foo
aflag = 0, bflag = 0, cvalue = foo

% testopt -cfoo
aflag = 0, bflag = 0, cvalue = foo

% testopt arg1
aflag = 0, bflag = 0, cvalue = (null)
Non-option argument arg1

% testopt -a arg1
aflag = 1, bflag = 0, cvalue = (null)
Non-option argument arg1

% testopt -c foo arg1
aflag = 0, bflag = 0, cvalue = foo
Non-option argument arg1

% testopt -a -- -b
aflag = 1, bflag = 0, cvalue = (null)
Non-option argument -b

% testopt -a -
aflag = 1, bflag = 0, cvalue = (null)
Non-option argument -

Parsing Long Options

To accept GNU-style long options as well as single-character options, use getopt_long instead of getopt. This function is declared in `getopt.h', not `unistd.h'. You should make every program accept long options if it uses any options, for this takes little extra work and helps beginners remember how to use the program.

Data Type: struct option
This structure describes a single long option name for the sake of getopt_long. The argument longopts must be an array of these structures, one for each long option. Terminate the array with an element containing all zeros.

The struct option structure has these fields:

const char *name
This field is the name of the option. It is a string.
int has_arg
This field says whether the option takes an argument. It is an integer, and there are three legitimate values: no_argument, required_argument and optional_argument.
int *flag
int val
These fields control how to report or act on the option when it occurs. If flag is a null pointer, then the val is a value which identifies this option. Often these values are chosen to uniquely identify particular long options. If flag is not a null pointer, it should be the address of an int variable which is the flag for this option. The value in val is the value to store in the flag to indicate that the option was seen.

Function: int getopt_long (int argc, char **argv, const char *shortopts, struct option *longopts, int *indexptr)
Decode options from the vector argv (whose length is argc). The argument shortopts describes the short options to accept, just as it does in getopt. The argument longopts describes the long options to accept (see above).

When getopt_long encounters a short option, it does the same thing that getopt would do: it returns the character code for the option, and stores the options argument (if it has one) in optarg.

When getopt_long encounters a long option, it takes actions based on the flag and val fields of the definition of that option.

If flag is a null pointer, then getopt_long returns the contents of val to indicate which option it found. You should arrange distinct values in the val field for options with different meanings, so you can decode these values after getopt_long returns. If the long option is equivalent to a short option, you can use the short option's character code in val.

If flag is not a null pointer, that means this option should just set a flag in the program. The flag is a variable of type int that you define. Put the address of the flag in the flag field. Put in the val field the value you would like this option to store in the flag. In this case, getopt_long returns 0.

For any long option, getopt_long tells you the index in the array longopts of the options definition, by storing it into *indexptr. You can get the name of the option with longopts[*indexptr].name. So you can distinguish among long options either by the values in their val fields or by their indices. You can also distinguish in this way among long options that set flags.

When a long option has an argument, getopt_long puts the argument value in the variable optarg before returning. When the option has no argument, the value in optarg is a null pointer. This is how you can tell whether an optional argument was supplied.

When getopt_long has no more options to handle, it returns -1, and leaves in the variable optind the index in argv of the next remaining argument.

Example of Parsing Long Options

#include <stdio.h>
#include <stdlib.h>
#include <getopt.h>

/* Flag set by `--verbose'. */
static int verbose_flag;

int
main (argc, argv)
     int argc;
     char **argv;
{
  int c;

  while (1)
    {
      static struct option long_options[] =
        {
          /* These options set a flag. */
          {"verbose", 0, &verbose_flag, 1},
          {"brief", 0, &verbose_flag, 0},
          /* These options don't set a flag.
             We distinguish them by their indices. */
          {"add", 1, 0, 0},
          {"append", 0, 0, 0},
          {"delete", 1, 0, 0},
          {"create", 0, 0, 0},
          {"file", 1, 0, 0},
          {0, 0, 0, 0}
        };
      /* getopt_long stores the option index here. */
      int option_index = 0;

      c = getopt_long (argc, argv, "abc:d:",
                       long_options, &option_index);

      /* Detect the end of the options. */
      if (c == -1)
        break;

      switch (c)
        {
        case 0:
          /* If this option set a flag, do nothing else now. */
          if (long_options[option_index].flag != 0)
            break;
          printf ("option %s", long_options[option_index].name);
          if (optarg)
            printf (" with arg %s", optarg);
          printf ("\n");
          break;

        case 'a':
          puts ("option -a\n");
          break;

        case 'b':
          puts ("option -b\n");
          break;

        case 'c':
          printf ("option -c with value `%s'\n", optarg);
          break;

        case 'd':
          printf ("option -d with value `%s'\n", optarg);
          break;

        case '?':
          /* getopt_long already printed an error message. */
          break;

        default:
          abort ();
        }
    }

  /* Instead of reporting `--verbose'
     and `--brief' as they are encountered,
     we report the final status resulting from them. */
  if (verbose_flag)
    puts ("verbose flag is set");

  /* Print any remaining command line arguments (not options). */
  if (optind < argc)
    {
      printf ("non-option ARGV-elements: ");
      while (optind < argc)
        printf ("%s ", argv[optind++]);
      putchar ('\n');
    }

  exit (0);
}

Parsing of Suboptions

Having a single level of options is sometimes not enough. There might be too many options which have to be available or a set of options is closely related.

For this case some programs use suboptions. One of the most prominent programs is certainly mount(8). The -o option take one argument which itself is a comma separated list of options. To ease the programming of code like this the function getsubopt is available.

Function: int getsubopt (char **optionp, const char* const *tokens, char **valuep)

The optionp parameter must be a pointer to a variable containing the address of the string to process. When the function returns the reference is updated to point to the next suboption or to the terminating `\0' character if there is no more suboption available.

The tokens parameter references an array of strings containing the known suboptions. All strings must be `\0' terminated and to mark the end a null pointer must be stored. When getsubopt finds a possible legal suboption it compares it with all strings available in the tokens array and returns the index in the string as the indicator.

In case the suboption has an associated value introduced by a `=' character, a pointer to the value is returned in valuep. The string is `\0' terminated. If no argument is available valuep is set to the null pointer. By doing this the caller can check whether a necessary value is given or whether no unexpected value is present.

In case the next suboption in the string is not mentioned in the tokens array the starting address of the suboption including a possible value is returned in valuep and the return value of the function is `-1'.

Parsing of Suboptions Example

The code which might appear in the mount(8) program is a perfect example of the use of getsubopt:

#include <stdio.h>
#include <stdlib.h>

int do_all;
const char *type;
int read_size;
int write_size;
int read_only;

enum
{
  RO_OPTION = 0,
  RW_OPTION,
  READ_SIZE_OPTION,
  WRITE_SIZE_OPTION
};

const char *mount_opts[] =
{
  [RO_OPTION] = "ro",
  [RW_OPTION] = "rw",
  [READ_SIZE_OPTION] = "rsize",
  [WRITE_SIZE_OPTION] = "wsize"
};

int
main (int argc, char *argv[])
{
  char *subopts, *value;
  int opt;

  while ((opt = getopt (argc, argv, "at:o:")) != -1)
    switch (opt)
      {
      case 'a':
        do_all = 1;
        break;
      case 't':
        type = optarg;
        break;
      case 'o':
        subopts = optarg;
        while (*subopts != '\0')
          switch (getsubopt (&subopts, mount_opts, &value))
            {
            case RO_OPTION:
              read_only = 1;
              break;
            case RW_OPTION:
              read_only = 0;
              break;
            case READ_SIZE_OPTION:
              if (value == NULL)
                abort ();
              read_size = atoi (value);
              break;
            case WRITE_SIZE_OPTION:
              if (value == NULL)
                abort ();
              write_size = atoi (value);
              break;
            default:
              /* Unknown suboption. */
              printf ("Unknown suboption `%s'\n", value);
              break;
            }
        break;
      default:
        abort ();
      }

  /* Do the real work. */

  return 0;
}

Environment Variables

When a program is executed, it receives information about the context in which it was invoked in two ways. The first mechanism uses the argv and argc arguments to its main function, and is discussed in section Program Arguments. The second mechanism uses environment variables and is discussed in this section.

The argv mechanism is typically used to pass command-line arguments specific to the particular program being invoked. The environment, on the other hand, keeps track of information that is shared by many programs, changes infrequently, and that is less frequently used.

The environment variables discussed in this section are the same environment variables that you set using assignments and the export command in the shell. Programs executed from the shell inherit all of the environment variables from the shell.

Standard environment variables are used for information about the user's home directory, terminal type, current locale, and so on; you can define additional variables for other purposes. The set of all environment variables that have values is collectively known as the environment.

Names of environment variables are case-sensitive and must not contain the character `='. System-defined environment variables are invariably uppercase.

The values of environment variables can be anything that can be represented as a string. A value must not contain an embedded null character, since this is assumed to terminate the string.

Environment Access

The value of an environment variable can be accessed with the getenv function. This is declared in the header file `stdlib.h'.

Function: char * getenv (const char *name)
This function returns a string that is the value of the environment variable name. You must not modify this string. In some non-Unix systems not using the GNU library, it might be overwritten by subsequent calls to getenv (but not by any other library function). If the environment variable name is not defined, the value is a null pointer.

Function: int putenv (const char *string)
The putenv function adds or removes definitions from the environment. If the string is of the form `name=value', the definition is added to the environment. Otherwise, the string is interpreted as the name of an environment variable, and any definition for this variable in the environment is removed.

The GNU library provides this function for compatibility with SVID; it may not be available in other systems.

You can deal directly with the underlying representation of environment objects to add more variables to the environment (for example, to communicate with another program you are about to execute; see section Executing a File).

Variable: char ** environ
The environment is represented as an array of strings. Each string is of the format `name=value'. The order in which strings appear in the environment is not significant, but the same name must not appear more than once. The last element of the array is a null pointer.

This variable is declared in the header file `unistd.h'.

If you just want to get the value of an environment variable, use getenv.

Unix systems, and the GNU system, pass the initial value of environ as the third argument to main. See section Program Arguments.

Standard Environment Variables

These environment variables have standard meanings. This doesn't mean that they are always present in the environment; but if these variables are present, they have these meanings. You shouldn't try to use these environment variable names for some other purpose.

HOME
This is a string representing the user's home directory, or initial default working directory. The user can set HOME to any value. If you need to make sure to obtain the proper home directory for a particular user, you should not use HOME; instead, look up the user's name in the user database (see section User Database). For most purposes, it is better to use HOME, precisely because this lets the user specify the value.
LOGNAME
This is the name that the user used to log in. Since the value in the environment can be tweaked arbitrarily, this is not a reliable way to identify the user who is running a process; a function like getlogin (see section Identifying Who Logged In) is better for that purpose. For most purposes, it is better to use LOGNAME, precisely because this lets the user specify the value.
PATH
A path is a sequence of directory names which is used for searching for a file. The variable PATH holds a path used for searching for programs to be run. The execlp and execvp functions (see section Executing a File) use this environment variable, as do many shells and other utilities which are implemented in terms of those functions. The syntax of a path is a sequence of directory names separated by colons. An empty string instead of a directory name stands for the current directory (see section Working Directory). A typical value for this environment variable might be a string like:
:/bin:/etc:/usr/bin:/usr/new/X11:/usr/new:/usr/local/bin
This means that if the user tries to execute a program named foo, the system will look for files named `foo', `/bin/foo', `/etc/foo', and so on. The first of these files that exists is the one that is executed.
TERM
This specifies the kind of terminal that is receiving program output. Some programs can make use of this information to take advantage of special escape sequences or terminal modes supported by particular kinds of terminals. Many programs which use the termcap library (see section `Finding a Terminal Description' in The Termcap Library Manual) use the TERM environment variable, for example.
TZ
This specifies the time zone. See section Specifying the Time Zone with TZ, for information about the format of this string and how it is used.
LANG
This specifies the default locale to use for attribute categories where neither LC_ALL nor the specific environment variable for that category is set. See section Locales and Internationalization, for more information about locales.
LC_COLLATE
This specifies what locale to use for string sorting.
LC_CTYPE
This specifies what locale to use for character sets and character classification.
LC_MONETARY
This specifies what locale to use for formatting monetary values.
LC_NUMERIC
This specifies what locale to use for formatting numbers.
LC_TIME
This specifies what locale to use for formatting date/time values.
_POSIX_OPTION_ORDER
If this environment variable is defined, it suppresses the usual reordering of command line arguments by getopt. See section Program Argument Syntax Conventions.

Program Termination

The usual way for a program to terminate is simply for its main function to return. The exit status value returned from the main function is used to report information back to the process's parent process or shell.

A program can also terminate normally by calling the exit function.

In addition, programs can be terminated by signals; this is discussed in more detail in section Signal Handling. The abort function causes a signal that kills the program.

Normal Termination

A process terminates normally when the program calls exit. Returning from main is equivalent to calling exit, and the value that main returns is used as the argument to exit.

Function: void exit (int status)
The exit function terminates the process with status status. This function does not return.

Normal termination causes the following actions:

  1. Functions that were registered with the atexit or on_exit functions are called in the reverse order of their registration. This mechanism allows your application to specify its own "cleanup" actions to be performed at program termination. Typically, this is used to do things like saving program state information in a file, or unlocking locks in shared data bases.
  2. All open streams are closed, writing out any buffered output data. See section Closing Streams. In addition, temporary files opened with the tmpfile function are removed; see section Temporary Files.
  3. _exit is called, terminating the program. See section Termination Internals.

Exit Status

When a program exits, it can return to the parent process a small amount of information about the cause of termination, using the exit status. This is a value between 0 and 255 that the exiting process passes as an argument to exit.

Normally you should use the exit status to report very broad information about success or failure. You can't provide a lot of detail about the reasons for the failure, and most parent processes would not want much detail anyway.

There are conventions for what sorts of status values certain programs should return. The most common convention is simply 0 for success and 1 for failure. Programs that perform comparison use a different convention: they use status 1 to indicate a mismatch, and status 2 to indicate an inability to compare. Your program should follow an existing convention if an existing convention makes sense for it.

A general convention reserves status values 128 and up for special purposes. In particular, the value 128 is used to indicate failure to execute another program in a subprocess. This convention is not universally obeyed, but it is a good idea to follow it in your programs.

Warning: Don't try to use the number of errors as the exit status. This is actually not very useful; a parent process would generally not care how many errors occurred. Worse than that, it does not work, because the status value is truncated to eight bits. Thus, if the program tried to report 256 errors, the parent would receive a report of 0 errors--that is, success.

For the same reason, it does not work to use the value of errno as the exit status--these can exceed 255.

Portability note: Some non-POSIX systems use different conventions for exit status values. For greater portability, you can use the macros EXIT_SUCCESS and EXIT_FAILURE for the conventional status value for success and failure, respectively. They are declared in the file `stdlib.h'.

Macro: int EXIT_SUCCESS
This macro can be used with the exit function to indicate successful program completion.

On POSIX systems, the value of this macro is 0. On other systems, the value might be some other (possibly non-constant) integer expression.

Macro: int EXIT_FAILURE
This macro can be used with the exit function to indicate unsuccessful program completion in a general sense.

On POSIX systems, the value of this macro is 1. On other systems, the value might be some other (possibly non-constant) integer expression. Other nonzero status values also indicate failures. Certain programs use different nonzero status values to indicate particular kinds of "non-success". For example, diff uses status value 1 to mean that the files are different, and 2 or more to mean that there was difficulty in opening the files.

Cleanups on Exit

Your program can arrange to run its own cleanup functions if normal termination happens. If you are writing a library for use in various application programs, then it is unreliable to insist that all applications call the library's cleanup functions explicitly before exiting. It is much more robust to make the cleanup invisible to the application, by setting up a cleanup function in the library itself using atexit or on_exit.

Function: int atexit (void (*function) (void))
The atexit function registers the function function to be called at normal program termination. The function is called with no arguments.

The return value from atexit is zero on success and nonzero if the function cannot be registered.

Function: int on_exit (void (*function)(int status, void *arg), void *arg)
This function is a somewhat more powerful variant of atexit. It accepts two arguments, a function function and an arbitrary pointer arg. At normal program termination, the function is called with two arguments: the status value passed to exit, and the arg.

This function is included in the GNU C library only for compatibility for SunOS, and may not be supported by other implementations.

Here's a trivial program that illustrates the use of exit and atexit:

#include <stdio.h>
#include <stdlib.h>

void 
bye (void)
{
  puts ("Goodbye, cruel world....");
}

int
main (void)
{
  atexit (bye);
  exit (EXIT_SUCCESS);
}

When this program is executed, it just prints the message and exits.

Aborting a Program

You can abort your program using the abort function. The prototype for this function is in `stdlib.h'.

Function: void abort (void)
The abort function causes abnormal program termination. This does not execute cleanup functions registered with atexit or on_exit.

This function actually terminates the process by raising a SIGABRT signal, and your program can include a handler to intercept this signal; see section Signal Handling.

Future Change Warning: Proposed Federal censorship regulations may prohibit us from giving you information about the possibility of calling this function. We would be required to say that this is not an acceptable way of terminating a program.

Termination Internals

The _exit function is the primitive used for process termination by exit. It is declared in the header file `unistd.h'.

Function: void _exit (int status)
The _exit function is the primitive for causing a process to terminate with status status. Calling this function does not execute cleanup functions registered with atexit or on_exit.

When a process terminates for any reason--either by an explicit termination call, or termination as a result of a signal--the following things happen:


Go to the first, previous, next, last section, table of contents.