Boost Filesystem Library

This Document
    Introduction
    Two-minute tutorial
    Examples
    Definitions
    Common Specifications
    Race-condition danger
    Implementation
    Building the object-library
      Note for Cygwin users
    Acknowledgements
    Change history
Other Documents
    Library Design
    FAQ
    Portability Guide
    path.hpp documentation
    operations.hpp documentation
    fstream.hpp documentation
    exception.hpp documentation
    convenience.hpp documentation
    Do-list

Introduction

The Boost Filesystem Library provides portable facilities to query and manipulate paths, files, and directories.

The motivation for the library is the need to be able to perform portable script-like operations from within C++ programs. The intent is not to compete with Python, Perl, or shell languages, but rather to provide portable filesystem operations when C++ is already the language of choice. The design encourages, but does not require, safe and portable filesystem usage.

The Filesystem Library supplies several headers, all in directory boost/filesystem: path.hpp, operations.hpp, fstream.hpp, exception.hpp, and convenience.hpp.

The organizing principle is that purely lexical operations on paths are supplied as class path member functions in path.hpp, while operations performed by the operating system on the actual external filesystem directories and files are provided in operations.hpp, primarily as free functions.

Two-minute tutorial

First some preliminaries:

#include "boost/filesystem/operations.hpp" // includes boost/filesystem/path.hpp
#include "boost/filesystem/fstream.hpp"    // ditto
#include <iostream>                        // for std::cout
using namespace boost::filesystem;         // for ease of tutorial presentation;
                                           // a namespace alias is preferred in real code

A class path object can be created:

path my_path( "some_dir/file.txt" );

The string passed to the path constructor is in a portable generic path format. Access functions make my_path contents available in an operating system dependent format, such as "some_dir:file.txt", "[some_dir]file.txt", "some_dir/file.txt", or whatever is appropriate for the operating system.

Class path has conversion constructors from const char* and const std::string&, so that even though the Filesystem Library functions in the following code snippet take const path& arguments, the user can just code C-style strings:

remove_all( "foobar" );
create_directory( "foobar" );
ofstream file( "foobar/cheeze" );
file << "tastes good!\n";
file.close();
if ( !exists( "foobar/cheeze" ) )
  std::cout << "Something is rotten in foobar\n";

Additional class path constructors provide for an operating system dependent format, useful for user provided input:

int main( int argc, char * argv[] ) {
path arg_path( argv[1], native ); // native means use O/S path format

To make class path objects easy to use in expressions, operator/ appends paths:

ifstream file1( arg_path / "foo/bar" );
ifstream file2( arg_path / "foo" / "bar" );

The expressions arg_path / "foo/bar" and arg_path / "foo" / "bar" yield identical results.
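The equivalence can be checked with a small sketch. To keep it compilable without the Boost headers, it assumes a compiler that provides C++17 std::filesystem, the standardized descendant of this library, whose operator/ behaves the same way for this case. The helper name same_result is hypothetical.

```cpp
#include <filesystem>

namespace fs = std::filesystem;  // assumption: C++17 <filesystem>, not the Boost headers

// Hypothetical helper: builds the same path in one step and in two steps,
// then compares the results element by element.
bool same_result( const fs::path & base )
{
  fs::path one_step  = base / "foo/bar";
  fs::path two_steps = base / "foo" / "bar";
  return one_step == two_steps;
}
```

Calling same_result( "some_dir" ) is expected to return true, since both expressions name the same sequence of elements.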

Paths can include references to the parent directory, using the ".." notation. Paths are always automatically converted to canonical form, so "foo/../bar" gets converted to "bar", and "../foo/../bar" gets converted to "../bar".
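The following is a minimal sketch of the purely lexical rule described above, written against std::string only. It is not the library's implementation; the function name collapse_parent_refs is hypothetical, and the sketch assumes a relative path in the generic format with "/" as the separator.

```cpp
#include <string>
#include <vector>

// Hypothetical helper, not part of the library: collapses "name/.." pairs
// in a relative generic-format path such as "foo/../bar".
std::string collapse_parent_refs( const std::string & generic_path )
{
  std::vector<std::string> names;
  std::string::size_type start = 0;
  while ( start <= generic_path.size() )
  {
    std::string::size_type slash = generic_path.find( '/', start );
    if ( slash == std::string::npos ) slash = generic_path.size();
    std::string name = generic_path.substr( start, slash - start );
    if ( name == ".." && !names.empty() && names.back() != ".." )
      names.pop_back();            // "foo/.." cancels out
    else if ( !name.empty() )
      names.push_back( name );     // keep ordinary names and leading ".."
    start = slash + 1;
  }

  std::string result;              // rejoin the surviving names with "/"
  for ( std::vector<std::string>::size_type i = 0; i < names.size(); ++i )
  {
    if ( i != 0 ) result += '/';
    result += names[i];
  }
  return result;
}
```

With this sketch, collapse_parent_refs( "foo/../bar" ) yields "bar" and collapse_parent_refs( "../foo/../bar" ) yields "../bar", matching the two conversions given in the text.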

Class directory_iterator is an important component of the library. It provides input iterators over the contents of a directory, with the value type being class path.

The following function, given a directory path and a file name, recursively searches the directory and its sub-directories for the file name, returning a bool, and if successful, the path to the file that was found.  The code below is extracted from a real program, slightly modified for clarity:

bool find_file( const path & dir_path,     // in this directory,
                const std::string & file_name, // search for this name,
                path & path_found )        // placing path here if found
{
  if ( !exists( dir_path ) ) return false;
  directory_iterator end_itr; // default construction yields past-the-end
  for ( directory_iterator itr( dir_path );
        itr != end_itr;
        ++itr )
  {
    if ( is_directory( *itr ) )
    {
      if ( find_file( *itr, file_name, path_found ) ) return true;
    }
    else if ( itr->leaf() == file_name ) // see below
    {
      path_found = *itr;
      return true;
    }
  }
  return false;
}

The expression itr->leaf() == file_name, in the line commented // see below, calls the leaf() function on the path object returned by the iterator. leaf() returns a string which is a copy of the last (closest to the leaf, farthest from the root) file or directory name in the path object.

In addition to leaf(), several other function names use the tree/root/branch/leaf metaphor.

Notice that find_file() does not do explicit error checking, such as verifying that the dir_path argument really represents a directory. All Filesystem Library functions throw filesystem_error exceptions if they do not complete successfully, so there is enough implicit error checking that this application doesn't need to supply additional error checking code.
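A caller that prefers a status result to a propagating exception can wrap the call in a try block. The sketch below shows one way to do that. It assumes C++17 std::filesystem, the standardized descendant of this library, so that it compiles without the Boost headers; the helper name count_entries is hypothetical.

```cpp
#include <filesystem>
#include <iostream>

namespace fs = std::filesystem;  // assumption: C++17 <filesystem>, not the Boost headers

// Hypothetical helper: counts the entries in a directory, reporting failure
// through the return value instead of letting filesystem_error propagate.
bool count_entries( const fs::path & dir_path, unsigned long & count )
{
  try
  {
    count = 0;
    // Constructing the iterator throws if dir_path cannot be opened,
    // for example because it does not exist or is not a directory.
    for ( fs::directory_iterator itr( dir_path ), end_itr; itr != end_itr; ++itr )
      ++count;
    return true;
  }
  catch ( const fs::filesystem_error & ex )
  {
    std::cerr << ex.what() << '\n';
    return false;
  }
}
```

The function itself performs no explicit checks on dir_path; the exception thrown by the iterator supplies the error detection.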

The tutorial is now over; hopefully you are ready to write simple, script-like programs using the Filesystem Library!

Examples

simple_ls.cpp

The example program simple_ls.cpp is given a path as a command line argument. Since the command line argument may be a relative path, the complete path is determined so that messages displayed can be more precise.

The program checks to see if the path exists; if not, a message is printed.

If the path identifies a directory, the directory is iterated through, printing the name of the entries found, and an indication if they are directories. A count of directories and files is updated, and then printed after the iteration is complete.

If the path is for a file, a message indicating that is printed.

Try compiling and executing simple_ls.cpp to see how it works on your system. Try various path arguments to see what happens.

Other examples

The programs used to generate the Boost regression test status tables use the Filesystem Library extensively.  See:

Test programs are sometimes useful in understanding a library, as they illustrate what the developer expected to work and not work. See:

Definitions

directory - A container provided by the operating system, containing the names of files, other directories, or both. Directories are identified by directory path.

directory tree - A directory and file hierarchy viewed as an acyclic graph. The tree metaphor is reflected in the root/branch/leaf naming convention for many path related functions.

path - A possibly empty sequence of names. Each element in the sequence, except the last, names a directory which contains the next element. The last element may name either a directory or file. The first element is closest to the root of the directory tree, the last element is farthest from the root.

It is traditional to represent a path as a string, where each element in the path is represented by a name, and some operating system defined syntax distinguishes between the name elements. Other representations of a path are possible, such as each name being an element in a std::vector<std::string>.
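As an illustration of the alternative representation mentioned above, this sketch splits a generic-format path string into a std::vector<std::string> of names. The function name split_names is hypothetical and the "/" separator is an assumption; the library itself does not expose paths this way.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: turns "some_dir/file.txt" into its sequence of names.
// The first element is closest to the root, the last is farthest from it.
std::vector<std::string> split_names( const std::string & generic_path )
{
  std::vector<std::string> names;
  std::istringstream in( generic_path );
  std::string name;
  while ( std::getline( in, name, '/' ) )
    if ( !name.empty() ) names.push_back( name );  // skip empty pieces
  return names;
}
```

For "some_dir/file.txt" the result holds two elements, "some_dir" followed by "file.txt"; an empty string gives an empty sequence, matching the "possibly empty" wording of the definition.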

file path - A path whose last element is a file.

directory path - A path whose last element is a directory.

complete path - A path which contains all the elements required by the operating system to uniquely identify a file or directory. (There is an odd corner case where a complete path can still be ambiguous on a few operating systems.)

relative path - A path which is not complete. Before actual use, relative paths are turned into complete paths either implicitly by the filesystem adding default elements, or explicitly by the program adding the missing elements via function call. Use of relative paths often makes a program much more flexible.
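The explicit conversion can be sketched as follows. The example assumes C++17 std::filesystem, the standardized descendant of this library, so that it compiles without the Boost headers; the helper name make_complete is hypothetical, and the choice of base directory is left to the caller.

```cpp
#include <filesystem>

namespace fs = std::filesystem;  // assumption: C++17 <filesystem>, not the Boost headers

// Hypothetical helper: returns p unchanged if it is already complete,
// otherwise supplies the missing leading elements from base.
fs::path make_complete( const fs::path & p, const fs::path & base )
{
  return p.is_absolute() ? p : base / p;
}
```

A typical call passes the current directory as the base, for example make_complete( "file.txt", fs::current_path() ).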

name - A file or directory name, without any directory path information to indicate the file or directory's actual location within a directory tree. For some operating systems, files and directories may have more than one valid name, such as a short-form name and a long-form name.

root - The initial node in the acyclic graph which represents the directory tree for a filesystem.

multi-root operating system - An operating system which has multiple roots. Some operating systems have different directory trees for each different disk, drive, device, volume, share, or other entity managed by the system, with each having its own root-name.

link - A name in a directory can be viewed as a pointer to some underlying directory or file content. Modern operating systems permit multiple directory elements to point to the same underlying directory or file content. Such a pointer is often called a link. Not all operating systems support the concept of links. Links may be reference-counted (hard link) or non-reference-counted (symbolic link).

hard link - A reference-counted link. Because the operating system manages the underlying file or directory, hard links are transparent to programs. They "just work" without the programmer needing to be aware of their existence.

symbolic link - A non-reference-counted link. The operating system manages some aspects of symbolic links. Most uses, such as opening or querying files, automatically resolve to the file or directory being pointed to rather than to the symbolic link itself. A few uses, such as remove() and rename(), modify the symbolic link rather than its target. See an important symbolic links warning.

Common Specifications

Unless otherwise specified, all Filesystem Library member and non-member functions have the following common specifications:

Postconditions not guaranteed in the presence of race-conditions - Filesystem function specifications follow the form of C++ Standard Library specifications, and so sometimes specify behavior in terms of postconditions. If a race-condition exists, a function's postconditions may no longer be true by the time the function returns to the caller.

May throw exceptions - Filesystem Library functions may throw filesystem_error exceptions if they cannot successfully complete their operational specifications. Function implementations may use C++ Standard Library functions, which may throw std::bad_alloc. These exceptions may be thrown even though the error condition leading to the exception is not explicitly specified in the function's "Throws" paragraph.

Exceptions thrown via boost::throw_exception() - All exceptions thrown by the Filesystem Library are implemented by calling boost::throw_exception(). Thus exact behavior may differ depending on BOOST_NO_EXCEPTIONS at the time the filesystem source files are compiled.

Links follow operating system rules - Links are transparent in that Filesystem Library functions simply follow operating system rules. That implies that some functions may throw filesystem_error exceptions if a link is cyclic or has other problems. Also, see an important symbolic links warning.

Typical operating systems rules call for deep operations on all links except that destructive operations on non-reference counted links are either shallow, or fail altogether in the case of trying to remove a non-reference counted link to a directory.

Rationale: Follows existing practice (POSIX, Windows, etc.).

No atomic-operation or rollback guarantee - Filesystem Library functions which throw exceptions may leave the external file system in an altered state. It is suggested that implementations provide stronger guarantees when possible.

Rationale: Implementers shouldn't be required to provide guarantees which are impossible to meet on some operating systems. Implementers should be given normative encouragement to provide those guarantees when possible.

Graceful degradation - Filesystem Library functions which cannot be fully supported on a particular operating system will be partially supported if possible. Implementations must document such partial support. Functions which are requested to provide some operation which they cannot support should report an error at compile time (preferred) or throw an exception at runtime.

Rationale: Implementations on less-powerful operating systems should provide useful functionality if possible, but are not required to simulate features not present in the underlying operating system.

Race-condition danger

The state of files and directories is often globally shared, and thus may be changed unexpectedly by other threads, processes, or even other computers having network access to the filesystem. As an example of the difficulties this can cause, note that the following asserts may fail:

assert( exists( "foo" ) == exists( "foo" ) );  // (1)

remove_all( "foo" );
assert( !exists( "foo" ) );  // (2)

assert( is_directory( "foo" ) == is_directory( "foo" ) ); // (3)

(1) will fail if a non-existent "foo" comes into existence, or an existent "foo" is removed, between the first and second call to exists(). This could happen if, during the execution of the example code, another thread, process, or computer is also performing operations in the same directory.

(2) will fail if between the call to remove_all() and the call to exists() a new file or directory named "foo" is created by another thread, process, or computer.

(3) will fail if another thread, process, or computer removes an existing file "foo" and then creates a directory named "foo", between the example code's two calls to is_directory().

A program which needs to be robust when operating on potentially-shared file or directory resources should be prepared for filesystem_error exceptions to be thrown from any filesystem function except those explicitly specified as not throwing exceptions.
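One way to follow this advice is to treat every step as something that may fail and to report the outcome rather than assert it. The sketch below assumes C++17 std::filesystem, the standardized descendant of this library, so that it compiles without the Boost headers; the helper name recreate_directory is hypothetical.

```cpp
#include <filesystem>

namespace fs = std::filesystem;  // assumption: C++17 <filesystem>, not the Boost headers

// Hypothetical helper: removes dir and creates it again as an empty directory.
// Another thread, process, or computer may interfere between any two calls,
// so failures are reported to the caller instead of being asserted away.
bool recreate_directory( const fs::path & dir )
{
  try
  {
    fs::remove_all( dir );
    fs::create_directory( dir );
    // Only a snapshot: the answer may already be stale when the caller sees it.
    return fs::is_directory( dir );
  }
  catch ( const fs::filesystem_error & )
  {
    return false;
  }
}
```

A true result means only that the directory existed at the moment of the final check; callers sharing the location with other programs should still expect later operations on it to throw.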

Implementation

The current implementation supports operating systems that have either the POSIX or Windows APIs available.

The following tests are provided:

The library is in regular use on Apple Mac OS, HP-UX, IBM AIX, Linux, Microsoft Windows, SGI IRIX, and Sun Solaris operating systems using a variety of compilers.

Building the object-library

The object-library will normally be built automatically. See Getting Started. It can also be built manually using a Jamfile supplied in directory libs/filesystem/build, or the user can construct an IDE project or make file which includes the object-library source files.

The object-library source files (convenience.cpp, exception.cpp, operations_posix_windows.cpp, and path_posix_windows.cpp) are supplied in directory libs/filesystem/src. These source files implement the library for POSIX or Windows compatible operating systems; no implementation is supplied for other operating systems. Note that many operating systems not normally thought of as POSIX systems, such as mainframe legacy operating systems or embedded operating systems, support POSIX compatible file systems which will work with the Filesystem Library.

The object-library can be built for static or dynamic (shared/dll) linking. This is controlled by the BOOST_ALL_DYN_LINK or BOOST_FILESYSTEM_DYN_LINK macros. See the Separate Compilation page for a description of the techniques used.

Note for Cygwin users

The library's implementation code automatically detects the current platform, and compiles the POSIX or Windows implementation code accordingly. Automatic platform detection during object library compilation can be overridden by defining BOOST_POSIX or BOOST_WINDOWS macros. With the exception of the Cygwin environment, there is usually no reason to define one of the macros, as the software development kits supplied with most compilers only support a single platform.

The Cygwin package of tools supports traditional Windows usage, but also provides an emulation layer and other tools which can be used to make Windows act as Linux (and thus POSIX), and provide the Linux look-and-feel. GCC is usually the compiler of choice in this environment, as it can be installed via the Cygwin install process. Other compilers can also use the Cygwin emulation of POSIX, at least in theory.

Those wishing to use the Cygwin POSIX emulation layer should define the BOOST_POSIX macro when compiling the Boost Filesystem Library's object-library. The macro does not need to be defined (and will have no effect if defined) for Boost Filesystem Library user programs.

Acknowledgements

The Filesystem Library was designed and implemented by Beman Dawes. The directory_iterator and filesystem_error classes were based on prior work from Dietmar Kühl, as modified by Jan Langer. Thomas Witt was a particular help in later stages of development.

Key design requirements and design realities were developed during extensive discussions on the Boost mailing list, followed by comments on the initial implementation. Numerous helpful comments were then received during the Formal Review.

Participants included Aaron Brashears, Alan Bellingham, Aleksey Gurtovoy, Alex Rosenberg, Alisdair Meredith, Andy Glew, Anthony Williams, Baptiste Lepilleur, Beman Dawes, Bill Kempf, Bill Seymour, Carl Daniel, Chris Little, Chuck Allison, Craig Henderson, Dan Nuffer, Dan'l Miller, Daniel Frey, Darin Adler, David Abrahams, David Held, Davlet Panech, Dietmar Kuehl, Douglas Gregor, Dylan Nicholson, Ed Brey, Eric Jensen, Eric Woodruff, Fedder Skovgaard, Gary Powell, Gennaro Prota, Geoff Leyland, George Heintzelman, Giovanni Bajo, Glen Knowles, Hillel Sims, Howard Hinnant, Jaap Suter, James Dennett, Jan Langer, Jani Kajala, Jason Stewart, Jeff Garland, Jens Maurer, Jesse Jones, Jim Hyslop, Joel de Guzman, Joel Young, John Levon, John Maddock, John Williston, Jonathan Caves, Jonathan Biggar, Jurko, Justus Schwartz, Keith Burton, Ken Hagen, Kostya Altukhov, Mark Rodgers, Martin Schuerch, Matt Austern, Matthias Troyer, Mattias Flodin, Michiel Salters, Mickael Pointier, Misha Bergal, Neal Becker, Noel Yap, Parksie, Patrick Hartling, Pavel Vozenilek, Pete Becker, Peter Dimov, Rainer Deyke, Rene Rivera, Rob Lievaart, Rob Stewart, Ron Garcia, Ross Smith, Sashan, Steve Robbins, Thomas Witt, Tom Harris, Toon Knapen, Victor Wagner, Vincent Finn, Vladimir Prus, and Yitzhak Sapir.

A lengthy discussion on the C++ committee's library reflector illuminated the "illusion of portability" problem, particularly in postings by PJ Plauger and Pete Becker.

Walter Landry provided much help illuminating symbolic link use cases for version 1.31.0.

Change history

Version 1.32.0

Version 1.31.0


Revised 02 August, 2005

© Copyright Beman Dawes, 2002

Use, modification, and distribution are subject to the Boost Software License, Version 1.0. (See accompanying file LICENSE_1_0.txt or copy at www.boost.org/LICENSE_1_0.txt)