Introduction
Grammar for generic path strings
Canonical form
Header synopsis
Class path
Native path representation
Representation example
Caution for POSIX and UNIX programmers
Good programming practice: relative paths
Path equality vs path equivalence
Member functions
Non-member functions
Default name_check mechanism
Rationale
Path decomposition examples
Filesystem Library functions traffic in objects of class path, provided by this header. The header also supplies non-member functions for error checking.
For actual operations on files and directories, see boost/filesystem/operations.hpp documentation.
For file I/O stream operations, see boost/filesystem/fstream.hpp documentation.
The Filesystem Library's Common Specifications apply to all member and non-member functions supplied by this header.
The Portability Guide discusses path naming issues which are important when portability is a concern.
Class path provides a portable mechanism for representing paths in C++ programs, using a portable generic path string grammar. Class path is concerned with the lexical and syntactic aspects of a path. The path does not have to exist in the operating system's filesystem, and may contain names which are not even valid for the current operating system.
Rationale: If Filesystem functions trafficked in std::strings or C-style strings, the functions would provide only an illusion of portability since the function calls would be portable but the strings they operate on would not be portable.
An object of class path can be conceptualized as containing a sequence of strings. Each string is said to be an element of the path. Each element represents the name of a directory, or, in the case of the string representing the element farthest from the root in the directory hierarchy, the name of a directory or file. The names ".." and "." are reserved to represent the concepts of parent-directory and directory-placeholder.
This conceptual path representation is independent of any particular representation of the path as a single string.
There is no requirement that an implementation of class path actually contain a sequence of strings, but conceptualizing the contents as a sequence of strings provides a completely portable way to reason about paths.
So that programs can portably express paths as a single string, class path defines a grammar for a portable generic path string format, and supplies constructor and append operations taking such strings as arguments. Because user input or third-party library functions may supply path strings formatted according to operating system specific rules, an additional constructor is provided which takes a system-specific format as an argument.
Access functions are provided to retrieve the contents of an object of class path formatted as a portable path string, a directory path string using the operating system's format, and a file path string using the operating system's format. Additional access functions retrieve specific portions of the contained path.
The grammar is specified in extended BNF, with terminal symbols in quotes:
path ::= [root] [relative-path]  // an empty path is valid
root ::= [root-name] [root-directory]
root-directory ::= separator
relative-path ::= path-element { separator path-element } [separator]
path-element ::= name | parent-directory | directory-placeholder
name ::= char { char }
directory-placeholder ::= "."
parent-directory ::= ".."
separator ::= "/"  // an implementation may define additional separators
root-name grammar is implementation-defined. root-name must not be present in generic input. It may be part of the strings returned by path member functions, and may be present in the src argument to path constructors when the native name check is in effect.
char may not be slash ('/') or '\0'. In addition, many operating and file systems may place additional restrictions on the characters which may appear in names. See File and Directory Name Recommendations.
Although implementation-defined, it is desirable that root-name have a grammar which is distinguishable from other grammar elements, and follow the conventions of the operating system.
The optional trailing "/" in a relative-path is allowed as a notational convenience. It has no semantic meaning and is simply discarded.
Whether or not a generic path string is actually portable to a particular operating system will depend on the names used. See the Portability Guide.
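The grammar can be made concrete with a small sketch. The function below is not part of the library: split_generic is a hypothetical helper, written against the C++ standard library only, that turns a generic path string without a root-name into the conceptual sequence of elements. It assumes "/" is the only separator, represents root-directory as the element "/" (the convention used in the Path decomposition examples), and discards the optional trailing separator. Rejecting empty names with std::invalid_argument is an assumption of the sketch, not documented library behavior.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper, not a library function: split a generic path string
// (no root-name) into its conceptual sequence of elements.
std::vector<std::string> split_generic(const std::string& src) {
    std::vector<std::string> elements;
    std::string::size_type pos = 0;

    // A leading separator is the root-directory; keep it as the element "/".
    if (!src.empty() && src[0] == '/') {
        elements.push_back("/");
        pos = 1;
    }

    while (pos < src.size()) {
        std::string::size_type next = src.find('/', pos);
        if (next == std::string::npos) next = src.size();
        if (next == pos) {
            // name ::= char { char }, so an empty name is not in the grammar.
            throw std::invalid_argument("empty path element in: " + src);
        }
        elements.push_back(src.substr(pos, next - pos));
        pos = next + 1;  // skip the separator; a trailing one is simply dropped
    }
    return elements;
}

// Usage: a trailing separator does not change the resulting elements.
void split_generic_example() {
    assert(split_generic("foo/bar/") == split_generic("foo/bar"));
    assert(split_generic("/foo").size() == 2);  // "/" and "foo"
    assert(split_generic("").empty());          // an empty path is valid
}
```

Keeping root-directory as its own element makes the later decomposition rules (leaf, branch path, root path) easy to state in terms of the element sequence.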
All operations modifying path objects leave the path object in canonical form.
An empty path is in canonical form.
A non-empty path is converted to canonical form as if by first converting it to the conceptual model, and then:
Normalized form is the same as canonical form, except that adjacent name, parent-directory elements are recursively removed.
Thus a non-empty path in normal form either has no directory-placeholders, or consists solely of one directory-placeholder. If it has parent-directory elements, they precede all name elements.
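The properties of normal form stated above can be sketched on the conceptual element sequence. The code below is an illustration only, not the library's implementation: normalize_elements and is_name are hypothetical names, and the handling of directory-placeholders (dropped, with a single "." kept if nothing else remains) is an assumption inferred from the description of normal form. Using the output vector as a stack gives the recursive removal of adjacent name, parent-directory pairs.

```cpp
#include <cassert>
#include <string>
#include <vector>

using elements = std::vector<std::string>;

// Hypothetical helper: true for an ordinary name element, that is, not the
// root-directory "/", not "." and not "..".
bool is_name(const std::string& e) {
    return e != "/" && e != "." && e != "..";
}

// Sketch of normalization on the conceptual element sequence.
elements normalize_elements(const elements& in) {
    elements out;
    for (const std::string& e : in) {
        if (e == ".") continue;  // assumption: placeholders are dropped
        if (e == ".." && !out.empty() && is_name(out.back())) {
            out.pop_back();      // cancel an adjacent "name", ".." pair
            continue;
        }
        out.push_back(e);
    }
    // A non-empty input that reduces to nothing becomes a lone placeholder.
    if (out.empty() && !in.empty()) out.push_back(".");
    return out;
}

void normalize_elements_example() {
    assert((normalize_elements({"foo", "..", "bar"}) == elements{"bar"}));
    assert((normalize_elements({"a", "b", "..", ".."}) == elements{"."}));
    assert((normalize_elements({"..", "foo"}) == elements{"..", "foo"}));
}
```

In the results, any remaining parent-directory elements precede all name elements, which matches the property stated for normal form.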
namespace boost
{
  namespace filesystem
  {
    class path
    {
    public:
      typedef bool (*name_check)( const std::string & name );

      // compiler generates copy constructor,
      // copy assignment, and destructor

      // constructors:
      path();
      path( const std::string & src );
      path( const char * src );
      path( const std::string & src, name_check checker );
      path( const char * src, name_check checker );

      // append operations:
      path & operator /= ( const path & rhs );
      path operator / ( const path & rhs ) const;

      // conversion functions:
      const std::string & string() const;
      std::string native_file_string() const;
      std::string native_directory_string() const;

      // modification functions:
      path & normalize();

      // decomposition functions:
      path root_path() const;
      std::string root_name() const;
      std::string root_directory() const;
      path relative_path() const;
      std::string leaf() const;
      path branch_path() const;

      // query functions:
      bool empty() const;
      bool is_complete() const;
      bool has_root_path() const;
      bool has_root_name() const;
      bool has_root_directory() const;
      bool has_relative_path() const;
      bool has_leaf() const;
      bool has_branch_path() const;

      // iteration:
      typedef implementation-defined iterator;
      iterator begin() const;
      iterator end() const;

      // default name_check mechanism:
      static bool default_name_check_writable();
      static void default_name_check( name_check new_check );
      static name_check default_name_check();

      // relational operators:
      bool operator==( const path & that ) const;
      bool operator!=( const path & that ) const;
      bool operator<( const path & that ) const;
      bool operator<=( const path & that ) const;
      bool operator>( const path & that ) const;
      bool operator>=( const path & that ) const;

    private:
      std::vector<std::string> m_name;  // for exposition only
    };

    path operator / ( const char * lhs, const path & rhs );
    path operator / ( const std::string & lhs, const path & rhs );

    // name_check functions
    bool portable_posix_name( const std::string & name );
    bool windows_name( const std::string & name );
    bool portable_name( const std::string & name );
    bool portable_directory_name( const std::string & name );
    bool portable_file_name( const std::string & name );
    bool no_check( const std::string & name );
    bool native( const std::string & name );
  }
}
For the sake of exposition, class path member functions are described as if the class contains a private member std::vector<std::string> m_name. Actual implementations may differ.
Class path member, or non-member operator/, functions may throw a filesystem_error exception if the path is not in the syntax specified for the grammar.
Note: There is no guarantee that a path object represents a path which is considered valid by the current operating system. A path might be invalid to the operating system because it contains invalid names (too long, invalid characters, and so on), or because it is a partial path still as yet unfinished by the program. An invalid path will normally be detected at time of use, such as by one of the Filesystem Library's operations or fstream functions.
Portability Warning: There is no guarantee that a path object represents a path which would be portable to another operating system. A path might be non-portable because it contains names which another operating system considers too long, or which contain characters that are invalid on that system. A default name_check mechanism is provided to aid in the detection of non-portable names, or a name_check function can be specified in path constructors. The library supplies several name_check functions, or users can supply their own.
Several path member functions return representations of m_name in formats specific to the operating system. These formats are implementation defined. If an m_name element contains characters which are invalid under the operating system's rules, and there is an unambiguous translation between the invalid character and a valid character, the implementation is required to perform that translation. For example, if an operating system does not permit lowercase letters in file or directory names, these letters will be translated to uppercase if unambiguous. Such translation does not apply to generic path string format representations.
The rule-of-thumb is to use string() when a generic string representation of the path is required, and use either native_directory_string() or native_file_string() when a string representation formatted for the particular operating system is required.
The differences between the representations returned by string(), native_directory_string(), and native_file_string() are illustrated by the following code:
path my_path( "foo/bar/data.txt" );
std::cout
  << "string------------------: " << my_path.string() << '\n'
  << "native_directory_string-: " << my_path.native_directory_string() << '\n'
  << "native_file_string------: " << my_path.native_file_string() << '\n';
On POSIX systems, the output would be:
string------------------: foo/bar/data.txt
native_directory_string-: foo/bar/data.txt
native_file_string------: foo/bar/data.txt
On Windows, the output would be:
string------------------: foo/bar/data.txt
native_directory_string-: foo\bar\data.txt
native_file_string------: foo\bar\data.txt
On classic Mac OS, the output would be:
string------------------: foo/bar/data.txt
native_directory_string-: foo:bar:data.txt
native_file_string------: foo:bar:data.txt
On a hypothetical operating system using OpenVMS format representations, it would be:
string------------------: foo/bar/data.txt
native_directory_string-: [foo.bar.data.txt]
native_file_string------: [foo.bar]data.txt
Note that because OpenVMS uses period as both a directory separator character and as a separator between filename and extension, native_directory_string() in the example produces a useless result. On this operating system, the programmer should only use this path as a file path. (There is a portability recommendation to not use periods in directory names.)
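For the simple separator-based formats in the example (POSIX, Windows, classic Mac OS), the native strings differ from the generic string only in the separator character. The sketch below reproduces those three outputs from the conceptual element sequence. It is an illustration under that assumption only: join_relative is a hypothetical helper, it handles relative paths without a root, and real native formats are implementation defined (the OpenVMS case above shows that a single separator is not always enough).

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper: join the names of a relative path with a single
// separator character.
std::string join_relative(const std::vector<std::string>& names, char separator) {
    std::string result;
    for (std::vector<std::string>::size_type i = 0; i < names.size(); ++i) {
        if (i != 0) result += separator;
        result += names[i];
    }
    return result;
}

void join_relative_example() {
    const std::vector<std::string> names = {"foo", "bar", "data.txt"};
    assert(join_relative(names, '/') == "foo/bar/data.txt");   // generic, POSIX
    assert(join_relative(names, '\\') == "foo\\bar\\data.txt"); // Windows
    assert(join_relative(names, ':') == "foo:bar:data.txt");   // classic Mac OS
}
```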
POSIX and other UNIX-like operating systems have a single root, while most other operating systems have multiple roots. Multi-root operating systems require a root-name such as a drive, device, disk, volume, or share name for a path to be resolved to an actual specific file or directory. Because of this, the root_path() and root_directory() functions return identical results on UNIX and other single-root operating systems, but different results on multi-root operating systems. Thus use of the wrong function will not be apparent on UNIX-like systems, but will result in non-portable code which will fail when used on multi-root systems. UNIX programmers are cautioned to use particular care in choosing between root_path() and root_directory(). If undecided, use root_path().
The same warning applies to has_root_path() and has_root_directory().
It is usually bad programming practice to hard-code complete paths into programs. Such programs tend to be fragile because they break when directory trees get reorganized or the programs are moved to other machines or operating systems.
The most robust way to deal with path completion is to hard-code only relative paths. When a complete path is required, it can be obtained in several ways:
create_directory( "foo" ); // operating system will complete path
path foo( argv[1], native ); foo /= "foo";
path foo( initial_path() / "foo" );
Are paths "abc" and "ABC" equal? No, never, if you determine equality via class path's operator==, which considers only the two paths' lexical representations.
Do paths "abc" and "ABC" resolve to the same file or directory? The answer is "yes", "no", or "maybe" depending on the external file system. The (pending) operations function equivalent() is the only way to determine if two paths resolve to the same external file system entity.
Programmers wishing to determine if two paths are "the same" must decide if that means "the same representation" or "resolve to the same actual file or directory", and choose the appropriate function accordingly.
path();
Effects: Default constructs an object of class path.
Postcondition: path().empty()
path( const std::string & src, name_check checker );
path( const char * src, name_check checker );
path( const std::string & src );
path( const char * src );
For the single-argument forms, default_name_check() is used as checker.
Precondition: src != 0.
Effects: Select the grammar as follows:
- If checker == native, the operating system's implementation-defined grammar for paths.
- else if checker == no_check, the generic path string grammar with optional root-name.
- else the generic path string grammar without root-name.
Parse src into a sequence of names, according to the grammar, then, for each name in src, m_name.push_back( name ).
Throws: For each name in src, throw if checker( name ) returns false.
Postcondition: m_name is in canonical form. For the single-argument forms only, !default_name_check_writable().
Rationale: The single-argument constructors are not explicit because an intended use is automatic conversion of strings to paths.
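The per-name checking described in the Effects and Throws clauses can be sketched as follows. This is a stand-alone illustration, not library code: checked_names and no_spaces are hypothetical, the input is assumed to be already parsed into names, and std::invalid_argument stands in for whatever exception the library actually throws.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>
#include <vector>

// Same shape as the path::name_check typedef in the synopsis.
typedef bool (*name_check)(const std::string& name);

// Hypothetical checker for illustration only; the library's own checkers
// (portable_name and so on) apply their own rules.
bool no_spaces(const std::string& name) {
    return name.find(' ') == std::string::npos;
}

// Sketch of the constructor's per-name loop: each parsed name is checked
// before it is stored. The exception type is an assumption.
std::vector<std::string> checked_names(const std::vector<std::string>& names,
                                       name_check checker) {
    assert(checker != 0);
    std::vector<std::string> m_name;  // mirrors the exposition-only member
    for (std::vector<std::string>::size_type i = 0; i < names.size(); ++i) {
        if (!checker(names[i])) {
            throw std::invalid_argument("name check failed: " + names[i]);
        }
        m_name.push_back(names[i]);
    }
    return m_name;
}
```

Passing a function pointer keeps the check replaceable per call, which is the role the two-argument constructors play for class path.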
path & operator/=( const path & rhs );
Effects: If any of the following conditions are met, then m_name.push_back("/").
- has_relative_path().
- !is_absolute() && has_root_name(), and the operating system requires the system-specific root be absolute.
Then append rhs.m_name to m_name.
(Footnote: Thus on Windows, (path("//share") /= "foo").string() is "//share/foo")
Returns: *this
Postcondition: m_name is in canonical form.
Rationale: It is not considered an error for rhs to include a root-directory because m_name might be relative or empty, and thus it is valid for rhs to supply root-directory. For example, on Windows, the following must succeed:
path p( "c:", native );
p /= "/foo";
assert( p.string() == "c:/foo" );
path operator/ ( const path & rhs ) const;
Returns: path( *this ) /= rhs
Rationale: Operator / is supplied because together with operator /=, it provides a convenient way for users to supply paths with a variable number of elements. For example, initial_path() / "src" / test_name. Operator+ and operator+= were considered as alternatives, but deemed too easy to confuse with those operators for std::string. Operator<< and operator<<= were used originally, until Dave Abrahams pointed out during public review that / and /= match the generic path syntax.
Note: Also see non-member operator/ functions.
path & normalize();
Postcondition: m_name is in normalized form.
Returns: *this
const std::string & string() const;
Returns: The contents of m_name, formatted according to the rules of the generic path string grammar.
Note: The returned string must be unambiguous according to the grammar. That means that for an operating system with root-names indistinguishable from relative-path names, names containing "/", or allowing "." or ".." as directory or file names, escapes or other mechanisms will have to be introduced into the grammar to prevent ambiguities. This has not been done yet, since no current implementations are on operating systems with any of those problems.
See: Representation example above.
std::string native_file_string() const;
Returns: The contents of m_name, formatted in the native representation of a file path.
See: Representation example above.
Naming rationale: The name is deliberately ugly to warn users that this function yields non-portable results.
std::string native_directory_string() const;
Returns: The contents of m_name, formatted in the native representation of a directory path.
See: Representation example above.
Naming rationale: The name is deliberately ugly to warn users that this function yields non-portable results.
path root_path() const;
Returns: root_name() / root_directory()
Portably provides a copy of a path's full root path, if any. See Path decomposition examples.
std::string root_name() const;
Returns: If !m_name.empty() && m_name[0] is a root-name, returns m_name[0], else returns a null string.
Portably provides a copy of a path's root-name, if any. See Path decomposition examples.
std::string root_directory() const;
Returns: If the path contains root-directory, then string("/"), else string().
Portably provides a copy of a path's root-directory, if any. The only possible results are "/" or "". See Path decomposition examples.
path relative_path() const;
Returns: A new path containing only the relative-path portion of the source path.
Portably provides a copy of a path's relative portion, if any. See Path decomposition examples.
std::string leaf() const;
Returns: empty() ? std::string() : m_name.back()
A typical use is to obtain a file or directory name without path information from a path returned by a directory_iterator. See Path decomposition examples.
path branch_path() const;
Returns: m_name.size() <= 1 ? path("") : x, where x is a path constructed from all the elements of m_name except the last.
A typical use is to obtain the parent path for a path supplied by the user. See Path decomposition examples.
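The Returns clauses for leaf() and branch_path() translate almost directly into operations on the exposition-only element sequence. The sketch below is not the library's code: it works on a plain std::vector<std::string> standing in for m_name, and the free functions leaf and branch are hypothetical names chosen for the illustration.

```cpp
#include <cassert>
#include <string>
#include <vector>

using elements = std::vector<std::string>;

// leaf(): the last element, or an empty string for an empty path.
std::string leaf(const elements& m_name) {
    return m_name.empty() ? std::string() : m_name.back();
}

// branch_path(): every element except the last; empty when there are
// zero or one elements.
elements branch(const elements& m_name) {
    if (m_name.size() <= 1) return elements();
    return elements(m_name.begin(), m_name.end() - 1);
}

void leaf_branch_example() {
    const elements p = {"/", "foo", "bar"};  // the path /foo/bar
    assert(leaf(p) == "bar");
    assert((branch(p) == elements{"/", "foo"}));
    assert(branch(elements{"foo"}).empty());
}
```

These results agree with the /foo/bar and foo rows of the Path decomposition examples.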
bool empty() const;
Returns: string().empty().
The path::empty() function determines if a path string itself is empty. To determine if the file or directory identified by the path is empty, use the operations.hpp is_empty() function.
Naming rationale: C++ Standard Library containers use the empty name for the equivalent functions.
bool is_complete() const;
Returns: For single-root operating systems, has_root_directory(). For multi-root operating systems, has_root_directory() && has_root_name().
Naming rationale: The alternate name, is_absolute(), causes confusion and controversy because on multi-root operating systems some people believe root_name() should participate in is_absolute(), and some don't. See the FAQ.
Note: On most operating systems, a complete path always unambiguously identifies a specific file or directory. On a few systems (classic Mac OS, for example), even a complete path may be ambiguous in unusual cases because the OS does not require unambiguousness.
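The is_complete() rule can be written out as a small truth function. This is only a sketch of the rule as stated: root_info is a hypothetical struct carrying the two query results, and the multi_root_os flag is an assumption standing in for the platform the library was built for.

```cpp
#include <cassert>

// Hypothetical summary of a path's root queries.
struct root_info {
    bool has_root_name;
    bool has_root_directory;
};

// Single-root systems need only a root-directory; multi-root systems need
// both a root-name and a root-directory.
bool is_complete(const root_info& p, bool multi_root_os) {
    return multi_root_os ? (p.has_root_directory && p.has_root_name)
                         : p.has_root_directory;
}

void is_complete_example() {
    const root_info slash_foo = {false, true};  // "/foo"
    const root_info c_foo = {true, false};      // "c:foo"
    const root_info c_slash_foo = {true, true}; // "c:/foo"
    assert(is_complete(slash_foo, false));      // complete on a single-root OS
    assert(!is_complete(slash_foo, true));      // not complete on a multi-root OS
    assert(!is_complete(c_foo, true));
    assert(is_complete(c_slash_foo, true));
}
```

The /foo case shows why code tested only on a single-root system can silently misbehave on a multi-root one.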
bool has_root_path() const;
Returns: has_root_name() || has_root_directory()
bool has_root_name() const;
Returns: !root_name().empty()
bool has_root_directory() const;
Returns: !root_directory().empty()
bool has_relative_path() const;
Returns: !relative_path().empty()
bool has_leaf() const;
Returns: !leaf().empty()
bool has_branch_path() const;
Returns: !branch_path().empty()
typedef implementation-defined iterator;
A const iterator meeting the C++ Standard Library requirements for bidirectional iterators (24.1). The iterator is a class type (so that operator++ and -- will work on temporaries). The value, reference, and pointer types are std::string, const std::string &, and const std::string *, respectively.
iterator begin() const;
Returns: m_name.begin()
iterator end() const;
Returns: m_name.end()
static bool default_name_check_writable();
Returns: True, unless a default_name_check function has been previously called.
static void default_name_check( name_check new_check );
Precondition: new_check != 0
Postcondition: default_name_check() == new_check && !default_name_check_writable()
Throws: if !default_name_check_writable()
static name_check default_name_check();
Returns: the default name_check.
Postcondition: !default_name_check_writable()
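The set-once behavior of the three static functions can be sketched with a pair of function-local statics. Everything in the sketch namespace is hypothetical and only mirrors the documented contract: the default becomes fixed the first time it is either set or read, and a later attempt to set it throws. The exception type (std::logic_error) and the initial checker (accept_all) are assumptions of the sketch.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

namespace sketch {

typedef bool (*name_check)(const std::string& name);

// Hypothetical initial default for the sketch.
bool accept_all(const std::string&) { return true; }

class default_check {
public:
    // True until the default has been set or read.
    static bool writable() { return !locked(); }

    // Set the default once; later calls throw.
    static void set(name_check new_check) {
        assert(new_check != 0);
        if (locked()) throw std::logic_error("default name check is fixed");
        current() = new_check;
        locked() = true;
    }

    // Reading the default also fixes it, so earlier readers and later
    // readers can never observe different checkers.
    static name_check get() {
        locked() = true;
        return current();
    }

private:
    static name_check& current() { static name_check c = accept_all; return c; }
    static bool& locked() { static bool l = false; return l; }
};

}  // namespace sketch
```

Locking on first read is what turns the global variable into an effective global constant: no code can observe one default and later run under another.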
bool operator==( const path & that ) const;
Returns: !(*this < that) && !(that < *this)
bool operator!=( const path & that ) const;
Returns: !(*this == that)
bool operator<( const path & that ) const;
Returns: std::lexicographical_compare( begin(), end(), that.begin(), that.end() )
See Path equality vs path equivalence.
Rationale: Relational operators are provided to ease uses such as specifying paths as keys in associative containers. Lexicographical comparison is used because:
- Even though not a full-fledged standard container, paths are enough like containers to merit meeting the C++ Standard Library's container comparison requirements (23.1 table 65).
- The alternative is to return a comparison of this->string() and that.string(). But path::string() as currently specified can yield non-unique results for differing paths. The case (from Peter Dimov) is path("first/")/"second" and path("first")/"second" both returning a string() of "first//second".
bool operator<=( const path & that ) const;
Returns: !(that < *this)
bool operator>( const path & that ) const;
Returns: that < *this
bool operator>=( const path & that ) const;
Returns: !(*this < that)
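The relational operators are all defined in terms of one element-wise lexicographical comparison. The sketch below shows that derivation on plain element vectors. path_less and path_equal are hypothetical free functions used for illustration, and std::vector<std::string> stands in for the exposition-only m_name.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

using elements = std::vector<std::string>;

// operator<: element-wise lexicographical comparison.
bool path_less(const elements& a, const elements& b) {
    return std::lexicographical_compare(a.begin(), a.end(), b.begin(), b.end());
}

// operator==: neither path is less than the other.
bool path_equal(const elements& a, const elements& b) {
    return !path_less(a, b) && !path_less(b, a);
}

void path_compare_example() {
    assert((path_equal({"abc"}, {"abc"})));
    assert((!path_equal({"abc"}, {"ABC"})));       // purely lexical, case matters
    assert((path_less({"foo"}, {"foo", "bar"})));  // a prefix sorts first
}
```

Because the comparison is purely lexical, "abc" and "ABC" are never equal here, whatever the external file system would say about equivalence.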
path operator / ( const char * lhs, const path & rhs );
path operator / ( const std::string & lhs, const path & rhs );
Returns: path( lhs ) /= rhs
It is difficult or impossible to write portable programs without some way to verify that directory and file names are portable. Without automatic name checking, verification is tedious, error prone, and ugly. Yet no single name check function can serve all applications, and within an application different paths or portions of paths may require different name check functions. Sometimes there should be no checking at all.
Those needs are met by providing a default name check function to meet an application's most common needs, and then providing path constructors which override the default name check function to handle less common needs. The default name check function can be set by the application, allowing the most common case for the particular application to be handled by the default check.
The default name check function is set and retrieved by path static member functions, and as such is similar to a global variable. Since global variables are considered harmful [Wulf-Shaw-73], class path allows the default name check function to be set only once, and only before the first use. This turns a dangerous global variable into a safer global constant. Even with this protection, the ability to set the default name check function is still a powerful feature, and is still dangerous in that it can change the behavior of code buried out-of-sight in libraries or elsewhere. Thus changing the default name check function should only be done when explicitly specifying the function via the two-argument path constructors is not reasonable.
Also see the FAQ for additional rationale.
Function naming: Class path member function names and operations.hpp non-member function names were chosen to be somewhat distinct from one another. The objective was to avoid cases like foo.empty() and empty( foo ) both being valid, but with completely different semantics. At one point path::empty() was renamed path::is_null(), but that caused many coding typos because std::string::empty() is often used nearby.
Decomposition functions: Decomposition functions are provided because without them it is impossible to write portable path manipulations. Convenience is also a factor.
Const vs non-const returns: In some earlier versions of the library, member functions returned values as const rather than non-const. See Scott Meyers, Effective C++, Item 21. The const qualifiers were eliminated (1) to conform with C++ Standard Library practice, (2) because non-const returns allow occasionally useful expressions, and (3) because the coding errors they eliminated were deemed rare. A requirement that path::iterator be a class type was added to eliminate non-const iterator errors.
It is often useful to extract specific elements from a path object. While any decomposition can be achieved by iterating over the elements of a path, convenience functions are provided which are easier to use, more efficient, and less error prone.
The first column of the table gives the example path, formatted by the string() function. The second column shows the values which would be returned by dereferencing each element iterator. The remaining columns show the results of various expressions.
p.string() | Elements | p.root_path() | p.root_name() | p.root_directory() | p.relative_path() | p.root_directory() / p.relative_path() | p.root_name() / p.relative_path() | p.branch_path() | p.leaf()
All systems
/ | / | / | "" | / | "" | / | "" | "" | /
foo | foo | "" | "" | "" | foo | foo | foo | "" | foo
/foo | /,foo | / | "" | / | foo | /foo | foo | / | foo
foo/bar | foo,bar | "" | "" | "" | foo/bar | foo/bar | foo/bar | foo | bar
/foo/bar | /,foo,bar | / | "" | / | foo/bar | /foo/bar | foo/bar | /foo | bar
. | . | "" | "" | "" | . | . | . | "" | .
.. | .. | "" | "" | "" | .. | .. | .. | "" | ..
../foo | ..,foo | "" | "" | "" | ../foo | ../foo | ../foo | .. | foo
Windows
c: | c: | c: | c: | "" | "" | "" | c: | "" | c:
c:/ | c:,/ | c:/ | c: | / | "" | / | c: | c: | /
c:.. | c:,.. | c: | c: | "" | .. | c:.. | c:.. | c: | ..
c:foo | c:,foo | c: | c: | "" | foo | foo | c:foo | c: | foo
c:/foo | c:,/,foo | c:/ | c: | / | foo | /foo | c:foo | c:/ | foo
//shr | //shr | //shr | //shr | "" | "" | "" | //shr | "" | //shr
//shr/ | //shr,/ | //shr/ | //shr | / | "" | / | //shr | //shr | /
//shr/foo | //shr, | //shr/ | //shr | / | foo | /foo | //shr/foo | //shr/ | foo
prn: | prn: | prn: | prn: | "" | "" | "" | prn: | "" | prn:
Revised 02 August, 2005
© Copyright Beman Dawes, 2002
Use, modification, and distribution are subject to the Boost Software License, Version 1.0. (See accompanying file LICENSE_1_0.txt or copy at www.boost.org/LICENSE_1_0.txt)