Type-safe 'printf-like' format class

Choices made

"Le pourquoi du comment" ( - "the why of the how")


The syntax of the format-string

Format is a new library. One of its goals is to provide a replacement for printf: format can parse a format-string designed for printf, apply it to the given arguments, and produce the same result as printf would.
With this constraint, there were roughly three possible choices for the syntax of the format-string:

  1. Use exactly the same syntax as printf. It is well known by many experienced users and fits almost all needs. But with C++ streams, the type-conversion character, which is crucial for determining the end of a directive, only serves to set some associated formatting options (%x for setting hexadecimal output, etc.). It would be better to make this obligatory type-conversion character, with its modified meaning, optional.
  2. Extend the printf syntax while maintaining compatibility, by using characters and constructs that are not yet valid printf syntax, e.g. "%1%", "%[1]", "%|1$d|". Using begin / end marks, all sorts of extensions can be considered.
  3. Provide a non-legacy mode, in parallel with the printf-compatible one, that can be designed to fit other objectives without the constraint of compatibility with the existing printf syntax.
    But designing a replacement for printf's syntax that would be clearly better, and just as powerful, is a separate task from building a format class. When such a syntax is designed, we should consider splitting Boost.format into two separate libraries: one working hand in hand with this new syntax, and another supporting the legacy syntax (possibly a fast version, built with safety improvements on top of snprintf or the like).
In the absence of a full, clever, new syntax clearly better adapted to C++ streams than printf's, the second approach was chosen. Boost.format uses printf's syntax, with extensions (tabulations, centered alignments) that can be expressed using extensions to this syntax.
Alternate compatible notations are also provided to address the weaknesses of printf's syntax.


Why are arguments passed through an operator rather than a function call?


The inconvenience of the operator approach (for some people) is that it might be confusing. It is a common warning that too much operator overloading confuses people.
Since format objects will be used in specific contexts (most often right after a "cout << "), and the result does look like a formatting string followed by arguments:
format(" %s at %s  with %s\n") % x % y % z;
we can hope it won't confuse people that much.
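
To make the mechanism concrete, here is a minimal sketch of how a chained operator% can collect arguments. The class mini_format is a hypothetical toy, not Boost.Format: it is assumed to understand only the bare "%s" directive and to ignore flags, widths and positional arguments.

```cpp
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical toy class, NOT Boost.Format: it only understands the bare
// "%s" directive and ignores flags, widths, and positional arguments.
class mini_format {
public:
    explicit mini_format(const std::string& fmt) : fmt_(fmt) {}

    // A single template handles any number of arguments: each call stores
    // one argument (converted through operator<<) and returns *this so that
    // the next "% arg" can be chained onto it.
    template <class T>
    mini_format& operator%(const T& x) {
        std::ostringstream os;
        os << x;
        args_.push_back(os.str());
        return *this;
    }

    // Replace each "%s" with the next stored argument, left to right.
    // Directives left over once the arguments run out are copied verbatim.
    std::string str() const {
        std::string out;
        std::size_t next = 0;
        for (std::size_t i = 0; i < fmt_.size(); ++i) {
            if (fmt_[i] == '%' && i + 1 < fmt_.size() && fmt_[i + 1] == 's'
                && next < args_.size()) {
                out += args_[next++];
                ++i;  // skip the 's'
            } else {
                out += fmt_[i];
            }
        }
        return out;
    }

private:
    std::string fmt_;
    std::vector<std::string> args_;
};
```

With this toy, (mini_format("%s at %s with %s") % "x" % 2 % 3.5).str() reads much like the line above: a formatting string followed by its arguments.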

Another fear about operators is precedence problems. What if I someday write format("%s") % x+y
instead of format("%s") % (x+y)?
It will cause a compile-time error, so the mistake will be detected immediately.
Indeed, this line calls tmp = operator%( format("%s"), x)
and then operator+(tmp, y).
tmp will be a format object, for which no implicit conversion is defined, and thus the call to operator+ will fail (unless you define such an operator, of course). So you can safely assume precedence mistakes will be noticed at compilation.
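
The precedence argument can be checked at compile time with a small sketch. The names toy_format and can_add are hypothetical and stand in for a real format class; the only assumption is that, like the format object described above, toy_format has no operator+ and no implicit conversion.

```cpp
#include <sstream>
#include <string>
#include <type_traits>
#include <utility>

// Hypothetical stand-in for a format object (not Boost.Format itself).
struct toy_format {
    std::string collected;

    template <class T>
    toy_format& operator%(const T& x) {
        std::ostringstream os;
        os << x;
        collected += os.str();
        return *this;
    }
};

// Detection helper: value is true when the expression "L + R" compiles.
template <class L, class R, class = void>
struct can_add : std::false_type {};

template <class L, class R>
struct can_add<L, R, decltype(void(std::declval<L>() + std::declval<R>()))>
    : std::true_type {};

// (fmt % x) has type toy_format&. toy_format has neither an operator+ nor
// an implicit conversion, so "fmt % x + y" is rejected by the compiler.
static_assert(!can_add<toy_format&, int>::value,
              "fmt % x + y must not compile");

// Ordinary arithmetic is unaffected, so the intended fmt % (x + y) is fine.
static_assert(can_add<int, int>::value, "x + y is a plain int addition");
```

If someone later defines an operator+ taking a toy_format, the first static_assert fires, which mirrors the "unless you define such an operator" caveat.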


On the other hand, the function approach has a true inconvenience. It requires defining lots of template functions like:

template <class T1, class T2,  .., class TN> 
string format(string s,  const T1& x1, .... , const TN& xN);

and even if we define those for N up to 500, that is still a limitation that C's printf does not have.
Also, since format emulates printf in some cases but is far from being fully equivalent to it, it is best to give it a radically different appearance, and operator calls succeed very well at that!
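
As an illustration of the fixed-arity limitation, here is a sketch of what the function approach looks like when written out by hand. The names format_fn and fill_first are hypothetical, and the substitution is deliberately naive (only "%s", and an argument whose text itself contains "%s" would confuse it).

```cpp
#include <cstddef>
#include <sstream>
#include <string>

// Hypothetical helper: replace the first "%s" in s with the text of x.
template <class T>
std::string fill_first(const std::string& s, const T& x) {
    std::ostringstream os;
    os << x;
    const std::size_t pos = s.find("%s");
    if (pos == std::string::npos) return s;  // no directive left
    return s.substr(0, pos) + os.str() + s.substr(pos + 2);
}

// One overload per argument count: every extra arity is written out by hand.
template <class T1>
std::string format_fn(const std::string& s, const T1& x1) {
    return fill_first(s, x1);
}

template <class T1, class T2>
std::string format_fn(const std::string& s, const T1& x1, const T2& x2) {
    return fill_first(fill_first(s, x1), x2);
}

template <class T1, class T2, class T3>
std::string format_fn(const std::string& s, const T1& x1, const T2& x2,
                      const T3& x3) {
    return fill_first(format_fn(s, x1, x2), x3);
}
// A call with four arguments does not compile until a fourth overload is added.
```

Whatever the chosen maximum N, the next argument count is always missing, whereas a single operator% template covers every count.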


Anyhow, if we actually chose the function-call template system, it would only be able to print classes T for which there is an

operator<< ( stream,   const T&)
because allowing both const and non-const references produces a combinatorial explosion: if we go up to 10 arguments, we need 2^10 functions.
(Providing overloads on T& / const T& is at the frontier of defects in the C++ standard, and thus is far from guaranteed to be supported. But right now several compilers support those overloads.)
There is a good chance that a class which only provides the non-const equivalent is badly designed, but it is still another unjustified restriction on the user.
Also, some manipulators are functions, and cannot be passed as const references. The function call approach thus does not support manipulators well.

In conclusion, using a dedicated binary operator is the simplest, most robust, and least restrictive mechanism to pass arguments when you can't know the number of arguments at compile-time.


Why operator% rather than a member function 'with(..)'?

Technically,
format(fstr) % x1 % x2 % x3;
has the same structure as
format(fstr).with( x1 ).with( x2 ).with( x3 );
which does not have any precedence problem. The only drawback is that it is harder for the eye to catch what is done in this line than when we are using operators: a call to .with(..) looks just like any other line of code. So it may be a better solution, depending on taste. The extra characters and the overall cluttered aspect of the line of code using 'with(..)' were enough for me to opt for a true operator.
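
The structural equivalence of the two spellings can be sketched with a toy class in which operator% simply forwards to with(). The class joiner is hypothetical and only joins its arguments with spaces; it is not how Boost.Format is implemented.

```cpp
#include <sstream>
#include <string>

// Hypothetical toy class (not Boost.Format): it joins its arguments with
// single spaces, just to show that both spellings drive the same code.
class joiner {
public:
    joiner() : first_(true) {}

    // Member-function spelling: append one argument, return *this to chain.
    template <class T>
    joiner& with(const T& x) {
        std::ostringstream os;
        os << x;
        if (!first_) text_ += ' ';
        text_ += os.str();
        first_ = false;
        return *this;
    }

    // Operator spelling: forwards to with(), so the behaviour is identical.
    template <class T>
    joiner& operator%(const T& x) { return with(x); }

    const std::string& str() const { return text_; }

private:
    std::string text_;
    bool first_;
};
```

Here (joiner() % 1 % "a" % 2.5).str() and joiner().with(1).with("a").with(2.5).str() produce the same string, so the choice between them is purely one of readability.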

Why operator% rather than usual formatting operator<<?


Why operator% rather than operator(), or operator[]?

operator() has the merit of being the natural way to send an argument into a function. And some think that the meaning of operator[] applies well to its usage in format.
They are technically as good as operator%, but quite ugly (that's a matter of taste).
And deep down, using operator% for passing arguments that were referred to by "%" in the format string seems much more natural to me than using those operators.


July 07, 2001

© Copyright Samuel Krempp 2001. Permission to copy, use, modify, sell and distribute this document is granted provided this copyright notice appears in all copies. This document is provided "as is" without express or implied warranty, and with no claim as to its suitability for any purpose.