Numerics

Like chlit and strlit, the numeric parsers are primitives. They are given a section of their own because they are such an important building block. The framework includes a few predefined objects for parsing signed and unsigned integers and real numbers. These parsers are fully parametric: most aspects of numeric parsing can be adjusted, including the radix, the minimum and maximum number of allowable digits, the exponent and the fraction. Policies control the real number parsers' behavior. Predefined policies cover the most common real number formats, and users can supply their own when needed.

uint_parser

This class is the simplest member of the numerics package. The uint_parser can parse unsigned integers of arbitrary length and size, from ordinary primitive C/C++ integers to user-defined scalars such as bigints (unlimited precision integers). Like most classes in Spirit, the uint_parser is a template class whose template parameters fine-tune its behavior. The uint_parser is flexible enough that the other numeric parsers are implemented using it as the backbone.

    template <
        typename T = unsigned,
        int Radix = 10,
        unsigned MinDigits = 1,
        int MaxDigits = -1>
    struct uint_parser { /*...*/ };

uint_parser template parameters
T	The numeric base type of the numeric parser. Defaults to `unsigned`
Radix	The radix base: 2 (binary), 8 (octal), 10 (decimal) or 16 (hexadecimal). Defaults to 10 (decimal)
MinDigits	The minimum number of digits allowable. Defaults to 1
MaxDigits	The maximum number of digits allowable. If this is -1 (the default), the number of digits is unbounded
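To make the roles of Radix, MinDigits and MaxDigits concrete, here is a minimal standalone sketch. It does not use Spirit; parse_uint and digit_value are hypothetical helpers that only model the parameters described in the table, and the sketch assumes that parsing simply stops once MaxDigits digits have been consumed.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper (not part of Spirit): value of c as a digit in the
// given radix, or -1 if c is not a valid digit for that radix.
int digit_value(char c, int radix)
{
    int v = -1;
    if (c >= '0' && c <= '9')      v = c - '0';
    else if (c >= 'a' && c <= 'f') v = c - 'a' + 10;
    else if (c >= 'A' && c <= 'F') v = c - 'A' + 10;
    return (v >= 0 && v < radix) ? v : -1;
}

// Consume between min_digits and max_digits digits starting at pos
// (max_digits == -1 means unbounded). On success, store the value in out,
// advance pos past the digits and return true. On failure, pos is unchanged.
bool parse_uint(const std::string& s, std::size_t& pos, unsigned& out,
                int radix = 10, unsigned min_digits = 1, int max_digits = -1)
{
    std::size_t i = pos;
    unsigned value = 0, count = 0;
    while (i < s.size() && (max_digits < 0 || count < unsigned(max_digits)))
    {
        int d = digit_value(s[i], radix);
        if (d < 0) break;
        value = value * unsigned(radix) + unsigned(d);  // overflow not checked
        ++i;
        ++count;
    }
    if (count < min_digits) return false;
    out = value;
    pos = i;
    return true;
}
```

With radix 16 the input "ff" yields 255, and with MinDigits = MaxDigits = 3 the input "12" is rejected because it is one digit short.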

Predefined uint_parsers
bin_p	`uint_parser<unsigned, 2, 1, -1> const`
oct_p	`uint_parser<unsigned, 8, 1, -1> const`
uint_p	`uint_parser<unsigned, 10, 1, -1> const`
hex_p	`uint_parser<unsigned, 16, 1, -1> const`

The following example shows how the uint_parser can be used to parse thousand separated numbers. The example can correctly parse numbers such as 1,234,567,890.

    uint_parser<unsigned, 10, 1, 3> uint3_p;        //  1..3 digits
    uint_parser<unsigned, 10, 3, 3> uint3_3_p;      //  exactly 3 digits
    ts_num_p = (uint3_p >> *(',' >> uint3_3_p));    //  our thousand separated number parser
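The shape of that expression (one group of 1 to 3 digits, then any number of comma-plus-exactly-3-digits groups) can be checked by hand with a small standalone recognizer. This is only a sketch of the grammar, not Spirit code; match_ts_num and count_digits are hypothetical names.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Count consecutive decimal digits at s[pos...], stopping after limit digits.
std::size_t count_digits(const std::string& s, std::size_t pos, std::size_t limit)
{
    std::size_t n = 0;
    while (n < limit && pos + n < s.size()
           && std::isdigit(static_cast<unsigned char>(s[pos + n])))
        ++n;
    return n;
}

// Hypothetical recognizer for: uint3_p >> *(',' >> uint3_3_p)
// Returns the length of the longest matching prefix of s, or 0 if none.
std::size_t match_ts_num(const std::string& s)
{
    std::size_t pos = count_digits(s, 0, 3);   // 1..3 leading digits
    if (pos == 0) return 0;
    // Every further group is a comma followed by exactly 3 digits.
    while (pos < s.size() && s[pos] == ','
           && count_digits(s, pos + 1, 3) == 3)
        pos += 4;
    return pos;
}
```

The whole of "1,234,567,890" matches, while "123,45" matches only the prefix "123" because the second group is too short.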

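bin_p, oct_p, uint_p and hex_p are parser generator objects designed to be used within expressions, as are the other predefined numeric parsers. Here's an example of a rule that parses a comma-delimited list of numbers (we've seen this before; it uses real_p, which is described later in this section):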

    list_of_numbers = real_p >> *(',' >> real_p);

Later, we shall see how we can extract the actual numbers parsed by the numeric parsers. We shall deal with this when we get to the section on specialized actions.

int_parser

The int_parser can parse signed integers of arbitrary length and size. This is almost the same as the uint_parser. The only difference is the additional task of parsing the '+' or '-' sign preceding the number. The class interface is the same as that of the uint_parser.

A predefined int_parser
int_p	`int_parser<int, 10, 1, -1> const`
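The "sign, then unsigned digits" structure can be illustrated with a standalone sketch. This is not Spirit's implementation; parse_int is a hypothetical function that handles decimal input only and does not check for overflow.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical model of a signed decimal integer parser: an optional
// '+' or '-' followed by one or more digits. On success, stores the value
// in out, advances pos and returns true; on failure pos is unchanged.
bool parse_int(const std::string& s, std::size_t& pos, int& out)
{
    std::size_t i = pos;
    bool negative = false;
    if (i < s.size() && (s[i] == '+' || s[i] == '-'))
    {
        negative = (s[i] == '-');
        ++i;
    }
    std::size_t first_digit = i;
    int value = 0;
    while (i < s.size() && std::isdigit(static_cast<unsigned char>(s[i])))
    {
        value = value * 10 + (s[i] - '0');   // overflow is not checked here
        ++i;
    }
    if (i == first_digit) return false;      // a sign alone is not a number
    out = negative ? -value : value;
    pos = i;
    return true;
}
```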

real_parser

The real_parser can parse real numbers of arbitrary length and size limited by its parametric type T. The real_parser is a template class with 2 template parameters. Here's the real_parser template interface:

    template<
        typename T = double,
        typename RealPoliciesT = ureal_parser_policies<T> >
    struct real_parser;

The first template parameter is its numeric base type T. This defaults to double.

Parsing special numeric types

Notice that the numeric base type T can be specified by the user. This means that we can use the numeric parsers to parse user defined numeric types such as fixed_point (fixed point reals) and bigint (unlimited precision integers).

The second template parameter is a class that groups all the policies and defaults to ureal_parser_policies<T>. Policies control the real number parsers' behavior. The default policies provided are designed to parse C/C++ style real numbers of the form nnn.fffEeee where nnn is the whole number part, fff is the fractional part, E is 'e' or 'E' and eee is the exponent optionally preceded by '-' or '+'. This corresponds to the following grammar, with the exception that plain integers without the decimal point are also accepted by default.

    floatingliteral
        =   fractionalconstant >> !exponentpart
        |  +digit_p >> exponentpart
        ;

    fractionalconstant
        =  *digit_p >> '.' >> +digit_p
        |  +digit_p >> '.'
        ;

    exponentpart
        =   ('e' | 'E') >> !('+' | '-') >> +digit_p
        ;
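To see which inputs this grammar accepts, the three rules can be transcribed into a small standalone recognizer. This is a sketch that follows the grammar exactly as written (so a plain integer such as "42" is rejected here, unlike with the default policies); is_floating_literal, digit_run and exponent_part are hypothetical names and no Spirit code is involved.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Number of consecutive decimal digits starting at s[pos].
std::size_t digit_run(const std::string& s, std::size_t pos)
{
    std::size_t n = 0;
    while (pos + n < s.size()
           && std::isdigit(static_cast<unsigned char>(s[pos + n])))
        ++n;
    return n;
}

// Length of an exponentpart at s[pos], or 0 if there is none:
// ('e' | 'E') >> !('+' | '-') >> +digit_p
std::size_t exponent_part(const std::string& s, std::size_t pos)
{
    std::size_t i = pos;
    if (i >= s.size() || (s[i] != 'e' && s[i] != 'E')) return 0;
    ++i;
    if (i < s.size() && (s[i] == '+' || s[i] == '-')) ++i;
    std::size_t digits = digit_run(s, i);
    return digits ? (i - pos) + digits : 0;
}

// True if the whole string is a floatingliteral.
bool is_floating_literal(const std::string& s)
{
    std::size_t whole = digit_run(s, 0);
    std::size_t pos = whole;
    if (pos < s.size() && s[pos] == '.')
    {
        // fractionalconstant >> !exponentpart
        ++pos;
        std::size_t frac = digit_run(s, pos);
        if (whole == 0 && frac == 0) return false;   // a lone "."
        pos += frac;
        pos += exponent_part(s, pos);                // optional
    }
    else
    {
        // +digit_p >> exponentpart (the exponent is mandatory here)
        if (whole == 0) return false;
        std::size_t e = exponent_part(s, pos);
        if (e == 0) return false;
        pos += e;
    }
    return pos == s.size();
}
```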

The default policies are provided to take care of the most common case (there are many ways to represent, and hence parse, real numbers). In most cases, the default setting of the real_parser is sufficient and can be used straight out of the box. Actually, there are four real_parsers pre-defined for immediate use:

Predefined real_parsers
ureal_p	`real_parser<double, ureal_parser_policies<double> > const`
real_p	`real_parser<double, real_parser_policies<double> > const`
strict_ureal_p	`real_parser<double, strict_ureal_parser_policies<double> > const`
strict_real_p	`real_parser<double, strict_real_parser_policies<double> > const`

We've seen real_p before. ureal_p is its unsigned variant.

Strict Reals

Integer numbers are considered a subset of real numbers, so real_p and ureal_p recognize integer numbers (without a dot) as real numbers. strict_real_p and strict_ureal_p are the equivalent parsers that require a dot to be present for a number to be considered a successful match.

Advanced: real_parser policies

The parser policies break down real number parsing into 6 steps:

1	parse_sign	Parse the prefix sign
2	parse_n	Parse the integer at the left of the decimal point
3	parse_dot	Parse the decimal point
4	parse_frac_n	Parse the fraction after the decimal point
5	parse_exp	Parse the exponent prefix (e.g. 'e')
6	parse_exp_n	Parse the actual exponent

And the interaction of these sub-parsing tasks is further controlled by these 3 policies:

1	allow_leading_dot	Allow a leading dot to be present (".1" becomes equivalent to "0.1")
2	allow_trailing_dot	Allow a trailing dot to be present ("1." becomes equivalent to "1.0")
3	expect_dot	Require a dot to be present (disallows "1" to be equivalent to "1.0")
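The effect of the three flags can be sketched with a standalone recognizer that walks through steps 2 to 4 (integer part, dot, fraction). Sign and exponent handling are left out to keep it short. The real_flags struct and the accepts function are hypothetical and only model the flag descriptions above; they are not Spirit's implementation.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical run-time stand-in for the three compile-time policy flags.
struct real_flags
{
    bool allow_leading_dot;
    bool allow_trailing_dot;
    bool expect_dot;
};

// Number of consecutive decimal digits starting at s[pos].
std::size_t digit_run(const std::string& s, std::size_t pos)
{
    std::size_t n = 0;
    while (pos + n < s.size()
           && std::isdigit(static_cast<unsigned char>(s[pos + n])))
        ++n;
    return n;
}

// True if the whole string is accepted under the given flags.
bool accepts(const std::string& s, const real_flags& f)
{
    std::size_t whole = digit_run(s, 0);             // step 2: parse_n
    std::size_t pos = whole;
    if (pos < s.size() && s[pos] == '.')             // step 3: parse_dot
    {
        ++pos;
        std::size_t frac = digit_run(s, pos);        // step 4: parse_frac_n
        pos += frac;
        if (whole == 0 && frac == 0) return false;   // a lone "."
        if (whole == 0 && !f.allow_leading_dot)  return false;
        if (frac == 0  && !f.allow_trailing_dot) return false;
    }
    else if (whole == 0 || f.expect_dot)
    {
        return false;                                // no number, or dot required
    }
    return pos == s.size();
}
```

With the default flag values (true, true, false) the inputs ".1", "1." and "1" are all accepted; setting expect_dot to true rejects "1" but still accepts "1.0".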

[ From here on, required reading: The Scanner, In-depth The Parser and In-depth The Scanner ]

sign_parser and sign_p

Before we move on, a small utility parser is included here to ease the parsing of the '-' or '+' sign. While it is easy to write one:

    sign_p = (ch_p('+') | '-');

it is not possible to extract the actual sign (positive or negative) without resorting to semantic actions. The sign_p parser has a bool attribute, returned to the caller through the match object, which is set to true after parsing if the parsed sign is negative. Examples:

    bool is_negative;
    r = sign_p[assign_a(is_negative)];

or simply...

    // directly extract the result from the match result's value
    bool is_negative = sign_p.parse(scan).value();

The sign_p parser expects attached semantic actions to have a signature (see Specialized Actions for further detail) compatible with:

Signature for functions:

    void func(bool is_negative);

Signature for functors:

    struct ftor
    {
        void operator()(bool is_negative) const;
    };
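The contract described here (a bool handed to the action, true for a negative sign) can be modelled without Spirit. In the sketch below, parse_sign and record_sign are hypothetical; the functor has the same call signature as the one shown above.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical model of a sign parser: if s[pos] is '+' or '-', call
// action(is_negative), advance pos and return true; otherwise return false
// without calling the action.
template <typename Action>
bool parse_sign(const std::string& s, std::size_t& pos, Action action)
{
    if (pos < s.size() && (s[pos] == '+' || s[pos] == '-'))
    {
        action(s[pos] == '-');
        ++pos;
        return true;
    }
    return false;
}

// A functor with the signature void operator()(bool is_negative) const.
struct record_sign
{
    bool* target;
    void operator()(bool is_negative) const { *target = is_negative; }
};
```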

ureal_parser_policies

    template <typename T>
    struct ureal_parser_policies
    {
        typedef uint_parser<T, 10, 1, -1>   uint_parser_t;
        typedef int_parser<T, 10, 1, -1>    int_parser_t;

        static const bool allow_leading_dot  = true;
        static const bool allow_trailing_dot = true;
        static const bool expect_dot         = false;

        template <typename ScannerT>
        static typename match_result<ScannerT, nil_t>::type
        parse_sign(ScannerT& scan)
        { return scan.no_match(); }

        template <typename ScannerT>
        static typename parser_result<uint_parser_t, ScannerT>::type
        parse_n(ScannerT& scan)
        { return uint_parser_t().parse(scan); }

        template <typename ScannerT>
        static typename parser_result<chlit<>, ScannerT>::type
        parse_dot(ScannerT& scan)
        { return ch_p('.').parse(scan); }

        template <typename ScannerT>
        static typename parser_result<uint_parser_t, ScannerT>::type
        parse_frac_n(ScannerT& scan)
        { return uint_parser_t().parse(scan); }

        template <typename ScannerT>
        static typename parser_result<chlit<>, ScannerT>::type
        parse_exp(ScannerT& scan)
        { return as_lower_d['e'].parse(scan); }

        template <typename ScannerT>
        static typename parser_result<int_parser_t, ScannerT>::type
        parse_exp_n(ScannerT& scan)
        { return int_parser_t().parse(scan); }
    };

The default ureal_parser_policies uses the lower level integer numeric parsers to do its job.

real_parser_policies

    template <typename T>
    struct real_parser_policies : public ureal_parser_policies<T>
    {
        template <typename ScannerT>
        static typename parser_result<sign_parser, ScannerT>::type
        parse_sign(ScannerT& scan)
        { return sign_p.parse(scan); }
    };

Notice how the real_parser_policies replaced parse_sign of the ureal_parser_policies from which it is subclassed. The default real_parser_policies simply uses a sign_p instead of scan.no_match() in the parse_sign step.

strict_ureal_parser_policies and strict_real_parser_policies

    template <typename T>
    struct strict_ureal_parser_policies : public ureal_parser_policies<T>
    {
        static const bool expect_dot = true;
    };

    template <typename T>
    struct strict_real_parser_policies : public real_parser_policies<T>
    {
        static const bool expect_dot = true;
    };

Again, these policies replace only the members that need to differ from those of their base classes.
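The mechanism at work is ordinary C++ name hiding: a static member declared in a derived policy class hides the base member of the same name, and a template that reads everything through the policy type sees the most derived version. A minimal sketch with hypothetical stand-in classes (not the Spirit policies themselves):

```cpp
#include <cassert>

// Hypothetical stand-ins for the policy classes; only the shape matters.
struct base_policies
{
    static const bool allow_leading_dot = true;
    static const bool expect_dot        = false;
    static bool parse_sign(char) { return false; }   // never matches a sign
};

struct signed_policies : base_policies
{
    // Hides base_policies::parse_sign; everything else is inherited.
    static bool parse_sign(char c) { return c == '+' || c == '-'; }
};

struct strict_signed_policies : signed_policies
{
    static const bool expect_dot = true;             // hides the base flag
};

// A client template reads everything through the policy type, so it picks
// up whichever member is visible in the most derived class.
template <typename Policies>
struct policy_view
{
    static bool requires_dot()        { return Policies::expect_dot; }
    static bool leading_dot_allowed() { return Policies::allow_leading_dot; }
    static bool accepts_sign(char c)  { return Policies::parse_sign(c); }
};
```

No virtual functions are involved; the selection happens entirely at compile time through the template argument.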

Specialized real parser policies can reuse some of the defaults while replacing a few. For example, the following is a real number parser policy that parses thousands separated numbers with at most two decimal places and no exponent.

The full source code can be viewed here.

    template <typename T>
    struct ts_real_parser_policies : public ureal_parser_policies<T>
    {
        //  These policies can be used to parse thousand separated
        //  numbers with at most 2 decimal digits after the decimal
        //  point. e.g. 123,456,789.01

        typedef uint_parser<int, 10, 1, 2>  uint2_t;
        typedef uint_parser<T, 10, 1, -1>   uint_parser_t;
        typedef int_parser<int, 10, 1, -1>  int_parser_t;

        //////////////////////////////////  2 decimal places Max
        template <typename ScannerT>
        static typename parser_result<uint2_t, ScannerT>::type
        parse_frac_n(ScannerT& scan)
        { return uint2_t().parse(scan); }

        //////////////////////////////////  No exponent
        template <typename ScannerT>
        static typename parser_result<chlit<>, ScannerT>::type
        parse_exp(ScannerT& scan)
        { return scan.no_match(); }

        //////////////////////////////////  No exponent
        template <typename ScannerT>
        static typename parser_result<int_parser_t, ScannerT>::type
        parse_exp_n(ScannerT& scan)
        { return scan.no_match(); }

        //////////////////////////////////  Thousands separated numbers
        template <typename ScannerT>
        static typename parser_result<uint_parser_t, ScannerT>::type
        parse_n(ScannerT& scan)
        {
            typedef typename parser_result<uint_parser_t, ScannerT>::type RT;
            static uint_parser<unsigned, 10, 1, 3> uint3_p;
            static uint_parser<unsigned, 10, 3, 3> uint3_3_p;

            if (RT hit = uint3_p.parse(scan))
            {
                T n;
                typedef typename ScannerT::iterator_t iterator_t;
                iterator_t save = scan.first;
                while (match<> next = (',' >> uint3_3_p[assign_a(n)]).parse(scan))
                {
                    hit.value() *= 1000;
                    hit.value() += n;
                    scan.concat_match(hit, next);
                    save = scan.first;
                }
                scan.first = save;
                return hit;

                // Note: On erroneous input such as "123,45", the result should
                // be a partial match "123". 'save' is used to make sure that
                // the scanner position is placed at the last *valid* parse
                // position.
            }
            return scan.no_match();
        }
    };
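The value accumulation and the role of save in parse_n can be followed in isolation with a standalone sketch. It mirrors the logic above (multiply by 1000, add the next group, commit the position only after a complete group) but uses plain strings and indices instead of a scanner; parse_ts_uint and read_group are hypothetical names.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Read exactly count decimal digits at s[pos...] into group.
// Returns false, leaving group untouched, if fewer digits are available.
bool read_group(const std::string& s, std::size_t pos, std::size_t count,
                unsigned long& group)
{
    if (pos + count > s.size()) return false;
    unsigned long v = 0;
    for (std::size_t i = 0; i < count; ++i)
    {
        char c = s[pos + i];
        if (!std::isdigit(static_cast<unsigned char>(c))) return false;
        v = v * 10 + static_cast<unsigned long>(c - '0');
    }
    group = v;
    return true;
}

// Parse a thousand separated unsigned number starting at pos. On a match
// (possibly partial), store the value, move pos to just past the last
// complete group and return true.
bool parse_ts_uint(const std::string& s, std::size_t& pos, unsigned long& value)
{
    std::size_t i = pos;
    unsigned long v = 0;
    std::size_t n = 0;
    while (n < 3 && i < s.size()
           && std::isdigit(static_cast<unsigned char>(s[i])))
    {
        v = v * 10 + static_cast<unsigned long>(s[i] - '0');
        ++i;
        ++n;
    }
    if (n == 0) return false;              // no leading group of 1..3 digits

    std::size_t save = i;                  // last position known to be valid
    unsigned long group = 0;
    while (save < s.size() && s[save] == ','
           && read_group(s, save + 1, 3, group))
    {
        v = v * 1000 + group;              // overflow is not checked here
        save += 4;                         // commit the comma and 3 digits
    }
    value = v;
    pos = save;
    return true;
}
```

On "123,456,789.01" the value is 123456789 and the position stops at the dot; on "123,45" the result is the partial match 123 with the position left before the comma.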

Copyright © 1998-2002 Joel de Guzman

Use, modification and distribution is subject to the Boost Software License, Version 1.0. (See accompanying file LICENSE_1_0.txt or copy at http://www.boost.org/LICENSE_1_0.txt)