Distinct Parser
The distinct parsers are utility parsers which ensure that matched input is not immediately followed by a forbidden pattern. Their typical usage is to distinguish keywords from identifiers.
The basic usage of the distinct_parser is as a replacement for the str_p parser. For example, the following declaration_rule:
rule<ScannerT> declaration_rule = str_p("declare") >> lexeme_d[+alpha_p];
would correctly match the input "declare abc", but it would also match the input "declareabc", which is usually not intended. To avoid this, we can use distinct_parser:
// keyword_p may be defined in the global scope
distinct_parser<> keyword_p("a-zA-Z0-9_");
rule<ScannerT> declaration_rule = keyword_p("declare") >> lexeme_d[+alpha_p];
keyword_p works in the same way as the str_p parser, but it matches only when the matched input is not immediately followed by one of the characters from the set passed to the constructor of keyword_p. In the example, "declare" can't be immediately followed by a letter, a digit, or an underscore.
See the full example here.
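The tail check described above can be modelled without Spirit. The following is only a sketch of the matching semantics, not the library's implementation: is_tail_char and match_keyword are hypothetical names introduced here, and the tail set is hard-coded to the "a-zA-Z0-9_" set from the example.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical helper (not part of Spirit): true when c belongs to the
// tail set "a-zA-Z0-9_" used in the example.
inline bool is_tail_char(char c) {
    return std::isalnum(static_cast<unsigned char>(c)) != 0 || c == '_';
}

// Hypothetical model of keyword_p(keyword) applied at position pos.
// Returns the number of characters consumed, or std::string::npos
// when the keyword does not match there.
inline std::size_t match_keyword(const std::string& input,
                                 std::size_t pos,
                                 const std::string& keyword) {
    assert(!keyword.empty());
    if (pos > input.size()) return std::string::npos;

    // Step 1: the keyword itself must be present (what str_p would check).
    if (input.compare(pos, keyword.size(), keyword) != 0)
        return std::string::npos;

    // Step 2: the next character, if any, must not be in the tail set.
    const std::size_t end = pos + keyword.size();
    if (end < input.size() && is_tail_char(input[end]))
        return std::string::npos;

    return keyword.size();
}
```

Under this model, match_keyword("declare abc", 0, "declare") consumes 7 characters, while the same call on "declareabc" fails because the keyword is followed by a letter.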
For more sophisticated cases, for example when keywords are stored in a symbol table, we can use distinct_directive.
distinct_directive<> keyword_d("a-zA-Z0-9_");
symbols<> keywords;
keywords = "declare", "begin", "end";
rule<ScannerT> keyword = keyword_d[keywords];
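As a rough illustration of what the directive does with a keyword table, the sketch below first picks a keyword from a list and then applies the same tail check. It is not Spirit code: match_any_keyword is a hypothetical name, a std::vector stands in for the symbol table, and choosing the longest matching keyword is an assumption of this sketch.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for keyword_d[keywords]; not Spirit's implementation.
// Returns the keyword matched at the start of input, or an empty string.
inline std::string match_any_keyword(const std::string& input,
                                     const std::vector<std::string>& keywords) {
    std::string best;
    for (const std::string& kw : keywords) {
        assert(!kw.empty());
        // Keep the longest keyword that is a prefix of the input
        // (longest-match lookup is an assumption of this sketch).
        if (kw.size() > best.size() && input.compare(0, kw.size(), kw) == 0)
            best = kw;
    }
    if (best.empty()) return best;

    // Tail check: the character after the keyword, if any, must not be
    // in the set "a-zA-Z0-9_".
    if (best.size() < input.size()) {
        const unsigned char next = static_cast<unsigned char>(input[best.size()]);
        if (std::isalnum(next) || next == '_') return std::string();
    }
    return best;
}
```

With the keywords "declare", "begin" and "end", this sketch accepts "begin x" but rejects "beginning", because the matched keyword is followed by a letter.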
In some cases a set of forbidden follow-up characters is not sufficient. For example, ASN.1 naming conventions allow identifiers to contain dashes, but not double dashes (which mark the beginning of a comment). Furthermore, identifiers can't end with a dash. So a matched keyword can't be followed by any alphanumeric character or by exactly one dash, but it can be followed by two dashes.
This is when dynamic_distinct_parser and the dynamic_distinct_directive come into play. The constructor of the dynamic_distinct_parser accepts a parser which matches any input that must NOT follow the keyword.
// Alphanumeric characters and a dash followed by a non-dash
// may not follow an ASN.1 identifier.
dynamic_distinct_parser<> keyword_p(alnum_p | ('-' >> ~ch_p('-')));
rule<ScannerT> declaration_rule = keyword_p("declare") >> lexeme_d[+alpha_p];
Since the dynamic_distinct_parser internally uses a rule, its type is dependent on the scanner type. So, the keyword_p shouldn't be defined globally, but rather within the grammar.
See the full example here.
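The ASN.1 rule above can be modelled in plain C++ by replacing the character set with a predicate that inspects the input after the keyword. This is a sketch of the semantics only, not of the library's internals: tail_check, asn1_forbidden_tail and match_dynamic_keyword are hypothetical names, and std::function merely stands in for "some parser supplied at construction time".

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <functional>
#include <string>

// A tail check receives the input and the position right after the keyword
// and reports whether a forbidden tail starts there.
using tail_check = std::function<bool(const std::string&, std::size_t)>;

// Models alnum_p | ('-' >> ~ch_p('-')): an alphanumeric character, or a
// dash that is followed by a character other than a dash.
inline bool asn1_forbidden_tail(const std::string& s, std::size_t pos) {
    if (pos >= s.size()) return false;  // nothing follows the keyword
    const unsigned char c = static_cast<unsigned char>(s[pos]);
    if (std::isalnum(c)) return true;
    return c == '-' && pos + 1 < s.size() && s[pos + 1] != '-';
}

// Hypothetical model of a dynamic keyword parser applied at the start of
// input: the keyword must be present and no forbidden tail may follow it.
inline bool match_dynamic_keyword(const std::string& input,
                                  const std::string& keyword,
                                  const tail_check& forbidden) {
    assert(!keyword.empty());
    if (input.compare(0, keyword.size(), keyword) != 0) return false;
    return !forbidden(input, keyword.size());
}
```

In this model "declare--comment" is accepted, since two dashes start a comment, while "declare-abc" and "declare1" are rejected.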
When keyword_p_1 and keyword_p_2 are defined as
distinct_parser<> keyword_p_1(forbidden_chars);
dynamic_distinct_parser<> keyword_p_2(forbidden_tail_parser);
the parsers
keyword_p_1(str)
keyword_p_2(str)
are equivalent to the rules
lexeme_d[chseq_p(str) >> ~epsilon_p(chset_p(forbidden_chars))]
lexeme_d[chseq_p(str) >> ~epsilon_p(forbidden_tail_parser)]
Copyright © 2003-2004 Vaclav Vesely
Use, modification and distribution is subject to the Boost Software License, Version 1.0. (See accompanying file LICENSE_1_0.txt or copy at http://www.boost.org/LICENSE_1_0.txt)