Class: PHP_LexerGenerator - X-Ref
The main class of the lexer generator. A lexer scans text and
groups it into tokens for use by a parser.
Sample Usage:
<code>
require_once 'PHP/LexerGenerator.php';
$lex = new PHP_LexerGenerator('/path/to/lexerfile.plex');
</code>
A file named "/path/to/lexerfile.php" will be created.
The input file is a PHP file containing specially
formatted comments, like so:
<code>
/*!lex2php
*/
</code>
The first lex2php comment must contain several declarations and define
all regular expressions. Declarations (processor instructions) start with
a "%" symbol and must be:
- %counter
- %input
- %token
- %value
- %line
input and counter should name the class variables that hold the lexer input
and the index into that input. token and value should name the class
variables that store the token number and its textual value. Finally, line
should name the class variable that holds the current line number
of the scan.
For example:
<code>
/*!lex2php
%counter {$this->N}
%input {$this->data}
%token {$this->token}
%value {$this->value}
%line {$this->linenumber}
*/
</code>
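The declarations only name member variables; the class that contains the lex2php comments still has to declare them. A minimal sketch of such a class follows. The class name ExampleLexer and its constructor are hypothetical; only the property names are taken from the declarations in the example.

```php
<?php
// Hypothetical host class; only the property names mirror the declarations.
class ExampleLexer
{
    public $data;        // %input: the text being scanned
    public $N;           // %counter: current offset into $data
    public $token;       // %token: number of the most recently matched token
    public $value;       // %value: text of the most recently matched token
    public $linenumber;  // %line: line currently being scanned

    public function __construct($data)
    {
        $this->data = $data;
        $this->N = 0;           // start scanning at the first character
        $this->linenumber = 1;  // lines are counted from 1 in this sketch
    }
}

$lexer = new ExampleLexer("e4 e5\n1-0");
```

Starting the line counter at 1 is a choice made for this sketch, not something the declarations require.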
Each pattern consists of an identifier made up of upper- or lower-case letters,
an "=" sign, and a descriptive match pattern.
Descriptive match patterns may either be regular expressions (regexes) or
quoted literal strings. Here are some examples:
<pre>
pattern = "quoted literal"
ANOTHER = /[a-zA-Z_]+/
</pre>
Quoted strings must escape the \ and " characters as \\ and \".
Regex patterns must be in Perl-compatible regular expression format (preg).
Special characters (like \t \n or \x3H) can only be used in regexes; every
\ in a literal string is escaped.
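The difference between the two pattern kinds can be illustrated in plain PHP. This sketch assumes a quoted literal is matched character for character, which preg_quote() emulates here. It is not taken from the generator's source, and the variable names and input strings are made up for the demonstration.

```php
<?php
// A quoted literal written as "$\"" denotes the two characters $ and ".
$literal = '$"';

// preg_quote() escapes regex metacharacters so the text matches literally.
$literalRegex = '/' . preg_quote($literal, '/') . '/';

// In a regex pattern, escapes such as \t and \n keep their special meaning.
$whitespaceRegex = '/[ \t\n]+/';

$literalMatches    = preg_match($literalRegex, 'cost: $"5"');
$whitespaceMatches = preg_match($whitespaceRegex, "a\tb");
```

Without the escaping step, the $ in the literal would be read as an end-of-subject anchor rather than as a dollar sign.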
Any sub-patterns must be defined using (?:) instead of ():
<code>
/*!lex2php
%counter {$this->N}
%input {$this->data}
%token {$this->token}
%value {$this->value}
%line {$this->linenumber}
alpha = /[a-zA-Z]/
alphaplus = /[a-zA-Z]+/
number = /[0-9]/
numerals = /[0-9]+/
whitespace = /[ \t\n]+/
blah = "$\""
blahblah = /a\$/
GAMEEND = @(?:1\-0|0\-1|1/2\-1/2)@
PAWNMOVE = /P?[a-h](?:[2-7]|[18]\=(?:Q|R|B|N))|P?[a-h]x[a-h](?:[2-7]|[18]\=(?:Q|R|B|N))/
*/
</code>
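One way to see why capturing sub-patterns are a problem: if a lexer joins all patterns into a single alternation and identifies the token by the number of the capturing group that matched, an extra () inside a pattern shifts the numbering of every later pattern. The sketch below assumes that scheme. The combined regexes and the helper firstMatchedGroup() are illustrative, not the generator's actual output.

```php
<?php
// Each pattern is wrapped in one capturing group; group N identifies pattern N.
$good = '/\G(?:([a-zA-Z]+)|([18]=(?:Q|R|B|N))|([0-9]+))/';

// Same patterns, but the second one uses a capturing () sub-pattern.
$bad = '/\G(?:([a-zA-Z]+)|([18]=(Q|R|B|N))|([0-9]+))/';

// Return the number of the first non-empty capturing group, or null.
function firstMatchedGroup($regex, $input)
{
    if (!preg_match($regex, $input, $m)) {
        return null;
    }
    for ($i = 1; $i < count($m); $i++) {
        if ($m[$i] !== '') {
            return $i;
        }
    }
    return null;
}

$goodIndex = firstMatchedGroup($good, '42');
$badIndex  = firstMatchedGroup($bad, '42');
```

With the non-capturing version, the digits are reported as group 3, the third pattern. With the capturing version they are reported as group 4, which no longer corresponds to a pattern position.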
All regexes must be delimited. Any legal preg delimiter can be used (such as @ or / in
the example above).
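To illustrate the choice of delimiter, the GAMEEND pattern can be run through preg_match() with either one. With / as the delimiter, the slashes inside 1/2 have to be escaped, which is why @ is the more convenient choice there. The variable names and input strings are made up for this sketch.

```php
<?php
// GAMEEND as written in the example, delimited by @.
$withAt = '@(?:1\-0|0\-1|1/2\-1/2)@';

// The same pattern delimited by /: inner slashes must now be escaped.
$withSlash = '/(?:1\-0|0\-1|1\/2\-1\/2)/';

// Try both forms against a few sample inputs.
$results = array();
foreach (array('1-0', '1/2-1/2', '2-0') as $input) {
    $results[$input] = array(
        preg_match($withAt, $input),
        preg_match($withSlash, $input),
    );
}
```

Both forms accept and reject the same inputs; only the escaping inside the pattern differs.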