ECMAScript 4 Netscape Proposal
Formal Description
Stages
Tuesday, October 15, 2002
The source code is processed in the following stages:
- If necessary, convert the source code into the Unicode UTF-16 format, normalized
form C.
- Remove any Unicode format control characters (category Cf) from the source code.
- Simultaneously split the source code into input elements using the lexical grammar
and semantics and parse it using the syntactic grammar
to obtain a parse tree P.
- Evaluate P using the syntactic semantics by computing the action Eval
on it.
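The first two stages can be illustrated with a minimal Python sketch. It assumes the source is already available as a Python string (a sequence of code points rather than UTF-16 code units), so the UTF-16 conversion itself is not shown; the function name preprocess is hypothetical and not part of the proposal.

```python
import unicodedata


def preprocess(source: str) -> str:
    """Sketch of stages 1 and 2: normalize to form C, then drop Cf characters."""
    # Stage 1 (partial): normalization form C. The UTF-16 encoding step is
    # omitted because a Python str is not tied to a particular encoding.
    normalized = unicodedata.normalize("NFC", source)
    # Stage 2: remove every character whose general category is Cf
    # (format control), such as U+200D ZERO WIDTH JOINER.
    return "".join(ch for ch in normalized if unicodedata.category(ch) != "Cf")
```

For example, preprocess("e\u0301") composes the letter and the combining accent into the single character U+00E9, and preprocess("a\u200db") returns "ab".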
Lexing and Parsing
Processing stage 3 is done as follows:
- Let inputElements be an empty array of input elements (syntactic grammar terminals
and line breaks).
- Let input be the input sequence of Unicode characters. Append a special placeholder End
to the end of input.
- Let state be a variable that holds one of the constants re, div,
or num. Initialize it to re.
- Apply the lexical grammar to parse the longest possible prefix of input.
Use the start symbol NextInputElement^re, NextInputElement^div, or NextInputElement^num,
depending on whether state is re, div, or num,
respectively. The result of the parse should be a lexical grammar parse tree T. If the parse failed, return
a syntax error.
- Compute the action InputElement on T to obtain an InputElement
e.
- If e is the endOfInput input element, go to
step 15.
- Remove the characters matched by T from input, leaving only the yet-unlexed suffix of input.
- Interpret e as a syntactic grammar terminal or line break
as follows:
- A lineBreak is interpreted as a line break, which
is not a terminal itself but indicates one or more line breaks between two terminals. It prevents the syntactic
grammar from matching any productions that have a [no line break] annotation in the place where the
lineBreak occurred.
- An Identifier s is interpreted as the
terminal Identifier. Applying the semantic action Name
to the Identifier returns the String
value s.name.
- A Keyword s is interpreted as the reserved
word, future reserved word, or non-reserved word terminal corresponding
to the Keyword’s String
s.
- A Punctuator s is interpreted as the
punctuation token or future punctuation token terminal corresponding to
the Punctuator’s String
s.
- A NumberToken x is interpreted as
the terminal Number. Applying the semantic action Value
to the Number returns the GeneralNumber
value x.value.
- A negatedMinLong, which results from a numeric long
token with the value 2^63, is interpreted as the terminal NegatedMinLong.
- A StringToken s is interpreted as
the terminal String. Applying the semantic action Value
to the String returns the String
value s.value.
- A RegularExpression z is interpreted
as the terminal RegularExpression.
- Append the resulting terminal or line break
to the end of the inputElements array.
- If the inputElements array forms a valid prefix of the context-free language defined by the syntactic
grammar, go to step 13.
- If e is not a lineBreak
but the previous element of the inputElements array is a lineBreak,
then insert a VirtualSemicolon terminal between that lineBreak
and e in the inputElements array.
- If the inputElements array still does not form a valid prefix of the context-free language defined by the
syntactic grammar, signal a syntax error and stop.
- If e is a Number
or NegatedMinLong, then set state to num. Otherwise,
if the inputElements array followed by the terminal / forms a valid prefix
of the context-free language defined by the syntactic grammar, then set state
to div; otherwise, set state to re.
- Go to step 4.
- If the inputElements array does not form a valid sentence of the context-free language defined by the syntactic
grammar, signal a syntax error and stop.
- Return the parse tree obtained by the syntactic grammar’s derivation of the
sentence formed by the inputElements array.
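The control flow of the steps above can be sketched in Python as follows. This is an illustration under stated assumptions, not the proposal's algorithm in full: the lexical and syntactic grammars are replaced by hypothetical callbacks (lex, is_valid_prefix, is_valid_sentence), input elements are plain strings, and the function returns the inputElements list instead of a parse tree. The toy_lex and toy_valid helpers are invented for the example; they describe a toy language of identifiers separated by semicolons and ignore the lexer state.

```python
from typing import Callable, List, Tuple

LINE_BREAK, END_OF_INPUT = "lineBreak", "endOfInput"
VIRTUAL_SEMICOLON = "VirtualSemicolon"


def lex_and_parse(source: str,
                  lex: Callable[[str, str], Tuple[str, int]],
                  is_valid_prefix: Callable[[List[str]], bool],
                  is_valid_sentence: Callable[[List[str]], bool]) -> List[str]:
    """Drive a lexer and a prefix oracle in the order the steps describe."""
    input_elements: List[str] = []
    rest = source
    state = "re"                               # lexer goal: re, div, or num
    while True:
        e, consumed = lex(rest, state)         # lex raises SyntaxError on failure
        if e == END_OF_INPUT:
            break
        rest = rest[consumed:]                 # keep only the unlexed suffix
        input_elements.append(e)
        if not is_valid_prefix(input_elements):
            # Retry with a VirtualSemicolon between a preceding lineBreak and e.
            if e != LINE_BREAK and input_elements[-2:-1] == [LINE_BREAK]:
                input_elements.insert(len(input_elements) - 1, VIRTUAL_SEMICOLON)
            if not is_valid_prefix(input_elements):
                raise SyntaxError(f"unexpected {e!r}")
        # Choose the lexer goal for the next input element.
        if e in ("Number", "NegatedMinLong"):
            state = "num"
        elif is_valid_prefix(input_elements + ["/"]):
            state = "div"
        else:
            state = "re"
    if not is_valid_sentence(input_elements):
        raise SyntaxError("input is not a complete sentence")
    return input_elements


def toy_lex(text: str, state: str) -> Tuple[str, int]:
    """Hypothetical lexer: identifiers, ';', and newlines only; state is unused."""
    skipped = len(text) - len(text.lstrip(" "))
    body = text[skipped:]
    if not body:
        return END_OF_INPUT, skipped
    if body[0] in "\n;":
        return (LINE_BREAK if body[0] == "\n" else ";"), skipped + 1
    if body[0].isalpha():
        n = 1
        while n < len(body) and body[n].isalpha():
            n += 1
        return "Identifier", skipped + n
    raise SyntaxError(f"cannot lex {body[0]!r}")


def toy_valid(elements: List[str]) -> bool:
    """Hypothetical oracle: Identifier (sep Identifier)* with an optional final sep."""
    terminals = [t for t in elements if t != LINE_BREAK]  # line breaks are not terminals
    for i, t in enumerate(terminals):
        wanted = ("Identifier",) if i % 2 == 0 else (";", VIRTUAL_SEMICOLON)
        if t not in wanted:
            return False
    return True
```

With these toy helpers, lex_and_parse("a\nb;", toy_lex, toy_valid, toy_valid) returns ["Identifier", "lineBreak", "VirtualSemicolon", "Identifier", ";"], because the second identifier is only accepted once a VirtualSemicolon is inserted after the line break. The input "a b" raises SyntaxError, since no line break precedes the offending element. In this toy language every valid prefix is also a complete sentence, so the same oracle serves both roles; a real grammar would need two distinct checks.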