ECMAScript 4 Stages

ECMAScript 4 Netscape Proposal

Formal Description

Stages

Tuesday, October 15, 2002

The source code is processed in the following stages:

1. If necessary, convert the source code into the Unicode UTF-16 format, normalized form C.
2. Remove any Unicode format control characters (category Cf) from the source code.
3. Simultaneously split the source code into input elements using the lexical grammar and semantics, and parse it using the syntactic grammar to obtain a parse tree P.
4. Evaluate P using the syntactic semantics by computing the action Eval on it.
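Stages 1 and 2 can be illustrated with Python's standard unicodedata module. This is a minimal sketch under stated assumptions: the helper names preprocess and utf16_code_units are hypothetical. Python strings hold code points rather than UTF-16 code units, so the UTF-16 view is computed by a separate helper.

```python
import unicodedata


def preprocess(source: str) -> str:
    """Hypothetical helper sketching stages 1 and 2.

    Stage 1: normalize to Unicode normalization form C (NFC).
    Stage 2: drop every format control character (general category Cf).
    """
    normalized = unicodedata.normalize("NFC", source)
    return "".join(ch for ch in normalized if unicodedata.category(ch) != "Cf")


def utf16_code_units(text: str) -> list[int]:
    """Hypothetical helper: the UTF-16 code units of text.

    Characters outside the Basic Multilingual Plane become surrogate pairs.
    """
    data = text.encode("utf-16-le")  # little-endian, no byte order mark
    return [int.from_bytes(data[i:i + 2], "little") for i in range(0, len(data), 2)]


# "e" plus a combining acute accent composes to U+00E9 under NFC,
# and the zero-width space U+200B (category Cf) is removed.
print(preprocess("cafe\u0301\u200b;") == "caf\u00e9;")  # True
print([hex(u) for u in utf16_code_units("\U0001F600")])  # ['0xd83d', '0xde00']
```

The sketch follows the stage order literally, normalizing before stripping, so a base letter and combining mark separated by a format character are not recomposed after that character is removed.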

Lexing and Parsing

Processing stage 3 is done as follows:

1. Let inputElements be an empty array of input elements (syntactic grammar terminals and line breaks).
2. Let input be the input sequence of Unicode characters. Append a special placeholder End to the end of input.
3. Let state be a variable that holds one of the constants re, div, or num. Initialize it to re.
4. Apply the lexical grammar to parse the longest possible prefix of input. Use the start symbol NextInputElement^re, NextInputElement^div, or NextInputElement^num depending on whether state is re, div, or num, respectively. The result of the parse should be a lexical grammar parse tree T. If the parse fails, signal a syntax error and stop.
5. Compute the action InputElement on T to obtain an InputElement e.
6. If e is the endOfInput input element, go to step 15.
7. Remove the characters matched by T from input, leaving only the yet-unlexed suffix of input.
8. Interpret e as a syntactic grammar terminal or line break as follows:
   - A lineBreak is interpreted as a line break, which is not a terminal itself but indicates one or more line breaks between two terminals. It prevents the syntactic grammar from matching any productions that have a [no line break] annotation in the place where the lineBreak occurred.
   - An Identifier s is interpreted as the terminal Identifier. Applying the semantic action Name to the Identifier returns the String value s.name.
   - A Keyword s is interpreted as the reserved word, future reserved word, or non-reserved word terminal corresponding to the Keyword’s String s.
   - A Punctuator s is interpreted as the punctuation token or future punctuation token terminal corresponding to the Punctuator’s String s.
   - A NumberToken x is interpreted as the terminal Number. Applying the semantic action Value to the Number returns the GeneralNumber value x.value.
   - A negatedMinLong, which results from a numeric long token with the value 2⁶³, is interpreted as the terminal NegatedMinLong.
   - A StringToken s is interpreted as the terminal String. Applying the semantic action Value to the String returns the String value s.value.
   - A RegularExpression z is interpreted as the terminal RegularExpression.
9. Append the resulting terminal or line break to the end of the inputElements array.
10. If the inputElements array forms a valid prefix of the context-free language defined by the syntactic grammar, go to step 13.
11. If the last element of the inputElements array is not a lineBreak but the element before it is a lineBreak, insert a VirtualSemicolon terminal between that lineBreak and the last element.
12. If the inputElements array still does not form a valid prefix of the context-free language defined by the syntactic grammar, signal a syntax error and stop.
13. If the last element of the inputElements array is a Number or NegatedMinLong, set state to num. Otherwise, if the inputElements array followed by the terminal / forms a valid prefix of the context-free language defined by the syntactic grammar, set state to div; otherwise, set state to re.
14. Go to step 4.
15. If the inputElements array does not form a valid sentence of the context-free language defined by the syntactic grammar, signal a syntax error and stop.
16. Return the parse tree obtained by the syntactic grammar’s derivation of the sentence formed by the inputElements array.
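The lexing and parsing loop can be exercised on a toy language. The sketch below is a hypothetical Python model, not the ES4 grammars: it assumes a tiny syntactic grammar (terms are numbers or regular expression literals joined by /, statements end with ; or a VirtualSemicolon, and the final semicolon is optional), recognizes valid prefixes with a small finite automaton, and returns the terminal sequence instead of a parse tree. The toy lexer treats the num goal like div, because whatever extra distinctions the real NextInputElement^num goal makes are not modeled here.

```python
# Toy model of the lexing/parsing loop (hypothetical; not the ES4 grammars).
# Toy syntactic grammar over terminals NUM, REGEX, "/", ";", "VSEMI":
#   Program -> Stmt* [Expr]      Stmt -> Expr (";" | "VSEMI")
#   Expr    -> Term ("/" Term)*  Term -> NUM | REGEX
# Automaton states: 0 = start of statement, 1 = after a Term, 2 = after "/".
TRANSITIONS = {
    (0, "NUM"): 1, (0, "REGEX"): 1,
    (1, "/"): 2, (1, ";"): 0, (1, "VSEMI"): 0,
    (2, "NUM"): 1, (2, "REGEX"): 1,
}


def run(elements):
    """Automaton state after elements, or None if they are not a valid prefix."""
    q = 0
    for kind, _ in elements:
        if kind == "lineBreak":  # line breaks are not terminals
            continue
        q = TRANSITIONS.get((q, kind))
        if q is None:
            return None
    return q


def valid_prefix(elements):
    return run(elements) is not None


def valid_sentence(elements):
    return run(elements) in (0, 1)


def lex(text, pos, goal):
    """Lex one input element at pos with goal "re", "div", or "num"."""
    while pos < len(text) and text[pos] == " ":
        pos += 1
    if pos == len(text):
        return "endOfInput", None, pos
    c = text[pos]
    if c == "\n":  # one or more line breaks collapse into one lineBreak
        while pos < len(text) and text[pos] in " \n":
            pos += 1
        return "lineBreak", None, pos
    if c in "0123456789":
        end = pos
        while end < len(text) and text[end] in "0123456789":
            end += 1
        return "NUM", int(text[pos:end]), end
    if c == ";":
        return ";", None, pos + 1
    if c == "/":
        if goal == "re":  # only the re goal reads "/" as a regex start
            end = text.find("/", pos + 1)
            if end == -1:
                raise SyntaxError("unterminated regular expression")
            return "REGEX", text[pos + 1:end], end + 1
        return "/", None, pos + 1  # div and num goals read "/" as division
    raise SyntaxError(f"unexpected character {c!r}")


def lex_and_parse(text):
    elements = []  # step 1
    pos, state = 0, "re"  # steps 2 and 3
    while True:  # step 14 loops back here
        kind, value, pos = lex(text, pos, state)  # steps 4, 5 and 7
        if kind == "endOfInput":  # step 6
            break
        elements.append((kind, value))  # steps 8 and 9
        if not valid_prefix(elements):  # step 10
            if (kind != "lineBreak" and len(elements) >= 2
                    and elements[-2][0] == "lineBreak"):
                elements.insert(len(elements) - 1, ("VSEMI", None))  # step 11
            if not valid_prefix(elements):  # step 12
                raise SyntaxError("invalid input")
        if kind == "NUM":  # step 13
            state = "num"
        elif valid_prefix(elements + [("/", None)]):
            state = "div"
        else:
            state = "re"
    if not valid_sentence(elements):  # step 15
        raise SyntaxError("incomplete input")
    return [k for k, _ in elements if k != "lineBreak"]  # step 16, simplified


print(lex_and_parse("/a/ / 2;"))  # ['REGEX', '/', 'NUM', ';']
print(lex_and_parse("1\n2"))  # ['NUM', 'VSEMI', 'NUM']
print(lex_and_parse("1\n/ 2"))  # ['NUM', '/', 'NUM']
```

Choosing the goal by asking whether / could legally come next is what lets the same character open a regular expression at the start of a statement but mean division after a term. A VirtualSemicolon is inserted only when the array stops being a valid prefix and a line break precedes the offending element, so "1\n/ 2" continues as a division while "1\n2" gets a virtual semicolon.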

Waldemar Horwat
Last modified Tuesday, October 15, 2002