Bespin Plugin Guide

Syntax highlighting

Introduction

Syntax highlighting in Bespin is designed from the ground up for flexibility and extensibility. It's easy to design syntax highlighting engines for your favorite programming languages and share them with others.

At its core, a syntax highlighter is simply a routine that annotates lines of text handed to it with tags. Tags are short keywords that describe what a line fragment is; for example, keyword, identifier, and number are all common tags. Syntax engines don't directly provide the colors that you see on the screen; the mapping from tags to colors is the responsibility of the theme. But the syntax engine provides the programming language-specific markup that the theme needs to do its work.
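The sketch below shows the theme side of this division of labor: the engine supplies tagged ranges, and only the theme turns tags into colors. The theme object, its colors, the plain fallback, and the colorize function are hypothetical illustrations, not Bespin's actual theme format.

```javascript
// Hypothetical theme: maps tag names to CSS colors. The tag names follow
// the examples in the text; the colors and the 'plain' fallback are invented.
const theme = {
    keyword:    '#0000ff',
    identifier: '#000000',
    number:     '#098658',
    string:     '#a31515',
    plain:      '#333333'
};

// Attach a color to each tagged fragment of a line. The syntax engine only
// produced the tags; the colors come entirely from the theme.
function colorize(line, ranges, theme) {
    return ranges.map(function (range) {
        return {
            text:  line.substring(range.start, range.end),
            tag:   range.tag,
            color: theme[range.tag] || theme.plain   // unknown tags fall back to plain
        };
    });
}

// Tagged ranges a syntax engine might produce for this line.
const line = 'var x = 42;';
const ranges = [
    { start: 0,  end: 3,  tag: 'keyword' },
    { start: 3,  end: 4,  tag: 'plain' },
    { start: 4,  end: 5,  tag: 'identifier' },
    { start: 5,  end: 8,  tag: 'plain' },
    { start: 8,  end: 10, tag: 'number' },
    { start: 10, end: 11, tag: 'plain' }
];

console.log(colorize(line, ranges, theme));
```

Swapping in a different theme object changes every color without touching the syntax engine, which is the point of keeping tags and colors separate.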

There are two APIs for syntax developers: the simple, declarative standard syntax API, in which your highlighter derives from StandardSyntax, and the low-level programmatic syntax API (really an informal interface), which provides the most flexibility. The built-in Bespin highlighters use the standard syntax API, but advanced developers may prefer the programmatic API for finer-grained control over the highlighting. The standard syntax engine is itself built on top of the programmatic API.

Metadata

Like all Bespin plugins, syntax engines are JavaScript files (or, less commonly, collections of JavaScript files). The syntax manager looks for plugins registered at the syntax extension point; a plugin registers itself in its metadata like so:

{
    "description": "HTML syntax highlighter",
    "dependencies": { "standard_syntax": "0.0.0" },
    "environments": { "worker": true },
    "provides": [
        {
            "ep": "syntax",
            "name": "html",
            "pointer": "#HTMLSyntax"
        }
    ]
}

The syntax object that you provide as the target of the pointer is either an instance of StandardSyntax, for the standard API, or a JS Object, for the programmatic API.
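The following sketch illustrates how the pointer in the metadata relates to the plugin's code. It assumes that a pointer of the form #Name refers to the export Name of the plugin's main module; the findSyntax function and the placeholder export are hypothetical and do not reproduce Bespin's plugin loader.

```javascript
// Metadata as in the example above.
const metadata = {
    provides: [
        { ep: 'syntax', name: 'html', pointer: '#HTMLSyntax' }
    ]
};

// Stand-in for the plugin module's exports. In a real plugin this would be
// the StandardSyntax instance or plain object the module exports.
const pluginExports = {
    HTMLSyntax: { description: 'placeholder for the HTML syntax object' }
};

// Hypothetical lookup: find the extension registered at the "syntax" point
// under `name` and return the export its pointer names (text after '#').
function findSyntax(metadata, moduleExports, name) {
    const ext = metadata.provides.find(function (e) {
        return e.ep === 'syntax' && e.name === name;
    });
    if (!ext) {
        return null;   // no syntax registered under that name
    }
    const exportName = ext.pointer.replace(/^#/, '');
    return moduleExports[exportName] || null;
}

console.log(findSyntax(metadata, pluginExports, 'html'));
```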

The Standard API

The standard syntax API is based on regular expressions. For a tutorial on JavaScript regular expressions, see regular-expressions.info's JavaScript regex tutorial. On the same site is a handy online tool to test your regexes.

Standard syntax plugins in Bespin are plugins like any other, but most of the code is written for you. All you need to do is to provide a list of regexes, tags, and actions, grouped into states. Let's look at an excerpt from the JavaScript syntax highlighter for an example:

exports.JSSyntax = StandardSyntax.create({
    states: {
        start: [
            {
                regex:  /^[A-Za-z_][A-Za-z0-9_]*/,
                tag:    'identifier'
            },
            {
                regex:  /^"/,
                tag:    'string',
                then:   'qqstring'
            }
        ]
        ...

To begin with, the JavaScript syntax highlighter derives from the StandardSyntax class using the standard SproutCore create method. The standard syntax engine is passed a set of states; the state it begins in is always named start. Each state contains a list of regular expressions, each paired with a tag and, optionally, an action. The first regex in this example, /^[A-Za-z_][A-Za-z0-9_]*/, matches a word that starts with a letter or underscore and continues with letters, digits, and underscores, which covers most JavaScript identifiers. Accordingly, this regex is tagged with identifier: when it matches some text, the identifier tag is applied to the matched text and passed on to the theme engine.

The second regex, /^"/, matches the quote character ". This character starts a string in JavaScript, and sure enough, the associated tag is string. In this case, in order to highlight the text correctly, all characters after the " (up to the next ") need to be considered part of the same string. So the JavaScript syntax engine specifies an action to perform via the then property. Here, the action is a transition to the qqstring (double quoted string) state.
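To see what an engine does with such states, here is a minimal, hypothetical sketch of a state-based regex highlighter. It is not Bespin's StandardSyntax implementation. It assumes that then holds only a state name, tags any character no rule matches as plain, and supplies an invented body for the qqstring state, since the excerpt elides it.

```javascript
// Hypothetical sketch of a state-based regex highlighter; not Bespin's
// StandardSyntax. Assumptions: `then` holds only a state name, and a
// character that no rule matches is tagged 'plain' and skipped.
const states = {
    start: [
        { regex: /^[A-Za-z_][A-Za-z0-9_]*/, tag: 'identifier' },
        { regex: /^"/,                      tag: 'string', then: 'qqstring' },
        { regex: /^\s+/,                    tag: 'plain' }
    ],
    // Invented body for qqstring: the excerpt in the text elides it.
    qqstring: [
        { regex: /^\\./,     tag: 'string' },                // escaped character
        { regex: /^[^"\\]+/, tag: 'string' },                // ordinary characters
        { regex: /^"/,       tag: 'string', then: 'start' }  // closing quote
    ]
};

// Tag one line, beginning in `state`. Returns the tagged ranges and the
// state in effect at the end of the line, so the next line can resume there.
function tokenizeLine(states, state, line) {
    const ranges = [];
    let pos = 0;
    while (pos < line.length) {
        const rest = line.substring(pos);
        let rule = null;
        let match = null;
        // Try the current state's regexes in order; the first match wins.
        for (const candidate of states[state]) {
            match = candidate.regex.exec(rest);
            if (match && match[0].length > 0) {
                rule = candidate;
                break;
            }
        }
        if (!rule) {
            // Nothing matched: consume one character as plain text so the
            // loop always advances.
            ranges.push({ start: pos, end: pos + 1, tag: 'plain' });
            pos += 1;
            continue;
        }
        const length = match[0].length;
        ranges.push({ start: pos, end: pos + length, tag: rule.tag });
        pos += length;
        if (rule.then) {
            state = rule.then;   // follow the transition, e.g. into qqstring
        }
    }
    return { ranges: ranges, state: state };
}

console.log(tokenizeLine(states, 'start', 'say "hi" now'));
```

Calling tokenizeLine(states, 'start', 'x = "abc') ends in the qqstring state, which is how a string left open at the end of one line would carry over into the next.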

Note that all regexes are anchored at the beginning of the string with the ^ character. As you write a syntax highlighter, it's crucial to anchor all regexes in this way. If you don't, then your regex will match if the pattern appears anywhere in the line, and the syntax highlighting engine will become confused.
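The snippet below shows this failure mode concretely. It assumes, as the ^ anchors suggest, that the engine tests each regex against the text remaining at the current position; the variable names are illustrative.

```javascript
// Each regex is run against the text remaining at the current position
// (an assumption based on the ^ anchors used throughout this guide).
const rest = 'foo "bar"';   // remaining text; the current position is at 'f'

const unanchored = /"/;
const anchored   = /^"/;

const m1 = unanchored.exec(rest);   // finds the quote four characters later
const m2 = anchored.exec(rest);     // no quote at the current position

// An engine that trusted m1 would treat 'f' as the start of a string and
// switch to the string state too early.
console.log(m1.index);   // 4
console.log(m2);         // null
```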

Now let's look at a more advanced case: a simple HTML highlighter. (The actual HTML highlighter is more complex than this, because it allows for attributes in tags and detects malformed syntax.)

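// Assumed import: the standard_syntax plugin (a declared dependency)
// is taken to export StandardSyntax.
var StandardSyntax = require('standard_syntax').StandardSyntax;

exports.HTMLSyntax = StandardSyntax.create({
    states: {
        start: [
            {
                // Opening <script> tag: enter the script state and
                // start the embedded js context.
                regex:  /^<script>/i,
                tag:    'tag',
                then:   'script start:js'
            },
            {
                regex:  /^[^<]+/,
                tag:    'plain'
            },
            {
                // Any other '<' (e.g. <p>) is treated as plain text.
                regex:  /^./,
                tag:    'plain'
            }
        ],
        script: [
            {
                regex:  /^[^<]+/,
                tag:    'plain'
            },
            {
                // Closing </script> tag: return to the start state and
                // stop the js context.
                regex:  /^<\/script>/i,
                tag:    'tag',
                then:   'start stop:js'
            },
            {
                // A '<' that does not begin </script> (e.g. "a < b")
                // is still script text, so stay in the script state.
                regex:  /^./,
                tag:    'plain'
            }
        ]
    }
});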

This highlighter allows the user to embed JavaScript inside <script> tags. When the /^<script>/i regex matches, the syntax engine switches to the script state and starts the js context, as specified by the value of the then property. Once in the script state, the regex /^<\/script>/i likewise switches back to the start state and stops the js context. Under the hood, as soon as the standard syntax engine sees a start: or stop: action in the then value, it begins loading the appropriate highlighter in the background. Once that highlighter has loaded, it runs, and the colors in the region delimited by the start: and stop: actions change to those specified by the new highlighter (overriding, in this case, the plain tag).
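The then values in this example combine a state name with context actions. The sketch below shows one way such a value could be interpreted. parseThen and applyActions are hypothetical helpers, and both the splitting rule (space-separated entries, with start:NAME and stop:NAME as actions) and the stack of active contexts are assumptions drawn from the examples in this guide, not from Bespin's source.

```javascript
// Hypothetical: split a `then` value into the next state and any context
// actions. Assumes space-separated entries, where 'start:NAME' and
// 'stop:NAME' are actions and the remaining entry is the next state.
function parseThen(then) {
    const result = { state: null, actions: [] };
    then.split(/\s+/).filter(Boolean).forEach(function (part) {
        const m = /^(start|stop):(.+)$/.exec(part);
        if (m) {
            result.actions.push({ kind: m[1], context: m[2] });
        } else {
            result.state = part;
        }
    });
    return result;
}

// Hypothetical: keep a stack of active contexts, e.g. ['html', 'js'] inside
// a <script> element. start: pushes a context; stop: pops it if it is on top.
function applyActions(contextStack, actions) {
    const stack = contextStack.slice();   // do not mutate the caller's stack
    actions.forEach(function (action) {
        if (action.kind === 'start') {
            stack.push(action.context);
        } else if (stack[stack.length - 1] === action.context) {
            stack.pop();
        }
    });
    return stack;
}

const enterScript = parseThen('script start:js');
// enterScript is { state: 'script', actions: [{ kind: 'start', context: 'js' }] }
const inside = applyActions(['html'], enterScript.actions);

const leaveScript = parseThen('start stop:js');
// leaveScript.state is 'start'
const after = applyActions(inside, leaveScript.actions);
```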

That's all there is to the standard syntax API. It's powerful enough to handle most cases (in fact, all of Bespin's syntax highlighters are written using this API), but if you want more flexibility or need to run your own custom parsing code, read on.

The Programmatic API

Note: This section is incomplete.

As far as Bespin is concerned, a syntax engine is just a SproutCore object that implements the method syntaxInfoForLineFragment. In JavaScript pseudocode, this method has the following signature:

syntaxInfoForLineFragment(context : string, state : string, line : string,
    start : number, end : number) : Promise<Result>

The context is a string describing the current context; it will be equal to the name of the context that your plugin exports. The state is start for the first line and afterward equals whatever state your plugin returned for the previous line; for efficiency, your plugin should store all of its state in this string. line is the text of the current line, and start and end are the boundaries of the region to be styled: start is inclusive and end is exclusive, so the text to be highlighted is line.substring(start, end).
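Here is a minimal sketch of an engine written against this signature. It highlights C-style block comments, which can span lines, so it relies on the state string to carry information from one line to the next. The scanFragment helper and the comment and plain tags are hypothetical; a native Promise stands in for the promise type Bespin expects, which is an assumption; and each Attr records the state in effect for its range, which is one reading of that field.

```javascript
// Hypothetical engine: tags C-style block comments (which may span lines)
// as 'comment' and everything else as 'plain'. All state lives in the
// state string: 'start' or 'comment'.

// Synchronous core: scan line[start, end) beginning in `state`.
function scanFragment(state, line, start, end) {
    const attrs = [];
    let pos = start;
    while (pos < end) {
        if (state === 'comment') {
            // Inside a comment: it runs through the next '*/' if that lies
            // within the fragment, otherwise to the end of the fragment.
            const close = line.indexOf('*/', pos);
            const closed = close !== -1 && close + 2 <= end;
            const stop = closed ? close + 2 : end;
            attrs.push({ start: pos, end: stop, state: 'comment', tag: 'comment', actions: [] });
            if (closed) {
                state = 'start';
            }
            pos = stop;
        } else {
            // Outside a comment: plain text up to the next '/*'.
            const open = line.indexOf('/*', pos);
            const opens = open !== -1 && open + 2 <= end;
            const stop = opens ? open : end;
            if (stop > pos) {
                attrs.push({ start: pos, end: stop, state: 'start', tag: 'plain', actions: [] });
            }
            if (opens) {
                // Tag the '/*' itself and enter the comment state.
                attrs.push({ start: open, end: open + 2, state: 'comment', tag: 'comment', actions: [] });
                state = 'comment';
                pos = open + 2;
            } else {
                pos = stop;
            }
        }
    }
    return { attrs: attrs, state: state };
}

const CommentSyntax = {
    // Same parameters as the signature in the text; returns Promise<Result>.
    syntaxInfoForLineFragment: function (context, state, line, start, end) {
        const scanned = scanFragment(state, line, start, end);
        return Promise.resolve({
            attrs: scanned.attrs,
            next:  { context: context, state: scanned.state }
        });
    }
};

// A comment opened on one line is still open at the start of the next.
CommentSyntax.syntaxInfoForLineFragment('js', 'start', 'a /* b', 0, 6)
    .then(function (result) {
        console.log(result.next.state);   // logs: comment
    });
```

Keeping the scan synchronous and wrapping its result in Promise.resolve makes the logic easy to test on its own; an engine that does its work elsewhere, such as in a Web Worker, would resolve the promise later instead.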

The Result object is defined as an object with these properties:

Result = { attrs : Array<Attr>, next : Next }

attrs is an array of attribute ranges; each one (an Attr) gives the start and end of a range within the line, along with the state, tag, and actions that apply to it. The next property specifies the context and state in effect at the end of the line.

The Attr object is an object with these properties:

Attr = { start : number, end : number, state : string, tag : string,
    actions : Array<string> }

And the Next object is an object that looks like this:

Next = { context : string, state : string }
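For reference, the same types can be written as JSDoc typedefs. The type names and fields come from the definitions above; the isWellFormedResult helper is hypothetical, and its requirement that attrs be non-empty, in order, and non-overlapping is an assumption, not something this guide specifies.

```javascript
/**
 * @typedef {Object} Attr
 * @property {number}   start    inclusive offset into the line
 * @property {number}   end      exclusive offset into the line
 * @property {string}   state
 * @property {string}   tag      e.g. 'keyword', 'identifier', 'number'
 * @property {string[]} actions
 *
 * @typedef {Object} Next
 * @property {string} context
 * @property {string} state
 *
 * @typedef {Object} Result
 * @property {Attr[]} attrs
 * @property {Next}   next
 */

/**
 * Hypothetical sanity check for an engine's output: every attr lies inside
 * [start, end), attrs are non-empty and appear in order without overlapping,
 * and next carries string context and state values.
 * @param {Result} result
 * @param {number} start
 * @param {number} end
 * @returns {boolean}
 */
function isWellFormedResult(result, start, end) {
    let prevEnd = start;
    const rangesOk = result.attrs.every(function (a) {
        const ok = a.start >= prevEnd && a.start < a.end && a.end <= end;
        prevEnd = a.end;
        return ok;
    });
    return rangesOk &&
        typeof result.next.context === 'string' &&
        typeof result.next.state === 'string';
}

const example = {
    attrs: [
        { start: 0, end: 3, state: 'start', tag: 'keyword', actions: [] },
        { start: 3, end: 4, state: 'start', tag: 'plain',   actions: [] }
    ],
    next: { context: 'js', state: 'start' }
};

console.log(isWellFormedResult(example, 0, 4));   // true
```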

Note that this function returns a promise to return a Result object, not a Result object itself. This means that your syntax highlighting engine can do work asynchronously; e.g. in a Web Worker. For more information on promises, see the relevant CommonJS specification.

To see an example of a syntax engine based on the programmatic API in action, check out the StandardSyntax plugin in plugins/supported/SyntaxManager/controllers/standardsyntax.js.