Regular Expressions
A gentle guide to using regular expressions in NetKernel

Regular expressions, or regexes, are a powerful, widely used technology for finding and matching patterns in text strings. In NetKernel they are used extensively to match text-based URI addresses and rewrite them into new addresses, creating mappings and architectural layers.

For example, if you need to find either the word "color" or "colour", you can use the regular expression "colou?r". The "?" symbol makes the preceding character optional: it matches 0 or 1 instances of that character. In this case both "color" and "colour" match the regular expression.
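The "colou?r" example can be checked with a few lines of Java. This is only a sketch using the standard java.util.regex package; the class and method names are made up for illustration.

```java
import java.util.regex.Pattern;

class OptionalCharDemo {
    // "u?" makes the preceding "u" optional (zero or one occurrence).
    static final Pattern COLOR = Pattern.compile("colou?r");

    // True when the whole string matches the pattern.
    static boolean isColorWord(String s) {
        return COLOR.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(isColorWord("color"));   // true
        System.out.println(isColorWord("colour"));  // true
        System.out.println(isColorWord("colouur")); // false: two "u" characters
    }
}
```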

Regular expressions are very powerful and may seem complicated at first. Once you master the basic principles and learn a few rules, you will become comfortable and proficient at applying them in your NetKernel applications. Wikipedia has a good entry on Regular Expressions, and the O'Reilly book Mastering Regular Expressions is a comprehensive resource. Sun Microsystems also provides a Guide to Regular Expressions.

Regular Expressions in NetKernel

Resource requests include a URI resource address expressed in textual form. With regular expressions one can define matches and re-write rules for the URI addresses.

This guide uses NetKernel's Regular Expression Cookbook, a simple tool that allows you to write and evaluate regular expressions interactively. Each example below has a 'try it' link that sets up and runs the cookbook pre-loaded with the example described in the text.

Direct Match

The simplest example is a direct match on a text string. In the following example the address ffcpl:/index.html is matched against the match element's expression and a substitute address ffcpl:/resource/index.html is used to find the resource.

<rule>
  <match>ffcpl:/index.html</match>
  <to>ffcpl:/resource/index.html</to>
</rule>

To see how this works in the Regular Expression Cookbook, use this link to set up a match test: try it. Next, use the following link to set up a rewrite rule: try it.
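Outside the cookbook, the direct-match rule can be approximated in plain Java. The sketch below assumes that a rule applies only when the whole address matches the expression; the helper applyRule is hypothetical and is not part of any NetKernel API.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class DirectMatchDemo {
    // Hypothetical helper: returns the rewritten address, or null when the rule does not match.
    static String applyRule(String match, String to, String address) {
        Matcher m = Pattern.compile(match).matcher(address);
        if (!m.matches()) {
            return null; // assumption: the whole address must match
        }
        StringBuffer out = new StringBuffer();
        m.appendReplacement(out, to); // expands $1, $2, ... if the replacement uses them
        return out.toString();
    }

    public static void main(String[] args) {
        String to = "ffcpl:/resource/index.html";
        System.out.println(applyRule("ffcpl:/index.html", to, "ffcpl:/index.html"));
        // The unescaped "." matches any character, so this address matches as well:
        System.out.println(applyRule("ffcpl:/index.html", to, "ffcpl:/indexXhtml"));
        // Escaping the dot restricts the match to a literal ".": prints null
        System.out.println(applyRule("ffcpl:/index\\.html", to, "ffcpl:/indexXhtml"));
    }
}
```

Because "." is itself a pattern character, a strictly literal match needs the dot written as "\." in the match expression.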

Simple Pattern Match

Next we use a simple regular expression pattern to match a group of addresses. This is simplistic and matches all requests in the ffcpl:/ address space to a single resource.

<rule>
  <match>ffcpl:/.*</match>
  <to>ffcpl:/resource/index.html</to>
</rule>

try it
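As a rough illustration of how broad this pattern is, the following Java sketch resolves several addresses against it. The sample addresses and the resolve helper are invented for the example, and whole-address matching is assumed.

```java
import java.util.regex.Pattern;

class CatchAllDemo {
    static final Pattern RULE = Pattern.compile("ffcpl:/.*");
    static final String TARGET = "ffcpl:/resource/index.html";

    // Returns the single target for any address in ffcpl:/, or null otherwise.
    static String resolve(String address) {
        return RULE.matcher(address).matches() ? TARGET : null;
    }

    public static void main(String[] args) {
        System.out.println(resolve("ffcpl:/a/b.xml"));   // ffcpl:/resource/index.html
        System.out.println(resolve("ffcpl:/"));          // ffcpl:/resource/index.html
        System.out.println(resolve("active:beanshell")); // null: different scheme
    }
}
```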

Code to Runtime mapping

A more sophisticated example is used frequently to map the name of a code file to its runtime service. For example, if you need to map all Beanshell programs (files ending in the suffix ".bsh") to the beanshell service (active:beanshell) use the following rule:

<rule>
  <match>(.*\.bsh)(.*)</match>
  <to>active:beanshell+operator@$1$2</to>
</rule>

This rule captures two groups. The first group contains the resource address of the program and the second group contains any parameters attached to the program. For example, this rule matches the address ffcpl:/myprogram.bsh and replaces it with active:beanshell+operator@ffcpl:/myprogram.bsh: try it. Next, see how this rule works when parameters are involved: try it.
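The same rewrite can be sketched in Java, with and without parameters. The rewrite helper is hypothetical, the "+operand@ffcpl:/data.xml" parameter is an invented example, and whole-address matching is an assumption about how the rule is applied.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class BeanshellRuleDemo {
    static final String MATCH = "(.*\\.bsh)(.*)";
    static final String TO = "active:beanshell+operator@$1$2";

    // Hypothetical helper: rewrites the address, or returns null if the rule does not match.
    static String rewrite(String address) {
        Matcher m = Pattern.compile(MATCH).matcher(address);
        if (!m.matches()) {
            return null;
        }
        StringBuffer out = new StringBuffer();
        m.appendReplacement(out, TO); // $1 = program address, $2 = trailing parameters
        return out.toString();
    }

    public static void main(String[] args) {
        // active:beanshell+operator@ffcpl:/myprogram.bsh
        System.out.println(rewrite("ffcpl:/myprogram.bsh"));
        // active:beanshell+operator@ffcpl:/myprogram.bsh+operand@ffcpl:/data.xml
        System.out.println(rewrite("ffcpl:/myprogram.bsh+operand@ffcpl:/data.xml"));
    }
}
```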

Address Space Translation

A frequent use of regular expressions is address translation. For example, a module may export the address space ffcpl:/samples/.* but you want all addresses in the module's private address space to be anchored at ffcpl:/. This rewrite rule performs this address space translation:

<rule>
  <match>ffcpl:/samples/(.*)</match>
  <to>ffcpl:/$1</to>
</rule>

try it
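A small Java sketch of this translation follows. The sample addresses and the translate helper are illustrative only, and whole-address matching is assumed.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AddressTranslationDemo {
    static final Pattern MATCH = Pattern.compile("ffcpl:/samples/(.*)");

    // Hypothetical helper: strips the "samples/" prefix, or returns null if the rule does not match.
    static String translate(String address) {
        Matcher m = MATCH.matcher(address);
        if (!m.matches()) {
            return null;
        }
        StringBuffer out = new StringBuffer();
        m.appendReplacement(out, "ffcpl:/$1"); // $1 = everything after "samples/"
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(translate("ffcpl:/samples/docs/intro.html")); // ffcpl:/docs/intro.html
        System.out.println(translate("ffcpl:/other/intro.html"));        // null
    }
}
```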

Removing Parameters

Sometimes it is important to remove parameters from a request. The following rule captures everything up to the first parameter (introduced by "+") and discards the rest:

<rule>
  <match>(ffcpl:/.*?)(\+.*)?</match>
  <to>$1</to>
</rule>

try it
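The parameter-stripping rule can be tried in Java as sketched below. Note that in java.util.regex a literal "+" must be written as "\+", because a bare "+" is a quantifier, so the sketch uses the escaped form. The "+mode@print" parameter and the stripParameters helper are invented for the example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RemoveParametersDemo {
    // ".*?" is reluctant: it stops at the first "+" so that group 2 takes all parameters.
    static final Pattern MATCH = Pattern.compile("(ffcpl:/.*?)(\\+.*)?");

    // Hypothetical helper: returns the address without parameters, or null if the rule does not match.
    static String stripParameters(String address) {
        Matcher m = MATCH.matcher(address);
        return m.matches() ? m.group(1) : null; // equivalent to rewriting to "$1"
    }

    public static void main(String[] args) {
        System.out.println(stripParameters("ffcpl:/report.xml+mode@print")); // ffcpl:/report.xml
        System.out.println(stripParameters("ffcpl:/report.xml"));            // ffcpl:/report.xml
    }
}
```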

Old, not refactored

Regular expressions are used within NetKernel to express the mapping of one URI address space to another. For example, the following mapping rule directs all requests for BeanShell programs (resources ending with the suffix ".bsh") to a URI that activates the BeanShell runtime to run the specified program.

This rule will match the request for ffcpl:/myprogram.bsh and replace it with active:beanshell+operator@ffcpl:/myprogram.bsh.

<rule>
  <match>(.*\.bsh)(.*)</match>
  <to>active:beanshell+operator@$1$2</to>
</rule>

In this example two sets of parentheses "(" and ")" are used in the regular expression. Each set of parentheses captures part of the regular expression match and stores that matching portion in a temporary variable. The first set of parentheses creates variable "1", the second creates variable "2", and so on. We will examine this example in more detail later.

Because regular expressions play an important role in NetKernel applications, it would be wise to spend time mastering the common patterns of regular expressions used in NetKernel. This will boost your ability to comprehend and write NetKernel applications.

Hello World

The hello-world for regular expressions is to create an expression that matches 'Hello World'.

String: Hello World
Pattern: Hello World

This is trivial, since a pattern with no special characters matches exactly the same string.

try it

Hello Worlds

Let's make this more meaningful. Let's create a pattern which matches all strings beginning with 'Hello' - we can do this with the following pattern:

String: Hello World
Pattern: Hello.*

In this example the "." symbol means match any character, and the "*" character means match 0 or more instances of the previous character (in this case, "."). The result is a regular expression that matches all strings that begin with "Hello" and continue with any number of characters, including none.

try it

To prove that this matches anything beginning 'Hello', try this
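The same check can be written in Java. This sketch assumes the whole string must match, as with Matcher.matches(); the test strings other than "Hello World" are invented.

```java
import java.util.regex.Pattern;

class HelloPrefixDemo {
    static final Pattern HELLO = Pattern.compile("Hello.*");

    // True when the whole string matches "Hello" followed by anything.
    static boolean startsWithHello(String s) {
        return HELLO.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(startsWithHello("Hello World")); // true
        System.out.println(startsWithHello("Hello"));       // true: ".*" may match nothing
        System.out.println(startsWithHello("Oh, Hello"));   // false: does not begin with "Hello"
    }
}
```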

Capturing Groups

String matching becomes even more powerful when the matching expression specifies how to rewrite one string to another string. This is used extensively for URI address space mappings. Regular expressions provide a syntax for specifying 'capturing groups': portions of the matching string which can be grabbed and used to write new strings. A capturing group is written as a matched pair of parentheses '( )', and any number of capturing groups can be specified in a pattern.

A captured group can then be placed into a new string. This is done with a dollar sign '$' followed by the index number of the captured group to substitute in. The index is the order in which the group appears in the pattern, counting from 1:

String: Hello World
Pattern: Hello(.*)
Rewrite: Goodbye$1

Anything after 'Hello' is captured (the index of this group is 1 as it is the first, and only, capturing group). The rewrite rule says to create a string beginning with 'Goodbye' and the '$1' says to append anything that was captured by group 1.

try it
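In Java the capture and rewrite look roughly like this. It is a sketch using the standard String.replaceFirst and Matcher.group calls; the class and method names are made up.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GoodbyeDemo {
    // Returns what group 1 captured, or null when the string does not match.
    static String captured(String s) {
        Matcher m = Pattern.compile("Hello(.*)").matcher(s);
        return m.matches() ? m.group(1) : null;
    }

    // Applies the rewrite; replaceFirst leaves a non-matching string unchanged.
    static String rewrite(String s) {
        return s.replaceFirst("Hello(.*)", "Goodbye$1");
    }

    public static void main(String[] args) {
        System.out.println("[" + captured("Hello World") + "]"); // [ World]
        System.out.println(rewrite("Hello World"));              // Goodbye World
    }
}
```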

Directing Beanshell Program Requests

Earlier you saw a module match rule that directed requests for BeanShell programs to the BeanShell runtime. It is now time to examine this in more detail.

<rule>
  <match>(.*\.bsh)(.*)</match>
  <to>active:beanshell+operator@$1$2</to>
</rule>

The intent of this rule is to capture all incoming requests for a BeanShell program and rewrite each request so that it can be dispatched to the BeanShell runtime, activated by the URI "active:beanshell". Notice that we need to capture both the name of the BeanShell program and any parameters sent to the program. This is accomplished by using two capture groups.

String: ShowMe.bsh+operand@number=3
Pattern: (.*\.bsh)(.*)
Rewrite: active:beanshell+operator@$1$2

try it
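To see what each capture group holds for this request, the groups can be printed from Java. This is an illustrative sketch only; the groups helper is hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class BeanshellGroupsDemo {
    static final Pattern RULE = Pattern.compile("(.*\\.bsh)(.*)");

    // Returns {group 1, group 2} for a matching request, or null if there is no match.
    static String[] groups(String request) {
        Matcher m = RULE.matcher(request);
        return m.matches() ? new String[] { m.group(1), m.group(2) } : null;
    }

    public static void main(String[] args) {
        String[] g = groups("ShowMe.bsh+operand@number=3");
        System.out.println(g[0]); // ShowMe.bsh
        System.out.println(g[1]); // +operand@number=3
        // Substituting $1 and $2 into the rewrite gives:
        // active:beanshell+operator@ShowMe.bsh+operand@number=3
        System.out.println("active:beanshell+operator@" + g[0] + g[1]);
    }
}
```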

Summary

There are many ways that patterns can be written using the regular expression syntax. Take a look at the syntax reference to see the possibilities.

That covers the basics of regular expressions. The cookbook provides a set of the common recipes which occur most often in NetKernel URI address space mappings - now go and try some experiments.

© 2003-2007, 1060 Research Limited. 1060 registered trademark, NetKernel trademark of 1060 Research Limited.