Regular expressions,
or regexes, are a powerful, widely used technology for finding and matching
patterns in text strings.
In NetKernel they are used extensively to match text-based
URI addresses and rewrite them
into new addresses, creating mappings and architectural layers.
For example, if you need to find either the word "color" or "colour",
you can use the regular expression: "colou?r".
The "?" symbol means that the previous character may appear 0 or 1 times in
the matched text.
In this case both "color" and "colour" match this regular expression.
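The optional-character rule can be checked with a few lines of code. This is a minimal sketch that uses Python's re module as a stand-in for the regex engine; the helper name matches_color is made up for illustration, and whole-string matching is an assumption.

```python
import re

# "u?" makes the letter u optional: zero or one occurrence.
COLOR = re.compile(r"colou?r")

def matches_color(word: str) -> bool:
    """Return True when the whole word matches the pattern (hypothetical helper)."""
    return COLOR.fullmatch(word) is not None

print(matches_color("color"))    # True
print(matches_color("colour"))   # True
print(matches_color("colouur"))  # False: "?" allows at most one "u"
```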
Regular expressions are very powerful and may seem complicated at first.
Once you master the basic principles and learn a few rules
you will become comfortable and proficient at applying them in
your NetKernel applications.
Wikipedia has a good entry on
Regular Expressions
and the O'Reilly book
Mastering Regular Expressions
is a comprehensive resource.
Sun Microsystems also provides a
Guide to Regular Expressions.
Regular Expressions in NetKernel
Resource requests include a URI resource address expressed in textual form.
With regular expressions one can define matches and re-write rules for the URI
addresses.
This guide uses NetKernel's
Regular Expression Cookbook,
a simple tool that allows you to interactively write
and evaluate regular expressions.
Each example below has a 'try it' link that
sets up and runs the cookbook pre-loaded with the example
described in the text.
Direct Match
The simplest example is a direct match on a text string.
In the following example the address ffcpl:/index.html
is matched against the match element's expression,
and the substitute address ffcpl:/resource/index.html
is used to find the resource.
<rule>
<match>ffcpl:/index.html</match>
<to>ffcpl:/resource/index.html</to>
</rule>
To see how this works in the Regular Expression Cookbook press this link
to set up a match test:
try it
Next use the following link to set up a rewrite rule:
try it
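Outside the cookbook, the direct-match rule can be approximated in a few lines. This is only a sketch: Python's re module stands in for the regex engine, the function name resolve is hypothetical, and two behaviours are assumptions (the expression must match the whole address, and an address that does not match is passed through unchanged).

```python
import re

MATCH = r"ffcpl:/index.html"        # the rule's <match> expression
TO = "ffcpl:/resource/index.html"   # the rule's <to> address

def resolve(address: str) -> str:
    """Return the substitute address on a full match, else the address unchanged."""
    return TO if re.fullmatch(MATCH, address) else address

print(resolve("ffcpl:/index.html"))  # ffcpl:/resource/index.html
print(resolve("ffcpl:/other.html"))  # ffcpl:/other.html
# An unescaped "." matches any character, so this address matches too:
print(resolve("ffcpl:/index-html"))  # ffcpl:/resource/index.html
```

The last line shows a detail worth knowing: a "direct" match is still a regular expression, so writing index\.html is the stricter form when only a literal dot should match.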
Simple Pattern Match
Next we use a simple regular expression pattern to match a group of addresses.
This simplistic rule maps every request in the ffcpl:/
address space to a single resource.
<rule>
<match>ffcpl:/.*</match>
<to>ffcpl:/resource/index.html</to>
</rule>
try it
Code to Runtime Mapping
A more sophisticated rule is frequently used to map the
name of a code file to its runtime service.
For example, if you need to map all BeanShell programs (files ending
in the suffix ".bsh") to the BeanShell service (active:beanshell),
use the following rule:
<rule>
<match>(.*\.bsh)(.*)</match>
<to>active:beanshell+operator@$1$2</to>
</rule>
This rule captures two groups.
The first group contains the resource address of the program
and the second group contains any parameters attached to the
program.
For example, this rule matches the address ffcpl:/myprogram.bsh
and replaces it with active:beanshell+operator@ffcpl:/myprogram.bsh.
try it
Next see how this rule works when parameters are involved:
try it
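The two capture groups can also be exercised in code. The sketch below is an approximation, not NetKernel's implementation: Python's re module stands in for the regex engine, apply_rule is a hypothetical helper that substitutes $1, $2 and so on with the captured groups, and the parameter +param@ffcpl:/data.xml is invented for the example.

```python
import re

MATCH = r"(.*\.bsh)(.*)"                  # the rule's <match> expression
TO = "active:beanshell+operator@$1$2"     # the rule's <to> template

def apply_rule(address, match=MATCH, to=TO):
    """Rewrite address with the rule, or return None when it does not match."""
    m = re.fullmatch(match, address)
    if m is None:
        return None
    # Replace each $N in the template with the text captured by group N.
    return re.sub(r"\$(\d)", lambda ref: m.group(int(ref.group(1))) or "", to)

print(apply_rule("ffcpl:/myprogram.bsh"))
# active:beanshell+operator@ffcpl:/myprogram.bsh
print(apply_rule("ffcpl:/myprogram.bsh+param@ffcpl:/data.xml"))
# active:beanshell+operator@ffcpl:/myprogram.bsh+param@ffcpl:/data.xml
print(apply_rule("ffcpl:/index.html"))    # None: no ".bsh" in the address
```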
Address Space Translation
A frequent use of regular expressions is address translation.
For example, a module may export the address space
ffcpl:/samples/.*
but you want all addresses in the
module's private address space to be anchored at ffcpl:/.
This rewrite rule performs this address space translation:
<rule>
<match>ffcpl:/samples/(.*)</match>
<to>ffcpl:/$1</to>
</rule>
try it
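As a rough illustration of the translation, the sketch below strips the samples/ prefix from a few addresses. It assumes whole-address matching and uses Python's re module in place of the regex engine; translate and the sample addresses are hypothetical.

```python
import re

MATCH = r"ffcpl:/samples/(.*)"   # the rule's <match> expression
TO = "ffcpl:/$1"                 # the rule's <to> template

def translate(address):
    """Map an exported address into the private space, or return None if no match."""
    m = re.fullmatch(MATCH, address)
    if m is None:
        return None
    # Only $1 appears in this template, so a plain string replace is enough.
    return TO.replace("$1", m.group(1))

print(translate("ffcpl:/samples/index.html"))       # ffcpl:/index.html
print(translate("ffcpl:/samples/docs/intro.html"))  # ffcpl:/docs/intro.html
print(translate("ffcpl:/other/index.html"))         # None
```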
Removing Parameters
Sometimes it is important to remove parameters from a request.
The following rule matches everything up to the first parameter:
<rule>
<match>(ffcpl:/.*?)(\+.*)?</match>
<to>$1</to>
</rule>
try it
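The interplay of the lazy group and the optional parameter group is easier to see when run. In this sketch the plus sign is written escaped as \+, because an unescaped "+" at the start of a group is a quantifier with nothing to repeat. Python's re module stands in for the regex engine, whole-address matching is an assumption, and strip_parameters and the sample addresses are hypothetical.

```python
import re

# Group 1 is lazy (.*?) so it stops at the first "+"; group 2 is optional.
MATCH = r"(ffcpl:/.*?)(\+.*)?"

def strip_parameters(address):
    """Return the address without its parameters, or None if it does not match."""
    m = re.fullmatch(MATCH, address)
    return m.group(1) if m else None

print(strip_parameters("ffcpl:/myprogram.bsh+operand@ffcpl:/data.xml"))
# ffcpl:/myprogram.bsh
print(strip_parameters("ffcpl:/index.html"))
# ffcpl:/index.html
```

With a greedy first group, (ffcpl:/.*), the optional second group would always match nothing and the parameters would be kept, which is why the lazy form matters here.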
Old, not refactored
Regular expressions are used within NetKernel to express the mapping of one URI
address space to another.
For example, the following mapping rule directs all requests for BeanShell programs
(resources ending with the suffix ".bsh") to a URI that activates the
BeanShell runtime to run the specified program.
This rule will match the request for ffcpl:/myprogram.bsh
and replace it with active:beanshell+operator@ffcpl:/myprogram.bsh.
<rule>
<match>(.*\.bsh)(.*)</match>
<to>active:beanshell+operator@$1$2</to>
</rule>
In this example two sets of parentheses "(" and ")"
are used in the regular expression.
Each set of parentheses captures part of the regular expression match and
moves that matching portion into a temporary variable.
The first set of parentheses creates variable "1",
the second creates variable "2", and so on.
We will examine this example in more detail later.
Because regular expressions play an important role in
NetKernel applications, it would be wise to spend time mastering the common
patterns of regular expressions used in NetKernel.
This will boost your ability to comprehend and write NetKernel applications.
Hello World
The hello-world for regular expressions is to create an expression that
matches 'Hello World'.
String: Hello World
Pattern: Hello World
This is trivial, since a string naturally matches itself.
try it
Hello Worlds
Let's make this more meaningful.
Let's create a pattern which matches all strings beginning with
'Hello' - we can do this with the following pattern:
String: Hello World
Pattern: Hello.*
In this example the "." symbol means match any character, and the "*"
character means match 0 or more instances of the previous element (in this case, ".").
The result is a regular expression that matches all strings that begin with
"Hello" followed by any number of characters, including none.
try it
To prove that this matches anything beginning 'Hello',
try this
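The same experiment can be run outside the cookbook. This sketch uses Python's re module as a stand-in and assumes the pattern must match the whole string; begins_with_hello and the sample strings are made up for illustration.

```python
import re

HELLO = re.compile(r"Hello.*")

def begins_with_hello(text: str) -> bool:
    """Return True when the whole string matches Hello followed by anything."""
    return HELLO.fullmatch(text) is not None

for text in ("Hello World", "Hello", "Hello NetKernel", "Say Hello"):
    print(f"{text!r}: {begins_with_hello(text)}")
# 'Hello World': True
# 'Hello': True
# 'Hello NetKernel': True
# 'Say Hello': False
```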
Capturing Groups
String matching becomes even more powerful when the matching
expression specifies how to rewrite one string to another string.
This is used extensively for URI address space mappings.
Regex provides a syntax for specifying 'capturing groups' - these are portions of
the matching string which can be grabbed and used to write new strings.
A capturing group is written as a matched pair of parentheses '( )' - any number of
capturing groups can be specified in a pattern.
A captured group can then be placed into a new string. This is done with a dollar-sign '$' followed by the index
number of the captured group to substitute
in - the index is the order in which the group appears in the pattern, counting from 1.
String: Hello World
Pattern: Hello(.*)
Rewrite: Goodbye$1
Anything after 'Hello' is captured (the index of this group is 1 as it is the first, and only, capturing group).
The rewrite rule says to create a string
beginning with 'Goodbye' and the '$1' says to append anything that was captured by group 1.
try it
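For comparison, here is the same capture-and-rewrite written as a short script. It is a sketch in Python, where a group reference in a rewrite template is spelled \1 rather than $1; the function name rewrite_greeting is hypothetical and whole-string matching is an assumption.

```python
import re

def rewrite_greeting(text):
    """Rewrite 'Hello<rest>' to 'Goodbye<rest>', or return None if no match."""
    m = re.fullmatch(r"Hello(.*)", text)
    if m is None:
        return None
    # \1 in Python's template plays the role of $1 in the cookbook's rewrite.
    return m.expand(r"Goodbye\1")

print(rewrite_greeting("Hello World"))  # Goodbye World
print(rewrite_greeting("Hi World"))     # None
```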
Directing Beanshell Program Requests
Earlier you saw a module match rule that directed requests for
BeanShell programs to the BeanShell runtime.
It is now time to examine this in more detail.
<rule>
<match>(.*\.bsh)(.*)</match>
<to>active:beanshell+operator@$1$2</to>
</rule>
The intent of this rule is to capture all incoming
requests for a BeanShell program and
rewrite each request so that it can be dispatched to
the BeanShell runtime, activated by the URI
"active:beanshell".
Notice that we need to capture both the name of the
BeanShell program and any parameters sent to the program.
This is accomplished by using two capture groups.
String: ShowMe.bsh+operand@number=3
Pattern: (.*\.bsh)(.*)
Rewrite: active:beanshell+operator@$1$2
try it
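To see exactly what each group holds for this string, the groups can be printed before they are substituted. This is a sketch with Python's re module standing in for the regex engine; the variable names are illustrative and whole-string matching is an assumption.

```python
import re

m = re.fullmatch(r"(.*\.bsh)(.*)", "ShowMe.bsh+operand@number=3")

# Group 1 holds the program name, group 2 holds whatever follows it.
program, parameters = m.groups()
rewritten = f"active:beanshell+operator@{program}{parameters}"

print(program)     # ShowMe.bsh
print(parameters)  # +operand@number=3
print(rewritten)   # active:beanshell+operator@ShowMe.bsh+operand@number=3
```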
Summary
There are many ways that patterns can be written using the regular expression syntax. Take a look at the
syntax reference
to see the possibilities.
That covers the basics of regular expressions.
The cookbook provides a set of the common recipes which occur most often in NetKernel URI address space
mappings - now go and try some
experiments.