A rule specifying how to modify the chunking in a ChunkString, using a transformational regular expression. The RegexpChunkRule class itself can be used to implement any transformational rule based on regular expressions. There are also a number of subclasses, which can be used to implement simpler types of rules, based on matching regular expressions.

Each RegexpChunkRule has a regular expression and a replacement expression. When a RegexpChunkRule is applied to a ChunkString, it searches the ChunkString for any substring that matches the regular expression, and replaces it using the replacement expression. This search/replace operation has the same semantics as re.sub.
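
The search/replace behaviour can be illustrated with re.sub alone. The sketch below is not NLTK code: it assumes a plain string stands in for the ChunkString's contents, with tags in angle brackets and chunks marked by braces, and the names tagged, regexp, repl and chunked are hypothetical.

```python
import re

# A plain string standing in for a ChunkString's contents
# (assumption: tags in angle brackets, chunks delimited by braces).
tagged = "<DT><NN><VBD><IN><DT><NN>"

# An ordinary regular expression (not a tag pattern): an optional <DT>
# followed by one or more tags whose names begin with NN.
regexp = re.compile(r"((?:<DT>)?(?:<NN[^>]*>)+)")

# The replacement expression wraps each match in braces.
repl = r"{\1}"

# Same semantics as re.sub: every non-overlapping match is replaced.
chunked = regexp.sub(repl, tagged)
print(chunked)
```

Both determiner-noun sequences are wrapped in braces, and the tags between them are left alone.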

Each RegexpChunkRule also has a description string, which gives a short (typically less than 75 characters) description of the purpose of the rule.

The transformation defined by a RegexpChunkRule should only add and remove braces; it should not modify the sequence of angle-bracket-delimited tags. Furthermore, the transformation must not produce nested or mismatched bracketing.
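
The two constraints above can be checked mechanically. The following is a minimal sketch, assuming the chunk string is available as a plain Python string; the helper names tags_of, braces_are_flat and is_valid_transformation are hypothetical and are not part of the library.

```python
import re

# Matches one angle-bracket delimited tag, e.g. <DT> or <NN>.
TAG = re.compile(r"<[^<>{}]*>")

def tags_of(s):
    """Return the sequence of tags in s, ignoring any braces."""
    return TAG.findall(s)

def braces_are_flat(s):
    """True if braces are balanced and never nested."""
    depth = 0
    for ch in s:
        if ch == "{":
            depth += 1
            if depth > 1:      # a chunk opened inside another chunk
                return False
        elif ch == "}":
            depth -= 1
            if depth < 0:      # a close brace with no matching open
                return False
    return depth == 0          # every chunk that was opened is closed

def is_valid_transformation(before, after):
    """True if only braces changed and the bracketing is well formed."""
    return tags_of(before) == tags_of(after) and braces_are_flat(after)
```

For instance, is_valid_transformation("<DT><NN><VBD>", "{<DT><NN>}<VBD>") holds, while a result such as "{{<DT>}<NN>}<VBD>" is rejected because the chunks are nested.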

__init__(self, regexp, repl, descr)
Construct a new RegexpChunkRule.
apply(self, chunkstr)
Apply this rule to the given ChunkString.
descr(self)
Returns: A short description of the purpose and/or effect of this rule.
__repr__(self)
Returns: A string representation of this rule.
parse(s)
Create a RegexpChunkRule from a string description.
__init__(self, regexp, repl, descr)

Construct a new RegexpChunkRule.

  • regexp (regexp or string) - This RegexpChunkRule's regular expression. When this rule is applied to a ChunkString, any substring that matches regexp will be replaced using the replacement string repl. Note that this must be a normal regular expression, not a tag pattern.
  • repl (string) - This RegexpChunkRule's replacement expression. When this rule is applied to a ChunkString, any substring that matches regexp will be replaced using repl.
  • descr (string) - A short description of the purpose and/or effect of this rule.
apply(self, chunkstr)

Apply this rule to the given ChunkString. See the class reference documentation for a description of what it means to apply a rule.

  • chunkstr (ChunkString) - The chunkstring to which this rule is applied.
Returns: None
Raises:
  • ValueError - If this transformation generated an invalid chunkstring.
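
The contract of apply (modify the chunk string in place, return None, raise ValueError on an invalid result) might look roughly like the toy model below. This is a sketch under stated assumptions, not the library's implementation: ToyChunkString and ToyRegexpRule are hypothetical stand-ins, and the validity pattern _VALID is an assumption about what counts as a well-formed chunk string.

```python
import re

# Assumed shape of a valid chunk string: a sequence of bare tags and
# brace-delimited groups of one or more tags, with no nesting.
_VALID = re.compile(r"(?:\{(?:<[^<>{}]+>)+\}|<[^<>{}]+>)*")

class ToyChunkString:
    """Hypothetical stand-in for ChunkString: wraps a plain string."""
    def __init__(self, s):
        self.s = s

class ToyRegexpRule:
    """Hypothetical stand-in for RegexpChunkRule."""
    def __init__(self, regexp, repl, descr):
        # Accept either a compiled pattern or a pattern string.
        self._regexp = re.compile(regexp) if isinstance(regexp, str) else regexp
        self._repl = repl
        self._descr = descr

    def descr(self):
        return self._descr

    def apply(self, chunkstr):
        old = chunkstr.s
        new = self._regexp.sub(self._repl, old)
        # The tag sequence must be unchanged once braces are ignored.
        if re.sub(r"[{}]", "", new) != re.sub(r"[{}]", "", old):
            raise ValueError("rule modified the tag sequence: %r" % new)
        # The bracketing must be neither nested nor mismatched.
        if _VALID.fullmatch(new) is None:
            raise ValueError("rule produced invalid bracketing: %r" % new)
        chunkstr.s = new  # modified in place; apply itself returns None
```

Applying a rule such as ToyRegexpRule(r"(<DT><NN>)", r"{\1}", "chunk det+noun") once to "<DT><NN><VBD>" succeeds; applying it a second time would nest the braces, so the toy apply raises ValueError and leaves the string unchanged.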


descr(self)

Returns: string
A short description of the purpose and/or effect of this rule.

__repr__(self)

Returns: string
A string representation of this rule. This string representation has the form:
   <RegexpChunkRule: '{<IN|VB.*>}'->'<IN>'>

Note that this representation does not include the description string; that string can be accessed separately with the descr method.

parse(s)

Create a RegexpChunkRule from a string description. Currently, the following formats are supported:

 {regexp}         # chunk rule
 }regexp{         # chink rule
 regexp}{regexp   # split rule
 regexp{}regexp   # merge rule

Here, regexp is a regular expression for the rule. Any text following the comment marker (#) is used as the rule's description. For example:

>>> RegexpChunkRule.parse('{<DT>?<NN.*>+}')
<ChunkRule: '<DT>?<NN.*>+'>
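
How the four formats can be told apart is sketched below. This is an illustration only, not the library's parser: classify_rule is a hypothetical helper, and it assumes that the first # always starts the description (escaped # characters inside a pattern are not handled).

```python
import re

def classify_rule(s):
    """Split a rule description into (kind, patterns, description)."""
    # Everything before the first '#' is the rule; the rest is the description.
    m = re.match(r"(?P<rule>[^#]*)(?:#\s*(?P<descr>.*))?$", s)
    if m is None:
        raise ValueError("Illegal chunk pattern: %r" % s)
    rule = m.group("rule").strip()
    descr = (m.group("descr") or "").strip()
    if not rule:
        raise ValueError("Empty chunk pattern")

    if rule[0] == "{" and rule[-1] == "}":      # {regexp}
        return ("chunk", (rule[1:-1],), descr)
    if rule[0] == "}" and rule[-1] == "{":      # }regexp{
        return ("chink", (rule[1:-1],), descr)
    if "}{" in rule:                            # regexp}{regexp
        left, right = rule.split("}{", 1)
        return ("split", (left, right), descr)
    if "{}" in rule:                            # regexp{}regexp
        left, right = rule.split("{}", 1)
        return ("merge", (left, right), descr)
    raise ValueError("Illegal chunk pattern: %r" % rule)
```

With this sketch, classify_rule('{<DT>?<NN.*>+}  # chunk nouns') yields the kind "chunk", the single pattern '<DT>?<NN.*>+', and the description "chunk nouns".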