Package nltk :: Package chunk :: Module regexp :: Class RegexpChunkRule

Class RegexpChunkRule

object --+
         |
        RegexpChunkRule

A rule specifying how to modify the chunking in a ChunkString, using a transformational regular expression. The RegexpChunkRule class itself can be used to implement any transformational rule based on regular expressions. There are also a number of subclasses, which can be used to implement simpler types of rules, based on matching regular expressions.

Each RegexpChunkRule has a regular expression and a replacement expression. When a RegexpChunkRule is applied to a ChunkString, it searches the ChunkString for any substring that matches the regular expression, and replaces it using the replacement expression. This search/replace operation has the same semantics as re.sub.
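As a rough illustration of these search/replace semantics, the sketch below applies plain re.sub to an ordinary string that stands in for a ChunkString's encoding. It does not use NLTK; the tag string, pattern, and replacement are hypothetical values chosen only for this example.

```python
import re

# Hypothetical stand-in for a ChunkString's encoding: angle-bracket
# delimited tags, with no chunk braces yet.
encoded = "<DT><JJ><NN><VBD><DT><NN>"

# A transformational rule is a (regexp, repl) pair. This one wraps each
# "optional determiner, any adjectives, one noun" sequence in braces.
regexp = re.compile(r"((?:<DT>)?(?:<JJ>)*<NN>)")
repl = r"{\1}"

# Applying the rule is a single re.sub call over the whole string.
chunked = re.sub(regexp, repl, encoded)
print(chunked)  # {<DT><JJ><NN>}<VBD>{<DT><NN>}
```

Note that the replacement only inserts braces around the matched tags; the tags themselves are copied through unchanged by the \1 backreference.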

Each RegexpChunkRule also has a description string, which gives a short (typically less than 75 characters) description of the purpose of the rule.

The transformation defined by a RegexpChunkRule should only add and remove braces; it should not modify the sequence of angle-bracket delimited tags. Furthermore, the transformation may not result in nested or mismatched bracketing.
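One way to picture these constraints is as a check comparing the string before and after a transformation. The helper below, is_valid_transformation, is a hypothetical sketch written for this illustration; it is not NLTK's own verification code, and it assumes tags never contain angle brackets or braces.

```python
import re

def is_valid_transformation(before, after):
    """Return True if `after` differs from `before` only in its braces,
    and those braces are balanced and unnested (hypothetical helper)."""
    # Removing every brace must leave the tag sequence untouched.
    if re.sub(r"[{}]", "", before) != re.sub(r"[{}]", "", after):
        return False
    # With the tags stripped, only braces should remain, and they must
    # read "{}{}{}...": every "{" closed before the next one opens.
    braces = re.sub(r"<[^<>{}]*>", "", after)
    return re.fullmatch(r"(?:\{\})*", braces) is not None

print(is_valid_transformation("<DT><NN><VBD>", "{<DT><NN>}<VBD>"))  # True
print(is_valid_transformation("<DT><NN><VBD>", "{<DT>{<NN>}}<VBD>"))  # False
```

The first call passes because only braces were added; the second fails because the braces are nested.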

Instance Methods

  __init__(self, regexp, repl, descr)
      Construct a new RegexpChunkRule.
  apply(self, chunkstr)
      Apply this rule to the given ChunkString.
      Returns: None
  descr(self)
      Returns: string - a short description of the purpose and/or effect of this rule.
  __repr__(self)
      Returns: string - a string representation of this rule.

Inherited from object: __delattr__, __getattribute__, __hash__, __new__, __reduce__, __reduce_ex__, __setattr__, __str__

Static Methods

  parse(s)
      Create a RegexpChunkRule from a string description.

Properties

Inherited from object: __class__

Method Details

__init__(self, regexp, repl, descr)
(Constructor)

Construct a new RegexpChunkRule.

Parameters:
  • regexp (regexp or string) - This RegexpChunkRule's regular expression. When this rule is applied to a ChunkString, any substring that matches regexp will be replaced using the replacement string repl. Note that this must be a normal regular expression, not a tag pattern.
  • repl (string) - This RegexpChunkRule's replacement expression. When this rule is applied to a ChunkString, any substring that matches regexp will be replaced using repl.
  • descr (string) - A short description of the purpose and/or effect of this rule.
Overrides: object.__init__

apply(self, chunkstr)

Apply this rule to the given ChunkString. See the class reference documentation for a description of what it means to apply a rule.

Parameters:
  • chunkstr (ChunkString) - The chunkstring to which this rule is applied.
Returns: None
Raises:
  • ValueError - If this transformation generated an invalid chunkstring.

descr(self)

Returns: string - a short description of the purpose and/or effect of this rule.

__repr__(self)
(Representation operator)

repr(x)

Returns: string - a string representation of this rule. This string representation has the form:
   <RegexpChunkRule: '{<IN|VB.*>}'->'<IN>'>

Note that this representation does not include the description string; that string can be accessed separately with the descr method.

Overrides: object.__repr__

parse(s)
Static Method

Create a RegexpChunkRule from a string description. Currently, the following formats are supported:

 {regexp}         # chunk rule
 }regexp{         # chink rule
 regexp}{regexp   # split rule
 regexp{}regexp   # merge rule

Here, regexp is a regular expression for the rule. Any text following the comment marker (#) is used as the rule's description. For example:

>>> RegexpChunkRule.parse('{<DT>?<NN.*>+}')
<ChunkRule: '<DT>?<NN.*>+'>
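To make the four formats and the comment convention concrete, the sketch below classifies a rule string by the position of its braces and splits off the description. The function classify_rule is hypothetical and written only for this illustration; it is not NLTK's parser, it returns plain strings rather than rule objects, and treating a backslash-escaped # as part of the rule is an assumption.

```python
import re

def classify_rule(s):
    """Return (kind, description) for a rule string (hypothetical helper)."""
    # Text after the first unescaped '#' is taken as the description.
    m = re.match(r"(?P<rule>(?:\\.|[^#])*)(?:#(?P<descr>.*))?$", s, re.DOTALL)
    rule = m.group("rule").strip()
    descr = (m.group("descr") or "").strip()

    if rule.startswith("{") and rule.endswith("}"):
        kind = "chunk"   # {regexp}
    elif rule.startswith("}") and rule.endswith("{"):
        kind = "chink"   # }regexp{
    elif "}{" in rule:
        kind = "split"   # regexp}{regexp
    elif "{}" in rule:
        kind = "merge"   # regexp{}regexp
    else:
        raise ValueError("Unrecognized rule format: %r" % rule)
    return kind, descr

print(classify_rule("{<DT>?<NN.*>+}  # noun phrase"))
# ('chunk', 'noun phrase')
```

The chunk and chink checks come first because they are anchored at both ends of the rule, which keeps a chunk rule that happens to contain a quantifier such as {2} from being mistaken for a split or merge rule.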