Package nltk :: Package chunk :: Module regexp

Module regexp

Classes
  ChunkString
A string-based encoding of a particular chunking of a text.
  RegexpChunkRule
A rule specifying how to modify the chunking in a ChunkString, using a transformational regular expression.
  ChunkRule
A rule specifying how to add chunks to a ChunkString, using a matching tag pattern.
  ChinkRule
A rule specifying how to add chinks to a ChunkString (that is, how to remove matching tags from existing chunks), using a matching tag pattern.
  UnChunkRule
A rule specifying how to remove chunks from a ChunkString, using a matching tag pattern.
  MergeRule
A rule specifying how to merge chunks in a ChunkString, using two matching tag patterns: a left pattern, and a right pattern.
  SplitRule
A rule specifying how to split chunks in a ChunkString, using two matching tag patterns: a left pattern, and a right pattern.
  ExpandLeftRule
A rule specifying how to expand chunks in a ChunkString to the left, using two matching tag patterns: a left pattern, and a right pattern.
  ExpandRightRule
A rule specifying how to expand chunks in a ChunkString to the right, using two matching tag patterns: a left pattern, and a right pattern.
  ChunkRuleWithContext
A rule specifying how to add chunks to a ChunkString, using three matching tag patterns: one for the left context, one for the chunk, and one for the right context.
  RegexpChunkParser
A regular expression based chunk parser.
  RegexpParser
A grammar based chunk parser.
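
The rule classes above all rewrite a string encoding of the chunking. The following is a minimal, pure-Python sketch of that idea, not the module's implementation. It assumes the usual chunk-string notation in which each tag is written as '<TAG>' and each chunk is wrapped in braces; the helper names (encode, apply_chunk_rule, apply_chink_rule) and the two lookahead constants are illustrative assumptions.

```python
import re

# Assumed lookaheads: the match must lie outside any chunk (IN_CHINK)
# or inside an existing chunk (IN_CHUNK).
IN_CHINK = r'(?=[^\}]*(\{|$))'
IN_CHUNK = r'(?=[^\{]*\})'


def encode(tags):
    """Hypothetical helper: write a tag sequence as '<T1><T2>...'."""
    return ''.join('<%s>' % tag for tag in tags)


def apply_chunk_rule(chunk_string, tag_re):
    """Wrap every unchunked match of tag_re in braces (adds chunks)."""
    regexp = '(?P<chunk>%s)%s' % (tag_re, IN_CHINK)
    return re.sub(regexp, r'{\g<chunk>}', chunk_string)


def apply_chink_rule(chunk_string, tag_re):
    """Cut every chunked match of tag_re out of its chunk (adds chinks)."""
    regexp = '(?P<chink>%s)%s' % (tag_re, IN_CHUNK)
    result = re.sub(regexp, r'}\g<chink>{', chunk_string)
    return result.replace('{}', '')  # drop chunks that became empty


tags = ['DT', 'JJ', 'NN', 'VBD', 'DT', 'NN']
plain = encode(tags)  # '<DT><JJ><NN><VBD><DT><NN>'

# Regular expression form of the tag pattern '<DT>?<JJ>*<NN>'.
np_re = r'(<(DT)>)?(<(JJ)>)*(<(NN)>)'
chunked = apply_chunk_rule(plain, np_re)
print(chunked)  # {<DT><JJ><NN>}<VBD>{<DT><NN>}

# The same chunking, reached by chunking everything and chinking the verb.
chinked = apply_chink_rule('{' + plain + '}', r'(<(VBD)>)')
print(chinked)  # {<DT><JJ><NN>}<VBD>{<DT><NN>}
```

Both routes give the same chunking here, which is why chunk and chink rules are often combined in a single grammar.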
Functions
  tag_pattern2re_pattern(tag_pattern) -> string
Convert a tag pattern to a regular expression pattern.
  demo_eval(chunkparser, text)
Demonstration code for evaluating a chunk parser, using a ChunkScore.
  demo()
A demonstration for the RegexpChunkParser class.
Variables
  CHUNK_TAG_PATTERN = re.compile(r'^(([^\{\}<>]+|<[^\{\}<>]+>)*)$')
Function Details

tag_pattern2re_pattern(tag_pattern)

Convert a tag pattern to a regular expression pattern. A tag pattern is a modified version of a regular expression, designed for matching sequences of tags. The differences between regular expression patterns and tag patterns are:

  • In tag patterns, '<' and '>' act as parentheses; so '<NN>+' matches one or more repetitions of '<NN>', not '<NN' followed by one or more repetitions of '>'.
  • Whitespace in tag patterns is ignored. So '<DT> | <NN>' is equivalent to '<DT>|<NN>'.
  • In tag patterns, '.' is equivalent to '[^{}<>]'; so '<NN.*>' matches any single tag starting with 'NN'.

In particular, tag_pattern2re_pattern performs the following transformations on the given pattern:

  • Replace '.' with '[^<>{}]'
  • Remove any whitespace
  • Add extra parentheses around '<' and '>', so that '<' and '>' act like parentheses. For example, in '<NN>+' the '+' has scope over the entire '<NN>'; and in '<NN|IN>' the '|' has scope over 'NN' and 'IN', but not over '<' or '>'.
  • Check to make sure the resulting pattern is valid.
Parameters:
  • tag_pattern (string) - The tag pattern to convert to a regular expression pattern.
Returns: string
A regular expression pattern corresponding to tag_pattern.
Raises:
  • ValueError - If tag_pattern is not a valid tag pattern. In particular, tag_pattern should not include braces; and it should not contain nested or mismatched angle-brackets.
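
The transformations listed above can be sketched in a few lines of plain Python. This is a simplified illustration under stated assumptions, not the module's source: the function name sketch_tag_pattern2re is hypothetical, the validity check uses the CHUNK_TAG_PATTERN value shown under Variables and is applied right after whitespace removal, and a '.' is treated as escaped only when it directly follows a single backslash.

```python
import re

# Value copied from the Variables section of this page.
CHUNK_TAG_PATTERN = re.compile(r'^(([^\{\}<>]+|<[^\{\}<>]+>)*)$')

# What '.' stands for inside a tag pattern: any character except {}<>.
TAG_CHAR = r'[^\{\}<>]'


def sketch_tag_pattern2re(tag_pattern):
    """Illustrative re-implementation of the documented steps."""
    # Remove any whitespace.
    pattern = re.sub(r'\s', '', tag_pattern)
    # Reject braces and nested or mismatched angle brackets.
    if not CHUNK_TAG_PATTERN.match(pattern):
        raise ValueError('Bad tag pattern: %r' % tag_pattern)
    # Add parentheses so that '<' and '>' group like parentheses.
    pattern = pattern.replace('<', '(<(').replace('>', ')>)')
    # Replace each unescaped '.' (simplified escape handling). A function
    # is used as the replacement so the backslashes in TAG_CHAR stay literal.
    return re.sub(r'(?<!\\)\.', lambda match: TAG_CHAR, pattern)


print(sketch_tag_pattern2re('<NN>+'))        # (<(NN)>)+
print(sketch_tag_pattern2re('<DT> | <NN>'))  # (<(DT)>)|(<(NN)>)

noun_re = sketch_tag_pattern2re('<NN.*>')
print(bool(re.fullmatch(noun_re, '<NNS>')))     # True: one tag starting with NN
print(bool(re.fullmatch(noun_re, '<NN><VB>')))  # False: '.' cannot cross '>'
```

Because '.' expands to a class that excludes angle brackets, a wildcard inside one tag can never run into the next tag.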

demo_eval(chunkparser, text)

Demonstration code for evaluating a chunk parser, using a ChunkScore. This function assumes that text contains one sentence per line, and that each sentence has the form expected by tree.chunk. It runs the given chunk parser on each sentence in the text and scores the result. It then prints the final score (precision, recall, and f-measure) and reports the set of chunks that were missed and the set of chunks that were incorrect. At most 10 missing chunks and 10 incorrect chunks are reported.

Parameters:
  • chunkparser (ChunkParserI) - The chunk parser to be tested.
  • text (string) - The chunked, tagged text that should be used for evaluation.
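
The scores reported by demo_eval can be illustrated with a small stand-alone sketch. It is not the ChunkScore implementation: the function score_chunks and the (start, end, label) representation of a chunk are assumptions made for this example, and the f-measure is assumed to be the balanced harmonic mean of precision and recall.

```python
def score_chunks(guessed, correct, limit=10):
    """Compare guessed chunks with correct chunks.

    Each chunk is a hypothetical (start, end, label) tuple. Returns
    precision, recall, f-measure, and up to `limit` missed and
    incorrect chunks.
    """
    guessed, correct = set(guessed), set(correct)
    hits = guessed & correct
    precision = len(hits) / len(guessed) if guessed else 0.0
    recall = len(hits) / len(correct) if correct else 0.0
    total = precision + recall
    f_measure = 2 * precision * recall / total if total else 0.0
    missed = sorted(correct - guessed)[:limit]     # in gold, not found
    incorrect = sorted(guessed - correct)[:limit]  # found, not in gold
    return precision, recall, f_measure, missed, incorrect


guessed = [(0, 3, 'NP'), (4, 6, 'NP'), (7, 8, 'NP')]
correct = [(0, 3, 'NP'), (4, 6, 'NP'), (9, 11, 'NP'), (12, 13, 'NP')]

precision, recall, f_measure, missed, incorrect = score_chunks(guessed, correct)
print('precision %.3f  recall %.3f  f-measure %.3f'
      % (precision, recall, f_measure))
print('missed:   ', missed)
print('incorrect:', incorrect)
```

Here two of the three guessed chunks are right and two of the four correct chunks are found, so precision is 2/3 and recall is 1/2.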

demo()

A demonstration for the RegexpChunkParser class. A single text is parsed with four different chunk parsers, using a variety of rules and strategies.