Module nltk.chunk.util

Classes
ChunkScore
A utility class for scoring chunk parsers.
Functions
accuracy(chunker, gold)
Score the accuracy of the chunker against the gold standard.
_chunksets(t, count, chunk_node)
tagstr2tree(s, chunk_node='NP', top_node='S', sep='/')
Divide a string of bracketed tagged text into chunks and unchunked tokens, and produce a Tree.
conllstr2tree(s, chunk_types=('NP', 'PP', 'VP'), top_node='S')
Convert a CoNLL IOB string into a tree.
tree2conlltags(t)
Convert a tree to the CoNLL IOB tag format.
tree2conllstr(t)
Convert a tree to the CoNLL IOB string format.
_ieer_read_text(s, top_node)
ieerstr2tree(s, chunk_types=['LOCATION', 'ORGANIZATION', 'PERSON', 'DURATION', 'DATE', 'CA..., top_node='S')
Convert a string of chunked tagged text in the IEER named entity format into a chunk structure.
Variables
  _LINE_RE = re.compile(r'(\S+)\s+(\S+)\s+([IOB])-?(\S+)?')
  _IEER_DOC_RE = re.compile(r'(?s)<DOC>\s*(<DOCNO>\s*(?P<docno>....
  _IEER_TYPE_RE = re.compile(r'<b_\w+\s+[^>]*?type="(?P<type>\w+...
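As a rough illustration of what the _LINE_RE pattern listed above captures, the following sketch applies it to single CoNLL IOB lines. The helper name split_conll_line and the sample lines are hypothetical and only serve the example; the pattern itself is copied from the variable listing.

```python
import re

# Pattern copied from the _LINE_RE variable listed above.
LINE_RE = re.compile(r'(\S+)\s+(\S+)\s+([IOB])-?(\S+)?')


def split_conll_line(line):
    """Return (word, tag, state, chunk_type) for one CoNLL IOB line.

    Hypothetical helper for illustration; chunk_type is None for 'O' lines.
    """
    match = LINE_RE.match(line.strip())
    if match is None:
        raise ValueError('not a CoNLL IOB line: %r' % line)
    return match.groups()


print(split_conll_line('dog NN I-NP'))   # ('dog', 'NN', 'I', 'NP')
print(split_conll_line('barked VBD O'))  # ('barked', 'VBD', 'O', None)
```

The optional fourth group is what lets a bare O line match without a chunk type.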
Function Details

accuracy(chunker, gold)

Score the accuracy of the chunker against the gold standard. Strip the chunk information from the gold standard and rechunk it using the chunker, then compute the accuracy score.

  • chunker (ChunkParserI) - The chunker being evaluated.
  • gold (tree) - The chunk structures to score the chunker on.
Returns: float
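The scoring step can be pictured with a small sketch. It assumes that accuracy is measured per token, as the fraction of positions where the gold and rechunked (word, tag, IOB tag) triples agree; that reading, the helper name iob_accuracy, and the sample data are assumptions of this example, not the module's code.

```python
def iob_accuracy(gold_tags, test_tags):
    """Fraction of positions at which two equal-length sequences agree.

    Hypothetical helper: each element is a (word, tag, iob_tag) triple.
    """
    if len(gold_tags) != len(test_tags):
        raise ValueError('sequences must have the same length')
    if not gold_tags:
        return 0.0  # choice made for this sketch: empty input scores 0.0
    matches = sum(1 for g, t in zip(gold_tags, test_tags) if g == t)
    return matches / float(len(gold_tags))


gold = [('the', 'DT', 'B-NP'), ('dog', 'NN', 'I-NP'),
        ('barked', 'VBD', 'O'), ('loudly', 'RB', 'O')]
# Made-up chunker output that wrongly starts a new chunk at 'dog'.
test = [('the', 'DT', 'B-NP'), ('dog', 'NN', 'B-NP'),
        ('barked', 'VBD', 'O'), ('loudly', 'RB', 'O')]

print(iob_accuracy(gold, test))  # 0.75
```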

tagstr2tree(s, chunk_node='NP', top_node='S', sep='/')

Divide a string of bracketed tagged text into chunks and unchunked tokens, and produce a Tree. Chunks are marked by square brackets ([...]). Words are delimited by whitespace, and each word should have the form text/tag, where sep gives the separator between text and tag (a slash by default). Words that do not contain the separator are assigned a tag of None.

  • s (string) - The string to be converted
  • chunk_node (string) - The label to use for chunk nodes
  • top_node (string) - The label to use for the root of the tree
Returns: tree
A tree corresponding to the string representation.
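The bracketed input format described above can be sketched without NLTK. The example below builds a plain nested list instead of a Tree: unchunked words become (text, tag) pairs and each chunk becomes a (chunk_node, [pairs]) pair. The function name parse_bracketed, the list representation, and the choice to split on the last separator are assumptions of this sketch.

```python
import re


def parse_bracketed(s, chunk_node='NP', sep='/'):
    """Parse '[ the/DT dog/NN ] barked/VBD' into a nested list.

    Hypothetical stand-in for the conversion; it does not build a Tree.
    """
    top = []
    current = top  # the list that new tokens are appended to
    for piece in re.findall(r'\[|\]|[^\[\]\s]+', s):
        if piece == '[':
            if current is not top:
                raise ValueError('nested brackets are not supported')
            chunk = (chunk_node, [])
            top.append(chunk)
            current = chunk[1]
        elif piece == ']':
            if current is top:
                raise ValueError('unmatched close bracket')
            current = top
        elif sep in piece:
            # Split on the last separator so the text may contain sep.
            text, tag = piece.rsplit(sep, 1)
            current.append((text, tag))
        else:
            # No separator: the word gets a tag of None.
            current.append((piece, None))
    if current is not top:
        raise ValueError('unclosed bracket')
    return top


print(parse_bracketed('[ the/DT dog/NN ] barked/VBD loudly'))
```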

conllstr2tree(s, chunk_types=('NP', 'PP', 'VP'), top_node='S')

Convert a CoNLL IOB string into a tree. Uses the specified chunk types (defaults to NP, PP and VP), and creates a tree rooted at a node labeled S (by default).

  • s (string) - The CoNLL string to be converted.
  • chunk_types (tuple) - The chunk types to be converted.
  • top_node (string) - The node label to use for the root.
Returns: Tree
A chunk structure for a single sentence encoded in the given CoNLL 2000 style string.
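A minimal sketch of the grouping idea, using only the standard library: each line is split with the _LINE_RE pattern from the variable listing, and consecutive B/I lines of a selected chunk type are collected into one chunk. The function name group_conll, the nested-list output (in place of a Tree), and the handling of a stray I tag are assumptions of this example.

```python
import re

# Pattern copied from the _LINE_RE variable in this module's listing.
LINE_RE = re.compile(r'(\S+)\s+(\S+)\s+([IOB])-?(\S+)?')


def group_conll(s, chunk_types=('NP', 'PP', 'VP')):
    """Group CoNLL IOB lines into (chunk_type, [(word, tag), ...]) pairs.

    Hypothetical stand-in that returns a nested list, not a Tree.
    """
    sentence = []
    open_chunk = None  # (chunk_type, tokens) currently being extended
    for line in s.splitlines():
        if not line.strip():
            continue
        match = LINE_RE.match(line.strip())
        if match is None:
            raise ValueError('bad CoNLL line: %r' % line)
        word, tag, state, chunk_type = match.groups()
        # Chunk types that were not requested count as outside any chunk.
        if chunk_type not in chunk_types:
            state = 'O'
        # An I tag that does not continue the open chunk starts a new one.
        starts_new = state == 'B' or (
            state == 'I'
            and (open_chunk is None or open_chunk[0] != chunk_type))
        if state == 'O':
            open_chunk = None
            sentence.append((word, tag))
        elif starts_new:
            open_chunk = (chunk_type, [(word, tag)])
            sentence.append(open_chunk)
        else:
            open_chunk[1].append((word, tag))
    return sentence


sample = 'the DT B-NP\ndog NN I-NP\nbarked VBD B-VP\n. . O'
print(group_conll(sample, chunk_types=('NP',)))
```

With chunk_types=('NP',) the VP line is kept as an unchunked token, which mirrors the role of the chunk_types parameter described above.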


tree2conlltags(t)

Convert a tree to the CoNLL IOB tag format.

  • t (Tree) - The tree to be converted.
Returns: list of tuple
A list of 3-tuples containing word, tag and IOB tag.
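To make the shape of the returned 3-tuples concrete, here is a sketch that flattens a nested-list stand-in for a chunk tree. It assumes the first token of a chunk is tagged B-TYPE, later tokens I-TYPE, and unchunked tokens O; the function name to_conll_tags and the list representation are hypothetical.

```python
def to_conll_tags(sentence):
    """Flatten a chunked sentence into (word, tag, iob_tag) triples.

    Hypothetical input format: unchunked tokens are (word, tag) pairs and
    chunks are (chunk_type, [(word, tag), ...]) pairs.
    """
    triples = []
    for item in sentence:
        if isinstance(item[1], list):
            chunk_type, tokens = item
            for i, (word, tag) in enumerate(tokens):
                prefix = 'B-' if i == 0 else 'I-'
                triples.append((word, tag, prefix + chunk_type))
        else:
            word, tag = item
            triples.append((word, tag, 'O'))
    return triples


chunked = [('NP', [('the', 'DT'), ('dog', 'NN')]), ('barked', 'VBD')]
print(to_conll_tags(chunked))
# [('the', 'DT', 'B-NP'), ('dog', 'NN', 'I-NP'), ('barked', 'VBD', 'O')]
```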


tree2conllstr(t)

Convert a tree to the CoNLL IOB string format.

  • t (Tree) - The tree to be converted.
Returns: string
A multiline string where each line contains a word, tag and IOB tag.
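The multiline string format can be sketched as one line per token. The example assumes the three fields are separated by single spaces; the helper name conll_lines and the sample triples are hypothetical.

```python
def conll_lines(triples):
    """Join (word, tag, iob_tag) triples into a multiline string.

    Hypothetical helper; single-space field separation is an assumption.
    """
    return '\n'.join('%s %s %s' % triple for triple in triples)


triples = [('the', 'DT', 'B-NP'), ('dog', 'NN', 'I-NP'), ('barked', 'VBD', 'O')]
print(conll_lines(triples))
# the DT B-NP
# dog NN I-NP
# barked VBD O
```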

ieerstr2tree(s, chunk_types=['LOCATION', 'ORGANIZATION', 'PERSON', 'DURATION', 'DATE', 'CA..., top_node='S')

Convert a string of chunked tagged text in the IEER named entity format into a chunk structure. Chunks may be of the following types: LOCATION, ORGANIZATION, PERSON, DURATION, DATE, CARDINAL, PERCENT, MONEY, and MEASURE.

Returns: Tree
A chunk structure containing the chunked tagged text that is encoded in the given IEER style string.
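The following is a loose sketch of how typed chunks might be pulled out of IEER-style markup. It assumes opening tags of the form <b_... type="TYPE"> (consistent with the visible prefix of _IEER_TYPE_RE) and closing tags of the form <e_...>; the closing form, the pattern OPEN_RE, the function name parse_ieer_text, and the sample string are all assumptions of this example rather than details confirmed by the listing.

```python
import re

# This sketch's own pattern for opening tags; the closing-tag form
# '<e_...>' handled below is an assumption.
OPEN_RE = re.compile(r'<b_\w+\s+[^>]*?type="(?P<type>\w+)"')


def parse_ieer_text(s):
    """Return a nested list of words and (chunk_type, [words]) pairs.

    Hypothetical stand-in; it does not build a Tree.
    """
    top = []
    stack = [top]  # innermost open chunk is stack[-1]
    for piece in re.findall(r'<[^>]+>|[^\s<]+', s):
        if piece.startswith('<b_'):
            match = OPEN_RE.match(piece)
            if match is None:
                raise ValueError('opening tag without a type: %r' % piece)
            chunk = (match.group('type'), [])
            stack[-1].append(chunk)
            stack.append(chunk[1])
        elif piece.startswith('<e_'):
            if len(stack) == 1:
                raise ValueError('unmatched closing tag: %r' % piece)
            stack.pop()
        elif piece.startswith('<'):
            continue  # other markup is ignored in this sketch
        else:
            stack[-1].append(piece)
    if len(stack) != 1:
        raise ValueError('unclosed chunk')
    return top


text = ('<b_enamex type="PERSON">John Smith<e_enamex> visited '
        '<b_enamex type="LOCATION">Paris<e_enamex> .')
print(parse_ieer_text(text))
```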
