Package nltk :: Package chunk :: Module util

Module util

Classes
  ChunkScore
A utility class for scoring chunk parsers.
Functions

float
accuracy(chunker, gold)
Score the accuracy of the chunker against the gold standard.

_chunksets(t, count, chunk_node)

Tree
tagstr2tree(s, chunk_node='NP', top_node='S', sep='/')
Divide a string of bracketed tagged text into chunks and unchunked tokens, and produce a Tree.

Tree
conllstr2tree(s, chunk_types=('NP', 'PP', 'VP'), top_node='S')
Convert a CoNLL IOB string into a tree.

list of tuple
tree2conlltags(t)
Convert a tree to the CoNLL IOB tag format.

string
tree2conllstr(t)
Convert a tree to the CoNLL IOB string format.

_ieer_read_text(s, top_node)

Tree
ieerstr2tree(s, chunk_types=['LOCATION', 'ORGANIZATION', 'PERSON', 'DURATION', 'DATE', 'CA..., top_node='S')
Convert a string of chunked tagged text in the IEER named entity format into a chunk structure.

demo()

Variables
  _LINE_RE = re.compile(r'(\S+)\s+(\S+)\s+([IOB])-?(\S+)?')
  _IEER_DOC_RE = re.compile(r'(?s)<DOC>\s*(<DOCNO>\s*(?P<docno>....
  _IEER_TYPE_RE = re.compile(r'<b_\w+\s+[^>]*?type="(?P<type>\w+...
Function Details

accuracy(chunker, gold)

Score the accuracy of the chunker against the gold standard. Strip the chunk information from the gold standard and rechunk it using the chunker, then compute the accuracy score.

Parameters:
  • chunker (ChunkParserI) - The chunker being evaluated.
  • gold (tree) - The chunk structures to score the chunker on.
Returns: float
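
The strip, rechunk, and score procedure described above can be sketched without NLTK. This is not the module's implementation: it assumes accuracy is measured per token over IOB tags, and the names iob_accuracy and toy_chunker, as well as the chunker interface (a callable returning one IOB tag per token), are hypothetical.

```python
def iob_accuracy(chunker, gold_sentences):
    """Fraction of tokens whose predicted IOB tag matches the gold tag.

    gold_sentences: list of sentences, each a list of (word, pos, iob) triples.
    chunker: callable taking a list of (word, pos) pairs and returning a list
             of IOB tags of the same length (hypothetical interface).
    """
    correct = total = 0
    for sentence in gold_sentences:
        # Strip the chunk information from the gold standard.
        unchunked = [(word, pos) for word, pos, _ in sentence]
        # Rechunk it with the chunker under evaluation.
        predicted = chunker(unchunked)
        # Compare the predicted tags with the gold tags, token by token.
        for (_, _, gold_iob), test_iob in zip(sentence, predicted):
            correct += gold_iob == test_iob
            total += 1
    return correct / total if total else 0.0


def toy_chunker(tokens):
    """Hypothetical chunker: every run of determiners and nouns is an NP."""
    tags = []
    inside = False
    for _, pos in tokens:
        if pos in ("DT", "NN"):
            tags.append("I-NP" if inside else "B-NP")
            inside = True
        else:
            tags.append("O")
            inside = False
    return tags


gold = [[("the", "DT", "B-NP"), ("cat", "NN", "I-NP"),
         ("sat", "VBD", "O"), ("down", "RP", "O")]]
score = iob_accuracy(toy_chunker, gold)
```

A chunker that tags every token as O would score lower on the same data, since only the two tokens outside the NP would match.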

tagstr2tree(s, chunk_node='NP', top_node='S', sep='/')

Divide a string of bracketed tagged text into chunks and unchunked tokens, and produce a Tree. Chunks are marked by square brackets ([...]). Words are delimited by whitespace, and each word should have the form text/tag. Words that do not contain a slash are assigned a tag of None.

Parameters:
  • s (string) - The string to be converted.
  • chunk_node (string) - The label to use for chunk nodes.
  • top_node (string) - The label to use for the root of the tree.
  • sep (string) - The separator between a word's text and its tag (default '/').
Returns: Tree
A tree corresponding to the string representation.
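
The input format can be illustrated with a small stand-alone parser. This is a sketch only, not the NLTK implementation: parse_bracketed is a hypothetical name, it returns nested Python lists and tuples instead of a Tree, and it assumes chunks are not nested.

```python
import re


def parse_bracketed(s, chunk_label="NP", sep="/"):
    """Return a list of (word, tag) pairs and (chunk_label, [pairs]) chunks."""
    result = []
    chunk = None  # pairs of the currently open chunk, or None outside a chunk
    # Tokens are "[", "]", or any run of characters that is neither
    # whitespace nor a bracket.
    for token in re.findall(r"\[|\]|[^\[\]\s]+", s):
        if token == "[":
            if chunk is not None:
                raise ValueError("nested chunks are not supported in this sketch")
            chunk = []
        elif token == "]":
            if chunk is None:
                raise ValueError("unmatched closing bracket")
            result.append((chunk_label, chunk))
            chunk = None
        else:
            # Split on the last separator; words without one get tag None.
            word, found, tag = token.rpartition(sep)
            pair = (word, tag) if found else (token, None)
            (result if chunk is None else chunk).append(pair)
    if chunk is not None:
        raise ValueError("unclosed opening bracket")
    return result


parsed = parse_bracketed("[ the/DT cat/NN ] sat/VBD on [ the/DT mat/NN ]")
```

In this example the word "on" carries no slash, so it is paired with a tag of None, while the two bracketed spans become chunks labeled NP.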

conllstr2tree(s, chunk_types=('NP', 'PP', 'VP'), top_node='S')

Convert a CoNLL IOB string into a tree. Uses the specified chunk types (defaults to NP, PP and VP), and creates a tree rooted at a node labeled S (by default).

Parameters:
  • s (string) - The CoNLL string to be converted.
  • chunk_types (tuple) - The chunk types to be converted.
  • top_node (string) - The node label to use for the root.
Returns: Tree
A chunk structure for a single sentence encoded in the given CoNLL 2000 style string.
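
As a rough illustration of the conversion, the sketch below groups CoNLL IOB lines into chunks using the same pattern as the _LINE_RE variable listed for this module. It is not the NLTK implementation: conll_to_chunks is a hypothetical name, the result is a plain list instead of a Tree, and the handling of an I tag with no matching open chunk (it starts a new chunk) is an assumption.

```python
import re

# Same pattern as the module's _LINE_RE variable: word, tag, IOB state,
# and an optional chunk type.
LINE_RE = re.compile(r'(\S+)\s+(\S+)\s+([IOB])-?(\S+)?')


def conll_to_chunks(s, chunk_types=("NP", "PP", "VP")):
    """Group CoNLL IOB lines into (chunk_type, [(word, tag), ...]) items.

    Tokens outside any chunk, or in a chunk type not listed in chunk_types,
    are returned as bare (word, tag) pairs.
    """
    items = []
    open_chunk = None  # (chunk_type, tokens) currently being extended
    for line in s.splitlines():
        if not line.strip():
            continue
        match = LINE_RE.match(line)
        if match is None:
            raise ValueError("bad line: %r" % line)
        word, tag, state, chunk_type = match.groups()
        if chunk_type not in chunk_types:
            state = "O"  # treat unlisted chunk types as outside any chunk
        # An I tag continues the open chunk only if the type matches.
        continues = (state == "I" and open_chunk is not None
                     and open_chunk[0] == chunk_type)
        if not continues:
            open_chunk = None
        if state == "O":
            items.append((word, tag))
        else:
            if open_chunk is None:
                open_chunk = (chunk_type, [])
                items.append(open_chunk)
            open_chunk[1].append((word, tag))
    return items


sample = """\
he PRP B-NP
saw VBD B-VP
the DT B-NP
big JJ I-NP
dog NN I-NP
. . O
"""
chunks = conll_to_chunks(sample)
```

Passing a narrower chunk_types, for example ("NP",), leaves the VP token "saw" as an unchunked (word, tag) pair.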

tree2conlltags(t)

Convert a tree to the CoNLL IOB tag format.

Parameters:
  • t (Tree) - The tree to be converted.
Returns: list of tuple
A list of 3-tuples containing word, tag and IOB tag.
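
The shape of the returned 3-tuples can be shown with a stand-alone sketch. It is not the NLTK implementation: chunks_to_conlltags is a hypothetical name, and the input is assumed to be a flat list of (word, tag) pairs and (chunk_type, [pairs]) chunks standing in for a Tree.

```python
def chunks_to_conlltags(items):
    """Flatten chunks and bare tokens into (word, tag, iob) triples."""
    triples = []
    for item in items:
        if isinstance(item[1], list):
            # A chunk: the first token gets B-, the remaining tokens I-.
            chunk_type, tokens = item
            for i, (word, tag) in enumerate(tokens):
                prefix = "B-" if i == 0 else "I-"
                triples.append((word, tag, prefix + chunk_type))
        else:
            # An unchunked token is tagged O.
            word, tag = item
            triples.append((word, tag, "O"))
    return triples


structure = [("NP", [("the", "DT"), ("cat", "NN")]), ("sat", "VBD")]
conll_tags = chunks_to_conlltags(structure)
```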

tree2conllstr(t)

Convert a tree to the CoNLL IOB string format.

Parameters:
  • t (Tree) - The tree to be converted.
Returns: string
A multiline string where each line contains a word, tag and IOB tag.
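
A minimal sketch of the string layout, assuming the three fields on each line are separated by single spaces. The name conlltags_to_str and the sample triples are hypothetical.

```python
def conlltags_to_str(triples):
    """Join (word, tag, iob) triples into one line per token."""
    return "\n".join(" ".join(triple) for triple in triples)


conll_text = conlltags_to_str([
    ("the", "DT", "B-NP"),
    ("cat", "NN", "I-NP"),
    ("sat", "VBD", "O"),
])
```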

ieerstr2tree(s, chunk_types=['LOCATION', 'ORGANIZATION', 'PERSON', 'DURATION', 'DATE', 'CA..., top_node='S')

Convert a string of chunked tagged text in the IEER named entity format into a chunk structure. Chunks may be of the following types: LOCATION, ORGANIZATION, PERSON, DURATION, DATE, CARDINAL, PERCENT, MONEY, and MEASURE.

Returns: Tree
A chunk structure containing the chunked tagged text that is encoded in the given IEER style string.

Variables Details

_IEER_DOC_RE

Value:
re.compile(r'(?s)<DOC>\s*(<DOCNO>\s*(?P<docno>.+?)\s*</DOCNO>\s*)?(<DOCTYPE>\s*(?P<doctype>.+?)\s*</DOCTYPE>\s*)?(<DATE_TIME>\s*(?P<date_time>.+?)\s*</DATE_TIME>\s*)?<BODY>\s*(<HEADLINE>\s*(?P<headline>.+?)\s*</HEADLINE>\s*)?<TEXT>(?P<text>.*?)</TEXT>\s*</BODY>\s*</DOC>\s*')
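
To show which fields this pattern captures, the sketch below applies it to a made-up document. The pattern is copied from the value above and split across adjacent string literals for readability; the sample document and its contents are hypothetical.

```python
import re

# The value of _IEER_DOC_RE, written as adjacent raw string literals.
IEER_DOC_RE = re.compile(
    r'(?s)<DOC>\s*'
    r'(<DOCNO>\s*(?P<docno>.+?)\s*</DOCNO>\s*)?'
    r'(<DOCTYPE>\s*(?P<doctype>.+?)\s*</DOCTYPE>\s*)?'
    r'(<DATE_TIME>\s*(?P<date_time>.+?)\s*</DATE_TIME>\s*)?'
    r'<BODY>\s*'
    r'(<HEADLINE>\s*(?P<headline>.+?)\s*</HEADLINE>\s*)?'
    r'<TEXT>(?P<text>.*?)</TEXT>\s*'
    r'</BODY>\s*</DOC>\s*'
)

# Hypothetical sample document; the DATE_TIME element is left out on purpose.
doc = """<DOC>
<DOCNO> EXAMPLE-0001 </DOCNO>
<DOCTYPE> NEWS </DOCTYPE>
<BODY>
<HEADLINE> Sample headline </HEADLINE>
<TEXT>
Body text goes here.
</TEXT>
</BODY>
</DOC>
"""

match = IEER_DOC_RE.match(doc)
fields = match.groupdict()
```

The DOCNO, DOCTYPE, DATE_TIME, and HEADLINE elements are optional in the pattern, so a missing element leaves its named group set to None instead of preventing the match.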

_IEER_TYPE_RE

Value:
re.compile(r'<b_\w+\s+[^>]*?type="(?P<type>\w+)"')
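
A short sketch of how this pattern picks the entity type out of an opening tag. The pattern is copied from the value above; the sample text and the tag names in it are illustrative assumptions, not taken from a real IEER file.

```python
import re

# The value of _IEER_TYPE_RE.
IEER_TYPE_RE = re.compile(r'<b_\w+\s+[^>]*?type="(?P<type>\w+)"')

# Hypothetical sample text with two opening tags.
text = ('said <b_enamex type="PERSON">Jane Doe<e_enamex> '
        'on <b_timex type="DATE">Monday<e_timex>')

# Collect the type attribute of every opening tag, in order of appearance.
entity_types = [m.group("type") for m in IEER_TYPE_RE.finditer(text)]
```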