Module nltk.chunk.util

Classes

ChunkScore
A utility class for scoring chunk parsers.

Functions

accuracy(chunker, gold)
Score the accuracy of the chunker against the gold standard.
Returns: float

_chunksets(t, count, chunk_node)

tagstr2tree(s, chunk_node='NP', top_node='S', sep='/')
Divide a string of bracketed tagged text into chunks and unchunked tokens, and produce a Tree.
Returns: tree

conllstr2tree(s, chunk_types=('NP', 'PP', 'VP'), top_node='S')
Convert a CoNLL IOB string into a tree.
Returns: Tree

tree2conlltags(t)
Convert a tree to the CoNLL IOB tag format.
Returns: list of tuple

tree2conllstr(t)
Convert a tree to the CoNLL IOB string format.
Returns: string

_ieer_read_text(s, top_node)

ieerstr2tree(s, chunk_types=['LOCATION', 'ORGANIZATION', 'PERSON', 'DURATION', 'DATE', 'CA..., top_node='S')
Convert a string of chunked tagged text in the IEER named entity format into a chunk structure.
Returns: Tree

demo()

Variables

_LINE_RE = re.compile(r'(\S+)\s+(\S+)\s+([IOB])-?(\S+)?')

_IEER_DOC_RE = re.compile(r'(?s)<DOC>\s*(<DOCNO>\s*(?P<docno>....

_IEER_TYPE_RE = re.compile(r'<b_\w+\s+[^>]*?type="(?P<type>\w+...

Function Details

accuracy(chunker, gold)

Score the accuracy of the chunker against the gold standard. Strip the chunk information from the gold standard and rechunk it using the chunker, then compute the accuracy score.

Parameters:

chunker (ChunkParserI) - The chunker being evaluated.
gold (tree) - The chunk structures to score the chunker on.

Returns: float
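
The scoring step can be illustrated with a minimal sketch. It assumes that accuracy is measured per token over IOB tags, which the description above does not state, and it works on plain (word, tag, IOB tag) tuples rather than on real chunk structures. The helper name iob_accuracy is hypothetical and is not part of this module.

```python
def iob_accuracy(gold_tags, test_tags):
    """Return the fraction of positions where both tag sequences agree.

    Hypothetical helper: compares (word, tag, iob) triples position by position.
    """
    if len(gold_tags) != len(test_tags):
        raise ValueError("sequences must have the same length")
    if not gold_tags:
        return 0.0
    matches = sum(1 for g, t in zip(gold_tags, test_tags) if g == t)
    return matches / len(gold_tags)


# Gold-standard triples and the triples a chunker might produce (made-up data).
gold = [("the", "DT", "B-NP"), ("cat", "NN", "I-NP"),
        ("sat", "VBD", "O"), ("down", "RP", "O")]
test = [("the", "DT", "B-NP"), ("cat", "NN", "I-NP"),
        ("sat", "VBD", "B-NP"), ("down", "RP", "O")]

score = iob_accuracy(gold, test)  # three of the four triples agree
```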

tagstr2tree(s, chunk_node='NP', top_node='S', sep='/')

Divide a string of bracketed tagged text into chunks and unchunked tokens, and produce a Tree. Chunks are marked by square brackets ([...]). Words are delimited by whitespace, and each word should have the form text/tag. Words that do not contain a slash are assigned a tag of None.

Parameters:

s (string) - The string to be converted
chunk_node (string) - The label to use for chunk nodes
top_node (string) - The label to use for the root of the tree

Returns: tree

A tree corresponding to the string representation.
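
The bracketed format described above can be sketched with the standard library alone. This is not the module's implementation: the function name parse_bracketed is hypothetical, the result is a nested tuple/list stand-in rather than a Tree, and nested brackets are rejected as a simplifying assumption.

```python
import re


def parse_bracketed(s, chunk_node="NP", top_node="S", sep="/"):
    """Return (top_node, children) for a bracketed tagged string.

    Each child is either a (word, tag) pair or a (chunk_node, [pairs]) chunk.
    """
    children = []
    current = None  # tokens of the chunk that is currently open, if any
    for token in re.findall(r"\[|\]|[^\[\]\s]+", s):
        if token == "[":
            if current is not None:
                raise ValueError("nested chunks are not supported in this sketch")
            current = []
        elif token == "]":
            if current is None:
                raise ValueError("unmatched closing bracket")
            children.append((chunk_node, current))
            current = None
        else:
            # Split on the last separator; words without one get a tag of None.
            word, found, tag = token.rpartition(sep)
            pair = (word, tag) if found else (token, None)
            (children if current is None else current).append(pair)
    if current is not None:
        raise ValueError("unclosed chunk")
    return (top_node, children)


result = parse_bracketed("[ the/DT cat/NN ] sat/VBD on [ the/DT mat/NN ] ./.")
```

In this sketch the untagged word "on" comes back as ("on", None), matching the rule that words without a slash are assigned a tag of None.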

conllstr2tree(s, chunk_types=('NP', 'PP', 'VP'), top_node='S')

Convert a CoNLL IOB string into a tree. Uses the specified chunk types (defaults to NP, PP and VP), and creates a tree rooted at a node labeled S (by default).

Parameters:

s (string) - The CoNLL string to be converted.
chunk_types (tuple) - The chunk types to be converted.
top_node (string) - The node label to use for the root.

Returns: Tree

A chunk structure for a single sentence encoded in the given CoNLL-2000-style string.
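
As an illustration of the CoNLL IOB format, the sketch below reuses the _LINE_RE pattern listed under Variables to read one token per line and group tokens into chunks. The function name parse_conll and the nested tuple/list result are hypothetical stand-ins; treating unlisted chunk types as unchunked tokens is an assumption about the intended behaviour.

```python
import re

# Same pattern as _LINE_RE in this module: word, tag, IOB state, chunk type.
LINE_RE = re.compile(r'(\S+)\s+(\S+)\s+([IOB])-?(\S+)?')


def parse_conll(s, chunk_types=("NP", "PP", "VP"), top_node="S"):
    """Return (top_node, children) built from a CoNLL IOB string."""
    children = []
    current = None  # (chunk_type, tokens) for the chunk being built
    for line in s.splitlines():
        line = line.strip()
        if not line:
            continue
        match = LINE_RE.match(line)
        if match is None:
            raise ValueError("bad line: %r" % line)
        word, tag, state, chunk_type = match.groups()
        # Assumption: chunk types that were not requested count as outside.
        if chunk_types is not None and chunk_type not in chunk_types:
            state = "O"
        # An I tag only continues an open chunk of the same type.
        if state == "I" and (current is None or current[0] != chunk_type):
            state = "B"
        if state in ("B", "O"):
            current = None
        if state == "B":
            current = (chunk_type, [])
            children.append(current)
        if current is not None:
            current[1].append((word, tag))
        else:
            children.append((word, tag))
    return (top_node, children)


sample = """the DT B-NP
cat NN I-NP
sat VBD B-VP
on IN B-PP
the DT B-NP
mat NN I-NP
. . O"""

tree = parse_conll(sample)
np_only = parse_conll(sample, chunk_types=("NP",))
```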

tree2conlltags(t)

Convert a tree to the CoNLL IOB tag format.

Parameters:

t (Tree) - The tree to be converted.

Returns: list of tuple

A list of 3-tuples containing word, tag and IOB tag.
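
The shape of the returned 3-tuples can be shown with a small sketch over a simplified stand-in for a tree: a list whose items are either (word, tag) pairs or (label, [pairs]) chunks. The function name to_conll_tags and this representation are hypothetical; the sketch only illustrates how B-, I- and O tags are assigned.

```python
def to_conll_tags(children):
    """Flatten chunks and loose tokens into (word, tag, iob_tag) triples."""
    triples = []
    for label, body in children:
        if isinstance(body, list):
            # A chunk: the first token opens it (B-), the rest continue it (I-).
            for i, (word, tag) in enumerate(body):
                prefix = "B-" if i == 0 else "I-"
                triples.append((word, tag, prefix + label))
        else:
            # An unchunked (word, tag) token lies outside any chunk.
            triples.append((label, body, "O"))
    return triples


children = [
    ("NP", [("the", "DT"), ("cat", "NN")]),
    ("sat", "VBD"),
    ("NP", [("the", "DT"), ("mat", "NN")]),
]
triples = to_conll_tags(children)
```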

tree2conllstr(t)

Convert a tree to the CoNLL IOB string format.

Parameters:

t (Tree) - The tree to be converted.

Returns: string

A multiline string where each line contains a word, tag and IOB tag.
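
A minimal sketch of the string layout, assuming the three fields on each line are separated by single spaces (the description above does not specify the separator). The helper name to_conll_str is hypothetical and takes ready-made triples instead of a tree.

```python
def to_conll_str(triples):
    """Join (word, tag, iob_tag) triples into one line per token."""
    return "\n".join("%s %s %s" % triple for triple in triples)


text = to_conll_str([
    ("the", "DT", "B-NP"),
    ("cat", "NN", "I-NP"),
    ("sat", "VBD", "O"),
])
```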

ieerstr2tree(s, chunk_types=['LOCATION', 'ORGANIZATION', 'PERSON', 'DURATION', 'DATE', 'CA..., top_node='S')

Convert a string of chunked tagged text in the IEER named entity format into a chunk structure. Chunks can be of the following types: LOCATION, ORGANIZATION, PERSON, DURATION, DATE, CARDINAL, PERCENT, MONEY, and MEASURE.

Returns: Tree

A chunk structure containing the chunked tagged text that is encoded in the given IEER style string.

Variables Details

_IEER_DOC_RE

Value:

re.compile(r'(?s)<DOC>\s*(<DOCNO>\s*(?P<docno>.+?)\s*</DOCNO>\s*)?(<DOCTYPE>\s*(?P<doctype>.+?)\s*</DOCTYPE>\s*)?(<DATE_TIME>\s*(?P<date_time>.+?)\s*</DATE_TIME>\s*)?<BODY>\s*(<HEADLINE>\s*(?P<headline>.+?)\s*</HEADLINE>\s*)?<TEXT>(?P<text>.*?)</TEXT>\s*</BODY>\s*</DOC>\s*')
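
To show what the named groups capture, the sketch below applies the same pattern to a made-up document. The sample text and the public name IEER_DOC_RE are assumptions for illustration; the pattern is only split across several string literals for readability.

```python
import re

# Same pattern as _IEER_DOC_RE, written as adjacent string literals.
IEER_DOC_RE = re.compile(
    r'(?s)<DOC>\s*'
    r'(<DOCNO>\s*(?P<docno>.+?)\s*</DOCNO>\s*)?'
    r'(<DOCTYPE>\s*(?P<doctype>.+?)\s*</DOCTYPE>\s*)?'
    r'(<DATE_TIME>\s*(?P<date_time>.+?)\s*</DATE_TIME>\s*)?'
    r'<BODY>\s*'
    r'(<HEADLINE>\s*(?P<headline>.+?)\s*</HEADLINE>\s*)?'
    r'<TEXT>(?P<text>.*?)</TEXT>\s*'
    r'</BODY>\s*</DOC>\s*'
)

# Made-up document: DOCTYPE and DATE_TIME are omitted on purpose.
sample = """<DOC>
<DOCNO> EX-0001 </DOCNO>
<BODY>
<HEADLINE> Example headline </HEADLINE>
<TEXT>
Some body text.
</TEXT>
</BODY>
</DOC>
"""

match = IEER_DOC_RE.match(sample)
fields = match.groupdict()  # optional sections that are absent map to None
```

Because the DOCNO, DOCTYPE, DATE_TIME and HEADLINE sections are each wrapped in an optional group, a document may omit any of them and still match.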

_IEER_TYPE_RE

Value:

re.compile(r'<b_\w+\s+[^>]*?type="(?P<type>\w+)"')
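
A short sketch of how this pattern picks out chunk types from opening tags. The sample sentence is made up, and the tag names used in it (b_enamex, b_timex and their e_ counterparts) are assumptions about the IEER markup; the pattern itself only requires an opening tag that starts with b_ and carries a type attribute.

```python
import re

# Same pattern as _IEER_TYPE_RE.
IEER_TYPE_RE = re.compile(r'<b_\w+\s+[^>]*?type="(?P<type>\w+)"')

# Made-up sample text with three marked-up entities.
sample = (
    '<b_enamex type="PERSON">Ada Lovelace<e_enamex> visited '
    '<b_enamex type="LOCATION">London<e_enamex> in '
    '<b_timex type="DATE">1842<e_timex>.'
)

types = [m.group("type") for m in IEER_TYPE_RE.finditer(sample)]
```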
