Package nltk :: Package sem :: Module relextract

Module relextract

Code for extracting relational triples from the ieer and conll2002 corpora.

Relations are stored internally as dictionaries ('reldicts').

The two serialization outputs are rtuple and clause.

Functions

str
_expand(type)
Expand an NE class name.

str
class_abbrev(type)
Abbreviate an NE class name.

str
_join(lst, sep=' ', untag=False)
Join a list into a string, turning tagged tuples into tag strings or just words.

descape_entity(m, defs={'AElig': '\xc6', 'Aacute': '\xc1', 'Acirc': '\xc2', 'Agrave':...)
Translate one entity to its ISO Latin value.

unicode
list2sym(lst)
Convert a list of strings into a canonical symbol.

list of tuple
mk_pairs(tree)
Group a chunk structure into a list of pairs of the form (list(str), Tree).

list of defaultdict
mk_reldicts(pairs, window=5, trace=0)
Convert the pairs generated by mk_pairs into a list of 'reldicts': dictionaries that store information about the subject and object NEs plus the filler between them.

list of defaultdict
extract_rels(subjclass, objclass, doc, corpus='ieer', pattern=None, window=10)
Filter the output of mk_reldicts according to specified NE classes and a filler pattern.

show_raw_rtuple(reldict, lcon=False, rcon=False)
Pretty print the reldict as an rtuple.

show_clause(reldict, relsym)
Print the relation in clausal form.

in_demo(trace=0)

roles_demo(trace=0)

ieer_headlines()

conllned(trace=1)
Find the copula+'van' relation ('of') in the Dutch tagged training corpus from CoNLL 2002.

conllesp()

Variables
  NE_CLASSES = {'conll2002': ['LOC', 'PER', 'ORG'], 'ieer': ['LO...
  short2long = {'LOC': 'LOCATION', 'ORG': 'ORGANIZATION', 'PER':...
  long2short = {'LOCATION': 'LOC', 'ORGANIZATION': 'ORG', 'PERSO...
Function Details

_expand(type)

Expand an NE class name.

Parameters:
  • type (str)
Returns: str

class_abbrev(type)

Abbreviate an NE class name.

Parameters:
  • type (str)
Returns: str
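
The two helpers above behave like inverse lookups over the short2long and long2short tables listed under Variables. A minimal sketch of that idea, where expand_class and abbrev_class are hypothetical stand-ins (not the module's own functions) and the pass-through for unknown names is an assumption this page does not confirm:

```python
# Mapping tables as listed in the Variables section of this page.
short2long = {'LOC': 'LOCATION', 'ORG': 'ORGANIZATION', 'PER': 'PERSON'}
long2short = {'LOCATION': 'LOC', 'ORGANIZATION': 'ORG', 'PERSON': 'PER'}


def expand_class(name):
    """Hypothetical stand-in for _expand: short NE class name to long form."""
    # Assumption: names missing from the table are returned unchanged.
    return short2long.get(name, name)


def abbrev_class(name):
    """Hypothetical stand-in for class_abbrev: long NE class name to short form."""
    # Assumption: names missing from the table are returned unchanged.
    return long2short.get(name, name)
```

Under these assumptions a class such as 'DATE', which has no short form in the tables, would round-trip unchanged.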

_join(lst, sep=' ', untag=False)

Join a list into a string, turning tagged tuples into tag strings or just words.

Parameters:
  • untag - if True, omit the tag from tagged input strings.
  • lst (list)
Returns: str
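
The effect of untag can be illustrated with a small sketch. join_tokens is a hypothetical stand-in for _join, and the 'word/TAG' rendering of tagged tuples is an assumption rather than something this page specifies:

```python
def join_tokens(lst, sep=' ', untag=False):
    """Hypothetical stand-in for _join over plain strings or (word, tag) tuples."""
    parts = []
    for item in lst:
        if isinstance(item, tuple):
            word, tag = item
            # Assumption: a tagged token is rendered as 'word/TAG' unless untag is set.
            parts.append(word if untag else "%s/%s" % (word, tag))
        else:
            # Plain strings are joined as they are.
            parts.append(item)
    return sep.join(parts)


tagged = [('the', 'DT'), ('dog', 'NN')]
with_tags = join_tokens(tagged)
words_only = join_tokens(tagged, untag=True)
```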

descape_entity(m, defs={'AElig': '\xc6', 'Aacute': '\xc1', 'Acirc': '\xc2', 'Agrave':...)

Translate one entity to its ISO Latin value. Inspired by example from effbot.org
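
Since m is a regular-expression match, a function of this shape is typically passed as the replacement callback to re.sub. A minimal sketch, assuming the entity table behaves like Python 3's html.entities.entitydefs and that unknown entities are left untouched; descape and ENTITY are hypothetical names, not the module's own:

```python
import re
from html.entities import entitydefs  # standard-library table: entity name -> character

# Matches entity references such as '&eacute;' and captures the name.
ENTITY = re.compile(r"&(\w+?);")


def descape(m, defs=entitydefs):
    """Hypothetical stand-in for descape_entity; m is a regex match object."""
    # Assumption: an unknown entity name is left exactly as it appeared.
    return defs.get(m.group(1), m.group(0))


text = ENTITY.sub(descape, "Caf&eacute; &unknown; &AElig;on")
```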

list2sym(lst)

Convert a list of strings into a canonical symbol.

Parameters:
  • lst (list)
Returns: unicode
a Unicode string without whitespace
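
One way to obtain a whitespace-free symbol is sketched below. The page only guarantees that the result contains no whitespace; joining with underscores, lowercasing, and dropping periods are assumptions made for illustration, and to_symbol is a hypothetical name:

```python
def to_symbol(lst):
    """Hypothetical stand-in for list2sym."""
    # Split every item on whitespace so the joined result cannot contain any.
    words = [word for item in lst for word in item.split()]
    # Assumptions: underscore separator, lowercase, periods removed.
    return "_".join(words).lower().replace(".", "")
```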

mk_pairs(tree)

Group a chunk structure into a list of pairs of the form (list(str), Tree)

In order to facilitate the construction of (Tree, string, Tree) triples, this identifies pairs whose first member is a list (possibly empty) of terminal strings, and whose second member is a Tree of the form (NE_label, terminals).

Parameters:
  • tree - a chunk tree
Returns: list of tuple
a list of pairs (list(str), Tree)
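
The grouping described above can be sketched without NLTK. Here Chunk is a hypothetical stand-in for an NE subtree, group_pairs is a hypothetical stand-in for mk_pairs, and the example sentence is made up; dropping tokens that follow the final chunk is an assumption:

```python
from collections import namedtuple

# Hypothetical stand-in for an NE subtree: a label plus its terminal strings.
Chunk = namedtuple("Chunk", ["label", "leaves"])


def group_pairs(chunked):
    """Hypothetical stand-in for mk_pairs over a flat chunked sentence."""
    pairs = []
    words = []  # plain tokens seen since the previous chunk
    for item in chunked:
        if isinstance(item, Chunk):
            # Close the current pair: preceding words plus this chunk.
            pairs.append((words, item))
            words = []
        else:
            words.append(item)
    # Assumption: tokens after the last chunk are dropped, since no chunk follows them.
    return pairs


sent = [Chunk("PERSON", ["Ada", "Lovelace"]), "worked", "with",
        Chunk("PERSON", ["Charles", "Babbage"]), "in",
        Chunk("LOCATION", ["London"]), "."]
pairs = group_pairs(sent)
```

The first pair has an empty word list because the sentence opens with a chunk, which is the "possibly empty" case mentioned above.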

mk_reldicts(pairs, window=5, trace=0)

Convert the pairs generated by mk_pairs into a list of 'reldicts': dictionaries that store information about the subject and object NEs plus the filler between them. Each reldict also captures a left and a right context of length <= window (within a given input sentence).

Parameters:
  • pairs - a list of pairs of list(str) and Tree, as generated by mk_pairs
  • window (int) - a threshold for the number of items to include in the left and right context
Returns: list of defaultdict
'relation' dictionaries whose keys are 'lcon', 'subjclass', 'subjtext', 'subjsym', 'filler', 'objclass', 'objtext', 'objsym' and 'rcon'
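
To show how consecutive pairs might map onto these keys, here is a sketch that does not use NLTK. Chunk and build_reldicts are hypothetical stand-ins, the symbol format is an assumption, and the handling of the last pairs and of the right context may differ from the module's own behaviour:

```python
from collections import defaultdict, namedtuple

# Hypothetical stand-in for an NE subtree: a label plus its terminal strings.
Chunk = namedtuple("Chunk", ["label", "leaves"])


def build_reldicts(pairs, window=5):
    """Hypothetical stand-in for mk_reldicts over (list(str), Chunk) pairs."""
    reldicts = []
    # Every two adjacent chunks give one subject/object candidate.
    for i in range(len(pairs) - 1):
        left_words, subj = pairs[i]
        filler_words, obj = pairs[i + 1]
        # Assumption: the right context is the words before the next chunk, if any.
        right_words = pairs[i + 2][0] if i + 2 < len(pairs) else []
        rel = defaultdict(str)
        rel['lcon'] = ' '.join(left_words[-window:])
        rel['subjclass'] = subj.label
        rel['subjtext'] = ' '.join(subj.leaves)
        rel['subjsym'] = '_'.join(subj.leaves).lower()  # assumed symbol format
        rel['filler'] = ' '.join(filler_words)
        rel['objclass'] = obj.label
        rel['objtext'] = ' '.join(obj.leaves)
        rel['objsym'] = '_'.join(obj.leaves).lower()  # assumed symbol format
        rel['rcon'] = ' '.join(right_words[:window])
        reldicts.append(rel)
    return reldicts


# Made-up input in the shape produced by the pairing step.
pairs = [(["Yesterday"], Chunk("PERSON", ["Ada", "Lovelace"])),
         (["worked", "with"], Chunk("PERSON", ["Charles", "Babbage"])),
         (["in"], Chunk("LOCATION", ["London"]))]
reldicts = build_reldicts(pairs)
```

Note that the object of one reldict becomes the subject of the next, so a sentence with n chunks yields n - 1 candidates in this sketch.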

extract_rels(subjclass, objclass, doc, corpus='ieer', pattern=None, window=10)

Filter the output of mk_reldicts according to specified NE classes and a filler pattern.

The parameters subjclass and objclass can be used to restrict the Named Entities to particular types (any of 'LOCATION', 'ORGANIZATION', 'PERSON', 'DURATION', 'DATE', 'CARDINAL', 'PERCENT', 'MONEY', 'MEASURE').

Parameters:
  • subjclass (string) - the class of the subject Named Entity.
  • objclass (string) - the class of the object Named Entity.
  • doc (ieer document or a list of chunk trees) - input document
  • corpus (string) - name of the corpus to take as input; possible values are 'ieer' and 'conll2002'
  • pattern (SRE_Pattern) - a regular expression for filtering the fillers of retrieved triples.
  • window (int) - filters out fillers which exceed this threshold
Returns: list of defaultdict
see mk_reldicts
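
The filtering step can be sketched over plain dictionaries. filter_reldicts is a hypothetical stand-in for the filter inside extract_rels; counting whitespace-separated tokens for window and matching the pattern from the start of the filler are both assumptions, and the sample data is made up:

```python
import re


def filter_reldicts(reldicts, subjclass, objclass, pattern=None, window=10):
    """Hypothetical stand-in for the filtering step of extract_rels."""
    kept = []
    for rel in reldicts:
        filler = rel['filler']
        # Keep only candidates whose NE classes match the request.
        if rel['subjclass'] != subjclass or rel['objclass'] != objclass:
            continue
        # Assumption: window counts whitespace-separated filler tokens.
        if len(filler.split()) > window:
            continue
        # Assumption: the pattern is matched from the start of the filler.
        if pattern is not None and not pattern.match(filler):
            continue
        kept.append(rel)
    return kept


IN = re.compile(r'.*\bin\b')
rels = [
    {'subjclass': 'ORGANIZATION', 'filler': 'based in', 'objclass': 'LOCATION'},
    {'subjclass': 'ORGANIZATION', 'filler': ', founded by', 'objclass': 'PERSON'},
    {'subjclass': 'ORGANIZATION', 'filler': 'moved from', 'objclass': 'LOCATION'},
]
hits = filter_reldicts(rels, 'ORGANIZATION', 'LOCATION', pattern=IN)
```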

show_raw_rtuple(reldict, lcon=False, rcon=False)

Pretty print the reldict as an rtuple.

Parameters:
  • reldict (defaultdict) - a relation dictionary

show_clause(reldict, relsym)

Print the relation in clausal form.

Parameters:
  • reldict (defaultdict) - a relation dictionary
  • relsym (str) - a label for the relation
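
This page does not spell out the exact layout of either serialization, so the sketch below only illustrates the general idea: an rtuple shows the classes and surface text around the filler, while a clause applies the relation label to the two symbols. format_rtuple and format_clause are hypothetical names, both layouts are assumptions, and the functions return strings rather than printing:

```python
def format_rtuple(rel, lcon=False, rcon=False):
    """Hypothetical stand-in for show_raw_rtuple; returns the text instead of printing."""
    # Assumed layout: [SUBJCLASS: 'text'] 'filler' [OBJCLASS: 'text']
    core = "[%s: %r] %r [%s: %r]" % (
        rel['subjclass'], rel['subjtext'], rel['filler'],
        rel['objclass'], rel['objtext'])
    parts = []
    if lcon:
        parts.append(rel['lcon'])  # optional left context
    parts.append(core)
    if rcon:
        parts.append(rel['rcon'])  # optional right context
    return ' '.join(parts)


def format_clause(rel, relsym):
    """Hypothetical stand-in for show_clause; returns the text instead of printing."""
    # Assumed layout: RELSYM('subject_symbol', 'object_symbol')
    return "%s(%r, %r)" % (relsym, rel['subjsym'], rel['objsym'])


# Made-up relation dictionary using the keys documented for mk_reldicts.
rel = {'lcon': 'Last year', 'subjclass': 'ORGANIZATION', 'subjtext': 'Acme Corp',
       'subjsym': 'acme_corp', 'filler': 'based in', 'objclass': 'LOCATION',
       'objtext': 'Boston', 'objsym': 'boston', 'rcon': 'expanded'}
rtuple_text = format_rtuple(rel)
clause_text = format_clause(rel, 'IN')
```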

Variables Details

NE_CLASSES

Value:
{'conll2002': ['LOC', 'PER', 'ORG'],
 'ieer': ['LOCATION',
          'ORGANIZATION',
          'PERSON',
          'DURATION',
          'DATE',
          'CARDINAL',
          'PERCENT',
...

short2long

Value:
{'LOC': 'LOCATION', 'ORG': 'ORGANIZATION', 'PER': 'PERSON'}

long2short

Value:
{'LOCATION': 'LOC', 'ORGANIZATION': 'ORG', 'PERSON': 'PER'}