Module nltk.sem.relextract

Code for extracting relational triples from the ieer and conll2002 corpora.

Relations are stored internally as dictionaries ('reldicts').

Relations can be serialized in two forms: rtuple and clause.

Functions
Expand an NE class name.
Abbreviate an NE class name.
_join(lst, sep=' ', untag=False)
Join a list into a string, turning (word, tag) tuples into tagged strings, or into just words if untag is True.
descape_entity(m, defs={'AElig': '\xc6', 'Aacute': '\xc1', 'Acirc': '\xc2', 'Agrave':...)
Translate one entity to its ISO Latin value.
Convert a list of strings into a canonical symbol.
Group a chunk structure into a list of pairs of the form (list(str), Tree).
Returns: list of tuple
mk_reldicts(pairs, window=5, trace=0)
Convert the pairs generated by mk_pairs into 'reldicts': dictionaries that store information about the subject and object NEs plus the filler between them.
Returns: list of defaultdict
extract_rels(subjclass, objclass, doc, corpus='ieer', pattern=None, window=10)
Filter the output of mk_reldicts according to specified NE classes and a filler pattern.
Returns: list of defaultdict
show_raw_rtuple(reldict, lcon=False, rcon=False)
Pretty print the reldict as an rtuple.
show_clause(reldict, relsym)
Print the relation in clausal form.
in_demo(trace=0)
roles_demo(trace=0)
ieer_headlines()
Find the copula+'van' relation ('of') in the Dutch tagged training corpus from CoNLL 2002.
conllesp()
Variables
  NE_CLASSES = {'conll2002': ['LOC', 'PER', 'ORG'], 'ieer': ['LO...
  short2long = {'LOC': 'LOCATION', 'ORG': 'ORGANIZATION', 'PER':...
  long2short = {'LOCATION': 'LOC', 'ORGANIZATION': 'ORG', 'PERSO...
Function Details


Expand an NE class name.

  • type (str)
Returns: str


Abbreviate an NE class name.

  • type (str)
Returns: str
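The two helpers above translate between the short and long NE class names held in the short2long and long2short variables. Because those variable values are truncated on this page, the sketch below is a hypothetical reconstruction: it assumes only the three pairs whose prefixes are visible (LOC, ORG, PER), and it assumes that unknown names are returned unchanged. The names expand and abbreviate are illustrative, not the module's own.

```python
# Hypothetical sketch: only the three classes visible in short2long/long2short.
SHORT2LONG = {"LOC": "LOCATION", "ORG": "ORGANIZATION", "PER": "PERSON"}
# Invert the mapping so each long name maps back to its abbreviation.
LONG2SHORT = {full: abbr for abbr, full in SHORT2LONG.items()}


def expand(ne_type):
    """Return the long class name; unknown names pass through (assumption)."""
    return SHORT2LONG.get(ne_type, ne_type)


def abbreviate(ne_type):
    """Return the short class name; unknown names pass through (assumption)."""
    return LONG2SHORT.get(ne_type, ne_type)


print(expand("PER"))           # PERSON
print(abbreviate("LOCATION"))  # LOC
print(expand("DATE"))          # DATE (no mapping, so unchanged)
```

Falling back to the input keeps classes such as DATE or MONEY, which have no abbreviation in this table, usable without raising KeyError.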

_join(lst, sep=' ', untag=False)

Join a list into a string, turning (word, tag) tuples into tagged strings, or into just words if untag is True.

  • untag - if True, keep only the word from each (word, tag) tuple.
  • lst (list)
Returns: str
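A minimal sketch of the joining behavior described above, assuming tagged items are (word, tag) tuples rendered as 'word/tag'. The function name join_tokens and the '/' separator are assumptions for illustration; the module's private _join may differ in detail.

```python
def join_tokens(lst, sep=" ", untag=False):
    """Join plain strings or (word, tag) tuples into one string.

    Hypothetical sketch: tagged tuples are rendered as 'word/tag' unless
    untag is True, in which case only the word is kept.
    """
    parts = []
    for item in lst:
        if isinstance(item, tuple):
            word, tag = item
            parts.append(word if untag else f"{word}/{tag}")
        else:
            parts.append(item)  # plain strings are used as-is
    return sep.join(parts)


print(join_tokens(["New", "York"]))                                # New York
print(join_tokens([("van", "Prep"), ("de", "Art")]))               # van/Prep de/Art
print(join_tokens([("van", "Prep"), ("de", "Art")], untag=True))   # van de
```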

descape_entity(m, defs={'AElig': '\xc6', 'Aacute': '\xc1', 'Acirc': '\xc2', 'Agrave':...)

Translate one entity to its ISO Latin value. Inspired by example from
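Since descape_entity takes a match object m, it is meant to be used as the replacement callback of a regular-expression substitution. The sketch below assumes an entity pattern of the form &name; and uses Python 3's html.entities.entitydefs as a stand-in for the truncated defs default; both the ENTITY regex and the pass-through for unknown entities are assumptions.

```python
import re
from html.entities import entitydefs

# Assumed pattern: a named entity such as "&eacute;", capturing its name.
ENTITY = re.compile(r"&(\w+?);")


def descape_entity(m, defs=entitydefs):
    """Replace one matched entity with its ISO Latin-1 character."""
    try:
        return defs[m.group(1)]
    except KeyError:
        # Unknown entity: leave the original text untouched.
        return m.group(0)


text = "Caf&eacute; &amp; Cr&egrave;me &bogus;"
print(ENTITY.sub(descape_entity, text))  # Café & Crème &bogus;
```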


Convert a list of strings into a canonical symbol.

  • lst (list)
Returns: unicode
a Unicode string without whitespace
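One plausible canonical form for such a symbol is the lowercased tokens joined by underscores. The sketch below assumes exactly that; the function name to_symbol is hypothetical, and the module's own function may apply further normalization.

```python
def to_symbol(lst):
    """Hypothetical sketch: turn a token list into a whitespace-free symbol.

    Assumes the canonical form is the lowercased tokens joined by
    underscores; (word, tag) tuples contribute only their word.
    """
    words = [item[0] if isinstance(item, tuple) else item for item in lst]
    symbol = "_".join(words).lower()
    # Guard against tokens that themselves contain whitespace.
    return "_".join(symbol.split())


print(to_symbol(["Tribune", "Co."]))             # tribune_co.
print(to_symbol([("Den", "N"), ("Haag", "N")]))  # den_haag
```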


Group a chunk structure into a list of pairs of the form (list(str), Tree).

In order to facilitate the construction of (Tree, string, Tree) triples, this identifies pairs whose first member is a list (possibly empty) of terminal strings, and whose second member is a Tree of the form (NE_label, terminals).

  • tree - a chunk tree
Returns: list of tuple
a list of pairs (list(str), Tree)
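The grouping above can be sketched as a single pass that accumulates plain tokens until an NE chunk appears, then emits (tokens, chunk) and starts a new token list. To stay self-contained, the sketch uses a hypothetical Chunk named tuple in place of an nltk Tree, and it assumes that tokens after the last NE are simply not emitted.

```python
from collections import namedtuple

# Hypothetical stand-in for an NE subtree (an nltk Tree in the real module).
Chunk = namedtuple("Chunk", ["label", "leaves"])


def group_pairs(chunked):
    """Pair each NE chunk with the list of plain tokens that precede it."""
    pairs = []
    tokens = []
    for item in chunked:
        if isinstance(item, Chunk):
            pairs.append((tokens, item))
            tokens = []  # start collecting the next filler
        else:
            tokens.append(item)
    # In this sketch, tokens after the last NE are not emitted.
    return pairs


sent = [Chunk("PERSON", ["Smith"]), "works", "for",
        Chunk("ORGANIZATION", ["Acme"]), "in", Chunk("LOCATION", ["Paris"]), "."]
for filler, ne in group_pairs(sent):
    print(filler, ne.label, ne.leaves)
# [] PERSON ['Smith']
# ['works', 'for'] ORGANIZATION ['Acme']
# ['in'] LOCATION ['Paris']
```

The first list may be empty, which matches the "(possibly empty)" wording above: a sentence that opens with an NE has no preceding terminals.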

mk_reldicts(pairs, window=5, trace=0)

Convert the pairs generated by mk_pairs into 'reldicts': dictionaries that store information about the subject and object NEs plus the filler between them. Additionally, left and right contexts of at most window tokens are captured (within a given input sentence).

  • pairs - a list of (list(str), Tree) pairs, as generated by
  • window (int) - a threshold for the number of items to include in the left and right context
Returns: list of defaultdict
'relation' dictionaries whose keys are 'lcon', 'subjclass', 'subjtext', 'subjsym', 'filler', 'objclass', 'objtext', 'objsym' and 'rcon'
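A rough sketch of how such reldicts could be built from consecutive pairs. It uses a hypothetical Chunk named tuple instead of an nltk Tree, simplified stand-ins for the text and symbol helpers, and omits trace. It also assumes the right context is taken from the tokens preceding the next NE, so each reldict consumes three consecutive pairs; treat that as an assumption of this sketch rather than a statement about mk_reldicts.

```python
from collections import defaultdict, namedtuple

# Hypothetical stand-in for an NE subtree: a class label plus its tokens.
Chunk = namedtuple("Chunk", ["label", "leaves"])


def build_reldicts(pairs, window=5):
    """Build a reldict for each subject/object NE pair (hypothetical sketch).

    Each reldict uses three consecutive (tokens, Chunk) pairs: the first
    supplies the left context and subject, the second the filler and object,
    and the tokens of the third become the right context.
    """
    result = []
    for i in range(len(pairs) - 2):
        lcon, subj = pairs[i]
        filler, obj = pairs[i + 1]
        rcon, _next_ne = pairs[i + 2]
        reldict = defaultdict(str)
        # Keep at most `window` tokens on each side (max() avoids lcon[-0:]).
        reldict["lcon"] = " ".join(lcon[max(len(lcon) - window, 0):])
        reldict["subjclass"] = subj.label
        reldict["subjtext"] = " ".join(subj.leaves)
        reldict["subjsym"] = "_".join(subj.leaves).lower()
        reldict["filler"] = " ".join(filler)
        reldict["objclass"] = obj.label
        reldict["objtext"] = " ".join(obj.leaves)
        reldict["objsym"] = "_".join(obj.leaves).lower()
        reldict["rcon"] = " ".join(rcon[:window])
        result.append(reldict)
    return result


pairs = [
    (["Yesterday", ","], Chunk("PERSON", ["John", "Smith"])),
    (["joined"], Chunk("ORGANIZATION", ["Acme"])),
    (["in"], Chunk("LOCATION", ["Paris"])),
]
for rd in build_reldicts(pairs):
    print(rd["subjclass"], repr(rd["filler"]), rd["objclass"], repr(rd["rcon"]))
# PERSON 'joined' ORGANIZATION 'in'
```

Using defaultdict(str) means a missing key reads as an empty string instead of raising KeyError, which keeps later filtering and printing code simple.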

extract_rels(subjclass, objclass, doc, corpus='ieer', pattern=None, window=10)

Filter the output of mk_reldicts according to specified NE classes and a filler pattern.

The parameters subjclass and objclass can be used to restrict the Named Entities to particular types (any of 'LOCATION', 'ORGANIZATION', 'PERSON', 'DURATION', 'DATE', 'CARDINAL', 'PERCENT', 'MONEY', 'MEASURE').

  • subjclass (string) - the class of the subject Named Entity.
  • objclass (string) - the class of the object Named Entity.
  • doc (ieer document or a list of chunk trees) - input document
  • corpus (string) - name of the corpus to take as input; possible values are 'ieer' and 'conll2002'
  • pattern (SRE_Pattern) - a regular expression for filtering the fillers of retrieved triples.
  • window (int) - the maximum filler length, in tokens; triples whose filler is longer are filtered out
Returns: list of defaultdict
see mk_reldicts
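The filtering step alone can be sketched as a list comprehension over reldicts that have already been built. This is a hypothetical helper (filter_rels), not extract_rels itself: it skips the doc and corpus handling, requires a compiled pattern, and assumes pattern.match is used, which anchors at the start of the filler, so patterns typically begin with .* to match anywhere.

```python
import re


def filter_rels(reldicts, subjclass, objclass, pattern, window=10):
    """Keep reldicts whose NE classes match and whose filler passes `pattern`.

    Hypothetical sketch: the filler must have at most `window` tokens and
    `pattern.match` must succeed (anchored at the filler's start).
    """
    return [
        r for r in reldicts
        if r["subjclass"] == subjclass
        and r["objclass"] == objclass
        and len(r["filler"].split()) <= window
        and pattern.match(r["filler"])
    ]


rels = [
    {"subjclass": "ORGANIZATION", "filler": "is based in", "objclass": "LOCATION"},
    {"subjclass": "ORGANIZATION",
     "filler": "said on Tuesday that it will open an office in",
     "objclass": "LOCATION"},
    {"subjclass": "PERSON", "filler": "lives in", "objclass": "LOCATION"},
]
IN = re.compile(r".*\bin\b")
print([r["filler"] for r in filter_rels(rels, "ORGANIZATION", "LOCATION", IN, window=5)])
# ['is based in']
```

The second reldict is dropped because its 10-token filler exceeds window=5; the third is dropped because its subject class is PERSON.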

show_raw_rtuple(reldict, lcon=False, rcon=False)

Pretty print the reldict as an rtuple.

  • reldict (defaultdict) - a relation dictionary
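An rtuple presents a relation as bracketed subject and object NEs around the quoted filler, optionally with the surrounding contexts. The exact layout below ([ABBR: 'text'] 'filler' [ABBR: 'text'], with '...' marking contexts) is an assumption for illustration, and format_rtuple returns a string rather than printing it.

```python
# Hypothetical abbreviation table; only the three classes named on this page.
LONG2SHORT = {"LOCATION": "LOC", "ORGANIZATION": "ORG", "PERSON": "PER"}


def format_rtuple(reldict, lcon=False, rcon=False):
    """Render a reldict as [SUBJ: 'text'] 'filler' [OBJ: 'text'] (assumed layout)."""
    subj = LONG2SHORT.get(reldict["subjclass"], reldict["subjclass"])
    obj = LONG2SHORT.get(reldict["objclass"], reldict["objclass"])
    out = (f"[{subj}: {reldict['subjtext']!r}] {reldict['filler']!r} "
           f"[{obj}: {reldict['objtext']!r}]")
    if lcon:  # optionally show the left context before the subject
        out = f"...{reldict['lcon']!r} " + out
    if rcon:  # optionally show the right context after the object
        out = out + f" {reldict['rcon']!r}..."
    return out


rd = {"subjclass": "ORGANIZATION", "subjtext": "Acme", "filler": "is based in",
      "objclass": "LOCATION", "objtext": "Paris",
      "lcon": "Yesterday ,", "rcon": ", France"}
print(format_rtuple(rd))
# [ORG: 'Acme'] 'is based in' [LOC: 'Paris']
print(format_rtuple(rd, rcon=True))
# [ORG: 'Acme'] 'is based in' [LOC: 'Paris'] ', France'...
```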

show_clause(reldict, relsym)

Print the relation in clausal form.

  • reldict (defaultdict) - a relation dictionary
  • relsym (str) - a label for the relation
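In clausal form, the relation label becomes a predicate applied to the subject and object symbols. The sketch assumes the layout relsym(subjsym, objsym) built from the 'subjsym' and 'objsym' keys of the reldict; format_clause is a hypothetical name, and the module's exact quoting may differ.

```python
def format_clause(reldict, relsym):
    """Render a relation as relsym(subjsym, objsym) (assumed layout)."""
    return f"{relsym}({reldict['subjsym']}, {reldict['objsym']})"


rd = {"subjsym": "acme", "objsym": "paris"}
print(format_clause(rd, "in"))  # in(acme, paris)
```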

Variables Details


{'conll2002': ['LOC', 'PER', 'ORG'],
 'ieer': ['LOCATION',



