Package nltk :: Module tree :: Class Tree

Class Tree

object --+    
         |    
      list --+
             |
            Tree
Known Subclasses:

A hierarchical structure.

Each Tree represents a single hierarchical grouping of leaves and subtrees. For example, each constituent in a syntax tree is represented by a single Tree.

A tree's children are encoded as a list of leaves and subtrees, where a leaf is a basic (non-tree) value and a subtree is a nested Tree.

Any other properties that a Tree defines are known as node properties and are used to add information about individual hierarchical groupings. For example, syntax trees use a NODE property to label syntactic constituents with phrase tags, such as "NP" and "VP".

Several Tree methods use tree positions to specify children or descendants of a tree. Every tree position is either a single index i, specifying self[i], or a sequence (i1, i2, ..., iN), specifying self[i1][i2]...[iN].
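
As an illustration of how a tree position resolves to a descendant, the sketch below uses MiniTree, a hypothetical stand-in class (it is not NLTK's Tree) that subclasses list and treats a tuple index as a path of child indices. The example sentence and labels are assumptions chosen for the demonstration.

```python
class MiniTree(list):
    """Hypothetical stand-in for Tree: a list of children plus a node label."""

    def __init__(self, node, children=()):
        super().__init__(children)
        self.node = node

    def __getitem__(self, index):
        # A tuple is treated as a tree position: follow one index per level.
        if isinstance(index, tuple):
            target = self
            for i in index:
                target = target[i]
            return target
        return super().__getitem__(index)


tree = MiniTree("S", [
    MiniTree("NP", [MiniTree("NNP", ["John"])]),
    MiniTree("VP", [MiniTree("V", ["runs"])]),
])

first_child = tree[0]              # single index i: self[i]
same_as_chained = tree[(1, 0, 0)]  # sequence: self[1][0][0]
whole_tree = tree[()]              # the empty sequence follows no indices
```

In this sketch the empty tuple resolves to the tree itself, because the loop follows no indices.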

Instance Methods

__init__(self, node_or_str, children=None)
Construct a new tree.
Return type: new list

__eq__(self, other)
x==y

__ne__(self, other)
x!=y

__lt__(self, other)
x<y

__le__(self, other)
x<=y

__gt__(self, other)
x>y

__ge__(self, other)
x>=y

__mul__(self, v)
x*n

__rmul__(self, v)
n*x

__add__(self, v)
x+y

__radd__(self, v)

__getitem__(self, index)
x[y]

__setitem__(self, index, value)
x[i]=y

__delitem__(self, index)
del x[y]

leaves(self)
Returns: a list containing this tree's leaves.
Return type: list

flatten(self)
Returns: a tree consisting of this tree's root connected directly to its leaves, omitting all intervening non-terminal nodes.
Return type: Tree

height(self)
Returns: The height of this tree.
Return type: int

treepositions(self, order='preorder')

subtrees(self, filter=None)
Generate all the subtrees of this tree, optionally restricted to trees matching the filter function.

productions(self)
Generate the productions that correspond to the non-terminal nodes of the tree.
Return type: list of Productions

pos(self)
Returns: a list of tuples containing leaves and pre-terminals (part-of-speech tags).
Return type: list of tuples

leaf_treeposition(self, index)
Returns: The tree position of the index-th leaf in this tree.

treeposition_spanning_leaves(self, start, end)
Returns: The tree position of the lowest descendant of this tree that dominates self.leaves()[start:end].

chomsky_normal_form(self, factor='right', horzMarkov=None, vertMarkov=0, childChar='|', parentChar='^')
Convert the tree to Chomsky Normal Form (CNF), optionally applying Markov smoothing and parent annotation.

un_chomsky_normal_form(self, expandUnary=True, childChar='|', parentChar='^', unaryChar='+')
Restore a tree in Chomsky Normal Form to its original structure, remove any parent annotation, and optionally expand unary subtrees.

collapse_unary(self, collapsePOS=False, collapseRoot=False, joinChar='+')
Collapse subtrees with a single child (i.e., unary productions) into a new non-terminal (Tree node) joined by 'joinChar'.

copy(self, deep=False)

_frozen_class(self)

freeze(self, leaf_freezer=None)

draw(self)
Open a new window containing a graphical diagram of this tree.

__repr__(self)
repr(x)

__str__(self)
str(x)

pprint(self, margin=70, indent=0, nodesep='', parens='()', quotes=False)
Returns: A pretty-printed string representation of this tree.
Return type: string

pprint_latex_qtree(self)
Returns a representation of the tree compatible with the LaTeX qtree package.
Return type: string

_pprint_flat(self, nodesep, parens, quotes)

Inherited from list: __contains__, __delslice__, __getattribute__, __getslice__, __hash__, __iadd__, __imul__, __iter__, __len__, __reversed__, __setslice__, append, count, extend, index, insert, pop, remove, reverse, sort

Inherited from object: __delattr__, __reduce__, __reduce_ex__, __setattr__

Class Methods

convert(cls, val)
Convert a tree between different subtypes of Tree.

parse(cls, s, brackets='()', parse_node=None, parse_leaf=None, node_pattern=None, leaf_pattern=None, remove_empty_top_bracketing=False)
Parse a bracketed tree string and return the resulting tree.
Return type: Tree

_parse_error(cls, s, match, expecting)
Display a friendly error message when parsing a tree string fails.

Static Methods

__new__(cls, node_or_str=None, children=None)
Returns: a new object with type S, a subtype of T

Properties

Inherited from object: __class__

Method Details

__new__(cls, node_or_str=None, children=None)
Static Method

Returns: a new object with type S, a subtype of T
Overrides: list.__new__
(inherited documentation)

__init__(self, node_or_str, children=None)
(Constructor)

Construct a new tree. This constructor can be called in one of two ways:

  • Tree(node, children) constructs a new tree with the specified node value and list of children.
  • Tree(s) constructs a new tree by parsing the string s. It is equivalent to calling the class method Tree.parse(s).
Returns: new list
Overrides: list.__init__

__eq__(self, other)
(Equality operator)

x==y

Overrides: list.__eq__
(inherited documentation)

__ne__(self, other)

x!=y

Overrides: list.__ne__
(inherited documentation)

__lt__(self, other)
(Less-than operator)

x<y

Overrides: list.__lt__
(inherited documentation)

__le__(self, other)
(Less-than-or-equals operator)

x<=y

Overrides: list.__le__
(inherited documentation)

__gt__(self, other)
(Greater-than operator)

x>y

Overrides: list.__gt__
(inherited documentation)

__ge__(self, other)
(Greater-than-or-equals operator)

x>=y

Overrides: list.__ge__
(inherited documentation)

__mul__(self, v)

x*n

Overrides: list.__mul__
(inherited documentation)

__rmul__(self, v)

n*x

Overrides: list.__rmul__
(inherited documentation)

__add__(self, v)
(Addition operator)

x+y

Overrides: list.__add__
(inherited documentation)

__getitem__(self, index)
(Indexing operator)

x[y]

Overrides: list.__getitem__
(inherited documentation)

__setitem__(self, index, value)
(Index assignment operator)

x[i]=y

Overrides: list.__setitem__
(inherited documentation)

__delitem__(self, index)
(Index deletion operator)

del x[y]

Overrides: list.__delitem__
(inherited documentation)

leaves(self)

Returns: list
a list containing this tree's leaves. The order reflects the order of the leaves in the tree's hierarchical structure.

flatten(self)

Returns: Tree
a tree consisting of this tree's root connected directly to its leaves, omitting all intervening non-terminal nodes.

height(self)

Returns: int
The height of this tree. The height of a tree containing no children is 1; the height of a tree containing only leaves is 2; and the height of any other tree is one plus the maximum of its children's heights.
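
The recursive rule above can be sketched as follows. This is a minimal illustration, not the library's source: plain nested Python lists stand in for Tree objects, and any non-list value is treated as a leaf of height 1.

```python
def height(tree):
    """Height under the rule above; plain nested lists stand in for trees."""
    child_heights = [
        height(child) if isinstance(child, list) else 1
        for child in tree
    ]
    # An empty tree has no children, so max() falls back to 0 and the height is 1.
    return 1 + max(child_heights, default=0)


empty = []
only_leaves = ["John", "runs"]
nested = [[["John"]], [["runs"]]]  # shaped like (S (NP (NNP John)) (VP (V runs)))

heights = [height(empty), height(only_leaves), height(nested)]
```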

treepositions(self, order='preorder')

Parameters:
  • order - One of: preorder, postorder, bothorder, leaves.

subtrees(self, filter=None)

Generate all the subtrees of this tree, optionally restricted to trees matching the filter function.

Parameters:
  • filter (function) - the function to filter all local trees

productions(self)

Generate the productions that correspond to the non-terminal nodes of the tree. For each subtree of the form (P: C1 C2 ... Cn) this produces a production of the form P -> C1 C2 ... Cn.

Returns: list of Productions
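
To make the mapping from subtrees to productions concrete, here is a small sketch under stated assumptions: trees are modelled as hypothetical (label, children) tuples rather than Tree objects, productions are rendered as plain strings rather than Production objects, and nodes are visited parent first.

```python
def productions(tree):
    """Yield one 'P -> C1 C2 ... Cn' string per non-terminal node."""
    label, children = tree
    # A child subtree contributes its label; a leaf contributes itself.
    rhs = [child[0] if isinstance(child, tuple) else child for child in children]
    yield "%s -> %s" % (label, " ".join(rhs))
    for child in children:
        if isinstance(child, tuple):
            yield from productions(child)


tree = ("S", [("NP", [("NNP", ["John"])]), ("VP", [("V", ["runs"])])])
rules = list(productions(tree))
```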

pos(self)

Returns: list of tuples
a list of tuples containing leaves and pre-terminals (part-of-speech tags). The order reflects the order of the leaves in the tree's hierarchical structure.

leaf_treeposition(self, index)

Returns:
The tree position of the index-th leaf in this tree. I.e., if tp=self.leaf_treeposition(i), then self[tp]==self.leaves()[i].
Raises:
  • IndexError - If this tree contains fewer than index+1 leaves, or if index<0.
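
One way to locate the index-th leaf is a left-to-right depth-first walk that counts leaves as it meets them. The sketch below assumes nested lists as a stand-in for Tree objects; follow is a hypothetical helper used only to check the property stated above.

```python
def leaf_treeposition(tree, index):
    """Return the position (a tuple of child indices) of the index-th leaf."""
    if index < 0:
        raise IndexError("index must be non-negative")
    stack = [(tree, ())]
    while stack:
        value, position = stack.pop()
        if isinstance(value, list):
            # Push children right to left so the leftmost is visited first.
            for i in reversed(range(len(value))):
                stack.append((value[i], position + (i,)))
        elif index == 0:
            return position
        else:
            index -= 1
    raise IndexError("tree has too few leaves")


def follow(tree, position):
    """Resolve a tree position by indexing one level per element."""
    for i in position:
        tree = tree[i]
    return tree


tree = [[["John"]], [["runs"], [["fast"]]]]
position_of_last_leaf = leaf_treeposition(tree, 2)
```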

treeposition_spanning_leaves(self, start, end)

Returns:
The tree position of the lowest descendant of this tree that dominates self.leaves()[start:end].
Raises:
  • ValueError - if end <= start
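
The lowest dominating descendant can be found as the longest common prefix of the positions of the first and last leaves in the span. The following is a sketch of that idea on nested lists (an assumed stand-in for Tree objects); leaf_positions and spanning_position are hypothetical names.

```python
def leaf_positions(tree, prefix=()):
    """List the tree position of every leaf, left to right."""
    positions = []
    for i, child in enumerate(tree):
        if isinstance(child, list):
            positions.extend(leaf_positions(child, prefix + (i,)))
        else:
            positions.append(prefix + (i,))
    return positions


def spanning_position(tree, start, end):
    """Position of the lowest node dominating leaves[start:end]."""
    if end <= start:
        raise ValueError("end must be greater than start")
    positions = leaf_positions(tree)
    first, last = positions[start], positions[end - 1]
    common = []
    for a, b in zip(first, last):
        if a != b:
            break
        common.append(a)
    return tuple(common)


tree = [[["John"]], [["runs"], [["fast"]]]]
vp_position = spanning_position(tree, 1, 3)
```

When the span holds a single leaf, the common prefix is that leaf's own position.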

chomsky_normal_form(self, factor='right', horzMarkov=None, vertMarkov=0, childChar='|', parentChar='^')

This method can modify a tree in three ways:

  1. Convert a tree into its Chomsky Normal Form (CNF) equivalent -- every subtree has either two non-terminals or one terminal as its children. This process requires the creation of more "artificial" non-terminal nodes.
  2. Markov (horizontal) smoothing of children in new artificial nodes
  3. Vertical (parent) annotation of nodes
Parameters:
  • factor (string = [left|right]) - Right or left factoring method (default = "right")
  • horzMarkov (int | None) - Markov order for sibling smoothing in artificial nodes (None (default) = include all siblings)
  • vertMarkov (int | None) - Markov order for parent smoothing (0 (default) = no vertical annotation)
  • childChar (string) - A string used in construction of the artificial nodes, separating the head of the original subtree from the child nodes that have yet to be expanded (default = "|")
  • parentChar (string) - A string used to separate the node representation from its vertical annotation (default = "^")

un_chomsky_normal_form(self, expandUnary=True, childChar='|', parentChar='^', unaryChar='+')

This method modifies the tree in three ways:

  1. Transforms a tree in Chomsky Normal Form back to its original structure (branching greater than two)
  2. Removes any parent annotation (if it exists)
  3. (optional) expands unary subtrees (if previously collapsed with collapse_unary(...) )
Parameters:
  • expandUnary (boolean) - Flag to expand unary or not (default = True)
  • childChar (string) - A string separating the head node from its children in an artificial node (default = "|")
  • parentChar (string) - A string separating the node label from its parent annotation (default = "^")
  • unaryChar (string) - A string joining two non-terminals in a unary production (default = "+")

collapse_unary(self, collapsePOS=False, collapseRoot=False, joinChar='+')

Collapse subtrees with a single child (i.e., unary productions) into a new non-terminal (Tree node) whose label joins the collapsed labels with 'joinChar'. This is useful for algorithms that do not allow unary productions, where removing those productions outright would lose useful information. The Tree is modified in place (since it is passed by reference), and no value is returned.

Parameters:
  • collapsePOS (boolean) - 'False' (default) will not collapse the parent of leaf nodes (i.e., part-of-speech tags), since they are always unary productions
  • collapseRoot (boolean) - 'False' (default) will not modify the root production if it is unary. For the Penn WSJ treebank corpus, this corresponds to the TOP -> productions.
  • joinChar (string) - A string used to connect collapsed node values (default = "+")
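
The behaviour described above can be sketched on a small stand-in class. MiniTree, build_example, and the snake_case parameter names are hypothetical and chosen for this illustration; the code is an assumption about how such a collapse could work, not the library's source.

```python
class MiniTree(list):
    """Hypothetical stand-in for Tree: a list of children plus a node label."""

    def __init__(self, node, children=()):
        super().__init__(children)
        self.node = node


def collapse_unary(tree, collapse_pos=False, collapse_root=False, join_char="+"):
    """Merge unary chains in place, joining their labels with join_char."""
    # Skip a unary root unless the caller asks for it to be collapsed.
    if not collapse_root and isinstance(tree, MiniTree) and len(tree) == 1:
        agenda = [tree[0]]
    else:
        agenda = [tree]
    while agenda:
        node = agenda.pop()
        if not isinstance(node, MiniTree):
            continue  # leaves have nothing to collapse
        only_child = node[0] if len(node) == 1 else None
        if isinstance(only_child, MiniTree) and (
            collapse_pos
            or (len(only_child) > 0 and isinstance(only_child[0], MiniTree))
        ):
            node.node += join_char + only_child.node
            node[:] = list(only_child)  # adopt the grandchildren
            agenda.append(node)         # the merged node may still be unary
        else:
            agenda.extend(node)


def build_example():
    return MiniTree("TOP", [MiniTree("S", [
        MiniTree("NP", [MiniTree("NNP", ["John"])]),
        MiniTree("VP", [MiniTree("VBAR", [
            MiniTree("V", ["runs"]), MiniTree("RB", ["fast"]),
        ])]),
    ])])


tree = build_example()
collapse_unary(tree)
```

With the defaults, the unary root and the part-of-speech node above "John" are left alone, while the VP and its single child are merged into one node.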

convert(cls, val)
Class Method

Convert a tree between different subtypes of Tree. cls determines which class will be used to encode the new tree.

Parameters:
  • val (Tree) - The tree that should be converted.
Returns:
The new Tree.

parse(cls, s, brackets='()', parse_node=None, parse_leaf=None, node_pattern=None, leaf_pattern=None, remove_empty_top_bracketing=False)
Class Method

Parse a bracketed tree string and return the resulting tree. Trees are represented as nested bracketings, such as:

 (S (NP (NNP John)) (VP (V runs)))
Parameters:
  • s (str) - The string to parse
  • brackets (length-2 str) - The bracket characters used to mark the beginning and end of trees and subtrees.
  • parse_node (function), parse_leaf (function) - If specified, these functions are applied to the substrings of s corresponding to nodes and leaves (respectively) to obtain the values for those nodes and leaves. They should have the following signature:
    parse_node(str) -> value

    For example, these functions could be used to parse nodes and leaves whose values should be some type other than string (such as FeatStruct). Note that by default, node strings and leaf strings are delimited by whitespace and brackets; to override this default, use the node_pattern and leaf_pattern arguments.

  • node_pattern (str), leaf_pattern (str) - Regular expression patterns used to find node and leaf substrings in s. By default, both patterns are defined to match any sequence of non-whitespace non-bracket characters.
  • remove_empty_top_bracketing (bool) - If the resulting tree has an empty node label, and is length one, then return its single child instead. This is useful for treebank trees, which sometimes contain an extra level of bracketing.
Returns: Tree
A tree corresponding to the string representation s. If this class method is called using a subclass of Tree, then it will return a tree of that type.
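
As a rough illustration of how a bracketed string maps onto a nested structure, the sketch below tokenizes on brackets and whitespace and builds hypothetical (label, children) tuples. It assumes round brackets and non-empty node labels, does not handle truncated input gracefully, and is not the library's parser; parse_bracketed is an invented name.

```python
import re


def parse_bracketed(s):
    """Parse '(S (NP John))' into (label, children) tuples; leaves stay strings."""
    # Tokens are single brackets or runs of non-whitespace, non-bracket characters.
    tokens = re.findall(r"[()]|[^\s()]+", s)
    pos = 0

    def read_tree():
        nonlocal pos
        if tokens[pos] != "(":
            raise ValueError("expected '(' at token %d" % pos)
        pos += 1
        label = tokens[pos]
        pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(read_tree())
            else:
                children.append(tokens[pos])
                pos += 1
        pos += 1  # consume the closing bracket
        return (label, children)

    tree = read_tree()
    if pos != len(tokens):
        raise ValueError("unexpected input after the end of the tree")
    return tree


parsed = parse_bracketed("(S (NP (NNP John)) (VP (V runs)))")
```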

_parse_error(cls, s, match, expecting)
Class Method

Display a friendly error message when parsing a tree string fails.

Parameters:
  • s - The string we're parsing.
  • match - regexp match of the problem token.
  • expecting - what we expected to see instead.

__repr__(self)
(Representation operator)

repr(x)

Overrides: list.__repr__
(inherited documentation)

__str__(self)
(Informal representation operator)

str(x)

Overrides: object.__str__
(inherited documentation)

pprint(self, margin=70, indent=0, nodesep='', parens='()', quotes=False)

Parameters:
  • margin (int) - The right margin at which to do line-wrapping.
  • indent (int) - The indentation level at which printing begins. This number is used to decide how far to indent subsequent lines.
  • nodesep - A string that is used to separate the node from the children. E.g., the value ':' gives trees like (S: (NP: I) (VP: (V: saw) (NP: it))). The default, as shown in the signature, is the empty string.
Returns: string
A pretty-printed string representation of this tree.

pprint_latex_qtree(self)

Returns a representation of the tree compatible with the LaTeX qtree package. This consists of the string \Tree followed by the parse tree represented in bracketed notation.

For example, the following result was generated from a parse tree of the sentence The announcement astounded us:

 \Tree [.I'' [.N'' [.D The ] [.N' [.N announcement ] ] ]
     [.I' [.V'' [.V' [.V astounded ] [.N'' [.N' [.N us ] ] ] ] ] ] ]

See http://www.ling.upenn.edu/advice/latex.html for the LaTeX style file for the qtree package.

Returns: string
A LaTeX qtree representation of this tree.
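
The bracketed notation shown in the example can be produced by a short recursive function. This sketch assumes hypothetical (label, children) tuples in place of Tree objects and emits a single line without the line wrapping seen in the example; latex_qtree is an invented name.

```python
def latex_qtree(tree):
    """Render a (label, children) tuple in qtree bracket notation."""

    def bracket(node):
        if isinstance(node, str):
            return node  # leaves are written as-is
        label, children = node
        return "[.%s %s ]" % (label, " ".join(bracket(c) for c in children))

    return r"\Tree " + bracket(tree)


tree = ("N''", [("D", ["The"]), ("N'", [("N", ["announcement"])])])
result = latex_qtree(tree)
```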