nltk.parse package¶

Submodules¶

nltk.parse.api module¶

class nltk.parse.api.ParserI[source]¶

Bases: object

A processing class for deriving trees that represent possible structures for a sequence of tokens. These tree structures are known as “parses”. Typically, parsers are used to derive syntax trees for sentences. But parsers can also be used to derive other kinds of tree structure, such as morphological trees and discourse structures.

Subclasses must define:

at least one of: parse(), parse_sents().

Subclasses may define:

grammar()

grammar()[source]¶

Returns:	The grammar used by this parser.

parse(sent, *args, **kwargs)[source]¶

Returns:	An iterator that generates parse trees for the sentence.

When possible this list is sorted from most likely to least likely.

Parameters:	sent (list(str)) – The sentence to be parsed
Return type:	iter(Tree)

parse_all(sent, *args, **kwargs)[source]¶

Return type:	list(Tree)

parse_one(sent, *args, **kwargs)[source]¶

Return type:	Tree or None

parse_sents(sents, *args, **kwargs)[source]¶: Apply self.parse() to each element of sents. :rtype: iter(iter(Tree))

nltk.parse.bllip module¶

class nltk.parse.bllip.BllipParser(parser_model=None, reranker_features=None, reranker_weights=None, parser_options=None, reranker_options=None)[source]¶

Bases: nltk.parse.api.ParserI

Interface for parsing with BLLIP Parser. BllipParser objects can be constructed with the BllipParser.from_unified_model_dir class method or manually using the BllipParser constructor.

classmethod from_unified_model_dir(model_dir, parser_options=None, reranker_options=None)[source]¶

Create a BllipParser object from a unified parsing model directory. Unified parsing model directories are a standardized way of storing BLLIP parser and reranker models together on disk. See bllipparser.RerankingParser.get_unified_model_parameters() for more information about unified model directories.

Returns:	A `BllipParser` object using the parser and reranker

models in the model directory.

Parameters:	model_dir (str) – Path to the unified model directory. parser_options – optional dictionary of parser options, see

bllipparser.RerankingParser.RerankingParser.load_parser_options() for more information. :type parser_options: dict(str) :param reranker_options: optional dictionary of reranker options, see bllipparser.RerankingParser.RerankingParser.load_reranker_model() for more information. :type reranker_options: dict(str) :rtype: BllipParser

parse(sentence)[source]¶

Use BLLIP Parser to parse a sentence. Takes a sentence as a list of words; it will be automatically tagged with this BLLIP Parser instance’s tagger.

Returns:	An iterator that generates parse trees for the sentence

from most likely to least likely.

Parameters:	sentence (list(str)) – The sentence to be parsed
Return type:	iter(Tree)

tagged_parse(word_and_tag_pairs)[source]¶

Use BLLIP to parse a sentence. Takes a sentence as a list of (word, tag) tuples; the sentence must have already been tokenized and tagged. BLLIP will attempt to use the tags provided but may use others if it can’t come up with a complete parse subject to those constraints. You may also specify a tag as None to leave a token’s tag unconstrained.

Returns:	An iterator that generates parse trees for the sentence

from most likely to least likely.

Parameters:	sentence (list(tuple(str, str))) – Input sentence to parse as (word, tag) pairs
Return type:	iter(Tree)

nltk.parse.chart module¶

Data classes and parser implementations for “chart parsers”, which use dynamic programming to efficiently parse a text. A chart parser derives parse trees for a text by iteratively adding “edges” to a “chart.” Each edge represents a hypothesis about the tree structure for a subsequence of the text. The chart is a “blackboard” for composing and combining these hypotheses.

When a chart parser begins parsing a text, it creates a new (empty) chart, spanning the text. It then incrementally adds new edges to the chart. A set of “chart rules” specifies the conditions under which new edges should be added to the chart. Once the chart reaches a stage where none of the chart rules adds any new edges, parsing is complete.

Charts are encoded with the Chart class, and edges are encoded with the TreeEdge and LeafEdge classes. The chart parser module defines three chart parsers:

ChartParser is a simple and flexible chart parser. Given a set of chart rules, it will apply those rules to the chart until no more edges are added.

SteppingChartParser is a subclass of ChartParser that can be used to step through the parsing process.

class nltk.parse.chart.AbstractChartRule[source]¶

Bases: nltk.parse.chart.ChartRuleI

An abstract base class for chart rules. AbstractChartRule provides:

A default implementation for apply.
A default implementation for apply_everywhere, (Currently, this implementation assumes that ``NUM_EDGES``<=3.)
A default implementation for __str__, which returns a name based on the rule’s class name.

apply(chart, grammar, *edges)[source]¶

Return a generator that will add edges licensed by this rule and the given edges to the chart, one at a time. Each time the generator is resumed, it will either add a new edge and yield that edge; or return.

Parameters:	edges (list(EdgeI)) – A set of existing edges. The number of edges that should be passed to `apply()` is specified by the `NUM_EDGES` class variable.
Return type:	iter(EdgeI)

apply_everywhere(chart, grammar)[source]¶

Return a generator that will add all edges licensed by this rule, given the edges that are currently in the chart, one at a time. Each time the generator is resumed, it will either add a new edge and yield that edge; or return.

Return type:	iter(EdgeI)

unicode_repr¶: Return repr(self).

class nltk.parse.chart.BottomUpChartParser(grammar, **parser_args)[source]¶

Bases: nltk.parse.chart.ChartParser

A ChartParser using a bottom-up parsing strategy. See ChartParser for more information.

class nltk.parse.chart.BottomUpLeftCornerChartParser(grammar, **parser_args)[source]¶

Bases: nltk.parse.chart.ChartParser

A ChartParser using a bottom-up left-corner parsing strategy. This strategy is often more efficient than standard bottom-up. See ChartParser for more information.

class nltk.parse.chart.BottomUpPredictCombineRule[source]¶

Bases: nltk.parse.chart.BottomUpPredictRule

A rule licensing any edge corresponding to a production whose right-hand side begins with a complete edge’s left-hand side. In particular, this rule specifies that [A -> alpha \*] licenses the edge [B -> A \* beta] for each grammar production B -> A beta.

Note:	This is like `BottomUpPredictRule`, but it also applies the `FundamentalRule` to the resulting edge.

NUM_EDGES = 1¶

apply(chart, grammar, edge)[source]¶

Parameters:	edges (list(EdgeI)) – A set of existing edges. The number of edges that should be passed to `apply()` is specified by the `NUM_EDGES` class variable.
Return type:	iter(EdgeI)

class nltk.parse.chart.BottomUpPredictRule[source]¶

Bases: nltk.parse.chart.AbstractChartRule

A rule licensing any edge corresponding to a production whose right-hand side begins with a complete edge’s left-hand side. In particular, this rule specifies that [A -> alpha \*] licenses the edge [B -> \* A beta] for each grammar production B -> A beta.

NUM_EDGES = 1¶

apply(chart, grammar, edge)[source]¶

Parameters:	edges (list(EdgeI)) – A set of existing edges. The number of edges that should be passed to `apply()` is specified by the `NUM_EDGES` class variable.
Return type:	iter(EdgeI)

class nltk.parse.chart.CachedTopDownPredictRule[source]¶

Bases: nltk.parse.chart.TopDownPredictRule

A cached version of TopDownPredictRule. After the first time this rule is applied to an edge with a given end and next, it will not generate any more edges for edges with that end and next.

If chart or grammar are changed, then the cache is flushed.

apply(chart, grammar, edge)[source]¶

Parameters:	edges (list(EdgeI)) – A set of existing edges. The number of edges that should be passed to `apply()` is specified by the `NUM_EDGES` class variable.
Return type:	iter(EdgeI)

class nltk.parse.chart.Chart(tokens)[source]¶

Bases: object

A blackboard for hypotheses about the syntactic constituents of a sentence. A chart contains a set of edges, and each edge encodes a single hypothesis about the structure of some portion of the sentence.

The select method can be used to select a specific collection of edges. For example chart.select(is_complete=True, start=0) yields all complete edges whose start indices are 0. To ensure the efficiency of these selection operations, Chart dynamically creates and maintains an index for each set of attributes that have been selected on.

In order to reconstruct the trees that are represented by an edge, the chart associates each edge with a set of child pointer lists. A child pointer list is a list of the edges that license an edge’s right-hand side.

Variables:

Variables:	_tokens – The sentence that the chart covers. _num_leaves – The number of tokens. _edges – A list of the edges in the chart _edge_to_cpls – A dictionary mapping each edge to a set of child pointer lists that are associated with that edge. _indexes – A dictionary mapping tuples of edge attributes to indices, where each index maps the corresponding edge attribute values to lists of edges.

_tokens – The sentence that the chart covers.
_num_leaves – The number of tokens.
_edges – A list of the edges in the chart
_edge_to_cpls – A dictionary mapping each edge to a set of child pointer lists that are associated with that edge.
_indexes – A dictionary mapping tuples of edge attributes to indices, where each index maps the corresponding edge attribute values to lists of edges.

child_pointer_lists(edge)[source]¶

Return the set of child pointer lists for the given edge. Each child pointer list is a list of edges that have been used to form this edge.

Return type:	list(list(EdgeI))

dot_digraph()[source]¶

edges()[source]¶

Return a list of all edges in this chart. New edges that are added to the chart after the call to edges() will not be contained in this list.

Return type:	list(EdgeI)
See:	`iteredges`, `select`

initialize()[source]¶: Clear the chart.

insert(edge, *child_pointer_lists)[source]¶

Add a new edge to the chart, and return True if this operation modified the chart. In particular, return true iff the chart did not already contain edge, or if it did not already associate child_pointer_lists with edge.

Parameters:	edge (EdgeI) – The new edge child_pointer_lists (sequence of tuple(EdgeI)) – A sequence of lists of the edges that were used to form this edge. This list is used to reconstruct the trees (or partial trees) that are associated with `edge`.
Return type:	bool

insert_with_backpointer(new_edge, previous_edge, child_edge)[source]¶: Add a new edge to the chart, using a pointer to the previous edge.

iteredges()[source]¶

Return an iterator over the edges in this chart. It is not guaranteed that new edges which are added to the chart before the iterator is exhausted will also be generated.

Return type:	iter(EdgeI)
See:	`edges`, `select`

leaf(index)[source]¶

Return the leaf value of the word at the given index.

Return type:	str

leaves()[source]¶

Return a list of the leaf values of each word in the chart’s sentence.

Return type:	list(str)

num_edges()[source]¶

Return the number of edges contained in this chart.

Return type:	int

num_leaves()[source]¶

Return the number of words in this chart’s sentence.

Return type:	int

parses(root, tree_class=<class 'nltk.tree.Tree'>)[source]¶: Return an iterator of the complete tree structures that span the entire chart, and whose root node is root.

pretty_format(width=None)[source]¶

Return a pretty-printed string representation of this chart.

Parameters:	width – The number of characters allotted to each index in the sentence.
Return type:	str

pretty_format_edge(edge, width=None)[source]¶

Return a pretty-printed string representation of a given edge in this chart.

Return type:	str
Parameters:	width – The number of characters allotted to each index in the sentence.

pretty_format_leaves(width=None)[source]¶: Return a pretty-printed string representation of this chart’s leaves. This string can be used as a header for calls to pretty_format_edge.

select(**restrictions)[source]¶

Return an iterator over the edges in this chart. Any new edges that are added to the chart before the iterator is exahusted will also be generated. restrictions can be used to restrict the set of edges that will be generated.

Parameters:

Parameters:	span – Only generate edges `e` where `e.span()==span` start – Only generate edges `e` where `e.start()==start` end – Only generate edges `e` where `e.end()==end` length – Only generate edges `e` where `e.length()==length` lhs – Only generate edges `e` where `e.lhs()==lhs` rhs – Only generate edges `e` where `e.rhs()==rhs` nextsym – Only generate edges `e` where `e.nextsym()==nextsym` dot – Only generate edges `e` where `e.dot()==dot` is_complete – Only generate edges `e` where `e.is_complete()==is_complete` is_incomplete – Only generate edges `e` where `e.is_incomplete()==is_incomplete`
Return type:	iter(EdgeI)

span – Only generate edges e where e.span()==span
start – Only generate edges e where e.start()==start
end – Only generate edges e where e.end()==end
length – Only generate edges e where e.length()==length
lhs – Only generate edges e where e.lhs()==lhs
rhs – Only generate edges e where e.rhs()==rhs
nextsym – Only generate edges e where e.nextsym()==nextsym
dot – Only generate edges e where e.dot()==dot
is_complete – Only generate edges e where e.is_complete()==is_complete
is_incomplete – Only generate edges e where e.is_incomplete()==is_incomplete

Return type:

iter(EdgeI)

trees(edge, tree_class=<class 'nltk.tree.Tree'>, complete=False)[source]¶

Return an iterator of the tree structures that are associated with edge.

If edge is incomplete, then the unexpanded children will be encoded as childless subtrees, whose node value is the corresponding terminal or nonterminal.

Return type:	list(Tree)
Note:	If two trees share a common subtree, then the same Tree may be used to encode that subtree in both trees. If you need to eliminate this subtree sharing, then create a deep copy of each tree.

class nltk.parse.chart.ChartParser(grammar, strategy=[<nltk.parse.chart.LeafInitRule object>, <nltk.parse.chart.EmptyPredictRule object>, <nltk.parse.chart.BottomUpPredictCombineRule object>, <nltk.parse.chart.SingleEdgeFundamentalRule object>], trace=0, trace_chart_width=50, use_agenda=True, chart_class=<class 'nltk.parse.chart.Chart'>)[source]¶

Bases: nltk.parse.api.ParserI

A generic chart parser. A “strategy”, or list of ChartRuleI instances, is used to decide what edges to add to the chart. In particular, ChartParser uses the following algorithm to parse texts:

Until no new edges are added:

For each rule in strategy:

Apply rule to any applicable edges in the chart.

Return any complete parses in the chart

chart_parse(tokens, trace=None)[source]¶

Return the final parse Chart from which all possible parse trees can be extracted.

Parameters:	tokens (list(str)) – The sentence to be parsed
Return type:	Chart

grammar()[source]¶

Returns:	The grammar used by this parser.

parse(tokens, tree_class=<class 'nltk.tree.Tree'>)[source]¶

Returns:	An iterator that generates parse trees for the sentence.

When possible this list is sorted from most likely to least likely.

Parameters:	sent (list(str)) – The sentence to be parsed
Return type:	iter(Tree)

class nltk.parse.chart.ChartRuleI[source]¶

Bases: object

A rule that specifies what new edges are licensed by any given set of existing edges. Each chart rule expects a fixed number of edges, as indicated by the class variable NUM_EDGES. In particular:

A chart rule with NUM_EDGES=0 specifies what new edges are licensed, regardless of existing edges.
A chart rule with NUM_EDGES=1 specifies what new edges are licensed by a single existing edge.
A chart rule with NUM_EDGES=2 specifies what new edges are licensed by a pair of existing edges.

Variables:	NUM_EDGES – The number of existing edges that this rule uses to license new edges. Typically, this number ranges from zero to two.

apply(chart, grammar, *edges)[source]¶

Parameters:	edges (list(EdgeI)) – A set of existing edges. The number of edges that should be passed to `apply()` is specified by the `NUM_EDGES` class variable.
Return type:	iter(EdgeI)

apply_everywhere(chart, grammar)[source]¶

Return type:	iter(EdgeI)

class nltk.parse.chart.EdgeI[source]¶

Bases: object

A hypothesis about the structure of part of a sentence. Each edge records the fact that a structure is (partially) consistent with the sentence. An edge contains:

A span, indicating what part of the sentence is consistent with the hypothesized structure.
A left-hand side, specifying what kind of structure is hypothesized.
A right-hand side, specifying the contents of the hypothesized structure.
A dot position, indicating how much of the hypothesized structure is consistent with the sentence.

Every edge is either complete or incomplete:

An edge is complete if its structure is fully consistent with the sentence.
An edge is incomplete if its structure is partially consistent with the sentence. For every incomplete edge, the span specifies a possible prefix for the edge’s structure.

There are two kinds of edge:

A TreeEdge records which trees have been found to be (partially) consistent with the text.
A LeafEdge records the tokens occurring in the text.

The EdgeI interface provides a common interface to both types of edge, allowing chart parsers to treat them in a uniform manner.

dot()[source]¶

Return this edge’s dot position, which indicates how much of the hypothesized structure is consistent with the sentence. In particular, self.rhs[:dot] is consistent with tokens[self.start():self.end()].

Return type:	int

end()[source]¶

Return the end index of this edge’s span.

Return type:	int

is_complete()[source]¶

Return True if this edge’s structure is fully consistent with the text.

Return type:	bool

is_incomplete()[source]¶

Return True if this edge’s structure is partially consistent with the text.

Return type:	bool

length()[source]¶

Return the length of this edge’s span.

Return type:	int

lhs()[source]¶

Return this edge’s left-hand side, which specifies what kind of structure is hypothesized by this edge.

See:	`TreeEdge` and `LeafEdge` for a description of the left-hand side values for each edge type.

nextsym()[source]¶

Return the element of this edge’s right-hand side that immediately follows its dot.

Return type:	Nonterminal or terminal or None

rhs()[source]¶

Return this edge’s right-hand side, which specifies the content of the structure hypothesized by this edge.

See:	`TreeEdge` and `LeafEdge` for a description of the right-hand side values for each edge type.

span()[source]¶

Return a tuple (s, e), where tokens[s:e] is the portion of the sentence that is consistent with this edge’s structure.

Return type:	tuple(int, int)

start()[source]¶

Return the start index of this edge’s span.

Return type:	int

class nltk.parse.chart.EmptyPredictRule[source]¶

Bases: nltk.parse.chart.AbstractChartRule

A rule that inserts all empty productions as passive edges, in every position in the chart.

NUM_EDGES = 0¶

apply(chart, grammar)[source]¶

Parameters:	edges (list(EdgeI)) – A set of existing edges. The number of edges that should be passed to `apply()` is specified by the `NUM_EDGES` class variable.
Return type:	iter(EdgeI)

class nltk.parse.chart.FilteredBottomUpPredictCombineRule[source]¶

Bases: nltk.parse.chart.BottomUpPredictCombineRule

apply(chart, grammar, edge)[source]¶

Parameters:	edges (list(EdgeI)) – A set of existing edges. The number of edges that should be passed to `apply()` is specified by the `NUM_EDGES` class variable.
Return type:	iter(EdgeI)

class nltk.parse.chart.FilteredSingleEdgeFundamentalRule[source]¶: Bases: nltk.parse.chart.SingleEdgeFundamentalRule

class nltk.parse.chart.FundamentalRule[source]¶

Bases: nltk.parse.chart.AbstractChartRule

A rule that joins two adjacent edges to form a single combined edge. In particular, this rule specifies that any pair of edges

[A -> alpha \* B beta][i:j]
[B -> gamma \*][j:k]

licenses the edge:

[A -> alpha B * beta][i:j]

NUM_EDGES = 2¶

apply(chart, grammar, left_edge, right_edge)[source]¶

Parameters:	edges (list(EdgeI)) – A set of existing edges. The number of edges that should be passed to `apply()` is specified by the `NUM_EDGES` class variable.
Return type:	iter(EdgeI)

class nltk.parse.chart.LeafEdge(leaf, index)[source]¶

Bases: nltk.parse.chart.EdgeI

An edge that records the fact that a leaf value is consistent with a word in the sentence. A leaf edge consists of:

An index, indicating the position of the word.
A leaf, specifying the word’s content.

A leaf edge’s left-hand side is its leaf value, and its right hand side is (). Its span is [index, index+1], and its dot position is 0.

dot()[source]¶

Return type:	int

end()[source]¶

Return the end index of this edge’s span.

Return type:	int

is_complete()[source]¶

Return True if this edge’s structure is fully consistent with the text.

Return type:	bool

is_incomplete()[source]¶

Return True if this edge’s structure is partially consistent with the text.

Return type:	bool

length()[source]¶

Return the length of this edge’s span.

Return type:	int

lhs()[source]¶

Return this edge’s left-hand side, which specifies what kind of structure is hypothesized by this edge.

See:	`TreeEdge` and `LeafEdge` for a description of the left-hand side values for each edge type.

nextsym()[source]¶

Return the element of this edge’s right-hand side that immediately follows its dot.

Return type:	Nonterminal or terminal or None

rhs()[source]¶

Return this edge’s right-hand side, which specifies the content of the structure hypothesized by this edge.

See:	`TreeEdge` and `LeafEdge` for a description of the right-hand side values for each edge type.

span()[source]¶

Return a tuple (s, e), where tokens[s:e] is the portion of the sentence that is consistent with this edge’s structure.

Return type:	tuple(int, int)

start()[source]¶

Return the start index of this edge’s span.

Return type:	int

unicode_repr()¶: Return repr(self).

class nltk.parse.chart.LeafInitRule[source]¶

Bases: nltk.parse.chart.AbstractChartRule

NUM_EDGES = 0¶

apply(chart, grammar)[source]¶

Parameters:	edges (list(EdgeI)) – A set of existing edges. The number of edges that should be passed to `apply()` is specified by the `NUM_EDGES` class variable.
Return type:	iter(EdgeI)

class nltk.parse.chart.LeftCornerChartParser(grammar, **parser_args)[source]¶: Bases: nltk.parse.chart.ChartParser

class nltk.parse.chart.SingleEdgeFundamentalRule[source]¶

Bases: nltk.parse.chart.FundamentalRule

A rule that joins a given edge with adjacent edges in the chart, to form combined edges. In particular, this rule specifies that either of the edges:

[A -> alpha \* B beta][i:j]
[B -> gamma \*][j:k]

licenses the edge:

[A -> alpha B * beta][i:j]

if the other edge is already in the chart.

Note:	This is basically `FundamentalRule`, with one edge left unspecified.

NUM_EDGES = 1¶

apply(chart, grammar, edge)[source]¶

Parameters:	edges (list(EdgeI)) – A set of existing edges. The number of edges that should be passed to `apply()` is specified by the `NUM_EDGES` class variable.
Return type:	iter(EdgeI)

class nltk.parse.chart.SteppingChartParser(grammar, strategy=[], trace=0)[source]¶

Bases: nltk.parse.chart.ChartParser

A ChartParser that allows you to step through the parsing process, adding a single edge at a time. It also allows you to change the parser’s strategy or grammar midway through parsing a text.

The initialize method is used to start parsing a text. step adds a single edge to the chart. set_strategy changes the strategy used by the chart parser. parses returns the set of parses that has been found by the chart parser.

Variables:	_restart – Records whether the parser’s strategy, grammar, or chart has been changed. If so, then `step` must restart the parsing algorithm.

chart()[source]¶: Return the chart that is used by this parser.

current_chartrule()[source]¶: Return the chart rule used to generate the most recent edge.

grammar()[source]¶: Return the grammar used by this parser.

initialize(tokens)[source]¶: Begin parsing the given tokens.

parse(tokens, tree_class=<class 'nltk.tree.Tree'>)[source]¶

Returns:	An iterator that generates parse trees for the sentence.

When possible this list is sorted from most likely to least likely.

Parameters:	sent (list(str)) – The sentence to be parsed
Return type:	iter(Tree)

parses(tree_class=<class 'nltk.tree.Tree'>)[source]¶: Return the parse trees currently contained in the chart.

set_chart(chart)[source]¶: Load a given chart into the chart parser.

set_grammar(grammar)[source]¶: Change the grammar used by the parser.

set_strategy(strategy)[source]¶

Change the strategy that the parser uses to decide which edges to add to the chart.

Parameters:	strategy (list(ChartRuleI)) – A list of rules that should be used to decide what edges to add to the chart.

step()[source]¶

Return a generator that adds edges to the chart, one at a time. Each time the generator is resumed, it adds a single edge and yields that edge. If no more edges can be added, then it yields None.

If the parser’s strategy, grammar, or chart is changed, then the generator will continue adding edges using the new strategy, grammar, or chart.

Note that this generator never terminates, since the grammar or strategy might be changed to values that would add new edges. Instead, it yields None when no more edges can be added with the current strategy and grammar.

strategy()[source]¶: Return the strategy used by this parser.

class nltk.parse.chart.TopDownChartParser(grammar, **parser_args)[source]¶

Bases: nltk.parse.chart.ChartParser

A ChartParser using a top-down parsing strategy. See ChartParser for more information.

class nltk.parse.chart.TopDownInitRule[source]¶

Bases: nltk.parse.chart.AbstractChartRule

A rule licensing edges corresponding to the grammar productions for the grammar’s start symbol. In particular, this rule specifies that [S -> \* alpha][0:i] is licensed for each grammar production S -> alpha, where S is the grammar’s start symbol.

NUM_EDGES = 0¶

apply(chart, grammar)[source]¶

Parameters:	edges (list(EdgeI)) – A set of existing edges. The number of edges that should be passed to `apply()` is specified by the `NUM_EDGES` class variable.
Return type:	iter(EdgeI)

class nltk.parse.chart.TopDownPredictRule[source]¶

Bases: nltk.parse.chart.AbstractChartRule

A rule licensing edges corresponding to the grammar productions for the nonterminal following an incomplete edge’s dot. In particular, this rule specifies that [A -> alpha \* B beta][i:j] licenses the edge [B -> \* gamma][j:j] for each grammar production B -> gamma.

Note:	This rule corresponds to the Predictor Rule in Earley parsing.

NUM_EDGES = 1¶

apply(chart, grammar, edge)[source]¶

Parameters:	edges (list(EdgeI)) – A set of existing edges. The number of edges that should be passed to `apply()` is specified by the `NUM_EDGES` class variable.
Return type:	iter(EdgeI)

class nltk.parse.chart.TreeEdge(span, lhs, rhs, dot=0)[source]¶

Bases: nltk.parse.chart.EdgeI

An edge that records the fact that a tree is (partially) consistent with the sentence. A tree edge consists of:

A span, indicating what part of the sentence is consistent with the hypothesized tree.
A left-hand side, specifying the hypothesized tree’s node value.
A right-hand side, specifying the hypothesized tree’s children. Each element of the right-hand side is either a terminal, specifying a token with that terminal as its leaf value; or a nonterminal, specifying a subtree with that nonterminal’s symbol as its node value.
A dot position, indicating which children are consistent with part of the sentence. In particular, if dot is the dot position, rhs is the right-hand size, (start,end) is the span, and sentence is the list of tokens in the sentence, then tokens[start:end] can be spanned by the children specified by rhs[:dot].

For more information about edges, see the EdgeI interface.

dot()[source]¶

Return type:	int

end()[source]¶

Return the end index of this edge’s span.

Return type:	int

static from_production(production, index)[source]¶

Return a new TreeEdge formed from the given production. The new edge’s left-hand side and right-hand side will be taken from production; its span will be (index,index); and its dot position will be 0.

Return type:	TreeEdge

is_complete()[source]¶

Return True if this edge’s structure is fully consistent with the text.

Return type:	bool

is_incomplete()[source]¶

Return True if this edge’s structure is partially consistent with the text.

Return type:	bool

length()[source]¶

Return the length of this edge’s span.

Return type:	int

lhs()[source]¶

Return this edge’s left-hand side, which specifies what kind of structure is hypothesized by this edge.

See:	`TreeEdge` and `LeafEdge` for a description of the left-hand side values for each edge type.

move_dot_forward(new_end)[source]¶

Return a new TreeEdge formed from this edge. The new edge’s dot position is increased by 1, and its end index will be replaced by new_end.

Parameters:	new_end (int) – The new end index.
Return type:	TreeEdge

nextsym()[source]¶

Return the element of this edge’s right-hand side that immediately follows its dot.

Return type:	Nonterminal or terminal or None

rhs()[source]¶

Return this edge’s right-hand side, which specifies the content of the structure hypothesized by this edge.

See:	`TreeEdge` and `LeafEdge` for a description of the right-hand side values for each edge type.

span()[source]¶

Return a tuple (s, e), where tokens[s:e] is the portion of the sentence that is consistent with this edge’s structure.

Return type:	tuple(int, int)

start()[source]¶

Return the start index of this edge’s span.

Return type:	int

unicode_repr()¶: Return repr(self).

nltk.parse.chart.demo(choice=None, print_times=True, print_grammar=False, print_trees=True, trace=2, sent='I saw John with a dog with my cookie', numparses=5)[source]¶: A demonstration of the chart parsers.

nltk.parse.chart.demo_grammar()[source]¶

nltk.parse.corenlp module¶

class nltk.parse.corenlp.CoreNLPDependencyParser(url='http://localhost:9000', encoding='utf8', tagtype=None)[source]¶

Bases: nltk.parse.corenlp.GenericCoreNLPParser

Dependency parser.

>>> dep_parser = CoreNLPDependencyParser(url='http://localhost:9000')

>>> parse, = dep_parser.raw_parse(
...     'The quick brown fox jumps over the lazy dog.'
... )
>>> print(parse.to_conll(4))  
The     DT      4       det
quick   JJ      4       amod
brown   JJ      4       amod
fox     NN      5       nsubj
jumps   VBZ     0       ROOT
over    IN      9       case
the     DT      9       det
lazy    JJ      9       amod
dog     NN      5       nmod
.       .       5       punct

>>> print(parse.tree())  
(jumps (fox The quick brown) (dog over the lazy) .)

>>> for governor, dep, dependent in parse.triples():
...     print(governor, dep, dependent)  
    ('jumps', 'VBZ') nsubj ('fox', 'NN')
    ('fox', 'NN') det ('The', 'DT')
    ('fox', 'NN') amod ('quick', 'JJ')
    ('fox', 'NN') amod ('brown', 'JJ')
    ('jumps', 'VBZ') nmod ('dog', 'NN')
    ('dog', 'NN') case ('over', 'IN')
    ('dog', 'NN') det ('the', 'DT')
    ('dog', 'NN') amod ('lazy', 'JJ')
    ('jumps', 'VBZ') punct ('.', '.')

>>> (parse_fox, ), (parse_dog, ) = dep_parser.raw_parse_sents(
...     [
...         'The quick brown fox jumps over the lazy dog.',
...         'The quick grey wolf jumps over the lazy fox.',
...     ]
... )
>>> print(parse_fox.to_conll(4))  
The DT      4       det
quick       JJ      4       amod
brown       JJ      4       amod
fox NN      5       nsubj
jumps       VBZ     0       ROOT
over        IN      9       case
the DT      9       det
lazy        JJ      9       amod
dog NN      5       nmod
.   .       5       punct

>>> print(parse_dog.to_conll(4))  
The DT      4       det
quick       JJ      4       amod
grey        JJ      4       amod
wolf        NN      5       nsubj
jumps       VBZ     0       ROOT
over        IN      9       case
the DT      9       det
lazy        JJ      9       amod
fox NN      5       nmod
.   .       5       punct

>>> (parse_dog, ), (parse_friends, ) = dep_parser.parse_sents(
...     [
...         "I 'm a dog".split(),
...         "This is my friends ' cat ( the tabby )".split(),
...     ]
... )
>>> print(parse_dog.to_conll(4))  
I   PRP     4       nsubj
'm  VBP     4       cop
a   DT      4       det
dog NN      0       ROOT

>>> print(parse_friends.to_conll(4))  
This        DT      6       nsubj
is  VBZ     6       cop
my  PRP$    4       nmod:poss
friends     NNS     6       nmod:poss
'   POS     4       case
cat NN      0       ROOT
-LRB-       -LRB-   9       punct
the DT      9       det
tabby       NN      6       appos
-RRB-       -RRB-   9       punct

>>> parse_john, parse_mary, = dep_parser.parse_text(
...     'John loves Mary. Mary walks.'
... )

>>> print(parse_john.to_conll(4))  
John        NNP     2       nsubj
loves       VBZ     0       ROOT
Mary        NNP     2       dobj
.   .       2       punct

>>> print(parse_mary.to_conll(4))  
Mary        NNP     2       nsubj
walks       VBZ     0       ROOT
.   .       2       punct

Non-breaking space inside of a token.

>>> len(
...     next(
...         dep_parser.raw_parse(
...             'Anhalt said children typically treat a 20-ounce soda bottle as one '
...             'serving, while it actually contains 2 1/2 servings.'
...         )
...     ).nodes
... )
21

Phone numbers.

>>> len(
...     next(
...         dep_parser.raw_parse('This is not going to crash: 01 111 555.')
...     ).nodes
... )
10

>>> print(
...     next(
...         dep_parser.raw_parse('The underscore _ should not simply disappear.')
...     ).to_conll(4)
... )  
The         DT  3   det
underscore  VBP 3   amod
_           NN  7   nsubj
should      MD  7   aux
not         RB  7   neg
simply      RB  7   advmod
disappear   VB  0   ROOT
.           .   7   punct

>>> print(
...     '\n'.join(
...         next(
...             dep_parser.raw_parse(
...                 'for all of its insights into the dream world of teen life , and its electronic expression through '
...                 'cyber culture , the film gives no quarter to anyone seeking to pull a cohesive story out of its 2 '
...                 '1/2-hour running time .'
...             )
...         ).to_conll(4).split('\n')[-8:]
...     )
... )
its PRP$    40      nmod:poss
2 1/2       CD      40      nummod
-   :       40      punct
hour        NN      31      nmod
running     VBG     42      amod
time        NN      40      dep
.   .       24      punct

make_tree(result)[source]¶

parser_annotator = 'depparse'¶

class nltk.parse.corenlp.CoreNLPParser(url='http://localhost:9000', encoding='utf8', tagtype=None)[source]¶

Bases: nltk.parse.corenlp.GenericCoreNLPParser

>>> parser = CoreNLPParser(url='http://localhost:9000')

>>> next(
...     parser.raw_parse('The quick brown fox jumps over the lazy dog.')
... ).pretty_print()  
                     ROOT
                      |
                      S
       _______________|__________________________
      |                         VP               |
      |                _________|___             |
      |               |             PP           |
      |               |     ________|___         |
      NP              |    |            NP       |
  ____|__________     |    |     _______|____    |
 DT   JJ    JJ   NN  VBZ   IN   DT      JJ   NN  .
 |    |     |    |    |    |    |       |    |   |
The quick brown fox jumps over the     lazy dog  .

>>> (parse_fox, ), (parse_wolf, ) = parser.raw_parse_sents(
...     [
...         'The quick brown fox jumps over the lazy dog.',
...         'The quick grey wolf jumps over the lazy fox.',
...     ]
... )

>>> parse_fox.pretty_print()  
                     ROOT
                      |
                      S
       _______________|__________________________
      |                         VP               |
      |                _________|___             |
      |               |             PP           |
      |               |     ________|___         |
      NP              |    |            NP       |
  ____|__________     |    |     _______|____    |
 DT   JJ    JJ   NN  VBZ   IN   DT      JJ   NN  .
 |    |     |    |    |    |    |       |    |   |
The quick brown fox jumps over the     lazy dog  .

>>> parse_wolf.pretty_print()  
                     ROOT
                      |
                      S
       _______________|__________________________
      |                         VP               |
      |                _________|___             |
      |               |             PP           |
      |               |     ________|___         |
      NP              |    |            NP       |
  ____|_________      |    |     _______|____    |
 DT   JJ   JJ   NN   VBZ   IN   DT      JJ   NN  .
 |    |    |    |     |    |    |       |    |   |
The quick grey wolf jumps over the     lazy fox  .

>>> (parse_dog, ), (parse_friends, ) = parser.parse_sents(
...     [
...         "I 'm a dog".split(),
...         "This is my friends ' cat ( the tabby )".split(),
...     ]
... )

>>> parse_dog.pretty_print()  
        ROOT
         |
         S
  _______|____
 |            VP
 |    ________|___
 NP  |            NP
 |   |         ___|___
PRP VBP       DT      NN
 |   |        |       |
 I   'm       a      dog

>>> parse_friends.pretty_print()  
     ROOT
      |
      S
  ____|___________
 |                VP
 |     ___________|_____________
 |    |                         NP
 |    |                  _______|_________
 |    |                 NP               PRN
 |    |            _____|_______      ____|______________
 NP   |           NP            |    |        NP         |
 |    |     ______|_________    |    |     ___|____      |
 DT  VBZ  PRP$   NNS       POS  NN -LRB-  DT       NN  -RRB-
 |    |    |      |         |   |    |    |        |     |
This  is   my  friends      '  cat -LRB- the     tabby -RRB-

>>> parse_john, parse_mary, = parser.parse_text(
...     'John loves Mary. Mary walks.'
... )

>>> parse_john.pretty_print()  
      ROOT
       |
       S
  _____|_____________
 |          VP       |
 |      ____|___     |
 NP    |        NP   |
 |     |        |    |
NNP   VBZ      NNP   .
 |     |        |    |
John loves     Mary  .

>>> parse_mary.pretty_print()  
      ROOT
       |
       S
  _____|____
 NP    VP   |
 |     |    |
NNP   VBZ   .
 |     |    |
Mary walks  .

>>> next(
...     parser.raw_parse(
...         'NASIRIYA, Iraq—Iraqi doctors who treated former prisoner of war '
...         'Jessica Lynch have angrily dismissed claims made in her biography '
...         'that she was raped by her Iraqi captors.'
...     )
... ).height()
20

>>> next(
...     parser.raw_parse(
...         "The broader Standard & Poor's 500 Index <.SPX> was 0.46 points lower, or "
...         '0.05 percent, at 997.02.'
...     )
... ).height()
9

make_tree(result)[source]¶

parser_annotator = 'parse'¶

class nltk.parse.corenlp.CoreNLPServer(path_to_jar=None, path_to_models_jar=None, verbose=False, java_options=None, corenlp_options=None, port=None)[source]¶

Bases: object

start()[source]¶

stop()[source]¶

exception nltk.parse.corenlp.CoreNLPServerError[source]¶

Bases: OSError

Exceptions associated with the Core NLP server.

class nltk.parse.corenlp.GenericCoreNLPParser(url='http://localhost:9000', encoding='utf8', tagtype=None)[source]¶

Bases: nltk.parse.api.ParserI, nltk.tokenize.api.TokenizerI, nltk.tag.api.TaggerI

Interface to the CoreNLP Parser.

api_call(data, properties=None)[source]¶

parse_sents(sentences, *args, **kwargs)[source]¶

Parse multiple sentences.

Takes multiple sentences as a list where each sentence is a list of words. Each sentence will be automatically tagged with this CoreNLPParser instance’s tagger.

If a whitespace exists inside a token, then the token will be treated as several tokens.

Parameters:	sentences (list(list(str))) – Input sentences to parse
Return type:	iter(iter(Tree))

parse_text(text, *args, **kwargs)[source]¶

Parse a piece of text.

The text might contain several sentences which will be split by CoreNLP.

Parameters:	text (str) – text to be split.
Returns:	an iterable of syntactic structures. # TODO: should it be an iterable of iterables?

raw_parse(sentence, properties=None, *args, **kwargs)[source]¶

Parse a sentence.

Takes a sentence as a string; before parsing, it will be automatically tokenized and tagged by the CoreNLP Parser.

Parameters:	sentence (str) – Input sentence to parse
Return type:	iter(Tree)

raw_parse_sents(sentences, verbose=False, properties=None, *args, **kwargs)[source]¶

Parse multiple sentences.

Takes multiple sentences as a list of strings. Each sentence will be automatically tokenized and tagged.

Parameters:	sentences (list(str)) – Input sentences to parse.
Return type:	iter(iter(Tree))

raw_tag_sents(sentences)[source]¶

Tag multiple sentences.

Takes multiple sentences as a list where each sentence is a string.

Parameters:	sentences (list(str)) – Input sentences to tag
Return type:	list(list(list(tuple(str, str)))

tag(sentence)[source]¶

Tag a list of tokens.

Return type:	list(tuple(str, str))

>>> parser = CoreNLPParser(url='http://localhost:9000', tagtype='ner')
>>> tokens = 'Rami Eid is studying at Stony Brook University in NY'.split()
>>> parser.tag(tokens)
[('Rami', 'PERSON'), ('Eid', 'PERSON'), ('is', 'O'), ('studying', 'O'), ('at', 'O'), ('Stony', 'ORGANIZATION'),
('Brook', 'ORGANIZATION'), ('University', 'ORGANIZATION'), ('in', 'O'), ('NY', 'O')]

>>> parser = CoreNLPParser(url='http://localhost:9000', tagtype='pos')
>>> tokens = "What is the airspeed of an unladen swallow ?".split()
>>> parser.tag(tokens)
[('What', 'WP'), ('is', 'VBZ'), ('the', 'DT'),
('airspeed', 'NN'), ('of', 'IN'), ('an', 'DT'),
('unladen', 'JJ'), ('swallow', 'VB'), ('?', '.')]

tag_sents(sentences)[source]¶

Tag multiple sentences.

Takes multiple sentences as a list where each sentence is a list of tokens.

Parameters:	sentences (list(list(str))) – Input sentences to tag
Return type:	list(list(tuple(str, str))

tokenize(text, properties=None)[source]¶

Tokenize a string of text.

>>> parser = CoreNLPParser(url='http://localhost:9000')

>>> text = 'Good muffins cost $3.88\nin New York.  Please buy me\ntwo of them.\nThanks.'
>>> list(parser.tokenize(text))
['Good', 'muffins', 'cost', '$', '3.88', 'in', 'New', 'York', '.', 'Please', 'buy', 'me', 'two', 'of', 'them', '.', 'Thanks', '.']

>>> s = "The colour of the wall is blue."
>>> list(
...     parser.tokenize(
...         'The colour of the wall is blue.',
...             properties={'tokenize.options': 'americanize=true'},
...     )
... )
['The', 'color', 'of', 'the', 'wall', 'is', 'blue', '.']

nltk.parse.corenlp.setup_module(module)[source]¶

nltk.parse.corenlp.teardown_module(module)[source]¶

nltk.parse.corenlp.transform(sentence)[source]¶

nltk.parse.corenlp.try_port(port=0)[source]¶

nltk.parse.dependencygraph module¶

Tools for reading and writing dependency trees. The input is assumed to be in Malt-TAB format (http://stp.lingfil.uu.se/~nivre/research/MaltXML.html).

class nltk.parse.dependencygraph.DependencyGraph(tree_str=None, cell_extractor=None, zero_based=False, cell_separator=None, top_relation_label='ROOT')[source]¶

Bases: object

A container for the nodes and labelled edges of a dependency structure.

add_arc(head_address, mod_address)[source]¶: Adds an arc from the node specified by head_address to the node specified by the mod address.

add_node(node)[source]¶

connect_graph()[source]¶: Fully connects all non-root nodes. All nodes are set to be dependents of the root node.

contains_address(node_address)[source]¶: Returns true if the graph contains a node with the given node address, false otherwise.

contains_cycle()[source]¶

Check whether there are cycles.

>>> dg = DependencyGraph(treebank_data)
>>> dg.contains_cycle()
False

>>> cyclic_dg = DependencyGraph()
>>> top = {'word': None, 'deps': [1], 'rel': 'TOP', 'address': 0}
>>> child1 = {'word': None, 'deps': [2], 'rel': 'NTOP', 'address': 1}
>>> child2 = {'word': None, 'deps': [4], 'rel': 'NTOP', 'address': 2}
>>> child3 = {'word': None, 'deps': [1], 'rel': 'NTOP', 'address': 3}
>>> child4 = {'word': None, 'deps': [3], 'rel': 'NTOP', 'address': 4}
>>> cyclic_dg.nodes = {
...     0: top,
...     1: child1,
...     2: child2,
...     3: child3,
...     4: child4,
... }
>>> cyclic_dg.root = top

>>> cyclic_dg.contains_cycle()
[3, 1, 2, 4]

get_by_address(node_address)[source]¶: Return the node with the given address.

get_cycle_path(curr_node, goal_node_index)[source]¶

left_children(node_index)[source]¶: Returns the number of left children under the node specified by the given address.

static load(filename, zero_based=False, cell_separator=None, top_relation_label='ROOT')[source]¶

Parameters:	filename – a name of a file in Malt-TAB format zero_based – nodes in the input file are numbered starting from 0

rather than 1 (as produced by, e.g., zpar) :param str cell_separator: the cell separator. If not provided, cells are split by whitespace. :param str top_relation_label: the label by which the top relation is identified, for examlple, ROOT, null or TOP.

Returns:	a list of DependencyGraphs

nx_graph()[source]¶: Convert the data in a nodelist into a networkx labeled directed graph.

redirect_arcs(originals, redirect)[source]¶: Redirects arcs to any of the nodes in the originals list to the redirect node address.

remove_by_address(address)[source]¶: Removes the node with the given address. References to this node in others will still exist.

right_children(node_index)[source]¶: Returns the number of right children under the node specified by the given address.

to_conll(style)[source]¶

The dependency graph in CoNLL format.

Parameters:	style (int) – the style to use for the format (3, 4, 10 columns)
Return type:	str

to_dot()[source]¶

Return a dot representation suitable for using with Graphviz.

>>> dg = DependencyGraph(
...     'John N 2\n'
...     'loves V 0\n'
...     'Mary N 2'
... )
>>> print(dg.to_dot())
digraph G{
edge [dir=forward]
node [shape=plaintext]

0 [label="0 (None)"]
0 -> 2 [label="ROOT"]
1 [label="1 (John)"]
2 [label="2 (loves)"]
2 -> 1 [label=""]
2 -> 3 [label=""]
3 [label="3 (Mary)"]
}

tree()[source]¶: Starting with the root node, build a dependency tree using the NLTK Tree constructor. Dependency labels are omitted.

triples(node=None)[source]¶: Extract dependency triples of the form: ((head word, head tag), rel, (dep word, dep tag))

unicode_repr()¶: Return repr(self).

exception nltk.parse.dependencygraph.DependencyGraphError[source]¶

Bases: Exception

Dependency graph exception.

nltk.parse.dependencygraph.conll_demo()[source]¶: A demonstration of how to read a string representation of a CoNLL format dependency tree.

nltk.parse.dependencygraph.conll_file_demo()[source]¶

nltk.parse.dependencygraph.cycle_finding_demo()[source]¶

nltk.parse.dependencygraph.demo()[source]¶

nltk.parse.dependencygraph.malt_demo(nx=False)[source]¶: A demonstration of the result of reading a dependency version of the first sentence of the Penn Treebank.

nltk.parse.earleychart module¶

Data classes and parser implementations for incremental chart parsers, which use dynamic programming to efficiently parse a text. A “chart parser” derives parse trees for a text by iteratively adding “edges” to a “chart”. Each “edge” represents a hypothesis about the tree structure for a subsequence of the text. The “chart” is a “blackboard” for composing and combining these hypotheses.

A parser is “incremental”, if it guarantees that for all i, j where i < j, all edges ending at i are built before any edges ending at j. This is appealing for, say, speech recognizer hypothesis filtering.

The main parser class is EarleyChartParser, which is a top-down algorithm, originally formulated by Jay Earley (1970).

class nltk.parse.earleychart.CompleteFundamentalRule[source]¶: Bases: nltk.parse.chart.SingleEdgeFundamentalRule

class nltk.parse.earleychart.CompleterRule[source]¶

Bases: nltk.parse.earleychart.CompleteFundamentalRule

apply(chart, grammar, edge)[source]¶

Parameters:	edges (list(EdgeI)) – A set of existing edges. The number of edges that should be passed to `apply()` is specified by the `NUM_EDGES` class variable.
Return type:	iter(EdgeI)

class nltk.parse.earleychart.EarleyChartParser(grammar, **parser_args)[source]¶: Bases: nltk.parse.earleychart.IncrementalChartParser

class nltk.parse.earleychart.FeatureCompleteFundamentalRule[source]¶: Bases: nltk.parse.featurechart.FeatureSingleEdgeFundamentalRule

class nltk.parse.earleychart.FeatureCompleterRule[source]¶: Bases: nltk.parse.earleychart.CompleterRule

class nltk.parse.earleychart.FeatureEarleyChartParser(grammar, **parser_args)[source]¶: Bases: nltk.parse.earleychart.FeatureIncrementalChartParser

class nltk.parse.earleychart.FeatureIncrementalBottomUpChartParser(grammar, **parser_args)[source]¶: Bases: nltk.parse.earleychart.FeatureIncrementalChartParser

class nltk.parse.earleychart.FeatureIncrementalBottomUpLeftCornerChartParser(grammar, **parser_args)[source]¶: Bases: nltk.parse.earleychart.FeatureIncrementalChartParser

class nltk.parse.earleychart.FeatureIncrementalChart(tokens)[source]¶

Bases: nltk.parse.earleychart.IncrementalChart, nltk.parse.featurechart.FeatureChart

select(end, **restrictions)[source]¶: Returns an iterator over the edges in this chart. See Chart.select for more information about the restrictions on the edges.

class nltk.parse.earleychart.FeatureIncrementalChartParser(grammar, strategy=[<nltk.parse.chart.LeafInitRule object>, <nltk.parse.featurechart.FeatureEmptyPredictRule object>, <nltk.parse.featurechart.FeatureBottomUpPredictCombineRule object>, <nltk.parse.earleychart.FeatureCompleteFundamentalRule object>], trace_chart_width=20, chart_class=<class 'nltk.parse.earleychart.FeatureIncrementalChart'>, **parser_args)[source]¶: Bases: nltk.parse.earleychart.IncrementalChartParser, nltk.parse.featurechart.FeatureChartParser

class nltk.parse.earleychart.FeatureIncrementalTopDownChartParser(grammar, **parser_args)[source]¶: Bases: nltk.parse.earleychart.FeatureIncrementalChartParser

class nltk.parse.earleychart.FeaturePredictorRule[source]¶: Bases: nltk.parse.featurechart.FeatureTopDownPredictRule

class nltk.parse.earleychart.FeatureScannerRule[source]¶: Bases: nltk.parse.earleychart.ScannerRule

class nltk.parse.earleychart.FilteredCompleteFundamentalRule[source]¶

Bases: nltk.parse.chart.FilteredSingleEdgeFundamentalRule

apply(chart, grammar, edge)[source]¶

Parameters:	edges (list(EdgeI)) – A set of existing edges. The number of edges that should be passed to `apply()` is specified by the `NUM_EDGES` class variable.
Return type:	iter(EdgeI)

class nltk.parse.earleychart.IncrementalBottomUpChartParser(grammar, **parser_args)[source]¶: Bases: nltk.parse.earleychart.IncrementalChartParser

class nltk.parse.earleychart.IncrementalBottomUpLeftCornerChartParser(grammar, **parser_args)[source]¶: Bases: nltk.parse.earleychart.IncrementalChartParser

class nltk.parse.earleychart.IncrementalChart(tokens)[source]¶

Bases: nltk.parse.chart.Chart

edges()[source]¶

Return a list of all edges in this chart. New edges that are added to the chart after the call to edges() will not be contained in this list.

Return type:	list(EdgeI)
See:	`iteredges`, `select`

initialize()[source]¶: Clear the chart.

iteredges()[source]¶

Return an iterator over the edges in this chart. It is not guaranteed that new edges which are added to the chart before the iterator is exhausted will also be generated.

Return type:	iter(EdgeI)
See:	`edges`, `select`

select(end, **restrictions)[source]¶

Parameters:

span – Only generate edges e where e.span()==span
start – Only generate edges e where e.start()==start
end – Only generate edges e where e.end()==end
length – Only generate edges e where e.length()==length
lhs – Only generate edges e where e.lhs()==lhs
rhs – Only generate edges e where e.rhs()==rhs
nextsym – Only generate edges e where e.nextsym()==nextsym
dot – Only generate edges e where e.dot()==dot
is_complete – Only generate edges e where e.is_complete()==is_complete
is_incomplete – Only generate edges e where e.is_incomplete()==is_incomplete

Return type:

iter(EdgeI)

class nltk.parse.earleychart.IncrementalChartParser(grammar, strategy=[<nltk.parse.chart.LeafInitRule object>, <nltk.parse.chart.EmptyPredictRule object>, <nltk.parse.chart.BottomUpPredictCombineRule object>, <nltk.parse.earleychart.CompleteFundamentalRule object>], trace=0, trace_chart_width=50, chart_class=<class 'nltk.parse.earleychart.IncrementalChart'>)[source]¶

Bases: nltk.parse.chart.ChartParser

An incremental chart parser implementing Jay Earley’s parsing algorithm:

For each index end in [0, 1, …, N]:

For each edge such that edge.end = end:

If edge is incomplete and edge.next is not a part of speech:

Apply PredictorRule to edge

If edge is incomplete and edge.next is a part of speech:

Apply ScannerRule to edge

If edge is complete:

Apply CompleterRule to edge

Return any complete parses in the chart

chart_parse(tokens, trace=None)[source]¶

Return the final parse Chart from which all possible parse trees can be extracted.

Parameters:	tokens (list(str)) – The sentence to be parsed
Return type:	Chart

class nltk.parse.earleychart.IncrementalLeftCornerChartParser(grammar, **parser_args)[source]¶: Bases: nltk.parse.earleychart.IncrementalChartParser

class nltk.parse.earleychart.IncrementalTopDownChartParser(grammar, **parser_args)[source]¶: Bases: nltk.parse.earleychart.IncrementalChartParser

class nltk.parse.earleychart.PredictorRule[source]¶: Bases: nltk.parse.chart.CachedTopDownPredictRule

class nltk.parse.earleychart.ScannerRule[source]¶

Bases: nltk.parse.earleychart.CompleteFundamentalRule

apply(chart, grammar, edge)[source]¶

Parameters:	edges (list(EdgeI)) – A set of existing edges. The number of edges that should be passed to `apply()` is specified by the `NUM_EDGES` class variable.
Return type:	iter(EdgeI)

nltk.parse.earleychart.demo(print_times=True, print_grammar=False, print_trees=True, trace=2, sent='I saw John with a dog with my cookie', numparses=5)[source]¶: A demonstration of the Earley parsers.

nltk.parse.evaluate module¶

class nltk.parse.evaluate.DependencyEvaluator(parsed_sents, gold_sents)[source]¶

Bases: object

Class for measuring labelled and unlabelled attachment score for dependency parsing. Note that the evaluation ignores punctuation.

>>> from nltk.parse import DependencyGraph, DependencyEvaluator

>>> gold_sent = DependencyGraph("""
... Pierre  NNP     2       NMOD
... Vinken  NNP     8       SUB
... ,       ,       2       P
... 61      CD      5       NMOD
... years   NNS     6       AMOD
... old     JJ      2       NMOD
... ,       ,       2       P
... will    MD      0       ROOT
... join    VB      8       VC
... the     DT      11      NMOD
... board   NN      9       OBJ
... as      IN      9       VMOD
... a       DT      15      NMOD
... nonexecutive    JJ      15      NMOD
... director        NN      12      PMOD
... Nov.    NNP     9       VMOD
... 29      CD      16      NMOD
... .       .       9       VMOD
... """)

>>> parsed_sent = DependencyGraph("""
... Pierre  NNP     8       NMOD
... Vinken  NNP     1       SUB
... ,       ,       3       P
... 61      CD      6       NMOD
... years   NNS     6       AMOD
... old     JJ      2       NMOD
... ,       ,       3       AMOD
... will    MD      0       ROOT
... join    VB      8       VC
... the     DT      11      AMOD
... board   NN      9       OBJECT
... as      IN      9       NMOD
... a       DT      15      NMOD
... nonexecutive    JJ      15      NMOD
... director        NN      12      PMOD
... Nov.    NNP     9       VMOD
... 29      CD      16      NMOD
... .       .       9       VMOD
... """)

>>> de = DependencyEvaluator([parsed_sent],[gold_sent])
>>> las, uas = de.eval()
>>> las
0.6...
>>> uas
0.8...
>>> abs(uas - 0.8) < 0.00001
True

eval()[source]¶

Return the Labeled Attachment Score (LAS) and Unlabeled Attachment Score (UAS)

:return : tuple(float,float)

nltk.parse.featurechart module¶

Extension of chart parsing implementation to handle grammars with feature structures as nodes.

class nltk.parse.featurechart.FeatureBottomUpChartParser(grammar, **parser_args)[source]¶: Bases: nltk.parse.featurechart.FeatureChartParser

class nltk.parse.featurechart.FeatureBottomUpLeftCornerChartParser(grammar, **parser_args)[source]¶: Bases: nltk.parse.featurechart.FeatureChartParser

class nltk.parse.featurechart.FeatureBottomUpPredictCombineRule[source]¶

Bases: nltk.parse.chart.BottomUpPredictCombineRule

apply(chart, grammar, edge)[source]¶

Parameters:	edges (list(EdgeI)) – A set of existing edges. The number of edges that should be passed to `apply()` is specified by the `NUM_EDGES` class variable.
Return type:	iter(EdgeI)

class nltk.parse.featurechart.FeatureBottomUpPredictRule[source]¶

Bases: nltk.parse.chart.BottomUpPredictRule

apply(chart, grammar, edge)[source]¶

Parameters:	edges (list(EdgeI)) – A set of existing edges. The number of edges that should be passed to `apply()` is specified by the `NUM_EDGES` class variable.
Return type:	iter(EdgeI)

class nltk.parse.featurechart.FeatureChart(tokens)[source]¶

Bases: nltk.parse.chart.Chart

A Chart for feature grammars. :see: Chart for more information.

parses(start, tree_class=<class 'nltk.tree.Tree'>)[source]¶: Return an iterator of the complete tree structures that span the entire chart, and whose root node is root.

select(**restrictions)[source]¶: Returns an iterator over the edges in this chart. See Chart.select for more information about the restrictions on the edges.

class nltk.parse.featurechart.FeatureChartParser(grammar, strategy=[<nltk.parse.chart.LeafInitRule object>, <nltk.parse.featurechart.FeatureEmptyPredictRule object>, <nltk.parse.featurechart.FeatureBottomUpPredictCombineRule object>, <nltk.parse.featurechart.FeatureSingleEdgeFundamentalRule object>], trace_chart_width=20, chart_class=<class 'nltk.parse.featurechart.FeatureChart'>, **parser_args)[source]¶: Bases: nltk.parse.chart.ChartParser

class nltk.parse.featurechart.FeatureEmptyPredictRule[source]¶

Bases: nltk.parse.chart.EmptyPredictRule

apply(chart, grammar)[source]¶

Parameters:	edges (list(EdgeI)) – A set of existing edges. The number of edges that should be passed to `apply()` is specified by the `NUM_EDGES` class variable.
Return type:	iter(EdgeI)

class nltk.parse.featurechart.FeatureFundamentalRule[source]¶

Bases: nltk.parse.chart.FundamentalRule

A specialized version of the fundamental rule that operates on nonterminals whose symbols are FeatStructNonterminal``s. Rather tha simply comparing the nonterminals for equality, they are unified. Variable bindings from these unifications are collected and stored in the chart using a ``FeatureTreeEdge. When a complete edge is generated, these bindings are applied to all nonterminals in the edge.

The fundamental rule states that:

[A -> alpha \* B1 beta][i:j]
[B2 -> gamma \*][j:k]

licenses the edge:

[A -> alpha B3 \* beta][i:j]

assuming that B1 and B2 can be unified to generate B3.

apply(chart, grammar, left_edge, right_edge)[source]¶

Parameters:	edges (list(EdgeI)) – A set of existing edges. The number of edges that should be passed to `apply()` is specified by the `NUM_EDGES` class variable.
Return type:	iter(EdgeI)

class nltk.parse.featurechart.FeatureSingleEdgeFundamentalRule[source]¶

Bases: nltk.parse.chart.SingleEdgeFundamentalRule

A specialized version of the completer / single edge fundamental rule that operates on nonterminals whose symbols are ``FeatStructNonterminal``s. Rather than simply comparing the nonterminals for equality, they are unified.

class nltk.parse.featurechart.FeatureTopDownChartParser(grammar, **parser_args)[source]¶: Bases: nltk.parse.featurechart.FeatureChartParser

class nltk.parse.featurechart.FeatureTopDownInitRule[source]¶

Bases: nltk.parse.chart.TopDownInitRule

apply(chart, grammar)[source]¶

Parameters:	edges (list(EdgeI)) – A set of existing edges. The number of edges that should be passed to `apply()` is specified by the `NUM_EDGES` class variable.
Return type:	iter(EdgeI)

class nltk.parse.featurechart.FeatureTopDownPredictRule[source]¶

Bases: nltk.parse.chart.CachedTopDownPredictRule

A specialized version of the (cached) top down predict rule that operates on nonterminals whose symbols are ``FeatStructNonterminal``s. Rather than simply comparing the nonterminals for equality, they are unified.

The top down expand rule states that:

[A -> alpha \* B1 beta][i:j]

licenses the edge:

[B2 -> \* gamma][j:j]

for each grammar production B2 -> gamma, assuming that B1 and B2 can be unified.

apply(chart, grammar, edge)[source]¶

Parameters:	edges (list(EdgeI)) – A set of existing edges. The number of edges that should be passed to `apply()` is specified by the `NUM_EDGES` class variable.
Return type:	iter(EdgeI)

class nltk.parse.featurechart.FeatureTreeEdge(span, lhs, rhs, dot=0, bindings=None)[source]¶

Bases: nltk.parse.chart.TreeEdge

A specialized tree edge that allows shared variable bindings between nonterminals on the left-hand side and right-hand side.

Each FeatureTreeEdge contains a set of bindings, i.e., a dictionary mapping from variables to values. If the edge is not complete, then these bindings are simply stored. However, if the edge is complete, then the constructor applies these bindings to every nonterminal in the edge whose symbol implements the interface SubstituteBindingsI.

bindings()[source]¶: Return a copy of this edge’s bindings dictionary.

static from_production(production, index)[source]¶

Returns:	A new `TreeEdge` formed from the given production. The new edge’s left-hand side and right-hand side will be taken from `production`; its span will be `(index,index)`; and its dot position will be `0`.
Return type:	TreeEdge

move_dot_forward(new_end, bindings=None)[source]¶

Returns:	A new `FeatureTreeEdge` formed from this edge. The new edge’s dot position is increased by `1`, and its end index will be replaced by `new_end`.
Return type:	FeatureTreeEdge
Parameters:	new_end (int) – The new end index. bindings (dict) – Bindings for the new edge.

next_with_bindings()[source]¶

unicode_repr()¶: Return repr(self).

variables()[source]¶

Returns:	The set of variables used by this edge.
Return type:	set(Variable)

class nltk.parse.featurechart.InstantiateVarsChart(tokens)[source]¶

Bases: nltk.parse.featurechart.FeatureChart

A specialized chart that ‘instantiates’ variables whose names start with ‘@’, by replacing them with unique new variables. In particular, whenever a complete edge is added to the chart, any variables in the edge’s lhs whose names start with ‘@’ will be replaced by unique new ``Variable``s.

initialize()[source]¶: Clear the chart.

insert(edge, child_pointer_list)[source]¶

Parameters:	edge (EdgeI) – The new edge child_pointer_lists (sequence of tuple(EdgeI)) – A sequence of lists of the edges that were used to form this edge. This list is used to reconstruct the trees (or partial trees) that are associated with `edge`.
Return type:	bool

inst_vars(edge)[source]¶

instantiate_edge(edge)[source]¶

If the edge is a FeatureTreeEdge, and it is complete, then instantiate all variables whose names start with ‘@’, by replacing them with unique new variables.

Note that instantiation is done in-place, since the parsing algorithms might already hold a reference to the edge for future use.

nltk.parse.featurechart.demo(print_times=True, print_grammar=True, print_trees=True, print_sentence=True, trace=1, parser=<class 'nltk.parse.featurechart.FeatureChartParser'>, sent='I saw John with a dog with my cookie')[source]¶

nltk.parse.featurechart.demo_grammar()[source]¶

nltk.parse.featurechart.run_profile()[source]¶

nltk.parse.generate module¶

nltk.parse.generate.demo(N=23)[source]¶

nltk.parse.generate.generate(grammar, start=None, depth=None, n=None)[source]¶

Generates an iterator of all sentences from a CFG.

Parameters:	grammar – The Grammar used to generate sentences. start – The Nonterminal from which to start generate sentences. depth – The maximal depth of the generated tree. n – The maximum number of sentences to return.
Returns:	An iterator of lists of terminal tokens.

nltk.parse.malt module¶

class nltk.parse.malt.MaltParser(parser_dirname, model_filename=None, tagger=None, additional_java_args=None)[source]¶

Bases: nltk.parse.api.ParserI

A class for dependency parsing with MaltParser. The input is the paths to: - a maltparser directory - (optionally) the path to a pre-trained MaltParser .mco model file - (optionally) the tagger to use for POS tagging before parsing - (optionally) additional Java arguments

Example:

>>> from nltk.parse import malt
>>> # With MALT_PARSER and MALT_MODEL environment set.
>>> mp = malt.MaltParser('maltparser-1.7.2', 'engmalt.linear-1.7.mco') 
>>> mp.parse_one('I shot an elephant in my pajamas .'.split()).tree() 
(shot I (elephant an) (in (pajamas my)) .)
>>> # Without MALT_PARSER and MALT_MODEL environment.
>>> mp = malt.MaltParser('/home/user/maltparser-1.7.2/', '/home/user/engmalt.linear-1.7.mco') 
>>> mp.parse_one('I shot an elephant in my pajamas .'.split()).tree() 
(shot I (elephant an) (in (pajamas my)) .)

generate_malt_command(inputfilename, outputfilename=None, mode=None)[source]¶

This function generates the maltparser command use at the terminal.

Parameters:	inputfilename (str) – path to the input file outputfilename (str) – path to the output file

parse_sents(sentences, verbose=False, top_relation_label='null')[source]¶

Use MaltParser to parse multiple sentences. Takes a list of sentences, where each sentence is a list of words. Each sentence will be automatically tagged with this MaltParser instance’s tagger.

Parameters:	sentences – Input sentences to parse
Returns:	iter(DependencyGraph)

parse_tagged_sents(sentences, verbose=False, top_relation_label='null')[source]¶

Use MaltParser to parse multiple POS tagged sentences. Takes multiple sentences where each sentence is a list of (word, tag) tuples. The sentences must have already been tokenized and tagged.

Parameters:	sentences – Input sentences to parse
Returns:	iter(iter(`DependencyGraph`)) the dependency graph

representation of each sentence

train(depgraphs, verbose=False)[source]¶

Train MaltParser from a list of DependencyGraph objects

Parameters:	depgraphs (DependencyGraph) – list of `DependencyGraph` objects for training input data

train_from_file(conll_file, verbose=False)[source]¶: Train MaltParser from a file :param conll_file: str for the filename of the training input data :type conll_file: str

nltk.parse.malt.find_malt_model(model_filename)[source]¶: A module to find pre-trained MaltParser model.

nltk.parse.malt.find_maltparser(parser_dirname)[source]¶: A module to find MaltParser .jar file and its dependencies.

nltk.parse.malt.malt_regex_tagger()[source]¶

nltk.parse.nonprojectivedependencyparser module¶

class nltk.parse.nonprojectivedependencyparser.DemoScorer[source]¶

Bases: nltk.parse.nonprojectivedependencyparser.DependencyScorerI

score(graph)[source]¶

Parameters:	graph (DependencyGraph) – A dependency graph whose set of edges need to be

scored. :rtype: A three-dimensional list of numbers. :return: The score is returned in a multidimensional(3) list, such that the outer-dimension refers to the head, and the inner-dimension refers to the dependencies. For instance, scores[0][1] would reference the list of scores corresponding to arcs from node 0 to node 1. The node’s ‘address’ field can be used to determine its number identification.

For further illustration, a score list corresponding to Fig.2 of Keith Hall’s ‘K-best Spanning Tree Parsing’ paper:

scores = [[[], [5], [1], [1]],

[[], [], [11], [4]], [[], [10], [], [5]], [[], [8], [8], []]]

When used in conjunction with a MaxEntClassifier, each score would correspond to the confidence of a particular edge being classified with the positive training examples.

train(graphs)[source]¶

Parameters:	graphs (list(DependencyGraph)) – A list of dependency graphs to train the scorer.

Typically the edges present in the graphs can be used as positive training examples, and the edges not present as negative examples.

class nltk.parse.nonprojectivedependencyparser.DependencyScorerI[source]¶

Bases: object

A scorer for calculated the weights on the edges of a weighted dependency graph. This is used by a ProbabilisticNonprojectiveParser to initialize the edge weights of a DependencyGraph. While typically this would be done by training a binary classifier, any class that can return a multidimensional list representation of the edge weights can implement this interface. As such, it has no necessary fields.

score(graph)[source]¶

Parameters:	graph (DependencyGraph) – A dependency graph whose set of edges need to be

For further illustration, a score list corresponding to Fig.2 of Keith Hall’s ‘K-best Spanning Tree Parsing’ paper:

scores = [[[], [5], [1], [1]],

[[], [], [11], [4]], [[], [10], [], [5]], [[], [8], [8], []]]

When used in conjunction with a MaxEntClassifier, each score would correspond to the confidence of a particular edge being classified with the positive training examples.

train(graphs)[source]¶

Parameters:	graphs (list(DependencyGraph)) – A list of dependency graphs to train the scorer.

Typically the edges present in the graphs can be used as positive training examples, and the edges not present as negative examples.

class nltk.parse.nonprojectivedependencyparser.NaiveBayesDependencyScorer[source]¶

Bases: nltk.parse.nonprojectivedependencyparser.DependencyScorerI

A dependency scorer built around a MaxEnt classifier. In this particular class that classifier is a NaiveBayesClassifier. It uses head-word, head-tag, child-word, and child-tag features for classification.

>>> from nltk.parse.dependencygraph import DependencyGraph, conll_data2

>>> graphs = [DependencyGraph(entry) for entry in conll_data2.split('\n\n') if entry]
>>> npp = ProbabilisticNonprojectiveParser()
>>> npp.train(graphs, NaiveBayesDependencyScorer())
>>> parses = npp.parse(['Cathy', 'zag', 'hen', 'zwaaien', '.'], ['N', 'V', 'Pron', 'Adj', 'N', 'Punc'])
>>> len(list(parses))
1

score(graph)[source]¶

Converts the graph into a feature-based representation of each edge, and then assigns a score to each based on the confidence of the classifier in assigning it to the positive label. Scores are returned in a multidimensional list.

Parameters:	graph (DependencyGraph) – A dependency graph to score.
Return type:	3 dimensional list
Returns:	Edge scores for the graph parameter.

train(graphs)[source]¶

Trains a NaiveBayesClassifier using the edges present in graphs list as positive examples, the edges not present as negative examples. Uses a feature vector of head-word, head-tag, child-word, and child-tag.

Parameters:	graphs (list(DependencyGraph)) – A list of dependency graphs to train the scorer.

class nltk.parse.nonprojectivedependencyparser.NonprojectiveDependencyParser(dependency_grammar)[source]¶

Bases: object

A non-projective, rule-based, dependency parser. This parser will return the set of all possible non-projective parses based on the word-to-word relations defined in the parser’s dependency grammar, and will allow the branches of the parse tree to cross in order to capture a variety of linguistic phenomena that a projective parser will not.

parse(tokens)[source]¶

Parses the input tokens with respect to the parser’s grammar. Parsing is accomplished by representing the search-space of possible parses as a fully-connected directed graph. Arcs that would lead to ungrammatical parses are removed and a lattice is constructed of length n, where n is the number of input tokens, to represent all possible grammatical traversals. All possible paths through the lattice are then enumerated to produce the set of non-projective parses.

param tokens: A list of tokens to parse. type tokens: list(str) return: An iterator of non-projective parses. rtype: iter(DependencyGraph)

class nltk.parse.nonprojectivedependencyparser.ProbabilisticNonprojectiveParser[source]¶

Bases: object

A probabilistic non-projective dependency parser.

Nonprojective dependencies allows for “crossing branches” in the parse tree which is necessary for representing particular linguistic phenomena, or even typical parses in some languages. This parser follows the MST parsing algorithm, outlined in McDonald(2005), which likens the search for the best non-projective parse to finding the maximum spanning tree in a weighted directed graph.

>>> class Scorer(DependencyScorerI):
...     def train(self, graphs):
...         pass
...
...     def score(self, graph):
...         return [
...             [[], [5],  [1],  [1]],
...             [[], [],   [11], [4]],
...             [[], [10], [],   [5]],
...             [[], [8],  [8],  []],
...         ]

>>> npp = ProbabilisticNonprojectiveParser()
>>> npp.train([], Scorer())

>>> parses = npp.parse(['v1', 'v2', 'v3'], [None, None, None])
>>> len(list(parses))
1

>>> from nltk.grammar import DependencyGrammar

>>> grammar = DependencyGrammar.fromstring('''
... 'taught' -> 'play' | 'man'
... 'man' -> 'the' | 'in'
... 'in' -> 'corner'
... 'corner' -> 'the'
... 'play' -> 'golf' | 'dachshund' | 'to'
... 'dachshund' -> 'his'
... ''')

>>> ndp = NonprojectiveDependencyParser(grammar)
>>> parses = ndp.parse(['the', 'man', 'in', 'the', 'corner', 'taught', 'his', 'dachshund', 'to', 'play', 'golf'])
>>> len(list(parses))
4

best_incoming_arc(node_index)[source]¶

Returns the source of the best incoming arc to the node with address: node_index

Parameters:	node_index (integer.) – The address of the ‘destination’ node,

the node that is arced to.

collapse_nodes(new_node, cycle_path, g_graph, b_graph, c_graph)[source]¶

Takes a list of nodes that have been identified to belong to a cycle, and collapses them into on larger node. The arcs of all nodes in the graph must be updated to account for this.

Parameters:	new_node (Node.) – A Node (Dictionary) to collapse the cycle nodes into. cycle_path (A list of integers.) – A list of node addresses, each of which is in the cycle. b_graph, c_graph (g_graph,) – Graphs which need to be updated.

compute_max_subtract_score(column_index, cycle_indexes)[source]¶

When updating scores the score of the highest-weighted incoming arc is subtracted upon collapse. This returns the correct amount to subtract from that edge.

Parameters:	column_index (integer.) – A index representing the column of incoming arcs

to a particular node being updated :type cycle_indexes: A list of integers. :param cycle_indexes: Only arcs from cycle nodes are considered. This is a list of such nodes addresses.

compute_original_indexes(new_indexes)[source]¶

As nodes are collapsed into others, they are replaced by the new node in the graph, but it’s still necessary to keep track of what these original nodes were. This takes a list of node addresses and replaces any collapsed node addresses with their original addresses.

Parameters:	new_indexes (A list of integers.) – A list of node addresses to check for

subsumed nodes.

initialize_edge_scores(graph)[source]¶

Assigns a score to every edge in the DependencyGraph graph. These scores are generated via the parser’s scorer which was assigned during the training process.

Parameters:	graph (DependencyGraph) – A dependency graph to assign scores to.

original_best_arc(node_index)[source]¶

parse(tokens, tags)[source]¶

Parses a list of tokens in accordance to the MST parsing algorithm for non-projective dependency parses. Assumes that the tokens to be parsed have already been tagged and those tags are provided. Various scoring methods can be used by implementing the DependencyScorerI interface and passing it to the training algorithm.

Parameters:	tokens (list(str)) – A list of words or punctuation to be parsed. tags (list(str)) – A list of tags corresponding by index to the words in the tokens list.
Returns:	An iterator of non-projective parses.
Return type:	iter(DependencyGraph)

train(graphs, dependency_scorer)[source]¶

Trains a DependencyScorerI from a set of DependencyGraph objects, and establishes this as the parser’s scorer. This is used to initialize the scores on a DependencyGraph during the parsing procedure.

Parameters:	graphs (list(DependencyGraph)) – A list of dependency graphs to train the scorer. dependency_scorer (DependencyScorerI) – A scorer which implements the `DependencyScorerI` interface.

update_edge_scores(new_node, cycle_path)[source]¶

Updates the edge scores to reflect a collapse operation into new_node.

Parameters:	new_node (A Node.) – The node which cycle nodes are collapsed into. cycle_path (A list of integers.) – A list of node addresses that belong to the cycle.

nltk.parse.nonprojectivedependencyparser.demo()[source]¶

nltk.parse.nonprojectivedependencyparser.hall_demo()[source]¶

nltk.parse.nonprojectivedependencyparser.nonprojective_conll_parse_demo()[source]¶

nltk.parse.nonprojectivedependencyparser.rule_based_demo()[source]¶

nltk.parse.pchart module¶

Classes and interfaces for associating probabilities with tree structures that represent the internal organization of a text. The probabilistic parser module defines BottomUpProbabilisticChartParser.

BottomUpProbabilisticChartParser is an abstract class that implements a bottom-up chart parser for PCFG grammars. It maintains a queue of edges, and adds them to the chart one at a time. The ordering of this queue is based on the probabilities associated with the edges, allowing the parser to expand more likely edges before less likely ones. Each subclass implements a different queue ordering, producing different search strategies. Currently the following subclasses are defined:

InsideChartParser searches edges in decreasing order of their trees’ inside probabilities.

RandomChartParser searches edges in random order.

LongestChartParser searches edges in decreasing order of their location’s length.

The BottomUpProbabilisticChartParser constructor has an optional argument beam_size. If non-zero, this controls the size of the beam (aka the edge queue). This option is most useful with InsideChartParser.

class nltk.parse.pchart.BottomUpProbabilisticChartParser(grammar, beam_size=0, trace=0)[source]¶

Bases: nltk.parse.api.ParserI

An abstract bottom-up parser for PCFG grammars that uses a Chart to record partial results. BottomUpProbabilisticChartParser maintains a queue of edges that can be added to the chart. This queue is initialized with edges for each token in the text that is being parsed. BottomUpProbabilisticChartParser inserts these edges into the chart one at a time, starting with the most likely edges, and proceeding to less likely edges. For each edge that is added to the chart, it may become possible to insert additional edges into the chart; these are added to the queue. This process continues until enough complete parses have been generated, or until the queue is empty.

The sorting order for the queue is not specified by BottomUpProbabilisticChartParser. Different sorting orders will result in different search strategies. The sorting order for the queue is defined by the method sort_queue; subclasses are required to provide a definition for this method.

Variables:	_grammar – The grammar used to parse sentences. _trace – The level of tracing output that should be generated when parsing a text.

grammar()[source]¶

Returns:	The grammar used by this parser.

parse(tokens)[source]¶

Returns:	An iterator that generates parse trees for the sentence.

When possible this list is sorted from most likely to least likely.

Parameters:	sent (list(str)) – The sentence to be parsed
Return type:	iter(Tree)

sort_queue(queue, chart)[source]¶

Sort the given queue of Edge objects, placing the edge that should be tried first at the beginning of the queue. This method will be called after each Edge is added to the queue.

Parameters:	queue (list(Edge)) – The queue of `Edge` objects to sort. Each edge in this queue is an edge that could be added to the chart by the fundamental rule; but that has not yet been added. chart (Chart) – The chart being used to parse the text. This chart can be used to provide extra information for sorting the queue.
Return type:	None

trace(trace=2)[source]¶

Set the level of tracing output that should be generated when parsing a text.

Parameters:	trace (int) – The trace level. A trace level of `0` will generate no tracing output; and higher trace levels will produce more verbose tracing output.
Return type:	None

class nltk.parse.pchart.InsideChartParser(grammar, beam_size=0, trace=0)[source]¶

Bases: nltk.parse.pchart.BottomUpProbabilisticChartParser

A bottom-up parser for PCFG grammars that tries edges in descending order of the inside probabilities of their trees. The “inside probability” of a tree is simply the probability of the entire tree, ignoring its context. In particular, the inside probability of a tree generated by production p with children c[1], c[2], …, c[n] is P(p)P(c[1])P(c[2])…P(c[n]); and the inside probability of a token is 1 if it is present in the text, and 0 if it is absent.

This sorting order results in a type of lowest-cost-first search strategy.

sort_queue(queue, chart)[source]¶

Sort the given queue of edges, in descending order of the inside probabilities of the edges’ trees.

Parameters:	queue (list(Edge)) – The queue of `Edge` objects to sort. Each edge in this queue is an edge that could be added to the chart by the fundamental rule; but that has not yet been added. chart (Chart) – The chart being used to parse the text. This chart can be used to provide extra information for sorting the queue.
Return type:	None

class nltk.parse.pchart.LongestChartParser(grammar, beam_size=0, trace=0)[source]¶

Bases: nltk.parse.pchart.BottomUpProbabilisticChartParser

A bottom-up parser for PCFG grammars that tries longer edges before shorter ones. This sorting order results in a type of best-first search strategy.

sort_queue(queue, chart)[source]¶

Sort the given queue of Edge objects, placing the edge that should be tried first at the beginning of the queue. This method will be called after each Edge is added to the queue.

Parameters:	queue (list(Edge)) – The queue of `Edge` objects to sort. Each edge in this queue is an edge that could be added to the chart by the fundamental rule; but that has not yet been added. chart (Chart) – The chart being used to parse the text. This chart can be used to provide extra information for sorting the queue.
Return type:	None

class nltk.parse.pchart.ProbabilisticBottomUpInitRule[source]¶

Bases: nltk.parse.chart.AbstractChartRule

NUM_EDGES = 0¶

apply(chart, grammar)[source]¶

Parameters:	edges (list(EdgeI)) – A set of existing edges. The number of edges that should be passed to `apply()` is specified by the `NUM_EDGES` class variable.
Return type:	iter(EdgeI)

class nltk.parse.pchart.ProbabilisticBottomUpPredictRule[source]¶

Bases: nltk.parse.chart.AbstractChartRule

NUM_EDGES = 1¶

apply(chart, grammar, edge)[source]¶

Parameters:	edges (list(EdgeI)) – A set of existing edges. The number of edges that should be passed to `apply()` is specified by the `NUM_EDGES` class variable.
Return type:	iter(EdgeI)

class nltk.parse.pchart.ProbabilisticFundamentalRule[source]¶

Bases: nltk.parse.chart.AbstractChartRule

NUM_EDGES = 2¶

apply(chart, grammar, left_edge, right_edge)[source]¶

Parameters:	edges (list(EdgeI)) – A set of existing edges. The number of edges that should be passed to `apply()` is specified by the `NUM_EDGES` class variable.
Return type:	iter(EdgeI)

class nltk.parse.pchart.ProbabilisticLeafEdge(leaf, index)[source]¶

Bases: nltk.parse.chart.LeafEdge

prob()[source]¶

class nltk.parse.pchart.ProbabilisticTreeEdge(prob, *args, **kwargs)[source]¶

Bases: nltk.parse.chart.TreeEdge

static from_production(production, index, p)[source]¶

Return type:	TreeEdge

prob()[source]¶

class nltk.parse.pchart.RandomChartParser(grammar, beam_size=0, trace=0)[source]¶

Bases: nltk.parse.pchart.BottomUpProbabilisticChartParser

A bottom-up parser for PCFG grammars that tries edges in random order. This sorting order results in a random search strategy.

sort_queue(queue, chart)[source]¶

Sort the given queue of Edge objects, placing the edge that should be tried first at the beginning of the queue. This method will be called after each Edge is added to the queue.

Parameters:	queue (list(Edge)) – The queue of `Edge` objects to sort. Each edge in this queue is an edge that could be added to the chart by the fundamental rule; but that has not yet been added. chart (Chart) – The chart being used to parse the text. This chart can be used to provide extra information for sorting the queue.
Return type:	None

class nltk.parse.pchart.SingleEdgeProbabilisticFundamentalRule[source]¶

Bases: nltk.parse.chart.AbstractChartRule

NUM_EDGES = 1¶

apply(chart, grammar, edge1)[source]¶

Parameters:	edges (list(EdgeI)) – A set of existing edges. The number of edges that should be passed to `apply()` is specified by the `NUM_EDGES` class variable.
Return type:	iter(EdgeI)

unicode_repr¶: Return repr(self).

class nltk.parse.pchart.UnsortedChartParser(grammar, beam_size=0, trace=0)[source]¶

Bases: nltk.parse.pchart.BottomUpProbabilisticChartParser

A bottom-up parser for PCFG grammars that tries edges in whatever order.

sort_queue(queue, chart)[source]¶

Sort the given queue of Edge objects, placing the edge that should be tried first at the beginning of the queue. This method will be called after each Edge is added to the queue.

Parameters:	queue (list(Edge)) – The queue of `Edge` objects to sort. Each edge in this queue is an edge that could be added to the chart by the fundamental rule; but that has not yet been added. chart (Chart) – The chart being used to parse the text. This chart can be used to provide extra information for sorting the queue.
Return type:	None

nltk.parse.pchart.demo(choice=None, draw_parses=None, print_parses=None)[source]¶: A demonstration of the probabilistic parsers. The user is prompted to select which demo to run, and how many parses should be found; and then each parser is run on the same demo, and a summary of the results are displayed.

nltk.parse.projectivedependencyparser module¶

class nltk.parse.projectivedependencyparser.ChartCell(x, y)[source]¶

Bases: object

A cell from the parse chart formed when performing the CYK algorithm. Each cell keeps track of its x and y coordinates (though this will probably be discarded), and a list of spans serving as the cell’s entries.

add(span)[source]¶

Appends the given span to the list of spans representing the chart cell’s entries.

Parameters:	span (DependencySpan) – The span to add.

unicode_repr()¶

Returns:	A concise string representation of this `ChartCell`.
Return type:	str.

class nltk.parse.projectivedependencyparser.DependencySpan(start_index, end_index, head_index, arcs, tags)[source]¶

Bases: object

A contiguous span over some part of the input string representing dependency (head -> modifier) relationships amongst words. An atomic span corresponds to only one word so it isn’t a ‘span’ in the conventional sense, as its _start_index = _end_index = _head_index for concatenation purposes. All other spans are assumed to have arcs between all nodes within the start and end indexes of the span, and one head index corresponding to the head word for the entire span. This is the same as the root node if the dependency structure were depicted as a graph.

head_index()[source]¶

Returns:	An value indexing the head of the entire `DependencySpan`.
Return type:	int

unicode_repr()¶

Returns:	A concise string representatino of the `DependencySpan`.
Return type:	str.

class nltk.parse.projectivedependencyparser.ProbabilisticProjectiveDependencyParser[source]¶

Bases: object

A probabilistic, projective dependency parser.

This parser returns the most probable projective parse derived from the probabilistic dependency grammar derived from the train() method. The probabilistic model is an implementation of Eisner’s (1996) Model C, which conditions on head-word, head-tag, child-word, and child-tag. The decoding uses a bottom-up chart-based span concatenation algorithm that’s identical to the one utilized by the rule-based projective parser.

>>> from nltk.parse.dependencygraph import conll_data2

>>> graphs = [
... DependencyGraph(entry) for entry in conll_data2.split('\n\n') if entry
... ]

>>> ppdp = ProbabilisticProjectiveDependencyParser()
>>> ppdp.train(graphs)

>>> sent = ['Cathy', 'zag', 'hen', 'wild', 'zwaaien', '.']
>>> list(ppdp.parse(sent))
[Tree('zag', ['Cathy', 'hen', Tree('zwaaien', ['wild', '.'])])]

compute_prob(dg)[source]¶

Computes the probability of a dependency graph based on the parser’s probability model (defined by the parser’s statistical dependency grammar).

Parameters:	dg (DependencyGraph) – A dependency graph to score.
Returns:	The probability of the dependency graph.
Return type:	int

concatenate(span1, span2)[source]¶

Concatenates the two spans in whichever way possible. This includes rightward concatenation (from the leftmost word of the leftmost span to the rightmost word of the rightmost span) and leftward concatenation (vice-versa) between adjacent spans. Unlike Eisner’s presentation of span concatenation, these spans do not share or pivot on a particular word/word-index.

Returns:	A list of new spans formed through concatenation.
Return type:	list(DependencySpan)

parse(tokens)[source]¶: Parses the list of tokens subject to the projectivity constraint and the productions in the parser’s grammar. This uses a method similar to the span-concatenation algorithm defined in Eisner (1996). It returns the most probable parse derived from the parser’s probabilistic dependency grammar.

train(graphs)[source]¶

Trains a ProbabilisticDependencyGrammar based on the list of input DependencyGraphs. This model is an implementation of Eisner’s (1996) Model C, which derives its statistics from head-word, head-tag, child-word, and child-tag relationships.

Parameters:	graphs – A list of dependency graphs to train from.
Type:	list(DependencyGraph)

class nltk.parse.projectivedependencyparser.ProjectiveDependencyParser(dependency_grammar)[source]¶

Bases: object

A projective, rule-based, dependency parser. A ProjectiveDependencyParser is created with a DependencyGrammar, a set of productions specifying word-to-word dependency relations. The parse() method will then return the set of all parses, in tree representation, for a given input sequence of tokens. Each parse must meet the requirements of the both the grammar and the projectivity constraint which specifies that the branches of the dependency tree are not allowed to cross. Alternatively, this can be understood as stating that each parent node and its children in the parse tree form a continuous substring of the input sequence.

concatenate(span1, span2)[source]¶

Returns:	A list of new spans formed through concatenation.
Return type:	list(DependencySpan)

parse(tokens)[source]¶

Performs a projective dependency parse on the list of tokens using a chart-based, span-concatenation algorithm similar to Eisner (1996).

Parameters:	tokens (list(str)) – The list of input tokens.
Returns:	An iterator over parse trees.
Return type:	iter(Tree)

nltk.parse.projectivedependencyparser.arity_parse_demo()[source]¶: A demonstration showing the creation of a DependencyGrammar in which a specific number of modifiers is listed for a given head. This can further constrain the number of possible parses created by a ProjectiveDependencyParser.

nltk.parse.projectivedependencyparser.demo()[source]¶

nltk.parse.projectivedependencyparser.projective_prob_parse_demo()[source]¶: A demo showing the training and use of a projective dependency parser.

nltk.parse.projectivedependencyparser.projective_rule_parse_demo()[source]¶: A demonstration showing the creation and use of a DependencyGrammar to perform a projective dependency parse.

nltk.parse.recursivedescent module¶

class nltk.parse.recursivedescent.RecursiveDescentParser(grammar, trace=0)[source]¶

Bases: nltk.parse.api.ParserI

A simple top-down CFG parser that parses texts by recursively expanding the fringe of a Tree, and matching it against a text.

RecursiveDescentParser uses a list of tree locations called a “frontier” to remember which subtrees have not yet been expanded and which leaves have not yet been matched against the text. Each tree location consists of a list of child indices specifying the path from the root of the tree to a subtree or a leaf; see the reference documentation for Tree for more information about tree locations.

When the parser begins parsing a text, it constructs a tree containing only the start symbol, and a frontier containing the location of the tree’s root node. It then extends the tree to cover the text, using the following recursive procedure:

If the frontier is empty, and the text is covered by the tree, then return the tree as a possible parse.

If the frontier is empty, and the text is not covered by the tree, then return no parses.

If the first element of the frontier is a subtree, then use CFG productions to “expand” it. For each applicable production, add the expanded subtree’s children to the frontier, and recursively find all parses that can be generated by the new tree and frontier.

If the first element of the frontier is a token, then “match” it against the next token from the text. Remove the token from the frontier, and recursively find all parses that can be generated by the new tree and frontier.

See:	`nltk.grammar`

grammar()[source]¶

Returns:	The grammar used by this parser.

parse(tokens)[source]¶

Returns:	An iterator that generates parse trees for the sentence.

When possible this list is sorted from most likely to least likely.

Parameters:	sent (list(str)) – The sentence to be parsed
Return type:	iter(Tree)

trace(trace=2)[source]¶

Set the level of tracing output that should be generated when parsing a text.

Parameters:	trace (int) – The trace level. A trace level of `0` will generate no tracing output; and higher trace levels will produce more verbose tracing output.
Return type:	None

class nltk.parse.recursivedescent.SteppingRecursiveDescentParser(grammar, trace=0)[source]¶

Bases: nltk.parse.recursivedescent.RecursiveDescentParser

A RecursiveDescentParser that allows you to step through the parsing process, performing a single operation at a time.

The initialize method is used to start parsing a text. expand expands the first element on the frontier using a single CFG production, and match matches the first element on the frontier against the next text token. backtrack undoes the most recent expand or match operation. step performs a single expand, match, or backtrack operation. parses returns the set of parses that have been found by the parser.

Variables:

_history – A list of (rtext, tree, frontier) tripples, containing the previous states of the parser. This history is used to implement the backtrack operation.
_tried_e – A record of all productions that have been tried for a given tree. This record is used by expand to perform the next untried production.
_tried_m – A record of what tokens have been matched for a given tree. This record is used by step to decide whether or not to match a token.

See:

nltk.grammar

backtrack()[source]¶

Return the parser to its state before the most recent match or expand operation. Calling undo repeatedly return the parser to successively earlier states. If no match or expand operations have been performed, undo will make no changes.

Returns:	true if an operation was successfully undone.
Return type:	bool

currently_complete()[source]¶

Returns:	Whether the parser’s current state represents a complete parse.
Return type:	bool

expand(production=None)[source]¶

Expand the first element of the frontier. In particular, if the first element of the frontier is a subtree whose node type is equal to production’s left hand side, then add a child to that subtree for each element of production’s right hand side. If production is not specified, then use the first untried expandable production. If all expandable productions have been tried, do nothing.

Returns:	The production used to expand the frontier, if an expansion was performed. If no expansion was performed, return None.
Return type:	Production or None

expandable_productions()[source]¶

Returns:	A list of all the productions for which expansions are available for the current parser state.
Return type:	list(Production)

frontier()[source]¶

Returns:	A list of the tree locations of all subtrees that have not yet been expanded, and all leaves that have not yet been matched.
Return type:	list(tuple(int))

initialize(tokens)[source]¶: Start parsing a given text. This sets the parser’s tree to the start symbol, its frontier to the root node, and its remaining text to token['SUBTOKENS'].

match()[source]¶

Match the first element of the frontier. In particular, if the first element of the frontier has the same type as the next text token, then substitute the text token into the tree.

Returns:	The token matched, if a match operation was performed. If no match was performed, return None
Return type:	str or None

parse(tokens)[source]¶

Returns:	An iterator that generates parse trees for the sentence.

When possible this list is sorted from most likely to least likely.

Parameters:	sent (list(str)) – The sentence to be parsed
Return type:	iter(Tree)

parses()[source]¶

Returns:	An iterator of the parses that have been found by this parser so far.
Return type:	list of Tree

remaining_text()[source]¶

Returns:	The portion of the text that is not yet covered by the tree.
Return type:	list(str)

set_grammar(grammar)[source]¶

Change the grammar used to parse texts.

Parameters:	grammar (CFG) – The new grammar.

step()[source]¶

Perform a single parsing operation. If an untried match is possible, then perform the match, and return the matched token. If an untried expansion is possible, then perform the expansion, and return the production that it is based on. If backtracking is possible, then backtrack, and return True. Otherwise, return None.

Returns:	None if no operation was performed; a token if a match was performed; a production if an expansion was performed; and True if a backtrack operation was performed.
Return type:	Production or String or bool

tree()[source]¶

Returns:	A partial structure for the text that is currently being parsed. The elements specified by the frontier have not yet been expanded or matched.
Return type:	Tree

untried_expandable_productions()[source]¶

Returns:	A list of all the untried productions for which expansions are available for the current parser state.
Return type:	list(Production)

untried_match()[source]¶

Returns:	Whether the first element of the frontier is a token that has not yet been matched.
Return type:	bool

nltk.parse.recursivedescent.demo()[source]¶: A demonstration of the recursive descent parser.

nltk.parse.shiftreduce module¶

class nltk.parse.shiftreduce.ShiftReduceParser(grammar, trace=0)[source]¶

Bases: nltk.parse.api.ParserI

A simple bottom-up CFG parser that uses two operations, “shift” and “reduce”, to find a single parse for a text.

ShiftReduceParser maintains a stack, which records the structure of a portion of the text. This stack is a list of strings and Trees that collectively cover a portion of the text. For example, while parsing the sentence “the dog saw the man” with a typical grammar, ShiftReduceParser will produce the following stack, which covers “the dog saw”:

[(NP: (Det: 'the') (N: 'dog')), (V: 'saw')]

ShiftReduceParser attempts to extend the stack to cover the entire text, and to combine the stack elements into a single tree, producing a complete parse for the sentence.

Initially, the stack is empty. It is extended to cover the text, from left to right, by repeatedly applying two operations:

“shift” moves a token from the beginning of the text to the end of the stack.

“reduce” uses a CFG production to combine the rightmost stack elements into a single Tree.

Often, more than one operation can be performed on a given stack. In this case, ShiftReduceParser uses the following heuristics to decide which operation to perform:

Only shift if no reductions are available.

If multiple reductions are available, then apply the reduction whose CFG production is listed earliest in the grammar.

Note that these heuristics are not guaranteed to choose an operation that leads to a parse of the text. Also, if multiple parses exists, ShiftReduceParser will return at most one of them.

See:	`nltk.grammar`

grammar()[source]¶

Returns:	The grammar used by this parser.

parse(tokens)[source]¶

Returns:	An iterator that generates parse trees for the sentence.

When possible this list is sorted from most likely to least likely.

Parameters:	sent (list(str)) – The sentence to be parsed
Return type:	iter(Tree)

trace(trace=2)[source]¶

Set the level of tracing output that should be generated when parsing a text.

Parameters:	trace (int) – The trace level. A trace level of `0` will generate no tracing output; and higher trace levels will produce more verbose tracing output.
Return type:	None

class nltk.parse.shiftreduce.SteppingShiftReduceParser(grammar, trace=0)[source]¶

Bases: nltk.parse.shiftreduce.ShiftReduceParser

A ShiftReduceParser that allows you to setp through the parsing process, performing a single operation at a time. It also allows you to change the parser’s grammar midway through parsing a text.

The initialize method is used to start parsing a text. shift performs a single shift operation, and reduce performs a single reduce operation. step will perform a single reduce operation if possible; otherwise, it will perform a single shift operation. parses returns the set of parses that have been found by the parser.

Variables:	_history – A list of `(stack, remaining_text)` pairs, containing all of the previous states of the parser. This history is used to implement the `undo` operation.
See:	`nltk.grammar`

initialize(tokens)[source]¶: Start parsing a given text. This sets the parser’s stack to [] and sets its remaining text to tokens.

parse(tokens)[source]¶

Returns:	An iterator that generates parse trees for the sentence.

When possible this list is sorted from most likely to least likely.

Parameters:	sent (list(str)) – The sentence to be parsed
Return type:	iter(Tree)

parses()[source]¶

Returns:	An iterator of the parses that have been found by this parser so far.
Return type:	iter(Tree)

reduce(production=None)[source]¶

Use production to combine the rightmost stack elements into a single Tree. If production does not match the rightmost stack elements, then do nothing.

Returns:	The production used to reduce the stack, if a reduction was performed. If no reduction was performed, return None.
Return type:	Production or None

reducible_productions()[source]¶

Returns:	A list of the productions for which reductions are available for the current parser state.
Return type:	list(Production)

remaining_text()[source]¶

Returns:	The portion of the text that is not yet covered by the stack.
Return type:	list(str)

set_grammar(grammar)[source]¶

Change the grammar used to parse texts.

Parameters:	grammar (CFG) – The new grammar.

shift()[source]¶

Move a token from the beginning of the remaining text to the end of the stack. If there are no more tokens in the remaining text, then do nothing.

Returns:	True if the shift operation was successful.
Return type:	bool

stack()[source]¶

Returns:	The parser’s stack.
Return type:	list(str and Tree)

step()[source]¶

Perform a single parsing operation. If a reduction is possible, then perform that reduction, and return the production that it is based on. Otherwise, if a shift is possible, then perform it, and return True. Otherwise, return False.

Returns:	False if no operation was performed; True if a shift was performed; and the CFG production used to reduce if a reduction was performed.
Return type:	Production or bool

undo()[source]¶

Return the parser to its state before the most recent shift or reduce operation. Calling undo repeatedly return the parser to successively earlier states. If no shift or reduce operations have been performed, undo will make no changes.

Returns:	true if an operation was successfully undone.
Return type:	bool

nltk.parse.shiftreduce.demo()[source]¶: A demonstration of the shift-reduce parser.

nltk.parse.stanford module¶

class nltk.parse.stanford.GenericStanfordParser(path_to_jar=None, path_to_models_jar=None, model_path='edu/stanford/nlp/models/lexparser/englishPCFG.ser.gz', encoding='utf8', verbose=False, java_options='-mx4g', corenlp_options='')[source]¶

Bases: nltk.parse.api.ParserI

Interface to the Stanford Parser

parse_sents(sentences, verbose=False)[source]¶

Use StanfordParser to parse multiple sentences. Takes multiple sentences as a list where each sentence is a list of words. Each sentence will be automatically tagged with this StanfordParser instance’s tagger. If whitespaces exists inside a token, then the token will be treated as separate tokens.

Parameters:	sentences (list(list(str))) – Input sentences to parse
Return type:	iter(iter(Tree))

raw_parse(sentence, verbose=False)[source]¶

Use StanfordParser to parse a sentence. Takes a sentence as a string; before parsing, it will be automatically tokenized and tagged by the Stanford Parser.

Parameters:	sentence (str) – Input sentence to parse
Return type:	iter(Tree)

raw_parse_sents(sentences, verbose=False)[source]¶

Use StanfordParser to parse multiple sentences. Takes multiple sentences as a list of strings. Each sentence will be automatically tokenized and tagged by the Stanford Parser.

Parameters:	sentences (list(str)) – Input sentences to parse
Return type:	iter(iter(Tree))

tagged_parse(sentence, verbose=False)[source]¶

Use StanfordParser to parse a sentence. Takes a sentence as a list of (word, tag) tuples; the sentence must have already been tokenized and tagged.

Parameters:	sentence (list(tuple(str, str))) – Input sentence to parse
Return type:	iter(Tree)

tagged_parse_sents(sentences, verbose=False)[source]¶

Use StanfordParser to parse multiple sentences. Takes multiple sentences where each sentence is a list of (word, tag) tuples. The sentences must have already been tokenized and tagged.

Parameters:	sentences (list(list(tuple(str, str)))) – Input sentences to parse
Return type:	iter(iter(Tree))

class nltk.parse.stanford.StanfordDependencyParser(*args, **kwargs)[source]¶

Bases: nltk.parse.stanford.GenericStanfordParser

>>> dep_parser=StanfordDependencyParser(
...     model_path="edu/stanford/nlp/models/lexparser/englishPCFG.ser.gz"
... )

>>> [parse.tree() for parse in dep_parser.raw_parse("The quick brown fox jumps over the lazy dog.")] 
[Tree('jumps', [Tree('fox', ['The', 'quick', 'brown']), Tree('dog', ['over', 'the', 'lazy'])])]

>>> [list(parse.triples()) for parse in dep_parser.raw_parse("The quick brown fox jumps over the lazy dog.")] 
[[((u'jumps', u'VBZ'), u'nsubj', (u'fox', u'NN')), ((u'fox', u'NN'), u'det', (u'The', u'DT')),
((u'fox', u'NN'), u'amod', (u'quick', u'JJ')), ((u'fox', u'NN'), u'amod', (u'brown', u'JJ')),
((u'jumps', u'VBZ'), u'nmod', (u'dog', u'NN')), ((u'dog', u'NN'), u'case', (u'over', u'IN')),
((u'dog', u'NN'), u'det', (u'the', u'DT')), ((u'dog', u'NN'), u'amod', (u'lazy', u'JJ'))]]

>>> sum([[parse.tree() for parse in dep_graphs] for dep_graphs in dep_parser.raw_parse_sents((
...     "The quick brown fox jumps over the lazy dog.",
...     "The quick grey wolf jumps over the lazy fox."
... ))], []) 
[Tree('jumps', [Tree('fox', ['The', 'quick', 'brown']), Tree('dog', ['over', 'the', 'lazy'])]),
Tree('jumps', [Tree('wolf', ['The', 'quick', 'grey']), Tree('fox', ['over', 'the', 'lazy'])])]

>>> sum([[parse.tree() for parse in dep_graphs] for dep_graphs in dep_parser.parse_sents((
...     "I 'm a dog".split(),
...     "This is my friends ' cat ( the tabby )".split(),
... ))], []) 
[Tree('dog', ['I', "'m", 'a']), Tree('cat', ['This', 'is', Tree('friends', ['my', "'"]), Tree('tabby', ['the'])])]

>>> sum([[list(parse.triples()) for parse in dep_graphs] for dep_graphs in dep_parser.tagged_parse_sents((
...     (
...         ("The", "DT"),
...         ("quick", "JJ"),
...         ("brown", "JJ"),
...         ("fox", "NN"),
...         ("jumped", "VBD"),
...         ("over", "IN"),
...         ("the", "DT"),
...         ("lazy", "JJ"),
...         ("dog", "NN"),
...         (".", "."),
...     ),
... ))],[]) 
[[((u'jumped', u'VBD'), u'nsubj', (u'fox', u'NN')), ((u'fox', u'NN'), u'det', (u'The', u'DT')),
((u'fox', u'NN'), u'amod', (u'quick', u'JJ')), ((u'fox', u'NN'), u'amod', (u'brown', u'JJ')),
((u'jumped', u'VBD'), u'nmod', (u'dog', u'NN')), ((u'dog', u'NN'), u'case', (u'over', u'IN')),
((u'dog', u'NN'), u'det', (u'the', u'DT')), ((u'dog', u'NN'), u'amod', (u'lazy', u'JJ'))]]

class nltk.parse.stanford.StanfordNeuralDependencyParser(*args, **kwargs)[source]¶

Bases: nltk.parse.stanford.GenericStanfordParser

>>> from nltk.parse.stanford import StanfordNeuralDependencyParser
>>> dep_parser=StanfordNeuralDependencyParser(java_options='-mx4g')

>>> [parse.tree() for parse in dep_parser.raw_parse("The quick brown fox jumps over the lazy dog.")] 
[Tree('jumps', [Tree('fox', ['The', 'quick', 'brown']), Tree('dog', ['over', 'the', 'lazy']), '.'])]

>>> [list(parse.triples()) for parse in dep_parser.raw_parse("The quick brown fox jumps over the lazy dog.")] 
[[((u'jumps', u'VBZ'), u'nsubj', (u'fox', u'NN')), ((u'fox', u'NN'), u'det',
(u'The', u'DT')), ((u'fox', u'NN'), u'amod', (u'quick', u'JJ')), ((u'fox', u'NN'),
u'amod', (u'brown', u'JJ')), ((u'jumps', u'VBZ'), u'nmod', (u'dog', u'NN')),
((u'dog', u'NN'), u'case', (u'over', u'IN')), ((u'dog', u'NN'), u'det',
(u'the', u'DT')), ((u'dog', u'NN'), u'amod', (u'lazy', u'JJ')), ((u'jumps', u'VBZ'),
u'punct', (u'.', u'.'))]]

>>> sum([[parse.tree() for parse in dep_graphs] for dep_graphs in dep_parser.raw_parse_sents((
...     "The quick brown fox jumps over the lazy dog.",
...     "The quick grey wolf jumps over the lazy fox."
... ))], []) 
[Tree('jumps', [Tree('fox', ['The', 'quick', 'brown']), Tree('dog', ['over',
'the', 'lazy']), '.']), Tree('jumps', [Tree('wolf', ['The', 'quick', 'grey']),
Tree('fox', ['over', 'the', 'lazy']), '.'])]

>>> sum([[parse.tree() for parse in dep_graphs] for dep_graphs in dep_parser.parse_sents((
...     "I 'm a dog".split(),
...     "This is my friends ' cat ( the tabby )".split(),
... ))], []) 
[Tree('dog', ['I', "'m", 'a']), Tree('cat', ['This', 'is', Tree('friends',
['my', "'"]), Tree('tabby', ['-LRB-', 'the', '-RRB-'])])]

tagged_parse_sents(sentences, verbose=False)[source]¶: Currently unimplemented because the neural dependency parser (and the StanfordCoreNLP pipeline class) doesn’t support passing in pre- tagged tokens.

class nltk.parse.stanford.StanfordParser(*args, **kwargs)[source]¶

Bases: nltk.parse.stanford.GenericStanfordParser

>>> parser=StanfordParser(
...     model_path="edu/stanford/nlp/models/lexparser/englishPCFG.ser.gz"
... )

>>> list(parser.raw_parse("the quick brown fox jumps over the lazy dog")) 
[Tree('ROOT', [Tree('NP', [Tree('NP', [Tree('DT', ['the']), Tree('JJ', ['quick']), Tree('JJ', ['brown']),
Tree('NN', ['fox'])]), Tree('NP', [Tree('NP', [Tree('NNS', ['jumps'])]), Tree('PP', [Tree('IN', ['over']),
Tree('NP', [Tree('DT', ['the']), Tree('JJ', ['lazy']), Tree('NN', ['dog'])])])])])])]

>>> sum([list(dep_graphs) for dep_graphs in parser.raw_parse_sents((
...     "the quick brown fox jumps over the lazy dog",
...     "the quick grey wolf jumps over the lazy fox"
... ))], []) 
[Tree('ROOT', [Tree('NP', [Tree('NP', [Tree('DT', ['the']), Tree('JJ', ['quick']), Tree('JJ', ['brown']),
Tree('NN', ['fox'])]), Tree('NP', [Tree('NP', [Tree('NNS', ['jumps'])]), Tree('PP', [Tree('IN', ['over']),
Tree('NP', [Tree('DT', ['the']), Tree('JJ', ['lazy']), Tree('NN', ['dog'])])])])])]), Tree('ROOT', [Tree('NP',
[Tree('NP', [Tree('DT', ['the']), Tree('JJ', ['quick']), Tree('JJ', ['grey']), Tree('NN', ['wolf'])]), Tree('NP',
[Tree('NP', [Tree('NNS', ['jumps'])]), Tree('PP', [Tree('IN', ['over']), Tree('NP', [Tree('DT', ['the']),
Tree('JJ', ['lazy']), Tree('NN', ['fox'])])])])])])]

>>> sum([list(dep_graphs) for dep_graphs in parser.parse_sents((
...     "I 'm a dog".split(),
...     "This is my friends ' cat ( the tabby )".split(),
... ))], []) 
[Tree('ROOT', [Tree('S', [Tree('NP', [Tree('PRP', ['I'])]), Tree('VP', [Tree('VBP', ["'m"]),
Tree('NP', [Tree('DT', ['a']), Tree('NN', ['dog'])])])])]), Tree('ROOT', [Tree('S', [Tree('NP',
[Tree('DT', ['This'])]), Tree('VP', [Tree('VBZ', ['is']), Tree('NP', [Tree('NP', [Tree('NP', [Tree('PRP$', ['my']),
Tree('NNS', ['friends']), Tree('POS', ["'"])]), Tree('NN', ['cat'])]), Tree('PRN', [Tree('-LRB-', [Tree('', []),
Tree('NP', [Tree('DT', ['the']), Tree('NN', ['tabby'])]), Tree('-RRB-', [])])])])])])])]

>>> sum([list(dep_graphs) for dep_graphs in parser.tagged_parse_sents((
...     (
...         ("The", "DT"),
...         ("quick", "JJ"),
...         ("brown", "JJ"),
...         ("fox", "NN"),
...         ("jumped", "VBD"),
...         ("over", "IN"),
...         ("the", "DT"),
...         ("lazy", "JJ"),
...         ("dog", "NN"),
...         (".", "."),
...     ),
... ))],[]) 
[Tree('ROOT', [Tree('S', [Tree('NP', [Tree('DT', ['The']), Tree('JJ', ['quick']), Tree('JJ', ['brown']),
Tree('NN', ['fox'])]), Tree('VP', [Tree('VBD', ['jumped']), Tree('PP', [Tree('IN', ['over']), Tree('NP',
[Tree('DT', ['the']), Tree('JJ', ['lazy']), Tree('NN', ['dog'])])])]), Tree('.', ['.'])])])]

nltk.parse.stanford.setup_module(module)[source]¶

nltk.parse.transitionparser module¶

class nltk.parse.transitionparser.Configuration(dep_graph)[source]¶

Bases: object

Class for holding configuration which is the partial analysis of the input sentence. The transition based parser aims at finding set of operators that transfer the initial configuration to the terminal configuration.

The configuration includes:

Stack: for storing partially proceeded words
Buffer: for storing remaining input words
Set of arcs: for storing partially built dependency tree

This class also provides a method to represent a configuration as list of features.

extract_features()[source]¶: Extract the set of features for the current configuration. Implement standard features as describe in Table 3.2 (page 31) in Dependency Parsing book by Sandra Kubler, Ryan McDonal, Joakim Nivre. Please note that these features are very basic. :return: list(str)

class nltk.parse.transitionparser.Transition(alg_option)[source]¶

Bases: object

This class defines a set of transition which is applied to a configuration to get another configuration Note that for different parsing algorithm, the transition is different.

LEFT_ARC = 'LEFTARC'¶

REDUCE = 'REDUCE'¶

RIGHT_ARC = 'RIGHTARC'¶

SHIFT = 'SHIFT'¶

left_arc(conf, relation)[source]¶

Note that the algorithm for left-arc is quite similar except for precondition for both arc-standard and arc-eager

param configuration:
	is the current configuration

:return : A new configuration or -1 if the pre-condition is not satisfied

reduce(conf)[source]¶

Note that the algorithm for reduce is only available for arc-eager

param configuration:
	is the current configuration

:return : A new configuration or -1 if the pre-condition is not satisfied

right_arc(conf, relation)[source]¶

Note that the algorithm for right-arc is DIFFERENT for arc-standard and arc-eager

param configuration:
	is the current configuration

:return : A new configuration or -1 if the pre-condition is not satisfied

shift(conf)[source]¶

Note that the algorithm for shift is the SAME for arc-standard and arc-eager

param configuration:
	is the current configuration

:return : A new configuration or -1 if the pre-condition is not satisfied

class nltk.parse.transitionparser.TransitionParser(algorithm)[source]¶

Bases: nltk.parse.api.ParserI

Class for transition based parser. Implement 2 algorithms which are “arc-standard” and “arc-eager”

ARC_EAGER = 'arc-eager'¶

ARC_STANDARD = 'arc-standard'¶

parse(depgraphs, modelFile)[source]¶

Parameters:	depgraphs (list(DependencyGraph)) – the list of test sentence, each sentence is represented as a dependency graph where the ‘head’ information is dummy modelfile (str) – the model file
Returns:	list (DependencyGraph) with the ‘head’ and ‘rel’ information

train(depgraphs, modelfile, verbose=True)[source]¶: :param depgraphs : list of DependencyGraph as the training data :type depgraphs : DependencyGraph :param modelfile : file name to save the trained model :type modelfile : str

nltk.parse.transitionparser.demo()[source]¶

>>> from nltk.parse import DependencyGraph, DependencyEvaluator
>>> from nltk.parse.transitionparser import TransitionParser, Configuration, Transition
>>> gold_sent = DependencyGraph("""
... Economic  JJ     2      ATT
... news  NN     3       SBJ
... has       VBD       0       ROOT
... little      JJ      5       ATT
... effect   NN     3       OBJ
... on     IN      5       ATT
... financial       JJ       8       ATT
... markets    NNS      6       PC
... .    .      3       PU
... """)

>>> conf = Configuration(gold_sent)

###################### Check the Initial Feature ########################

>>> print(', '.join(conf.extract_features()))
STK_0_POS_TOP, BUF_0_FORM_Economic, BUF_0_LEMMA_Economic, BUF_0_POS_JJ, BUF_1_FORM_news, BUF_1_POS_NN, BUF_2_POS_VBD, BUF_3_POS_JJ

###################### Check The Transition ####################### Check the Initialized Configuration >>> print(conf) Stack : [0] Buffer : [1, 2, 3, 4, 5, 6, 7, 8, 9] Arcs : []

Do some transition checks for ARC-STANDARD

>>> operation = Transition('arc-standard')
>>> operation.shift(conf)
>>> operation.left_arc(conf, "ATT")
>>> operation.shift(conf)
>>> operation.left_arc(conf,"SBJ")
>>> operation.shift(conf)
>>> operation.shift(conf)
>>> operation.left_arc(conf, "ATT")
>>> operation.shift(conf)
>>> operation.shift(conf)
>>> operation.shift(conf)
>>> operation.left_arc(conf, "ATT")

Middle Configuration and Features Check >>> print(conf) Stack : [0, 3, 5, 6] Buffer : [8, 9] Arcs : [(2, ‘ATT’, 1), (3, ‘SBJ’, 2), (5, ‘ATT’, 4), (8, ‘ATT’, 7)]

>>> print(', '.join(conf.extract_features()))
STK_0_FORM_on, STK_0_LEMMA_on, STK_0_POS_IN, STK_1_POS_NN, BUF_0_FORM_markets, BUF_0_LEMMA_markets, BUF_0_POS_NNS, BUF_1_FORM_., BUF_1_POS_., BUF_0_LDEP_ATT

>>> operation.right_arc(conf, "PC")
>>> operation.right_arc(conf, "ATT")
>>> operation.right_arc(conf, "OBJ")
>>> operation.shift(conf)
>>> operation.right_arc(conf, "PU")
>>> operation.right_arc(conf, "ROOT")
>>> operation.shift(conf)

Terminated Configuration Check >>> print(conf) Stack : [0] Buffer : [] Arcs : [(2, ‘ATT’, 1), (3, ‘SBJ’, 2), (5, ‘ATT’, 4), (8, ‘ATT’, 7), (6, ‘PC’, 8), (5, ‘ATT’, 6), (3, ‘OBJ’, 5), (3, ‘PU’, 9), (0, ‘ROOT’, 3)]

Do some transition checks for ARC-EAGER

>>> conf = Configuration(gold_sent)
>>> operation = Transition('arc-eager')
>>> operation.shift(conf)
>>> operation.left_arc(conf,'ATT')
>>> operation.shift(conf)
>>> operation.left_arc(conf,'SBJ')
>>> operation.right_arc(conf,'ROOT')
>>> operation.shift(conf)
>>> operation.left_arc(conf,'ATT')
>>> operation.right_arc(conf,'OBJ')
>>> operation.right_arc(conf,'ATT')
>>> operation.shift(conf)
>>> operation.left_arc(conf,'ATT')
>>> operation.right_arc(conf,'PC')
>>> operation.reduce(conf)
>>> operation.reduce(conf)
>>> operation.reduce(conf)
>>> operation.right_arc(conf,'PU')
>>> print(conf)
Stack : [0, 3, 9]  Buffer : []   Arcs : [(2, 'ATT', 1), (3, 'SBJ', 2), (0, 'ROOT', 3), (5, 'ATT', 4), (3, 'OBJ', 5), (5, 'ATT', 6), (8, 'ATT', 7), (6, 'PC', 8), (3, 'PU', 9)]

###################### Check The Training Function #######################

A. Check the ARC-STANDARD training >>> import tempfile >>> import os >>> input_file = tempfile.NamedTemporaryFile(prefix=’transition_parse.train’, dir=tempfile.gettempdir(), delete=False)

>>> parser_std = TransitionParser('arc-standard')
>>> print(', '.join(parser_std._create_training_examples_arc_std([gold_sent], input_file)))
 Number of training examples : 1
 Number of valid (projective) examples : 1
SHIFT, LEFTARC:ATT, SHIFT, LEFTARC:SBJ, SHIFT, SHIFT, LEFTARC:ATT, SHIFT, SHIFT, SHIFT, LEFTARC:ATT, RIGHTARC:PC, RIGHTARC:ATT, RIGHTARC:OBJ, SHIFT, RIGHTARC:PU, RIGHTARC:ROOT, SHIFT

>>> parser_std.train([gold_sent],'temp.arcstd.model', verbose=False)
 Number of training examples : 1
 Number of valid (projective) examples : 1
>>> remove(input_file.name)

Check the ARC-EAGER training

>>> input_file = tempfile.NamedTemporaryFile(prefix='transition_parse.train', dir=tempfile.gettempdir(),delete=False)
>>> parser_eager = TransitionParser('arc-eager')
>>> print(', '.join(parser_eager._create_training_examples_arc_eager([gold_sent], input_file)))
 Number of training examples : 1
 Number of valid (projective) examples : 1
SHIFT, LEFTARC:ATT, SHIFT, LEFTARC:SBJ, RIGHTARC:ROOT, SHIFT, LEFTARC:ATT, RIGHTARC:OBJ, RIGHTARC:ATT, SHIFT, LEFTARC:ATT, RIGHTARC:PC, REDUCE, REDUCE, REDUCE, RIGHTARC:PU

>>> parser_eager.train([gold_sent],'temp.arceager.model', verbose=False)
 Number of training examples : 1
 Number of valid (projective) examples : 1

>>> remove(input_file.name)

###################### Check The Parsing Function ########################

Check the ARC-STANDARD parser

>>> result = parser_std.parse([gold_sent], 'temp.arcstd.model')
>>> de = DependencyEvaluator(result, [gold_sent])
>>> de.eval() >= (0, 0)
True

B. Check the ARC-EAGER parser >>> result = parser_eager.parse([gold_sent], ‘temp.arceager.model’) >>> de = DependencyEvaluator(result, [gold_sent]) >>> de.eval() >= (0, 0) True

Remove test temporary files >>> remove(‘temp.arceager.model’) >>> remove(‘temp.arcstd.model’)

Note that result is very poor because of only one training example.

nltk.parse.util module¶

Utility functions for parsers.

class nltk.parse.util.TestGrammar(grammar, suite, accept=None, reject=None)[source]¶

Bases: object

Unit tests for CFG.

run(show_trees=False)[source]¶

Sentences in the test suite are divided into two classes:

grammatical (accept) and
ungrammatical (reject).

If a sentence should parse accordng to the grammar, the value of trees will be a non-empty list. If a sentence should be rejected according to the grammar, then the value of trees will be None.

nltk.parse.util.extract_test_sentences(string, comment_chars='#%;', encoding=None)[source]¶

Parses a string with one test sentence per line. Lines can optionally begin with:

a bool, saying if the sentence is grammatical or not, or

an int, giving the number of parse trees is should have,

The result information is followed by a colon, and then the sentence. Empty lines and lines beginning with a comment char are ignored.

Returns:	a list of tuple of sentences and expected results, where a sentence is a list of str, and a result is None, or bool, or int
Parameters:	comment_chars – `str` of possible comment characters. encoding – the encoding of the string, if it is binary

nltk.parse.util.load_parser(grammar_url, trace=0, parser=None, chart_class=None, beam_size=0, **load_args)[source]¶

Load a grammar from a file, and build a parser based on that grammar. The parser depends on the grammar format, and might also depend on properties of the grammar itself.

The following grammar formats are currently supported:

'cfg' (CFGs: CFG)
'pcfg' (probabilistic CFGs: PCFG)
'fcfg' (feature-based CFGs: FeatureGrammar)

Parameters:

grammar_url (str) – A URL specifying where the grammar is located. The default protocol is "nltk:", which searches for the file in the the NLTK data package.
trace (int) – The level of tracing that should be used when parsing a text. 0 will generate no tracing output; and higher numbers will produce more verbose tracing output.
parser – The class used for parsing; should be ChartParser or a subclass. If None, the class depends on the grammar format.
chart_class – The class used for storing the chart; should be Chart or a subclass. Only used for CFGs and feature CFGs. If None, the chart class depends on the grammar format.
beam_size (int) – The maximum length for the parser’s edge queue. Only used for probabilistic CFGs.
load_args – Keyword parameters used when loading the grammar. See data.load for more information.

nltk.parse.util.taggedsent_to_conll(sentence)[source]¶

A module to convert a single POS tagged sentence into CONLL format.

>>> from nltk import word_tokenize, pos_tag
>>> text = "This is a foobar sentence."
>>> for line in taggedsent_to_conll(pos_tag(word_tokenize(text))):
...         print(line, end="")
    1       This    _       DT      DT      _       0       a       _       _
    2       is      _       VBZ     VBZ     _       0       a       _       _
    3       a       _       DT      DT      _       0       a       _       _
    4       foobar  _       JJ      JJ      _       0       a       _       _
    5       sentence        _       NN      NN      _       0       a       _       _
    6       .               _       .       .       _       0       a       _       _

Parameters:	sentence (list(tuple(str, str))) – A single input sentence to parse
Return type:	iter(str)
Returns:	a generator yielding a single sentence in CONLL format.

nltk.parse.util.taggedsents_to_conll(sentences)[source]¶

A module to convert the a POS tagged document stream (i.e. list of list of tuples, a list of sentences) and yield lines in CONLL format. This module yields one line per word and two newlines for end of sentence.

>>> from nltk import word_tokenize, sent_tokenize, pos_tag
>>> text = "This is a foobar sentence. Is that right?"
>>> sentences = [pos_tag(word_tokenize(sent)) for sent in sent_tokenize(text)]
>>> for line in taggedsents_to_conll(sentences):
...     if line:
...         print(line, end="")
1   This    _       DT      DT      _       0       a       _       _
2   is      _       VBZ     VBZ     _       0       a       _       _
3   a       _       DT      DT      _       0       a       _       _
4   foobar  _       JJ      JJ      _       0       a       _       _
5   sentence        _       NN      NN      _       0       a       _       _
6   .               _       .       .       _       0       a       _       _


1   Is      _       VBZ     VBZ     _       0       a       _       _
2   that    _       IN      IN      _       0       a       _       _
3   right   _       NN      NN      _       0       a       _       _
4   ?       _       .       .       _       0       a       _       _

Parameters:	sentences – Input sentences to parse
Return type:	iter(str)
Returns:	a generator yielding sentences in CONLL format.

nltk.parse.viterbi module¶

class nltk.parse.viterbi.ViterbiParser(grammar, trace=0)[source]¶

Bases: nltk.parse.api.ParserI

A bottom-up PCFG parser that uses dynamic programming to find the single most likely parse for a text. The ViterbiParser parser parses texts by filling in a “most likely constituent table”. This table records the most probable tree representation for any given span and node value. In particular, it has an entry for every start index, end index, and node value, recording the most likely subtree that spans from the start index to the end index, and has the given node value.

The ViterbiParser parser fills in this table incrementally. It starts by filling in all entries for constituents that span one element of text (i.e., entries where the end index is one greater than the start index). After it has filled in all table entries for constituents that span one element of text, it fills in the entries for constitutants that span two elements of text. It continues filling in the entries for constituents spanning larger and larger portions of the text, until the entire table has been filled. Finally, it returns the table entry for a constituent spanning the entire text, whose node value is the grammar’s start symbol.

In order to find the most likely constituent with a given span and node value, the ViterbiParser parser considers all productions that could produce that node value. For each production, it finds all children that collectively cover the span and have the node values specified by the production’s right hand side. If the probability of the tree formed by applying the production to the children is greater than the probability of the current entry in the table, then the table is updated with this new tree.

A pseudo-code description of the algorithm used by ViterbiParser is:

Create an empty most likely constituent table, MLC.

For width in 1…len(text):

For start in 1…len(text)-width:

For prod in grammar.productions:

For each sequence of subtrees [t[1], t[2], …, t[n]] in MLC,

where t[i].label()==prod.rhs[i],

and the sequence covers [start:start+width]:

old_p = MLC[start, start+width, prod.lhs]

new_p = P(t[1])P(t[1])…P(t[n])P(prod)

if new_p > old_p:

new_tree = Tree(prod.lhs, t[1], t[2], …, t[n])

MLC[start, start+width, prod.lhs] = new_tree

Return MLC[0, len(text), start_symbol]

Variables:	_grammar – The grammar used to parse sentences. _trace – The level of tracing output that should be generated when parsing a text.

grammar()[source]¶

Returns:	The grammar used by this parser.

parse(tokens)[source]¶

Returns:	An iterator that generates parse trees for the sentence.

When possible this list is sorted from most likely to least likely.

Parameters:	sent (list(str)) – The sentence to be parsed
Return type:	iter(Tree)

trace(trace=2)[source]¶

Set the level of tracing output that should be generated when parsing a text.

Parameters:	trace (int) – The trace level. A trace level of `0` will generate no tracing output; and higher trace levels will produce more verbose tracing output.
Return type:	None

unicode_repr()¶: Return repr(self).

nltk.parse.viterbi.demo()[source]¶: A demonstration of the probabilistic parsers. The user is prompted to select which demo to run, and how many parses should be found; and then each parser is run on the same demo, and a summary of the results are displayed.

Module contents¶

NLTK Parsers

Classes and interfaces for producing tree structures that represent the internal organization of a text. This task is known as “parsing” the text, and the resulting tree structures are called the text’s “parses”. Typically, the text is a single sentence, and the tree structure represents the syntactic structure of the sentence. However, parsers can also be used in other domains. For example, parsers can be used to derive the morphological structure of the morphemes that make up a word, or to derive the discourse structure for a set of utterances.

Sometimes, a single piece of text can be represented by more than one tree structure. Texts represented by more than one tree structure are called “ambiguous” texts. Note that there are actually two ways in which a text can be ambiguous:

The text has multiple correct parses.

There is not enough information to decide which of several candidate parses is correct.

However, the parser module does not distinguish these two types of ambiguity.

The parser module defines ParserI, a standard interface for parsing texts; and two simple implementations of that interface, ShiftReduceParser and RecursiveDescentParser. It also contains three sub-modules for specialized kinds of parsing:

nltk.parser.chart defines chart parsing, which uses dynamic programming to efficiently parse texts.

nltk.parser.probabilistic defines probabilistic parsing, which associates a probability with each parse.

nltk.parse package¶

Submodules¶

nltk.parse.api module¶

nltk.parse.bllip module¶

nltk.parse.chart module¶

nltk.parse.corenlp module¶

nltk.parse.dependencygraph module¶

nltk.parse.earleychart module¶

nltk.parse.evaluate module¶

nltk.parse.featurechart module¶

nltk.parse.generate module¶

nltk.parse.malt module¶

nltk.parse.nonprojectivedependencyparser module¶

nltk.parse.pchart module¶

nltk.parse.projectivedependencyparser module¶

nltk.parse.recursivedescent module¶

nltk.parse.shiftreduce module¶

nltk.parse.stanford module¶

nltk.parse.transitionparser module¶

nltk.parse.util module¶

nltk.parse.viterbi module¶

Module contents¶

Table Of Contents

Search