Module nltk.corpus.reader.rte

Corpus reader for the Recognizing Textual Entailment (RTE) Challenge Corpora.

The files were taken from the RTE1, RTE2 and RTE3 datasets and the filenames were regularized.

Filenames are of the form rte*_dev.xml and rte*_test.xml. The latter are the gold standard annotated files.

Each entailment corpus is a list of 'text'/'hypothesis' pairs. The following example is taken from RTE3:

<pair id="1" entailment="YES" task="IE" length="short" >
  <t>The sale was made to pay Yukos' US$ 27.5 billion tax bill, Yuganskneftegaz was originally sold for US$ 9.4 billion to a little known company Baikalfinansgroup which was later bought by the Russian state-owned oil company Rosneft .</t>
  <h>Baikalfinansgroup was sold to Rosneft.</h>
</pair>
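
As a rough illustration of how such a pair can be read, the sketch below parses the RTE3 example with Python's standard xml.etree.ElementTree module. The helper name read_pair and the dictionary layout are assumptions made for this example; they are not part of the rte module.

```python
import xml.etree.ElementTree as ET

# The RTE3 pair quoted in the module description.
PAIR_XML = """
<pair id="1" entailment="YES" task="IE" length="short" >
  <t>The sale was made to pay Yukos' US$ 27.5 billion tax bill, Yuganskneftegaz was originally sold for US$ 9.4 billion to a little known company Baikalfinansgroup which was later bought by the Russian state-owned oil company Rosneft .</t>
  <h>Baikalfinansgroup was sold to Rosneft.</h>
</pair>
"""


def read_pair(pair_xml):
    """Return the attributes, text and hypothesis of one <pair> element.

    Hypothetical helper for illustration only.
    """
    pair = ET.fromstring(pair_xml.strip())
    return {
        "id": pair.get("id"),
        "entailment": pair.get("entailment"),
        "task": pair.get("task"),
        "length": pair.get("length"),
        "text": pair.findtext("t"),        # content of <t>
        "hypothesis": pair.findtext("h"),  # content of <h>
    }


record = read_pair(PAIR_XML)
print(record["entailment"], "|", record["hypothesis"])
```

All attribute values come back as strings, so the pair ID here is "1" rather than the integer 1.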

To provide a globally unique ID (GID) for each pair, a new attribute, challenge, has been added to the root element, entailment-corpus, of each file, taking the value 1, 2 or 3. The GID has the form 'm-n', where 'm' is the challenge number and 'n' is the pair ID; for example, pair 1 of RTE3 has the GID '3-1'.
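
The 'm-n' scheme can be sketched with the standard library alone. The miniature corpus below is a made-up stand-in for a real file (its texts are placeholders), and global_ids is a hypothetical helper, not a function of the rte module.

```python
import xml.etree.ElementTree as ET

# Hypothetical two-pair corpus; the <t>/<h> contents are placeholders.
CORPUS_XML = """
<entailment-corpus challenge="3">
  <pair id="1" entailment="YES" task="IE" length="short">
    <t>placeholder text</t>
    <h>placeholder hypothesis</h>
  </pair>
  <pair id="2" entailment="NO" task="IE" length="short">
    <t>placeholder text</t>
    <h>placeholder hypothesis</h>
  </pair>
</entailment-corpus>
"""


def global_ids(corpus_xml):
    """Build 'm-n' GIDs: m is the root's challenge attribute, n the pair id."""
    root = ET.fromstring(corpus_xml.strip())
    challenge = root.get("challenge")
    return ["%s-%s" % (challenge, pair.get("id")) for pair in root.iter("pair")]


gids = global_ids(CORPUS_XML)
print(gids)  # ['3-1', '3-2']
```

Because pair IDs restart at 1 in every challenge, the challenge prefix is what keeps pairs from different datasets distinct.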

Classes

RTEPair
Container for RTE text-hypothesis pairs.

RTECorpusReader
Corpus reader for corpora in RTE challenges.

Functions

norm(value_string)
Normalize the string value in an RTE pair's value or entailment attribute as an integer (1, 0). Returns int.

Function Details

norm(value_string)

Normalize the string value in an RTE pair's value or entailment attribute as an integer (1, 0).

Parameters:

value_string (str) - the label used to classify a text/hypothesis pair

Returns: int
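
A minimal sketch of this kind of normalization is given below. It assumes that the label strings are TRUE/FALSE (for the value attribute) and YES/NO (for the entailment attribute) and that case should be ignored; only the YES label is confirmed by the RTE3 example in the module description. The name normalize_label is hypothetical, and the code is not claimed to match the module's actual implementation.

```python
# Assumed label vocabulary; only "YES" appears in the documented example.
_LABELS = {"TRUE": 1, "FALSE": 0, "YES": 1, "NO": 0}


def normalize_label(value_string):
    """Map an entailment label string to 1 (entailed) or 0 (not entailed).

    Raises KeyError for any label outside the assumed vocabulary.
    """
    return _LABELS[value_string.upper()]


print(normalize_label("YES"), normalize_label("NO"))
```

Mapping both attribute styles onto the same integers lets pairs from different challenges be compared without caring which attribute carried the label.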