Package nltk :: Package corpus :: Package reader :: Module api :: Class CategorizedCorpusReader

Class CategorizedCorpusReader


object --+
         |
        CategorizedCorpusReader
Known Subclasses:

A mixin class used to aid in the implementation of corpus readers for categorized corpora. This class defines the method categories(), which returns a list of the categories for the corpus or for a specified set of files; and overrides files() to take a categories argument, restricting the set of files to be returned.
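The mixin pattern described above can be illustrated with a small stand-alone sketch. The classes PlainReader, CategorizedMixin, and ToyCategorizedReader are hypothetical names invented for this example and are not part of NLTK; the sketch only mimics the documented behaviour of categories() and of files() with a categories argument.

```python
class PlainReader:
    """Hypothetical stand-in for a corpus reader that knows its file ids."""

    def __init__(self, file_ids):
        self._file_ids = list(file_ids)

    def files(self):
        return sorted(self._file_ids)


class CategorizedMixin:
    """Hypothetical mixin adding categories() and a categories-aware files()."""

    def _init_categories(self, file_to_categories):
        # Store each file's categories as a set for easy intersection.
        self._f2c = {f: set(cs) for f, cs in file_to_categories.items()}

    def categories(self, files=None):
        if files is None:
            selected = list(self._f2c.values())
        else:
            if isinstance(files, str):
                files = [files]
            selected = [self._f2c[f] for f in files]
        return sorted(set().union(*selected))

    def files(self, categories=None):
        if categories is None:
            # No restriction: defer to the underlying reader.
            return super().files()
        if isinstance(categories, str):
            categories = [categories]
        wanted = set(categories)
        return sorted(f for f, cs in self._f2c.items() if cs & wanted)


class ToyCategorizedReader(CategorizedMixin, PlainReader):
    """The mixin is listed first so that its files() takes precedence."""

    def __init__(self, file_to_categories):
        PlainReader.__init__(self, file_to_categories)
        self._init_categories(file_to_categories)


reader = ToyCategorizedReader({
    "news/a.txt": ["news"],
    "news/b.txt": ["news", "sports"],
    "fiction/c.txt": ["fiction"],
})
print(reader.categories())
print(reader.files(categories="news"))
```

Listing the mixin before the base reader in the class definition is what lets its files() run first and fall back to the base implementation when no categories are given.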

Subclasses are expected to:

Instance Methods
 
__init__(self, kwargs)
Initialize this mapping based on one of the keyword arguments cat_pattern, cat_map, or cat_file.
 
_init(self)
 
_add(self, file_id, category)
 
categories(self, files=None)
Return a list of the categories that are defined for this corpus, or for the given file(s) if files is specified.
 
files(self, categories=None)
Return a list of file identifiers for the files that make up this corpus, or that make up the given category or categories if specified.

Inherited from object: __delattr__, __getattribute__, __hash__, __new__, __reduce__, __reduce_ex__, __repr__, __setattr__, __str__

Instance Variables
  _f2c
file-to-category mapping
  _c2f
category-to-file mapping
  _pattern
regexp specifying the mapping
  _map
dict specifying the mapping
  _file
filename of file containing the mapping
  _delimiter
delimiter for self._file
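
As an illustration of the two mappings listed above, the following sketch builds a file-to-category index and its inverse from (file_id, category) pairs. The function name build_mappings and the use of sets as values are assumptions made for this example; the page does not state how the class stores these mappings internally.

```python
from collections import defaultdict


def build_mappings(pairs):
    """Build file-to-category and category-to-file indexes (hypothetical helper)."""
    f2c = defaultdict(set)  # file id -> set of categories
    c2f = defaultdict(set)  # category -> set of file ids
    for file_id, category in pairs:
        f2c[file_id].add(category)
        c2f[category].add(file_id)
    return f2c, c2f


f2c, c2f = build_mappings([
    ("a.txt", "news"),
    ("a.txt", "sports"),
    ("b.txt", "news"),
])
print(sorted(f2c["a.txt"]))
print(sorted(c2f["news"]))
```

Keeping both directions makes lookups by file and by category equally cheap, at the cost of storing each pair twice.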
Properties

Inherited from object: __class__

Method Details

__init__(self, kwargs)
(Constructor)


Initialize this mapping based on keyword arguments, as follows:

  • cat_pattern: A regular expression pattern used to find the category for each file identifier. The pattern will be applied to each file identifier, and the first matching group will be used as the category label for that file.
  • cat_map: A dictionary, mapping from file identifiers to category labels.
  • cat_file: The name of a file that contains the mapping from file identifiers to categories. The argument cat_delimiter can be used to specify a delimiter.

The argument that is used will be deleted from kwargs. If more than one of cat_pattern, cat_map, and cat_file is specified, an exception will be raised.

Overrides: object.__init__
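
The argument handling described above can be sketched roughly as follows. The helpers pop_category_source and category_from_pattern are hypothetical and are not NLTK functions. The use of ValueError, the use of re.match (anchored at the start of the file identifier), and returning (None, None) when no option is given are all assumptions, since the text only says that "an exception" is raised when more than one argument is specified.

```python
import re

CATEGORY_KEYS = ("cat_pattern", "cat_map", "cat_file")


def pop_category_source(kwargs):
    """Remove and return the single category option found in kwargs."""
    present = [key for key in CATEGORY_KEYS if key in kwargs]
    if len(present) > 1:
        # Exception type is an assumption; the docs only say "an exception".
        raise ValueError("specify only one of: " + ", ".join(CATEGORY_KEYS))
    if not present:
        # Behaviour with no option is not described; this is a placeholder.
        return None, None
    key = present[0]
    # pop() deletes the argument from kwargs, as the docs describe.
    return key, kwargs.pop(key)


def category_from_pattern(pattern, file_id):
    """Return the first group matched by pattern in file_id, or None."""
    match = re.match(pattern, file_id)
    return match.group(1) if match else None


kwargs = {"cat_pattern": r"(\w+)/.*", "encoding": "utf8"}
key, value = pop_category_source(kwargs)
print(key, kwargs)
print(category_from_pattern(value, "news/a.txt"))
```

Because the chosen argument is removed from kwargs, whatever remains (here the hypothetical encoding entry) can be passed on unchanged to another constructor.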