Package nltk :: Module internals

Module internals

Classes
  ParseError
Exception raised by parse_* functions when they fail.
  Deprecated
A base class used to mark deprecated classes.
  Counter
A counter that auto-increments each time its value is read.
  ElementWrapper
A wrapper around ElementTree Element objects whose main purpose is to provide nicer __repr__ and __str__ methods.
Functions
convert_regexp_to_nongrouping(pattern)
Convert all grouping parentheses in the given regexp pattern to non-grouping parentheses, and return the result (a str).
 
config_java(bin=None, options=None)
Configure nltk's java interface, by letting nltk know where it can find the java binary, and what extra options (if any) should be passed to java when it is run.
 
java(cmd, classpath=None, stdin=None, stdout=None, stderr=None, blocking=True)
Execute the given java command, by opening a subprocess that calls java.
 
parse_str(s, start_position)
If a Python string literal begins at the specified position in the given string, then return a tuple (val, end_position) containing the value of the string literal and the position where it ends.
 
parse_int(s, start_position)
If an integer begins at the specified position in the given string, then return a tuple (val, end_position) containing the value of the integer and the position where it ends.
 
parse_number(s, start_position)
If an integer or float begins at the specified position in the given string, then return a tuple (val, end_position) containing the value of the number and the position where it ends.
 
overridden(method)
Returns: True if method overrides some method with the same name in a base class.
 
_mro(cls)
Return the method resolution order for cls -- i.e., a list containing cls and all its base classes, in the order in which they would be checked by getattr.
 
_add_epytext_field(obj, field, message)
Add an epytext @field to a given object's docstring.
 
deprecated(message)
A decorator used to mark functions as deprecated.
 
find_binary(name, path_to_bin=None, env_vars=(), searchpath=(), binary_names=None, url=None, verbose=True)
Search for the binary for a program that is used by nltk.
 
import_from_stdlib(module)
When python is run from within the nltk/ directory tree, the current directory is included at the beginning of the search path.
 
abstract(func)
A decorator used to mark methods as abstract.
 
slice_bounds(sequence, slice_obj)
Given a slice, return the corresponding (start, stop) bounds, taking into account None indices and negative indices.
Variables
  _java_bin = None
  _java_options = []
  NLTK_JAR = '/Volumes/Data/nltk/trunk/nltk/nltk/nltk.jar'
The location of the NLTK jar file, which is used to communicate with external Java packages (such as Mallet) that do not have a sufficiently powerful native command-line interface.
  _STRING_START_RE = re.compile(r'[uU]?[rR]?("""|\'\'\'|"|\')')
  _PARSE_INT_RE = re.compile(r'-?\d+')
  _PARSE_NUMBER_VALUE = re.compile(r'-?(\d*)(\.?\d*)?')
Function Details

convert_regexp_to_nongrouping(pattern)

Convert all grouping parentheses in the given regexp pattern to non-grouping parentheses, and return the result. E.g.:

>>> convert_regexp_to_nongrouping('ab(c(x+)(z*))?d')
'ab(?:c(?:x+)(?:z*))?d'
Parameters:
  • pattern (str)
Returns: str
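
To illustrate why the conversion can matter, here is a minimal sketch that uses only the standard re module. It does not call nltk; the converted pattern is copied from the example above, and the sample text is made up for illustration. With grouping parentheses, re.findall returns the captured groups; with non-grouping parentheses, it returns the whole match.

```python
import re

grouping = 'ab(c(x+)(z*))?d'
nongrouping = 'ab(?:c(?:x+)(?:z*))?d'  # result shown in the example above

text = 'abcxxzd abd'  # illustrative input, not from the documentation

# Capturing groups: findall yields one tuple of groups per match.
with_groups = re.findall(grouping, text)

# Non-capturing groups: findall yields the full matched substrings.
without_groups = re.findall(nongrouping, text)

# The converted pattern defines no numbered groups at all.
group_count = re.compile(nongrouping).groups
```

Both patterns match the same substrings; only the way the matches are reported differs.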

config_java(bin=None, options=None)

Configure nltk's java interface, by letting nltk know where it can find the java binary, and what extra options (if any) should be passed to java when it is run.

Parameters:
  • bin (string) - The full path to the java binary. If not specified, then nltk will search the system for a java binary; and if one is not found, it will raise a LookupError exception.
  • options (list of string) - A list of options that should be passed to the java binary when it is called. A common value is ['-Xmx512m'], which tells the java binary to increase the maximum heap size to 512 megabytes. If no options are specified, then do not modify the options list.

java(cmd, classpath=None, stdin=None, stdout=None, stderr=None, blocking=True)

Execute the given java command, by opening a subprocess that calls java. If java has not yet been configured, it will be configured by calling config_java() with no arguments.

Parameters:
  • cmd (list of string) - The java command that should be called, formatted as a list of strings. Typically, the first string will be the name of the java class; and the remaining strings will be arguments for that java class.
  • classpath (string) - A ':' separated list of directories, JAR archives, and ZIP archives to search for class files.
  • stdin, stdout, stderr - Specify the executed program's standard input, standard output and standard error file handles, respectively. Valid values are subprocess.PIPE, an existing file descriptor (a positive integer), an existing file object, and None. subprocess.PIPE indicates that a new pipe to the child should be created. With None, no redirection will occur; the child's file handles will be inherited from the parent. Additionally, stderr can be subprocess.STDOUT, which indicates that the stderr data from the application should be captured into the same file handle as for stdout.
  • blocking - If false, then return immediately after spawning the subprocess. In this case, the return value is the Popen object, and not a (stdout, stderr) tuple.
Returns:
If blocking=True, then return a tuple (stdout, stderr), containing the stdout and stderr outputs generated by the java command if the stdout and stderr parameters were set to subprocess.PIPE; or None otherwise. If blocking=False, then return a subprocess.Popen object.
Raises:
  • OSError - If the java command returns a nonzero return code.
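
The stdin, stdout and stderr values described above follow the conventions of the standard subprocess module. The sketch below illustrates those conventions only: it launches the running Python interpreter as a stand-in child process rather than java, and it does not call nltk.

```python
import subprocess
import sys

# Stand-in child process: the current Python interpreter, not java.
cmd = [sys.executable, '-c',
       'import sys; print("out"); sys.stderr.write("err")']

# subprocess.PIPE requests a new pipe to the child so output can be captured.
proc = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
stdout, stderr = proc.communicate()  # blocks until the child exits

# subprocess.STDOUT sends the child's stderr to the same handle as stdout.
merged = subprocess.Popen(cmd, stdout=subprocess.PIPE,
                          stderr=subprocess.STDOUT)
merged_out, merged_err = merged.communicate()  # merged_err is None here
```

Waiting on communicate() corresponds to the blocking case; holding on to the Popen object without waiting corresponds to blocking=False.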

parse_str(s, start_position)

If a Python string literal begins at the specified position in the given string, then return a tuple (val, end_position) containing the value of the string literal and the position where it ends. Otherwise, raise a ParseError.

parse_int(s, start_position)

If an integer begins at the specified position in the given string, then return a tuple (val, end_position) containing the value of the integer and the position where it ends. Otherwise, raise a ParseError.

parse_number(s, start_position)

If an integer or float begins at the specified position in the given string, then return a tuple (val, end_position) containing the value of the number and the position where it ends. Otherwise, raise a ParseError.
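
The three parse_* functions share one contract: match at a given position, return (val, end_position), or raise ParseError. The following is a minimal sketch of that contract for integers, not nltk's source. The pattern is the one listed for _PARSE_INT_RE under Variables; the names ParseErrorSketch and parse_int_sketch are hypothetical.

```python
import re

class ParseErrorSketch(ValueError):
    """Hypothetical stand-in for the module's ParseError."""

# Same pattern as _PARSE_INT_RE in the Variables section.
INT_RE = re.compile(r'-?\d+')

def parse_int_sketch(s, start_position):
    """Return (value, end_position) for an integer at start_position."""
    # Pattern.match(s, pos) anchors the match at pos instead of index 0.
    match = INT_RE.match(s, start_position)
    if match is None:
        raise ParseErrorSketch(
            'expected an integer at position %d' % start_position)
    return int(match.group()), match.end()

# The integer literal starts at index 4 of this string.
value, end = parse_int_sketch('x = -42;', 4)
```

Returning the end position lets a caller resume parsing exactly where the literal stopped.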

overridden(method)

Parameters:
  • method (instance method)
Returns:
True if method overrides some method with the same name in a base class. This is typically used when defining abstract base classes or interfaces, to allow subclasses to define either of two related methods:
>>> class EaterI:
...     '''Subclass must define eat() or batch_eat().'''
...     def eat(self, food):
...         if overridden(self.batch_eat):
...             return self.batch_eat([food])[0]
...         else:
...             raise NotImplementedError()
...     def batch_eat(self, foods):
...         return [self.eat(food) for food in foods]

_mro(cls)

Return the method resolution order for cls -- i.e., a list containing cls and all its base classes, in the order in which they would be checked by getattr. For new-style classes, this is just cls.__mro__. For classic classes, this can be obtained by a depth-first left-to-right traversal of __bases__.
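
The two orderings can differ when a base class is reachable along more than one path. The sketch below assumes Python 3, where every class is new-style, so the depth-first traversal is shown purely for comparison; depth_first_bases is a hypothetical helper, not the module's _mro.

```python
class A(object): pass
class B(A): pass
class C(A): pass
class D(B, C): pass

def depth_first_bases(cls):
    """Depth-first, left-to-right traversal of __bases__."""
    result = [cls]
    for base in cls.__bases__:
        result.extend(depth_first_bases(base))
    return result

# Order used for new-style classes: each class appears exactly once.
new_style = [c.__name__ for c in D.__mro__]

# Depth-first order: A is reached through B before C is visited.
classic_style = [c.__name__ for c in depth_first_bases(D)]
```

In the diamond above, cls.__mro__ places C before A, while the depth-first traversal reaches A first and lists it twice.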

deprecated(message)

A decorator used to mark functions as deprecated. This will cause a warning to be printed when the function is used. Usage:

>>> @deprecated('Use foo() instead')
... def bar(x):
...     print(x/10)
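
One way a decorator of this kind could be written is sketched below, using the standard warnings module. This is an assumption for illustration, not nltk's implementation; the name deprecated_sketch and the wording of the warning are hypothetical.

```python
import functools
import warnings

def deprecated_sketch(message):
    """Return a decorator that warns each time the wrapped function runs."""
    def decorator(func):
        @functools.wraps(func)  # keep the original name and docstring
        def wrapper(*args, **kwargs):
            warnings.warn('%s is deprecated. %s' % (func.__name__, message),
                          category=DeprecationWarning, stacklevel=2)
            return func(*args, **kwargs)
        return wrapper
    return decorator

@deprecated_sketch('Use foo() instead')
def bar(x):
    return x / 10

# Record warnings so the effect of the decorator can be inspected.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter('always')
    result = bar(50)
```

The wrapped function still runs and returns its usual value; the warning is a side effect of the call.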

find_binary(name, path_to_bin=None, env_vars=(), searchpath=(), binary_names=None, url=None, verbose=True)

Search for the binary for a program that is used by nltk.

Parameters:
  • name - The name of the program
  • path_to_bin - The user-supplied binary location, or None.
  • env_vars - A list of environment variable names to check
  • binary_names - A list of alternative binary names to check.
  • searchpath - List of directories to search.

import_from_stdlib(module)

When python is run from within the nltk/ directory tree, the current directory is included at the beginning of the search path. Unfortunately, that means that modules within nltk can sometimes shadow standard library modules. As an example, the stdlib 'inspect' module will attempt to import the stdlib 'tokenize' module, but will end up importing NLTK's 'tokenize' module instead (causing the import to fail).
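
The description above does not say how the function avoids the shadowing. One plausible approach, offered here as an assumption and not as nltk's implementation, is to drop the current-directory entries from sys.path for the duration of the import. The name import_from_stdlib_sketch is hypothetical.

```python
import importlib
import sys

def import_from_stdlib_sketch(module_name):
    """Import module_name with current-directory entries removed from sys.path."""
    saved_path = sys.path
    # '' and '.' both refer to the current directory on the search path.
    sys.path = [d for d in saved_path if d not in ('', '.')]
    try:
        # A module already cached in sys.modules is returned as-is.
        return importlib.import_module(module_name)
    finally:
        sys.path = saved_path  # always restore the original search path

tokenize_module = import_from_stdlib_sketch('tokenize')
```

Restoring the path in a finally clause keeps the search path intact even if the import fails.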

abstract(func)

A decorator used to mark methods as abstract; that is, methods marked by this decorator must be overridden by subclasses. If an abstract method is called (either in the base class or in a subclass that does not override the base class method), it will raise NotImplementedError.
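
The behaviour described above can be sketched as follows. This is an illustrative stand-in, not nltk's implementation; abstract_sketch and the Shape, Square and Blob classes are hypothetical.

```python
import functools

def abstract_sketch(func):
    """Replace func with a method that always raises NotImplementedError."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        raise NotImplementedError(
            '%s is an abstract method' % func.__name__)
    return wrapper

class Shape(object):
    @abstract_sketch
    def area(self):
        """Return the area of this shape."""

class Square(Shape):
    def __init__(self, side):
        self.side = side

    def area(self):  # overrides the abstract method
        return self.side ** 2

class Blob(Shape):
    pass  # does not override area(), so calling it raises

square_area = Square(3).area()
```

Calling Blob().area() raises NotImplementedError, because Blob inherits the wrapper instead of supplying its own method.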

slice_bounds(sequence, slice_obj)

Given a slice, return the corresponding (start, stop) bounds, taking into account None indices and negative indices. The following guarantees are made for the returned start and stop values:

  • 0 <= start <= len(sequence)
  • 0 <= stop <= len(sequence)
  • start <= stop
Raises:
  • ValueError - If slice_obj.step is not None.
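
The guarantees listed above can be met with the built-in slice.indices method, as in this minimal sketch. It is not nltk's implementation: slice_bounds_sketch is a hypothetical name, and the sketch assumes that len(sequence) is available.

```python
def slice_bounds_sketch(sequence, slice_obj):
    """Return (start, stop) with 0 <= start <= stop <= len(sequence)."""
    if slice_obj.step is not None:
        raise ValueError('slices with steps are not supported')
    # slice.indices resolves None and negative indices and clips both
    # bounds to the range [0, len(sequence)].
    start, stop, _ = slice_obj.indices(len(sequence))
    # Enforce start <= stop for slices such as [5:2].
    return min(start, stop), stop

letters = 'abcdefghij'
examples = (slice(None, None), slice(-3, None), slice(2, 100), slice(5, 2))
bounds = [slice_bounds_sketch(letters, s) for s in examples]
```

For the ten-character string, the four slices resolve to (0, 10), (7, 10), (2, 10) and (2, 2).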