Package nltk :: Module util :: Class LazyMap

Class LazyMap

          object --+    
                   |    
AbstractLazySequence --+
                       |
                      LazyMap

A lazy sequence whose elements are formed by applying a given function to each element in one or more underlying lists. The function is applied lazily: when you read a value from the sequence, LazyMap calculates that value by applying its function to the corresponding value(s) of the underlying lists. LazyMap is essentially a lazy version of the Python built-in function map. In particular, the following two expressions are equivalent:

>>> map(f, sequences...)
>>> list(LazyMap(f, sequences...))

As with the built-in map in Python 2, if the source lists do not have equal size, then the value None will be supplied for the 'missing' elements.
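
The lazy behaviour and the None padding described above can be illustrated with a small stand-in class. This is only a sketch: TinyLazyMap and square are hypothetical names introduced for illustration, and the class is not NLTK's implementation. Note that in Python 3 the built-in map stops at the shortest input, so the padding is compared against itertools.zip_longest instead.

```python
from itertools import zip_longest


class TinyLazyMap:
    """Illustrative stand-in for a lazy map (not NLTK's implementation)."""

    def __init__(self, function, *lists):
        self._func = function
        self._lists = lists

    def __len__(self):
        # The mapped sequence is as long as the longest underlying list.
        return max(len(lst) for lst in self._lists)

    def __getitem__(self, index):
        if not 0 <= index < len(self):
            raise IndexError("index out of range")
        # Shorter lists contribute None for their 'missing' elements.
        args = [lst[index] if index < len(lst) else None for lst in self._lists]
        # The function runs only now, when the value is actually read.
        return self._func(*args)


calls = []


def square(x):
    calls.append(x)  # record every real invocation
    return x * x


lazy = TinyLazyMap(square, [1, 2, 3, 4])
calls_before = list(calls)     # nothing has been computed yet
third = lazy[2]                # computes only the element at position 2
calls_after_one = list(calls)
everything = list(lazy)        # forces every element

padded = list(TinyLazyMap(lambda a, b: (a, b), [1, 2, 3], ["x"]))
```

Constructing the object performs no work; each read triggers exactly one call to the function.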

Lazy maps can be useful for conserving memory, in cases where individual values take up a lot of space. This is especially true if the underlying list's values are constructed lazily, as is the case with many corpus readers.

A typical use case for this class is performing feature detection on the tokens in a corpus. Since featuresets are encoded as dictionaries, which can take up a lot of memory, using a LazyMap can significantly reduce memory usage when training and running classifiers.
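
The memory argument can be sketched without NLTK by using Python 3's built-in map, which is itself a lazy iterator (unlike LazyMap, it offers no len() or indexing). The feature detector token_features and its feature names are hypothetical; the point is only that a featureset dictionary is built when a token is consumed, not up front.

```python
from itertools import islice


def token_features(token):
    # Hypothetical feature detector: one small dictionary per token.
    return {
        "lower": token.lower(),
        "length": len(token),
        "is_title": token.istitle(),
    }


tokens = ["The", "quick", "brown", "Fox"]
built = []  # tokens for which a featureset has actually been built


def tracked_features(token):
    featureset = token_features(token)
    built.append(token)
    return featureset


# No featureset exists yet; map() only remembers what to do.
featuresets = map(tracked_features, tokens)

# Consuming two items builds exactly two dictionaries.
first_two = list(islice(featuresets, 2))
```

A consumer that processes one featureset at a time therefore never holds the dictionaries for the whole corpus in memory.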

Instance Methods
 
__init__(self, function, *lists, **config)
x.__init__(...) initializes x; see x.__class__.__doc__ for signature
 
iterate_from(self, index)
Return an iterator that generates the tokens in the corpus file underlying this corpus view, starting at the token number index.
 
__getitem__(self, index)
Return the token at position index in the corpus file underlying this corpus view.
 
__len__(self)
Return the number of tokens in the corpus file underlying this corpus view.

Inherited from AbstractLazySequence: __add__, __cmp__, __contains__, __hash__, __iter__, __mul__, __radd__, __repr__, __rmul__, count, index

Inherited from object: __delattr__, __getattribute__, __new__, __reduce__, __reduce_ex__, __setattr__, __str__

Class Variables

Inherited from AbstractLazySequence (private): _MAX_REPR_SIZE

Properties

Inherited from object: __class__

Method Details

__init__(self, function, *lists, **config)
(Constructor)

x.__init__(...) initializes x; see x.__class__.__doc__ for signature

Parameters:
  • function - The function that should be applied to elements of lists. It should take as many arguments as there are lists.
  • lists - The underlying lists.
  • cache_size - Keyword argument (passed via config) that determines the size of the cache used by this lazy map. (default=5)
Overrides: object.__init__
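
As a rough illustration of what a bounded cache controlled by cache_size could look like, the sketch below keeps at most cache_size computed values and evicts the oldest entry first. The class CachedLazyMap, the helper double, and the eviction policy are all assumptions made for this example; the source does not state how LazyMap chooses which entry to discard, and the sketch takes cache_size as a keyword-only argument rather than through **config.

```python
from collections import OrderedDict


class CachedLazyMap:
    """Illustrative lazy map with a bounded cache (not NLTK's implementation)."""

    def __init__(self, function, *lists, cache_size=5):
        self._func = function
        self._lists = lists
        self._cache_size = cache_size
        self._cache = OrderedDict()  # index -> computed value

    def __len__(self):
        return max(len(lst) for lst in self._lists)

    def __getitem__(self, index):
        if not 0 <= index < len(self):
            raise IndexError("index out of range")
        if index in self._cache:
            return self._cache[index]  # cache hit: no recomputation
        args = [lst[index] if index < len(lst) else None for lst in self._lists]
        value = self._func(*args)
        self._cache[index] = value
        if len(self._cache) > self._cache_size:
            # Assumed policy: drop the oldest inserted entry.
            self._cache.popitem(last=False)
        return value


calls = []


def double(x):
    calls.append(x)
    return 2 * x


cached = CachedLazyMap(double, list(range(10)), cache_size=2)
cached[0]
cached[0]   # served from the cache
cached[1]
cached[2]   # cache now exceeds 2 entries, so index 0 is evicted
cached[0]   # computed again
cached_indices = list(cached._cache)
```

A larger cache_size trades memory for fewer repeated calls to the function when the same positions are read more than once.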

iterate_from(self, index)

Return an iterator that generates the tokens in the corpus file underlying this corpus view, starting at the token number index. If index >= len(self), then this iterator will generate no tokens.

Overrides: AbstractLazySequence.iterate_from
(inherited documentation)
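
The contract described for iterate_from can be sketched as a plain generator. This is a simplified, single-list illustration written as a free function with hypothetical argument names, not the method's actual source.

```python
def iterate_from(function, items, index):
    """Yield function(item) for each item from position index onward."""
    # If index >= len(items), the loop body never runs and nothing is yielded.
    while index < len(items):
        yield function(items[index])
        index += 1


words = ["aa", "b", "cccc"]
from_second = list(iterate_from(len, words, 1))
past_the_end = list(iterate_from(len, words, 3))
```

Starting at or beyond the end simply produces an empty iterator rather than raising an error.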

__getitem__(self, index)
(Indexing operator)

Return the token at position index in the corpus file underlying this corpus view. Negative indices and slices (spans) are both supported.

Overrides: AbstractLazySequence.__getitem__
(inherited documentation)
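
One way to support both negative indices and slices in __getitem__ is sketched below. IndexableLazyMap is a hypothetical single-list stand-in, and it returns slices as ordinary lists for simplicity; whether LazyMap itself returns a list or another lazy sequence for a slice is not stated in the source.

```python
class IndexableLazyMap:
    """Illustrative lazy map over one list (not NLTK's implementation)."""

    def __init__(self, function, items):
        self._func = function
        self._items = items

    def __len__(self):
        return len(self._items)

    def __getitem__(self, index):
        if isinstance(index, slice):
            # slice.indices() clamps and normalizes start/stop/step,
            # including negative values, for a sequence of this length.
            start, stop, step = index.indices(len(self))
            return [self._func(self._items[i]) for i in range(start, stop, step)]
        if index < 0:
            index += len(self)  # count from the end
        if not 0 <= index < len(self):
            raise IndexError("index out of range")
        return self._func(self._items[index])


words = IndexableLazyMap(str.upper, ["a", "b", "c", "d"])
last = words[-1]
middle = words[1:3]
tail = words[-2:]
```

Only the positions covered by the index or slice are passed to the function.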

__len__(self)
(Length operator)

Return the number of tokens in the corpus file underlying this corpus view.

Overrides: AbstractLazySequence.__len__
(inherited documentation)