Package nltk :: Module util :: Class AbstractLazySequence
Class AbstractLazySequence

An abstract base class for read-only sequences whose values are computed as needed. Lazy sequences act like tuples -- they can be indexed, sliced, and iterated over; but they may not be modified.

The most common application of lazy sequences in NLTK is for corpus view objects, which provide access to the contents of a corpus without loading the entire corpus into memory, by loading pieces of the corpus from disk as needed.

The result of modifying a mutable element of a lazy sequence is undefined. In particular, the modifications made to the element may or may not persist, depending on whether and when the lazy sequence caches that element's value or reconstructs it from scratch.
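
As a rough illustration of why the outcome is undefined, the sketch below uses a hypothetical stand-in class, RebuildingSequence, which is not part of NLTK. It assumes two possible strategies: rebuilding each element on every access, or caching it after the first access. A mutation is lost under the first strategy and kept under the second.

```python
class RebuildingSequence:
    """Hypothetical stand-in (not NLTK code): builds each element on demand."""

    def __init__(self, rows, cache=False):
        self._rows = rows                      # immutable source data
        self._cache = {} if cache else None    # optional element cache

    def __len__(self):
        return len(self._rows)

    def __getitem__(self, i):
        if self._cache is not None and i in self._cache:
            return self._cache[i]
        value = list(self._rows[i])            # a fresh mutable list each time
        if self._cache is not None:
            self._cache[i] = value
        return value


uncached = RebuildingSequence([("a", "b"), ("c", "d")])
uncached[0].append("x")    # mutates a temporary object
lost = uncached[0]         # rebuilt from scratch, so the change is gone

cached = RebuildingSequence([("a", "b"), ("c", "d")], cache=True)
cached[0].append("x")      # mutates the cached object
kept = cached[0]           # the cached object is returned, change included
```

Because a caller generally cannot tell which strategy a given lazy sequence uses, elements are best treated as read-only.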

Subclasses are required to define two methods: __len__() and iterate_from().

__len__(self)
Return the number of tokens in the corpus file underlying this corpus view.
iterate_from(self, start)
Return an iterator that generates the tokens in the corpus file underlying this corpus view, starting at the token number start.
__getitem__(self, i)
Return the ith token in the corpus file underlying this corpus view.
__iter__(self)
Return an iterator that generates the tokens in the corpus file underlying this corpus view.
count(self, value)
Return the number of times this list contains value.
index(self, value, start=None, stop=None)
Return the index of the first occurrence of value in this list whose position is greater than or equal to start and less than stop.
__contains__(self, value)
Return true if this list contains value.
__add__(self, other)
Return a list concatenating self with other.
__radd__(self, other)
Return a list concatenating other with self.
__mul__(self, count)
Return a list concatenating self with itself count times.
__rmul__(self, count)
Return a list concatenating self with itself count times.
__repr__(self)
Return a string representation for this corpus view that is similar to a list's representation; if it would be more than 60 characters long, it is truncated.
__cmp__(self, other)
Return a number indicating how self relates to other.
iterate_from(self, start)

Return an iterator that generates the tokens in the corpus file underlying this corpus view, starting at the token number start. If start>=len(self), then this iterator will generate no tokens.
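
The contract above can be sketched with a small stand-alone class. LazySquares is a hypothetical example, not an NLTK class, and it does not inherit from AbstractLazySequence; it only shows how a length method and iterate_from are enough to derive iteration and integer indexing. Slice support is left out to keep the sketch short.

```python
class LazySquares:
    """Hypothetical read-only lazy sequence of squares (not an NLTK class)."""

    def __init__(self, n):
        self._n = n

    def __len__(self):
        return self._n

    def iterate_from(self, start):
        # Generates nothing when start >= len(self).
        for i in range(start, self._n):
            yield i * i

    def __iter__(self):
        # Plain iteration is iterate_from starting at the first element.
        return self.iterate_from(0)

    def __getitem__(self, i):
        # Integer indices only; negative indices count from the end.
        if i < 0:
            i += len(self)
        if not 0 <= i < len(self):
            raise IndexError("index out of range")
        return next(self.iterate_from(i))


squares = LazySquares(5)
tail = list(squares.iterate_from(3))      # elements at positions 3 and 4
past_end = list(squares.iterate_from(5))  # start == len(squares): empty
```

Each value is computed only when it is requested, which is the property corpus views rely on to avoid loading a whole corpus into memory.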

__getitem__(self, i)
Return the ith token in the corpus file underlying this corpus view. Negative indices and slices are both supported.

index(self, value, start=None, stop=None)

Return the index of the first occurrence of value in this list whose position is greater than or equal to start and less than stop. Negative start and stop values are treated like negative slice bounds -- i.e., they count from the end of the list.
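
The handling of negative bounds can be sketched with Python's built-in slice.indices, which resolves start and stop the same way slicing does. The helper name lazy_index is hypothetical, and raising ValueError when the value is absent is an assumption modelled on list.index rather than something this page states.

```python
def lazy_index(seq, value, start=None, stop=None):
    """Hypothetical helper: first index i of value with start <= i < stop."""
    # slice.indices clamps the bounds and resolves negative values
    # as slicing would for a sequence of this length.
    start, stop, _ = slice(start, stop).indices(len(seq))
    for i in range(start, stop):
        if seq[i] == value:
            return i
    # Assumption: mirror list.index and raise when nothing matches.
    raise ValueError("value not found in the given range")


words = ["a", "b", "a", "c", "a"]
first = lazy_index(words, "a")              # whole sequence searched
from_end = lazy_index(words, "a", -2)       # start counts from the end
bounded = lazy_index(words, "a", 1, -1)     # both bounds applied
```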

__repr__(self)
Return a string representation for this corpus view that is similar to a list's representation; if it would be more than 60 characters long, it is truncated.
__cmp__(self, other)
Return a number indicating how self relates to other.

  • If other is not a corpus view or a list, return -1.
  • Otherwise, return cmp(list(self), list(other)).

Note: corpus views do not compare equal to tuples containing equal elements. Otherwise, transitivity would be violated, since tuples do not compare equal to lists.
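
The comparison rule can be sketched as follows. View and compare_view are hypothetical names used only for illustration, and cmp is redefined locally because Python 3 no longer provides it as a built-in. The last line shows the list/tuple behaviour that the transitivity argument in the note depends on.

```python
def cmp(a, b):
    """Three-way comparison; Python 3 dropped the built-in cmp()."""
    return (a > b) - (a < b)


class View:
    """Hypothetical stand-in for a corpus view (not an NLTK class)."""

    def __init__(self, items):
        self._items = list(items)

    def __iter__(self):
        return iter(self._items)


def compare_view(view, other):
    # Anything other than a view or a list (tuples included) yields -1.
    if not isinstance(other, (View, list)):
        return -1
    return cmp(list(view), list(other))


view = View([1, 2, 3])
same_as_list = compare_view(view, [1, 2, 3])    # equal contents, list operand
versus_tuple = compare_view(view, (1, 2, 3))    # equal contents, tuple operand
list_vs_tuple = [1, 2, 3] == (1, 2, 3)          # lists never equal tuples
```

If a view compared equal to both a list and a tuple with the same elements, the list and the tuple would have to compare equal to each other as well, which they do not.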

__hash__(self)
Raises:

  • ValueError - Corpus view objects are unhashable.
