Package nltk :: Package corpus :: Package reader :: Module util :: Class PickleCorpusView

Class PickleCorpusView


               object --+        
                        |        
util.AbstractLazySequence --+    
                            |    
       StreamBackedCorpusView --+
                                |
                               PickleCorpusView

A stream-backed corpus view for corpus files that consist of sequences of serialized Python objects (serialized using pickle.dump). One use case for this class is storing the result of running feature detection on a corpus to disk. This is useful when feature detection is expensive (so we don't want to repeat it) but the corpus is too large to store in memory. The following example illustrates this technique:

>>> from nltk.util import LazyMap
>>> from nltk.corpus.reader.util import PickleCorpusView
>>> feature_corpus = LazyMap(detect_features, corpus)
>>> PickleCorpusView.write(feature_corpus, some_filename)
>>> pcv = PickleCorpusView(some_filename)
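The file format described above can be sketched with the standard library alone. This is a minimal sketch, not NLTK's implementation: write_pickle_corpus, iter_pickle_corpus and detect_features are hypothetical names, and the built-in map stands in for LazyMap. It assumes, as the description says, that the file is just a sequence of objects written one after another with pickle.dump, so reading stops at the first EOFError.

```python
import os
import pickle
import tempfile

PROTOCOL = -1  # mirrors the class variable: -1 selects the highest pickle protocol


def write_pickle_corpus(sequence, path):
    """Hypothetical helper: pickle each item of `sequence` back to back into one file."""
    with open(path, "wb") as out:
        for item in sequence:
            pickle.dump(item, out, PROTOCOL)


def iter_pickle_corpus(path):
    """Hypothetical helper: yield items one at a time until the file is exhausted."""
    with open(path, "rb") as stream:
        while True:
            try:
                yield pickle.load(stream)
            except EOFError:
                return  # end of file: no more pickled objects


def detect_features(word):
    """Hypothetical stand-in for an expensive feature detector."""
    return {"length": len(word), "first": word[0]}


corpus = ["the", "cat", "sat"]
fd, path = tempfile.mkstemp(suffix=".pcv")
os.close(fd)
# map() is lazy, so features are computed one item at a time while writing.
write_pickle_corpus(map(detect_features, corpus), path)
features = list(iter_pickle_corpus(path))
os.remove(path)
print(features)
```

Because items are read back one at a time, only the current item (or block of items) needs to be in memory, which is the point of the technique.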
Instance Methods

__init__(self, filename, delete_on_gc=False)
Create a new corpus view that reads the pickle corpus filename.

read_block(self, stream)
Read a block from the input stream.
Returns: list of any

__del__(self)
If delete_on_gc was set to true when this PickleCorpusView was created, then delete the corpus view's file.

Inherited from StreamBackedCorpusView: __add__, __getitem__, __len__, __mul__, __radd__, __rmul__, close, iterate_from

Inherited from StreamBackedCorpusView (private): _open

Inherited from util.AbstractLazySequence: __cmp__, __contains__, __hash__, __iter__, __repr__, count, index

Inherited from object: __delattr__, __getattribute__, __new__, __reduce__, __reduce_ex__, __setattr__, __str__

Class Methods

write(cls, sequence, output_file)

cache_to_tempfile(cls, sequence, delete_on_gc=True)
Write the given sequence to a temporary file as a pickle corpus, and then return a PickleCorpusView for that temporary corpus file.
Class Variables
  BLOCK_SIZE = 100
  PROTOCOL = -1

Inherited from util.AbstractLazySequence (private): _MAX_REPR_SIZE
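PROTOCOL = -1 relies on a documented rule of the pickle module: a negative protocol number selects pickle.HIGHEST_PROTOCOL. The short check below, using a made-up sample item, shows that the two spellings produce identical bytes.

```python
import pickle

item = {"tokens": ["a", "b"], "label": "pos"}  # arbitrary sample object

# A negative protocol tells pickle to use HIGHEST_PROTOCOL, so
# PROTOCOL = -1 always picks the newest format the running Python supports.
same = pickle.dumps(item, -1) == pickle.dumps(item, pickle.HIGHEST_PROTOCOL)
print(same, pickle.HIGHEST_PROTOCOL)
```

One consequence worth keeping in mind: a corpus file written with the highest protocol may not be readable by an older Python interpreter that does not support that protocol.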

Properties

Inherited from StreamBackedCorpusView: filename

Inherited from object: __class__

Method Details

__init__(self, filename, delete_on_gc=False)
(Constructor)


Create a new corpus view that reads the pickle corpus filename.

Parameters:
  • delete_on_gc - If true, the file filename is deleted when this object is garbage-collected.
Overrides: StreamBackedCorpusView.__init__

read_block(self, stream)


Read a block from the input stream.

Parameters:
  • stream - an input stream
Returns: list of any
a block of tokens from the input stream
Overrides: StreamBackedCorpusView.read_block
(inherited documentation)
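The documentation above is inherited and generic. One plausible reading for this class, stated here as an assumption, is that a block is up to BLOCK_SIZE unpickled objects, with a shorter (eventually empty) list once the stream runs out. The standalone read_block below is a hypothetical sketch of that behaviour against an in-memory stream, not the NLTK method itself.

```python
import io
import pickle

BLOCK_SIZE = 100  # mirrors the class variable of the same name


def read_block(stream, block_size=BLOCK_SIZE):
    """Hypothetical sketch: unpickle up to `block_size` objects from `stream`.

    Returns a shorter (possibly empty) list once the stream is exhausted.
    """
    block = []
    for _ in range(block_size):
        try:
            block.append(pickle.load(stream))
        except EOFError:
            break  # no more pickled objects in the stream
    return block


# Build an in-memory "pickle corpus" of 250 integers.
buf = io.BytesIO()
for n in range(250):
    pickle.dump(n, buf, -1)
buf.seek(0)

# Keep reading blocks until an empty one signals the end of the stream.
sizes = []
while True:
    block = read_block(buf)
    if not block:
        break
    sizes.append(len(block))
print(sizes)  # → [100, 100, 50]
```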

__del__(self)
(Destructor)


If delete_on_gc was set to true when this PickleCorpusView was created, then delete the corpus view's file. (This method is called whenever a PickleCorpusView is garbage-collected.)
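The delete-on-garbage-collection pattern can be sketched with a small class whose destructor removes its backing file. TempFileOwner and its attribute names are hypothetical; this illustrates the pattern described above rather than reproducing NLTK's code.

```python
import os
import tempfile


class TempFileOwner:
    """Hypothetical stand-in for a view that owns its backing file."""

    def __init__(self, filename, delete_on_gc=False):
        self._filename = filename
        self._delete_on_gc = delete_on_gc

    def __del__(self):
        # Runs when the object is garbage-collected. getattr guards against
        # a partially initialised object that never set _delete_on_gc.
        if getattr(self, "_delete_on_gc", False) and os.path.exists(self._filename):
            try:
                os.remove(self._filename)
            except OSError:
                pass  # never let cleanup errors escape from __del__


fd, path = tempfile.mkstemp()
os.close(fd)
owner = TempFileOwner(path, delete_on_gc=True)
del owner  # on CPython the refcount drops to zero and __del__ runs immediately
print(os.path.exists(path))  # → False on CPython
```

Note that Python only guarantees __del__ runs when the object is actually collected; other implementations may delay it. The standard library's weakref.finalize is an alternative hook that also runs at interpreter exit.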

cache_to_tempfile(cls, sequence, delete_on_gc=True)
Class Method


Write the given sequence to a temporary file as a pickle corpus, and then return a PickleCorpusView for that temporary corpus file.

Parameters:
  • delete_on_gc - If true, the temporary file is deleted when the returned PickleCorpusView is garbage-collected.
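The temporary-file step can be sketched with tempfile.mkstemp. This hypothetical cache_to_tempfile returns the file path instead of a view so that it runs without NLTK; with NLTK installed, one would presumably pass that path to PickleCorpusView(path, delete_on_gc=True). The suffix, prefix and cleanup-on-error behaviour are illustrative assumptions.

```python
import os
import pickle
import tempfile


def cache_to_tempfile(sequence, protocol=-1):
    """Hypothetical sketch: pickle `sequence` into a fresh temp file and return its path."""
    fd, path = tempfile.mkstemp(suffix=".pcv", prefix="corpus-")
    try:
        with os.fdopen(fd, "wb") as out:  # os.fdopen takes ownership of fd
            for item in sequence:
                pickle.dump(item, out, protocol)
    except Exception:
        os.remove(path)  # don't leave a half-written cache behind
        raise
    return path


path = cache_to_tempfile(x * x for x in range(5))

# Read the cache back the same way a pickle corpus view would consume it.
restored = []
with open(path, "rb") as f:
    while True:
        try:
            restored.append(pickle.load(f))
        except EOFError:
            break
os.remove(path)
print(restored)  # → [0, 1, 4, 9, 16]
```

mkstemp creates the file securely and hands back an open descriptor, which avoids the race between choosing a name and opening it.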