Package nltk :: Package corpus :: Package reader :: Module toolbox :: Class StandardFormat

Class StandardFormat

object --+
         |
        StandardFormat
Known Subclasses:

Class for reading and processing standard format marker files and strings.

Instance Methods

__init__(self, filename=None, encoding=None)
x.__init__(...) initializes x; see x.__class__.__doc__ for signature

open(self, sfm_file)
Open a standard format marker file for sequential reading.

open_string(self, s)
Open a standard format marker string for sequential reading.

raw_fields(self)
Return an iterator for the fields in the standard format marker file.
Returns: iterator over (marker, value) tuples

fields(self, strip=True, unwrap=True, encoding=None, errors='strict', unicode_fields=None)
Return an iterator for the fields in the standard format marker file.
Returns: iterator over (marker, value) tuples

close(self)
Close a previously opened standard format marker file or string.

Inherited from object: __delattr__, __getattribute__, __hash__, __new__, __reduce__, __reduce_ex__, __repr__, __setattr__, __str__

Properties

Inherited from object: __class__

Method Details

__init__(self, filename=None, encoding=None)
(Constructor)

x.__init__(...) initializes x; see x.__class__.__doc__ for signature

Overrides: object.__init__
(inherited documentation)

open(self, sfm_file)

Open a standard format marker file for sequential reading.

Parameters:
  • sfm_file (string) - name of the standard format marker input file

open_string(self, s)

Open a standard format marker string for sequential reading.

Parameters:
  • s (string) - string to parse as a standard format marker input file
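The two entry points above differ only in where the text comes from. The following is a minimal standard-library sketch, not NLTK code, of what sequential reading means for a file and for a string holding the same content. The sample lexicon entry, the file name sample.sfm, and the helper read_lines are assumptions made for illustration.

```python
import io
import os
import tempfile

# A small standard format marker sample: each field starts with a
# backslash marker (\lx, \ps, \ge) followed by its value.
SAMPLE = "\\lx kaa\n\\ps N\n\\ge gag\n"


def read_lines(handle):
    """Read a file-like object sequentially, one line at a time (hypothetical helper)."""
    return [line.rstrip("\n") for line in handle]


# Source 1: a string wrapped in a file-like object.
with io.StringIO(SAMPLE) as handle:
    from_string = read_lines(handle)

# Source 2: the same text written to, and read back from, a file on disk.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sample.sfm")
    with open(path, "w", encoding="utf8", newline="\n") as out:
        out.write(SAMPLE)
    with open(path, encoding="utf8") as handle:
        from_file = read_lines(handle)
```

Both sources produce the same sequence of lines, which is why a single field iterator can serve either one.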

raw_fields(self)

Return an iterator for the fields in the standard format marker file.

Returns: iterator over (marker, value) tuples
An iterator that yields each field as a (marker, value) tuple. Line breaks and trailing whitespace are preserved, except for the final newline in each field.
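To make the raw field semantics concrete, here is a minimal sketch of a field splitter written against the description above. It is an illustrative re-implementation under stated assumptions, not the NLTK source: the function name iter_raw_fields and the regular expression are hypothetical, and a marker is assumed to be a backslash followed by non-whitespace characters at the start of a line.

```python
import io
import re

# Assumed line shape: an optional "\marker" prefix, then the rest of the line.
_LINE = re.compile(r"^(?:\\(\S+)\s*)?(.*)$")


def iter_raw_fields(text):
    """Yield (marker, value) tuples from standard format marker text.

    Hypothetical helper. Lines without a marker continue the current
    field and are joined to it with a newline, so internal line breaks
    and trailing whitespace survive; only the final newline is dropped.
    """
    marker, value_lines = None, []
    for line in io.StringIO(text):
        line_marker, line_value = _LINE.match(line.rstrip("\n")).groups()
        if line_marker is not None:
            # A new marker closes the field collected so far.
            if marker is not None or value_lines:
                yield marker, "\n".join(value_lines)
            marker, value_lines = line_marker, [line_value]
        else:
            value_lines.append(line_value)
    if marker is not None or value_lines:
        yield marker, "\n".join(value_lines)


sample = "\\lx kaa\n\\ge gag  \nsecond line\n\\dt 12/Feb/2005\n"
raw = list(iter_raw_fields(sample))
```

In this sketch the \ge field keeps both its trailing spaces and its internal line break, while the newline that ends each field is not part of the value.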

fields(self, strip=True, unwrap=True, encoding=None, errors='strict', unicode_fields=None)

Return an iterator for the fields in the standard format marker file.

Parameters:
  • strip (boolean) - Strip trailing whitespace from the last line of each field.
  • unwrap (boolean) - Convert newlines in a field to spaces.
  • encoding (string or None) - Name of an encoding to use. If specified, the fields method returns unicode strings rather than non-unicode strings.
  • errors (string) - Error handling scheme for the codec, with the same meaning as the errors argument of the built-in string decode method.
  • unicode_fields (set, dictionary, or any other container that supports the 'in' operator) - Marker names whose values are UTF-8 encoded. Ignored if encoding is None. If the whole file is UTF-8 encoded, set encoding='utf8' and leave unicode_fields at its default value of None.

Returns: iterator over (marker, value) tuples
An iterator that yields each field as a (marker, value) tuple. marker and value are unicode strings if an encoding was specified in the call to fields; otherwise they are non-unicode strings.
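The options described for fields can be pictured as two small post-processing steps applied to each raw field. The sketch below is a hedged illustration of that reading, not the NLTK implementation: tidy_value and decode_field are hypothetical helpers, non-unicode strings are modelled as Python 3 bytes, and each newline is assumed to map to exactly one space.

```python
def tidy_value(value, strip=True, unwrap=True):
    """Apply the unwrap and strip options to one raw field value (hypothetical helper)."""
    if unwrap:
        value = value.replace("\n", " ")  # assumed: one space per newline
    if strip:
        value = value.rstrip()  # trailing whitespace of the last line only
    return value


def decode_field(marker, value, encoding=None, errors="strict", unicode_fields=None):
    """Decode one (marker, value) pair of bytes as the parameters describe (hypothetical helper)."""
    if encoding is None:
        # No encoding given: unicode_fields is ignored, bytes are returned unchanged.
        return marker, value
    name = marker.decode(encoding, errors)
    # Markers listed in unicode_fields hold UTF-8 values; all others use `encoding`.
    use_utf8 = unicode_fields is not None and name in unicode_fields
    return name, value.decode("utf8" if use_utf8 else encoding, errors)


wrapped = tidy_value("gag  \nsecond line  ")
kept = tidy_value("gag  \nsecond line  ", unwrap=False)
latin = decode_field(b"lx", "caf\u00e9".encode("latin-1"), encoding="latin-1")
mixed = decode_field(b"nt", "caf\u00e9".encode("utf8"),
                     encoding="latin-1", unicode_fields={"nt"})
```

Passing a set for unicode_fields keeps the membership test cheap, but anything that supports the 'in' operator works in this sketch.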