Package nltk :: Package corpus :: Package reader :: Module timit :: Class TimitCorpusReader

Class TimitCorpusReader

      object --+    
               |    
api.CorpusReader --+
                   |
                  TimitCorpusReader

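Reader for the TIMIT corpus (or any other corpus with the same file layout and use of file formats). The corpus root directory should contain the following files:

  • timitdic.txt - dictionary of standard transcriptions
  • spkrinfo.txt - table of speaker information

In addition, the root directory should contain one subdirectory for each speaker, containing the files for each utterance, with the extensions .txt, .wrd, .phn, and .wav (the file types matched by _FILE_RE).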

Instance Methods

__init__(self, root, encoding=None)
Construct a new TIMIT corpus reader in the given directory.

files(self, filetype=None)
Return a list of file identifiers for the files that make up this corpus.

utterances(self, dialect=None, sex=None, spkrid=None, sent_type=None, sentid=None)
Returns: A list of the utterance identifiers for all utterances in this corpus, or for the given speaker, dialect region, gender, sentence type, or sentence number, if specified.

transcription_dict(self)
Returns: A dictionary giving the 'standard' transcription for each word.

spkrid(self, utterance)

sentid(self, utterance)

utterance(self, spkrid, sentid)

spkrutterances(self, speaker)
Returns: A list of all utterances associated with a given speaker.

spkrinfo(self, speaker)
Returns: A dictionary mapping ..

phones(self, utterances=None)

phone_times(self, utterances=None)
Offsets are expressed as a number of samples at 16 kHz.

words(self, utterances=None)

word_times(self, utterances=None)

sents(self, utterances=None)

sent_times(self, utterances=None)

phone_trees(self, utterances=None)

wav(self, utterance, start=0, end=None)

audiodata(self, utterance, start=0, end=None)

_utterance_files(self, utterances, extension)

play(self, utterance, start=0, end=None)
Play the given audio sample.

Inherited from api.CorpusReader: __repr__, abspath, abspaths, encoding, open

Inherited from api.CorpusReader (private): _get_root

Inherited from object: __delattr__, __getattribute__, __hash__, __new__, __reduce__, __reduce_ex__, __setattr__, __str__
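
The note on phone_times says that offsets are counted in samples at 16 kHz rather than in seconds. A minimal sketch of the conversion follows. The (label, start, end) tuple layout, the helper names, and the sample values are assumptions made for illustration; they are not taken from the reader's output.

```python
SAMPLE_RATE_HZ = 16000  # from the phone_times note: offsets are 16 kHz samples


def samples_to_seconds(offset, rate=SAMPLE_RATE_HZ):
    """Convert a single sample offset to seconds."""
    return offset / rate


def times_in_seconds(timed_items, rate=SAMPLE_RATE_HZ):
    """Convert the start and end offsets of each item to seconds.

    Assumes each item is a (label, start, end) tuple; that layout is an
    assumption of this sketch, not a documented return type.
    """
    return [(label, start / rate, end / rate) for label, start, end in timed_items]


# Made-up offsets, for illustration only.
example = [("h#", 0, 8000), ("sh", 8000, 12000)]
converted = times_in_seconds(example)
```

The same arithmetic would apply to word-level or sentence-level offsets if they use the same sample-based convention.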

    Deprecated since 0.9.1
 
spkritems(*args, **kwargs)

Inherited from api.CorpusReader: filenames

Inherited from api.CorpusReader (private): _get_items

    Deprecated since 0.8
 
tokenized(*args, **kwargs)

phonetic(*args, **kwargs)
Class Variables
  _FILE_RE = '(\\w+-\\w+/\\w+\\.(phn|txt|wav|wrd))|timitdic\\.tx...
A regexp matching the files that are used by this corpus reader.
  _UTTERANCE_RE = '\\w+-\\w+/\\w+\\.txt'
Instance Variables
  _utterances
A list of the utterance identifiers for all utterances in this corpus.

Inherited from api.CorpusReader (private): _encoding, _files, _root

Properties

Inherited from api.CorpusReader: root

Inherited from object: __class__

    Deprecated since 0.9.1

Inherited from api.CorpusReader: items

Method Details

__init__(self, root, encoding=None)
(Constructor)

Construct a new TIMIT corpus reader in the given directory.

Parameters:
  • root - The root directory for this corpus.
Overrides: api.CorpusReader.__init__

files(self, filetype=None)

Return a list of file identifiers for the files that make up this corpus.

Parameters:
  • filetype - If specified, only the files of the given type are returned. Accepted values are txt, wrd, phn, wav, and metadata.
Overrides: api.CorpusReader.files
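
To illustrate how the filetype parameter narrows the result, here is a minimal standalone sketch that filters a list of file identifiers. The helper filter_file_ids is hypothetical and not part of the reader. Treating "metadata" as the two root-level files named in _FILE_RE is an assumption, and the example identifiers are illustrative.

```python
METADATA_FILES = ("timitdic.txt", "spkrinfo.txt")  # root-level names from _FILE_RE


def filter_file_ids(file_ids, filetype=None):
    """Return the identifiers in file_ids that match filetype.

    Hypothetical helper for illustration only.
    """
    if filetype is None:
        return list(file_ids)
    if filetype in ("txt", "wrd", "phn", "wav"):
        # Require a "/" so that root-level .txt files are not mistaken
        # for per-utterance text files.
        return [f for f in file_ids if "/" in f and f.endswith("." + filetype)]
    if filetype == "metadata":
        # Assumption: "metadata" means the root-level dictionary and speaker table.
        return [f for f in file_ids if f in METADATA_FILES]
    raise ValueError("Bad value for filetype: %r" % (filetype,))


# Illustrative identifiers following the speaker/utterance layout.
ids = [
    "timitdic.txt",
    "spkrinfo.txt",
    "dr1-fvmh0/sa1.txt",
    "dr1-fvmh0/sa1.wrd",
    "dr1-fvmh0/sa1.phn",
    "dr1-fvmh0/sa1.wav",
]
text_files = filter_file_ids(ids, "txt")
metadata_files = filter_file_ids(ids, "metadata")
```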

utterances(self, dialect=None, sex=None, spkrid=None, sent_type=None, sentid=None)

Returns:
A list of the utterance identifiers for all utterances in this corpus, or for the given speaker, dialect region, gender, sentence type, or sentence number, if specified.
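
The filtering described here, along with the undocumented spkrid, sentid, and utterance helpers, can be pictured with a small standalone sketch. It assumes that an utterance identifier has the form "<speaker directory>/<sentence id>" (consistent with _UTTERANCE_RE) and that a speaker directory looks like "dr1-fvmh0", where the digit is the dialect region and the first letter after the hyphen is the speaker's sex. Both the layout and the function names are assumptions for illustration; the sent_type criterion is left out because its encoding is not described in this page.

```python
def split_utterance(utterance):
    """Split "speaker/sentence" into (spkrid, sentid)."""
    spkrid, sentid = utterance.split("/")
    return spkrid, sentid


def join_utterance(spkrid, sentid):
    """Build an utterance identifier from its two parts."""
    return "%s/%s" % (spkrid, sentid)


def _as_list(value):
    """Wrap a single string in a list so membership tests are exact."""
    return [value] if isinstance(value, str) else value


def select_utterances(utterances, dialect=None, sex=None, spkrid=None, sentid=None):
    """Keep the utterances that satisfy every criterion that is not None."""
    dialect, sex = _as_list(dialect), _as_list(sex)
    spkrid, sentid = _as_list(spkrid), _as_list(sentid)
    selected = []
    for u in utterances:
        spk, sent = split_utterance(u)
        region, speaker = spk.split("-")
        # Assumed layout: region is "dr" plus a digit, speaker starts with f or m.
        if dialect is not None and region[2:] not in dialect:
            continue
        if sex is not None and speaker[0] not in sex:
            continue
        if spkrid is not None and spk not in spkrid:
            continue
        if sentid is not None and sent not in sentid:
            continue
        selected.append(u)
    return selected


# Illustrative identifiers.
ids = ["dr1-fvmh0/sa1", "dr1-fvmh0/sx206", "dr1-mcpm0/sa1", "dr2-faem0/si762"]
female_dr1 = select_utterances(ids, dialect="1", sex="f")
```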

transcription_dict(self)

Returns:
A dictionary giving the 'standard' transcription for each word.
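
A rough sketch of how such a dictionary could be built from the lines of timitdic.txt. It assumes that each entry has the form `word  /phone phone .../` and that lines starting with ";" are comments; this line format, the helper name parse_transcriptions, and the sample entries are assumptions for illustration, not details confirmed by this page.

```python
import re

# Assumed entry format: a word, whitespace, then phones between slashes.
_ENTRY_RE = re.compile(r"\s*(\S+)\s+/(.*)/\s*$")


def parse_transcriptions(lines):
    """Map each word to its list of phones.

    Blank lines and lines starting with ";" are skipped.
    """
    transcriptions = {}
    for line in lines:
        if not line.strip() or line.startswith(";"):
            continue
        match = _ENTRY_RE.match(line)
        if match is None:
            raise ValueError("Bad line: %r" % (line,))
        transcriptions[match.group(1)] = match.group(2).split()
    return transcriptions


# Made-up entries in the assumed format.
sample = [
    "; comment line\n",
    "she  /sh iy1/\n",
    "had  /hh ae1 d/\n",
]
result = parse_transcriptions(sample)
```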

spkrutterances(self, speaker)

Returns:
A list of all utterances associated with a given speaker.

spkrinfo(self, speaker)

Returns:
A dictionary mapping .. something.

play(self, utterance, start=0, end=None)

Play the given audio sample.

Parameters:
  • utterance - The utterance id of the sample to play

spkritems(*args, **kwargs)

Decorators:
  • @deprecated("Use utterances(spkrid=...) instead.")

Deprecated: Use utterances(spkrid=...) instead.

tokenized(*args, **kwargs)

Decorators:
  • @deprecated("Use .sents() or .sent_times() instead.")

Deprecated: Use .sents() or .sent_times() instead.

phonetic(*args, **kwargs)

Decorators:
  • @deprecated("Use .phones() or .phone_times() instead.")

Deprecated: Use .phones() or .phone_times() instead.


Class Variable Details

_FILE_RE

A regexp matching the files that are used by this corpus reader.

Value:
'(\\w+-\\w+/\\w+\\.(phn|txt|wav|wrd))|timitdic\\.txt|spkrinfo\\.txt'
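
To see which paths this pattern accepts, the value can be tried directly with Python's re module. The sketch below copies the _FILE_RE and _UTTERANCE_RE values from this page. Matching against the whole path with re.fullmatch, and deriving an utterance identifier by dropping the ".txt" suffix, are assumptions about how the patterns are applied; the helper names and example paths are illustrative.

```python
import re

# Values copied from the _FILE_RE and _UTTERANCE_RE class variables.
FILE_RE = '(\\w+-\\w+/\\w+\\.(phn|txt|wav|wrd))|timitdic\\.txt|spkrinfo\\.txt'
UTTERANCE_RE = '\\w+-\\w+/\\w+\\.txt'


def is_corpus_file(path):
    """True if the whole path matches FILE_RE (whole-path matching is assumed)."""
    return re.fullmatch(FILE_RE, path) is not None


def utterance_ids(paths):
    """Derive utterance ids from per-utterance .txt paths (assumed convention)."""
    return [p[:-len(".txt")] for p in paths if re.fullmatch(UTTERANCE_RE, p)]


# Illustrative paths.
paths = [
    "timitdic.txt",
    "spkrinfo.txt",
    "dr1-fvmh0/sa1.txt",
    "dr1-fvmh0/sa1.phn",
    "README.doc",
]
accepted = [p for p in paths if is_corpus_file(p)]
ids = utterance_ids(paths)
```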