Package nltk :: Package corpus :: Package reader :: Module timit :: Class TimitCorpusReader

Class TimitCorpusReader

      object --+    
               |    
api.CorpusReader --+
                   |
                  TimitCorpusReader

Reader for the TIMIT corpus (or any other corpus with the same file layout and use of file formats). The corpus root directory should contain the following files:

timitdic.txt: dictionary of standard transcriptions
spkrinfo.txt: table of speaker information

In addition, the root directory should contain one subdirectory for each speaker, containing four files for each utterance:

<utterance-id>.txt: text content of utterances
<utterance-id>.wrd: tokenized text content of utterances
<utterance-id>.phn: phonetic transcription of utterances
<utterance-id>.wav: utterance sound file
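
The layout above can be checked mechanically. The following is a minimal sketch, not part of NLTK: `expected_utterance_files` is a hypothetical helper, and the id `dr1-fvmh0/sa1` is an assumed example of the `<speaker-directory>/<utterance-id>` form.

```python
from pathlib import PurePosixPath

# The four per-utterance extensions listed above.
UTTERANCE_EXTENSIONS = ("txt", "wrd", "phn", "wav")


def expected_utterance_files(root, utterance):
    """Map each extension to the path where that utterance file should live."""
    base = PurePosixPath(root) / utterance
    return {ext: str(base) + "." + ext for ext in UTTERANCE_EXTENSIONS}


# Hypothetical corpus root and utterance id, for illustration only.
paths = expected_utterance_files("timit", "dr1-fvmh0/sa1")
print(paths["phn"])
```

Comparing the returned paths against the files actually on disk is one way to spot an incomplete speaker directory before constructing the reader.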

Instance Methods

__init__(self, root, encoding=None)
Construct a new TIMIT corpus reader in the given directory.

files(self, filetype=None)
Return a list of file identifiers for the files that make up this corpus.

utterances(self, dialect=None, sex=None, spkrid=None, sent_type=None, sentid=None)
Returns: A list of the utterance identifiers for all utterances in this corpus, or for the given speaker, dialect region, gender, sentence type, or sentence number, if specified.

transcription_dict(self)
Returns: A dictionary giving the 'standard' transcription for each word.

spkrid(self, utterance)

sentid(self, utterance)

utterance(self, spkrid, sentid)

spkrutterances(self, speaker)
Returns: A list of all utterances associated with a given speaker.

spkrinfo(self, speaker)
Returns: A dictionary mapping ..

phones(self, utterances=None)

phone_times(self, utterances=None)
Offsets are expressed as a number of samples at a 16 kHz sampling rate.
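
Because offsets count samples rather than seconds, they usually need converting before they can be compared with durations. The following is a minimal sketch; the `(phone, start, end)` tuple shape and the sample values are assumptions for illustration, not taken from this page.

```python
SAMPLE_RATE_HZ = 16000  # offsets are counted in 16 kHz samples


def samples_to_seconds(offset, rate=SAMPLE_RATE_HZ):
    """Convert a sample offset to seconds."""
    return offset / rate


# Hypothetical phone_times()-style data: (phone, start_sample, end_sample).
timed_phones = [("h#", 0, 8000), ("sh", 8000, 12000)]

in_seconds = [
    (phone, samples_to_seconds(start), samples_to_seconds(end))
    for phone, start, end in timed_phones
]
print(in_seconds)
```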

words(self, utterances=None)

word_times(self, utterances=None)

sents(self, utterances=None)

sent_times(self, utterances=None)

phone_trees(self, utterances=None)

wav(self, utterance, start=0, end=None)

audiodata(self, utterance, start=0, end=None)

_utterance_files(self, utterances, extension)

play(self, utterance, start=0, end=None)
Play the given audio sample.

Inherited from api.CorpusReader: __repr__, abspath, abspaths, encoding, open

Inherited from api.CorpusReader (private): _get_root

Inherited from object: __delattr__, __getattribute__, __hash__, __new__, __reduce__, __reduce_ex__, __setattr__, __str__

Deprecated since 0.9.1

spkritems(*args, **kwargs)

Inherited from api.CorpusReader: filenames

Inherited from api.CorpusReader (private): _get_items

Deprecated since 0.8

tokenized(*args, **kwargs)

phonetic(*args, **kwargs)

Class Variables

_FILE_RE = '(\\w+-\\w+/\\w+\\.(phn|txt|wav|wrd))|timitdic\\.tx...
A regexp matching files that are used by this corpus reader.

_UTTERANCE_RE = '\\w+-\\w+/\\w+\\.txt'

Instance Variables

_utterances
A list of the utterance identifiers for all utterances in this corpus.

Inherited from api.CorpusReader (private): _encoding, _files, _root

Properties

Inherited from api.CorpusReader: root

Inherited from object: __class__

Deprecated since 0.9.1

Inherited from api.CorpusReader: items

Method Details

__init__(self, root, encoding=None)
(Constructor)

Construct a new TIMIT corpus reader in the given directory.

Parameters:

root - The root directory for this corpus.

Overrides: api.CorpusReader.__init__

files(self, filetype=None)

Return a list of file identifiers for the files that make up this corpus.

Parameters:

filetype - If specified, only the files of the given type are returned. Accepted values are: txt, wrd, phn, wav, or metadata.

Overrides: api.CorpusReader.files

utterances(self, dialect=None, sex=None, spkrid=None, sent_type=None, sentid=None)

Returns: A list of the utterance identifiers for all utterances in this corpus, or for the given speaker, dialect region, gender, sentence type, or sentence number, if specified.
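
As an illustration of how such filtering could work on identifiers alone, here is a minimal sketch. It assumes utterance ids of the form `<dialect>-<sex><speaker>/<sentence>` (for example `dr1-fvmh0/sa1`), which is consistent with `_UTTERANCE_RE` but is not spelled out on this page. `filter_utterances` is a hypothetical helper, not the reader's own implementation, and it covers only three of the five criteria.

```python
def filter_utterances(utterances, dialect=None, sex=None, spkrid=None):
    """Keep only the ids that match every criterion that is not None."""
    selected = []
    for utt in utterances:
        # Assumed id layout: "<dialect>-<sex><speaker>/<sentence>".
        speaker_dir, _sentence = utt.split("/")
        region, speaker = speaker_dir.split("-")
        if dialect is not None and region != dialect:
            continue
        if sex is not None and speaker[0] != sex:
            continue
        if spkrid is not None and speaker_dir != spkrid:
            continue
        selected.append(utt)
    return selected


# Made-up ids in the assumed format.
ids = ["dr1-fvmh0/sa1", "dr1-fvmh0/sx206", "dr1-mcpm0/sa1", "dr2-faem0/si762"]
print(filter_utterances(ids, dialect="dr1", sex="f"))
```

Passing no criteria returns every id unchanged, mirroring the "all utterances in this corpus" case described above.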

transcription_dict(self)

Returns: A dictionary giving the 'standard' transcription for each word.

spkrutterances(self, speaker)

Returns: A list of all utterances associated with a given speaker.

spkrinfo(self, speaker)

Returns: A dictionary mapping .. something.

play(self, utterance, start=0, end=None)

Play the given audio sample.

Parameters:

utterance - The utterance id of the sample to play

spkritems(*args, **kwargs)

Decorators:

@deprecated("Use utterances(spkrid=...) instead.")

Deprecated: Use utterances(spkrid=...) instead.

tokenized(*args, **kwargs)

Decorators:

@deprecated("Use .sents() or .sent_times() instead.")

Deprecated: Use .sents() or .sent_times() instead.

phonetic(*args, **kwargs)

Decorators:

@deprecated("Use .phones() or .phone_times() instead.")

Deprecated: Use .phones() or .phone_times() instead.

Class Variable Details

_FILE_RE

A regexp matching files that are used by this corpus reader.

Value:

'(\\w+-\\w+/\\w+\\.(phn|txt|wav|wrd))|timitdic\\.txt|spkrinfo\\.txt'
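
To see what the pattern accepts, it can be tried against a few relative paths. This is a minimal sketch using the standard `re` module. The sample paths are made up for illustration, and using `re.fullmatch` is an assumption about how the pattern is applied, not a statement of what the reader does internally.

```python
import re

# The value shown above, written as a raw string.
FILE_RE = r"(\w+-\w+/\w+\.(phn|txt|wav|wrd))|timitdic\.txt|spkrinfo\.txt"


def is_corpus_file(relative_path):
    """Return True if the whole relative path matches the pattern."""
    return re.fullmatch(FILE_RE, relative_path) is not None


# Hypothetical paths relative to the corpus root.
candidates = [
    "dr1-fvmh0/sa1.phn",
    "timitdic.txt",
    "spkrinfo.txt",
    "dr1-fvmh0/notes.doc",
]
for path in candidates:
    print(path, is_corpus_file(path))
```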
