NLTK corpus readers. The modules in this package provide functions
that can be used to read corpus files in a variety of formats. These
functions can be used to read both the corpus files that are distributed
in the NLTK corpus package, and corpus files that are part of external
corpora.
Additionally, corpus reader functions can be given lists of item
names, in which case they return a concatenation of the
corresponding documents.
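The concatenation behaviour described above can be sketched with a small stand-in reader. This is a minimal illustration under stated assumptions, not NLTK's implementation: the class `TinyCorpusReader`, its `items()` helper, the in-memory documents, and the regular-expression tokenizer are all hypothetical and chosen only to keep the example self-contained.

```python
import re


class TinyCorpusReader:
    """Hypothetical stand-in for a corpus reader; not an NLTK class."""

    def __init__(self, documents):
        # documents: dict mapping an item name to that item's raw text
        self._documents = dict(documents)

    def items(self):
        """Return all item names in sorted order."""
        return sorted(self._documents)

    def _select(self, items=None):
        # None -> every item; a single name -> that item;
        # a list of names -> those items, in the order given.
        if items is None:
            items = self.items()
        elif isinstance(items, str):
            items = [items]
        return [self._documents[name] for name in items]

    def raw(self, items=None):
        """Return the unprocessed text of the selected items, concatenated."""
        return "".join(self._select(items))

    def words(self, items=None):
        """Return the words of the selected items as one flat list of str."""
        words = []
        for text in self._select(items):
            # Assumed tokenization: runs of word characters or of punctuation.
            words.extend(re.findall(r"\w+|[^\w\s]+", text))
        return words


reader = TinyCorpusReader({
    "a.txt": "Hello world.\n",
    "b.txt": "Corpus readers return lists.\n",
})

print(reader.words("a.txt"))             # words of a single item
print(reader.words(["a.txt", "b.txt"]))  # concatenation over a list of items
```

In this sketch, passing a list of item names yields the same result as reading each item separately and joining the results in order, which is the behaviour the paragraph above describes.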
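Corpus reader functions are named based on the type of information
they return. Some common examples, and their return types, are:
- corpus.words(): list of str
- corpus.sents(): list of (list of str)
- corpus.paras(): list of (list of (list of str))
- corpus.tagged_words(): list of (str, str) tuples
- corpus.tagged_sents(): list of (list of (str, str))
- corpus.parsed_sents(): list of (Tree with str leaves)
- corpus.raw(): unprocessed corpus contents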
Metadata about the NLTK corpora, and their individual documents, is
stored using Open Language Archives Community (OLAC) metadata
records. These records can be accessed using
nltk.corpus.corpus.olac().
abc = <PlaintextCorpusReader in '/usr/share/nltk_data/corpora/...
alpino = <AlpinoCorpusReader in '/usr/share/nltk_data/corpora/...
brown = <CategorizedTaggedCorpusReader in '/usr/share/nltk_dat...
cess_cat = <BracketParseCorpusReader in '/usr/share/nltk_data/...
cess_esp = <BracketParseCorpusReader in '/usr/share/nltk_data/...
cmudict = <CMUDictCorpusReader in '/usr/share/nltk_data/corpor...
conll2000 = <ConllChunkCorpusReader in '/usr/share/nltk_data/c...
conll2002 = <ConllChunkCorpusReader in '/usr/share/nltk_data/c...
floresta = <BracketParseCorpusReader in '/usr/share/nltk_data/...
genesis = <PlaintextCorpusReader in '/usr/share/nltk_data/corp...
gutenberg = <PlaintextCorpusReader in '/usr/share/nltk_data/co...
hebrew_treebank = LazyCorpusLoader('hebrew_treebank', BracketP...
ieer = <IEERCorpusReader in '/usr/share/nltk_data/corpora/ieer'>
inaugural = <PlaintextCorpusReader in '/usr/share/nltk_data/co...
indian = <IndianCorpusReader in '/usr/share/nltk_data/corpora/...
mac_morpho = <MacMorphoCorpusReader in '/usr/share/nltk_data/c...
movie_reviews = <CategorizedPlaintextCorpusReader in '/usr/sha...
names = <WordListCorpusReader in '/usr/share/nltk_data/corpora...
nps_chat = <NPSChatCorpusReader in '/usr/share/nltk_data/corpo...
ppattach = <PPAttachmentCorpusReader in '/usr/share/nltk_data/...
propbank = <PropbankCorpusReader in '/usr/share/nltk_data/corp...
qc = <StringCategoryCorpusReader in '/usr/share/nltk_data/corp...
reuters = <CategorizedPlaintextCorpusReader in '/usr/share/nlt...
rte = <RTECorpusReader in '/usr/share/nltk_data/corpora/rte'>
senseval = <SensevalCorpusReader in '/usr/share/nltk_data/corp...
shakespeare = <XMLCorpusReader in '/usr/share/nltk_data/corpor...
sinica_treebank = <SinicaTreebankCorpusReader in '/usr/share/n...
state_union = <PlaintextCorpusReader in '/usr/share/nltk_data/...
stopwords = <WordListCorpusReader in '/usr/share/nltk_data/cor...
timit = <TimitCorpusReader in '/usr/share/nltk_data/corpora/ti...
toolbox = <ToolboxCorpusReader in '/usr/share/nltk_data/corpor...
treebank = <BracketParseCorpusReader in '/usr/share/nltk_data/...
treebank_chunk = <ChunkedCorpusReader in '/usr/share/nltk_data...
treebank_raw = <PlaintextCorpusReader in '/usr/share/nltk_data...
udhr = <PlaintextCorpusReader in '/usr/share/nltk_data/corpora...
verbnet = <VerbnetCorpusReader in '/usr/share/nltk_data/corpor...
webtext = <PlaintextCorpusReader in '/usr/share/nltk_data/corp...
words = <WordListCorpusReader in '/usr/share/nltk_data/corpora...
ycoe = LazyCorpusLoader('ycoe', YCOECorpusReader)