Corpus reader for the XML version of the British National Corpus. For
access to the complete XML data structure, use the xml() method. For access to simple word lists and
tagged word lists, use words(), sents(), tagged_words(), and tagged_sents().

words(self, files=None, strip_space=True, stem=False)
Return type: list of str
Returns: the given file or files as a list of words and punctuation symbols.
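
The effect of strip_space and stem on the returned list can be sketched without the corpus installed. The following is a minimal stand-in, not the reader's actual implementation: the XML fragment and the function name sketch_words are hypothetical, and it assumes that BNC word elements look like <w c5=... hw=... pos=...> with trailing whitespace kept in the element text, that strip_space trims that whitespace, and that stem substitutes the hw (headword) attribute.

```python
import xml.etree.ElementTree as ET

# Hypothetical BNC-style fragment: <w> holds a word, <c> holds punctuation.
# The attribute names (c5, hw, pos) are assumed from the BNC XML edition.
SAMPLE = (
    '<s n="1">'
    '<w c5="AT0" hw="the" pos="ART">The </w>'
    '<w c5="NN2" hw="cat" pos="SUBST">cats </w>'
    '<w c5="VVD" hw="sleep" pos="VERB">slept</w>'
    '<c c5="PUN">.</c>'
    '</s>'
)


def sketch_words(xml_text, strip_space=True, stem=False):
    """Illustrative stand-in for words(); not the library's own code."""
    tokens = []
    for elt in ET.fromstring(xml_text).iter():
        if elt.tag not in ("w", "c"):
            continue  # skip the sentence wrapper and any other markup
        text = elt.text or ""
        if strip_space or stem:
            text = text.strip()
        if stem:
            # Fall back to the surface form when no headword is recorded,
            # as is the case for the punctuation element here.
            text = elt.get("hw", text)
        tokens.append(text)
    return tokens


print(sketch_words(SAMPLE))                     # ['The', 'cats', 'slept', '.']
print(sketch_words(SAMPLE, strip_space=False))  # ['The ', 'cats ', 'slept', '.']
print(sketch_words(SAMPLE, stem=True))          # ['the', 'cat', 'sleep', '.']
```

With the real reader, the equivalent call would be words() on a reader instance, passing the same keyword arguments shown in the signature above.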

tagged_words(self, files=None, c5=False, strip_space=True, stem=False)
Return type: list of (str,str)
Returns: the given file or files as a list of tagged words and punctuation symbols, encoded as tuples (word,tag).
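
The c5 flag selects which tag ends up in each (word,tag) tuple. The sketch below illustrates one plausible reading: with c5=True the detailed C5 tag is taken from the c5 attribute, otherwise the simplified tag is taken from pos. The fragment, the function name sketch_tagged_words, and the fallback to c5 for punctuation elements that carry no pos attribute are all assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical BNC-style fragment; attribute names are assumed.
SAMPLE = (
    '<s n="1">'
    '<w c5="AT0" hw="the" pos="ART">The </w>'
    '<w c5="NN2" hw="cat" pos="SUBST">cats </w>'
    '<w c5="VVD" hw="sleep" pos="VERB">slept</w>'
    '<c c5="PUN">.</c>'
    '</s>'
)


def sketch_tagged_words(xml_text, c5=False):
    """Illustrative stand-in for tagged_words(); not the library's own code."""
    pairs = []
    for elt in ET.fromstring(xml_text).iter():
        if elt.tag not in ("w", "c"):
            continue
        word = (elt.text or "").strip()
        if c5:
            tag = elt.get("c5")  # detailed C5 tag
        else:
            # Simplified tag; punctuation in this sample has no pos
            # attribute, so fall back to its c5 value (an assumption).
            tag = elt.get("pos", elt.get("c5"))
        pairs.append((word, tag))
    return pairs


print(sketch_tagged_words(SAMPLE))
# [('The', 'ART'), ('cats', 'SUBST'), ('slept', 'VERB'), ('.', 'PUN')]
print(sketch_tagged_words(SAMPLE, c5=True))
# [('The', 'AT0'), ('cats', 'NN2'), ('slept', 'VVD'), ('.', 'PUN')]
```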

sents(self, files=None, strip_space=True, stem=False)
Return type: list of (list of str)
Returns: the given file or files as a list of sentences or utterances, each encoded as a list of word strings.

tagged_sents(self, files=None, c5=False, strip_space=True, stem=False)
Return type: list of (list of (str,str))
Returns: the given file or files as a list of sentences, each encoded as a list of (word,tag) tuples.
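
The sentence-level views differ from the word-level ones only in nesting: tokens are grouped per sentence element instead of being returned as one flat list. A rough sketch of that grouping follows, assuming sentences are marked with <s> elements; the fragment and the function name sketch_sents are hypothetical, and the code is not taken from the reader itself.

```python
import xml.etree.ElementTree as ET

# Hypothetical two-sentence fragment; element and attribute names are assumed.
SAMPLE = (
    '<p>'
    '<s n="1">'
    '<w c5="PNP" hw="it" pos="PRON">It </w>'
    '<w c5="VVZ" hw="work" pos="VERB">works</w>'
    '<c c5="PUN">.</c>'
    '</s>'
    '<s n="2">'
    '<w c5="AV0" hw="really" pos="ADV">Really</w>'
    '<c c5="PUN">?</c>'
    '</s>'
    '</p>'
)


def sketch_sents(xml_text, tagged=False):
    """Group tokens by <s> element; tagged=True yields (word, tag) tuples."""
    sents = []
    for s in ET.fromstring(xml_text).iter("s"):
        sent = []
        for elt in s.iter():
            if elt.tag not in ("w", "c"):
                continue
            word = (elt.text or "").strip()
            if tagged:
                sent.append((word, elt.get("pos", elt.get("c5"))))
            else:
                sent.append(word)
        sents.append(sent)
    return sents


print(sketch_sents(SAMPLE))
# [['It', 'works', '.'], ['Really', '?']]
print(sketch_sents(SAMPLE, tagged=True))
# [[('It', 'PRON'), ('works', 'VERB'), ('.', 'PUN')], [('Really', 'ADV'), ('?', 'PUN')]]
```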

_words(self, filename, bracket_sent, tag, strip_space, stem)
Helper used to implement the view methods -- returns a list of words or a list of sentences, optionally tagged.

Inherited from xmldocs.XMLCorpusReader: raw, xml
Inherited from api.CorpusReader: __repr__, abspath, abspaths, encoding, files, open
Inherited from object: __delattr__, __getattribute__, __hash__, __new__, __reduce__, __reduce_ex__, __setattr__, __str__

Inherited from xmldocs.XMLCorpusReader: read
Inherited from api.CorpusReader: filenames