nltk.test.unit package

Submodules

nltk.test.unit.test_2x_compat module

Unit tests for nltk.compat. See also nltk/test/compat.doctest.

class nltk.test.unit.test_2x_compat.TestFraction(methodName='runTest')[source]

Bases: unittest.case.TestCase

test_unnoramlize_fraction()[source]
class nltk.test.unit.test_2x_compat.TestTextTransliteration(methodName='runTest')[source]

Bases: unittest.case.TestCase

test_repr()[source]
test_str()[source]
txt = <Text: São Tomé and Príncipe...>
nltk.test.unit.test_2x_compat.setup_module(module)[source]

nltk.test.unit.test_aline module

Unit tests for nltk.metrics.aline

class nltk.test.unit.test_aline.TestAline(methodName='runTest')[source]

Bases: unittest.case.TestCase

Test Aline algorithm for aligning phonetic sequences

test_aline()[source]
test_aline_delta()[source]

Test aline for computing the difference between two segments

nltk.test.unit.test_chunk module

class nltk.test.unit.test_chunk.TestChunkRule(methodName='runTest')[source]

Bases: unittest.case.TestCase

test_tag_pattern2re_pattern_quantifier()[source]

Test for bug https://github.com/nltk/nltk/issues/1597

Ensures that curly bracket quantifiers can be used inside a chunk rule. This type of quantifier has been used for the supplementary example in http://www.nltk.org/book/ch07.html#exploring-text-corpora.

nltk.test.unit.test_classify module

Unit tests for nltk.classify. See also: nltk/test/classify.doctest

nltk.test.unit.test_classify.assert_classifier_correct(algorithm)[source]
nltk.test.unit.test_classify.test_megam()[source]
nltk.test.unit.test_classify.test_tadm()[source]

nltk.test.unit.test_collocations module

class nltk.test.unit.test_collocations.TestBigram(methodName='runTest')[source]

Bases: unittest.case.TestCase

test_bigram2()[source]
test_bigram3()[source]
test_bigram5()[source]
nltk.test.unit.test_collocations.close_enough(x, y)[source]

Verify that two sequences of n-gram association values are within _EPSILON of each other.
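As a rough illustration of the tolerance check described above, the sketch below compares two sequences of scores element by element. The value of `_EPSILON` and the use of plain floats are assumptions made for the example. The real helper may compare richer items, such as pairs of an n-gram and its score.

```python
# Assumed tolerance; the value used by the actual test module is not stated here.
_EPSILON = 1e-8


def close_enough(x, y):
    """Return True if x and y have equal length and pairwise differ by at most _EPSILON."""
    if len(x) != len(y):
        return False
    return all(abs(a - b) <= _EPSILON for a, b in zip(x, y))


# Scores that differ only by tiny rounding noise count as equal.
print(close_enough([1.0, 2.0], [1.0, 2.0 + 1e-9]))
# A visible difference is rejected.
print(close_enough([1.0], [1.1]))
```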

nltk.test.unit.test_corpora module

class nltk.test.unit.test_corpora.TestCess(methodName='runTest')[source]

Bases: unittest.case.TestCase

test_catalan()[source]
test_esp()[source]
class nltk.test.unit.test_corpora.TestCoNLL2007(methodName='runTest')[source]

Bases: unittest.case.TestCase

test_parsed_sents()[source]
test_sents()[source]
class nltk.test.unit.test_corpora.TestFloresta(methodName='runTest')[source]

Bases: unittest.case.TestCase

test_words()[source]
class nltk.test.unit.test_corpora.TestIndian(methodName='runTest')[source]

Bases: unittest.case.TestCase

test_tagged_words()[source]
test_words()[source]
class nltk.test.unit.test_corpora.TestMWAPPDB(methodName='runTest')[source]

Bases: unittest.case.TestCase

test_entries()[source]
test_fileids()[source]
class nltk.test.unit.test_corpora.TestPTB(methodName='runTest')[source]

Bases: unittest.case.TestCase

test_categories()[source]
test_category_words()[source]
test_fileids()[source]
test_news_fileids()[source]
test_tagged_words()[source]
test_words()[source]
class nltk.test.unit.test_corpora.TestSinicaTreebank(methodName='runTest')[source]

Bases: unittest.case.TestCase

test_parsed_sents()[source]
test_sents()[source]
class nltk.test.unit.test_corpora.TestUdhr(methodName='runTest')[source]

Bases: unittest.case.TestCase

test_raw_unicode()[source]
test_words()[source]

nltk.test.unit.test_corpus_views module

Corpus View Regression Tests

class nltk.test.unit.test_corpus_views.TestCorpusViews(methodName='runTest')[source]

Bases: unittest.case.TestCase

data()[source]
linetok = <nltk.tokenize.simple.LineTokenizer object>
names = ['corpora/inaugural/README', 'corpora/inaugural/1793-Washington.txt', 'corpora/inaugural/1909-Taft.txt']
test_correct_length()[source]
test_correct_values()[source]

nltk.test.unit.test_hmm module

nltk.test.unit.test_hmm.setup_module(module)[source]
nltk.test.unit.test_hmm.test_backward_probability()[source]
nltk.test.unit.test_hmm.test_forward_probability()[source]
nltk.test.unit.test_hmm.test_forward_probability2()[source]

nltk.test.unit.test_json2csv_corpus module

Regression tests for json2csv() and json2csv_entities() in Twitter package.

class nltk.test.unit.test_json2csv_corpus.TestJSON2CSV(methodName='runTest')[source]

Bases: unittest.case.TestCase

setUp()[source]
tearDown()[source]
test_file_is_wrong()[source]

Sanity check that file comparison is not giving false positives.

test_retweet_original_tweet()[source]
test_textoutput()[source]
test_tweet_hashtag()[source]
test_tweet_media()[source]
test_tweet_metadata()[source]
test_tweet_place()[source]
test_tweet_place_boundingbox()[source]
test_tweet_url()[source]
test_tweet_usermention()[source]
test_user_metadata()[source]
test_userurl()[source]
nltk.test.unit.test_json2csv_corpus.are_files_identical(filename1, filename2, debug=False)[source]

Compare two files, ignoring carriage returns.
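A minimal sketch of a comparison that ignores carriage returns, using only the standard library. The function name `files_match_ignoring_cr` and the sample file contents are hypothetical; this is not the implementation in the test module. The second comparison mirrors the sanity check that a mismatch is reported as one.

```python
import os
import tempfile


def files_match_ignoring_cr(path1, path2):
    """Compare two files byte for byte after dropping every carriage return."""
    with open(path1, "rb") as f1, open(path2, "rb") as f2:
        return f1.read().replace(b"\r", b"") == f2.read().replace(b"\r", b"")


with tempfile.TemporaryDirectory() as tmp:
    unix_path = os.path.join(tmp, "unix.csv")
    dos_path = os.path.join(tmp, "dos.csv")
    other_path = os.path.join(tmp, "other.csv")
    samples = (
        (unix_path, b"id,text\n1,hello\n"),
        (dos_path, b"id,text\r\n1,hello\r\n"),   # same rows, Windows line endings
        (other_path, b"id,text\n2,goodbye\n"),   # different content
    )
    for path, payload in samples:
        with open(path, "wb") as handle:
            handle.write(payload)

    same = files_match_ignoring_cr(unix_path, dos_path)
    different = files_match_ignoring_cr(unix_path, other_path)
```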

nltk.test.unit.test_naivebayes module

class nltk.test.unit.test_naivebayes.NaiveBayesClassifierTest(methodName='runTest')[source]

Bases: unittest.case.TestCase

test_simple()[source]

nltk.test.unit.test_seekable_unicode_stream_reader module

The following test performs a random series of reads, seeks, and tells, and checks that the results are consistent.

nltk.test.unit.test_seekable_unicode_stream_reader.check_reader(unicode_string, encoding, n=1000)[source]
nltk.test.unit.test_seekable_unicode_stream_reader.teardown_module(module=None)[source]
nltk.test.unit.test_seekable_unicode_stream_reader.test_reader()[source]
nltk.test.unit.test_seekable_unicode_stream_reader.test_reader_on_large_string()[source]
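
The random read/seek/tell check described for this module can be sketched as follows. This is an illustration only: it uses `io.StringIO` as a stand-in for the NLTK reader, and the name `check_stream`, the seed, and the operation mix are assumptions made for the example.

```python
import io
import random


def check_stream(text, n=200, seed=0):
    """Run random tell/seek/read operations and verify n reads against the known text."""
    rng = random.Random(seed)
    stream = io.StringIO(text)

    # Pass 1: walk the stream one character at a time, recording every
    # position that tell() reports and the character read from it.
    positions, chars = [], []
    while True:
        positions.append(stream.tell())
        ch = stream.read(1)
        if ch == "":
            break
        chars.append(ch)
    # Text expected to follow each recorded position.
    suffix_at = {pos: "".join(chars[i:]) for i, pos in enumerate(positions)}

    # Pass 2: random operations; every read must be a prefix of the expected suffix.
    checks = 0
    while checks < n:
        op = rng.choice("tsrr")
        if op == "t":
            assert stream.tell() in suffix_at
        elif op == "s":
            stream.seek(rng.choice(positions))
        else:
            pos = stream.tell()
            got = stream.read(rng.randint(0, len(text) + 5))
            assert suffix_at[pos].startswith(got)
            checks += 1
    return checks


verified = check_stream("São Tomé and Príncipe", n=100)
```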

nltk.test.unit.test_senna module

Unit tests for Senna

class nltk.test.unit.test_senna.TestSennaPipeline(methodName='runTest')[source]

Bases: unittest.case.TestCase

Unit tests for nltk.classify.senna

test_senna_pipeline()[source]

Senna pipeline interface

class nltk.test.unit.test_senna.TestSennaTagger(methodName='runTest')[source]

Bases: unittest.case.TestCase

Unit tests for nltk.tag.senna

test_senna_chunk_tagger()[source]
test_senna_ner_tagger()[source]
test_senna_tagger()[source]

nltk.test.unit.test_stem module

class nltk.test.unit.test_stem.PorterTest(methodName='runTest')[source]

Bases: unittest.case.TestCase

test_oed_bug()[source]

Test for bug https://github.com/nltk/nltk/issues/1581

Ensures that ‘oed’ can be stemmed without throwing an error.

test_vocabulary_martin_mode()[source]

Test all words from the test vocabulary provided by M. Porter.

The sample vocabulary and output were sourced from:
http://tartarus.org/martin/PorterStemmer/voc.txt
http://tartarus.org/martin/PorterStemmer/output.txt

and are linked to from the Porter Stemmer algorithm’s homepage at

test_vocabulary_nltk_mode()[source]
test_vocabulary_original_mode()[source]
class nltk.test.unit.test_stem.SnowballTest(methodName='runTest')[source]

Bases: unittest.case.TestCase

test_german()[source]
test_russian()[source]
test_short_strings_bug()[source]
test_spanish()[source]

nltk.test.unit.test_tag module

nltk.test.unit.test_tag.setup_module(module)[source]
nltk.test.unit.test_tag.test_basic()[source]

nltk.test.unit.test_tgrep module

Unit tests for nltk.tgrep.

class nltk.test.unit.test_tgrep.TestSequenceFunctions(methodName='runTest')[source]

Bases: unittest.case.TestCase

Class containing unit tests for nltk.tgrep.

test_bad_operator()[source]

Test error handling of undefined tgrep operators.

test_comments()[source]

Test that comments are correctly filtered out of tgrep search strings.

test_examples()[source]

Test the Basic Examples from the TGrep2 manual.

test_labeled_nodes()[source]

Test labeled nodes.

Test case from Emily M. Bender.

test_multiple_conjs()[source]

Test that multiple (3 or more) conjunctions of node relations are handled properly.

test_node_encoding()[source]

Test that tgrep search strings handle bytes and strs the same way.

test_node_nocase()[source]

Test selecting nodes using case insensitive node names.

test_node_noleaves()[source]

Test node name matching with the search_leaves flag set to False.

test_node_printing()[source]

Test that the tgrep print operator ' is properly ignored.

test_node_quoted()[source]

Test selecting nodes using quoted node names.

test_node_regex()[source]

Test regex matching on nodes.

test_node_regex_2()[source]

Test regex matching on nodes.

test_node_simple()[source]

Test a simple use of tgrep for finding nodes matching a given pattern.

test_node_tree_position()[source]

Test matching on nodes based on NLTK tree position.

test_rel_precedence()[source]

Test matching nodes based on precedence relations.

test_rel_sister_nodes()[source]

Test matching sister nodes in a tree.

test_tokenize_encoding()[source]

Test that tokenization handles bytes and strs the same way.

test_tokenize_examples()[source]

Test tokenization of the TGrep2 manual example patterns.

test_tokenize_link_types()[source]

Test tokenization of basic link types.

test_tokenize_macros()[source]

Test tokenization of macro definitions.

test_tokenize_node_labels()[source]

Test tokenization of labeled nodes.

test_tokenize_nodenames()[source]

Test tokenization of node names.

test_tokenize_quoting()[source]

Test tokenization of quoting.

test_tokenize_segmented_patterns()[source]

Test tokenization of segmented patterns.

test_tokenize_simple()[source]

Simple test of tokenization.

test_trailing_semicolon()[source]

Test that semicolons at the end of a tgrep2 search string won’t cause a parse failure.

test_use_macros()[source]

Test defining and using tgrep2 macros.

tests_rel_dominance()[source]

Test matching nodes based on dominance relations.

tests_rel_indexed_children()[source]

Test matching nodes based on their index in their parent node.

nltk.test.unit.test_tokenize module

Unit tests for nltk.tokenize. See also nltk/test/tokenize.doctest

class nltk.test.unit.test_tokenize.TestTokenize(methodName='runTest')[source]

Bases: unittest.case.TestCase

test_remove_handle()[source]

Test remove_handle() from casual.py with specially crafted edge cases

test_stanford_segmenter_arabic()[source]

Test the Stanford Word Segmenter for Arabic (default config)

test_stanford_segmenter_chinese()[source]

Test the Stanford Word Segmenter for Chinese (default config)

test_tweet_tokenizer()[source]

Test TweetTokenizer using words with special and accented characters.

nltk.test.unit.test_twitter_auth module

nltk.test.unit.utils module

nltk.test.unit.utils.skip(reason)[source]

Unconditionally skip a test.

nltk.test.unit.utils.skipIf(condition, reason)[source]

Skip a test if the condition is true.
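
The two decorators above resemble `unittest.skip` and `unittest.skipIf` from the standard library. The sketch below uses the standard-library versions to show the intended usage. Whether the NLTK helpers behave identically in every detail is an assumption, and `ExampleTest` is a hypothetical test case.

```python
import sys
import unittest


class ExampleTest(unittest.TestCase):
    @unittest.skip("demonstration of an unconditional skip")
    def test_always_skipped(self):
        self.fail("this body never runs")

    # The condition is false on Python 3, so this test runs normally.
    @unittest.skipIf(sys.version_info < (3,), "requires Python 3")
    def test_runs_on_python3(self):
        self.assertEqual(1 + 1, 2)


suite = unittest.defaultTestLoader.loadTestsFromTestCase(ExampleTest)
result = unittest.TestResult()
suite.run(result)

# Each entry in result.skipped is a (test, reason) pair.
for test, reason in result.skipped:
    print(test.id(), "skipped:", reason)
```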

Module contents