nltk.test.unit package

Subpackages

nltk.test.unit.translate package

Submodules

nltk.test.unit.test_2x_compat module

Unit tests for nltk.compat. See also nltk/test/compat.doctest.

class nltk.test.unit.test_2x_compat.TestFraction(methodName='runTest')

Bases: unittest.case.TestCase

test_unnoramlize_fraction()

class nltk.test.unit.test_2x_compat.TestTextTransliteration(methodName='runTest')

Bases: unittest.case.TestCase

test_repr()

test_str()

txt = <Text: São Tomé and Príncipe...>

nltk.test.unit.test_2x_compat.setup_module(module)

nltk.test.unit.test_aline module

Unit tests for nltk.metrics.aline.

class nltk.test.unit.test_aline.TestAline(methodName='runTest')

Bases: unittest.case.TestCase

Test the ALINE algorithm for aligning phonetic sequences.

test_aline()

test_aline_delta()

Test ALINE's computation of the difference between two segments.

nltk.test.unit.test_chunk module

class nltk.test.unit.test_chunk.TestChunkRule(methodName='runTest')

Bases: unittest.case.TestCase

test_tag_pattern2re_pattern_quantifier()

Test for bug https://github.com/nltk/nltk/issues/1597

Ensures that curly-bracket quantifiers can be used inside a chunk rule. This type of quantifier is used in the supplementary example at http://www.nltk.org/book/ch07.html#exploring-text-corpora.

nltk.test.unit.test_classify module

Unit tests for nltk.classify. See also: nltk/test/classify.doctest

nltk.test.unit.test_classify.assert_classifier_correct(algorithm)

nltk.test.unit.test_classify.test_megam()

nltk.test.unit.test_classify.test_tadm()

nltk.test.unit.test_collocations module

class nltk.test.unit.test_collocations.TestBigram(methodName='runTest')

Bases: unittest.case.TestCase

test_bigram2()

test_bigram3()

test_bigram5()

nltk.test.unit.test_collocations.close_enough(x, y)

Verify that two sequences of n-gram association values are within _EPSILON of each other.
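A minimal sketch of such a check, assuming each sequence holds (ngram, score) pairs like those produced by a collocation finder's score_ngrams(), and assuming a tolerance of 1e-8 for _EPSILON (the test module's actual value may differ). This version also treats sequences of different lengths as unequal:

```python
_EPSILON = 1e-8  # assumed tolerance; the test module defines its own value


def close_enough(x, y):
    """Return True if two sequences of (ngram, score) pairs match.

    N-grams must be identical and in the same order; scores may differ
    by at most _EPSILON.
    """
    if len(x) != len(y):
        return False
    for (ngram_x, score_x), (ngram_y, score_y) in zip(x, y):
        if ngram_x != ngram_y or abs(score_x - score_y) > _EPSILON:
            return False
    return True


expected = [(("a", "b"), 0.5), (("b", "c"), 0.25)]
computed = [(("a", "b"), 0.5 + 1e-12), (("b", "c"), 0.25)]
print(close_enough(expected, computed))  # True
```

Comparing with a tolerance rather than `==` keeps the tests stable when association measures are computed with slightly different floating-point operation orders.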

nltk.test.unit.test_corpora module

class nltk.test.unit.test_corpora.TestCess(methodName='runTest')

Bases: unittest.case.TestCase

test_catalan()

test_esp()

class nltk.test.unit.test_corpora.TestCoNLL2007(methodName='runTest')

Bases: unittest.case.TestCase

test_parsed_sents()

test_sents()

class nltk.test.unit.test_corpora.TestFloresta(methodName='runTest')

Bases: unittest.case.TestCase

test_words()

class nltk.test.unit.test_corpora.TestIndian(methodName='runTest')

Bases: unittest.case.TestCase

test_tagged_words()

test_words()

class nltk.test.unit.test_corpora.TestMWAPPDB(methodName='runTest')

Bases: unittest.case.TestCase

test_entries()

test_fileids()

class nltk.test.unit.test_corpora.TestPTB(methodName='runTest')

Bases: unittest.case.TestCase

test_categories()

test_category_words()

test_fileids()

test_news_fileids()

test_tagged_words()

test_words()

class nltk.test.unit.test_corpora.TestSinicaTreebank(methodName='runTest')

Bases: unittest.case.TestCase

test_parsed_sents()

test_sents()

class nltk.test.unit.test_corpora.TestUdhr(methodName='runTest')

Bases: unittest.case.TestCase

test_raw_unicode()

test_words()

nltk.test.unit.test_corpus_views module

Corpus View Regression Tests

class nltk.test.unit.test_corpus_views.TestCorpusViews(methodName='runTest')

Bases: unittest.case.TestCase

data()

linetok = <nltk.tokenize.simple.LineTokenizer object>

names = ['corpora/inaugural/README', 'corpora/inaugural/1793-Washington.txt', 'corpora/inaugural/1909-Taft.txt']

test_correct_length()

test_correct_values()

nltk.test.unit.test_hmm module

nltk.test.unit.test_hmm.setup_module(module)

nltk.test.unit.test_hmm.test_backward_probability()

nltk.test.unit.test_hmm.test_forward_probability()

nltk.test.unit.test_hmm.test_forward_probability2()
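These tests exercise forward and backward probability computations for NLTK's HMM tagger. As a stdlib-only illustration of the quantities involved (not NLTK's implementation), the sketch below runs both recursions on a hypothetical two-state model and checks that they agree on the total probability of the observation sequence:

```python
# Hypothetical two-state HMM; all numbers are illustrative.
states = ["rain", "sun"]
start = {"rain": 0.6, "sun": 0.4}
trans = {
    "rain": {"rain": 0.7, "sun": 0.3},
    "sun": {"rain": 0.4, "sun": 0.6},
}
emit = {
    "rain": {"walk": 0.1, "shop": 0.4, "clean": 0.5},
    "sun": {"walk": 0.6, "shop": 0.3, "clean": 0.1},
}


def forward(obs):
    """alpha[t][s] = P(o_1..o_t, state at t = s)."""
    alpha = [{s: start[s] * emit[s][obs[0]] for s in states}]
    for o in obs[1:]:
        prev = alpha[-1]
        alpha.append({
            s: sum(prev[r] * trans[r][s] for r in states) * emit[s][o]
            for s in states
        })
    return alpha


def backward(obs):
    """beta[t][s] = P(o_{t+1}..o_T | state at t = s)."""
    beta = [{s: 1.0 for s in states}]  # beta at the final step
    for o in reversed(obs[1:]):
        nxt = beta[0]
        beta.insert(0, {
            s: sum(trans[s][r] * emit[r][o] * nxt[r] for r in states)
            for s in states
        })
    return beta


obs = ["walk", "shop", "clean"]
alpha, beta = forward(obs), backward(obs)
p_forward = sum(alpha[-1].values())
p_backward = sum(start[s] * emit[s][obs[0]] * beta[0][s] for s in states)
print(p_forward, p_backward)  # both about 0.033612
```

Checking that the two directions agree is a cheap consistency test: for every time step t, the sum over states of alpha[t][s] * beta[t][s] equals the same total probability.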

nltk.test.unit.test_json2csv_corpus module

Regression tests for json2csv() and json2csv_entities() in the Twitter package.

class nltk.test.unit.test_json2csv_corpus.TestJSON2CSV(methodName='runTest')

Bases: unittest.case.TestCase

setUp()

tearDown()

test_file_is_wrong()

Sanity check that file comparison is not giving false positives.

test_retweet_original_tweet()

test_textoutput()

test_tweet_hashtag()

test_tweet_media()

test_tweet_metadata()

test_tweet_place()

test_tweet_place_boundingbox()

test_tweet_url()

test_tweet_usermention()

test_user_metadata()

test_userurl()

nltk.test.unit.test_json2csv_corpus.are_files_identical(filename1, filename2, debug=False)

Compare two files, ignoring carriage returns.
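A minimal sketch of a comparison that ignores carriage returns, assuming a byte-level comparison is acceptable; the test module's own helper (including what its debug flag reports) may differ:

```python
import os
import tempfile


def are_files_identical(filename1, filename2, debug=False):
    """Compare two files, ignoring carriage-return bytes."""
    with open(filename1, "rb") as f1, open(filename2, "rb") as f2:
        data1 = f1.read().replace(b"\r", b"")
        data2 = f2.read().replace(b"\r", b"")
    if debug and data1 != data2:
        print(f"{filename1} and {filename2} differ")
    return data1 == data2


# Usage: the same CSV written with Windows and Unix line endings compares equal.
tmpdir = tempfile.mkdtemp()
windows = os.path.join(tmpdir, "windows.csv")
unix = os.path.join(tmpdir, "unix.csv")
with open(windows, "wb") as f:
    f.write(b"id,text\r\n1,hello\r\n")
with open(unix, "wb") as f:
    f.write(b"id,text\n1,hello\n")
print(are_files_identical(windows, unix))  # True
```

Stripping carriage returns lets reference CSV files produced on one platform be compared against output generated on another.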

nltk.test.unit.test_naivebayes module

class nltk.test.unit.test_naivebayes.NaiveBayesClassifierTest(methodName='runTest')

Bases: unittest.case.TestCase

test_simple()

nltk.test.unit.test_seekable_unicode_stream_reader module

The following test performs a random series of reads, seeks, and tells, and checks that the results are consistent.
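The same idea can be sketched with the standard library's io.TextIOWrapper standing in for NLTK's SeekableUnicodeStreamReader (an assumption made only to keep the example self-contained): record tell() before a random-length read, seek back, and check that re-reading yields the same text. The function name check_text_reader is hypothetical:

```python
import io
import random


def check_text_reader(text, encoding, n=1000, seed=0):
    """Randomly read, tell, and seek; verify re-reads return the same text."""
    rng = random.Random(seed)
    raw = io.BytesIO(text.encode(encoding))
    # newline="" disables newline translation so character counts are exact.
    reader = io.TextIOWrapper(raw, encoding=encoding, newline="")
    for _ in range(n):
        pos = reader.tell()  # opaque position cookie
        chunk = reader.read(rng.randint(0, 8))
        # Seeking back to the recorded position must reproduce the chunk.
        reader.seek(pos)
        if reader.read(len(chunk)) != chunk:
            return False
        # Occasionally rewind so the string is exercised repeatedly.
        if rng.random() < 0.05:
            reader.seek(0)
    return True


sample = "São Tomé and Príncipe\nhello, world\n" * 20
print(check_text_reader(sample, "utf-8"))  # True
```

Multi-byte characters such as "ã" are what make this test meaningful: a reader that confuses byte offsets with character offsets would return a different chunk after seeking back.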

nltk.test.unit.test_seekable_unicode_stream_reader.check_reader(unicode_string, encoding, n=1000)

nltk.test.unit.test_seekable_unicode_stream_reader.teardown_module(module=None)

nltk.test.unit.test_seekable_unicode_stream_reader.test_reader()

nltk.test.unit.test_seekable_unicode_stream_reader.test_reader_on_large_string()

nltk.test.unit.test_senna module

Unit tests for Senna.

class nltk.test.unit.test_senna.TestSennaPipeline(methodName='runTest')

Bases: unittest.case.TestCase

Unit tests for nltk.classify.senna.

test_senna_pipeline()

Test the Senna pipeline interface.

class nltk.test.unit.test_senna.TestSennaTagger(methodName='runTest')

Bases: unittest.case.TestCase

Unit tests for nltk.tag.senna.

test_senna_chunk_tagger()

test_senna_ner_tagger()

test_senna_tagger()

nltk.test.unit.test_stem module

class nltk.test.unit.test_stem.PorterTest(methodName='runTest')

Bases: unittest.case.TestCase

test_oed_bug()

Test for bug https://github.com/nltk/nltk/issues/1581

Ensures that ‘oed’ can be stemmed without throwing an error.

test_vocabulary_martin_mode()

Tests all words from the test vocabulary provided by M. Porter.

The sample vocabulary and output were sourced from:

http://tartarus.org/martin/PorterStemmer/voc.txt
http://tartarus.org/martin/PorterStemmer/output.txt

and are linked to from the Porter Stemmer algorithm’s homepage at

http://tartarus.org/martin/PorterStemmer/
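A vocabulary test of this kind compares the stemmer's output for each word against the expected stem. A minimal stdlib sketch of that comparison, assuming voc.txt and output.txt are parallel files with one entry per line, and using a hypothetical toy_stem function and in-memory lists in place of PorterStemmer and the downloaded files:

```python
def vocabulary_mismatches(stem, vocabulary, expected):
    """Return (word, expected, actual) triples where stem() disagrees.

    Assumes `vocabulary` and `expected` are parallel sequences, one
    entry per line of the vocabulary and output files respectively.
    """
    if len(vocabulary) != len(expected):
        raise ValueError("vocabulary and expected output differ in length")
    mismatches = []
    for word, true_stem in zip(vocabulary, expected):
        actual = stem(word)
        if actual != true_stem:
            mismatches.append((word, true_stem, actual))
    return mismatches


def toy_stem(word):
    """Hypothetical stand-in stemmer: strips a trailing 's' only."""
    return word[:-1] if word.endswith("s") else word


words = ["cats", "running", "dogs"]
gold = ["cat", "run", "dog"]
print(vocabulary_mismatches(toy_stem, words, gold))
# [('running', 'run', 'running')]
```

Collecting every mismatch, rather than failing on the first one, makes a single test run report all words on which the stemmer diverges from the reference output.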

test_vocabulary_nltk_mode()

test_vocabulary_original_mode()

class nltk.test.unit.test_stem.SnowballTest(methodName='runTest')

Bases: unittest.case.TestCase

test_german()

test_russian()

test_short_strings_bug()

test_spanish()

nltk.test.unit.test_tag module

nltk.test.unit.test_tag.setup_module(module)

nltk.test.unit.test_tag.test_basic()

nltk.test.unit.test_tgrep module

Unit tests for nltk.tgrep.

class nltk.test.unit.test_tgrep.TestSequenceFunctions(methodName='runTest')

Bases: unittest.case.TestCase

Class containing unit tests for nltk.tgrep.

test_bad_operator()

Test error handling of undefined tgrep operators.

test_comments()

Test that comments are correctly filtered out of tgrep search strings.

test_examples()

Test the basic examples from the TGrep2 manual.

test_labeled_nodes()

Test labeled nodes.

Test case from Emily M. Bender.

test_multiple_conjs()

Test that multiple (3 or more) conjunctions of node relations are handled properly.

test_node_encoding()

Test that tgrep search strings handle bytes and strs the same way.

test_node_nocase()

Test selecting nodes using case-insensitive node names.

test_node_noleaves()

Test node name matching with the search_leaves flag set to False.

test_node_printing()

Test that the tgrep print operator (') is properly ignored.

test_node_quoted()

Test selecting nodes using quoted node names.

test_node_regex()

Test regex matching on nodes.

test_node_regex_2()

Test regex matching on nodes.

test_node_simple()

Test a simple use of tgrep for finding nodes matching a given pattern.

test_node_tree_position()

Test matching on nodes based on NLTK tree position.

test_rel_precedence()

Test matching nodes based on precedence relations.

test_rel_sister_nodes()

Test matching sister nodes in a tree.

test_tokenize_encoding()

Test that tokenization handles bytes and strs the same way.

test_tokenize_examples()

Test tokenization of the TGrep2 manual example patterns.

test_tokenize_link_types()

Test tokenization of basic link types.

test_tokenize_macros()

Test tokenization of macro definitions.

test_tokenize_node_labels()

Test tokenization of labeled nodes.

test_tokenize_nodenames()

Test tokenization of node names.

test_tokenize_quoting()

Test tokenization of quoting.

test_tokenize_segmented_patterns()

Test tokenization of segmented patterns.

test_tokenize_simple()

Simple test of tokenization.

test_trailing_semicolon()

Test that semicolons at the end of a tgrep2 search string won’t cause a parse failure.

test_use_macros()

Test defining and using tgrep2 macros.

tests_rel_dominance()

Test matching nodes based on dominance relations.

tests_rel_indexed_children()

Test matching nodes based on their index in their parent node.

nltk.test.unit.test_tokenize module

Unit tests for nltk.tokenize. See also nltk/test/tokenize.doctest

class nltk.test.unit.test_tokenize.TestTokenize(methodName='runTest')

Bases: unittest.case.TestCase

test_remove_handle()

Test remove_handle() from casual.py with specially crafted edge cases.

test_stanford_segmenter_arabic()

Test the Stanford Word Segmenter for Arabic (default config).

test_stanford_segmenter_chinese()

Test the Stanford Word Segmenter for Chinese (default config).

test_tweet_tokenizer()

Test TweetTokenizer using words with special and accented characters.

nltk.test.unit.test_twitter_auth module

nltk.test.unit.utils module

nltk.test.unit.utils.skip(reason)

Unconditionally skip a test.

nltk.test.unit.utils.skipIf(condition, reason)

Skip a test if the condition is true.
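The standard library's unittest module offers decorators with the same names and documented behaviour, unittest.skip(reason) and unittest.skipIf(condition, reason). A small sketch using the standard-library versions; the test class and its conditions are hypothetical:

```python
import sys
import unittest


class ExampleTests(unittest.TestCase):
    @unittest.skip("demonstrating an unconditional skip")
    def test_always_skipped(self):
        self.fail("never runs")

    @unittest.skipIf(sys.version_info < (3,), "requires Python 3")
    def test_runs_on_python3(self):
        self.assertTrue(True)

    @unittest.skipIf(True, "condition is true, so this test is skipped")
    def test_conditionally_skipped(self):
        self.fail("never runs")


suite = unittest.TestLoader().loadTestsFromTestCase(ExampleTests)
result = unittest.TestResult()
suite.run(result)
print(len(result.skipped), result.wasSuccessful())  # 2 True
```

Skipped tests are recorded in result.skipped rather than as failures, so a run that skips tests for missing optional tools (such as Senna or the Stanford segmenter) still counts as successful.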
