nltk.twitter package

Submodules

nltk.twitter.api module

This module provides an interface for TweetHandlers, and support for timezone handling.

class nltk.twitter.api.BasicTweetHandler(limit=20)

Bases: object

Minimal implementation of TweetHandler.

Counts the number of Tweets and decides when the client should stop fetching them.

counter = None

The number of Tweets handled so far.

do_continue()

Returns False if the client should stop fetching Tweets.

do_stop = None

A flag to indicate to the client whether to stop fetching data given some condition (e.g., reaching a date limit).

max_id = None

Stores the id of the last fetched Tweet to handle pagination.
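
The stopping rule described above can be sketched in a few lines. This is a minimal stand-in written for illustration, not NLTK's own code: the class name CountingHandler and the fake stream of integer ids are assumptions, and only the attribute names limit, counter and do_stop follow the documentation.

```python
class CountingHandler:
    """Hypothetical stand-in that mirrors the documented behaviour."""

    def __init__(self, limit=20):
        self.limit = limit      # maximum number of Tweets to accept
        self.counter = 0        # Tweets handled so far
        self.do_stop = False    # set to True to stop early

    def do_continue(self):
        # Keep fetching only while under the limit and not told to stop.
        return self.counter < self.limit and not self.do_stop


handler = CountingHandler(limit=3)
seen = []
for tweet_id in range(10):      # pretend stream of ten Tweets
    if not handler.do_continue():
        break
    seen.append(tweet_id)
    handler.counter += 1
```

In this sketch the loop plays the role of the client: it asks the handler before every fetch, so the handler alone decides when fetching ends.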

class nltk.twitter.api.LocalTimezoneOffsetWithUTC

Bases: datetime.tzinfo

This is not intended to be a general purpose class for dealing with the local timezone. In particular:

  • it assumes that the date passed has been created using datetime(..., tzinfo=Local), where Local is an instance of the class LocalTimezoneOffsetWithUTC;
  • for such an object, it returns the offset from UTC, used for date comparisons.

Reference: https://docs.python.org/3/library/datetime.html

DSTOFFSET = datetime.timedelta(0, 37800)

STDOFFSET = datetime.timedelta(0, 34200)

utcoffset(dt)

Access the relevant time offset.
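
A tzinfo subclass of this kind can be sketched with the standard library alone. The class name LocalOffset is hypothetical, and deriving both offsets from the host's time module settings is an assumption about how such values might be obtained; it would also explain why the constants listed above are specific to one machine.

```python
import time
from datetime import datetime, timedelta, tzinfo


class LocalOffset(tzinfo):
    """Hypothetical sketch: a fixed offset taken from the host's settings."""

    # time.timezone and time.altzone count seconds *west* of UTC,
    # hence the sign flip.
    STDOFFSET = timedelta(seconds=-time.timezone)
    DSTOFFSET = timedelta(seconds=-time.altzone) if time.daylight else STDOFFSET

    def utcoffset(self, dt):
        # Always report the same offset, whatever date is passed in.
        return self.DSTOFFSET

    def dst(self, dt):
        return self.DSTOFFSET - self.STDOFFSET

    def tzname(self, dt):
        return "Local"


LOCAL = LocalOffset()
stamp = datetime(2015, 3, 3, 12, 0, tzinfo=LOCAL)
offset = stamp.utcoffset()
```

Because stamp carries a tzinfo, it can be compared with other timezone-aware datetimes, which is the only use the class description claims to support.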

class nltk.twitter.api.TweetHandlerI(limit=20, upper_date_limit=None, lower_date_limit=None)

Bases: nltk.twitter.api.BasicTweetHandler

Interface class whose subclasses should implement a handle method that Twitter clients can delegate to.

check_date_limit(data, verbose=False)

Validate date limits.

handle(data)

Deal appropriately with data returned by the Twitter API.

on_finish()

Actions to take when the Tweet limit has been reached.
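
The idea behind check_date_limit can be illustrated with a small standalone function. This is a sketch under stated assumptions, not the library's implementation: the function name outside_limits is hypothetical, the timestamp layout in DATE_FMT is assumed, and UTC is used for the comparison to keep the example independent of the host timezone.

```python
from datetime import datetime, timezone

# Assumed timestamp layout, e.g. "Tue Mar 03 12:00:00 +0000 2015".
DATE_FMT = "%a %b %d %H:%M:%S +0000 %Y"


def outside_limits(tweet, upper=None, lower=None):
    """Return True if the Tweet's date falls outside the given limits."""
    if upper is None and lower is None:
        return False
    created = datetime.strptime(tweet["created_at"], DATE_FMT)
    created = created.replace(tzinfo=timezone.utc)  # make it timezone-aware
    too_late = upper is not None and created > upper
    too_early = lower is not None and created < lower
    return too_late or too_early


tweet = {"created_at": "Tue Mar 03 12:00:00 +0000 2015"}
upper = datetime(2015, 3, 1, tzinfo=timezone.utc)
lower = datetime(2015, 1, 1, tzinfo=timezone.utc)
stop_for_upper = outside_limits(tweet, upper=upper)
stop_for_lower = outside_limits(tweet, lower=lower)
```

A handler built along these lines would set its do_stop flag when the function returns True, so that the client stops fetching on its next check.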

nltk.twitter.common module

Utility functions for the twitterclient module that do not require the twython library to be installed.

nltk.twitter.common.extract_fields(tweet, fields)

Extract field values from a full Tweet and return them as a list.

Parameters:
  • tweet (json) – The tweet in JSON format
  • fields (list) – The fields to be extracted from the tweet
Return type:

list(str)
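
Field extraction of this kind, including dotted names such as 'user.id', can be sketched as follows. The function pick_fields and the sample Tweet are hypothetical and stand in for the library call; the sketch only shows the lookup idea.

```python
def pick_fields(tweet, fields):
    """Return the values of `fields`, following dots into nested objects."""
    values = []
    for field in fields:
        value = tweet
        for key in field.split("."):   # "user.id" -> ["user", "id"]
            value = value[key]
        values.append(value)
    return values


tweet = {
    "id_str": "123",
    "text": "hello",
    "user": {"id": 42, "followers_count": 7},
}
row = pick_fields(tweet, ["id_str", "text", "user.id", "user.followers_count"])
```

The values come back in the same order as the requested fields, which is what makes the result usable as a CSV row.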

nltk.twitter.common.get_header_field_list(main_fields, entity_type, entity_fields)

nltk.twitter.common.json2csv(fp, outfile, fields, encoding='utf8', errors='replace', gzip_compress=False)

Extract selected fields from a file of line-separated JSON tweets and write to a file in CSV format.

This utility function converts a file of full Tweets to a CSV file for easier processing. For example, just the Tweet IDs or just the text content of the Tweets can be extracted.

Additionally, the function allows combinations of fields of other Twitter objects (mainly the users, see below).

For Twitter entities (e.g. hashtags of a Tweet), and for geolocation, see json2csv_entities.

Parameters:
  • fp – The file-like object containing full Tweets
  • outfile (str) – The name of the text file where results should be written
  • fields (list) – The list of fields to be extracted. Useful examples are 'id_str' for the Tweet ID and 'text' for the text of the Tweet. See <https://dev.twitter.com/overview/api/tweets> for a full list of fields. Examples: ['id_str'], ['id', 'text', 'favorite_count', 'retweet_count']. Additionally, it allows fields from other Twitter objects, e.g., ['id', 'text', 'user.id', 'user.followers_count', 'user.friends_count']
  • errors – Behaviour for encoding errors, see https://docs.python.org/3/library/codecs.html#codec-base-classes
  • gzip_compress – if True, output files are compressed with gzip
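
The conversion from line-separated JSON to CSV can be sketched with the standard library. The function jsonl_to_csv and the in-memory sample data are hypothetical; encoding handling and gzip compression are left out, and writing the field names as a header row is an assumption of this sketch.

```python
import csv
import io
import json


def jsonl_to_csv(lines, out, fields):
    """Write one CSV row per JSON line, keeping only `fields`."""
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(fields)                  # header row
    for line in lines:
        tweet = json.loads(line)
        row = []
        for field in fields:
            value = tweet
            for key in field.split("."):     # follow "user.id" style paths
                value = value[key]
            row.append(value)
        writer.writerow(row)


source = io.StringIO(
    '{"id_str": "1", "text": "first", "user": {"id": 10}}\n'
    '{"id_str": "2", "text": "second", "user": {"id": 20}}\n'
)
target = io.StringIO()
jsonl_to_csv(source, target, ["id_str", "text", "user.id"])
csv_text = target.getvalue()
```

Here io.StringIO stands in for both the input and the output file, so the sketch runs without touching the disk.
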

nltk.twitter.common.json2csv_entities(tweets_file, outfile, main_fields, entity_type, entity_fields, encoding='utf8', errors='replace', gzip_compress=False)

Extract selected fields from a file of line-separated JSON tweets and write to a file in CSV format.

This utility function converts a file of full Tweets to a CSV file for easier processing of Twitter entities. For example, the hashtags or media elements of a Tweet can be extracted.

It writes one line per entity of a Tweet, e.g. if a Tweet has two hashtags there will be two lines in the output file, one per hashtag.

Parameters:
  • tweets_file – The file-like object containing full Tweets
  • outfile (str) – The path of the text file where results should be written
  • main_fields (list) – The list of fields to be extracted from the main object, usually the Tweet. A useful example is 'id_str' for the Tweet ID. See <https://dev.twitter.com/overview/api/tweets> for a full list of fields. Examples: ['id_str'], ['id', 'text', 'favorite_count', 'retweet_count']. If entity_type is expressed with a hierarchy, then this is the list of fields of the object that corresponds to the key of entity_type (e.g., for entity_type='user.urls', the fields in the main_fields list belong to the user object; for entity_type='place.bounding_box', the fields in the main_fields list belong to the place object of the Tweet).
  • entity_type (str) – The name of the entity: 'hashtags', 'media', 'urls' and 'user_mentions' for the Tweet object. For a user object, this needs to be expressed with a hierarchy: 'user.urls'. For the bounding box of the Tweet location, use 'place.bounding_box'.
  • entity_fields (list) – The list of fields to be extracted from the entity, e.g. ['text'] (of the Tweet)
  • errors – Behaviour for encoding errors, see https://docs.python.org/3/library/codecs.html#codec-base-classes
  • gzip_compress – if True, output files are compressed with gzip
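
The one-row-per-entity behaviour can be sketched as follows. The function entities_to_csv is hypothetical and handles only flat entity types such as 'hashtags' found under the Tweet's "entities" key; hierarchical types like 'user.urls' are not covered, and the header naming is an assumption of this sketch.

```python
import csv
import io
import json


def entities_to_csv(lines, out, main_fields, entity_type, entity_fields):
    """Write one CSV row per entity found under tweet["entities"]."""
    writer = csv.writer(out, lineterminator="\n")
    # Header: main fields first, then entity fields prefixed by the entity type.
    header = main_fields + [f"{entity_type}.{name}" for name in entity_fields]
    writer.writerow(header)
    for line in lines:
        tweet = json.loads(line)
        main_values = [tweet[name] for name in main_fields]
        # A Tweet without entities of this type contributes no rows.
        for entity in tweet.get("entities", {}).get(entity_type, []):
            writer.writerow(main_values + [entity[name] for name in entity_fields])


source = io.StringIO(
    '{"id_str": "1", "entities": {"hashtags": [{"text": "nlp"}, {"text": "python"}]}}\n'
    '{"id_str": "2", "entities": {"hashtags": []}}\n'
)
target = io.StringIO()
entities_to_csv(source, target, ["id_str"], "hashtags", ["text"])
csv_text = target.getvalue()
```

The first sample Tweet has two hashtags and so yields two rows that repeat its id; the second has none and is absent from the output.
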

nltk.twitter.common.outf_writer_compat(outfile, encoding, errors, gzip_compress=False)

Identify the appropriate CSV writer given the Python version.

nltk.twitter.twitter_demo module

nltk.twitter.twitterclient module

nltk.twitter.util module

Module contents

NLTK Twitter Package

This package contains classes for retrieving Tweet documents using the Twitter API.