Package nltk :: Package tokenize :: Module punkt :: Class PunktToken

Class PunktToken

object --+
         |
        PunktToken

Stores a token of text with annotations produced during sentence boundary detection.

Instance Methods
 
__init__(self, tok, **params)
Initializes the token from its text tok; keyword arguments in params set annotations such as sentbreak or abbr.

Inherited from object: __delattr__, __getattribute__, __hash__, __new__, __reduce__, __reduce_ex__, __setattr__

    Derived properties
 
_get_type(self)
Returns a case-normalized representation of the token.
    String representation
 
__repr__(self)
A string representation of the token that lists all of its non-default annotations and can reproduce the token with eval().
 
__str__(self)
A string representation akin to that used by Kiss and Strunk.
Class Variables
  _properties = ['parastart', 'linestart', 'sentbreak', 'abbr', ...
    Regular expressions for properties
  _RE_ELLIPSIS = re.compile(r'\.\.+$')
  _RE_NUMERIC = re.compile(r'^-?[\.,]?\d[\d,\.-]*\.?$')
  _RE_INITIAL = re.compile(r'(?u)[^\W\d]\.$')
  _RE_ALPHA = re.compile(r'(?u)[^\W\d]+$')
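
The four patterns can be tried in isolation with the standard re module. The sketch below copies them verbatim and assumes they are applied with match() to the raw token text; the classify helper and the sample strings are illustrative and not part of NLTK.

```python
import re

# Patterns copied verbatim from the class variables listed above.
PATTERNS = {
    "ellipsis": re.compile(r'\.\.+$'),
    "numeric": re.compile(r'^-?[\.,]?\d[\d,\.-]*\.?$'),
    "initial": re.compile(r'(?u)[^\W\d]\.$'),
    "alpha": re.compile(r'(?u)[^\W\d]+$'),
}


def classify(text):
    """Return the names of the patterns that match text from its first character."""
    return [name for name, pattern in PATTERNS.items() if pattern.match(text)]


for sample in ["...", "3.14", "J.", "word", "Mr."]:
    print(repr(sample), classify(sample))
```

Under that assumption "Mr." matches none of the patterns: it is longer than a single letter plus period, and its trailing period keeps it from being purely alphabetic.
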
Properties
  abbr
  ellipsis
  linestart
  parastart
  period_final
  sentbreak
  tok
  type

Inherited from object: __class__

    Derived properties
  type_no_period
The type with its final period removed if it has one.
  type_no_sentperiod
The type with its final period removed if it is marked as a sentence break.
  first_upper
True if the token's first character is uppercase.
  first_lower
True if the token's first character is lowercase.
  first_case
The case of the token's first character: 'lower', 'upper', or 'none'.
  is_ellipsis
True if the token text is that of an ellipsis.
  is_number
True if the token text is that of a number.
  is_initial
True if the token text is that of an initial.
  is_alpha
True if the token text is all alphabetic.
  is_non_punct
True if the token is either a number or alphabetic.
Method Details

__init__(self, tok, **params)
(Constructor)

Initializes the token from its text tok. Keyword arguments in params set annotations such as sentbreak or abbr.

Overrides: object.__init__

__repr__(self)
(Representation operator)

A string representation of the token that lists all of its non-default annotations and can reproduce the token with eval().

Overrides: object.__repr__
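
The eval() round trip can be illustrated with a reduced stand-in class. This is a sketch, not NLTK's code: MiniToken is a hypothetical name, the default value of None for every annotation is an assumption, and the exact string that PunktToken itself produces may differ.

```python
class MiniToken:
    """Hypothetical stand-in for PunktToken, reduced to what __repr__ needs."""

    # Annotation names taken from the _properties class variable.
    _properties = ['parastart', 'linestart', 'sentbreak', 'abbr', 'ellipsis']

    def __init__(self, tok, **params):
        self.tok = tok
        # Assumption: every annotation starts at a default of None ...
        for name in self._properties:
            setattr(self, name, None)
        # ... and keyword arguments override the defaults.
        for name, value in params.items():
            setattr(self, name, value)

    def __repr__(self):
        # List only the annotations that differ from the default.
        parts = [repr(self.tok)]
        parts += ['%s=%r' % (name, getattr(self, name))
                  for name in self._properties
                  if getattr(self, name) is not None]
        return '%s(%s)' % (type(self).__name__, ', '.join(parts))


token = MiniToken('Mr.', abbr=True)
text = repr(token)
clone = eval(text)  # rebuilds an equivalent token from the string
print(text)
```

Because the string is a valid constructor call, evaluating it yields a token with the same text and the same annotations.
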

__str__(self)
(Informal representation operator)

A string representation akin to that used by Kiss and Strunk.

Overrides: object.__str__
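
A minimal sketch of such a representation follows, assuming the token text is followed by one marker per annotation that is set. The marker strings <A>, <E> and <S> and the helper name kiss_strunk_str are assumptions made for illustration.

```python
def kiss_strunk_str(tok, abbr=False, ellipsis=False, sentbreak=False):
    """Append one marker per annotation that is set (marker strings are assumed)."""
    text = tok
    if abbr:
        text += '<A>'
    if ellipsis:
        text += '<E>'
    if sentbreak:
        text += '<S>'
    return text


print(kiss_strunk_str('Mr.', abbr=True))
print(kiss_strunk_str('home.', sentbreak=True))
```
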

Class Variable Details

_properties

Value:
['parastart', 'linestart', 'sentbreak', 'abbr', 'ellipsis']

Property Details

type_no_period

The type with its final period removed if it has one.

Get Method:
type_no_period(self)

type_no_sentperiod

The type with its final period removed if it is marked as a sentence break.

Get Method:
type_no_sentperiod(self)

first_upper

True if the token's first character is uppercase.

Get Method:
first_upper(self)

first_lower

True if the token's first character is lowercase.

Get Method:
first_lower(self)

first_case

The case of the token's first character: 'lower', 'upper', or 'none'.

Get Method:
first_case(self)

is_ellipsis

True if the token text is that of an ellipsis.

Get Method:
is_ellipsis(self)

is_number

True if the token text is that of a number.

Get Method:
is_number(self)

is_initial

True if the token text is that of an initial.

Get Method:
is_initial(self)

is_alpha

True if the token text is all alphabetic.

Get Method:
is_alpha(self)

is_non_punct

True if the token is either a number or alphabetic.

Get Method:
is_non_punct(self)
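
The type and case properties described in this section can be sketched with a small stand-alone class. SketchToken is a hypothetical name and not NLTK's implementation. The sketch assumes that the type is simply the lowercased token text, that a bare "." is left unchanged, and that first_case reports 'none' when the first character has no case.

```python
class SketchToken:
    """Hypothetical stand-in; not NLTK's PunktToken."""

    def __init__(self, tok, sentbreak=None):
        self.tok = tok
        self.sentbreak = sentbreak
        # Assumption: the type is simply the lowercased token text.
        self.type = tok.lower()

    @property
    def type_no_period(self):
        # Strip one trailing period, but leave a bare "." untouched.
        if len(self.type) > 1 and self.type.endswith('.'):
            return self.type[:-1]
        return self.type

    @property
    def type_no_sentperiod(self):
        # Strip the period only when the token is marked as a sentence break.
        return self.type_no_period if self.sentbreak else self.type

    @property
    def first_upper(self):
        return self.tok[0].isupper()

    @property
    def first_lower(self):
        return self.tok[0].islower()

    @property
    def first_case(self):
        if self.first_lower:
            return 'lower'
        if self.first_upper:
            return 'upper'
        return 'none'


abbr = SketchToken('Mr.')
end = SketchToken('home.', sentbreak=True)
print(abbr.type_no_period, abbr.type_no_sentperiod, abbr.first_case)
print(end.type_no_period, end.type_no_sentperiod, end.first_case)
```

The two tokens show the difference between the properties: the period of 'Mr.' is kept by type_no_sentperiod because the token is not marked as a sentence break, while the period of 'home.' is dropped.
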