object --+
|
PunktToken
Stores a token of text with annotations produced during sentence boundary detection.
_properties =
| Regular expressions for properties | |||
|---|---|---|---|
_RE_ELLIPSIS = re.compile(r'\.\.
_RE_NUMERIC = re.compile(r'^-
_RE_INITIAL = re.compile(r'
_RE_ALPHA = re.compile(r'
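The pattern values are truncated in this listing, so their exact definitions cannot be read from it. As a rough illustration of how a compiled pattern can back a boolean check on the token text, here is a minimal sketch. The names `ELLIPSIS_SKETCH`, `NUMERIC_SKETCH`, `looks_like_ellipsis` and `looks_like_number` are hypothetical, and the patterns are stand-ins chosen for the example, not the values used by `PunktToken`.

```python
import re

# Hypothetical stand-in patterns; the real values are truncated in the listing.
ELLIPSIS_SKETCH = re.compile(r"\.\.+$")       # two or more periods, nothing else
NUMERIC_SKETCH = re.compile(r"-?\d[\d,.]*$")  # optional minus, a digit, then digits or separators


def looks_like_ellipsis(text):
    """Return True if the whole text is a run of two or more periods."""
    return bool(ELLIPSIS_SKETCH.match(text))


def looks_like_number(text):
    """Return True if the text is an optionally negative run of digits with ',' or '.' separators."""
    return bool(NUMERIC_SKETCH.match(text))
```

Compiling each pattern once at module or class level keeps the per-token check down to a single `match` call.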
| abbr | |||
| ellipsis | |||
| linestart | |||
| parastart | |||
| period_final | |||
| sentbreak | |||
| tok | |||
| type | |||
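To show how these attributes might fit together, here is a minimal sketch of a token object that stores its text plus optional annotations. `TokenSketch` is a hypothetical stand-in, not the `PunktToken` constructor. The `None` defaults, the keyword-argument interface, and the lowercasing used for `type` are all assumptions made for the example.

```python
class TokenSketch:
    """Hypothetical stand-in that mirrors the attribute names listed above."""

    # Annotation names come from the attribute list; the None defaults are an assumption.
    ANNOTATIONS = ("abbr", "ellipsis", "linestart", "parastart", "sentbreak")

    def __init__(self, tok, **annotations):
        self.tok = tok
        # Assumption: the type is a case-normalized form of the token text.
        self.type = tok.lower()
        self.period_final = tok.endswith(".")
        # Start every annotation at its default, then apply the ones supplied.
        for name in self.ANNOTATIONS:
            setattr(self, name, None)
        for name, value in annotations.items():
            if name not in self.ANNOTATIONS:
                raise TypeError(f"unknown annotation: {name!r}")
            setattr(self, name, value)


token = TokenSketch("Dr.", abbr=True, linestart=True)
```

In this sketch `token.abbr` and `token.linestart` are `True`, `token.period_final` is `True`, and annotations that were not supplied, such as `token.sentbreak`, stay `None`.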
|
Inherited from |
|||
| Derived properties | Description |
|---|---|
| type_no_period | The type with its final period removed if it has one. |
| type_no_sentperiod | The type with its final period removed if it is marked as a sentence break. |
| first_upper | True if the token's first character is uppercase. |
| first_lower | True if the token's first character is lowercase. |
| first_case | |
| is_ellipsis | True if the token text is that of an ellipsis. |
| is_number | True if the token text is that of a number. |
| is_initial | True if the token text is that of an initial. |
| is_alpha | True if the token text is all alphabetic. |
| is_non_punct | True if the token is either a number or is alphabetic. |
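The period and case properties described above can be sketched as read-only Python properties. `DerivedSketch` is a hypothetical stand-in. The lowercased `type`, the rule that a lone "." is left unchanged, and the string values returned by `first_case` (which has no description on this page) are assumptions made for the example.

```python
class DerivedSketch:
    """Hypothetical stand-in showing derived values computed on access."""

    def __init__(self, tok, sentbreak=False):
        self.tok = tok
        self.type = tok.lower()  # assumption: type is the lowercased token text
        self.sentbreak = sentbreak

    @property
    def type_no_period(self):
        """The type with its final period removed if it has one."""
        # Assumption: a lone "." is returned unchanged rather than emptied.
        if len(self.type) > 1 and self.type.endswith("."):
            return self.type[:-1]
        return self.type

    @property
    def type_no_sentperiod(self):
        """The type with its final period removed if it is marked as a sentence break."""
        if self.sentbreak:
            return self.type_no_period
        return self.type

    @property
    def first_upper(self):
        """True if the token's first character is uppercase."""
        return self.tok[:1].isupper()

    @property
    def first_lower(self):
        """True if the token's first character is lowercase."""
        return self.tok[:1].islower()

    @property
    def first_case(self):
        # Assumption: summarize the two case tests as a short label.
        if self.first_lower:
            return "lower"
        if self.first_upper:
            return "upper"
        return "none"


abbr = DerivedSketch("Mr.")
end = DerivedSketch("end.", sentbreak=True)
print(abbr.type_no_period, abbr.type_no_sentperiod, end.type_no_sentperiod)
# prints: mr mr. end
```

Computing these values on access means they always reflect the current `sentbreak` annotation, which can change after the token is created.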
|
|||
__init__ x.__init__(...) initializes x; see x.__class__.__doc__ for signature
|
__repr__ A string representation of the token that lists all its non-default annotations and can reproduce the token when passed to eval().
|
__str__ A string representation akin to that used by Kiss and Strunk.
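The two string forms can be sketched as follows. `ReprSketch` is a hypothetical stand-in with only three annotations. The `None` defaults and the inline markers `<A>`, `<E>` and `<S>` used in the compact form are assumptions made for the example, not details stated on this page.

```python
class ReprSketch:
    """Hypothetical stand-in with an eval()-able repr and a compact str."""

    ANNOTATIONS = ("abbr", "ellipsis", "sentbreak")

    def __init__(self, tok, **annotations):
        self.tok = tok
        # Annotations that are not supplied default to None (an assumption).
        for name in self.ANNOTATIONS:
            setattr(self, name, annotations.get(name))

    def __repr__(self):
        # List only annotations that differ from the default so that
        # eval(repr(token)) rebuilds an equivalent token.
        extras = "".join(
            f", {name}={getattr(self, name)!r}"
            for name in self.ANNOTATIONS
            if getattr(self, name) is not None
        )
        return f"{type(self).__name__}({self.tok!r}{extras})"

    def __str__(self):
        # Assumption: flag each true annotation with a short marker after the text.
        text = self.tok
        if self.abbr:
            text += "<A>"
        if self.ellipsis:
            text += "<E>"
        if self.sentbreak:
            text += "<S>"
        return text


token = ReprSketch("etc.", abbr=True, sentbreak=True)
clone = eval(repr(token))  # round trip through the repr string
```

Here `repr(token)` is `ReprSketch('etc.', abbr=True, sentbreak=True)` and `str(token)` is `etc.<A><S>`. Leaving defaults out of the repr keeps it short for the common case of an unannotated token.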
| Generated by Epydoc 3.0beta1 on Wed Aug 27 15:08:58 2008 | http://epydoc.sourceforge.net |