Tokenize the text into s-expressions. For example:
>>> SExprTokenizer().tokenize('(a b (c d)) e f (g)')
['(a b (c d))', 'e', 'f', '(g)']
All parentheses are assumed to mark s-expressions. In particular, no special
processing is done to exclude parentheses that occur inside strings, or
following backslash characters.
If the given expression contains non-matching parentheses, then the
behavior of the tokenizer depends on the strict parameter to
the constructor. If strict is True, then a ValueError is raised.
If strict is False, then any unmatched close parentheses will each be
listed as their own s-expression, and the last partial s-expression with
unmatched open parentheses will be listed as its own s-expression:
>>> SExprTokenizer(strict=False).tokenize('c) d) e (f (g')
['c', ')', 'd', ')', 'e', '(f (g']
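The rules above can be illustrated with a small standalone sketch. It is not the library's implementation: the function name sexpr_tokenize and its strict keyword are hypothetical stand-ins for the tokenizer and its constructor parameter, and the handling of trailing text is an assumption of this sketch. It tracks nesting depth only, so parentheses inside quoted strings or after backslashes are counted like any others.

```python
import re

# Hypothetical helper for illustration only; not the library's code.
_PAREN = re.compile(r"[()]")


def sexpr_tokenize(text, strict=True):
    """Split text into top-level s-expressions by counting parentheses."""
    tokens = []
    pos = 0    # start of the text that has not been emitted yet
    depth = 0  # current nesting depth
    for m in _PAREN.finditer(text):
        if depth == 0:
            # Outside any s-expression: whitespace-separated words are tokens.
            tokens.extend(text[pos:m.start()].split())
            pos = m.start()
        if m.group() == "(":
            depth += 1
        else:
            if strict and depth == 0:
                raise ValueError("unmatched close parenthesis at char %d" % m.start())
            depth = max(0, depth - 1)
            if depth == 0:
                # A balanced s-expression just closed, or a stray ')' stands alone.
                tokens.append(text[pos:m.end()])
                pos = m.end()
    if strict and depth > 0:
        raise ValueError("unmatched open parenthesis at char %d" % pos)
    if depth > 0:
        # Non-strict mode: keep the partial s-expression as a single token.
        tokens.append(text[pos:])
    else:
        # Assumption of this sketch: trailing plain words are split on whitespace.
        tokens.extend(text[pos:].split())
    return tokens


print(sexpr_tokenize('(a b (c d)) e f (g)'))
# ['(a b (c d))', 'e', 'f', '(g)']
print(sexpr_tokenize('c) d) e (f (g', strict=False))
# ['c', ')', 'd', ')', 'e', '(f (g']
```

Because quotes are not treated specially, an input such as '(say "(") x' counts the quoted open parenthesis as real nesting: the sketch raises ValueError in strict mode and returns the whole input as one partial s-expression when strict is False.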
- Parameters:
text (string or iter(string)) - the string to be tokenized
- Returns:
- An iterator over tokens (each of which is an s-expression)
- Overrides:
api.TokenizerI.tokenize