class EncoderState:
Known subclasses: tivars.tokenizer.state.Line, tivars.tokenizer.state.MaxMode, tivars.tokenizer.state.MinMode, tivars.tokenizer.state.SmartMode
Constructor: EncoderState(length)
Base class for encoder states.
Each state represents an encoding context that affects tokenization.
| Kind | Name | Description |
| --- | --- | --- |
| Method | __init__ | Undocumented |
| Method | munch | Munch the input string and determine the resulting token, encoder state, and remainder of the string |
| Method | next | Determines the next encoder state given a token |
| Class Variable | max | The maximum number of tokens to emit before leaving this state |
| Class Variable | mode | Whether to munch maximally (0) or minimally (-1) |
| Instance Variable | length | Undocumented |
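The table leaves length undocumented and does not say how max is enforced. The sketch below is one plausible reading, not the tivars implementation: it assumes length counts the tokens already emitted in the current state, and that next returns an empty list (leave the state) once max tokens have been emitted, otherwise replacing the state with a copy whose counter is one higher. SketchState and OneTokenState are hypothetical names.

```python
from __future__ import annotations

import math


class SketchState:
    """Hypothetical stand-in for EncoderState (not the tivars implementation)."""

    # Assumed semantics: emit at most `max` tokens before leaving this state
    max: float = math.inf
    # 0 = maximal munch, -1 = minimal munch
    mode: int = 0

    def __init__(self, length: int = 0):
        # Assumption: `length` counts the tokens already emitted in this state
        self.length = length

    def next(self, token: str) -> list[SketchState]:
        # Assumption: leave the state once `max` tokens have been emitted,
        # otherwise replace it with a copy whose counter is one higher
        if self.length + 1 >= self.max:
            return []
        return [type(self)(self.length + 1)]


class OneTokenState(SketchState):
    """Hypothetical state that exits after emitting a single token."""

    max = 1
```

Under these assumptions, OneTokenState().next("x") returns an empty list, while SketchState().next("x") returns a single SketchState with length 1, so an unbounded state simply keeps replacing itself.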
def munch(self, string: str, trie: TITokenTrie) -> tuple[TIToken, str, list[EncoderState]]:
Munch the input string and determine the resulting token, encoder state, and remainder of the string.

| Parameters | |
| --- | --- |
| string: str | The text string to tokenize |
| trie: TITokenTrie | The TITokenTrie to use for tokenization |

| Returns | |
| --- | --- |
| tuple[TIToken, str, list[EncoderState]] | A tuple of the output token, the remainder of the string, and a list of states to add to the stack |
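The mode values 0 and -1 read naturally as list indices: if every token that matches the start of the string is collected longest first, index 0 is the maximal munch and index -1 the minimal one. The sketch below illustrates that idea under stated assumptions; PrefixTable is a hypothetical flat stand-in for TITokenTrie (its candidates method is invented for illustration and is not part of tivars), and tokens are plain strings rather than TIToken objects.

```python
from __future__ import annotations


class PrefixTable:
    """Hypothetical stand-in for TITokenTrie: a flat set of token strings."""

    def __init__(self, tokens: list[str]):
        self.tokens = set(tokens)

    def candidates(self, string: str) -> list[tuple[str, str]]:
        # Every (token, remainder) split where the token is a prefix of `string`,
        # ordered longest token first
        return [(string[:n], string[n:])
                for n in range(len(string), 0, -1)
                if string[:n] in self.tokens]


class SketchState:
    mode: int = 0  # 0 = maximal munch, -1 = minimal munch

    def next(self, token: str) -> list[SketchState]:
        # Simplest transition: replace the current state with itself
        return [self]

    def munch(self, string: str, trie: PrefixTable) -> tuple[str, str, list[SketchState]]:
        candidates = trie.candidates(string)
        if not candidates:
            raise ValueError(f"no token matches the start of {string!r}")
        # Index 0 picks the longest match, index -1 the shortest
        token, remainder = candidates[self.mode]
        return token, remainder, self.next(token)


class MinState(SketchState):
    mode = -1


table = PrefixTable(["s", "sin", "sin("])
print(SketchState().munch("sin(X)", table)[:2])  # ('sin(', 'X)')
print(MinState().munch("sin(X)", table)[:2])     # ('s', 'in(X)')
```

Returning self.next(token) as the third element mirrors the documented return value: the token, the unconsumed remainder, and the states to push.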
Overridden in: tivars.tokenizer.state.Line, tivars.tokenizer.state.SmartMode

def next(self, token: TIToken) -> list[EncoderState]:
Determines the next encoder state given a token.
The current state is popped from the stack, and the states returned by this method are pushed. If the list of returned states is:
- empty, the encoder is exiting the current state;
- of length one, the encoder's current state is being replaced by a new state;
- of length two, the encoder is entering a new state and can later exit back to this one.
| Parameters | |
| --- | --- |
| token: TIToken | The current token |

| Returns | |
| --- | --- |
| list[EncoderState] | A list of encoder states to add to the stack |
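The pop-then-push rule can be sketched with a plain Python list as the stack, where the last element is the top. String labels stand in for real EncoderState objects, and apply_transition is a hypothetical helper. For the length-two case this assumes the first returned state is the one to exit back to and the second is the new state, so the new state ends up on top.

```python
from __future__ import annotations

from typing import TypeVar

S = TypeVar("S")


def apply_transition(stack: list[S], new_states: list[S]) -> None:
    """Apply one transition in place: pop the current state, push the returned ones."""
    stack.pop()               # the current (top) state is always popped
    stack.extend(new_states)  # pushed in order, so the last returned state ends up on top


# String labels stand in for real EncoderState objects
stack = ["Outer"]
apply_transition(stack, ["Outer", "Inner"])  # enter:   ["Outer", "Inner"]
apply_transition(stack, ["Replacement"])     # replace: ["Outer", "Replacement"]
apply_transition(stack, [])                  # exit:    ["Outer"]
```

Because a uniform pop precedes every push, all three cases share one code path; the length of the returned list alone decides whether the encoder exits, replaces, or nests.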
Overridden in: tivars.tokenizer.state.InterpolationStart, tivars.tokenizer.state.MaxMode, tivars.tokenizer.state.MinMode, tivars.tokenizer.state.Name, tivars.tokenizer.state.SmartMode, tivars.tokenizer.state.String

mode (class variable)
Whether to munch maximally (0) or minimally (-1).