module documentation

Context-aware text encoder

Function encode: Encodes a text string of tokens into a byte stream and determines its minimum supported OS version
Function normalize: Applies NFC normalization to a string so that certain Unicode characters used as token names are recognized
def encode(string: str, *, trie: TokenTrie = None, mode: str = None, normalize: bool = True) -> tuple[bytes, OsVersion]:

Encodes a text string of tokens into a byte stream and determines its minimum supported OS version

Tokenization is performed using one of three procedures, dictated by mode:
  • max: Always munch maximally, i.e. consume the most input possible to produce a token
  • smart: Munch maximally or minimally depending on context
  • string: Always munch minimally (equivalent to the string context of smart)
The smart tokenization mode uses the following contexts, munching maximally otherwise:
  • Strings: munch minimally, except when interpolating using Send(
  • Program names: munch minimally up to 8 tokens
  • List names: munch minimally up to 5 tokens
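
The difference between maximal and minimal munching can be illustrated with a small self-contained sketch. It does not use this module's TokenTrie; the TOY_TOKENS table, its byte values, and the munch helper are hypothetical names introduced only for illustration, and the context switching of the smart mode is not modeled.

```python
# Toy token table: names and byte values are illustrative assumptions,
# not the real TI-84+CE token sheet.
TOY_TOKENS = {
    "s": b"\x01",
    "i": b"\x02",
    "n": b"\x03",
    "(": b"\x04",
    "sin(": b"\x05",
}


def munch(text: str, tokens: dict[str, bytes], maximal: bool) -> list[str]:
    """Split text into token names, preferring the longest match at each
    position when maximal is True and the shortest match otherwise."""
    longest = max(map(len, tokens))
    result = []
    pos = 0
    while pos < len(text):
        # Candidate lengths, tried longest-first or shortest-first
        lengths = range(1, longest + 1)
        if maximal:
            lengths = reversed(lengths)
        for length in lengths:
            candidate = text[pos:pos + length]
            if candidate in tokens:
                result.append(candidate)
                pos += len(candidate)
                break
        else:
            raise ValueError(f"no token matches at position {pos}")
    return result


def to_bytes(names: list[str], tokens: dict[str, bytes]) -> bytes:
    """Concatenate the byte values of the chosen tokens."""
    return b"".join(tokens[name] for name in names)


max_names = munch("sin(", TOY_TOKENS, maximal=True)   # one "sin(" token
min_names = munch("sin(", TOY_TOKENS, maximal=False)  # four single-character tokens
```

In this toy model, maximal munching turns the text sin( into a single function token, while minimal munching keeps it as four separate characters, which is the behavior wanted inside strings and names.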
For reference, here are the tokenization modes utilized by popular IDEs and other software:
  • SourceCoder: max
  • TokenIDE: max
  • TI Connect CE: smart
  • TI-Planet Project Builder: smart
  • tivars_lib_cpp: smart

All tokenization modes respect token glyphs for substituting Unicode symbols.

Parameters
string (str): The text string to encode
trie (TokenTrie): The TokenTrie object to use for tokenization (defaults to the TI-84+CE trie)
mode (str): The tokenization mode to use (defaults to smart)
normalize (bool): Whether to apply NFC normalization to the input before encoding (defaults to True)
Returns
tuple[bytes, OsVersion]: A tuple of a stream of token bytes and a minimum OsVersion
def normalize(string: str):

Applies NFC normalization to a string so that certain Unicode characters used as token names are recognized

Parameters
string (str): The text to normalize
Returns
The NFC-normalized form of string
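
As a brief illustration of what NFC normalization does, the sketch below uses the standard library's unicodedata module directly. That normalize wraps this exact call is an assumption, and it may perform further substitutions that are not shown here.

```python
import unicodedata

# "e" followed by a combining acute accent (two code points)
decomposed = "e\u0301"

# NFC composes the pair into the single precomposed character U+00E9
composed = unicodedata.normalize("NFC", decomposed)

# The two strings render alike but compare unequal until normalized
same_before = decomposed == "\u00e9"
same_after = composed == "\u00e9"
```

Without this step, a token name typed in decomposed form would not match a token whose name is stored in precomposed form.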