Tokenizer

class Tokenizer

Encodes text to sequences and decodes sequences to text.

encode

def encode(text)

Encodes a given string into a sequence of indices.

Args
  • text: Text to encode.

decode

def decode(sequence)

Decodes a given sequence of indices into text.

Args
  • sequence: Sequence to decode.
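
The reference does not say how text is split into tokens. As a minimal sketch, assuming a character-level scheme and a hypothetical class named `CharTokenizer` (neither comes from the original), the following shows the round trip that encode and decode are expected to support:

```python
class CharTokenizer:
    """Hypothetical character-level tokenizer; illustrative only."""

    def __init__(self, alphabet):
        # Map each distinct character to an index and back.
        self._itos = sorted(set(alphabet))
        self._stoi = {ch: i for i, ch in enumerate(self._itos)}

    def encode(self, text):
        # Text -> list of integer indices. Unknown characters raise KeyError.
        return [self._stoi[ch] for ch in text]

    def decode(self, sequence):
        # List of integer indices -> text.
        return "".join(self._itos[i] for i in sequence)


tokenizer = CharTokenizer("abc ")
ids = tokenizer.encode("a cab")
text = tokenizer.decode(ids)
```

In this sketch, decoding the output of encode returns the original string. A real tokenizer may not guarantee that, for example if it normalizes whitespace or maps unknown text to a placeholder token.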

vocab_size

def vocab_size()

Returns the number of tokens in the vocabulary.
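
A common use of the vocabulary size is to bound the indices that encode can return, for instance when sizing an embedding table. The sketch below assumes a hypothetical whitespace tokenizer named `WordTokenizer` that reserves index 0 for unknown words; both the class and that convention are illustrative assumptions, and decode is omitted for brevity:

```python
class WordTokenizer:
    """Hypothetical whitespace tokenizer; illustrative only."""

    def __init__(self, words):
        # Index 0 is reserved for unknown words (an assumption of this sketch).
        self._itos = ["<unk>"] + sorted(set(words))
        self._stoi = {w: i for i, w in enumerate(self._itos)}

    def encode(self, text):
        # Split on whitespace; words outside the vocabulary map to index 0.
        return [self._stoi.get(w, 0) for w in text.split()]

    def vocab_size(self):
        # Number of distinct indices encode() can produce.
        return len(self._itos)


tokenizer = WordTokenizer(["hello", "world"])
ids = tokenizer.encode("hello there world")
in_range = all(0 <= i < tokenizer.vocab_size() for i in ids)
```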