Shortcuts

ModelTokenizer

class torchtune.modules.transforms.tokenizers.ModelTokenizer(*args, **kwargs)[source]

Abstract tokenizer that implements model-specific special token logic in the tokenize_messages method. See Llama3Tokenizer for an example implementation of this protocol.

tokenize_messages(messages: list[torchtune.data._messages.Message], **kwargs: dict[str, Any]) tuple[list[int], list[bool]][source]

Given a list of messages, return a list of tokens and list of masks for the concatenated and formatted messages.

Parameters:
  • messages (list[Message]) – The list of messages to tokenize.

  • **kwargs (dict[str, Any]) – kwargs.

Returns:

The list of token ids and the list of masks.

Return type:

tuple[list[int], list[bool]]

Docs

Access comprehensive developer documentation for PyTorch

View Docs

Tutorials

Get in-depth tutorials for beginners and advanced developers

View Tutorials

Resources

Find development resources and get your questions answered

View Resources