easydel.modules.mistral3.mistral3_tokenizer

class easydel.modules.mistral3.mistral3_tokenizer.Mistral3Tokenizer(mistral_tokenizer: None)

Bases: object

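A wrapper that gives a mistral-common MistralTokenizer the interface of a Hugging Face transformers tokenizer: the pad_token_id, eos_token_id, and bos_token_id attributes, plus encode, decode, __call__, and apply_chat_template. Code written against the transformers tokenizer API can therefore drive a Mistral tokenizer through the same calls it uses for other tokenizers.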
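The adapter idea can be sketched in a few lines. Both classes below are hypothetical: ToyMistralBackend stands in for the mistral-common tokenizer (its encode(text, bos, eos) signature is an assumption), and HFStyleTokenizer is a simplified wrapper, not EasyDeL's implementation. Callers only touch Hugging Face-style names.

```python
from typing import Any


class ToyMistralBackend:
    """Hypothetical stand-in for a mistral-common tokenizer."""

    bos_id = 1
    eos_id = 2
    pad_id = 0

    def encode(self, text: str, bos: bool, eos: bool) -> list[int]:
        # Toy vocabulary: each whitespace-separated word maps to 10 + its length.
        ids = [10 + len(word) for word in text.split()]
        return ([self.bos_id] if bos else []) + ids + ([self.eos_id] if eos else [])


class HFStyleTokenizer:
    """Hypothetical adapter exposing Hugging Face-style names over the backend."""

    def __init__(self, backend: ToyMistralBackend) -> None:
        self.backend = backend
        # Re-export special-token IDs under the attribute names transformers code reads.
        self.bos_token_id = backend.bos_id
        self.eos_token_id = backend.eos_id
        self.pad_token_id = backend.pad_id

    def __call__(self, texts: list[str], add_special_tokens: bool = True) -> dict[str, Any]:
        batch = [
            self.backend.encode(t, bos=add_special_tokens, eos=add_special_tokens)
            for t in texts
        ]
        # Right-pad every sequence to the longest one; real tokens get mask 1.
        width = max(len(ids) for ids in batch)
        input_ids = [ids + [self.pad_token_id] * (width - len(ids)) for ids in batch]
        attention_mask = [[1] * len(ids) + [0] * (width - len(ids)) for ids in batch]
        return {"input_ids": input_ids, "attention_mask": attention_mask}
```

Code that relies only on pad_token_id, __call__, and the input_ids/attention_mask keys keeps working whichever backend sits underneath, which is the consistency such a wrapper is meant to provide.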

mistral_tokenizer

The original MistralTokenizer instance.

pad_token_id

The ID of the padding token.

eos_token_id

The ID of the end-of-sequence token.

bos_token_id

The ID of the beginning-of-sequence token.

apply_chat_template(conversation: list[dict[str, str]], tokenize: bool = True, add_special_tokens: bool = True, padding: bool = False, truncation: bool = False, max_length: int | None = None, return_tensors: str | None = None, **kwargs) → str | list[int] | dict[str, Any]

Applies a chat template to a conversation history.

Parameters
  • conversation – A list of message dictionaries, each with ‘role’ and ‘content’.

  • tokenize – If False, return the formatted conversation as a string; if True, return it tokenized.

  • add_special_tokens – Whether to add special tokens.

  • padding – Whether to pad the sequences.

  • truncation – Whether to truncate the sequences.

  • max_length – The maximum length for truncation or padding.

  • return_tensors – The tensor format for the output (e.g., ‘np’).

Returns

The formatted string when tokenize is False; otherwise a list of token IDs or, depending on the other options, a dict of encoded outputs.
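When tokenize is True, padding, truncation, and max_length shape the token IDs. The hypothetical helper below sketches that post-processing for one encoded conversation, assuming right-side truncation (keep the first max_length tokens) and that padding=True pads up to max_length; the wrapper's actual padding side and strategy are not documented here.

```python
from __future__ import annotations


def pad_and_truncate(
    ids: list[int],
    pad_token_id: int,
    padding: bool = False,
    truncation: bool = False,
    max_length: int | None = None,
) -> dict[str, list[int]]:
    """Hypothetical helper: HF-style truncation and padding of a single sequence."""
    if truncation and max_length is not None:
        # Keep only the first max_length tokens (right-side truncation).
        ids = ids[:max_length]
    mask = [1] * len(ids)
    if padding and max_length is not None and len(ids) < max_length:
        # Right-pad with pad_token_id; padded positions get attention mask 0.
        extra = max_length - len(ids)
        ids = ids + [pad_token_id] * extra
        mask = mask + [0] * extra
    return {"input_ids": ids, "attention_mask": mask}
```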

batch_encode_plus(*args, **kwargs) → dict[str, Any]

Alias for __call__ for Hugging Face compatibility.
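In Python such an alias can simply forward its arguments to __call__, so both names return exactly what calling the tokenizer directly returns. AliasDemo below is a hypothetical toy (one ID per word, equal to the word's length), not the wrapper's code.

```python
from typing import Any


class AliasDemo:
    """Hypothetical tokenizer whose *encode_plus methods forward to __call__."""

    def __call__(self, text, **kwargs) -> dict[str, Any]:
        # Extra keyword arguments are accepted and ignored in this toy.
        def encode(s: str) -> list[int]:
            return [len(word) for word in s.split()]

        if isinstance(text, str):
            return {"input_ids": encode(text)}
        return {"input_ids": [encode(t) for t in text]}

    def encode_plus(self, *args, **kwargs) -> dict[str, Any]:
        return self(*args, **kwargs)

    def batch_encode_plus(self, *args, **kwargs) -> dict[str, Any]:
        return self(*args, **kwargs)
```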

decode(token_ids: list[int], skip_special_tokens: bool = True) → str

Decodes a list of token IDs back into a string.

Parameters
  • token_ids – The list of token IDs to decode.

  • skip_special_tokens – Whether to remove special tokens from the decoded string.

Returns

The decoded text string.
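Conceptually, skip_special_tokens=True drops control-token IDs before the remaining IDs are turned back into text. The sketch below uses a hypothetical five-entry vocabulary and special-ID set; mistral-common's real decoding and special-token handling are more involved.

```python
# Hypothetical vocabulary and special-token IDs (pad, BOS, EOS).
VOCAB = {0: "<pad>", 1: "<s>", 2: "</s>", 10: "Hello", 11: " world"}
SPECIAL_IDS = {0, 1, 2}


def decode(token_ids: list[int], skip_special_tokens: bool = True) -> str:
    """Hypothetical decoder illustrating the skip_special_tokens flag."""
    if skip_special_tokens:
        # Filter out control tokens such as BOS, EOS, and padding.
        token_ids = [t for t in token_ids if t not in SPECIAL_IDS]
    return "".join(VOCAB[t] for t in token_ids)
```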

encode(text: str, add_special_tokens: bool = True) → list[int]

Encodes a single string into a list of token IDs.

This method maps the add_special_tokens flag to the bos and eos arguments of the underlying Mistral tokenizer.

Parameters
  • text – The input text to encode.

  • add_special_tokens – Whether to add special tokens (BOS/EOS).

Returns

A list of token IDs.
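The flag translation described above can be sketched as follows. ToyBackend and hf_encode are hypothetical: the backend's encode(text, bos, eos) signature is an assumption modeled on the bos and eos arguments mentioned above, and the sketch assumes both switches simply follow add_special_tokens.

```python
class ToyBackend:
    """Hypothetical backend whose encode() takes separate bos/eos switches."""

    BOS, EOS = 1, 2

    def encode(self, text: str, bos: bool, eos: bool) -> list[int]:
        # Toy vocabulary: one ID per character, offset past the special IDs.
        ids = [100 + ord(c) for c in text]
        return ([self.BOS] if bos else []) + ids + ([self.EOS] if eos else [])


def hf_encode(backend: ToyBackend, text: str, add_special_tokens: bool = True) -> list[int]:
    # One Hugging Face-style flag drives both Mistral-style switches.
    return backend.encode(text, bos=add_special_tokens, eos=add_special_tokens)
```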

encode_plus(*args, **kwargs) → dict[str, Any]

Alias for __call__ for Hugging Face compatibility.

classmethod from_hf_hub(model_name: str = 'mistralai/Mistral-Nemo-Instruct-2407')

Creates an instance from a model name on the Hugging Face Hub.

Parameters

model_name – The Hugging Face Hub repository ID of the Mistral model.

Returns

An instance of Mistral3Tokenizer.