easydel.modules.mistral3.mistral3_tokenizer
- class easydel.modules.mistral3.mistral3_tokenizer.Mistral3Tokenizer(mistral_tokenizer: None)
Bases: object
A wrapper class that makes the mistral-common tokenizer behave like a Hugging Face transformers tokenizer. This keeps the API consistent in projects that use several different tokenizers.
- mistral_tokenizer
The original MistralTokenizer instance.
- pad_token_id
The ID of the padding token.
- eos_token_id
The ID of the end-of-sequence token.
- bos_token_id
The ID of the beginning-of-sequence token.
- apply_chat_template(conversation: list[dict[str, str]], tokenize: bool = True, add_special_tokens: bool = True, padding: bool = False, truncation: bool = False, max_length: int | None = None, return_tensors: str | None = None, **kwargs) → str | list[int] | dict[str, Any]
Applies a chat template to a conversation history.
- Parameters
conversation – A list of message dictionaries, each with ‘role’ and ‘content’.
tokenize – If False, returns the formatted string. If True, tokenizes it.
add_special_tokens – Whether to add special tokens.
padding – Whether to pad the sequences.
truncation – Whether to truncate the sequences.
max_length – The maximum length for truncation or padding.
return_tensors – The tensor format for the output (e.g., ‘np’).
- Returns
The processed output, which can be a string, list of IDs, or a dict.
- batch_encode_plus(*args, **kwargs) → dict[str, Any]
Alias for __call__ for Hugging Face compatibility.
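The alias pattern can be sketched with a toy class. ToyTokenizer and its character-code "tokenization" are hypothetical and only show how batch_encode_plus can forward every argument to __call__; the real wrapper's __call__ signature is not documented here.

```python
from typing import Any


class ToyTokenizer:
    """Hypothetical stand-in, not the Mistral3Tokenizer implementation."""

    def __call__(self, texts: list[str], **kwargs) -> dict[str, Any]:
        # Toy encoding: one ID per character (its Unicode code point).
        return {"input_ids": [[ord(ch) for ch in text] for text in texts]}

    def batch_encode_plus(self, *args, **kwargs) -> dict[str, Any]:
        # Forward everything unchanged so both entry points stay in sync.
        return self(*args, **kwargs)
```

Delegating instead of duplicating the logic means code written against the older Hugging Face method name gets the same result as a direct call.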
- decode(token_ids: list[int], skip_special_tokens: bool = True) → str
Decodes a list of token IDs back into a string.
- Parameters
token_ids – The list of token IDs to decode.
skip_special_tokens – Whether to remove special tokens from the decoded string.
- Returns
The decoded text string.
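The effect of skip_special_tokens can be illustrated with a toy byte-level decoder. The special-token IDs, their text forms, and the offset of 3 are all hypothetical values chosen for this sketch; they are not the Mistral vocabulary.

```python
# Hypothetical special tokens for this sketch: ID -> printed form.
SPECIAL_TOKENS = {0: "<pad>", 1: "<s>", 2: "</s>"}
OFFSET = len(SPECIAL_TOKENS)  # regular IDs start after the special ones


def toy_decode(token_ids: list[int], skip_special_tokens: bool = True) -> str:
    buffer = bytearray()
    for token_id in token_ids:
        if token_id in SPECIAL_TOKENS:
            if not skip_special_tokens:
                # Keep the marker text when special tokens are requested.
                buffer.extend(SPECIAL_TOKENS[token_id].encode("utf-8"))
            continue
        # Regular IDs map back to raw bytes.
        buffer.append(token_id - OFFSET)
    return buffer.decode("utf-8")
```

With the default flag the markers disappear from the output; passing skip_special_tokens=False keeps them visible, which can help when debugging prompt formatting.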
- encode(text: str, add_special_tokens: bool = True) → list[int]
Encodes a single string into a list of token IDs.
This method maps the add_special_tokens flag to the bos and eos arguments of the underlying Mistral tokenizer.
- Parameters
text – The input text to encode.
add_special_tokens – Whether to add special tokens (BOS/EOS).
- Returns
A list of token IDs.
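The flag mapping described for encode can be sketched as below. ToyBaseTokenizer is a hypothetical stand-in for the underlying tokenizer and does not reproduce the mistral-common API; the sketch also assumes that a single add_special_tokens value is passed to both bos and eos.

```python
class ToyBaseTokenizer:
    """Hypothetical underlying tokenizer with separate bos/eos switches."""

    BOS_ID = 1
    EOS_ID = 2

    def encode(self, text: str, bos: bool, eos: bool) -> list[int]:
        # Toy byte-level encoding, shifted past the special-token IDs.
        ids = [byte + 3 for byte in text.encode("utf-8")]
        prefix = [self.BOS_ID] if bos else []
        suffix = [self.EOS_ID] if eos else []
        return prefix + ids + suffix


class ToyWrapper:
    """Hypothetical wrapper exposing a Hugging Face-style encode()."""

    def __init__(self, base: ToyBaseTokenizer) -> None:
        self.base = base

    def encode(self, text: str, add_special_tokens: bool = True) -> list[int]:
        # Assumption: one flag drives both underlying arguments.
        return self.base.encode(
            text, bos=add_special_tokens, eos=add_special_tokens
        )
```

Calling ToyWrapper(ToyBaseTokenizer()).encode("hi") yields IDs wrapped in the BOS and EOS markers, while add_special_tokens=False returns only the content IDs.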