easydel.inference.tools.parsers.llama_tool_parser

class easydel.inference.tools.parsers.llama_tool_parser.Llama3JsonToolParser(tokenizer: PreTrainedTokenizerBase)

Bases: ToolParser

Tool call parser for Llama 3.x and 4 models with JSON format.

Intended for use with the examples/tool_chat_template_llama.jinja template. Handles JSON-formatted tool calls with support for both single and multiple tool invocations separated by semicolons.

Supported formats:
  • Single: {"name": "func", "arguments": {…}}

  • Multiple: {"name": "func1", …}; {"name": "func2", …}

  • Uses the <|python_tag|> token as an optional marker

  • Supports both "arguments" and "parameters" field names
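As a rough illustration of the text shapes listed above, the following sketch builds example model outputs with the standard json module. The function names (get_weather, get_time) and their arguments are made up for the example and do not come from EasyDeL.

```python
import json

# Optional marker token named in the format description above.
PYTHON_TAG = "<|python_tag|>"

# Hypothetical tool calls, used only to show the text layout.
call_a = {"name": "get_weather", "arguments": {"city": "Paris"}}
call_b = {"name": "get_time", "parameters": {"tz": "UTC"}}  # "parameters" variant

# Single call: one JSON object.
single = json.dumps(call_a)

# Multiple calls: JSON objects separated by semicolons.
multiple = "; ".join(json.dumps(call) for call in (call_a, call_b))

# The marker token may precede the JSON.
with_marker = PYTHON_TAG + single

print(single)
print(multiple)
print(with_marker)
```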

Used when --enable-auto-tool-choice is set together with --tool-call-parser llama3_json or --tool-call-parser llama4_json.

bot_token

Special token marking tool calls

Type

str

tool_call_regex

Pattern for extracting JSON tool calls

Type

re.Pattern

prev_tool_call_arr

Previous tool calls for streaming comparison

Type

list

extract_tool_calls(model_output: str, request: ChatCompletionRequest) -> ExtractedToolCallInformation

Extract tool calls from complete Llama model response.

Extracts JSON content and ignores surrounding plain text. Supports both single JSON and multiple JSONs separated by semicolons. Handles both “arguments” and “parameters” field names for compatibility.

Parameters
  • model_output – Complete model output text

  • request – Original request (unused)

Returns

ExtractedToolCallInformation with parsed JSON tool calls
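The behaviour described above (skip surrounding plain text, accept several JSON objects, accept either field name) can be sketched in standalone Python. This is not the EasyDeL implementation: parse_llama_json_tool_calls is a hypothetical helper, it returns plain dicts rather than ExtractedToolCallInformation, and it scans with json.JSONDecoder.raw_decode instead of the class's tool_call_regex.

```python
import json

BOT_TOKEN = "<|python_tag|>"  # optional marker named in the class description


def parse_llama_json_tool_calls(model_output: str) -> list:
    """Hypothetical helper: collect tool calls as {"name", "arguments"} dicts."""
    text = model_output.replace(BOT_TOKEN, "")
    decoder = json.JSONDecoder()
    calls = []
    pos = 0
    while True:
        start = text.find("{", pos)
        if start == -1:
            break  # no further JSON candidates
        try:
            # raw_decode parses one JSON value and reports where it ended,
            # so trailing text such as "; {...}" is left for the next pass.
            obj, end = decoder.raw_decode(text, start)
        except json.JSONDecodeError:
            pos = start + 1  # not valid JSON here; keep scanning
            continue
        pos = end
        if isinstance(obj, dict) and "name" in obj:
            # Accept either field name for the call's arguments.
            args = obj.get("arguments", obj.get("parameters", {}))
            calls.append({"name": obj["name"], "arguments": args})
    return calls


if __name__ == "__main__":
    output = (
        'Let me check. <|python_tag|>{"name": "get_weather", "parameters": '
        '{"city": "Paris"}}; {"name": "get_time", "arguments": {"tz": "UTC"}}'
    )
    for call in parse_llama_json_tool_calls(output):
        print(call["name"], call["arguments"])
```

Scanning with raw_decode keeps the sketch independent of any particular regular expression; the actual parser may treat edge cases such as nested objects or malformed JSON differently.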

extract_tool_calls_streaming(previous_text: str, current_text: str, delta_text: str, previous_token_ids: Sequence[int], current_token_ids: Sequence[int], delta_token_ids: Sequence[int], request: ChatCompletionRequest) -> easydel.inference.openai_api_modules.DeltaMessage | None

Extract tool calls from streaming model output.

Processes incremental model output to identify partial tool calls and emit appropriate streaming updates. Maintains state across calls to handle incomplete JSON/XML structures.

Parameters
  • previous_text – Text accumulated up to previous call

  • current_text – Text accumulated including current chunk

  • delta_text – New text in current chunk

  • previous_token_ids – Token IDs up to previous call

  • current_token_ids – Token IDs including current chunk

  • delta_token_ids – New token IDs in current chunk

  • request – Original request with tool definitions

Returns

Incremental tool call update, or None if no update

Return type

DeltaMessage

Raises

NotImplementedError – Must be implemented by subclasses

Note

This method is stateful: it uses instance variables to track parsing progress across streaming chunks.
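To make the stateful behaviour concrete, here is a simplified standalone sketch. StreamingCallTracker is a hypothetical class, not part of EasyDeL. It reports a call only once its JSON is complete, whereas the real method may emit partial updates as DeltaMessage objects. The shared idea is that an instance attribute remembers what earlier chunks already produced.

```python
import json
from typing import Optional


class StreamingCallTracker:
    """Hypothetical stand-in showing state carried across streaming chunks."""

    def __init__(self) -> None:
        self.emitted = 0  # number of complete calls already reported

    def update(self, current_text: str) -> Optional[list]:
        """Return calls completed since the last update, or None if there are none."""
        decoder = json.JSONDecoder()
        calls = []
        pos = 0
        while (start := current_text.find("{", pos)) != -1:
            try:
                obj, pos = decoder.raw_decode(current_text, start)
            except json.JSONDecodeError:
                break  # trailing JSON is still incomplete; wait for more text
            calls.append(obj)
        new_calls = calls[self.emitted:]
        self.emitted = len(calls)
        return new_calls or None


if __name__ == "__main__":
    tracker = StreamingCallTracker()
    chunks = ['{"name": "f", "argu', 'ments": {"x": 1}}', '; {"na']
    text = ""
    for chunk in chunks:
        text += chunk  # plays the role of current_text
        print(repr(chunk), "->", tracker.update(text))
```

Because the tracker keeps its count between calls, a fresh instance is needed for each response stream.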