easydel.inference.oai_proxies#

Enhanced FastAPI server that proxies requests to OpenAI API.

This module provides a proxy server that forwards requests to OpenAI’s API while adding EasyDeL-specific monitoring and compatibility features. It enables seamless integration between EasyDeL inference engines and OpenAI-compatible clients.

Classes:

InferenceApiRouter: Main proxy server class with OpenAI API compatibility ServerStatus: Enum for server operational states ServerMetrics: Performance metrics tracking EndpointConfig: API endpoint configuration ErrorResponse: Standardized error response format

Example

>>> from easydel.inference import InferenceApiRouter
>>> # Create a proxy to OpenAI API
>>> router = InferenceApiRouter(
...     api_key="your-api-key",
...     base_url="https://api.openai.com/v1"
... )
>>> router.run(host="0.0.0.0", port=8084)
>>> # Or proxy to a local EasyDeL server
>>> router = InferenceApiRouter(
...     base_url="http://localhost:8000/v1",
...     enable_function_calling=True
... )
>>> router.run()
class easydel.inference.oai_proxies.EndpointConfig(*, path: str, handler: Callable, methods: list[str], summary: str | None = None, tags: list[str] | None = None, response_model: Any = None)[source]#

Bases: BaseModel

Configuration for a FastAPI endpoint.

handler: tp.Callable#
methods: list[str]#
model_config: ClassVar[ConfigDict] = {}#

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

path: str#
response_model: tp.Any#
summary: str | None#
tags: list[str] | None#
class easydel.inference.oai_proxies.ErrorResponse(*, error: dict[str, str], request_id: str | None = None, timestamp: float = <factory>)[source]#

Bases: BaseModel

Standard error response model.

error: dict[str, str]#
model_config: ClassVar[ConfigDict] = {}#

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

request_id: str | None#
timestamp: float#
class easydel.inference.oai_proxies.InferenceApiRouter(api_key: str | None = None, base_url: str | None = None, organization: str | None = None, enable_function_calling: bool = True, **kwargs)[source]#

Bases: object

Enhanced FastAPI server acting as an OpenAI API proxy.

This server provides a complete OpenAI API-compatible interface that can forward requests to either OpenAI’s API or a local EasyDeL inference server. It includes additional monitoring, health check, and function calling endpoints.

The router automatically detects backend capabilities and provides appropriate fallbacks when features are not available.

client#

AsyncOpenAI client for backend communication

app#

FastAPI application instance

status#

Current server status

metrics#

Performance metrics tracker

base_url#

Backend API base URL

enable_function_calling#

Whether function calling is enabled

build_oai_params_from_chat_request(request: ChatCompletionRequest) dict[str, float | int | str | bool | list][source]#

Build OpenAI parameters from chat completion request.

Converts a ChatCompletionRequest object into a dictionary of parameters suitable for the OpenAI API, including function calling parameters if present.

Parameters

request – The chat completion request to convert

Returns

Dictionary of OpenAI API parameters with optional tool/function definitions

build_oai_params_from_request(request: CompletionRequest) dict[str, float | int | str | bool | list][source]#

Build OpenAI parameters from completion request.

Converts a CompletionRequest object into a dictionary of parameters suitable for the OpenAI API.

Parameters

request – The completion request to convert

Returns

Dictionary of OpenAI API parameters

async chat_completions(request: ChatCompletionRequest) Any[source]#

Handle chat completion requests with function calling support. (POST /v1/chat/completions)

async completions(request: CompletionRequest) Any[source]#

Handle completion requests. (POST /v1/completions)

async execute_tool(request: Request) JSONResponse[source]#

Execute a tool/function call. (POST /v1/tools/execute)

fire(host: str = '0.0.0.0', port: int = 8084, log_level: str = 'info', ssl_keyfile: str | None = None, ssl_certfile: str | None = None, workers: int = 1, reload: bool = False) None#

Start the server with enhanced configuration.

Parameters
  • host – Host address to bind to

  • port – Port to listen on

  • log_level – Logging level

  • ssl_keyfile – Path to SSL key file

  • ssl_certfile – Path to SSL certificate file

  • workers – Number of worker processes

  • reload – Enable auto-reload for development

async get_metrics() JSONResponse[source]#

Get server performance metrics. (GET /metrics)

async get_model(model_id: str) JSONResponse[source]#

Get detailed information about a specific model. (GET /v1/models/{model_id})

async health_check() JSONResponse[source]#

Comprehensive health check. (GET /health)

async list_models() JSONResponse[source]#

List available models with metadata. (GET /v1/models)

async list_tools() JSONResponse[source]#

List available tools/functions for each model. (GET /v1/tools)

async liveness() JSONResponse[source]#

Liveness check endpoint. (GET /liveness)

process_request_params(openai_params: dict, request: easydel.inference.openai_api_modules.ChatCompletionRequest | easydel.inference.openai_api_modules.CompletionRequest) tuple[dict, pydantic.main.BaseModel | None][source]#

Process request parameters before sending to OpenAI.

Hook for subclasses to modify parameters or extract metadata before forwarding to the backend.

Parameters
  • openai_params – Dictionary of OpenAI API parameters

  • request – Original request object

Returns

Tuple of (processed_params, optional_metadata)

async readiness() JSONResponse[source]#

Readiness check endpoint. (GET /readiness)

run(host: str = '0.0.0.0', port: int = 8084, log_level: str = 'info', ssl_keyfile: str | None = None, ssl_certfile: str | None = None, workers: int = 1, reload: bool = False) None[source]#

Start the server with enhanced configuration.

Parameters
  • host – Host address to bind to

  • port – Port to listen on

  • log_level – Logging level

  • ssl_keyfile – Path to SSL key file

  • ssl_certfile – Path to SSL certificate file

  • workers – Number of worker processes

  • reload – Enable auto-reload for development

class easydel.inference.oai_proxies.ServerMetrics(total_requests: int = 0, successful_requests: int = 0, failed_requests: int = 0, total_tokens_generated: int = 0, average_tokens_per_second: float = 0.0, uptime_seconds: float = 0.0, start_time: float = <factory>)[source]#

Bases: object

Server performance metrics.

average_tokens_per_second: float = 0.0#
failed_requests: int = 0#
start_time: float#
successful_requests: int = 0#
total_requests: int = 0#
total_tokens_generated: int = 0#
uptime_seconds: float = 0.0#
class easydel.inference.oai_proxies.ServerStatus(value, names=None, *, module=None, qualname=None, type=None, start=1, boundary=None)[source]#

Bases: str, Enum

Server status enumeration.

BUSY = 'busy'#
ERROR = 'error'#
READY = 'ready'#
SHUTTING_DOWN = 'shutting_down'#
STARTING = 'starting'#
easydel.inference.oai_proxies.create_error_response(status_code: HTTPStatus, message: str, request_id: str | None = None) JSONResponse[source]#

Creates a standardized JSON error response.