easydel.inference.oai_proxies

easydel.inference.oai_proxies#

Enhanced FastAPI server that proxies requests to OpenAI API.

This module provides a proxy server that forwards requests to OpenAI’s API while adding EasyDeL-specific monitoring and compatibility features. It enables seamless integration between EasyDeL inference engines and OpenAI-compatible clients.

Classes:: InferenceApiRouter: Main proxy server class with OpenAI API compatibility ServerStatus: Enum for server operational states ServerMetrics: Performance metrics tracking EndpointConfig: API endpoint configuration ErrorResponse: Standardized error response format

Example

>>> from easydel.inference import InferenceApiRouter
>>> # Create a proxy to OpenAI API
>>> router = InferenceApiRouter(
...     api_key="your-api-key",
...     base_url="https://api.openai.com/v1"
... )
>>> router.run(host="0.0.0.0", port=8084)

>>> # Or proxy to a local EasyDeL server
>>> router = InferenceApiRouter(
...     base_url="http://localhost:8000/v1",
...     enable_function_calling=True
... )
>>> router.run()

class easydel.inference.oai_proxies.EndpointConfig(*, path: str, handler: Callable, methods: list[str], summary: str | None = None, tags: list[str] | None = None, response_model: Any = None)[source]#

Bases: BaseModel

Configuration for a FastAPI endpoint.

handler: tp.Callable#

methods: list[str]#

model_config: ClassVar[ConfigDict] = {}#: Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

path: str#

response_model: tp.Any#

summary: str | None#

tags: list[str] | None#

class easydel.inference.oai_proxies.ErrorResponse(*, error: dict[str, str], request_id: str | None = None, timestamp: float = <factory>)[source]#

Bases: BaseModel

Standard error response model.

error: dict[str, str]#

model_config: ClassVar[ConfigDict] = {}#: Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

request_id: str | None#

timestamp: float#

class easydel.inference.oai_proxies.InferenceApiRouter(api_key: str | None = None, base_url: str | None = None, organization: str | None = None, enable_function_calling: bool = True, **kwargs)[source]#

Bases: object

Enhanced FastAPI server acting as an OpenAI API proxy.

This server provides a complete OpenAI API-compatible interface that can forward requests to either OpenAI’s API or a local EasyDeL inference server. It includes additional monitoring, health check, and function calling endpoints.

The router automatically detects backend capabilities and provides appropriate fallbacks when features are not available.

client#: AsyncOpenAI client for backend communication

app#: FastAPI application instance

status#: Current server status

metrics#: Performance metrics tracker

base_url#: Backend API base URL

enable_function_calling#: Whether function calling is enabled

build_oai_params_from_chat_request(request: ChatCompletionRequest) → dict[str, float | int | str | bool | list][source]#

Build OpenAI parameters from chat completion request.

Converts a ChatCompletionRequest object into a dictionary of parameters suitable for the OpenAI API, including function calling parameters if present.

Parameters: request – The chat completion request to convert
Returns: Dictionary of OpenAI API parameters with optional tool/function definitions

build_oai_params_from_request(request: CompletionRequest) → dict[str, float | int | str | bool | list][source]#

Build OpenAI parameters from completion request.

Converts a CompletionRequest object into a dictionary of parameters suitable for the OpenAI API.

Parameters: request – The completion request to convert
Returns: Dictionary of OpenAI API parameters

async chat_completions(request: ChatCompletionRequest) → Any[source]#: Handle chat completion requests with function calling support. (POST /v1/chat/completions)

async completions(request: CompletionRequest) → Any[source]#: Handle completion requests. (POST /v1/completions)

async execute_tool(request: Request) → JSONResponse[source]#: Execute a tool/function call. (POST /v1/tools/execute)

fire(host: str = '0.0.0.0', port: int = 8084, log_level: str = 'info', ssl_keyfile: str | None = None, ssl_certfile: str | None = None, workers: int = 1, reload: bool = False) → None#

Start the server with enhanced configuration.

Parameters

host – Host address to bind to
port – Port to listen on
log_level – Logging level
ssl_keyfile – Path to SSL key file
ssl_certfile – Path to SSL certificate file
workers – Number of worker processes
reload – Enable auto-reload for development

async get_metrics() → JSONResponse[source]#: Get server performance metrics. (GET /metrics)

async get_model(model_id: str) → JSONResponse[source]#: Get detailed information about a specific model. (GET /v1/models/{model_id})

async health_check() → JSONResponse[source]#: Comprehensive health check. (GET /health)

async list_models() → JSONResponse[source]#: List available models with metadata. (GET /v1/models)

async list_tools() → JSONResponse[source]#: List available tools/functions for each model. (GET /v1/tools)

async liveness() → JSONResponse[source]#: Liveness check endpoint. (GET /liveness)

process_request_params(openai_params: dict, request: easydel.inference.openai_api_modules.ChatCompletionRequest | easydel.inference.openai_api_modules.CompletionRequest) → tuple[dict, pydantic.main.BaseModel | None][source]#

Process request parameters before sending to OpenAI.

Hook for subclasses to modify parameters or extract metadata before forwarding to the backend.

Parameters

openai_params – Dictionary of OpenAI API parameters
request – Original request object

Returns

Tuple of (processed_params, optional_metadata)

async readiness() → JSONResponse[source]#: Readiness check endpoint. (GET /readiness)

run(host: str = '0.0.0.0', port: int = 8084, log_level: str = 'info', ssl_keyfile: str | None = None, ssl_certfile: str | None = None, workers: int = 1, reload: bool = False) → None[source]#

Start the server with enhanced configuration.

Parameters

host – Host address to bind to
port – Port to listen on
log_level – Logging level
ssl_keyfile – Path to SSL key file
ssl_certfile – Path to SSL certificate file
workers – Number of worker processes
reload – Enable auto-reload for development

class easydel.inference.oai_proxies.ServerMetrics(total_requests: int = 0, successful_requests: int = 0, failed_requests: int = 0, total_tokens_generated: int = 0, average_tokens_per_second: float = 0.0, uptime_seconds: float = 0.0, start_time: float = <factory>)[source]#

Bases: object

Server performance metrics.

average_tokens_per_second: float = 0.0#

failed_requests: int = 0#

start_time: float#

successful_requests: int = 0#

total_requests: int = 0#

total_tokens_generated: int = 0#

uptime_seconds: float = 0.0#

class easydel.inference.oai_proxies.ServerStatus(value, names=None, *, module=None, qualname=None, type=None, start=1, boundary=None)[source]#

Bases: str, Enum

Server status enumeration.

BUSY = 'busy'#

ERROR = 'error'#

READY = 'ready'#

SHUTTING_DOWN = 'shutting_down'#

STARTING = 'starting'#

easydel.inference.oai_proxies.create_error_response(status_code: HTTPStatus, message: str, request_id: str | None = None) → JSONResponse[source]#: Creates a standardized JSON error response.

easydel.inference.oai_proxies

Contents

easydel.inference.oai_proxies#