easydel.inference.oai_proxies#
Enhanced FastAPI server that proxies requests to the OpenAI API.
This module provides a proxy server that forwards requests to OpenAI’s API while adding EasyDeL-specific monitoring and compatibility features. It enables seamless integration between EasyDeL inference engines and OpenAI-compatible clients.
- Classes:
InferenceApiRouter: Main proxy server class with OpenAI API compatibility
ServerStatus: Enum for server operational states
ServerMetrics: Performance metrics tracking
EndpointConfig: API endpoint configuration
ErrorResponse: Standardized error response format
Example
>>> from easydel.inference import InferenceApiRouter
>>> # Create a proxy to OpenAI API
>>> router = InferenceApiRouter(
... api_key="your-api-key",
... base_url="https://api.openai.com/v1"
... )
>>> router.run(host="0.0.0.0", port=8084)
>>> # Or proxy to a local EasyDeL server
>>> router = InferenceApiRouter(
... base_url="http://localhost:8000/v1",
... enable_function_calling=True
... )
>>> router.run()
- class easydel.inference.oai_proxies.EndpointConfig(*, path: str, handler: Callable, methods: list[str], summary: str | None = None, tags: list[str] | None = None, response_model: Any = None)[source]#
Bases: BaseModel
Configuration for a FastAPI endpoint.
- handler: tp.Callable#
- methods: list[str]#
- model_config: ClassVar[ConfigDict] = {}#
Configuration for the model, should be a dictionary conforming to pydantic.config.ConfigDict.
- path: str#
- response_model: tp.Any#
- class easydel.inference.oai_proxies.ErrorResponse(*, error: dict[str, str], request_id: str | None = None, timestamp: float = <factory>)[source]#
Bases: BaseModel
Standard error response model.
- error: dict[str, str]#
- model_config: ClassVar[ConfigDict] = {}#
Configuration for the model, should be a dictionary conforming to pydantic.config.ConfigDict.
- timestamp: float#
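The shape of an error payload can be sketched with a standard-library stand-in. This is not the EasyDeL class: `ErrorResponseSketch` and `make_error` are hypothetical names, the `message` and `type` keys inside `error` are assumed, and the timestamp factory is assumed to record the creation time.

```python
from __future__ import annotations

import time
import uuid
from dataclasses import asdict, dataclass, field


@dataclass
class ErrorResponseSketch:
    """Stand-in mirroring the documented ErrorResponse fields (not the EasyDeL class)."""

    error: dict[str, str]
    request_id: str | None = None
    # Assumption: the real default factory records the creation time.
    timestamp: float = field(default_factory=time.time)


def make_error(message: str, error_type: str, request_id: str | None = None) -> dict:
    """Build a JSON-serializable error payload; the 'message'/'type' keys are assumed."""
    response = ErrorResponseSketch(
        error={"message": message, "type": error_type},
        request_id=request_id,
    )
    return asdict(response)


payload = make_error(
    "Model not found",
    "invalid_request_error",
    request_id=str(uuid.uuid4()),
)
```

Keeping `error` as a flat `dict[str, str]` matches the documented field type, so nested details would need to be serialized into strings first.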
- class easydel.inference.oai_proxies.InferenceApiRouter(api_key: str | None = None, base_url: str | None = None, organization: str | None = None, enable_function_calling: bool = True, **kwargs)[source]#
Bases: object
Enhanced FastAPI server acting as an OpenAI API proxy.
This server provides a complete OpenAI API-compatible interface that can forward requests to either OpenAI’s API or a local EasyDeL inference server. It includes additional monitoring, health check, and function calling endpoints.
The router automatically detects backend capabilities and provides appropriate fallbacks when features are not available.
- client#
AsyncOpenAI client for backend communication
- app#
FastAPI application instance
- status#
Current server status
- metrics#
Performance metrics tracker
- base_url#
Backend API base URL
- enable_function_calling#
Whether function calling is enabled
- build_oai_params_from_chat_request(request: ChatCompletionRequest) dict[str, float | int | str | bool | list][source]#
Build OpenAI parameters from chat completion request.
Converts a ChatCompletionRequest object into a dictionary of parameters suitable for the OpenAI API, including function calling parameters if present.
- Parameters
request – The chat completion request to convert
- Returns
Dictionary of OpenAI API parameters with optional tool/function definitions
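As a rough illustration of this kind of conversion, the sketch below copies the fields that are set on a request into a parameter dictionary and adds tool definitions only when they are present. The field names on `ChatRequestSketch` are a hypothetical subset chosen for the example, and `build_params` is not the EasyDeL method; the real implementation may select and rename fields differently.

```python
from __future__ import annotations

from dataclasses import dataclass, fields
from typing import Any


@dataclass
class ChatRequestSketch:
    """Hypothetical subset of ChatCompletionRequest fields, for illustration only."""

    model: str
    messages: list[dict[str, str]]
    temperature: float | None = None
    max_tokens: int | None = None
    stream: bool = False
    tools: list[dict[str, Any]] | None = None
    tool_choice: str | None = None


def build_params(request: ChatRequestSketch) -> dict[str, Any]:
    """Copy set fields into a params dict, adding tool fields only when tools exist."""
    tool_keys = {"tools", "tool_choice"}
    params = {
        f.name: getattr(request, f.name)
        for f in fields(request)
        if f.name not in tool_keys and getattr(request, f.name) is not None
    }
    if request.tools:
        params["tools"] = request.tools
        if request.tool_choice is not None:
            params["tool_choice"] = request.tool_choice
    return params
```

Dropping unset fields lets the backend apply its own defaults, and leaving out `tool_choice` when no tools are defined avoids sending a parameter the backend has nothing to apply it to.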
- build_oai_params_from_request(request: CompletionRequest) dict[str, float | int | str | bool | list][source]#
Build OpenAI parameters from completion request.
Converts a CompletionRequest object into a dictionary of parameters suitable for the OpenAI API.
- Parameters
request – The completion request to convert
- Returns
Dictionary of OpenAI API parameters
- async chat_completions(request: ChatCompletionRequest) Any[source]#
Handle chat completion requests with function calling support. (POST /v1/chat/completions)
- async completions(request: CompletionRequest) Any[source]#
Handle completion requests. (POST /v1/completions)
- async execute_tool(request: Request) JSONResponse[source]#
Execute a tool/function call. (POST /v1/tools/execute)
- fire(host: str = '0.0.0.0', port: int = 8084, log_level: str = 'info', ssl_keyfile: str | None = None, ssl_certfile: str | None = None, workers: int = 1, reload: bool = False) None#
Start the server with enhanced configuration.
- Parameters
host – Host address to bind to
port – Port to listen on
log_level – Logging level
ssl_keyfile – Path to SSL key file
ssl_certfile – Path to SSL certificate file
workers – Number of worker processes
reload – Enable auto-reload for development
- async get_model(model_id: str) JSONResponse[source]#
Get detailed information about a specific model. (GET /v1/models/{model_id})
- async list_tools() JSONResponse[source]#
List available tools/functions for each model. (GET /v1/tools)
- process_request_params(openai_params: dict, request: easydel.inference.openai_api_modules.ChatCompletionRequest | easydel.inference.openai_api_modules.CompletionRequest) tuple[dict, pydantic.main.BaseModel | None][source]#
Process request parameters before sending to OpenAI.
Hook for subclasses to modify parameters or extract metadata before forwarding to the backend.
- Parameters
openai_params – Dictionary of OpenAI API parameters
request – Original request object
- Returns
Tuple of (processed_params, optional_metadata)
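A subclass override of this hook might look like the following sketch. `RouterSketch` is a stand-in that exposes only the documented method signature, its pass-through default is an assumption, and `CappedRouter` with its `max_tokens_cap` attribute is hypothetical. The documented metadata type is a pydantic model or None; the sketch returns None to stay within the standard library.

```python
from __future__ import annotations

from typing import Any


class RouterSketch:
    """Stand-in for InferenceApiRouter exposing only the documented hook."""

    def process_request_params(
        self, openai_params: dict, request: Any
    ) -> tuple[dict, Any | None]:
        # Assumed default behavior: pass parameters through with no metadata.
        return openai_params, None


class CappedRouter(RouterSketch):
    """Hypothetical subclass that clamps max_tokens before forwarding."""

    max_tokens_cap = 512

    def process_request_params(
        self, openai_params: dict, request: Any
    ) -> tuple[dict, Any | None]:
        params = dict(openai_params)  # copy so the caller's dict is untouched
        requested = params.get("max_tokens")
        if requested is None or requested > self.max_tokens_cap:
            params["max_tokens"] = self.max_tokens_cap
        return params, None


original = {"model": "m", "max_tokens": 4096}
processed, metadata = CappedRouter().process_request_params(original, request=None)
```

Returning a copy rather than mutating `openai_params` keeps the hook free of side effects, which makes it easier to chain several overrides through `super()`.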
- run(host: str = '0.0.0.0', port: int = 8084, log_level: str = 'info', ssl_keyfile: str | None = None, ssl_certfile: str | None = None, workers: int = 1, reload: bool = False) None[source]#
Start the server with enhanced configuration.
- Parameters
host – Host address to bind to
port – Port to listen on
log_level – Logging level
ssl_keyfile – Path to SSL key file
ssl_certfile – Path to SSL certificate file
workers – Number of worker processes
reload – Enable auto-reload for development
- class easydel.inference.oai_proxies.ServerMetrics(total_requests: int = 0, successful_requests: int = 0, failed_requests: int = 0, total_tokens_generated: int = 0, average_tokens_per_second: float = 0.0, uptime_seconds: float = 0.0, start_time: float = <factory>)[source]#
Bases: object
Server performance metrics.
- average_tokens_per_second: float = 0.0#
- failed_requests: int = 0#
- start_time: float#
- successful_requests: int = 0#
- total_requests: int = 0#
- total_tokens_generated: int = 0#
- uptime_seconds: float = 0.0#
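How these counters might be updated per request can be sketched with a stand-in dataclass that copies the documented fields and defaults. `ServerMetricsSketch` and `record_request` are hypothetical names, and the update rules (a running mean of per-request throughput over successful requests, uptime measured from `start_time`) are assumptions rather than the router's confirmed behavior.

```python
from __future__ import annotations

import time
from dataclasses import dataclass, field


@dataclass
class ServerMetricsSketch:
    """Stand-in with the documented ServerMetrics fields and defaults."""

    total_requests: int = 0
    successful_requests: int = 0
    failed_requests: int = 0
    total_tokens_generated: int = 0
    average_tokens_per_second: float = 0.0
    uptime_seconds: float = 0.0
    start_time: float = field(default_factory=time.time)


def record_request(
    metrics: ServerMetricsSketch, tokens: int, elapsed_seconds: float, ok: bool = True
) -> None:
    """Update counters for one finished request (assumed update rules)."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    metrics.total_requests += 1
    if ok:
        metrics.successful_requests += 1
        metrics.total_tokens_generated += tokens
        # Incremental mean of per-request throughput over successful requests.
        rate = tokens / elapsed_seconds
        metrics.average_tokens_per_second += (
            rate - metrics.average_tokens_per_second
        ) / metrics.successful_requests
    else:
        metrics.failed_requests += 1
    metrics.uptime_seconds = time.time() - metrics.start_time


metrics = ServerMetricsSketch()
record_request(metrics, tokens=100, elapsed_seconds=2.0)
record_request(metrics, tokens=300, elapsed_seconds=3.0)
record_request(metrics, tokens=0, elapsed_seconds=1.0, ok=False)
```

The incremental mean avoids storing every per-request rate; a token-weighted average (total tokens divided by total generation time) is an equally plausible definition and would give a different number.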