easydel.inference.vsurge.api_server

easydel.inference.vsurge.api_server#

Implements a FastAPI server for serving vEngine models, mimicking OpenAI API.

class easydel.inference.vsurge.api_server.EndpointConfig(path: str, handler: Callable, methods: list[str], summary: Optional[str] = None, tags: Optional[list[str]] = None)[source]#

Bases: object

Configuration for a FastAPI endpoint.

classmethod from_dict(data: Dict[str, Any]) → T#: Deserializes a dictionary into a PyTree object.

classmethod from_json(json_str: str) → T#: Deserializes a JSON string into a PyTree object.

handler: Callable#

methods: list[str]#

path: str#

replace(**kwargs)#: Creates a new instance with specified fields replaced.

summary: Optional[str] = None#

tags: Optional[list[str]] = None#

to_dict() → Dict[str, Any]#: Serializes the PyTree object to a dictionary.

to_json(**kwargs) → str#: Serializes the PyTree object to a JSON string.

easydel.inference.vsurge.api_server.create_error_response(status_code: HTTPStatus, message: str) → JSONResponse[source]#: Creates a standardized JSON error response.

class easydel.inference.vsurge.api_server.vSurgeApiServer(vsurge_map: Union[Dict[str, vSurge], vSurge] = None, max_workers: int = 10, oai_like_processor: bool = True)[source]#

Bases: object

FastAPI server for serving vEngine instances.

This server provides endpoints mimicking the OpenAI API structure for chat completions, liveness/readiness checks, token counting, and listing available models. It handles both streaming and non-streaming requests asynchronously using a thread pool.

async available_inference()[source]#: Lists available models (GET /v1/models).

async chat_completions(request: ChatCompletionRequest)[source]#

Handles chat completion requests (POST /v1/chat/completions).

Validates the request, retrieves the appropriate vEngine model, tokenizes the input, and delegates to streaming or non-streaming handlers.

Parameters

request (ChatCompletionRequest) – The incoming request data.

Returns

The generated response, either: a complete JSON object or a streaming event-stream.

Return type

Union[JSONResponse, StreamingResponse]

async completions(request: CompletionRequest)[source]#

Handles completion requests (POST /v1/completions).

Processes the prompt for completion and returns generated text.

Parameters: request (CompletionRequest) – The incoming request data.
Returns: The generated response.
Return type: Union[JSONResponse, StreamingResponse]

async count_tokens(request: CountTokenRequest)[source]#: Token counting endpoint (POST /v1/count_tokens).

fire(host='0.0.0.0', port=11556, metrics_port: Optional[int] = None, log_level='info', ssl_keyfile: Optional[str] = None, ssl_certfile: Optional[str] = None)[source]#

Starts the uvicorn server to run the FastAPI application.

Parameters

host (str) – The host address to bind to. Defaults to “0.0.0.0”.
port (int) – The port to listen on. Defaults to 11556.
metrics_port (tp.Optional[int]) – The port for the Prometheus metrics server. If None, defaults to port + 1. Set to -1 to disable.
log_level (str) – The logging level for uvicorn. Defaults to “info”.
ssl_keyfile (tp.Optional[str]) – Path to the SSL key file for HTTPS.
ssl_certfile (tp.Optional[str]) – Path to the SSL certificate file for HTTPS.

async liveness()[source]#: Liveness check endpoint (GET /liveness).

async readiness()[source]#: Readiness check endpoint (GET /readiness).

easydel.inference.vsurge.api_server

Contents

easydel.inference.vsurge.api_server#