easydel.inference.vsurge.api_server#

Implements a FastAPI server for serving vEngine models, mimicking OpenAI API.

class easydel.inference.vsurge.api_server.EndpointConfig(path: str, handler: Callable, methods: list[str], summary: Optional[str] = None, tags: Optional[list[str]] = None)[source]#

Bases: object

Configuration for a FastAPI endpoint.

classmethod from_dict(data: Dict[str, Any]) T#

Deserializes a dictionary into a PyTree object.

classmethod from_json(json_str: str) T#

Deserializes a JSON string into a PyTree object.

handler: Callable#
methods: list[str]#
path: str#
replace(**kwargs)#

Creates a new instance with specified fields replaced.

summary: Optional[str] = None#
tags: Optional[list[str]] = None#
to_dict() Dict[str, Any]#

Serializes the PyTree object to a dictionary.

to_json(**kwargs) str#

Serializes the PyTree object to a JSON string.

easydel.inference.vsurge.api_server.create_error_response(status_code: HTTPStatus, message: str) JSONResponse[source]#

Creates a standardized JSON error response.

class easydel.inference.vsurge.api_server.vSurgeApiServer(vsurge_map: Union[Dict[str, vSurge], vSurge] = None, max_workers: int = 10, oai_like_processor: bool = True)[source]#

Bases: object

FastAPI server for serving vEngine instances.

This server provides endpoints mimicking the OpenAI API structure for chat completions, liveness/readiness checks, token counting, and listing available models. It handles both streaming and non-streaming requests asynchronously using a thread pool.

async available_inference()[source]#

Lists available models (GET /v1/models).

async chat_completions(request: ChatCompletionRequest)[source]#

Handles chat completion requests (POST /v1/chat/completions).

Validates the request, retrieves the appropriate vEngine model, tokenizes the input, and delegates to streaming or non-streaming handlers.

Parameters

request (ChatCompletionRequest) โ€“ The incoming request data.

Returns

The generated response, either

a complete JSON object or a streaming event-stream.

Return type

Union[JSONResponse, StreamingResponse]

async completions(request: CompletionRequest)[source]#

Handles completion requests (POST /v1/completions).

Processes the prompt for completion and returns generated text.

Parameters

request (CompletionRequest) โ€“ The incoming request data.

Returns

The generated response.

Return type

Union[JSONResponse, StreamingResponse]

async count_tokens(request: CountTokenRequest)[source]#

Token counting endpoint (POST /v1/count_tokens).

fire(host='0.0.0.0', port=11556, metrics_port: Optional[int] = None, log_level='info', ssl_keyfile: Optional[str] = None, ssl_certfile: Optional[str] = None)[source]#

Starts the uvicorn server to run the FastAPI application.

Parameters
  • host (str) โ€“ The host address to bind to. Defaults to โ€œ0.0.0.0โ€.

  • port (int) โ€“ The port to listen on. Defaults to 11556.

  • metrics_port (tp.Optional[int]) โ€“ The port for the Prometheus metrics server. If None, defaults to port + 1. Set to -1 to disable.

  • log_level (str) โ€“ The logging level for uvicorn. Defaults to โ€œinfoโ€.

  • ssl_keyfile (tp.Optional[str]) โ€“ Path to the SSL key file for HTTPS.

  • ssl_certfile (tp.Optional[str]) โ€“ Path to the SSL certificate file for HTTPS.

async liveness()[source]#

Liveness check endpoint (GET /liveness).

async readiness()[source]#

Readiness check endpoint (GET /readiness).