easydel.inference.vsurge.api_server#
Implements a FastAPI server for serving vEngine models, mimicking OpenAI API.
- class easydel.inference.vsurge.api_server.EndpointConfig(path: str, handler: Callable, methods: list[str], summary: Optional[str] = None, tags: Optional[list[str]] = None)[source]#
Bases:
objectConfiguration for a FastAPI endpoint.
- classmethod from_dict(data: Dict[str, Any]) T#
Deserializes a dictionary into a PyTree object.
- classmethod from_json(json_str: str) T#
Deserializes a JSON string into a PyTree object.
- handler: Callable#
- methods: list[str]#
- path: str#
- replace(**kwargs)#
Creates a new instance with specified fields replaced.
- summary: Optional[str] = None#
- tags: Optional[list[str]] = None#
- to_dict() Dict[str, Any]#
Serializes the PyTree object to a dictionary.
- to_json(**kwargs) str#
Serializes the PyTree object to a JSON string.
- easydel.inference.vsurge.api_server.create_error_response(status_code: HTTPStatus, message: str) JSONResponse[source]#
Creates a standardized JSON error response.
- class easydel.inference.vsurge.api_server.vSurgeApiServer(vsurge_map: Union[Dict[str, vSurge], vSurge] = None, max_workers: int = 10, oai_like_processor: bool = True)[source]#
Bases:
objectFastAPI server for serving vEngine instances.
This server provides endpoints mimicking the OpenAI API structure for chat completions, liveness/readiness checks, token counting, and listing available models. It handles both streaming and non-streaming requests asynchronously using a thread pool.
- async chat_completions(request: ChatCompletionRequest)[source]#
Handles chat completion requests (POST /v1/chat/completions).
Validates the request, retrieves the appropriate vEngine model, tokenizes the input, and delegates to streaming or non-streaming handlers.
- Parameters
request (ChatCompletionRequest) โ The incoming request data.
- Returns
- The generated response, either
a complete JSON object or a streaming event-stream.
- Return type
Union[JSONResponse, StreamingResponse]
- async completions(request: CompletionRequest)[source]#
Handles completion requests (POST /v1/completions).
Processes the prompt for completion and returns generated text.
- Parameters
request (CompletionRequest) โ The incoming request data.
- Returns
The generated response.
- Return type
Union[JSONResponse, StreamingResponse]
- async count_tokens(request: CountTokenRequest)[source]#
Token counting endpoint (POST /v1/count_tokens).
- fire(host='0.0.0.0', port=11556, metrics_port: Optional[int] = None, log_level='info', ssl_keyfile: Optional[str] = None, ssl_certfile: Optional[str] = None)[source]#
Starts the uvicorn server to run the FastAPI application.
- Parameters
host (str) โ The host address to bind to. Defaults to โ0.0.0.0โ.
port (int) โ The port to listen on. Defaults to 11556.
metrics_port (tp.Optional[int]) โ The port for the Prometheus metrics server. If None, defaults to port + 1. Set to -1 to disable.
log_level (str) โ The logging level for uvicorn. Defaults to โinfoโ.
ssl_keyfile (tp.Optional[str]) โ Path to the SSL key file for HTTPS.
ssl_certfile (tp.Optional[str]) โ Path to the SSL certificate file for HTTPS.