easydel.inference.esurge.engine_types


Type definitions and data structures for the eSurge engine.

This module defines the core data types used throughout the eSurge engine, including request types, event types, and output structures.

Classes:

FinishReason – Enumeration of reasons why generation finished.
EngineCoreRequest – Core request structure for engine processing.
EngineCoreEventType – Types of engine events.
EngineCoreEvent – Timestamped engine event.
EngineCoreOutput – Output from engine core processing.
UtilityResult – Wrapper for special serialization handling.

class easydel.inference.esurge.engine_types.EngineCoreEvent(type: EngineCoreEventType, timestamp: float)

Bases: Struct

Timestamped engine core event.

Records events that occur during request processing with monotonic timestamps for accurate interval calculation.

type

Type of the event.

Type: easydel.inference.esurge.engine_types.EngineCoreEventType

timestamp

Monotonic timestamp of when the event occurred.

Type: float

Note

Timestamps are monotonic and should only be compared within the same process. They are used to calculate intervals between events for performance monitoring.

classmethod new_event(event_type: EngineCoreEventType, timestamp: float | None = None) → EngineCoreEvent

Create a new engine event.

Parameters

event_type – Type of the event.
timestamp – Optional monotonic timestamp. If None, the current time is used.

Returns

New EngineCoreEvent instance.


class easydel.inference.esurge.engine_types.EngineCoreEventType(value, names=None, *, module=None, qualname=None, type=None, start=1, boundary=None)

Bases: IntEnum

Types of engine core events.

QUEUED = 1: Request was added to the queue.

SCHEDULED = 2: Request was scheduled for processing.

PREEMPTED = 3: Request was preempted by higher-priority work.

class easydel.inference.esurge.engine_types.EngineCoreOutput(request_id: str, new_token_ids: list[int], new_logprobs: easydel.inference.esurge.outputs.LogprobsLists | None = None, new_prompt_logprobs_tensors: easydel.inference.esurge.outputs.LogprobsTensors | None = None, finish_reason: easydel.inference.esurge.engine_types.FinishReason | None = None, stop_reason: int | str | None = None, events: list[easydel.inference.esurge.engine_types.EngineCoreEvent] | None = None, num_cached_tokens: int = 0)

Bases: Struct

Output from engine core processing.

Contains generated tokens and associated metadata.

request_id

ID of the request this output belongs to.

Type: str

new_token_ids

List of newly generated token IDs.

Type: list[int]

new_logprobs

Log probabilities for the newly generated tokens.

Type: easydel.inference.esurge.outputs.LogprobsLists | None

new_prompt_logprobs_tensors

Log probabilities for the prompt tokens.

Type: easydel.inference.esurge.outputs.LogprobsTensors | None

finish_reason

Reason generation finished, or None if it is still in progress.

Type: easydel.inference.esurge.engine_types.FinishReason | None

stop_reason

The specific stop string or token ID that triggered the finish, if any.

Type: int | str | None

events

List of events that occurred during processing.

Type: list[easydel.inference.esurge.engine_types.EngineCoreEvent] | None

num_cached_tokens

Number of tokens retrieved from the cache.

Type: int

property finished: bool

Check if generation has finished.

Returns: True if finish_reason is set, False otherwise.


class easydel.inference.esurge.engine_types.EngineCoreOutputs(engine_index: int = 0, outputs: list[easydel.inference.esurge.engine_types.EngineCoreOutput] = <factory>, timestamp: float = 0.0, utility_output: easydel.inference.esurge.engine_types.UtilityOutput | None = None, finished_requests: set[str] | None = None, wave_complete: int | None = None, start_wave: int | None = None)

Bases: Struct

engine_index: int

finished_requests: set[str] | None

outputs: list[easydel.inference.esurge.engine_types.EngineCoreOutput]

start_wave: int | None

timestamp: float

utility_output: easydel.inference.esurge.engine_types.UtilityOutput | None

wave_complete: int | None

class easydel.inference.esurge.engine_types.EngineCoreRequest(request_id: str, prompt_token_ids: list[int], sampling_params: easydel.inference.sampling_params.SamplingParams | None, eos_token_id: int | None, arrival_time: float, data_parallel_rank: int | None, client_index: int = 0, current_wave: int = 0, priority: int = 0)

Bases: Struct

Core request structure for engine processing.

A msgspec-based structure that holds request data in a compact form for efficient serialization.

request_id

Unique identifier for the request.

Type: str

prompt_token_ids

List of token IDs in the prompt.

Type: list[int]

sampling_params

Parameters controlling generation behavior.

Type: easydel.inference.sampling_params.SamplingParams | None

eos_token_id

End-of-sequence token ID.

Type: int | None

arrival_time

Timestamp when the request arrived.

Type: float

data_parallel_rank

Data-parallel rank assigned to the request, if any.

Type: int | None

client_index

Index of the client that made the request.

Type: int

current_wave

Current processing wave number.

Type: int

priority

Request priority for scheduling.

Type: int
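A stdlib-only sketch of how `priority` and `arrival_time` could drive scheduling order. The ordering rule here (lower `priority` value first, ties broken by earlier `arrival_time`) is an assumption for illustration, not documented eSurge behavior; `Request` and `PriorityQueue` are hypothetical stand-ins.

```python
import heapq
from dataclasses import dataclass


@dataclass
class Request:
    # Hypothetical stand-in carrying a subset of EngineCoreRequest's fields.
    request_id: str
    prompt_token_ids: list[int]
    arrival_time: float
    priority: int = 0


class PriorityQueue:
    """Pop requests by (priority, arrival_time), lower values first (an assumption)."""

    def __init__(self) -> None:
        self._heap: list[tuple[int, float, int, Request]] = []
        self._counter = 0  # tie-breaker so Request objects are never compared

    def push(self, req: Request) -> None:
        heapq.heappush(self._heap, (req.priority, req.arrival_time, self._counter, req))
        self._counter += 1

    def pop(self) -> Request:
        return heapq.heappop(self._heap)[-1]

    def __len__(self) -> int:
        return len(self._heap)


q = PriorityQueue()
q.push(Request("low", [1, 2], arrival_time=100.0, priority=1))
q.push(Request("high", [3], arrival_time=101.0, priority=0))
q.push(Request("high-late", [4], arrival_time=102.0, priority=0))
order = [q.pop().request_id for _ in range(len(q))]
print(order)  # ['high', 'high-late', 'low']
```

The monotonically increasing counter keeps the heap ordering well defined even when two requests share the same priority and arrival time.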


class easydel.inference.esurge.engine_types.EngineCoreRequestType(value, names=None, *, module=None, qualname=None, type=None, start=1, boundary=None)

Bases: Enum

Engine request types, encoded as single-byte strings so that they can be sent over sockets without a separate encoding step.

ADD = b'\x00'

ABORT = b'\x01'

START_DP_WAVE = b'\x02'

UTILITY = b'\x03'

EXECUTOR_FAILED = b'\x04'
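Because each member's value is already a `bytes` object, it can be placed on the wire as-is. The sketch below uses a hypothetical one-byte-prefix framing (`encode_frame` and `decode_frame` are illustrative helpers, not easydel functions; the real transport may send the type as a separate message part). `RequestType` mirrors the documented values without importing easydel.

```python
from enum import Enum


class RequestType(Enum):
    # Same single-byte values as the documented EngineCoreRequestType members.
    ADD = b"\x00"
    ABORT = b"\x01"
    START_DP_WAVE = b"\x02"
    UTILITY = b"\x03"
    EXECUTOR_FAILED = b"\x04"


def encode_frame(req_type: RequestType, payload: bytes) -> bytes:
    """Prefix the payload with the one-byte request type (hypothetical framing)."""
    return req_type.value + payload


def decode_frame(frame: bytes) -> tuple[RequestType, bytes]:
    """Split a frame back into its request type and payload."""
    # Slice rather than index: frame[0] is an int, but the enum values are bytes.
    return RequestType(frame[:1]), frame[1:]


frame = encode_frame(RequestType.ABORT, b"req-42")
print(decode_frame(frame))  # (<RequestType.ABORT: b'\x01'>, b'req-42')
```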

class easydel.inference.esurge.engine_types.FinishReason(value, names=None, *, module=None, qualname=None, type=None, start=1, boundary=None)

Bases: IntEnum

Reason why text generation finished.

Uses integer values for compact serialization.

STOP = 0: A stop string or token was generated.

LENGTH = 1: The maximum number of tokens or the model's maximum length was reached.

ABORT = 2: Generation was aborted for another reason.

Example

>>> reason = FinishReason.STOP
>>> print(reason)
stop
>>> reason.value
0
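A stdlib-only sketch of why integer values give compact serialization: an IntEnum member encodes as a plain integer and maps back to the member on the receiving side. This `FinishReason` is a local mirror, not an easydel import, and its `__str__` returning the lower-case name is an assumption chosen to match the `print(reason)` output shown above.

```python
import json
from enum import IntEnum


class FinishReason(IntEnum):
    # Local mirror of the documented values.
    STOP = 0
    LENGTH = 1
    ABORT = 2

    def __str__(self) -> str:
        # Assumption: matches the documented print(...) output ("stop").
        return self.name.lower()


# IntEnum members serialize as plain integers...
wire = json.dumps({"finish_reason": FinishReason.LENGTH})
# ...and the integer maps back to the member on the receiving side.
restored = FinishReason(json.loads(wire)["finish_reason"])
print(wire, restored)  # {"finish_reason": 1} length
```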

class easydel.inference.esurge.engine_types.ReconfigureDistributedRequest(new_data_parallel_size: int, new_data_parallel_rank: int, new_data_parallel_rank_local: int, new_data_parallel_master_ip: str, new_data_parallel_master_port: int)

Bases: Struct

new_data_parallel_master_ip: str

new_data_parallel_master_port: int

new_data_parallel_rank: int

new_data_parallel_rank_local: int

new_data_parallel_size: int

class easydel.inference.esurge.engine_types.ReconfigureRankType(value, names=None, *, module=None, qualname=None, type=None, start=1, boundary=None)

Bases: IntEnum

Special rank values used in a distributed reconfiguration request.

KEEP_CURRENT_RANK = -1

SHUTDOWN_CURRENT_RANK = -2

class easydel.inference.esurge.engine_types.UtilityOutput(call_id: int, failure_message: str | None = None, result: easydel.inference.esurge.engine_types.UtilityResult | None = None)

Bases: Struct

call_id: int

failure_message: str | None

result: easydel.inference.esurge.engine_types.UtilityResult | None

class easydel.inference.esurge.engine_types.UtilityResult(r: Any = None)

Bases: object

Wrapper for special serialization/deserialization handling.

Provides a container for results that require custom serialization behavior or special handling during data transfer.

r: The wrapped result object.
