easydel.inference.esurge.engine_types

Type definitions and data structures for the eSurge engine.

This module defines the core data types used throughout the eSurge engine, including request types, event types, and output structures.

Classes:

  • FinishReason: Enumeration of reasons why generation finished

  • EngineCoreRequest: Core request structure for engine processing

  • EngineCoreEventType: Types of engine events

  • EngineCoreEvent: Timestamped engine event

  • EngineCoreOutput: Output from engine core processing

  • UtilityResult: Wrapper for special serialization handling

class easydel.inference.esurge.engine_types.EngineCoreEvent(type: EngineCoreEventType, timestamp: float)[source]#

Bases: Struct

Timestamped engine core event.

Records events that occur during request processing with monotonic timestamps for accurate interval calculation.

type#

Type of the event.

Type

easydel.inference.esurge.engine_types.EngineCoreEventType

timestamp#

Monotonic timestamp of when event occurred.

Type

float

Note

Timestamps are monotonic and should only be compared within the same process. They are used to calculate intervals between events for performance monitoring.

classmethod new_event(event_type: EngineCoreEventType, timestamp: float | None = None) EngineCoreEvent[source]#

Create a new engine event.

Parameters
  • event_type – Type of the event.

  • timestamp – Optional timestamp (uses current time if None).

Returns

New EngineCoreEvent instance.

timestamp: float#
type: EngineCoreEventType#
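
The following is a minimal sketch of how timestamped events might be used to measure queueing delay. The classes are standard-library stand-ins that mirror the documented fields, not the real Struct-based types, and the fallback to time.monotonic() in new_event is an assumption based on the note about monotonic timestamps.

```python
import time
from dataclasses import dataclass
from enum import IntEnum


class EngineCoreEventType(IntEnum):
    # Values copied from the documented enum.
    QUEUED = 1
    SCHEDULED = 2
    PREEMPTED = 3


@dataclass
class EngineCoreEvent:
    # Stand-in for the documented struct (the real class derives from Struct).
    type: EngineCoreEventType
    timestamp: float

    @classmethod
    def new_event(cls, event_type, timestamp=None):
        # Assumption: a missing timestamp falls back to the monotonic clock.
        ts = time.monotonic() if timestamp is None else timestamp
        return cls(event_type, ts)


# Explicit timestamps keep the interval deterministic for illustration.
queued = EngineCoreEvent.new_event(EngineCoreEventType.QUEUED, timestamp=10.0)
scheduled = EngineCoreEvent.new_event(EngineCoreEventType.SCHEDULED, timestamp=10.25)
queue_delay = scheduled.timestamp - queued.timestamp  # seconds spent waiting

# Without a timestamp, the stand-in reads the clock itself.
auto = EngineCoreEvent.new_event(EngineCoreEventType.PREEMPTED)
```

Because the timestamps are monotonic, only the difference between two events from the same process is meaningful; the absolute values carry no wall-clock meaning.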
class easydel.inference.esurge.engine_types.EngineCoreEventType(value, names=None, *, module=None, qualname=None, type=None, start=1, boundary=None)[source]#

Bases: IntEnum

Types of engine core events.

QUEUED#

Request was added to the queue.

SCHEDULED#

Request was scheduled for processing.

PREEMPTED#

Request was preempted by higher priority work.

PREEMPTED = 3#
QUEUED = 1#
SCHEDULED = 2#
class easydel.inference.esurge.engine_types.EngineCoreOutput(request_id: str, new_token_ids: list[int], new_logprobs: easydel.inference.esurge.outputs.LogprobsLists | None = None, new_prompt_logprobs_tensors: easydel.inference.esurge.outputs.LogprobsTensors | None = None, finish_reason: easydel.inference.esurge.engine_types.FinishReason | None = None, stop_reason: int | str | None = None, events: list[easydel.inference.esurge.engine_types.EngineCoreEvent] | None = None, num_cached_tokens: int = 0)[source]#

Bases: Struct

Output from engine core processing.

Contains generated tokens and associated metadata.

request_id#

ID of the request this output belongs to.

Type

str

new_token_ids#

List of newly generated token IDs.

Type

list[int]

new_logprobs#

Log probabilities for generated tokens.

Type

easydel.inference.esurge.outputs.LogprobsLists | None

new_prompt_logprobs_tensors#

Log probabilities for prompt tokens.

Type

easydel.inference.esurge.outputs.LogprobsTensors | None

finish_reason#

Reason generation finished (if finished).

Type

easydel.inference.esurge.engine_types.FinishReason | None

stop_reason#

Specific stop string/token that triggered finish.

Type

int | str | None

events#

List of events that occurred during processing.

Type

list[easydel.inference.esurge.engine_types.EngineCoreEvent] | None

num_cached_tokens#

Number of tokens retrieved from cache.

Type

int

events: list[easydel.inference.esurge.engine_types.EngineCoreEvent] | None#
finish_reason: easydel.inference.esurge.engine_types.FinishReason | None#
property finished: bool#

Check if generation has finished.

Returns

True if finish_reason is set, False otherwise.

new_logprobs: easydel.inference.esurge.outputs.LogprobsLists | None#
new_prompt_logprobs_tensors: easydel.inference.esurge.outputs.LogprobsTensors | None#
new_token_ids: list[int]#
num_cached_tokens: int#
request_id: str#
stop_reason: int | str | None#
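
The finished property described above depends on whether finish_reason is set, which is worth spelling out because FinishReason.STOP has the value 0. A minimal sketch with reduced standard-library stand-ins (assumed, not the real Struct-based class) shows why a comparison against None is needed rather than a truthiness check.

```python
from dataclasses import dataclass, field
from enum import IntEnum
from typing import Optional


class FinishReason(IntEnum):
    # Values copied from the documented enum.
    STOP = 0
    LENGTH = 1
    ABORT = 2


@dataclass
class EngineCoreOutput:
    # Reduced stand-in: only the fields needed for this illustration.
    request_id: str
    new_token_ids: list = field(default_factory=list)
    finish_reason: Optional[FinishReason] = None

    @property
    def finished(self) -> bool:
        # Compare against None: FinishReason.STOP equals 0 and is falsy.
        return self.finish_reason is not None


running = EngineCoreOutput("req-0", [11, 12])
stopped = EngineCoreOutput("req-0", [13], finish_reason=FinishReason.STOP)
```

Here running.finished is False and stopped.finished is True, even though bool(FinishReason.STOP) is False.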
class easydel.inference.esurge.engine_types.EngineCoreOutputs(engine_index: int = 0, outputs: list[easydel.inference.esurge.engine_types.EngineCoreOutput] = <factory>, timestamp: float = 0.0, utility_output: easydel.inference.esurge.engine_types.UtilityOutput | None = None, finished_requests: set[str] | None = None, wave_complete: int | None = None, start_wave: int | None = None)[source]#

Bases: Struct

engine_index: int#
finished_requests: set[str] | None#
outputs: list[easydel.inference.esurge.engine_types.EngineCoreOutput]#
start_wave: int | None#
timestamp: float#
utility_output: easydel.inference.esurge.engine_types.UtilityOutput | None#
wave_complete: int | None#
class easydel.inference.esurge.engine_types.EngineCoreRequest(request_id: str, prompt_token_ids: list[int], sampling_params: easydel.inference.sampling_params.SamplingParams | None, eos_token_id: int | None, arrival_time: float, data_parallel_rank: int | None, client_index: int = 0, current_wave: int = 0, priority: int = 0)[source]#

Bases: Struct

Core request structure for engine processing.

Efficient msgspec-based structure for request data.

request_id#

Unique identifier for the request.

Type

str

prompt_token_ids#

List of token IDs in the prompt.

Type

list[int]

sampling_params#

Parameters controlling generation behavior.

Type

easydel.inference.sampling_params.SamplingParams | None

eos_token_id#

End-of-sequence token ID.

Type

int | None

arrival_time#

Timestamp when request arrived.

Type

float

data_parallel_rank#

Rank for data parallel processing.

Type

int | None

client_index#

Index of the client making the request.

Type

int

current_wave#

Current processing wave number.

Type

int

priority#

Request priority for scheduling.

Type

int

arrival_time: float#
client_index: int#
current_wave: int#
data_parallel_rank: int | None#
eos_token_id: int | None#
priority: int#
prompt_token_ids: list[int]#
request_id: str#
sampling_params: easydel.inference.sampling_params.SamplingParams | None#
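
As an illustration of the field order and defaults in the signature above, the sketch below builds a request using a standard-library stand-in. The dataclass is an assumption that only mirrors the documented fields; the use of time.time() for arrival_time and None for sampling_params are placeholders, not confirmed conventions.

```python
import time
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class EngineCoreRequest:
    # Stand-in mirroring the documented field order and defaults.
    request_id: str
    prompt_token_ids: list
    sampling_params: Optional[Any]
    eos_token_id: Optional[int]
    arrival_time: float
    data_parallel_rank: Optional[int]
    client_index: int = 0
    current_wave: int = 0
    priority: int = 0


request = EngineCoreRequest(
    request_id="req-0",
    prompt_token_ids=[1, 15, 27],
    sampling_params=None,        # placeholder; normally a SamplingParams instance
    eos_token_id=2,
    arrival_time=time.time(),    # assumption: wall-clock arrival time
    data_parallel_rank=None,
)
```

The first six fields have no defaults and must always be supplied; client_index, current_wave, and priority fall back to 0.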
class easydel.inference.esurge.engine_types.EngineCoreRequestType(value, names=None, *, module=None, qualname=None, type=None, start=1, boundary=None)[source]#

Bases: Enum

Engine request types, defined as single-byte strings so that they can be sent over sockets without a separate encoding step.

ABORT = b'\x01'#
ADD = b'\x00'#
EXECUTOR_FAILED = b'\x04'#
START_DP_WAVE = b'\x02'#
UTILITY = b'\x03'#
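
To show why byte-valued members avoid a separate encoding step, here is a minimal sketch that prefixes a payload with the type byte and recovers the type on the receiving side. The enum is a stand-in with the documented values; the frame and unframe helpers and the one-byte-prefix layout are hypothetical and are not taken from the module.

```python
from enum import Enum


class EngineCoreRequestType(Enum):
    # Values copied from the documented enum.
    ADD = b"\x00"
    ABORT = b"\x01"
    START_DP_WAVE = b"\x02"
    UTILITY = b"\x03"
    EXECUTOR_FAILED = b"\x04"


def frame(request_type: EngineCoreRequestType, payload: bytes) -> bytes:
    # Hypothetical framing: one type byte followed by the payload.
    return request_type.value + payload


def unframe(message: bytes):
    # Slicing (not indexing) keeps the first byte as a bytes object,
    # which is what the enum lookup by value expects.
    return EngineCoreRequestType(message[:1]), message[1:]


message = frame(EngineCoreRequestType.ABORT, b"req-0")
request_type, payload = unframe(message)
```

The member value is already a bytes object, so it can be concatenated with a payload or sent as its own message part directly.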
class easydel.inference.esurge.engine_types.FinishReason(value, names=None, *, module=None, qualname=None, type=None, start=1, boundary=None)[source]#

Bases: IntEnum

Reason why text generation finished.

Uses integer values for compact serialization.

STOP#

A stop string/token was generated.

LENGTH#

Maximum tokens or model length was reached.

ABORT#

Generation was aborted for another reason.

Example

>>> from easydel.inference.esurge.engine_types import FinishReason
>>> reason = FinishReason.STOP
>>> print(reason)
stop
>>> reason.value
0
ABORT = 2#
LENGTH = 1#
STOP = 0#
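
One way an integer enum can serialize compactly while still printing a readable name is to override __str__. The sketch below is a stand-in under that assumption; only the "stop" label is confirmed by the example above, and the "length" and "abort" labels and the lookup tuple are illustrative guesses.

```python
from enum import IntEnum

# Assumed lookup table; only "stop" is confirmed by the documented example.
_FINISH_REASON_STRINGS = ("stop", "length", "abort")


class FinishReason(IntEnum):
    # Values copied from the documented enum.
    STOP = 0
    LENGTH = 1
    ABORT = 2

    def __str__(self) -> str:
        # Map the integer value to a human-readable label.
        return _FINISH_REASON_STRINGS[self.value]


wire_value = int(FinishReason.LENGTH)  # compact form for serialization
restored = FinishReason(wire_value)    # rebuild the member from the integer
label = str(restored)                  # readable form for logs or API responses
```

The integer form is what travels between processes; the string form is only produced at the boundary where a readable label is needed.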
class easydel.inference.esurge.engine_types.ReconfigureDistributedRequest(new_data_parallel_size: int, new_data_parallel_rank: int, new_data_parallel_rank_local: int, new_data_parallel_master_ip: str, new_data_parallel_master_port: int)[source]#

Bases: Struct

new_data_parallel_master_ip: str#
new_data_parallel_master_port: int#
new_data_parallel_rank: int#
new_data_parallel_rank_local: int#
new_data_parallel_size: int#
class easydel.inference.esurge.engine_types.ReconfigureRankType(value, names=None, *, module=None, qualname=None, type=None, start=1, boundary=None)[source]#

Bases: IntEnum

Rank type for reconfiguring distributed request.

KEEP_CURRENT_RANK = -1#
SHUTDOWN_CURRENT_RANK = -2#
class easydel.inference.esurge.engine_types.UtilityOutput(call_id: int, failure_message: str | None = None, result: easydel.inference.esurge.engine_types.UtilityResult | None = None)[source]#

Bases: Struct

call_id: int#
failure_message: str | None#
result: easydel.inference.esurge.engine_types.UtilityResult | None#
class easydel.inference.esurge.engine_types.UtilityResult(r: Any = None)[source]#

Bases: object

Wrapper for special serialization/deserialization handling.

Provides a container for results that require custom serialization behavior or special handling during data transfer.

r#

The wrapped result object.
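
A hedged sketch of how a caller might consume a UtilityOutput that carries a UtilityResult. Both classes are standard-library stand-ins mirroring the documented fields, and the unwrap helper is hypothetical; the convention that a non-None failure_message signals an error is an assumption drawn from the field name.

```python
from dataclasses import dataclass
from typing import Any, Optional


class UtilityResult:
    # Stand-in wrapper: holds an arbitrary result object in `r`.
    def __init__(self, r: Any = None):
        self.r = r


@dataclass
class UtilityOutput:
    # Stand-in mirroring the documented fields and defaults.
    call_id: int
    failure_message: Optional[str] = None
    result: Optional[UtilityResult] = None


def unwrap(output: UtilityOutput) -> Any:
    # Hypothetical helper, not part of the module.
    if output.failure_message is not None:
        raise RuntimeError(
            f"utility call {output.call_id} failed: {output.failure_message}"
        )
    return None if output.result is None else output.result.r


ok_value = unwrap(UtilityOutput(call_id=1, result=UtilityResult({"ok": True})))

try:
    unwrap(UtilityOutput(call_id=2, failure_message="boom"))
    failed = False
except RuntimeError:
    failed = True
```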