easydel.inference.esurge.engine_types#
Type definitions and data structures for the eSurge engine.
This module defines the core data types used throughout the eSurge engine, including request types, event types, and output structures.
- Classes:
FinishReason: Enumeration of reasons why generation finished
EngineCoreRequest: Core request structure for engine processing
EngineCoreEventType: Types of engine events
EngineCoreEvent: Timestamped engine event
EngineCoreOutput: Output from engine core processing
UtilityResult: Wrapper for special serialization handling
- class easydel.inference.esurge.engine_types.EngineCoreEvent(type: EngineCoreEventType, timestamp: float)[source]#
Bases: Struct
Timestamped engine core event.
Records events that occur during request processing with monotonic timestamps for accurate interval calculation.
- type#
Type of the event.
- timestamp#
Monotonic timestamp of when event occurred.
- Type
float
Note
Timestamps are monotonic and should only be compared within the same process. They are used to calculate intervals between events for performance monitoring.
- classmethod new_event(event_type: EngineCoreEventType, timestamp: float | None = None) EngineCoreEvent[source]#
Create a new engine event.
- Parameters
event_type – Type of the event.
timestamp – Optional timestamp (uses current time if None).
- Returns
New EngineCoreEvent instance.
- timestamp: float#
- type: EngineCoreEventType#
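The interval calculation described in the note can be sketched with stdlib stand-ins. `Event` and `EventType` below are hypothetical names that mirror the documented fields and enum values; they are not the library's classes (the real `EngineCoreEvent` is a `Struct`). The sketch also assumes that "current time" in `new_event` means the monotonic clock, which this page implies but does not state outright.

```python
import time
from dataclasses import dataclass
from enum import IntEnum
from typing import Optional


class EventType(IntEnum):
    # Values mirror EngineCoreEventType as listed on this page.
    QUEUED = 1
    SCHEDULED = 2
    PREEMPTED = 3


@dataclass(frozen=True)
class Event:
    """Hypothetical stand-in for EngineCoreEvent."""

    type: EventType
    timestamp: float

    @classmethod
    def new_event(cls, event_type: EventType, timestamp: Optional[float] = None) -> "Event":
        # Assumption: the default timestamp comes from the monotonic clock.
        if timestamp is None:
            timestamp = time.monotonic()
        return cls(event_type, timestamp)


# Explicit timestamps keep the arithmetic reproducible.
queued = Event.new_event(EventType.QUEUED, timestamp=10.0)
scheduled = Event.new_event(EventType.SCHEDULED, timestamp=10.25)

# Seconds the request spent waiting between the two events.
queue_wait = scheduled.timestamp - queued.timestamp

# Omitting the timestamp falls back to the clock.
auto = Event.new_event(EventType.PREEMPTED)
```

Only differences between two timestamps taken in the same process are meaningful; the absolute values carry no wall-clock information.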
- class easydel.inference.esurge.engine_types.EngineCoreEventType(value, names=None, *, module=None, qualname=None, type=None, start=1, boundary=None)[source]#
Bases: IntEnum
Types of engine core events.
- QUEUED#
Request was added to the queue.
- SCHEDULED#
Request was scheduled for processing.
- PREEMPTED#
Request was preempted by higher priority work.
- PREEMPTED = 3#
- QUEUED = 1#
- SCHEDULED = 2#
- class easydel.inference.esurge.engine_types.EngineCoreOutput(request_id: str, new_token_ids: list[int], new_logprobs: easydel.inference.esurge.outputs.LogprobsLists | None = None, new_prompt_logprobs_tensors: easydel.inference.esurge.outputs.LogprobsTensors | None = None, finish_reason: easydel.inference.esurge.engine_types.FinishReason | None = None, stop_reason: int | str | None = None, events: list[easydel.inference.esurge.engine_types.EngineCoreEvent] | None = None, num_cached_tokens: int = 0)[source]#
Bases: Struct
Output from engine core processing.
Contains generated tokens and associated metadata.
- request_id#
ID of the request this output belongs to.
- Type
str
- new_token_ids#
List of newly generated token IDs.
- Type
list[int]
- new_logprobs#
Log probabilities for generated tokens.
- Type
easydel.inference.esurge.outputs.LogprobsLists | None
- new_prompt_logprobs_tensors#
Log probabilities for prompt tokens.
- Type
easydel.inference.esurge.outputs.LogprobsTensors | None
- finish_reason#
Reason generation finished, or None if generation is still in progress.
- Type
easydel.inference.esurge.engine_types.FinishReason | None
- stop_reason#
Specific stop string/token that triggered finish.
- Type
int | str | None
- events#
List of events that occurred during processing.
- Type
list[easydel.inference.esurge.engine_types.EngineCoreEvent] | None
- num_cached_tokens#
Number of tokens retrieved from cache.
- Type
int
- events: list[easydel.inference.esurge.engine_types.EngineCoreEvent] | None#
- finish_reason: easydel.inference.esurge.engine_types.FinishReason | None#
- property finished: bool#
Check if generation has finished.
- Returns
True if finish_reason is set, False otherwise.
- new_logprobs: easydel.inference.esurge.outputs.LogprobsLists | None#
- new_prompt_logprobs_tensors: easydel.inference.esurge.outputs.LogprobsTensors | None#
- new_token_ids: list[int]#
- num_cached_tokens: int#
- request_id: str#
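A minimal sketch of how a consumer might use `finished` while draining a stream of outputs. `Output` and `drain` are hypothetical stand-ins built on a dataclass, carrying only a subset of the documented fields. One detail worth showing: since `FinishReason.STOP` has the value 0, a truthiness test on `finish_reason` would misreport a normal stop, so the check compares against `None`.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import Dict, List, Optional


class FinishReason(IntEnum):
    # Values as listed on this page.
    STOP = 0
    LENGTH = 1
    ABORT = 2


@dataclass
class Output:
    """Hypothetical stand-in carrying a subset of EngineCoreOutput's fields."""

    request_id: str
    new_token_ids: List[int]
    finish_reason: Optional[FinishReason] = None
    num_cached_tokens: int = 0

    @property
    def finished(self) -> bool:
        # Explicit None check: FinishReason.STOP == 0 is falsy.
        return self.finish_reason is not None


def drain(outputs):
    """Accumulate tokens per request and record which requests finished."""
    tokens: Dict[str, List[int]] = {}
    done: Dict[str, FinishReason] = {}
    for out in outputs:
        tokens.setdefault(out.request_id, []).extend(out.new_token_ids)
        if out.finished:
            done[out.request_id] = out.finish_reason
    return tokens, done


stream = [
    Output("a", [1, 2]),
    Output("b", [7]),
    Output("a", [3], finish_reason=FinishReason.STOP),
]
tokens, done = drain(stream)
```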
- class easydel.inference.esurge.engine_types.EngineCoreOutputs(engine_index: int = 0, outputs: list[easydel.inference.esurge.engine_types.EngineCoreOutput] = <factory>, timestamp: float = 0.0, utility_output: easydel.inference.esurge.engine_types.UtilityOutput | None = None, finished_requests: set[str] | None = None, wave_complete: int | None = None, start_wave: int | None = None)[source]#
Bases: Struct
- engine_index: int#
- outputs: list[easydel.inference.esurge.engine_types.EngineCoreOutput]#
- timestamp: float#
- utility_output: easydel.inference.esurge.engine_types.UtilityOutput | None#
- class easydel.inference.esurge.engine_types.EngineCoreRequest(request_id: str, prompt_token_ids: list[int], sampling_params: easydel.inference.sampling_params.SamplingParams | None, eos_token_id: int | None, arrival_time: float, data_parallel_rank: int | None, client_index: int = 0, current_wave: int = 0, priority: int = 0)[source]#
Bases: Struct
Core request structure for engine processing.
Efficient msgspec-based structure for request data.
- request_id#
Unique identifier for the request.
- Type
str
- prompt_token_ids#
List of token IDs in the prompt.
- Type
list[int]
- sampling_params#
Parameters controlling generation behavior.
- Type
easydel.inference.sampling_params.SamplingParams | None
- eos_token_id#
End-of-sequence token ID.
- Type
int | None
- arrival_time#
Timestamp when request arrived.
- Type
float
- data_parallel_rank#
Rank for data parallel processing.
- Type
int | None
- client_index#
Index of the client making the request.
- Type
int
- current_wave#
Current processing wave number.
- Type
int
- priority#
Request priority for scheduling.
- Type
int
- arrival_time: float#
- client_index: int#
- current_wave: int#
- priority: int#
- prompt_token_ids: list[int]#
- request_id: str#
- sampling_params: easydel.inference.sampling_params.SamplingParams | None#
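To make the field order and defaults in the signature concrete, here is a sketch using a hypothetical dataclass stand-in named `Request`. It copies the documented field names, order, and defaults, but it is not the library's class. The use of `time.time()` for `arrival_time` and the sample token IDs are assumptions made for illustration.

```python
import time
from dataclasses import dataclass
from typing import Any, List, Optional


@dataclass
class Request:
    """Hypothetical stand-in with EngineCoreRequest's documented fields."""

    request_id: str
    prompt_token_ids: List[int]
    sampling_params: Optional[Any]
    eos_token_id: Optional[int]
    arrival_time: float
    data_parallel_rank: Optional[int]
    client_index: int = 0
    current_wave: int = 0
    priority: int = 0


req = Request(
    request_id="req-0",
    prompt_token_ids=[101, 2023, 2003],  # illustrative token IDs
    sampling_params=None,  # the documented signature allows None
    eos_token_id=2,
    arrival_time=time.time(),  # assumption: a wall-clock arrival stamp
    data_parallel_rank=None,
)
```

The six leading fields have no defaults and must always be supplied; `client_index`, `current_wave`, and `priority` default to 0.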
- class easydel.inference.esurge.engine_types.EngineCoreRequestType(value, names=None, *, module=None, qualname=None, type=None, start=1, boundary=None)[source]#
Bases: Enum
Engine request types defined as hex byte strings, so they can be sent over sockets without a separate encoding step.
- ABORT = b'\x01'#
- ADD = b'\x00'#
- EXECUTOR_FAILED = b'\x04'#
- START_DP_WAVE = b'\x02'#
- UTILITY = b'\x03'#
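Because each member's value is already a `bytes` object, it can be written to a socket as-is. The sketch below uses a hypothetical one-byte-prefix framing to show the idea; `RequestType`, `frame`, and `unframe` are illustrative names, and the engine's actual wire format is not described on this page.

```python
from enum import Enum
from typing import Tuple


class RequestType(Enum):
    # Values as listed for EngineCoreRequestType on this page.
    ADD = b"\x00"
    ABORT = b"\x01"
    START_DP_WAVE = b"\x02"
    UTILITY = b"\x03"
    EXECUTOR_FAILED = b"\x04"


def frame(kind: RequestType, payload: bytes) -> bytes:
    # The enum value is bytes already, so it is prepended without encoding.
    return kind.value + payload


def unframe(message: bytes) -> Tuple[RequestType, bytes]:
    # Slice rather than index: message[0] would be an int, message[:1] is bytes.
    return RequestType(message[:1]), message[1:]


wire = frame(RequestType.ABORT, b"req-0")
kind, payload = unframe(wire)
```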
- class easydel.inference.esurge.engine_types.FinishReason(value, names=None, *, module=None, qualname=None, type=None, start=1, boundary=None)[source]#
Bases: IntEnum
Reason why text generation finished.
Uses integer values for compact serialization.
- STOP#
A stop string/token was generated.
- LENGTH#
Maximum tokens or model length was reached.
- ABORT#
Generation was aborted for another reason.
Example
>>> reason = FinishReason.STOP
>>> print(reason)  # Output: "stop"
>>> reason.value  # Output: 0
- ABORT = 2#
- LENGTH = 1#
- STOP = 0#
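The example above shows an integer-valued member printing as a lowercase string. This page does not show how the class achieves that, so the following is only one possible approach: a hypothetical `Reason` enum that overrides `__str__` with a lookup table indexed by value.

```python
from enum import IntEnum

# Hypothetical lookup table; the index matches the enum's integer value.
_NAMES = ("stop", "length", "abort")


class Reason(IntEnum):
    STOP = 0
    LENGTH = 1
    ABORT = 2

    def __str__(self) -> str:
        # Human-readable form for logs and API responses.
        return _NAMES[self.value]


reason = Reason.STOP
label = str(reason)

# The compact integer form survives a round trip through serialization.
wire_value = int(Reason.LENGTH)
restored = Reason(wire_value)
```

Keeping the integer as the serialized form and deriving the string on demand is what makes the representation compact.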
- class easydel.inference.esurge.engine_types.ReconfigureDistributedRequest(new_data_parallel_size: int, new_data_parallel_rank: int, new_data_parallel_rank_local: int, new_data_parallel_master_ip: str, new_data_parallel_master_port: int)[source]#
Bases: Struct
- new_data_parallel_master_ip: str#
- new_data_parallel_master_port: int#
- new_data_parallel_rank: int#
- new_data_parallel_rank_local: int#
- new_data_parallel_size: int#
- class easydel.inference.esurge.engine_types.ReconfigureRankType(value, names=None, *, module=None, qualname=None, type=None, start=1, boundary=None)[source]#
Bases: IntEnum
Rank type for reconfiguring distributed request.
- KEEP_CURRENT_RANK = -1#
- SHUTDOWN_CURRENT_RANK = -2#
- class easydel.inference.esurge.engine_types.UtilityOutput(call_id: int, failure_message: str | None = None, result: easydel.inference.esurge.engine_types.UtilityResult | None = None)[source]#
Bases: Struct
- call_id: int#
- class easydel.inference.esurge.engine_types.UtilityResult(r: Any = None)[source]#
Bases: object
Wrapper for special serialization/deserialization handling.
Provides a container for results that require custom serialization behavior or special handling during data transfer.
- r#
The wrapped result object.