easydel.layers.attention_operator.modules.paged_attn


class easydel.layers.attention_operator.modules.paged_attn.PagedAttn(metadata: AttentionMetadata)

forward_cpu(*args, **kwargs) → AttentionOutput: CPU forward pass. Not implemented for Paged Attention.

forward_cuda(*args, **kwargs) → AttentionOutput: CUDA GPU forward pass. Not implemented for Paged Attention.

forward_gpu(*args, **kwargs) → AttentionOutput: GPU forward pass. Not implemented for Paged Attention.

forward_native(*args, **kwargs) → AttentionOutput: Native (CPU) forward pass. Not implemented for Paged Attention.

forward_rocm(*args, **kwargs) → AttentionOutput: ROCm GPU forward pass. Not implemented for Paged Attention.

TPU-specific implementation of the operation. Defaults to calling forward_native; subclasses can override this for TPU-specific optimizations.

Returns: The result of the operation, potentially optimized for TPU.
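The fallback behaviour described above (a TPU entry point that delegates to forward_native unless a subclass overrides it) can be sketched with a toy class. This is a minimal illustration and not EasyDeL's code: `ToyOperator` and `ToyTpuOperator` are hypothetical, the method name `forward_tpu` is assumed by analogy with the other forward_* methods, and the tuple return values stand in for a real AttentionOutput.

```python
class ToyOperator:
    """Hypothetical stand-in; not EasyDeL's actual base class."""

    def forward_native(self, *args, **kwargs):
        # Reference path used when no backend-specific override exists.
        return ("native", args, kwargs)

    def forward_tpu(self, *args, **kwargs):
        # Assumed name. Default behaviour: delegate to forward_native.
        return self.forward_native(*args, **kwargs)


class ToyTpuOperator(ToyOperator):
    def forward_tpu(self, *args, **kwargs):
        # A subclass may replace the fallback with a TPU-specific path.
        return ("tpu", args, kwargs)


default_result = ToyOperator().forward_tpu(1, scale=0.5)
override_result = ToyTpuOperator().forward_tpu(1, scale=0.5)
```

In this sketch, `default_result` comes from the native path and `override_result` from the subclass override. The same arguments are forwarded unchanged in both cases.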

Returns the metadata associated with this attention implementation instance.

classmethod get_impl_name() → Union[str, Tuple[str]]: Returns the registered name of this attention implementation.
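Because get_impl_name may return either a single string or a tuple of strings, code that consumes it has to handle both shapes. The sketch below shows one way to do that, under stated assumptions: `normalize_impl_names`, the toy classes, the `registry` dict, and every name string are hypothetical and are not taken from EasyDeL.

```python
from typing import Dict, Tuple, Type, Union


def normalize_impl_names(name: Union[str, Tuple[str, ...]]) -> Tuple[str, ...]:
    # Hypothetical helper: wrap a single name so callers always get a tuple.
    return (name,) if isinstance(name, str) else tuple(name)


class ToyPagedImpl:
    @classmethod
    def get_impl_name(cls) -> Union[str, Tuple[str, ...]]:
        # Illustrative names only; the real registered name is not stated here.
        return ("toy_paged", "toy_paged_alias")


class ToySingleImpl:
    @classmethod
    def get_impl_name(cls) -> Union[str, Tuple[str, ...]]:
        return "toy_single"


# Toy registry: every name, primary or alias, maps to its implementation class.
registry: Dict[str, Type] = {}
for impl in (ToyPagedImpl, ToySingleImpl):
    for impl_name in normalize_impl_names(impl.get_impl_name()):
        registry[impl_name] = impl
```

Normalizing to a tuple up front keeps the lookup code free of isinstance checks at every call site.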