easydel.layers.attention_operator.modules.paged_attn#

class easydel.layers.attention_operator.modules.paged_attn.PagedAttn(metadata: AttentionMetadata)[source]#

Bases: AttentionImpl

forward_cpu(*args, **kwargs) AttentionOutput[source]#

CPU forward pass. Not implemented for Paged Attention.

forward_cuda(*args, **kwargs) AttentionOutput[source]#

CUDA GPU forward pass. Not implemented for Paged Attention.

forward_gpu(*args, **kwargs) AttentionOutput[source]#

GPU forward pass. Not implemented for Paged Attention.

forward_native(*args, **kwargs) AttentionOutput[source]#

Native (CPU) forward pass. Not implemented for Paged Attention.

forward_rocm(*args, **kwargs) AttentionOutput[source]#

ROCm GPU forward pass. Not implemented for Paged Attention.
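
Because most backend methods on this class are documented as not implemented, calling code may want to fail fast with a clear message. The sketch below illustrates one way to do that. It uses a hypothetical stand-in class (`PagedAttnStub`) and a hypothetical helper (`run_on_backend`), neither of which is part of EasyDeL. It also assumes the unimplemented methods raise `NotImplementedError`, which this page does not confirm.

```python
class PagedAttnStub:
    """Hypothetical stand-in mirroring the backend methods listed on this page."""

    def forward_tpu(self, q, k, v):
        # Placeholder result; the real method runs paged attention on TPU.
        return "tpu-output"

    def _unsupported(self, *args, **kwargs):
        # Assumption: unimplemented backends raise NotImplementedError.
        raise NotImplementedError("Paged Attention is only available on TPU")

    forward_cpu = forward_cuda = forward_gpu = _unsupported
    forward_native = forward_rocm = _unsupported


def run_on_backend(impl, backend, *args, **kwargs):
    """Call impl.forward_<backend> and report an unsupported backend clearly."""
    method = getattr(impl, f"forward_{backend}", None)
    if method is None:
        raise ValueError(f"unknown backend: {backend!r}")
    try:
        return method(*args, **kwargs)
    except NotImplementedError as exc:
        raise RuntimeError(f"backend {backend!r} is not supported: {exc}") from exc
```

With this helper, `run_on_backend(PagedAttnStub(), "tpu", q, k, v)` returns the placeholder result, while any other backend name from the list above raises a `RuntimeError`.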

forward_tpu(q: Array, k: Array, v: Array, cache_view: PagedAttentionCacheView, cache_metadata: PagedAttentionMetadata, **ignore) AttentionOutput[source]#

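TPU forward pass for Paged Attention. Of the backend-specific forward methods on this class, this is the only one not documented as unimplemented.

Parameters
  • q (Array) – Query array.

  • k (Array) – Key array.

  • v (Array) – Value array.

  • cache_view (PagedAttentionCacheView) – View of the paged attention cache used for this call.

  • cache_metadata (PagedAttentionMetadata) – Metadata describing the paged attention cache for this call.

  • **ignore – Additional keyword arguments. As the name suggests, these appear to be accepted and ignored.

Returns

An AttentionOutput containing the result of the attention operation.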
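
For readers unfamiliar with the technique, here is a minimal pure-Python sketch of the general idea behind paged attention for a single decode-step query. Keys and values are stored in fixed-size pages, and a per-sequence block table lists which pages to read and in what order. This does not reflect EasyDeL's TPU kernel or the actual fields of `PagedAttentionCacheView` and `PagedAttentionMetadata`. The names `k_pages`, `v_pages`, `block_table`, and `seq_len` are hypothetical.

```python
import math


def paged_attention_single_query(q, k_pages, v_pages, block_table, seq_len):
    """Attend one query vector over a sequence whose K/V rows live in pages.

    q:           list of d floats
    k_pages:     k_pages[page][slot] is a key vector of length d
    v_pages:     v_pages[page][slot] is a value vector
    block_table: page indices owned by this sequence, in logical order
    seq_len:     number of valid tokens (the last page may be partly filled)
    """
    page_size = len(k_pages[0])
    scale = 1.0 / math.sqrt(len(q))

    # Gather the logical K/V rows by walking the block table.
    keys, values = [], []
    for pos in range(seq_len):
        page = block_table[pos // page_size]
        slot = pos % page_size
        keys.append(k_pages[page][slot])
        values.append(v_pages[page][slot])

    # Scaled dot-product scores, then a numerically stable softmax.
    scores = [scale * sum(qi * ki for qi, ki in zip(q, key)) for key in keys]
    top = max(scores)
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)

    # Weighted average of the gathered value rows.
    return [
        sum(w * value[j] for w, value in zip(weights, values)) / total
        for j in range(len(values[0]))
    ]
```

The pages a sequence owns need not be contiguous or ordered in memory. Only the block table defines the logical token order, and `seq_len` keeps unused slots in the last page out of the softmax.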

get_impl_metadata() AttentionMetadata[source]#

Returns the metadata associated with this attention implementation instance.

Returns

The AttentionMetadata provided during initialization.

classmethod get_impl_name() Union[str, Tuple[str]][source]#

Returns the registered name of this attention implementation.

Returns

The string “paged_attention”.