easydel.layers.attention_operator.modules.paged_attn
- class easydel.layers.attention_operator.modules.paged_attn.PagedAttn(metadata: AttentionMetadata)
Bases: AttentionImpl
- forward_cpu(*args, **kwargs) -> AttentionOutput
CPU forward pass. Not implemented for Paged Attention.
- forward_cuda(*args, **kwargs) -> AttentionOutput
CUDA GPU forward pass. Not implemented for Paged Attention.
- forward_gpu(*args, **kwargs) -> AttentionOutput
GPU forward pass. Not implemented for Paged Attention.
- forward_native(*args, **kwargs) -> AttentionOutput
Native (CPU) forward pass. Not implemented for Paged Attention.
- forward_rocm(*args, **kwargs) -> AttentionOutput
ROCm GPU forward pass. Not implemented for Paged Attention.
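- forward_tpu(q: Array, k: Array, v: Array, cache_view: PagedAttentionCacheView, cache_metadata: PagedAttentionMetadata, **ignore) -> AttentionOutput
TPU-specific implementation of the operation.
The base-class description states that this defaults to calling forward_native and that subclasses can override it for TPU-specific optimizations. Here the method takes explicit paged-cache arguments rather than generic *args and **kwargs.
- Parameters
q (Array) – Query array.
k (Array) – Key array.
v (Array) – Value array.
cache_view (PagedAttentionCacheView) – View of the paged attention cache.
cache_metadata (PagedAttentionMetadata) – Metadata describing the paged attention cache.
**ignore – Additional keyword arguments; as the name suggests, these appear to be accepted and not used.
- Returns
The result of the operation, potentially optimized for TPU.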
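As a rough illustration of what a paged attention forward pass has to do, the sketch below gathers one sequence's keys and values from fixed-size pages through a page table, then applies ordinary softmax attention for a single query. It is plain Python and is not EasyDeL's TPU kernel. The names `gather_from_pages`, `attend`, and `page_table`, as well as the page layout, are hypothetical and do not describe the actual fields of PagedAttentionCacheView or PagedAttentionMetadata.

```python
import math


def gather_from_pages(pages, page_table, seq_len):
    """Collect the first seq_len vectors of a sequence from its pages.

    pages: list of pages, each a list of page_size vectors (hypothetical layout).
    page_table: physical page indices owned by this sequence, in logical order.
    """
    page_size = len(pages[0])
    out = []
    for pos in range(seq_len):
        page = pages[page_table[pos // page_size]]
        out.append(page[pos % page_size])
    return out


def attend(q, keys, values):
    """Single-query scaled dot-product attention over lists of floats."""
    scale = 1.0 / math.sqrt(len(q))
    scores = [scale * sum(a * b for a, b in zip(q, k)) for k in keys]
    peak = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - peak) for s in scores]
    total = sum(weights)
    dim = len(values[0])
    return [sum(w * v[d] for w, v in zip(weights, values)) / total for d in range(dim)]


# Three tokens stored out of order in physical pages of size 2.
# The [9.0, 9.0] slots are unused padding and must never be read.
k_pages = [[[0.0, 1.0], [9.0, 9.0]], [[1.0, 0.0], [1.0, 1.0]]]
v_pages = [[[3.0, 3.0], [9.0, 9.0]], [[1.0, 0.0], [0.0, 2.0]]]
page_table = [1, 0]  # logical page 0 lives in physical page 1
seq_len = 3

keys = gather_from_pages(k_pages, page_table, seq_len)
values = gather_from_pages(v_pages, page_table, seq_len)
paged_out = attend([1.0, 0.0], keys, values)
```

The point of the indirection is that a sequence's cache does not need to be contiguous in memory: attention over the gathered keys and values gives the same result as attention over a contiguous copy.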
- get_impl_metadata() -> AttentionMetadata
Returns the metadata associated with this attention implementation instance.
- Returns
The AttentionMetadata provided during initialization.
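The contract described on this page can be sketched with stand-in classes: the metadata passed to the constructor is handed back unchanged by get_impl_metadata, and the backends documented as not implemented refuse to run. `FakeMetadata`, its `backend` field, and `PagedAttnSketch` are hypothetical and exist only for illustration. Raising NotImplementedError is an assumption about how "not implemented" is signalled, not confirmed behaviour of the real class.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FakeMetadata:
    """Hypothetical stand-in for AttentionMetadata."""
    backend: str


class PagedAttnSketch:
    """Hypothetical stand-in mirroring the method names documented above."""

    def __init__(self, metadata):
        self.metadata = metadata

    def get_impl_metadata(self):
        # Return the object supplied at construction, unchanged.
        return self.metadata

    def forward_native(self, *args, **kwargs):
        # Assumption: "not implemented" is modelled as NotImplementedError.
        raise NotImplementedError("Paged Attention has no native path")

    # The other non-TPU backends share the same behaviour in this sketch.
    forward_cpu = forward_gpu = forward_cuda = forward_rocm = forward_native


meta = FakeMetadata(backend="tpu")
impl = PagedAttnSketch(meta)
same_object = impl.get_impl_metadata() is meta
```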