easydel.inference.esurge.page_table


Page table management for KV-cache allocation.

This module provides page table classes that keep two copies of each table: a CPU-side copy where modifications are made, and a GPU-side copy that is updated only when commit() is called.

class easydel.inference.esurge.page_table.MultiGroupPageTable(max_num_reqs: int, max_model_len: int, max_num_batched_tokens: int, page_sizes: list[int])

Bases: object

Multi-group page table for grouped-query attention.

Manages multiple PageTable instances, one per KV-cache group. It is used when the model uses grouped-query attention (GQA), where different attention heads may share KV-caches.

page_tables: List of PageTable instances, one per group.

Note

All operations coordinate across all groups and modify state in-place.
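The fan-out across groups can be pictured with a minimal sketch. TinyTable and TinyMultiGroup are hypothetical stand-ins, not the library classes, and the per-group capacity of ceil(max_model_len / page_size) is an assumption about how page_sizes might be used rather than something this page states.

```python
class TinyTable:
    """Hypothetical stand-in for PageTable: one list of page IDs per row."""

    def __init__(self, max_num_reqs: int, max_num_pages_per_req: int):
        self.max_num_pages_per_req = max_num_pages_per_req
        self.rows = [[] for _ in range(max_num_reqs)]

    def append_row(self, page_ids, row_idx):
        self.rows[row_idx].extend(page_ids)


class TinyMultiGroup:
    """Hypothetical stand-in for MultiGroupPageTable."""

    def __init__(self, max_num_reqs: int, max_model_len: int, page_sizes):
        # Assumption: each group can hold ceil(max_model_len / page_size) pages per row.
        self.page_tables = [
            TinyTable(max_num_reqs, -(-max_model_len // size)) for size in page_sizes
        ]

    def append_row(self, page_ids, row_idx):
        # One page ID list per group, in group order.
        if len(page_ids) != len(self.page_tables):
            raise ValueError("page_ids must contain one list per group")
        for table, group_ids in zip(self.page_tables, page_ids):
            table.append_row(group_ids, row_idx)


multi = TinyMultiGroup(max_num_reqs=2, max_model_len=100, page_sizes=[16, 32])
multi.append_row([[10, 11], [20]], row_idx=0)
```

Each group receives only its own list, so the same row index refers to the same request in every group.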

add_row(page_ids: list[list[int]], row_idx: int) → None

Replace a row across all groups.

Parameters

page_ids – List of new page ID lists, one per group.
row_idx – Row index to replace.

Note

Call commit() to sync changes to GPU.

append_row(page_ids: list[list[int]], row_idx: int) → None

Append pages to a row across all groups.

Parameters

page_ids – List of page ID lists, one per group.
row_idx – Row index to append to.

Note

The length of page_ids must match the number of groups. Call commit() to sync changes to GPU.

append_rows_batch(page_ids_per_req: list[list[list[int]]], req_indices: list[int]) → None

Batch append pages across all groups.

Parameters

page_ids_per_req – List of page ID lists per group per request. Shape: [num_requests][num_groups][variable_length_pages]
req_indices – Row indices to append to.

Note

Call commit() to sync changes to GPU after batch operations.

Example

>>> # Two requests, two groups
>>> multi_table.append_rows_batch(
...     [[[10, 11], [20, 21]],      # Request 0: group 0 and 1 pages
...      [[30], [40, 41, 42]]],     # Request 1: group 0 and 1 pages
...     [0, 1]                       # Append to rows 0 and 1
... )
>>> multi_table.commit(2)
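To make the nested shape concrete, the following sketch walks the same input by request and then by group. The function and the group_rows structure are hypothetical illustrations that use plain lists; they are not the library's implementation.

```python
def append_rows_batch(group_rows, page_ids_per_req, req_indices):
    """Append pages request by request.

    group_rows[g][row] is the page list of one row in group g (hypothetical layout).
    page_ids_per_req has shape [num_requests][num_groups][variable_length_pages].
    """
    for per_group_ids, row_idx in zip(page_ids_per_req, req_indices):
        for group_idx, ids in enumerate(per_group_ids):
            group_rows[group_idx][row_idx].extend(ids)


# Two groups, two rows each, all initially empty.
group_rows = [[[] for _ in range(2)] for _ in range(2)]
append_rows_batch(
    group_rows,
    [[[10, 11], [20, 21]], [[30], [40, 41, 42]]],
    [0, 1],
)
```

After the call, group 0 holds the first inner list of every request and group 1 holds the second.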

clear() → None

Clear all page tables across all groups.

Note

This clears both CPU and GPU arrays for all groups.

commit(num_reqs: int) → None

Commit CPU modifications to GPU for all groups.

Parameters: num_reqs – Number of request rows to commit.

Note

This commits changes across all page table groups.

move_row(src: int, tgt: int) → None

Move a row across all groups.

Parameters

src – Source row index.
tgt – Target row index.

Note

Call commit() to sync changes to GPU.

swap_row(src: int, tgt: int) → None

Swap two rows across all groups.

Parameters

src – First row index.
tgt – Second row index.

Note

Call commit() to sync changes to GPU.

class easydel.inference.esurge.page_table.PageTable(max_num_reqs: int, max_num_pages_per_req: int, max_num_batched_tokens: int)

Bases: object

Manages page allocation for paged KV-cache layouts.

A class-based page table manager with dual CPU/GPU representation for efficient host-device synchronization.

The page table maintains a 2D array where each row corresponds to a request and contains the page IDs allocated to that request. CPU-side modifications are explicitly committed to the GPU via the commit() method.

max_num_reqs: Maximum number of concurrent requests.

max_num_pages_per_req: Maximum pages allocatable per request.

max_num_batched_tokens: Maximum tokens processable in a batch.

page_table: JAX device array [max_num_reqs, max_num_pages_per_req].

page_table_cpu: NumPy CPU array [max_num_reqs, max_num_pages_per_req].

num_pages_per_row: NumPy array tracking valid pages [max_num_reqs].

Note

All modifications operate on CPU arrays and require explicit commit() to synchronize with the GPU.
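The edit-then-commit pattern can be illustrated with a pure-Python stand-in. SketchPageTable is hypothetical: nested lists take the place of the NumPy and JAX arrays, and no bounds checking is performed.

```python
class SketchPageTable:
    """Hypothetical stand-in: nested lists replace the NumPy and JAX arrays."""

    def __init__(self, max_num_reqs: int, max_num_pages_per_req: int):
        shape = (max_num_reqs, max_num_pages_per_req)
        self.page_table_cpu = [[0] * shape[1] for _ in range(shape[0])]
        self.page_table = [[0] * shape[1] for _ in range(shape[0])]  # "device" copy
        self.num_pages_per_row = [0] * max_num_reqs

    def append_row(self, page_ids, row_idx):
        # Write after the last valid page of the row, on the CPU copy only.
        start = self.num_pages_per_row[row_idx]
        self.page_table_cpu[row_idx][start:start + len(page_ids)] = page_ids
        self.num_pages_per_row[row_idx] = start + len(page_ids)

    def commit(self, num_reqs):
        # Push the first num_reqs rows to the "device" copy.
        for row in range(num_reqs):
            self.page_table[row] = list(self.page_table_cpu[row])


table = SketchPageTable(max_num_reqs=2, max_num_pages_per_req=4)
table.append_row([7, 8], row_idx=0)
before_commit = [list(row) for row in table.page_table]  # still all zeros
table.commit(1)
after_commit = [list(row) for row in table.page_table]
```

The device copy changes only at commit(), which is why a reader of the device array can see stale data between an edit and the next commit.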

add_row(page_ids: list[int], row_idx: int) → None

Replace a row with new page IDs.

Resets the row to empty, then adds the provided page IDs to the CPU array. Changes are not visible on GPU until commit() is called.

Parameters

page_ids – New page IDs for the row.
row_idx – Index of the row to replace.

Note

This is equivalent to clearing the row and then appending. Call commit() to sync changes to GPU.
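The "clear, then append" equivalence might look like the sketch below. It assumes that resetting a row means setting its valid-page count to zero; whether the library also zeroes the old entries is not stated here, so in this sketch stale IDs can remain past the valid count. Both functions are hypothetical and operate on plain lists.

```python
def append_row(table, counts, page_ids, row_idx):
    """Write page_ids after the last valid page of the row (no-op when empty)."""
    start = counts[row_idx]
    table[row_idx][start:start + len(page_ids)] = page_ids
    counts[row_idx] = start + len(page_ids)


def add_row(table, counts, page_ids, row_idx):
    """Replace a row: reset its valid-page count, then append."""
    counts[row_idx] = 0  # assumption: reset touches only the count
    append_row(table, counts, page_ids, row_idx)


table = [[1, 2, 3, 0]]
counts = [3]
add_row(table, counts, [9], row_idx=0)
valid_pages = table[0][:counts[0]]  # only the first counts[0] entries are meaningful
append_row(table, counts, [], row_idx=0)  # empty input leaves the row unchanged
```

Under that assumption, consumers should read only the first num_pages_per_row[row_idx] entries of a row.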

append_row(page_ids: list[int], row_idx: int) → None

Append page IDs to a single row.

Adds new pages to the end of an existing row’s page list in the CPU array. Changes are not visible on GPU until commit() is called.

Parameters

page_ids – Page IDs to append.
row_idx – Index of the row to append to.

Note

If page_ids is empty, this is a no-op. Call commit() to sync changes to GPU.

clear() → None

Clear all data in the page table.

Resets both CPU and GPU arrays to zero.

Note

This clears both the page table and the page counts.

commit(num_reqs: int) → None

Commit CPU modifications to GPU.

Copies the first num_reqs rows from CPU array to GPU array. This synchronizes modifications made via append_row, add_row, move_row, or swap_row.

Parameters: num_reqs – Number of request rows to commit.

Note

Uses non-blocking transfer for better performance.
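A small sketch of the partial copy, using plain lists in place of the CPU and device arrays. The function is a hypothetical illustration, and the non-blocking transfer is not modelled.

```python
def commit(cpu, device, num_reqs):
    """Copy the first num_reqs rows from the CPU table to the device table."""
    device[:num_reqs] = [list(row) for row in cpu[:num_reqs]]


cpu = [[1, 2], [3, 4], [5, 6]]
device = [[0, 0], [0, 0], [0, 0]]
commit(cpu, device, 2)
```

Rows at index num_reqs and beyond are left as they were on the device side, so num_reqs should cover every row modified since the last commit.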

get_cpu_tensor() → ndarray

Get the CPU tensor of the page table.

Returns: The 2D NumPy array on CPU [max_num_reqs, max_num_pages_per_req].

Note

This returns the CPU-side array where modifications are made.

get_device_tensor() → Array

Get the GPU device tensor of the page table.

Returns: The 2D JAX array on device [max_num_reqs, max_num_pages_per_req].

Note

This returns the GPU-side array. It may not reflect recent CPU-side modifications until commit() is called.

move_row(src: int, tgt: int) → None

Move row content from source to target.

Copies the page IDs and count from source row to target row in the CPU array. Only copies the valid pages (up to source length).

Parameters

src – Source row index.
tgt – Target row index.

Note

The source row is not cleared; use this for copying. Call commit() to sync changes to GPU.
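The copy semantics can be sketched on plain lists as follows. The function is hypothetical; it copies only the valid prefix of the source row, so entries of the target row beyond that prefix keep their old values.

```python
def move_row(table, counts, src, tgt):
    """Copy the valid pages and the page count from row src to row tgt."""
    num_valid = counts[src]
    table[tgt][:num_valid] = table[src][:num_valid]
    counts[tgt] = num_valid


table = [[1, 2, 0], [7, 8, 9]]
counts = [2, 3]
move_row(table, counts, src=0, tgt=1)
```

After the call both rows report two valid pages, and the source row still holds its original contents.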

swap_row(src: int, tgt: int) → None

Swap two rows in the page table.

Exchanges both page IDs and page counts between two rows in the CPU array.

Parameters

src – First row index.
tgt – Second row index.

Note

This is a full swap including both content and metadata. Call commit() to sync changes to GPU.
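A minimal sketch of a full swap on plain lists. The function is a hypothetical illustration rather than the library's code.

```python
def swap_row(table, counts, src, tgt):
    """Exchange both the page IDs and the page counts of two rows."""
    table[src], table[tgt] = table[tgt], table[src]
    counts[src], counts[tgt] = counts[tgt], counts[src]


table = [[1, 2, 0], [7, 8, 9]]
counts = [2, 3]
swap_row(table, counts, src=0, tgt=1)
```

Swapping the counts together with the contents keeps each row's valid length consistent with its data.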

easydel.inference.esurge.page_table.cdiv(a: int, b: int) → int

Compute ceiling division.

Parameters

a – Dividend.
b – Divisor.

Returns

The ceiling of a/b.

Example

>>> cdiv(7, 3)
3
>>> cdiv(6, 3)
2
>>> cdiv(5, 3)
2
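One common way to write ceiling division with integers only is shown below. It is a sketch and may differ from how the library implements cdiv. The page-count calculation at the end is a hypothetical use case, and the values 100 and 16 are illustrative.

```python
def cdiv(a: int, b: int) -> int:
    """Ceiling division using floor division on a negated divisor."""
    return -(a // -b)


# Hypothetical use: pages needed to hold 100 tokens with 16 tokens per page.
pages_needed = cdiv(100, 16)
```

Using integer arithmetic avoids the rounding issues that a float-based ceiling can hit for very large operands.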
