easydel.inference.esurge.page_table
Page table management for KV-cache allocation.
This module provides class-based page table management with dual CPU/GPU representation.
- class easydel.inference.esurge.page_table.MultiGroupPageTable(max_num_reqs: int, max_model_len: int, max_num_batched_tokens: int, page_sizes: list[int])
Bases: object
Multi-group page table for grouped-query attention.
Manages multiple PageTable instances, one per KV-cache group. This is used when the model uses grouped-query attention (GQA), where several attention heads may share a KV-cache.
- page_tables
List of PageTable instances, one per group.
Note
All operations coordinate across all groups and modify state in-place.
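The constructor takes `max_model_len` and a list of `page_sizes`, while each PageTable is sized by `max_num_pages_per_req`. The text does not say how one is derived from the other; a plausible relationship, assumed here rather than confirmed, is a ceiling division per group. The helper name `pages_needed` and the example values are hypothetical.

```python
def pages_needed(max_model_len: int, page_size: int) -> int:
    # Ceiling division: a partially filled last page still occupies a whole page.
    return (max_model_len + page_size - 1) // page_size

# Hypothetical values: one page size per KV-cache group.
page_sizes = [16, 32]
max_model_len = 1000

# Assumed per-group row width of each underlying page table.
pages_per_group = [pages_needed(max_model_len, size) for size in page_sizes]
```

Under this assumption, a group with larger pages needs fewer columns per request row.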
- add_row(page_ids: list[list[int]], row_idx: int) -> None
Replace a row across all groups.
- Parameters
page_ids – List of new page ID lists, one per group.
row_idx – Row index to replace.
Note
Call commit() to sync changes to GPU.
- append_row(page_ids: list[list[int]], row_idx: int) -> None
Append pages to a row across all groups.
- Parameters
page_ids – List of page ID lists, one per group.
row_idx – Row index to append to.
Note
The length of page_ids must match the number of groups. Call commit() to sync changes to GPU.
- append_rows_batch(page_ids_per_req: list[list[list[int]]], req_indices: list[int]) -> None
Batch append pages across all groups.
- Parameters
page_ids_per_req – List of page ID lists per group per request. Shape: [num_requests][num_groups][variable_length_pages]
req_indices – Row indices to append to.
Note
Call commit() to sync changes to GPU after batch operations.
Example
>>> # Two requests, two groups
>>> multi_table.append_rows_batch(
...     [[[10, 11], [20, 21]],   # Request 0: group 0 and 1 pages
...      [[30], [40, 41, 42]]],  # Request 1: group 0 and 1 pages
...     [0, 1]                   # Append to rows 0 and 1
... )
>>> multi_table.commit(2)
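To make the `[num_requests][num_groups][variable_length_pages]` nesting concrete, here is a minimal sketch of how such a batch could be routed to per-group tables. `TinyTable` and this free-standing `append_rows_batch` are hypothetical stand-ins built on NumPy, not the library's classes, and the `int32` dtype is an assumption.

```python
import numpy as np

class TinyTable:
    """Hypothetical single-group table: a 2D array plus per-row valid counts."""

    def __init__(self, rows: int, cols: int):
        self.cpu = np.zeros((rows, cols), dtype=np.int32)
        self.counts = np.zeros(rows, dtype=np.int32)

    def append_row(self, page_ids, row_idx):
        if not page_ids:
            return  # nothing to append
        start = int(self.counts[row_idx])
        end = start + len(page_ids)
        self.cpu[row_idx, start:end] = page_ids
        self.counts[row_idx] = end

def append_rows_batch(tables, page_ids_per_req, req_indices):
    # Outer level: one entry per request. Inner level: one page list per group.
    for per_group, row_idx in zip(page_ids_per_req, req_indices):
        if len(per_group) != len(tables):
            raise ValueError("expected one page list per group")
        for table, page_ids in zip(tables, per_group):
            table.append_row(page_ids, row_idx)

tables = [TinyTable(2, 4), TinyTable(2, 4)]
append_rows_batch(
    tables,
    [[[10, 11], [20, 21]], [[30], [40, 41, 42]]],
    [0, 1],
)
group0 = tables[0].cpu.tolist()
group1 = tables[1].cpu.tolist()
```

Each request's row index is the same in every group; only the page IDs and their count differ per group.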
- clear() -> None
Clear all page tables across all groups.
Note
This clears both CPU and GPU arrays for all groups.
- commit(num_reqs: int) -> None
Commit CPU modifications to GPU for all groups.
- Parameters
num_reqs – Number of request rows to commit.
Note
This commits changes across all page table groups.
- class easydel.inference.esurge.page_table.PageTable(max_num_reqs: int, max_num_pages_per_req: int, max_num_batched_tokens: int)
Bases: object
Manages page allocation for paged KV-cache layouts.
A class-based page table manager with dual CPU/GPU representation for efficient host-device synchronization.
The page table maintains a 2D array where each row corresponds to a request and contains the page IDs allocated to that request. CPU-side modifications are explicitly committed to the GPU via the commit() method.
- max_num_reqs
Maximum number of concurrent requests.
- max_num_pages_per_req
Maximum pages allocatable per request.
- max_num_batched_tokens
Maximum tokens processable in a batch.
- page_table
JAX device array [max_num_reqs, max_num_pages_per_req].
- page_table_cpu
NumPy CPU array [max_num_reqs, max_num_pages_per_req].
- num_pages_per_row
NumPy array tracking valid pages [max_num_reqs].
Note
All modifications operate on CPU arrays and require explicit commit() to synchronize with the GPU.
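The layout described above can be pictured with a small NumPy sketch. It is illustrative only: the `int32` dtype is an assumption, and `page_table_device` is a plain NumPy array standing in for the JAX device array that the real class holds.

```python
import numpy as np

max_num_reqs, max_num_pages_per_req = 4, 8

# Host-side state: page IDs per request row, plus how many entries are valid.
page_table_cpu = np.zeros((max_num_reqs, max_num_pages_per_req), dtype=np.int32)
num_pages_per_row = np.zeros(max_num_reqs, dtype=np.int32)

# Stand-in for the device-side copy (a JAX array in the real class).
page_table_device = np.zeros_like(page_table_cpu)

# Request 0 is given pages 5, 9 and 2; only the host copy changes.
page_table_cpu[0, :3] = [5, 9, 2]
num_pages_per_row[0] = 3

host_row = page_table_cpu[0, :num_pages_per_row[0]].tolist()
device_row_before_commit = page_table_device[0, :3].tolist()
```

Until a commit copies the row across, the device-side stand-in still holds its old contents.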
- add_row(page_ids: list[int], row_idx: int) -> None
Replace a row with new page IDs.
Resets the row to empty, then adds the provided page IDs to the CPU array. Changes are not visible on GPU until commit() is called.
- Parameters
page_ids – New page IDs for the row.
row_idx – Index of the row to replace.
Note
This is equivalent to clearing the row and then appending. Call commit() to sync changes to GPU.
- append_row(page_ids: list[int], row_idx: int) -> None
Append page IDs to a single row.
Adds new pages to the end of an existing row’s page list in the CPU array. Changes are not visible on GPU until commit() is called.
- Parameters
page_ids – Page IDs to append.
row_idx – Index of the row to append to.
Note
If page_ids is empty, this is a no-op. Call commit() to sync changes to GPU.
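The relationship between add_row and append_row can be sketched with free functions over a NumPy table and a per-row count array. This is a minimal illustration under assumed names and an assumed `int32` dtype, not the library's implementation. In particular, this sketch leaves stale IDs past the valid count in place; whether the real class zeroes them is not stated.

```python
import numpy as np

def append_row(table, counts, page_ids, row_idx):
    """Write page_ids after the row's current valid entries."""
    if not page_ids:
        return  # empty input is a no-op
    start = int(counts[row_idx])
    end = start + len(page_ids)
    table[row_idx, start:end] = page_ids
    counts[row_idx] = end

def add_row(table, counts, page_ids, row_idx):
    """Replace a row: reset its valid count, then append."""
    counts[row_idx] = 0
    append_row(table, counts, page_ids, row_idx)

table = np.zeros((2, 6), dtype=np.int32)
counts = np.zeros(2, dtype=np.int32)

append_row(table, counts, [7, 8], 0)
append_row(table, counts, [9], 0)
after_append = table[0, :counts[0]].tolist()

add_row(table, counts, [3], 0)
after_replace = table[0, :counts[0]].tolist()
```

Readers of the row should slice by the count, since only the first `counts[row_idx]` entries are meaningful in this sketch.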
- clear() -> None
Clear all data in the page table.
Resets both CPU and GPU arrays to zero.
Note
This clears both the page table and the page counts.
- commit(num_reqs: int) -> None
Commit CPU modifications to GPU.
Copies the first num_reqs rows from CPU array to GPU array. This synchronizes modifications made via append_row, add_row, move_row, or swap_row.
- Parameters
num_reqs – Number of request rows to commit.
Note
Uses non-blocking transfer for better performance.
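The effect of committing only the first `num_reqs` rows can be sketched as follows. Both tables are NumPy arrays here, so the device side is a stand-in and the non-blocking transfer is not modeled; the function and variable names are hypothetical.

```python
import numpy as np

def commit(device_table, cpu_table, num_reqs):
    # Copy only the leading rows; rows past num_reqs keep their old device values.
    device_table[:num_reqs] = cpu_table[:num_reqs]

cpu_table = np.arange(12, dtype=np.int32).reshape(4, 3)
device_table = np.zeros_like(cpu_table)

commit(device_table, cpu_table, num_reqs=2)
committed = device_table.tolist()
```

With an actual JAX array the in-place slice assignment would not work, because JAX arrays are immutable; a functional update such as `device_table.at[:num_reqs].set(...)` returns a new array instead.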
- get_cpu_tensor() -> ndarray
Get the CPU tensor of the page table.
- Returns
The 2D NumPy array on CPU [max_num_reqs, max_num_pages_per_req].
Note
This returns the CPU-side array where modifications are made.
- get_device_tensor() -> Array
Get the GPU device tensor of the page table.
- Returns
The 2D JAX array on device [max_num_reqs, max_num_pages_per_req].
Note
This returns the GPU-side array. It may not reflect recent CPU-side modifications until commit() is called.
- move_row(src: int, tgt: int) -> None
Move row content from source to target.
Copies the page IDs and count from source row to target row in the CPU array. Only copies the valid pages (up to source length).
- Parameters
src – Source row index.
tgt – Target row index.
Note
The source row is not cleared; use this for copying. Call commit() to sync changes to GPU.
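A minimal sketch of the copy semantics described for move_row, using a NumPy table and count array as stand-ins (names and dtype are assumptions):

```python
import numpy as np

def move_row(table, counts, src, tgt):
    # Copy only the valid prefix of the source row, then copy its count.
    n = int(counts[src])
    table[tgt, :n] = table[src, :n]
    counts[tgt] = n

table = np.array([[4, 5, 6, 0], [9, 9, 9, 9]], dtype=np.int32)
counts = np.array([3, 4], dtype=np.int32)

move_row(table, counts, src=0, tgt=1)

target_valid = table[1, :counts[1]].tolist()
source_valid = table[0, :counts[0]].tolist()
target_raw = table[1].tolist()
```

Because only the valid prefix is copied, the target row in this sketch may still hold an old ID past its new count; the count is what marks the entries that matter.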
- swap_row(src: int, tgt: int) -> None
Swap two rows in the page table.
Exchanges both page IDs and page counts between two rows in the CPU array.
- Parameters
src – First row index.
tgt – Second row index.
Note
This is a full swap including both content and metadata. Call commit() to sync changes to GPU.
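The full swap of content and counts can be sketched with NumPy fancy indexing. The function and array names are hypothetical stand-ins for the class's CPU-side state.

```python
import numpy as np

def swap_row(table, counts, src, tgt):
    # Fancy indexing on the right-hand side makes a copy first,
    # so both rows are exchanged without a temporary variable.
    table[[src, tgt]] = table[[tgt, src]]
    counts[[src, tgt]] = counts[[tgt, src]]

table = np.array([[1, 2, 0], [7, 8, 9]], dtype=np.int32)
counts = np.array([2, 3], dtype=np.int32)

swap_row(table, counts, 0, 1)

rows = table.tolist()
row_counts = counts.tolist()
```

Unlike the move_row description, both rows end up changed, and the page counts travel with their rows.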