easydel.inference.esurge.core.page_pool

Page pool management for KV-cache allocation.

Manages a pool of cache pages that can be allocated, freed, and cached for efficient memory management during inference.

Classes

PagePool – Main page pool manager.

Example

>>> pool = PagePool(num_pages=1000, enable_caching=True)
>>> pages = pool.get_new_pages(num_pages=10)
>>> pool.free_pages(pages)

class easydel.inference.esurge.core.page_pool.PagePool(num_pages: int, enable_caching: bool)

Bases: object

A pool that manages CachePage objects. It provides methods to allocate, free, and cache KV-cache pages. free_page_queue holds the free pages in eviction order, which supports allocation, freeing, and cache eviction. cached_page_hash_to_page maps each page hash to its cached page, so that cached pages can be looked up by hash.

Parameters

num_pages – The number of pages in the pool.
enable_caching – Whether to enable prefix caching.
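The interplay between the free queue and the hash map can be illustrated with a small toy model. This is a minimal sketch, not the EasyDeL implementation: `ToyPage` and `ToyPagePool` are hypothetical names, and the use of an `OrderedDict` as the free queue and the eviction of a cache entry on reuse are assumptions made for illustration.

```python
from collections import OrderedDict
from dataclasses import dataclass
from typing import Optional


@dataclass
class ToyPage:
    """Hypothetical stand-in for CachePage."""
    page_id: int
    ref_cnt: int = 0
    page_hash: Optional[int] = None


class ToyPagePool:
    """Illustrative model only; not the EasyDeL implementation."""

    def __init__(self, num_pages: int) -> None:
        pages = [ToyPage(i) for i in range(num_pages)]
        # Insertion order doubles as eviction order: the front goes first.
        self.free_page_queue = OrderedDict((p.page_id, p) for p in pages)
        # Page hash -> cached page, for lookups by hash.
        self.cached_page_hash_to_page = {}

    def get_new_pages(self, num_pages: int) -> list:
        if num_pages > len(self.free_page_queue):
            raise ValueError("not enough free pages")
        taken = []
        for _ in range(num_pages):
            _, page = self.free_page_queue.popitem(last=False)
            if page.page_hash is not None:
                # Assumption: reusing a cached page drops its cache entry.
                self.cached_page_hash_to_page.pop(page.page_hash, None)
                page.page_hash = None
            page.ref_cnt = 1
            taken.append(page)
        return taken

    def free_pages(self, ordered_pages) -> None:
        for page in ordered_pages:
            page.ref_cnt -= 1
            if page.ref_cnt == 0:
                # Appended at the back, so earlier-freed pages are evicted first.
                self.free_page_queue[page.page_id] = page


pool = ToyPagePool(num_pages=4)
pages = pool.get_new_pages(2)
pool.free_pages(pages)
print(list(pool.free_page_queue))  # page ids in eviction order
```

Keeping freed pages in an ordered queue lets a pool hand out the least valuable page first while still allowing recently freed, possibly cached pages to be found again before they are reused.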

cache_full_pages(request: EngineRequest, pages: list[easydel.inference.esurge.core.utils.CachePage], page_hashes: list[easydel.inference.esurge.core.utils.PageHash], num_cached_pages: int, num_full_pages: int, page_size: int, kv_cache_group_id: int) → None

Cache a list of full pages for prefix caching. This function takes a list of pages whose page-hash metadata is to be updated and cached. Given a request, it computes the page hashes for the pages from num_cached_pages to num_full_pages, updates the metadata of each page, and stores the pages in cached_page_hash_to_page.

Parameters

request – The request whose pages are being cached.
pages – All pages in the request.
page_hashes – Page hashes of the pages in the request. Note that this list may be shorter than the pages list. In this case, the missing page hashes are computed in this function.
num_cached_pages – The number of pages that are already cached.
num_full_pages – The number of pages that are full and should be cached after this function.
page_size – Number of tokens in each page.
kv_cache_group_id – The id of the KV cache group.
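The case where page_hashes is shorter than pages can be sketched as follows. This is a hedged illustration only: `fill_missing_page_hashes` is a hypothetical helper, and both the use of SHA-256 and the chaining of each hash to its parent page's hash are assumptions, not confirmed details of how EasyDeL computes a PageHash.

```python
import hashlib


def fill_missing_page_hashes(token_ids, page_hashes, num_full_pages, page_size):
    """Return a copy of page_hashes extended to cover num_full_pages pages."""
    hashes = list(page_hashes)
    for i in range(len(hashes), num_full_pages):
        # Assumption: each hash depends on the previous page's hash,
        # so equal hashes imply an equal prefix up to that page.
        parent = hashes[i - 1] if i > 0 else ""
        chunk = token_ids[i * page_size:(i + 1) * page_size]
        if len(chunk) != page_size:
            raise ValueError("only full pages can be hashed")
        payload = parent + "|" + ",".join(map(str, chunk))
        hashes.append(hashlib.sha256(payload.encode()).hexdigest())
    return hashes


tokens = list(range(10))          # 10 tokens, page_size 4 -> 2 full pages
all_hashes = fill_missing_page_hashes(tokens, [], 2, 4)
# Starting from one known hash yields the same result as starting from none.
resumed = fill_missing_page_hashes(tokens, all_hashes[:1], 2, 4)
print(resumed == all_hashes)
```

Only the hashes from the end of the supplied list up to num_full_pages are computed, which mirrors the "missing hashes are computed in this function" behaviour described above.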

free_pages(ordered_pages: Iterable[CachePage]) → None

Free a list of pages. The pages should be ordered by their eviction priority, where the first page will be evicted first.

Parameters: ordered_pages – A list of pages to free ordered by their eviction priority.
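The effect of the ordering can be seen in a small sketch. The `deque`-based queue and the choice to free a request's pages tail-first are assumptions for illustration; the source only states that the first page passed in is evicted first.

```python
from collections import deque

# Hypothetical free queue: the left end is evicted (reused) first.
free_queue = deque()


def free_pages(ordered_pages):
    """Append pages in the given order; earlier entries are evicted earlier."""
    free_queue.extend(ordered_pages)


# page0 holds the start of the prompt, page2 the end.
request_pages = ["page0", "page1", "page2"]

# Assumed calling convention: free the tail first so that the shared
# prefix (page0) survives in the queue the longest.
free_pages(reversed(request_pages))

next_evicted = free_queue.popleft()
print(next_evicted)
```

Under this convention, pages near the start of a sequence, which are the most likely to be shared with later requests, are the last to be reused.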

get_cached_page(page_hash: PageHash, kv_cache_group_ids: list[int]) → list[easydel.inference.esurge.core.utils.CachePage] | None

Get the cached page matching page_hash for each group in kv_cache_group_ids, or None if the lookup misses for any group. If several cached pages share the same hash, the first one in the cache is returned.

Parameters

page_hash – The hash value of the page.
kv_cache_group_ids – The ids of the KV cache groups.

Returns

The cached pages if they exist, or None.
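The all-or-nothing lookup across groups can be sketched as below. `lookup_cached_pages` and the layout of `cache` (keyed by a `(page_hash, group_id)` tuple, with a list of duplicates per key) are hypothetical; the real data structure may differ.

```python
from typing import Optional


def lookup_cached_pages(cache: dict, page_hash: str, kv_cache_group_ids: list) -> Optional[list]:
    """Return one page per group, or None if any group misses."""
    found = []
    for group_id in kv_cache_group_ids:
        candidates = cache.get((page_hash, group_id))
        if not candidates:
            # A miss in any single group makes the whole lookup a miss.
            return None
        # With duplicates, take the first page stored under this key.
        found.append(candidates[0])
    return found


cache = {
    ("h1", 0): ["pageA", "pageA_dup"],
    ("h1", 1): ["pageB"],
    ("h2", 0): ["pageC"],  # no entry for group 1
}
print(lookup_cached_pages(cache, "h1", [0, 1]))
print(lookup_cached_pages(cache, "h2", [0, 1]))
```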

get_new_pages(num_pages: int) → list[easydel.inference.esurge.core.utils.CachePage]

Get new pages from the free page pool.

Note that this function does not check the page cache.

Parameters: num_pages – The number of pages to allocate.
Returns: A list of new pages.

get_num_free_pages() → int

Get the number of free pages in the pool.

Returns: The number of free pages.

get_usage() → float

Get the KV cache usage.

Returns: The KV cache usage (between 0.0 and 1.0).
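One plausible way to derive such a ratio is shown below. The formula (one minus the free fraction) is an assumption for illustration; the source only states that the result lies between 0.0 and 1.0, and the real method may account for reserved pages differently.

```python
def kv_cache_usage(num_free_pages: int, num_total_pages: int) -> float:
    """Fraction of pages currently in use (hypothetical helper)."""
    if num_total_pages == 0:
        return 0.0  # avoid division by zero for an empty pool
    return 1.0 - num_free_pages / num_total_pages


print(kv_cache_usage(num_free_pages=750, num_total_pages=1000))
```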

reset_prefix_cache() → bool

Reset the prefix cache. This function may be used in RLHF flows to invalidate the prefix cache after the weights are updated, or to reset the prefix caching state for benchmarking.

Returns: True if the prefix cache is successfully reset, False otherwise.
Return type: bool

touch(pages: tuple[list[easydel.inference.esurge.core.utils.CachePage], ...]) → None

Touching a page increases its reference count by 1 and may remove the page from the free queue. This is used when a page is hit by another request with the same prefix.

Parameters: pages – A tuple of page lists to touch.
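The reference-count bookkeeping can be sketched as follows. `ToyPage`, the dict-based `free_queue`, and the reading of the tuple as one page list per KV cache group are assumptions for illustration, not the EasyDeL implementation.

```python
from dataclasses import dataclass


@dataclass
class ToyPage:
    """Hypothetical stand-in for CachePage."""
    page_id: int
    ref_cnt: int = 0


def touch(pages, free_queue):
    """pages: tuple of page lists; free_queue: dict keyed by page id."""
    for group_pages in pages:
        for page in group_pages:
            if page.ref_cnt == 0:
                # An unreferenced cached page sits in the free queue as an
                # eviction candidate; a hit must pull it back out.
                free_queue.pop(page.page_id, None)
            page.ref_cnt += 1


idle = ToyPage(0)              # cached but unreferenced
busy = ToyPage(1, ref_cnt=1)   # already used by another request
free_queue = {idle.page_id: idle}

touch(([idle], [busy]), free_queue)
print(idle.ref_cnt, busy.ref_cnt, len(free_queue))
```

Removing a touched page from the free queue prevents it from being handed out as a new page while a request is still reading it.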
