easydel.inference.esurge.core.page_pool
Page pool management for KV-cache allocation.
Manages a pool of cache pages that can be allocated, freed, and cached for efficient memory management during inference.
- Classes:
PagePool: Main page pool manager
Example
>>> pool = PagePool(num_pages=1000, enable_caching=True)
>>> pages = pool.get_new_pages(num_pages=10)
>>> pool.free_pages(pages)
- class easydel.inference.esurge.core.page_pool.PagePool(num_pages: int, enable_caching: bool)
Bases: object
PagePool manages CachePages. It provides methods to allocate, free, and cache KV cache pages. The free_page_queue stores free pages in eviction order to support allocation, freeing, and cache eviction. The cached_page_hash_to_page mapping goes from page hash to cached page, so cached pages can be found by their page hash.
- Parameters
num_pages – The number of pages in the pool.
enable_caching – Whether to enable prefix caching.
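The two structures described above can be illustrated with a toy model. This is a minimal sketch, not the EasyDeL implementation: `ToyPage`, `ToyPagePool`, and their attributes are hypothetical stand-ins, and the eviction-on-reuse behavior is an assumption drawn from the description of free_page_queue.

```python
from collections import OrderedDict
from dataclasses import dataclass
from typing import Optional


@dataclass
class ToyPage:
    """Hypothetical stand-in for CachePage."""
    page_id: int
    ref_cnt: int = 0
    page_hash: Optional[int] = None


class ToyPagePool:
    """Toy pool: a free queue in eviction order plus a hash-to-page map."""

    def __init__(self, num_pages: int) -> None:
        self.pages = [ToyPage(i) for i in range(num_pages)]
        # Free pages in eviction order: the front of the queue is reused first.
        self.free_queue = OrderedDict((p.page_id, p) for p in self.pages)
        # Page hash -> cached page, used for prefix-cache lookups.
        self.hash_to_page = {}

    def get_new_pages(self, num_pages: int) -> list:
        if num_pages > len(self.free_queue):
            raise ValueError("not enough free pages")
        allocated = []
        for _ in range(num_pages):
            _, page = self.free_queue.popitem(last=False)
            # Assumed eviction step: a reused page drops its old cached hash.
            if page.page_hash is not None:
                self.hash_to_page.pop(page.page_hash, None)
                page.page_hash = None
            page.ref_cnt = 1
            allocated.append(page)
        return allocated

    def free_pages(self, ordered_pages) -> None:
        for page in ordered_pages:
            page.ref_cnt -= 1
            if page.ref_cnt == 0:
                # Appended at the tail, so pages freed earlier are evicted earlier.
                self.free_queue[page.page_id] = page
```

In this sketch a freed page keeps its hash until it is handed out again, which is what lets a later request with the same prefix still find it.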
- cache_full_pages(request: EngineRequest, pages: list[easydel.inference.esurge.core.utils.CachePage], page_hashes: list[easydel.inference.esurge.core.utils.PageHash], num_cached_pages: int, num_full_pages: int, page_size: int, kv_cache_group_id: int) -> None
Cache a list of full pages for prefix caching. This function takes the pages of a request, updates their page hash metadata, and caches them. It computes the page hashes for the pages from num_cached_pages to num_full_pages, updates the metadata of each page, and stores each page in cached_page_hash_to_page.
- Parameters
request – The request whose pages are cached.
pages – All pages in the request.
page_hashes – Page hashes of the pages in the request. Note that this list may be shorter than the pages list. In this case the missing page hashes will be computed in this function.
num_cached_pages – The number of pages that are already cached.
num_full_pages – The number of pages that are full and should be cached after this function.
page_size – Number of tokens in each page.
kv_cache_group_id – The id of the KV cache group.
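The indexing described above (pages from num_cached_pages up to num_full_pages, with missing hashes computed on the fly) can be sketched as follows. This is a simplified illustration under stated assumptions: `hash_page` and `cache_full_pages_sketch` are hypothetical helpers, the chained SHA-256 scheme is an assumption and not necessarily how PageHash is computed, and the real method works with a request object and CachePage instances, not raw token lists and page ids.

```python
import hashlib


def hash_page(parent_hash: bytes, token_ids: list) -> bytes:
    """Hypothetical chained hash over the prefix hash and this page's tokens."""
    h = hashlib.sha256(parent_hash)
    h.update(",".join(map(str, token_ids)).encode())
    return h.digest()


def cache_full_pages_sketch(token_ids, page_ids, page_hashes,
                            num_cached_pages, num_full_pages, page_size, cache):
    """Register pages [num_cached_pages, num_full_pages) in `cache`.

    `page_hashes` may be shorter than `page_ids`; missing hashes are computed
    here and appended. `cache` maps page hash -> page id.
    """
    # Assumption: hashes already exist for every page that is already cached.
    assert len(page_hashes) >= num_cached_pages
    for i in range(num_cached_pages, num_full_pages):
        if i >= len(page_hashes):
            parent = page_hashes[i - 1] if i > 0 else b""
            tokens = token_ids[i * page_size:(i + 1) * page_size]
            page_hashes.append(hash_page(parent, tokens))
        # Keep the first page registered under a given hash.
        cache.setdefault(page_hashes[i], page_ids[i])
```

Because each hash in this sketch depends on the previous one, two requests share a page hash only when their entire prefix up to that page matches.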
- free_pages(ordered_pages: Iterable[CachePage]) -> None
Free a list of pages. The pages should be ordered by their eviction priority, where the first page will be evicted first.
- Parameters
ordered_pages – A list of pages to free ordered by their eviction priority.
- get_cached_page(page_hash: PageHash, kv_cache_group_ids: list[int]) -> list[easydel.inference.esurge.core.utils.CachePage] | None
Get the cached page for the given page hash in each group in kv_cache_group_ids, or None on a cache miss in any group. If there are duplicate pages, the first page in the cache is returned.
- Parameters
page_hash – The hash value of the page.
kv_cache_group_ids – The ids of the KV cache groups.
- Returns
The cached pages if they exist, or None.
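The all-or-nothing lookup across groups can be sketched as below. The storage layout is an assumption made for illustration: `get_cached_page_sketch` is a hypothetical function, and the `(page_hash, group_id)` key with an insertion-ordered dict of candidates is not necessarily how cached_page_hash_to_page is organized.

```python
from typing import Optional


def get_cached_page_sketch(cache: dict, page_hash: int,
                           kv_cache_group_ids: list) -> Optional[list]:
    """Return one cached page per group, or None if any group misses.

    Assumption: `cache` maps (page_hash, group_id) -> {page_id: page}.
    """
    found = []
    for group_id in kv_cache_group_ids:
        candidates = cache.get((page_hash, group_id))
        if not candidates:
            return None  # a miss in any group is a miss overall
        # With duplicates, take the first page that was inserted.
        found.append(next(iter(candidates.values())))
    return found
```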
- get_new_pages(num_pages: int) -> list[easydel.inference.esurge.core.utils.CachePage]
Get new pages from the free page pool.
Note that this function does not check the page cache.
- Parameters
num_pages – The number of pages to allocate.
- Returns
A list of new pages.
- get_num_free_pages() -> int
Get the number of free pages in the pool.
- Returns
The number of free pages.
- get_usage() -> float
Get the KV cache usage.
- Returns
The KV cache usage (between 0.0 and 1.0).
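One plausible reading of this value is the fraction of pages that are not currently free. The sketch below assumes that definition; `usage_sketch` is a hypothetical helper, and the actual method may count pages differently (for example, by reserving some pages).

```python
def usage_sketch(num_free_pages: int, num_pages: int) -> float:
    """Fraction of pages not currently free (assumed definition of usage)."""
    if num_pages <= 0:
        return 0.0
    return 1.0 - num_free_pages / num_pages
```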
- reset_prefix_cache() -> bool
Reset the prefix cache. This function may be used in RLHF flows to invalidate the prefix cache after the weights are updated, or to reset the prefix caching status for benchmarking.
- Returns
True if the prefix cache is successfully reset, False otherwise.
- Return type
bool
- touch(pages: tuple[list[easydel.inference.esurge.core.utils.CachePage], ...]) -> None
Touching a page increases its reference count by 1 and may remove the page from the free queue. This is used when a page is hit by another request with the same prefix.
- Parameters
pages – A tuple of page lists to touch.
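The reference-count behavior can be sketched with a toy free queue. This is an illustration under assumptions, not the library code: `ToyPage` and `touch_sketch` are hypothetical, and the rule that only pages with a zero reference count sit in the free queue is assumed from the description above.

```python
from collections import OrderedDict
from dataclasses import dataclass


@dataclass
class ToyPage:
    """Hypothetical stand-in for CachePage."""
    page_id: int
    ref_cnt: int = 0


def touch_sketch(pages: tuple, free_queue: OrderedDict) -> None:
    """Increment ref counts for every page in a tuple of page lists."""
    for page_list in pages:
        for page in page_list:
            # A cached page with no users waits in the free queue as an
            # eviction candidate; a new hit takes it out of the queue.
            if page.ref_cnt == 0:
                free_queue.pop(page.page_id, None)
            page.ref_cnt += 1
```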