easydel.inference.esurge.core.page_pool

Page pool management for KV-cache allocation.

Manages a pool of cache pages that can be allocated, freed, and cached for efficient memory management during inference.

Classes:

PagePool: Main page pool manager

Example

>>> from easydel.inference.esurge.core.page_pool import PagePool
>>> pool = PagePool(num_pages=1000, enable_caching=True)
>>> pages = pool.get_new_pages(10)
>>> pool.free_pages(pages)
class easydel.inference.esurge.core.page_pool.PagePool(num_pages: int, enable_caching: bool)

Bases: object

Manages a pool of CachePage objects and provides methods to allocate, free, and cache KV-cache pages. The free_page_queue holds the free pages in eviction order, which supports allocation, freeing, and cache eviction. The cached_page_hash_to_page mapping goes from page hash to cached page, so that cached pages can be looked up by their hash.

Parameters
  • num_pages – The number of pages in the pool.

  • enable_caching – Whether to enable prefix caching.
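
The two structures described above can be pictured with a minimal sketch. Everything in it is a hypothetical stand-in (`ToyPage`, `ToyPagePool`, and the `ref_cnt` and `page_hash` fields are assumptions made for illustration); it shows the idea of a free queue kept in eviction order next to a hash-to-page map, and is not EasyDeL's implementation.

```python
from collections import OrderedDict
from dataclasses import dataclass
from typing import Optional


@dataclass
class ToyPage:
    """Hypothetical stand-in for CachePage."""
    page_id: int
    ref_cnt: int = 0
    page_hash: Optional[int] = None


class ToyPagePool:
    """Illustrative pool; names and behaviour are assumptions."""

    def __init__(self, num_pages: int, enable_caching: bool):
        self.enable_caching = enable_caching
        self.pages = [ToyPage(i) for i in range(num_pages)]
        # Free pages kept in eviction order: the front is evicted first.
        self.free_page_queue = OrderedDict((p.page_id, p) for p in self.pages)
        # Maps a page hash to the page that currently holds that content.
        self.cached_page_hash_to_page = {}

    def get_new_pages(self, num_pages: int) -> list:
        if num_pages > len(self.free_page_queue):
            raise ValueError("not enough free pages")
        new_pages = []
        for _ in range(num_pages):
            _, page = self.free_page_queue.popitem(last=False)
            # Reusing a cached page evicts its old contents from the cache.
            if page.page_hash is not None:
                self.cached_page_hash_to_page.pop(page.page_hash, None)
                page.page_hash = None
            page.ref_cnt += 1
            new_pages.append(page)
        return new_pages


pool = ToyPagePool(num_pages=4, enable_caching=True)
first_two = pool.get_new_pages(2)
```

Using an ordered mapping for the free queue is one possible design choice: it allows popping from the front for allocation while still letting a specific page be removed by id when it is reused.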

cache_full_pages(request: EngineRequest, pages: list[easydel.inference.esurge.core.utils.CachePage], page_hashes: list[easydel.inference.esurge.core.utils.PageHash], num_cached_pages: int, num_full_pages: int, page_size: int, kv_cache_group_id: int) → None

Cache a list of full pages for prefix caching. This function takes a list of pages whose page-hash metadata will be updated and cached. Given a request, it computes the page hashes for the pages in the range from num_cached_pages to num_full_pages, updates the metadata of each page, and stores the pages in cached_page_hash_to_page.

Parameters
  • request – The request whose pages are cached.

  • pages – All pages in the request.

  • page_hashes – Page hashes of the pages in the request. Note that this list may be shorter than the pages list. In this case, the missing page hashes are computed in this function.

  • num_cached_pages – The number of pages that are already cached.

  • num_full_pages – The number of pages that are full and should be cached after this function.

  • page_size – Number of tokens in each page.

  • kv_cache_group_id – The id of the KV cache group.
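
The handling of a short page_hashes list can be illustrated with a small sketch. The hashing scheme here is an assumption: each page hash is SHA-256 over the previous page's hash plus the page's token ids, a common way to make a hash identify a whole prefix. The helper names `hash_page` and `extend_page_hashes` are hypothetical and do not come from EasyDeL.

```python
import hashlib


def hash_page(parent_hash: bytes, token_ids: list) -> bytes:
    """Hash one full page, chained to the hash of the page before it."""
    h = hashlib.sha256()
    h.update(parent_hash)
    h.update(",".join(str(t) for t in token_ids).encode("utf-8"))
    return h.digest()


def extend_page_hashes(token_ids, page_hashes, num_full_pages, page_size):
    """Return hashes for pages [0, num_full_pages), computing missing ones."""
    hashes = list(page_hashes)
    for i in range(len(hashes), num_full_pages):
        parent = hashes[-1] if hashes else b""
        page_tokens = token_ids[i * page_size:(i + 1) * page_size]
        hashes.append(hash_page(parent, page_tokens))
    return hashes


tokens = list(range(10))  # 10 tokens with page_size 4 gives 2 full pages
all_hashes = extend_page_hashes(tokens, [], num_full_pages=2, page_size=4)
# Only the first hash is known up front; the second one is filled in.
resumed = extend_page_hashes(tokens, all_hashes[:1], num_full_pages=2, page_size=4)
```

Because each hash depends on its parent, two requests that share the same leading tokens produce the same hashes for those leading pages, which is what makes a lookup by hash useful for prefix reuse.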

free_pages(ordered_pages: Iterable[CachePage]) → None

Free a list of pages. The pages should be ordered by their eviction priority, where the first page will be evicted first.

Parameters

ordered_pages – A list of pages to free ordered by their eviction priority.
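
A minimal sketch of how eviction-ordered freeing could work follows. It assumes that freeing decrements a per-page reference count and that only pages whose count reaches zero join the free queue; `ToyPage`, `ref_cnt`, and this standalone `free_pages` function are hypothetical stand-ins, not the library's code.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class ToyPage:
    """Hypothetical stand-in for CachePage."""
    page_id: int
    ref_cnt: int = 0


def free_pages(free_page_queue, ordered_pages):
    """Release pages; earlier pages end up closer to the eviction end."""
    for page in ordered_pages:
        page.ref_cnt -= 1
        # A page still referenced by another request is not freed yet.
        if page.ref_cnt == 0:
            free_page_queue.append(page)


free_page_queue = deque()
a, b, c = ToyPage(0, ref_cnt=1), ToyPage(1, ref_cnt=2), ToyPage(2, ref_cnt=1)
free_pages(free_page_queue, [c, b, a])
```

In this sketch page `c` sits at the front of the queue and would be evicted before `a`, while `b` stays out of the queue because one reference remains.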

get_cached_page(page_hash: PageHash, kv_cache_group_ids: list[int]) → list[easydel.inference.esurge.core.utils.CachePage] | None

Get the cached page for page_hash in each group listed in kv_cache_group_ids, or None if there is a cache miss in any group. If duplicate pages exist, the first page in the cache is returned.

Parameters
  • page_hash – The hash value of the page.

  • kv_cache_group_ids – The ids of the KV cache groups.

Returns

The cached pages if they exist, or None.
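
The all-or-nothing lookup across groups can be sketched as below. The cache layout is an assumption: a plain dict keyed by `(page_hash, group_id)`, with strings standing in for pages. The real structure of cached_page_hash_to_page may differ.

```python
def get_cached_page(cache, page_hash, kv_cache_group_ids):
    """Return one page per group, or None if any group misses."""
    found = []
    for group_id in kv_cache_group_ids:
        page = cache.get((page_hash, group_id))
        if page is None:
            # A miss in a single group makes the whole lookup a miss.
            return None
        found.append(page)
    return found


# Hypothetical cache contents: hash "h1" is cached for group 0 only.
cache = {("h0", 0): "page-3", ("h0", 1): "page-7", ("h1", 0): "page-4"}
hit = get_cached_page(cache, "h0", [0, 1])
miss = get_cached_page(cache, "h1", [0, 1])
```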

get_new_pages(num_pages: int) → list[easydel.inference.esurge.core.utils.CachePage]

Get new pages from the free page pool.

Note that this function does not check the page cache.

Parameters

num_pages – The number of pages to allocate.

Returns

A list of new pages.

get_num_free_pages() → int

Get the number of free pages in the pool.

Returns

The number of free pages.

get_usage() → float

Get the KV cache usage.

Returns

The KV cache usage (between 0.0 and 1.0).
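
One plausible way to arrive at a value in this range is the fraction of pages that are not free. The formula below is an assumption, since the page does not state how usage is computed, and the standalone `get_usage` helper is hypothetical.

```python
def get_usage(num_pages: int, num_free_pages: int) -> float:
    """Fraction of the pool in use, assuming usage = 1 - free / total."""
    if num_pages == 0:
        # Avoid dividing by zero for an empty pool.
        return 0.0
    return 1.0 - num_free_pages / num_pages


quarter_free = get_usage(num_pages=4, num_free_pages=1)
```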

reset_prefix_cache() → bool

Reset the prefix cache. This function may be used in RLHF flows to invalidate the prefix cache after the weights are updated, or to reset the prefix-caching state for benchmarking.

Returns

True if the prefix cache is successfully reset, False otherwise.

Return type

bool
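
The page does not say when a reset fails. The sketch below assumes one plausible condition: the reset is refused while any page is still referenced by a request. Pages are modelled as plain dicts with hypothetical `ref_cnt` and `page_hash` keys, and the function is an illustration rather than the library's logic.

```python
def reset_prefix_cache(pages, cached_page_hash_to_page):
    """Clear all cached hashes; refuse (assumed) if any page is in use."""
    if any(page["ref_cnt"] > 0 for page in pages):
        return False
    cached_page_hash_to_page.clear()
    for page in pages:
        page["page_hash"] = None
    return True


idle_pages = [{"ref_cnt": 0, "page_hash": "h0"}]
idle_cache = {"h0": idle_pages[0]}
reset_ok = reset_prefix_cache(idle_pages, idle_cache)

busy_pages = [{"ref_cnt": 1, "page_hash": "h1"}]
busy_cache = {"h1": busy_pages[0]}
reset_refused = reset_prefix_cache(busy_pages, busy_cache)
```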

touch(pages: tuple[list[easydel.inference.esurge.core.utils.CachePage], ...]) → None

Touching a page increases its reference count by 1 and may remove the page from the free queue. This is used when a page is hit by another request with the same prefix.

Parameters

pages – The pages to touch, given as a tuple of page lists.
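
The effect of touching can be sketched as follows, under the assumption that a page with a zero reference count is sitting in the free queue and must leave it once a request reuses it. `ToyPage`, `ref_cnt`, and the dict-based free queue are hypothetical stand-ins.

```python
from collections import OrderedDict
from dataclasses import dataclass


@dataclass
class ToyPage:
    """Hypothetical stand-in for CachePage."""
    page_id: int
    ref_cnt: int = 0


def touch(free_page_queue, pages):
    """Add one reference to every page in every list of the tuple."""
    for group_pages in pages:
        for page in group_pages:
            # An unreferenced page is an eviction candidate; reusing it
            # takes it out of the free queue.
            if page.ref_cnt == 0:
                free_page_queue.pop(page.page_id, None)
            page.ref_cnt += 1


idle, busy = ToyPage(0), ToyPage(1, ref_cnt=1)
free_page_queue = OrderedDict({idle.page_id: idle})
touch(free_page_queue, ([idle], [busy]))
```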