easydel.inference.esurge.core.utils#
- class easydel.inference.esurge.core.utils.CachePage(page_id: int, ref_cnt: int = 0, _page_hash: easydel.inference.esurge.core.utils.PageHashWithGroupId | None = None, prev_free_page: Optional[CachePage] = None, next_free_page: Optional[CachePage] = None, is_null: bool = False)[source]#
Bases:
objectKV-cache page metadata.
- is_null: bool = False#
- property page_hash: easydel.inference.esurge.core.utils.PageHashWithGroupId | None#
- page_id: int#
- ref_cnt: int = 0#
- class easydel.inference.esurge.core.utils.FreeCachePageQueue(pages: list[easydel.inference.esurge.core.utils.CachePage])[source]#
Bases:
objectThis class organizes a list of CachePage objects to a doubly linked list of free pages. We implement this class instead of using Python builtin deque to support removing a page in the middle of the queue in O(1) time. To close the performance gap to the builtin deque which is implemented in C++, this class does not allocate any Python objects when manipulating the linked list. Instead, this class manipulates the prev_free_page and next_free_page attributes of the given pages.
The queue is ordered by page ID in the beginning. When a page is allocated and then freed, it will be appended back with the eviction order: 1. The least recent used page is at the front (LRU). 2. If two pages have the same last accessed time (allocated by the
same sequence), the one with more hash tokens (the tail of a page chain) is at the front.
Note that we maintain this order by reversing the page order when free pages of a request. This operation is outside of this class.
- Parameters
pages – A list of CachePage objects.
- append(page: CachePage) None[source]#
Put a page back into the free list and increase num_free_pages by 1.
- Parameters
page – The page to append.
- append_n(pages: list[easydel.inference.esurge.core.utils.CachePage]) None[source]#
Put a list of pages back into the free list
- Parameters
pages – The pages to append.
- get_all_free_pages() list[easydel.inference.esurge.core.utils.CachePage][source]#
Get all free pages in the free list. Mainly used for testing.
- Returns
A list of free pages.
- popleft() CachePage[source]#
Pop the first free page and reduce num_free_pages by 1.
- Returns
The first free page.
- popleft_n(n: int) list[easydel.inference.esurge.core.utils.CachePage][source]#
Pop the first n free pages and reduce num_free_pages by n.
- Parameters
n – The number of pages to pop.
- Returns
A list of n free pages.
- easydel.inference.esurge.core.utils.NONE_HASH: int#
- class easydel.inference.esurge.core.utils.PageHash(hash_value: int, token_ids: tuple[int, ...], extra_keys: Any | None = None)[source]#
Bases:
NamedTupleHash value of a page (int), the token IDs in the page, and extra keys. We keep a tuple of token IDs and extra keys to reduce the likelihood of hash collisions when the hash value is the same. By using SHA256 however, hash collisions are practically impossible.
- hash_value: int#
Alias for field number 0
- token_ids: tuple[int, ...]#
Alias for field number 1
- class easydel.inference.esurge.core.utils.PageHashWithGroupId(page_hash, group_id)[source]#
Bases:
NamedTuple- group_id: int#
Alias for field number 1
- easydel.inference.esurge.core.utils.hash_page_tokens(hash_function: Callable, parent_page_hash: int | None, curr_page_token_ids: Sequence[int], extra_keys: tuple[Any, ...] | None = None) PageHash[source]#
Computes a hash value corresponding to the contents of a page and the contents of the preceding page(s). The hash value is used for prefix caching. We use LRU cache for this function to avoid recomputing hash values for the same page contents.
- Parameters
parent_page_hash – The hash of the parent page. None if this is the first page.
curr_page_token_ids – A list of token ids in the current page. The current page is assumed to be full.
extra_keys – Extra keys for the page.
- Returns
The hash value of the page and the token ids in the page. The entire tuple is used as the hash key of the page.
- easydel.inference.esurge.core.utils.hash_request_tokens(hash_function: Any, page_size: int, request: EngineRequest) list[easydel.inference.esurge.core.utils.PageHash][source]#
Computes hash values of a chain of pages given a sequence of token IDs. The hash value is used for prefix caching.
- Parameters
page_size – The size of each page.
request – The request object.
- Returns
The list of computed hash values.