easydel.inference.esurge.core.utils

easydel.inference.esurge.core.utils#

class easydel.inference.esurge.core.utils.CachePage(page_id: int, ref_cnt: int = 0, _page_hash: easydel.inference.esurge.core.utils.PageHashWithGroupId | None = None, prev_free_page: Optional[CachePage] = None, next_free_page: Optional[CachePage] = None, is_null: bool = False)[source]#

Bases: object

KV-cache page metadata.

decr_ref()[source]#

incr_ref()[source]#

is_null: bool = False#

next_free_page: Optional[CachePage] = None#

property page_hash: easydel.inference.esurge.core.utils.PageHashWithGroupId | None#

page_id: int#

prev_free_page: Optional[CachePage] = None#

ref_cnt: int = 0#

reset_hash()[source]#: Reset the page hash when the page is evicted.

class easydel.inference.esurge.core.utils.FreeCachePageQueue(pages: list[easydel.inference.esurge.core.utils.CachePage])[source]#

Bases: object

This class organizes a list of CachePage objects to a doubly linked list of free pages. We implement this class instead of using Python builtin deque to support removing a page in the middle of the queue in O(1) time. To close the performance gap to the builtin deque which is implemented in C++, this class does not allocate any Python objects when manipulating the linked list. Instead, this class manipulates the prev_free_page and next_free_page attributes of the given pages.

The queue is ordered by page ID in the beginning. When a page is allocated and then freed, it will be appended back with the eviction order: 1. The least recent used page is at the front (LRU). 2. If two pages have the same last accessed time (allocated by the

same sequence), the one with more hash tokens (the tail of a page chain) is at the front.

Note that we maintain this order by reversing the page order when free pages of a request. This operation is outside of this class.

Parameters: pages – A list of CachePage objects.

append(page: CachePage) → None[source]#

Put a page back into the free list and increase num_free_pages by 1.

Parameters: page – The page to append.

append_n(pages: list[easydel.inference.esurge.core.utils.CachePage]) → None[source]#

Put a list of pages back into the free list

Parameters: pages – The pages to append.

get_all_free_pages() → list[easydel.inference.esurge.core.utils.CachePage][source]#

Get all free pages in the free list. Mainly used for testing.

Returns: A list of free pages.

popleft() → CachePage[source]#

Pop the first free page and reduce num_free_pages by 1.

Returns: The first free page.

popleft_n(n: int) → list[easydel.inference.esurge.core.utils.CachePage][source]#

Pop the first n free pages and reduce num_free_pages by n.

Parameters: n – The number of pages to pop.
Returns: A list of n free pages.

remove(page: CachePage) → None[source]#

Remove a page in the free list and reduce num_free_pages by 1.

Parameters: page – The page to remove.

easydel.inference.esurge.core.utils.NONE_HASH: int#

class easydel.inference.esurge.core.utils.PageHash(hash_value: int, token_ids: tuple[int, ...], extra_keys: Any | None = None)[source]#

Bases: NamedTuple

Hash value of a page (int), the token IDs in the page, and extra keys. We keep a tuple of token IDs and extra keys to reduce the likelihood of hash collisions when the hash value is the same. By using SHA256 however, hash collisions are practically impossible.

extra_keys: Any | None#: Alias for field number 2

hash_value: int#: Alias for field number 0

token_ids: tuple[int, ...]#: Alias for field number 1

class easydel.inference.esurge.core.utils.PageHashWithGroupId(page_hash, group_id)[source]#

Bases: NamedTuple

get_hash_value() → int[source]#

group_id: int#: Alias for field number 1

page_hash: PageHash#: Alias for field number 0

easydel.inference.esurge.core.utils.hash_page_tokens(hash_function: Callable, parent_page_hash: int | None, curr_page_token_ids: Sequence[int], extra_keys: tuple[Any, ...] | None = None) → PageHash[source]#

Computes a hash value corresponding to the contents of a page and the contents of the preceding page(s). The hash value is used for prefix caching. We use LRU cache for this function to avoid recomputing hash values for the same page contents.

Parameters

parent_page_hash – The hash of the parent page. None if this is the first page.
curr_page_token_ids – A list of token ids in the current page. The current page is assumed to be full.
extra_keys – Extra keys for the page.

Returns

The hash value of the page and the token ids in the page. The entire tuple is used as the hash key of the page.

easydel.inference.esurge.core.utils.hash_request_tokens(hash_function: Any, page_size: int, request: EngineRequest) → list[easydel.inference.esurge.core.utils.PageHash][source]#

Computes hash values of a chain of pages given a sequence of token IDs. The hash value is used for prefix caching.

Parameters

page_size – The size of each page.
request – The request object.

Returns

The list of computed hash values.

easydel.inference.esurge.core.utils.init_none_hash()[source]#

easydel.inference.esurge.core.utils

Contents

easydel.inference.esurge.core.utils#