easydel.modules.xerxes.modeling_xerxes

class easydel.modules.xerxes.modeling_xerxes.Identity(*args: Any, **kwargs: Any)

Bases: Module

No-op module used as a placeholder when optional layers are disabled.
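The placeholder pattern can be sketched in plain Python. This is an illustration of the idea only, not the EasyDeL implementation: the class below is a stand-in that does not subclass `Module`, and `build_layer` and `double` are hypothetical helpers.

```python
class Identity:
    """Stand-in no-op layer: ignores extra arguments and returns its input unchanged."""

    def __call__(self, x, *args, **kwargs):
        return x


def build_layer(enabled, layer_fn):
    # Hypothetical helper: return the real layer when the feature is enabled,
    # otherwise the no-op placeholder, so callers never need to branch.
    return layer_fn if enabled else Identity()


def double(x):
    # Toy "real" layer used only for this demonstration.
    return [2 * v for v in x]


active = build_layer(True, double)
disabled = build_layer(False, double)

print(active([1, 2, 3]))
print(disabled([1, 2, 3]))
```

Because both branches expose the same call signature, the surrounding forward pass can call the layer unconditionally.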

class easydel.modules.xerxes.modeling_xerxes.PostCross(*args: Any, **kwargs: Any)

Bases: Module

Applies a bounded tanh transform after cross attention.
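The page does not give the exact formula, so the following is a minimal sketch under an assumption: that the transform has the common soft-capping form `cap * tanh(x / cap)`. The function name `bounded_tanh` and the parameter `cap` are hypothetical and are not taken from the module.

```python
import math


def bounded_tanh(values, cap=30.0):
    """Squash each value so its magnitude never exceeds `cap` (assumed form)."""
    # Near zero the transform is close to the identity; large inputs saturate
    # smoothly towards +cap or -cap.
    return [cap * math.tanh(v / cap) for v in values]


print(bounded_tanh([-1000.0, -1.0, 0.0, 1.0, 1000.0]))
```

A transform of this shape leaves small activations almost unchanged while preventing large ones from growing without bound.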

class easydel.modules.xerxes.modeling_xerxes.XerxesAttention(*args: Any, **kwargs: Any)

Bases: UnifiedAttention

Xerxes Attention with conditional Q/K normalization.

Inherits Q/K normalization from QKNormAttention.

Features:

- Conditional Q/K normalization via the xe_kvnorm flag
- Layer-specific sliding window (the pattern varies with layer_idx or window_pattern)
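Both features can be illustrated with a small pure-Python sketch. It is not the module's code: `use_qk_norm` stands in for the `xe_kvnorm` flag, the normalization is an unscaled RMS norm chosen for illustration, and the `global_every` rule in `window_for_layer` is a hypothetical pattern, since this page does not document the real one.

```python
import math


def rms_norm(vec, eps=1e-6):
    # Unscaled RMS normalization (no learned weight), used here for illustration.
    mean_sq = sum(v * v for v in vec) / len(vec)
    return [v / math.sqrt(mean_sq + eps) for v in vec]


def prepare_qk(q, k, use_qk_norm):
    # Normalize query and key only when the flag is set; otherwise pass through.
    if use_qk_norm:
        return rms_norm(q), rms_norm(k)
    return q, k


def sliding_window_mask(seq_len, window=None):
    """mask[q][k] is True where query position q may attend to key position k."""
    mask = []
    for q in range(seq_len):
        row = []
        for k in range(seq_len):
            allowed = k <= q  # causal constraint
            if window is not None:
                allowed = allowed and (q - k) < window  # keep only recent keys
            row.append(allowed)
        mask.append(row)
    return mask


def window_for_layer(layer_idx, window, global_every=4):
    # Hypothetical rule: every `global_every`-th layer attends globally (None),
    # all other layers use the sliding window.
    is_global = (layer_idx + 1) % global_every == 0
    return None if is_global else window


for row in sliding_window_mask(4, window_for_layer(0, window=2)):
    print(row)
```

Passing `None` as the window yields an ordinary causal mask, so the same masking routine serves both kinds of layer.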

class easydel.modules.xerxes.modeling_xerxes.XerxesDecoderLayer(*args: Any, **kwargs: Any)

Bases: Module

Transformer decoder block with optional cross-attention and MoE.

class easydel.modules.xerxes.modeling_xerxes.XerxesForCausalLM(*args: Any, **kwargs: Any)

Bases: EasyDeLBaseModule

Xerxes language model with LM head for causal generation.

get_decoder()

Returns the decoder part of the model’s graph definition.

get_embedding()

Returns the embedding layer of the module.

get_encoder()

Returns the encoder part of the model’s graph definition. Decoder-only models don’t have an encoder.

get_lm_head()

Returns the language model head of the module.

class easydel.modules.xerxes.modeling_xerxes.XerxesMLP(*args: Any, **kwargs: Any)

Bases: Module

Feed-forward network for Xerxes decoder blocks.

class easydel.modules.xerxes.modeling_xerxes.XerxesModel(*args: Any, **kwargs: Any)

Bases: EasyDeLBaseModule

Xerxes decoder stack wiring embeddings, decoder layers, and final norm.

property default_frequencies

get_decoder()

Returns the decoder part of the model’s graph definition.

get_embedding()

Returns the embedding layer of the module.

get_encoder()

Returns the encoder part of the model’s graph definition. Decoder-only models don’t have an encoder.

get_lm_head()

Returns the language model head of the module. Base models don’t have a language model head.
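The accessor contract shared by the base model and the causal LM wrapper can be pictured with toy classes. Everything here is hypothetical: `ToyBaseModel` and `ToyCausalLM` are not EasyDeL classes, the attribute names are invented, and raising `NotImplementedError` for a missing component is an assumption about how "doesn't have" might be signalled, not documented behavior.

```python
class ToyBaseModel:
    """Toy decoder stack: owns the embedding, has no LM head."""

    def __init__(self):
        self.embed_tokens = object()  # placeholder for an embedding layer

    def get_embedding(self):
        return self.embed_tokens

    def get_decoder(self):
        # The base model is itself the decoder stack.
        return self

    def get_lm_head(self):
        # Assumed behavior: signal that a base model has no LM head.
        raise NotImplementedError("base model has no language model head")


class ToyCausalLM:
    """Toy wrapper: base model plus an LM head, delegating shared accessors."""

    def __init__(self):
        self.model = ToyBaseModel()
        self.lm_head = object()  # placeholder for a projection to the vocabulary

    def get_embedding(self):
        return self.model.get_embedding()

    def get_decoder(self):
        return self.model.get_decoder()

    def get_lm_head(self):
        return self.lm_head


lm = ToyCausalLM()
print(lm.get_decoder() is lm.model)
```

Uniform accessors like these let generic utilities reach a model's components without knowing its attribute layout.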

class easydel.modules.xerxes.modeling_xerxes.XerxesSparseMoeBlock(*args: Any, **kwargs: Any)

Bases: Module

Sparse mixture-of-experts feed-forward block used in selected layers.
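As a rough picture of what a sparse mixture-of-experts block computes, the sketch below routes one token to its top-k experts and mixes their outputs. It is a generic illustration, not the Xerxes implementation: the function names, the default `k=2`, and the renormalization of the selected weights are all assumptions.

```python
import math


def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_route(router_logits, k=2):
    """Return [(expert_index, weight), ...] for the k most probable experts."""
    probs = softmax(router_logits)
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    norm = sum(probs[i] for i in top)
    # Assumed choice: renormalize so the selected weights sum to 1.
    return [(i, probs[i] / norm) for i in top]


def sparse_moe(token, router_logits, experts, k=2):
    # Only the selected experts run; their outputs are combined by router weight.
    out = [0.0] * len(token)
    for idx, weight in top_k_route(router_logits, k):
        expert_out = experts[idx](token)
        out = [o + weight * e for o, e in zip(out, expert_out)]
    return out


# Toy experts: expert s multiplies its input by s.
experts = [lambda t, s=s: [s * v for v in t] for s in (1.0, 2.0, 3.0, 4.0)]
print(sparse_moe([1.0, -1.0], [0.0, 2.0, 1.0, -1.0], experts))
```

Because only `k` experts are evaluated per token, compute per token stays roughly constant as the number of experts grows.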