easydel.modules.dbrx.modeling_dbrx_flax

class easydel.modules.dbrx.modeling_dbrx_flax.DbrxAttention(*args: Any, **kwargs: Any)

Bases: AttentionModule

class easydel.modules.dbrx.modeling_dbrx_flax.DbrxBlock(*args: Any, **kwargs: Any)

Bases: Module

class easydel.modules.dbrx.modeling_dbrx_flax.DbrxExpertGLU(*args: Any, **kwargs: Any)

Bases: Module

class easydel.modules.dbrx.modeling_dbrx_flax.DbrxExperts(*args: Any, **kwargs: Any)

Bases: Module

class easydel.modules.dbrx.modeling_dbrx_flax.DbrxFFN(*args: Any, **kwargs: Any)

Bases: Module

class easydel.modules.dbrx.modeling_dbrx_flax.DbrxForCausalLM(*args: Any, **kwargs: Any)

Bases: EasyDeLBaseModule

class easydel.modules.dbrx.modeling_dbrx_flax.DbrxForSequenceClassification(*args: Any, **kwargs: Any)

Bases: EasyDeLBaseModule

class easydel.modules.dbrx.modeling_dbrx_flax.DbrxModel(*args: Any, **kwargs: Any)

Bases: EasyDeLBaseModule

Base DBRX Model outputting raw hidden-states.

DbrxModel is a Transformer with a mixture-of-experts (MoE) feed-forward layer, implementing the DBRX architecture as described in the original paper.

The model uses specialized attention modules and a router-based MoE FFN layer.
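To make "router-based MoE FFN" concrete, here is a minimal sketch of top-k expert routing in plain Python. It is not EasyDeL's implementation: the names `softmax`, `route_top_k`, and `moe_ffn` are hypothetical, the experts are toy scalar functions, and renormalizing the selected weights so they sum to 1 is an assumption about one common routing scheme rather than a statement of what DbrxRouter does.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def route_top_k(router_logits, top_k=2):
    """Return {expert_index: weight} for the top_k experts (hypothetical helper)."""
    probs = softmax(router_logits)
    # Indices of the top_k highest-probability experts; ties keep original order.
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:top_k]
    # Assumption: renormalize the selected weights so they sum to 1.
    norm = sum(probs[i] for i in top)
    return {i: probs[i] / norm for i in top}


def moe_ffn(x, router_logits, experts, top_k=2):
    """Weighted sum of the selected experts' outputs for one token."""
    weights = route_top_k(router_logits, top_k)
    return sum(w * experts[i](x) for i, w in weights.items())


# Toy experts: each one scales its input by a fixed factor.
experts = [lambda x, s=s: s * x for s in (1.0, 2.0, 3.0, 4.0)]
logits = [0.0, 1.0, 0.0, 1.0]

weights = route_top_k(logits, top_k=2)
output = moe_ffn(10.0, logits, experts, top_k=2)
```

With these logits, experts 1 and 3 are selected with equal weight, so only two of the four experts run for this token. A real MoE layer applies the same idea per token over batched arrays.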

property frequencies

Retrieves or computes the frequency components (e.g., for RoPE) from the configuration.

Uses self.config.get_basic_frequencies() and caches the result.

Returns

The frequency components, potentially cached.

Return type

jnp.ndarray
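The compute-once-then-cache behaviour described for `frequencies` can be sketched as follows. This is an illustration under stated assumptions, not EasyDeL code: `ToyConfig` and `ToyModel` are hypothetical stand-ins, the formula is the standard RoPE inverse-frequency schedule `theta ** (-2i / head_dim)`, and the result is a plain Python list rather than the `jnp.ndarray` the real property returns.

```python
from functools import cached_property


class ToyConfig:
    """Hypothetical stand-in for a model config; not EasyDeL's DBRX config."""

    def __init__(self, head_dim: int = 8, rope_theta: float = 10000.0):
        self.head_dim = head_dim
        self.rope_theta = rope_theta
        self.calls = 0  # counts how often the frequencies are recomputed

    def get_basic_frequencies(self) -> list:
        # Standard RoPE inverse frequencies: theta ** (-2i / head_dim).
        self.calls += 1
        return [
            self.rope_theta ** (-(2 * i) / self.head_dim)
            for i in range(self.head_dim // 2)
        ]


class ToyModel:
    """Hypothetical model exposing a cached `frequencies` property."""

    def __init__(self, config: ToyConfig):
        self.config = config

    @cached_property
    def frequencies(self) -> list:
        # Computed on first access, then stored on the instance.
        return self.config.get_basic_frequencies()


model = ToyModel(ToyConfig())
first = model.frequencies
second = model.frequencies  # served from the cache, no recomputation
```

Both accesses return the same object and `get_basic_frequencies()` runs only once, which is the point of caching a value that depends only on the configuration.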

class easydel.modules.dbrx.modeling_dbrx_flax.DbrxNormAttentionNorm(*args: Any, **kwargs: Any)

Bases: Module

class easydel.modules.dbrx.modeling_dbrx_flax.DbrxRouter(*args: Any, **kwargs: Any)

Bases: Module

jitter(x: Union[Array, ndarray, bool, number]) → Union[Array, ndarray, bool, number]