easydel.modules.dbrx.modeling_dbrx_flax
- class easydel.modules.dbrx.modeling_dbrx_flax.DbrxAttention(*args: Any, **kwargs: Any)
Bases: AttentionModule
- class easydel.modules.dbrx.modeling_dbrx_flax.DbrxBlock(*args: Any, **kwargs: Any)
Bases: Module
- class easydel.modules.dbrx.modeling_dbrx_flax.DbrxExpertGLU(*args: Any, **kwargs: Any)
Bases: Module
- class easydel.modules.dbrx.modeling_dbrx_flax.DbrxExperts(*args: Any, **kwargs: Any)
Bases: Module
- class easydel.modules.dbrx.modeling_dbrx_flax.DbrxFFN(*args: Any, **kwargs: Any)
Bases: Module
- class easydel.modules.dbrx.modeling_dbrx_flax.DbrxForCausalLM(*args: Any, **kwargs: Any)
Bases: EasyDeLBaseModule
- class easydel.modules.dbrx.modeling_dbrx_flax.DbrxForSequenceClassification(*args: Any, **kwargs: Any)
Bases: EasyDeLBaseModule
- class easydel.modules.dbrx.modeling_dbrx_flax.DbrxModel(*args: Any, **kwargs: Any)
Bases: EasyDeLBaseModule
Base DBRX Model outputting raw hidden-states.
The model is a Transformer with a mixture-of-experts (MoE) feed-forward design, implementing the DBRX architecture as described in the original paper.
It combines specialized attention modules with a router-based MoE FFN layer.
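To make the idea of a router-based MoE FFN concrete, here is a minimal sketch in JAX. It is not EasyDeL's implementation: the function names `route_top_k` and `moe_ffn`, the weight shapes, the plain two-matrix expert (rather than the gated `DbrxExpertGLU`), and the sum-to-one renormalization are all assumptions made for illustration. It runs every expert densely and masks the result, which is simple to read but not how an efficient implementation would dispatch tokens.

```python
import jax
import jax.numpy as jnp


def route_top_k(hidden, router_weight, top_k):
    """Pick top_k experts per token and return normalized mixing weights.

    hidden: (tokens, dim); router_weight: (dim, num_experts).
    """
    logits = hidden @ router_weight
    probs = jax.nn.softmax(logits, axis=-1)
    top_probs, top_idx = jax.lax.top_k(probs, top_k)
    # Assumption: renormalize so the selected weights sum to 1 per token.
    top_probs = top_probs / jnp.sum(top_probs, axis=-1, keepdims=True)
    return top_probs, top_idx


def moe_ffn(hidden, router_weight, expert_w_in, expert_w_out, top_k=2):
    """Dense reference MoE: run all experts, keep only the routed ones.

    expert_w_in: (num_experts, dim, ffn); expert_w_out: (num_experts, ffn, dim).
    """
    num_experts = router_weight.shape[-1]
    top_probs, top_idx = route_top_k(hidden, router_weight, top_k)
    # Build a (tokens, num_experts) gate matrix, zero for unselected experts.
    one_hot = jax.nn.one_hot(top_idx, num_experts)
    gates = jnp.sum(one_hot * top_probs[..., None], axis=1)
    # Simplified expert: linear -> SiLU -> linear (hypothetical, not a GLU).
    inner = jax.nn.silu(jnp.einsum("td,edf->tef", hidden, expert_w_in))
    expert_out = jnp.einsum("tef,efd->ted", inner, expert_w_out)
    # Weighted sum of expert outputs per token.
    return jnp.einsum("te,ted->td", gates, expert_out)
```

Calling `moe_ffn` on a `(tokens, dim)` array returns an array of the same shape, with each token's output mixed from its `top_k` routed experts.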
- property frequencies
Retrieves or computes the frequency components (e.g., for RoPE) from the configuration.
Uses self.config.get_basic_frequencies() and caches the result.
Returns: The frequency components, potentially cached.
Return type: jnp.ndarray
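The compute-once, reuse-afterwards behaviour described for `frequencies` can be sketched as follows. This is an illustration only: `rope_frequencies`, `ToyModel`, the `base` default of 10000, and the use of `functools.cached_property` are assumptions, and the real property delegates to `self.config.get_basic_frequencies()`, whose internals are not shown here.

```python
from functools import cached_property

import jax.numpy as jnp


def rope_frequencies(head_dim, max_positions, base=10000.0):
    """Return a (max_positions, head_dim // 2) table of RoPE rotation angles."""
    exponents = jnp.arange(0, head_dim, 2, dtype=jnp.float32) / head_dim
    inv_freq = 1.0 / (base ** exponents)
    positions = jnp.arange(max_positions, dtype=jnp.float32)
    # Angle for position p and frequency index i is p * inv_freq[i].
    return jnp.outer(positions, inv_freq)


class ToyModel:
    """Hypothetical stand-in for a model that caches its frequency table."""

    def __init__(self, head_dim, max_positions):
        self.head_dim = head_dim
        self.max_positions = max_positions

    @cached_property
    def frequencies(self):
        # Computed on first access, then stored on the instance and reused.
        return rope_frequencies(self.head_dim, self.max_positions)
```

Accessing `ToyModel(8, 16).frequencies` twice returns the same array object, so the table is built only once per instance.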