easydel.modules.qwen2.modeling_qwen
- class easydel.modules.qwen2.modeling_qwen.Qwen2Attention(*args: Any, **kwargs: Any)
Bases: UnifiedAttention
Qwen2 Attention module with sliding window support.
Inherits from UnifiedAttention with Qwen2-specific customizations: sliding window attention (layer-specific), a custom bias configuration (the Q, K, and V projections use a bias; the O projection does not), and residual dropout.
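The sliding window restriction mentioned above can be illustrated with a small mask builder. This is a minimal sketch in plain JAX, not EasyDeL's implementation: the function name `sliding_window_causal_mask` is hypothetical, and the boundary convention (the window counts the current token) is an assumption.

```python
import jax.numpy as jnp


def sliding_window_causal_mask(seq_len: int, window: int) -> jnp.ndarray:
    """Boolean mask where entry [q, k] is True if query q may attend to key k."""
    q_pos = jnp.arange(seq_len)[:, None]  # query positions as a column
    k_pos = jnp.arange(seq_len)[None, :]  # key positions as a row
    causal = k_pos <= q_pos               # never attend to future tokens
    # Assumed convention: the window includes the current token.
    in_window = (q_pos - k_pos) < window
    return causal & in_window


mask = sliding_window_causal_mask(seq_len=5, window=2)
```

With `window=2`, each query position sees itself and the one token before it; layers that do not use a sliding window would fall back to the plain causal mask.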
- class easydel.modules.qwen2.modeling_qwen.Qwen2DecoderLayer(*args: Any, **kwargs: Any)
Bases: Module
Qwen2 Transformer Decoder Layer.
This module represents a single decoder layer in the Qwen2 model, combining self-attention and MLP sub-layers with residual connections and RMS normalization.
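How the sub-layers, residual connections, and RMS normalization fit together can be sketched as below. This is an illustration under stated assumptions rather than the EasyDeL code: it assumes a pre-norm ordering (normalize, apply the sub-layer, add the residual), and `rms_norm`, `decoder_layer`, and the stand-in `attn_fn`/`mlp_fn` callables are hypothetical names.

```python
import jax
import jax.numpy as jnp


def rms_norm(x, weight, eps=1e-6):
    """Scale x by the reciprocal of its root mean square over the last axis."""
    variance = jnp.mean(jnp.square(x), axis=-1, keepdims=True)
    return x * jax.lax.rsqrt(variance + eps) * weight


def decoder_layer(x, attn_fn, mlp_fn, attn_norm_weight, mlp_norm_weight):
    """Assumed pre-norm block: x + attn(norm(x)), then h + mlp(norm(h))."""
    h = x + attn_fn(rms_norm(x, attn_norm_weight))
    return h + mlp_fn(rms_norm(h, mlp_norm_weight))


# Stand-in sub-layers (hypothetical); the real layer uses Qwen2Attention and Qwen2MLP.
x = jnp.ones((2, 4, 8))  # (batch, sequence, hidden)
w = jnp.ones((8,))
out = decoder_layer(
    x,
    attn_fn=lambda t: 0.5 * t,
    mlp_fn=lambda t: 0.5 * t,
    attn_norm_weight=w,
    mlp_norm_weight=w,
)
```

The output keeps the input shape, which is what allows decoder layers to be stacked.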
- config
Configuration object for the model.
- Type
- dtype
Data type for computations.
- Type
jnp.dtype
- param_dtype
Data type for parameters.
- Type
jnp.dtype
- precision
Precision setting for JAX operations.
- Type
jax.lax.PrecisionLike
- rngs
Random number generators.
- Type
nn.Rngs
- self_attn
The self-attention module.
- Type
- class easydel.modules.qwen2.modeling_qwen.Qwen2ForCausalLM(*args: Any, **kwargs: Any)
Bases: BaseCausalLMModule[Qwen2Model, Qwen2Config]
Qwen2 model with a Causal Language Modeling head.
- class easydel.modules.qwen2.modeling_qwen.Qwen2ForSequenceClassification(*args: Any, **kwargs: Any)
Bases: BaseSequenceClassificationModule[Qwen2Model, Qwen2Config]
Qwen2 model with a Sequence Classification head.
- class easydel.modules.qwen2.modeling_qwen.Qwen2MLP(*args: Any, **kwargs: Any)
Bases: Module
Qwen2 MLP module.
This module implements the feed-forward network (MLP) used in the Qwen2 model. It uses a Gated Linear Unit (GLU) structure with SiLU activation and includes dropout.
- config
Configuration object for the model.
- Type
- dtype
Data type for computations.
- Type
jnp.dtype
- param_dtype
Data type for parameters.
- Type
jnp.dtype
- precision
Precision setting for JAX operations.
- Type
jax.lax.PrecisionLike
- gate_proj
Linear layer for the GLU gate.
- Type
- down_proj
Linear layer for the down projection.
- Type
- up_proj
Linear layer for the GLU value.
- Type
- dropout
Dropout layer applied to the output.
- Type
nn.Dropout
- act_fn
Activation function (SiLU).
- Type
callable
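The roles of `gate_proj`, `up_proj`, `down_proj`, and `act_fn` can be sketched with plain matrices. This is a minimal illustration, not the module itself: `gated_mlp` and the weight names are hypothetical, bias terms are assumed absent, the sizes are toy values, and dropout is left out because it acts as the identity at inference time.

```python
import jax
import jax.numpy as jnp


def gated_mlp(x, w_gate, w_up, w_down):
    """GLU feed-forward: down_proj(silu(gate_proj(x)) * up_proj(x))."""
    gate = jax.nn.silu(x @ w_gate)  # act_fn applied to the gate branch
    value = x @ w_up                # value branch, no activation
    return (gate * value) @ w_down  # project back to the hidden size


hidden, intermediate = 8, 16  # illustrative sizes, not Qwen2 defaults
k_gate, k_up, k_down, k_x = jax.random.split(jax.random.PRNGKey(0), 4)
w_gate = jax.random.normal(k_gate, (hidden, intermediate))
w_up = jax.random.normal(k_up, (hidden, intermediate))
w_down = jax.random.normal(k_down, (intermediate, hidden))
x = jax.random.normal(k_x, (2, 4, hidden))

y = gated_mlp(x, w_gate, w_up, w_down)
```

The gate branch decides, per intermediate feature, how much of the value branch passes through before the down projection.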
- class easydel.modules.qwen2.modeling_qwen.Qwen2Model(*args: Any, **kwargs: Any)
Bases: EasyDeLBaseModule
The base Qwen2 model transformer.
This class represents the core transformer architecture of the Qwen2 model, consisting of an embedding layer, multiple Qwen2DecoderLayer layers, and a final RMS normalization layer.
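The data flow described above (embedding lookup, a stack of decoder layers, final RMS normalization) can be sketched as follows. This is a toy illustration, not the EasyDeL forward pass: `model_forward` and `rms_norm` are hypothetical helpers, the decoder layers are replaced by simple stand-in functions, and attention masks, caching, and dropout are omitted.

```python
import jax.numpy as jnp


def rms_norm(x, weight, eps=1e-6):
    """Divide x by its root mean square over the last axis, then scale."""
    variance = jnp.mean(jnp.square(x), axis=-1, keepdims=True)
    return x * weight / jnp.sqrt(variance + eps)


def model_forward(input_ids, embedding, layer_fns, final_norm_weight):
    """Embed tokens, apply each decoder layer in order, then normalize."""
    hidden = embedding[input_ids]  # (batch, seq) -> (batch, seq, hidden)
    for layer_fn in layer_fns:
        hidden = layer_fn(hidden)
    return rms_norm(hidden, final_norm_weight)


vocab_size, hidden_size = 10, 4  # toy sizes
embedding = jnp.arange(vocab_size * hidden_size, dtype=jnp.float32).reshape(
    vocab_size, hidden_size
)
# Hypothetical stand-ins for Qwen2DecoderLayer instances.
layer_fns = [lambda h: h + 1.0, lambda h: h * 2.0]
input_ids = jnp.array([[1, 2, 3]])

out = model_forward(input_ids, embedding, layer_fns, jnp.ones((hidden_size,)))
```

The result has shape `(batch, sequence, hidden)`; task heads such as the causal language modeling head consume these hidden states.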
- config
Configuration object for the model.
- Type
- dtype
Data type for computations.
- Type
jnp.dtype
- param_dtype
Data type for parameters.
- Type
jnp.dtype
- precision
Precision setting for JAX operations.
- Type
jax.lax.PrecisionLike
- rngs
Random number generators.
- Type
nn.Rngs
- embed_tokens
Embedding layer for input tokens.
- Type
nn.Embed
- layers
List of decoder layers.
- Type
tp.List[Qwen2DecoderLayer]
- dropout
Dropout layer applied after embeddings.
- Type
nn.Dropout
- gradient_checkpointing
Gradient checkpointing configuration.