easydel.modules.deepseek_v3.modeling_deepseek
- class easydel.modules.deepseek_v3.modeling_deepseek.DeepseekV3Attention(*args: Any, **kwargs: Any)
Bases: UnifiedAttention
DeepSeek V3 Multi-head Latent Attention (MLA).
Inherits the MLA implementation from the UnifiedAttention base class.
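- define_network(config: DeepseekV3Config, dtype: dtype, param_dtype: dtype, precision: Precision, rngs: Rngs)
Defines the MLA-specific network structure: the low-rank query and key/value projections, their layer norms, and the output projection listed in projection_mapping.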
- projection_mapping: ClassVar[dict[str, str]] = {'mla_kv_a_layernorm': 'kv_a_layernorm', 'mla_kv_a_proj_with_mqa': 'kv_a_proj_with_mqa', 'mla_kv_b_proj': 'kv_b_proj', 'mla_q_a_layernorm': 'q_a_layernorm', 'mla_q_a_proj': 'q_a_proj', 'mla_q_b_proj': 'q_b_proj', 'mla_q_proj': 'q_proj', 'output_projection': 'o_proj'}
Maps the generic MLA projection names used by UnifiedAttention to the DeepSeek V3 module names.
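To show what each mapped module does, the sketch below computes (in_features, out_features) for every MLA projection and keys the result by the DeepSeek V3 name from projection_mapping. It assumes the reference DeepSeek MLA layout; the helper `mla_projection_shapes` is hypothetical and not part of EasyDeL. The example values (hidden size 7168, 128 heads, ranks 1536/512, head dims 128/64/128) come from the published DeepSeek-V3 configuration and are only illustrative.

```python
# projection_mapping as documented on DeepseekV3Attention.
PROJECTION_MAPPING = {
    "mla_kv_a_layernorm": "kv_a_layernorm",
    "mla_kv_a_proj_with_mqa": "kv_a_proj_with_mqa",
    "mla_kv_b_proj": "kv_b_proj",
    "mla_q_a_layernorm": "q_a_layernorm",
    "mla_q_a_proj": "q_a_proj",
    "mla_q_b_proj": "q_b_proj",
    "mla_q_proj": "q_proj",
    "output_projection": "o_proj",
}


def mla_projection_shapes(hidden_size, num_heads, q_lora_rank, kv_lora_rank,
                          qk_nope_head_dim, qk_rope_head_dim, v_head_dim):
    """Hypothetical helper: (in, out) per MLA projection, keyed by DeepSeek name.

    Assumes the reference DeepSeek MLA layout. The layer norms (q_a_layernorm,
    kv_a_layernorm) act on the q_lora_rank and kv_lora_rank latents and are
    omitted because they are not projections.
    """
    qk_head_dim = qk_nope_head_dim + qk_rope_head_dim
    shapes = {}
    if q_lora_rank is None:
        # No query compression: one full-rank query projection.
        shapes["mla_q_proj"] = (hidden_size, num_heads * qk_head_dim)
    else:
        # Low-rank query path: compress to q_lora_rank, then expand per head.
        shapes["mla_q_a_proj"] = (hidden_size, q_lora_rank)
        shapes["mla_q_b_proj"] = (q_lora_rank, num_heads * qk_head_dim)
    # Compressed KV latent plus one RoPE key shared by all heads (MQA-style).
    shapes["mla_kv_a_proj_with_mqa"] = (hidden_size, kv_lora_rank + qk_rope_head_dim)
    # Expands the latent into per-head non-RoPE keys and values.
    shapes["mla_kv_b_proj"] = (kv_lora_rank, num_heads * (qk_nope_head_dim + v_head_dim))
    shapes["output_projection"] = (num_heads * v_head_dim, hidden_size)
    return {PROJECTION_MAPPING[name]: shape for name, shape in shapes.items()}


shapes = mla_projection_shapes(7168, 128, 1536, 512, 128, 64, 128)
```

Under this layout, kv_a_proj_with_mqa outputs only kv_lora_rank + qk_rope_head_dim = 576 features per token, which is the compact latent that makes MLA's key/value cache small.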
- class easydel.modules.deepseek_v3.modeling_deepseek.DeepseekV3DecoderLayer(*args: Any, **kwargs: Any)
Bases: Module
A single DeepSeek V3 transformer block with MLA attention and an optional MoE MLP.
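The sketch below illustrates two aspects of such a block, assuming the reference DeepSeek V3 layout rather than EasyDeL's actual code: a pre-norm residual structure (norm, attention, residual add, then norm, MLP, residual add) and the rule that decides whether a layer gets the MoE MLP. The field names `first_k_dense_replace`, `moe_layer_freq`, and `n_routed_experts` follow the Hugging Face DeepSeek configs and are assumptions here. Both helper functions are hypothetical, and scalars stand in for hidden-state tensors.

```python
def uses_moe(layer_idx, first_k_dense_replace, moe_layer_freq, n_routed_experts):
    """Hypothetical helper: True if layer `layer_idx` uses the MoE MLP.

    Assumed rule: the first `first_k_dense_replace` layers stay dense; after
    that, every `moe_layer_freq`-th layer uses routed experts.
    """
    if n_routed_experts is None:
        return False
    return layer_idx >= first_k_dense_replace and layer_idx % moe_layer_freq == 0


def decoder_layer(x, input_norm, attn, post_attn_norm, mlp):
    """Pre-norm residual block; each argument after `x` is a callable."""
    h = x + attn(input_norm(x))        # attention sub-block with residual
    return h + mlp(post_attn_norm(h))  # MLP sub-block (dense or MoE) with residual


identity = lambda v: v
out = decoder_layer(1.0, identity, lambda v: 2 * v, identity, lambda v: v + 1)
```

With identity norms, the attention step gives 1 + 2 = 3, and the MLP step gives 3 + 4 = 7.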
- class easydel.modules.deepseek_v3.modeling_deepseek.DeepseekV3ForCausalLM(*args: Any, **kwargs: Any)
Bases: BaseCausalLMModule[DeepseekV3Model, DeepseekV3Config]
DeepSeek V3 model with a language modeling head for causal language modeling.
This model extends DeepseekV3Model by adding a linear language modeling head that projects the final hidden states to vocabulary logits. The underlying transformer uses a Mixture-of-Experts (MoE) architecture, and the class is intended for autoregressive text generation.
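A minimal pure-Python sketch of what a linear language modeling head does and how the causal objective is defined: a (hidden, vocab) weight maps each hidden state to logits, and position t is scored against token t + 1. The functions `lm_head`, `greedy_next_token`, and `causal_lm_loss` are hypothetical illustrations, not EasyDeL APIs. Real models use batched array operations instead of Python lists.

```python
import math


def lm_head(hidden, weight):
    """Project one hidden vector (length d) to logits with a (d, vocab) weight."""
    vocab = len(weight[0])
    return [sum(hidden[i] * weight[i][j] for i in range(len(hidden))) for j in range(vocab)]


def greedy_next_token(hidden, weight):
    """Pick the highest-scoring token id from the head's logits."""
    logits = lm_head(hidden, weight)
    return max(range(len(logits)), key=logits.__getitem__)


def causal_lm_loss(logits_seq, token_ids):
    """Mean next-token cross-entropy: logits at position t predict token t + 1."""
    total = 0.0
    for logits, target in zip(logits_seq[:-1], token_ids[1:]):
        m = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_z = m + math.log(sum(math.exp(l - m) for l in logits))
        total -= logits[target] - log_z
    return total / (len(token_ids) - 1)
```

Shifting by one position is what makes the objective causal. The final position has no target, so it contributes nothing to the loss.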
- class easydel.modules.deepseek_v3.modeling_deepseek.DeepseekV3MLP(*args: Any, **kwargs: Any)
Bases: Module
Standard DeepSeek V3 feed-forward network used in dense decoder layers.
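The sketch below shows a gated feed-forward network of the kind DeepSeek models use, written as down_proj(silu(gate_proj(x)) * up_proj(x)). The projection names match those in DeepseekV3MLPMoE.reform_param. The SiLU activation assumes the usual DeepSeek `hidden_act="silu"` setting. `gated_mlp` and `matvec` are hypothetical helpers on plain lists.

```python
import math


def silu(v):
    """SiLU / swish activation: v * sigmoid(v)."""
    return v / (1.0 + math.exp(-v))


def matvec(x, w):
    """Compute x @ w for a vector x and a (len(x), out) weight matrix w."""
    return [sum(x[i] * w[i][j] for i in range(len(x))) for j in range(len(w[0]))]


def gated_mlp(x, gate_w, up_w, down_w):
    """Gated MLP: the SiLU-activated gate branch scales the up branch elementwise."""
    gate = [silu(g) for g in matvec(x, gate_w)]
    up = matvec(x, up_w)
    return matvec([g * u for g, u in zip(gate, up)], down_w)
```

Because of the gate, the intermediate width needs two input projections instead of one. This is why a fused gate_up kernel, as in reform_param, is a natural storage format.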
- class easydel.modules.deepseek_v3.modeling_deepseek.DeepseekV3MLPMoE(*args: Any, **kwargs: Any)
Bases: Module
Mixture-of-experts feed-forward module parameterized by the DeepSeek V3 config.
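- reform_param: ClassVar = {'down_proj$': {'inverse_spliter': <function DeepseekV3MLPMoE.<lambda>>, 'splits': [{'name': 'down_proj.kernel', 'spliter': <function DeepseekV3MLPMoE.<lambda>>}]}, 'gate_up_proj$': {'inverse_spliter': <function DeepseekV3MLPMoE.<lambda>>, 'splits': [{'name': 'gate_proj.kernel', 'spliter': <function DeepseekV3MLPMoE.<lambda>>}, {'name': 'up_proj.kernel', 'spliter': <function DeepseekV3MLPMoE.<lambda>>}]}}
Describes how fused kernels are converted to and from separate parameters: gate_up_proj is split into gate_proj.kernel and up_proj.kernel, down_proj maps to down_proj.kernel, and each entry's inverse_spliter reverses the conversion.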
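The documentation does not show the bodies of the splitter lambdas. The sketch below is therefore an assumption: it splits a fused (in, 2 * intermediate) gate_up kernel into two halves along the last axis, with the gate half first to match the order of the splits list, and defines the inverse as concatenation. `split_gate_up` and `merge_gate_up` are hypothetical names. Stacked per-expert kernels would be handled the same way, with a leading expert axis.

```python
def split_gate_up(fused):
    """Assumed splitter: halve a fused (in, 2 * inter) kernel along the last axis."""
    half = len(fused[0]) // 2
    gate = [row[:half] for row in fused]  # first half -> gate_proj.kernel
    up = [row[half:] for row in fused]    # second half -> up_proj.kernel
    return gate, up


def merge_gate_up(gate, up):
    """Assumed inverse splitter: concatenate gate and up along the last axis."""
    return [g + u for g, u in zip(gate, up)]


fused = [[1, 2, 3, 4], [5, 6, 7, 8]]
gate, up = split_gate_up(fused)
```

The round trip must be lossless (merge after split returns the original kernel) so that checkpoints can be converted in either direction.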
- class easydel.modules.deepseek_v3.modeling_deepseek.DeepseekV3MoE(*args: Any, **kwargs: Any)
Bases: BaseMoeModule
Wraps the gating and expert networks to apply DeepSeek V3 MoE feed-forward processing.
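The sketch below shows gating in the style of the DeepSeek-V3 reference design, under stated assumptions. Sigmoid affinity scores are computed per expert. A per-expert bias (hypothetical argument `bias`) affects only which top-k experts are chosen, not their weights. The selected weights are renormalized and multiplied by a routed scaling factor. A shared expert is always added. Group-limited routing (choosing expert groups before experts) is omitted, and EasyDeL's actual gate may differ. Scalars stand in for token hidden states.

```python
import math


def route_tokens(logits, bias, top_k, routed_scaling_factor, norm_topk_prob=True):
    """Return (expert indices, weights) for one token's router logits."""
    scores = [1.0 / (1.0 + math.exp(-l)) for l in logits]  # sigmoid affinities
    # The bias shifts selection only; weights below still use the raw scores.
    choice = [s + b for s, b in zip(scores, bias)]
    top = sorted(range(len(choice)), key=lambda i: choice[i], reverse=True)[:top_k]
    weights = [scores[i] for i in top]
    if norm_topk_prob:
        total = sum(weights)
        weights = [w / total for w in weights]
    return top, [w * routed_scaling_factor for w in weights]


def moe_forward(x, logits, bias, experts, shared_expert, top_k, routed_scaling_factor):
    """Weighted sum of the selected routed experts plus the always-on shared expert."""
    idx, w = route_tokens(logits, bias, top_k, routed_scaling_factor)
    out = shared_expert(x)
    for i, wi in zip(idx, w):
        out += wi * experts[i](x)
    return out


experts = [lambda v, k=k: v * (k + 1) for k in range(4)]
out = moe_forward(1.0, [0.0] * 4, [0.0, 1.0, 0.0, 2.0], experts, lambda v: v, 2, 1.0)
```

With equal logits, the bias alone picks experts 3 and 1, each with weight 0.5, so the output is 1 + 0.5 * 4 + 0.5 * 2 = 4.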
- class easydel.modules.deepseek_v3.modeling_deepseek.DeepseekV3Model(*args: Any, **kwargs: Any)
Bases: EasyDeLBaseModule
Full DeepSeek V3 decoder-only transformer composed of MLA attention blocks with dense or MoE feed-forward layers.
- property frequencies
Computes the RoPE frequencies using the config's get_basic_frequencies method.
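As a rough illustration of what a RoPE frequency table contains, the sketch below computes plain rotary embeddings: inverse frequencies 1 / base^(2i / dim), and cos/sin tables of position × frequency. It does not reproduce get_basic_frequencies. DeepSeek V3 checkpoints typically configure YaRN scaling, which this sketch leaves out. In MLA, `dim` would be the rotary part of the head (qk_rope_head_dim). `rope_inv_freq` and `rope_cos_sin` are hypothetical helpers.

```python
import math


def rope_inv_freq(dim, base=10000.0):
    """Unscaled RoPE inverse frequencies, one per pair of rotary channels."""
    return [1.0 / base ** (i / dim) for i in range(0, dim, 2)]


def rope_cos_sin(positions, dim, base=10000.0):
    """cos/sin tables of shape (len(positions), dim // 2) for angle = pos * inv_freq."""
    inv = rope_inv_freq(dim, base)
    angles = [[p * f for f in inv] for p in positions]
    cos = [[math.cos(a) for a in row] for row in angles]
    sin = [[math.sin(a) for a in row] for row in angles]
    return cos, sin


cos, sin = rope_cos_sin([0, 1], 4)
```

The lowest channel pair rotates once per position (frequency 1.0), and later pairs rotate geometrically more slowly. At position 0, every rotation is the identity.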