easydel.modules.deepseek_v2.modeling_deepseek
- class easydel.modules.deepseek_v2.modeling_deepseek.DeepseekV2Attention(*args: Any, **kwargs: Any)
Bases: UnifiedAttention
DeepSeek V2 Multi-head Latent Attention (MLA).
Inherits the MLA implementation from the UnifiedAttention base class.
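The key idea of MLA is that keys and values are not projected per head directly from the hidden state. Instead, a single down-projection (kv_a_proj_with_mqa) produces a small latent vector plus one RoPE key shared across heads; the latent is normalized (kv_a_layernorm) and up-projected (kv_b_proj) into per-head keys and values. The NumPy sketch below illustrates only this shape flow. It assumes the layout of the Hugging Face DeepSeek-V2 reference implementation, and the config names used here (kv_lora_rank, qk_rope_head_dim, qk_nope_head_dim, v_head_dim) are taken from that implementation as assumptions, not read from EasyDeL. The sketch leaves out the query path (q_proj, or q_a_proj/q_a_layernorm/q_b_proj when the query is also low-rank), RoPE, and the attention itself.

```python
import numpy as np

# Toy sizes; names mirror the Hugging Face DeepSeek-V2 config (assumed, not EasyDeL's API).
hidden_size, num_heads = 64, 4
kv_lora_rank, qk_rope_head_dim = 16, 8
qk_nope_head_dim, v_head_dim = 8, 8

rng = np.random.default_rng(0)


def rms_norm(x, eps=1e-6):
    # RMSNorm without a learned scale, for brevity.
    return x / np.sqrt(np.mean(x * x, axis=-1, keepdims=True) + eps)


# Random weights standing in for kv_a_proj_with_mqa and kv_b_proj.
w_kv_a = rng.standard_normal((hidden_size, kv_lora_rank + qk_rope_head_dim))
w_kv_b = rng.standard_normal((kv_lora_rank, num_heads * (qk_nope_head_dim + v_head_dim)))


def mla_keys_values(x):
    """Return per-head non-RoPE keys, values, the shared RoPE key, and the latent."""
    seq_len = x.shape[0]
    compressed = x @ w_kv_a
    # The first kv_lora_rank features form the compact latent (what a cache can store);
    # the rest is a single RoPE key shared by every head (the "MQA" part).
    latent, k_rope = np.split(compressed, [kv_lora_rank], axis=-1)
    kv = rms_norm(latent) @ w_kv_b
    kv = kv.reshape(seq_len, num_heads, qk_nope_head_dim + v_head_dim)
    k_nope, v = np.split(kv, [qk_nope_head_dim], axis=-1)
    return k_nope, v, k_rope, latent


x = rng.standard_normal((10, hidden_size))
k_nope, v, k_rope, latent = mla_keys_values(x)
print(k_nope.shape, v.shape, k_rope.shape, latent.shape)
# (10, 4, 8) (10, 4, 8) (10, 8) (10, 16)
```

In this toy setup, each token stores only 16 + 8 latent and RoPE features instead of 4 heads × (8 + 8) full key/value features. That saving is the memory argument behind MLA.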
- define_network(config: DeepseekV2Config, dtype: dtype, param_dtype: dtype, precision: Precision, rngs: Rngs)
Defines the MLA-specific network structure.
- projection_mapping: ClassVar[dict[str, str]] = {'mla_kv_a_layernorm': 'kv_a_layernorm', 'mla_kv_a_proj_with_mqa': 'kv_a_proj_with_mqa', 'mla_kv_b_proj': 'kv_b_proj', 'mla_q_a_layernorm': 'q_a_layernorm', 'mla_q_a_proj': 'q_a_proj', 'mla_q_b_proj': 'q_b_proj', 'mla_q_proj': 'q_proj', 'output_projection': 'o_proj'}
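The projection_mapping dictionary appears to pair generic role names (the keys) with the attribute names this module actually defines (the values). One plausible use is for a shared base class to find a projection by role without hard-coding DeepSeek names. The sketch below shows that kind of lookup. It is an assumption about how the mapping is consumed. ToyMLAModule and resolve_projection are hypothetical names and are not part of EasyDeL. The sketch also reflects the fact that the mapping lists both mla_q_proj and the low-rank mla_q_a_proj/mla_q_b_proj entries, so a given model may define only one of the two query paths.

```python
# projection_mapping as documented for DeepseekV2Attention.
projection_mapping = {
    "mla_kv_a_layernorm": "kv_a_layernorm",
    "mla_kv_a_proj_with_mqa": "kv_a_proj_with_mqa",
    "mla_kv_b_proj": "kv_b_proj",
    "mla_q_a_layernorm": "q_a_layernorm",
    "mla_q_a_proj": "q_a_proj",
    "mla_q_b_proj": "q_b_proj",
    "mla_q_proj": "q_proj",
    "output_projection": "o_proj",
}


class ToyMLAModule:
    """Hypothetical module defining either q_proj or the low-rank q_a/q_b query path."""

    def __init__(self, q_lora_rank=None):
        # Placeholder strings stand in for real projection layers.
        self.kv_a_layernorm = "kv_a_layernorm layer"
        self.kv_a_proj_with_mqa = "kv_a_proj_with_mqa layer"
        self.kv_b_proj = "kv_b_proj layer"
        self.o_proj = "o_proj layer"
        if q_lora_rank is None:
            self.q_proj = "q_proj layer"
        else:
            self.q_a_proj = "q_a_proj layer"
            self.q_a_layernorm = "q_a_layernorm layer"
            self.q_b_proj = "q_b_proj layer"


def resolve_projection(module, role):
    """Hypothetical helper: map a role name to the module's attribute, or None if absent."""
    return getattr(module, projection_mapping[role], None)


dense_q = ToyMLAModule(q_lora_rank=None)
low_rank_q = ToyMLAModule(q_lora_rank=32)
print(resolve_projection(dense_q, "mla_q_proj"))       # q_proj layer
print(resolve_projection(dense_q, "mla_q_a_proj"))     # None
print(resolve_projection(low_rank_q, "mla_q_a_proj"))  # q_a_proj layer
```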
- class easydel.modules.deepseek_v2.modeling_deepseek.DeepseekV2DecoderLayer(*args: Any, **kwargs: Any)
Bases: Module
A single DeepSeek V2 transformer block with MLA attention and an optional mixture-of-experts (MoE) MLP.
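Whether a given block uses the dense MLP or the MoE MLP is decided per layer. The sketch below encodes the rule from the Hugging Face DeepSeek-V2 reference implementation. It assumes EasyDeL follows the same rule, which this page does not confirm. The rule is: a layer uses MoE when routed experts are configured, its index is at least first_k_dense_replace, and the index is a multiple of moe_layer_freq. The function name uses_moe is hypothetical, and the numeric values below are toy values.

```python
def uses_moe(layer_idx, n_routed_experts, first_k_dense_replace, moe_layer_freq):
    """Return True if the block at layer_idx would use the MoE MLP (assumed HF rule)."""
    return (
        n_routed_experts is not None
        and layer_idx >= first_k_dense_replace
        and layer_idx % moe_layer_freq == 0
    )


# Toy config: one leading dense layer, then MoE in every remaining layer.
print(["moe" if uses_moe(i, 8, 1, 1) else "dense" for i in range(4)])
# ['dense', 'moe', 'moe', 'moe']
```

Keeping the first few layers dense is a design choice. The earliest layers then see a fully shared feed-forward, and expert routing starts only after that.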
- class easydel.modules.deepseek_v2.modeling_deepseek.DeepseekV2ForCausalLM(*args: Any, **kwargs: Any)
Bases: BaseCausalLMModule[DeepseekV2Model, DeepseekV2Config]
DeepseekV2 model with a language modeling head for causal language modeling.
This model adds a linear language modeling head on top of the base DeepseekV2Model transformer, mapping its final hidden states to vocabulary logits. It is designed for generative tasks such as text generation.
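The language modeling head is a linear map from hidden states to vocabulary logits. The NumPy sketch below shows that step plus a greedy next-token choice. Random arrays stand in for the DeepseekV2Model output and for the head weights. The sketch assumes an untied, bias-free head; whether the head is tied to the embeddings depends on the model config. No EasyDeL API is used.

```python
import numpy as np

rng = np.random.default_rng(0)
seq_len, hidden_size, vocab_size = 5, 16, 50

# Stand-in for the final hidden states produced by the base model.
hidden_states = rng.standard_normal((seq_len, hidden_size))
# Hypothetical untied, bias-free head weights.
lm_head = rng.standard_normal((hidden_size, vocab_size))

logits = hidden_states @ lm_head         # (seq_len, vocab_size)
next_token = int(np.argmax(logits[-1]))  # greedy choice from the last position
print(logits.shape)  # (5, 50)
```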
- class easydel.modules.deepseek_v2.modeling_deepseek.DeepseekV2MLP(*args: Any, **kwargs: Any)
Bases: Module
Standard DeepSeek V2 feed-forward block used in dense (non-MoE) layers.
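DeepSeek V2's feed-forward blocks follow the gated (SwiGLU-style) pattern: down_proj(act(gate_proj(x)) * up_proj(x)). The gate_proj/up_proj names also appear in the MoE reform_param entry documented for DeepseekV2MLPMoE on this page. The NumPy sketch below assumes SiLU as the activation and bias-free projections, which matches the Hugging Face reference configuration. It is an illustration, not EasyDeL's code.

```python
import numpy as np


def silu(x):
    # SiLU / swish: x * sigmoid(x).
    return x / (1.0 + np.exp(-x))


def gated_mlp(x, w_gate, w_up, w_down):
    """down_proj(silu(gate_proj(x)) * up_proj(x)) with bias-free projections."""
    return (silu(x @ w_gate) * (x @ w_up)) @ w_down


rng = np.random.default_rng(0)
hidden_size, intermediate_size = 16, 48
w_gate = rng.standard_normal((hidden_size, intermediate_size))
w_up = rng.standard_normal((hidden_size, intermediate_size))
w_down = rng.standard_normal((intermediate_size, hidden_size))

x = rng.standard_normal((3, hidden_size))
y = gated_mlp(x, w_gate, w_up, w_down)
print(y.shape)  # (3, 16)
```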
- class easydel.modules.deepseek_v2.modeling_deepseek.DeepseekV2MLPMoE(*args: Any, **kwargs: Any)
Bases: Module
Mixture-of-experts feed-forward used in DeepSeek V2 MoE layers.
- reform_param: ClassVar = {'down_proj$': {'inverse_spliter': <function DeepseekV2MLPMoE.<lambda>>, 'splits': [{'name': 'down_proj.kernel', 'spliter': <function DeepseekV2MLPMoE.<lambda>>}]}, 'gate_up_proj$': {'inverse_spliter': <function DeepseekV2MLPMoE.<lambda>>, 'splits': [{'name': 'gate_proj.kernel', 'spliter': <function DeepseekV2MLPMoE.<lambda>>}, {'name': 'up_proj.kernel', 'spliter': <function DeepseekV2MLPMoE.<lambda>>}]}}
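The reform_param entry states that a fused gate_up_proj parameter is split into gate_proj.kernel and up_proj.kernel, with an inverse_spliter to merge them back. The lambda bodies are not shown here. The NumPy sketch below is therefore only a plausible pair of split and merge functions. It assumes the fused kernel is stacked per expert as (num_experts, hidden, 2 * intermediate), with the gate half first along the last axis. Both the axis and the ordering are assumptions, and split_gate_up and merge_gate_up are hypothetical names.

```python
import numpy as np


def split_gate_up(gate_up_kernel):
    """Split a fused (..., hidden, 2 * intermediate) kernel into gate and up kernels.

    Assumption: gate occupies the first half of the last axis, up the second half.
    """
    gate_kernel, up_kernel = np.split(gate_up_kernel, 2, axis=-1)
    return gate_kernel, up_kernel


def merge_gate_up(gate_kernel, up_kernel):
    """Inverse of split_gate_up: concatenate gate and up along the last axis."""
    return np.concatenate([gate_kernel, up_kernel], axis=-1)


num_experts, hidden_size, intermediate_size = 4, 8, 12
fused = np.arange(num_experts * hidden_size * 2 * intermediate_size, dtype=np.float32)
fused = fused.reshape(num_experts, hidden_size, 2 * intermediate_size)

gate, up = split_gate_up(fused)
print(gate.shape, up.shape)  # (4, 8, 12) (4, 8, 12)
```

Whatever the real layout is, the split and merge functions must be exact inverses. Otherwise a round trip between a fused checkpoint and separate gate/up kernels would silently corrupt the weights.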
- class easydel.modules.deepseek_v2.modeling_deepseek.DeepseekV2MoE(*args: Any, **kwargs: Any)
Bases: BaseMoeModule
Wraps the gating network and the experts to apply the DeepSeek V2 mixture-of-experts feed-forward.
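The NumPy sketch below illustrates a gate-plus-experts combination in the style described for DeepSeek V2. A softmax router scores the experts, each token is sent to its top-k experts, the expert outputs are weighted by their router scores, and an always-active shared expert is added. The shared expert reflects the DeepSeek V2 design in the Hugging Face reference implementation. This page does not say whether EasyDeL's DeepseekV2MoE includes one, so treat it as an assumption. The sketch also omits details a real implementation may include, such as score renormalization, a routed scaling factor, grouped or device-limited routing, and load-balancing losses. moe_forward is a hypothetical name.

```python
import numpy as np


def softmax(x, axis=-1):
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def moe_forward(x, router_w, experts, shared_expert, top_k):
    """Top-k softmax routing over experts plus an always-on shared expert.

    x: (tokens, hidden); router_w: (hidden, n_experts); experts: list of callables
    mapping (hidden,) -> (hidden,); shared_expert: callable on (tokens, hidden).
    """
    scores = softmax(x @ router_w)                     # (tokens, n_experts)
    top_idx = np.argsort(-scores, axis=-1)[:, :top_k]  # highest-scoring experts per token
    routed = np.zeros_like(x)
    for t in range(x.shape[0]):
        for e in top_idx[t]:
            routed[t] += scores[t, e] * experts[e](x[t])
    return routed + shared_expert(x)


rng = np.random.default_rng(0)
tokens, hidden_size, n_experts = 5, 8, 4
x = rng.standard_normal((tokens, hidden_size))
router_w = rng.standard_normal((hidden_size, n_experts))
expert_ws = [rng.standard_normal((hidden_size, hidden_size)) for _ in range(n_experts)]
experts = [lambda v, w=w: v @ w for w in expert_ws]
shared_w = rng.standard_normal((hidden_size, hidden_size))

y = moe_forward(x, router_w, experts, lambda h: h @ shared_w, top_k=2)
print(y.shape)  # (5, 8)
```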
- class easydel.modules.deepseek_v2.modeling_deepseek.DeepseekV2Model(*args: Any, **kwargs: Any)
Bases: EasyDeLBaseModule
DeepSeek V2 decoder stack connecting the embeddings, decoder layers, and final norm.
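The decoder stack's data flow is: look up token embeddings, apply each decoder layer in order, then normalize the result. The NumPy sketch below shows only that flow. The final norm is assumed to be RMSNorm, as in the Hugging Face DeepSeek-V2 reference. The layers are simple residual placeholders, not real attention and MLP blocks, and decoder_stack is a hypothetical name.

```python
import numpy as np


def rms_norm(x, weight, eps=1e-6):
    return weight * x / np.sqrt(np.mean(x * x, axis=-1, keepdims=True) + eps)


def decoder_stack(input_ids, embedding, layers, final_norm_weight):
    """Embed token ids, run each decoder layer in order, then apply the final norm."""
    hidden = embedding[input_ids]  # (seq_len, hidden)
    for layer in layers:
        hidden = layer(hidden)
    return rms_norm(hidden, final_norm_weight)


rng = np.random.default_rng(0)
vocab_size, hidden_size = 20, 8
embedding = rng.standard_normal((vocab_size, hidden_size))
# Placeholder layers: a residual update stands in for attention + MLP.
layer_ws = [0.1 * rng.standard_normal((hidden_size, hidden_size)) for _ in range(3)]
layers = [lambda h, w=w: h + h @ w for w in layer_ws]

out = decoder_stack(np.array([1, 5, 7]), embedding, layers, np.ones(hidden_size))
print(out.shape)  # (3, 8)
```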
- property frequencies
Computes the RoPE frequencies using the config’s get_basic_frequencies method.
- get_decoder() → Module
Returns the decoder part of the model’s graph definition. For DeepseekV2Model, this is the model itself.
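The frequencies property delegates to the config's get_basic_frequencies method. Its exact return format is not documented here. The NumPy sketch below computes the textbook unscaled RoPE tables: inverse frequencies base^(-2i/d), multiplied by positions, then passed through cos and sin. This is only a reference point. In DeepSeek V2, RoPE applies only to the decoupled rope part of each head (qk_rope_head_dim in the Hugging Face config), and DeepSeek V2 configs may enable YaRN-style rope scaling, which this sketch ignores. basic_rope_frequencies is a hypothetical name.

```python
import numpy as np


def basic_rope_frequencies(rope_dim, max_position, base=10000.0):
    """Plain (unscaled) RoPE tables: cos/sin of position * inverse frequency."""
    inv_freq = 1.0 / (base ** (np.arange(0, rope_dim, 2) / rope_dim))  # (rope_dim // 2,)
    positions = np.arange(max_position)
    angles = np.outer(positions, inv_freq)  # (max_position, rope_dim // 2)
    return np.cos(angles), np.sin(angles)


cos, sin = basic_rope_frequencies(rope_dim=8, max_position=16)
print(cos.shape)  # (16, 4)
```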