easydel.modules.deepseek_v3.modeling_deepseek

class easydel.modules.deepseek_v3.modeling_deepseek.DeepseekV3Attention(*args: Any, **kwargs: Any)

Bases: UnifiedAttention

DeepSeek V3 Multi-head Latent Attention (MLA).

Inherits the MLA implementation from the UnifiedAttention base class.

define_network(config: DeepseekV3Config, dtype: dtype, param_dtype: dtype, precision: Precision, rngs: Rngs)

Define the MLA-specific network structure.

projection_mapping: ClassVar[dict[str, str]] = {'mla_kv_a_layernorm': 'kv_a_layernorm', 'mla_kv_a_proj_with_mqa': 'kv_a_proj_with_mqa', 'mla_kv_b_proj': 'kv_b_proj', 'mla_q_a_layernorm': 'q_a_layernorm', 'mla_q_a_proj': 'q_a_proj', 'mla_q_b_proj': 'q_b_proj', 'mla_q_proj': 'q_proj', 'output_projection': 'o_proj'}
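
The mapping pairs generic projection names (keys) with the DeepSeek V3 attribute names (values). How UnifiedAttention consumes it is not shown on this page, so the following is only a minimal sketch of one plausible use: looking up a submodule by its generic name. `resolve_projection` and the stand-in object are hypothetical and not part of EasyDeL.

```python
from types import SimpleNamespace

# Copied from DeepseekV3Attention.projection_mapping above.
PROJECTION_MAPPING = {
    "mla_kv_a_layernorm": "kv_a_layernorm",
    "mla_kv_a_proj_with_mqa": "kv_a_proj_with_mqa",
    "mla_kv_b_proj": "kv_b_proj",
    "mla_q_a_layernorm": "q_a_layernorm",
    "mla_q_a_proj": "q_a_proj",
    "mla_q_b_proj": "q_b_proj",
    "mla_q_proj": "q_proj",
    "output_projection": "o_proj",
}


def resolve_projection(module, generic_name, mapping=PROJECTION_MAPPING):
    """Hypothetical helper: return the attribute a generic name maps to."""
    return getattr(module, mapping[generic_name])


# Stand-in object whose attributes are named like the DeepSeek V3 projections.
fake_attention = SimpleNamespace(o_proj="<o_proj layer>", q_b_proj="<q_b_proj layer>")

output_layer = resolve_projection(fake_attention, "output_projection")
print(output_layer)
```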

class easydel.modules.deepseek_v3.modeling_deepseek.DeepseekV3DecoderLayer(*args: Any, **kwargs: Any)

Bases: Module

Single DeepSeek V3 transformer block with MLA attention and optional MoE MLP.
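
The "optional MoE MLP" means a layer's feed-forward part is either dense or mixture-of-experts. The rule EasyDeL applies is not documented here; the sketch below assumes the selection rule and field names (`n_routed_experts`, `first_k_dense_replace`, `moe_layer_freq`) used by the Hugging Face DeepSeek V3 configuration. `SketchConfig` and `uses_moe` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SketchConfig:
    """Hypothetical stand-in for DeepseekV3Config; field names are assumed."""

    n_routed_experts: Optional[int] = 8
    first_k_dense_replace: int = 1
    moe_layer_freq: int = 1


def uses_moe(config: SketchConfig, layer_idx: int) -> bool:
    """Assumed rule: the first k layers stay dense, later ones use MoE."""
    return (
        config.n_routed_experts is not None
        and layer_idx >= config.first_k_dense_replace
        and layer_idx % config.moe_layer_freq == 0
    )


config = SketchConfig()
layer_kinds = ["moe" if uses_moe(config, i) else "dense" for i in range(4)]
print(layer_kinds)
```

Under these assumed defaults only the first layer stays dense.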

class easydel.modules.deepseek_v3.modeling_deepseek.DeepseekV3ForCausalLM(*args: Any, **kwargs: Any)

Bases: BaseCausalLMModule[DeepseekV3Model, DeepseekV3Config]

DeepseekV3 model with a language modeling head for causal language modeling tasks.

This model extends the base DeepseekV3Model by adding a linear language modeling head on top of the transformer. It uses a Mixture of Experts (MoE) architecture and is intended for text generation.

class easydel.modules.deepseek_v3.modeling_deepseek.DeepseekV3MLP(*args: Any, **kwargs: Any)

Bases: Module

Standard DeepSeek V3 feed-forward network used in dense decoder layers.

class easydel.modules.deepseek_v3.modeling_deepseek.DeepseekV3MLPMoE(*args: Any, **kwargs: Any)

Bases: Module

Mixture-of-experts feed-forward module parameterized by the DeepSeek V3 config.

reform_param: ClassVar = {'down_proj$': {'inverse_spliter': <function DeepseekV3MLPMoE.<lambda>>, 'splits': [{'name': 'down_proj.kernel', 'spliter': <function DeepseekV3MLPMoE.<lambda>>}]}, 'gate_up_proj$': {'inverse_spliter': <function DeepseekV3MLPMoE.<lambda>>, 'splits': [{'name': 'gate_proj.kernel', 'spliter': <function DeepseekV3MLPMoE.<lambda>>}, {'name': 'up_proj.kernel', 'spliter': <function DeepseekV3MLPMoE.<lambda>>}]}}
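
The lambdas in reform_param are not shown on this page. The sketch below illustrates the shape of such a table under two assumptions: that the keys are regular-expression patterns matched against parameter paths (suggested by the trailing `$`), and that the fused `gate_up_proj` kernel holds the gate and up halves side by side along the last axis. All function names and the example path are hypothetical, and nested lists stand in for arrays.

```python
import re


def split_gate(fused):
    """Assumed spliter: the first half of the last axis is the gate kernel."""
    half = len(fused[0]) // 2
    return [row[:half] for row in fused]


def split_up(fused):
    """Assumed spliter: the second half of the last axis is the up kernel."""
    half = len(fused[0]) // 2
    return [row[half:] for row in fused]


def merge_gate_up(gate, up):
    """Assumed inverse_spliter: concatenate the halves along the last axis."""
    return [g + u for g, u in zip(gate, up)]


SKETCH_REFORM = {
    r"gate_up_proj$": {
        "splits": [
            {"name": "gate_proj.kernel", "spliter": split_gate},
            {"name": "up_proj.kernel", "spliter": split_up},
        ],
        "inverse_spliter": merge_gate_up,
    },
}


def apply_splits(path, tensor, rules=SKETCH_REFORM):
    """Split a tensor if its path matches a rule; otherwise pass it through."""
    for pattern, rule in rules.items():
        if re.search(pattern, path):
            return {s["name"]: s["spliter"](tensor) for s in rule["splits"]}
    return {path: tensor}


fused = [[1, 2, 3, 4], [5, 6, 7, 8]]
parts = apply_splits("mlp.experts.gate_up_proj", fused)  # hypothetical path
restored = merge_gate_up(parts["gate_proj.kernel"], parts["up_proj.kernel"])
```

Splitting and then merging returns the original fused kernel, which is the property a spliter and inverse_spliter pair needs for round-trip conversion.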

class easydel.modules.deepseek_v3.modeling_deepseek.DeepseekV3MoE(*args: Any, **kwargs: Any)

Bases: BaseMoeModule

Wraps gating and expert networks to apply DeepSeek V3 MoE feed-forward processing.

class easydel.modules.deepseek_v3.modeling_deepseek.DeepseekV3Model(*args: Any, **kwargs: Any)

Bases: EasyDeLBaseModule

Full DeepSeek V3 decoder-only transformer composed of MLA blocks and MoE feed-forward layers.

property frequencies

Compute the RoPE frequencies using the config’s get_basic_frequencies method.
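
What get_basic_frequencies computes internally is not shown here, and it may apply additional scaling. As a point of reference, this is a minimal sketch of the standard RoPE formulation, where pair i of a head of size d rotates by position * base^(-2i/d). The function names and the default base of 10000 are assumptions for illustration.

```python
import math


def rope_inverse_frequencies(head_dim: int, base: float = 10000.0) -> list[float]:
    """Standard RoPE inverse frequencies, one per pair of channels."""
    return [base ** (-(2 * i) / head_dim) for i in range(head_dim // 2)]


def rope_angles(position: int, head_dim: int, base: float = 10000.0) -> list[float]:
    """Rotation angle of each channel pair at a given token position."""
    return [position * f for f in rope_inverse_frequencies(head_dim, base)]


def rotate_pair(x: float, y: float, angle: float) -> tuple[float, float]:
    """Apply a 2D rotation to one channel pair."""
    cos, sin = math.cos(angle), math.sin(angle)
    return (x * cos - y * sin, x * sin + y * cos)


inv_freq = rope_inverse_frequencies(head_dim=4)
angles = rope_angles(position=3, head_dim=4)
rotated = rotate_pair(1.0, 0.0, angles[0])
```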

get_decoder()

Returns the decoder part of the model’s graph definition.

get_embedding()

Returns the embedding layer of the module.

get_encoder()

Returns the encoder part of the model’s graph definition. Decoder-only models such as this one have no encoder.

get_lm_head()

Returns the language model head of the module. Base models have no language model head.

class easydel.modules.deepseek_v3.modeling_deepseek.MoEGate(*args: Any, **kwargs: Any)

Bases: Module

Top-k routing gate that scores tokens for the mixture-of-experts blocks.
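
The scoring function and any expert grouping used by MoEGate are not described on this page. The sketch below shows generic top-k routing for a single token with softmax scores, which is one common formulation and not necessarily the one implemented here. `top_k_route` is a hypothetical name.

```python
import math


def top_k_route(scores: list[float], k: int) -> list[tuple[int, float]]:
    """Pick the k highest-scoring experts and renormalize their weights."""
    # Numerically stable softmax over the per-expert scores.
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    probs = [e / total for e in exps]

    # Indices of the k largest probabilities, highest first.
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]

    # Renormalize so the selected experts' weights sum to 1.
    norm = sum(probs[i] for i in top)
    return [(i, probs[i] / norm) for i in top]


routes = top_k_route([0.1, 2.0, -1.0, 1.5], k=2)
for expert_index, weight in routes:
    print(expert_index, round(weight, 4))
```

Each token's output is then the weighted sum of the selected experts' outputs, using the returned weights.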