easydel.modules.falcon.modeling_falcon_flax
- class easydel.modules.falcon.modeling_falcon_flax.FalconAttention(*args: Any, **kwargs: Any)
Bases: AttentionModule
- class easydel.modules.falcon.modeling_falcon_flax.FalconBlock(*args: Any, **kwargs: Any)
Bases: Module
- class easydel.modules.falcon.modeling_falcon_flax.FalconForCausalLM(*args: Any, **kwargs: Any)
Bases: EasyDeLBaseModule
Falcon model with a language modeling head for causal language modeling.
This model extends the base FalconModel with a linear language modeling head that projects the final hidden states to vocabulary logits, which makes it suitable for text generation. Depending on the configuration, the model uses either ALiBi positional biases or rotary position embeddings (RoPE).
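To illustrate the RoPE option mentioned above, here is a minimal JAX sketch of rotary position embeddings applied to one head's query or key vectors. It assumes the common "rotate-half" layout (dimension i paired with dimension i + head_dim/2) and a frequency base of 10000; the function name apply_rope is hypothetical, and Falcon's implementation in EasyDeL may differ in these details.

```python
import jax
import jax.numpy as jnp


def apply_rope(x: jnp.ndarray, positions: jnp.ndarray, base: float = 10000.0) -> jnp.ndarray:
    """Rotate x of shape (seq_len, head_dim) by position-dependent angles.

    Hypothetical helper. head_dim must be even; dimension i is paired with
    dimension i + head_dim // 2 (the "rotate-half" layout, an assumption here).
    """
    head_dim = x.shape[-1]
    half = head_dim // 2
    # One rotation frequency per dimension pair: base ** (-2i / head_dim).
    inv_freq = 1.0 / (base ** (jnp.arange(half) * 2.0 / head_dim))  # (half,)
    angles = positions[:, None].astype(jnp.float32) * inv_freq[None, :]  # (seq_len, half)
    cos, sin = jnp.cos(angles), jnp.sin(angles)
    x1, x2 = x[..., :half], x[..., half:]
    # 2-D rotation of each (x1[i], x2[i]) pair by its angle.
    return jnp.concatenate([x1 * cos - x2 * sin, x1 * sin + x2 * cos], axis=-1)


# Usage: rotate four query vectors of one head at positions 0..3.
q = jax.random.normal(jax.random.PRNGKey(0), (4, 8))
q_rot = apply_rope(q, jnp.arange(4))
```

Because each pair of dimensions is rotated by an angle proportional to the position, the dot product between a rotated query at position m and a rotated key at position n depends only on n - m, which is how RoPE encodes relative position without a separate bias term.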
- class easydel.modules.falcon.modeling_falcon_flax.FalconMlp(*args: Any, **kwargs: Any)
Bases: Module
- class easydel.modules.falcon.modeling_falcon_flax.FalconModel(*args: Any, **kwargs: Any)
Bases: EasyDeLBaseModule
- easydel.modules.falcon.modeling_falcon_flax.built_bloom_alibi(attention_mask, num_attention_heads)
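The built_bloom_alibi function builds the BLOOM-style ALiBi (Attention with Linear Biases) tensor that is added to the attention scores when the model is configured to use ALiBi. Instead of adding positional embeddings to the token embeddings, ALiBi gives each attention head h a fixed slope m_h (taken from a geometric sequence) and adds a bias of m_h · j to the score for a key at position j. Because softmax is unchanged by adding a constant along each query row, this is equivalent to penalizing attention in proportion to the query-key distance. As in BLOOM, positions are presumably derived from the cumulative sum of attention_mask, so padding tokens do not shift the positions of the real tokens.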
- Parameters
attention_mask – Padding mask of shape (batch_size, sequence_length), with 1 for real tokens and 0 for padding
num_attention_heads – Number of attention heads; one ALiBi slope is computed per head
- Returns
A tensor of shape (batch_size, num_attention_heads, 1, sequence_length)
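For reference, the BLOOM-style ALiBi construction can be sketched in JAX as follows. The sketch mirrors the slope and position logic of Hugging Face's BLOOM `build_alibi_tensor`; that EasyDeL's built_bloom_alibi computes exactly the same values is an assumption, and the names alibi_slopes and bloom_style_alibi are hypothetical.

```python
import math

import jax.numpy as jnp


def alibi_slopes(num_heads: int) -> jnp.ndarray:
    """Hypothetical helper: per-head ALiBi slopes as a geometric sequence."""
    closest = 2 ** math.floor(math.log2(num_heads))  # largest power of 2 <= num_heads
    base = 2.0 ** (-(2.0 ** -(math.log2(closest) - 3)))
    slopes = base ** jnp.arange(1, closest + 1)
    if closest != num_heads:
        # Non-power-of-2 head counts: take the odd powers from the slope
        # sequence for 2 * closest heads to fill the remaining heads.
        extra_base = 2.0 ** (-(2.0 ** -(math.log2(2 * closest) - 3)))
        num_extra = min(closest, num_heads - closest)
        extra = extra_base ** jnp.arange(1, 1 + 2 * num_extra, 2)
        slopes = jnp.concatenate([slopes, extra])
    return slopes


def bloom_style_alibi(attention_mask: jnp.ndarray, num_heads: int) -> jnp.ndarray:
    """Hypothetical helper: ALiBi bias of shape (batch_size, num_heads, 1, sequence_length)."""
    slopes = alibi_slopes(num_heads)  # (num_heads,)
    # Index of each real token among the non-padding tokens; padding positions get 0.
    positions = (jnp.cumsum(attention_mask, axis=-1) - 1) * attention_mask  # (batch, seq)
    bias = slopes[None, :, None] * positions[:, None, :]  # (batch, heads, seq)
    return bias[:, :, None, :]
```

The result is added to pre-softmax attention scores of shape (batch_size, num_heads, query_length, key_length); the singleton third dimension broadcasts over query positions.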
- easydel.modules.falcon.modeling_falcon_flax.dropout_add(nn_drop: Dropout, x: Union[Array, ndarray, bool, number], residual: Union[Array, ndarray, bool, number]) → Union[Array, ndarray, bool, number]
The dropout_add function is a helper that applies the dropout layer nn_drop to x and adds residual to the result, i.e. it returns residual + nn_drop(x). Only x passes through dropout; the residual (skip) path is added unchanged. When dropout is deterministic, as during evaluation, nn_drop acts as the identity and the result is simply x + residual.
- Parameters
nn_drop – nn.Dropout: The dropout layer applied to x
x – chex.Array: Input to the dropout layer
residual – chex.Array: Residual tensor added to the dropout output
deterministic – bool: Whether dropout is disabled (True during evaluation)
- Returns
The sum of the residual and the dropout output, residual + nn_drop(x)
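The behavior of dropout_add can be illustrated with a pure-JAX sketch. It reimplements inverted dropout with jax.random.bernoulli instead of calling a Flax nn.Dropout layer, so dropout_add_sketch, its rate argument, and the explicit PRNG key are hypothetical stand-ins rather than EasyDeL's signature.

```python
import jax
import jax.numpy as jnp


def dropout_add_sketch(key, x, residual, rate: float, deterministic: bool):
    """Hypothetical helper: return residual + dropout(x) using inverted dropout."""
    if deterministic or rate == 0.0:
        dropped = x  # evaluation: dropout is the identity
    else:
        # Keep each element with probability 1 - rate and rescale the survivors
        # by 1 / (1 - rate) so the expected value of the output matches x.
        keep = jax.random.bernoulli(key, 1.0 - rate, x.shape)
        dropped = jnp.where(keep, x / (1.0 - rate), 0.0)
    # The residual path bypasses dropout entirely.
    return residual + dropped


# Usage: training-mode call with a fresh PRNG key.
x = jnp.ones((2, 4))
residual = jnp.full((2, 4), 3.0)
train_out = dropout_add_sketch(jax.random.PRNGKey(0), x, residual, rate=0.1, deterministic=False)
```

Flax's nn.Dropout likewise rescales kept values by 1 / (1 - rate), so outputs in training and evaluation have the same expected magnitude.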