easydel.modules.falcon.modeling_falcon_flax

class easydel.modules.falcon.modeling_falcon_flax.FalconAttention(*args: Any, **kwargs: Any)

Bases: AttentionModule

class easydel.modules.falcon.modeling_falcon_flax.FalconBlock(*args: Any, **kwargs: Any)

Bases: Module

class easydel.modules.falcon.modeling_falcon_flax.FalconForCausalLM(*args: Any, **kwargs: Any)

Bases: EasyDeLBaseModule

Falcon model with a language modeling head for causal language modeling tasks.

This model extends the base FalconModel with a linear language modeling head for text generation. Depending on the configuration, it uses either ALiBi positional biases or rotary position embeddings (RoPE).

class easydel.modules.falcon.modeling_falcon_flax.FalconMlp(*args: Any, **kwargs: Any)

Bases: Module

class easydel.modules.falcon.modeling_falcon_flax.FalconModel(*args: Any, **kwargs: Any)

Bases: EasyDeLBaseModule

easydel.modules.falcon.modeling_falcon_flax.built_bloom_alibi(attention_mask, num_attention_heads)

The built_bloom_alibi function builds the ALiBi (Attention with Linear Biases) tensor in the style used by BLOOM. Each attention head is assigned a fixed slope, and that slope is multiplied by token positions derived from the attention mask. The resulting bias is added to the attention scores, so attention depends on token position without learned positional embeddings.

Parameters
  • attention_mask – Attention mask of shape (batch_size, sequence_length) that marks padding tokens in the input sequence

  • num_attention_heads – Number of attention heads in the model

Returns

A tensor of shape (batch_size, num_attention_heads, 1, sequence_length)
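The computation can be illustrated with a minimal sketch. It is not the EasyDeL source: it uses NumPy instead of JAX for portability, the names alibi_slopes and alibi_bias are hypothetical, and the slope schedule is assumed to follow the BLOOM convention (a geometric sequence per head, with interleaved extra slopes when the head count is not a power of two).

```python
import math

import numpy as np


def alibi_slopes(num_heads: int) -> np.ndarray:
    """Per-head slopes (assumed BLOOM-style geometric schedule)."""
    closest_pow2 = 2 ** math.floor(math.log2(num_heads))
    base = 2.0 ** (-(2.0 ** -(math.log2(closest_pow2) - 3)))
    slopes = base ** np.arange(1, closest_pow2 + 1, dtype=np.float32)
    if closest_pow2 != num_heads:
        # Remaining heads take every other slope from the next power of two.
        extra_base = 2.0 ** (-(2.0 ** -(math.log2(2 * closest_pow2) - 3)))
        num_extra = min(closest_pow2, num_heads - closest_pow2)
        extra = extra_base ** np.arange(1, 2 * num_extra + 1, 2, dtype=np.float32)
        slopes = np.concatenate([slopes, extra])
    return slopes.astype(np.float32)


def alibi_bias(attention_mask: np.ndarray, num_heads: int) -> np.ndarray:
    """Return a bias of shape (batch_size, num_heads, 1, sequence_length)."""
    batch_size, seq_len = attention_mask.shape
    # Position of each real token, counted over non-padding tokens only;
    # multiplying by the mask zeroes the positions of padding tokens.
    positions = (np.cumsum(attention_mask, axis=-1) - 1) * attention_mask
    bias = alibi_slopes(num_heads)[None, :, None] * positions[:, None, :]
    return bias.reshape(batch_size, num_heads, 1, seq_len)


mask = np.array([[1, 1, 1, 0]])  # last token is padding
bias = alibi_bias(mask, num_heads=8)
```

In this sketch the first head has slope 0.5, so its bias grows by 0.5 per real token, and the padded position gets a bias of zero.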

easydel.modules.falcon.modeling_falcon_flax.dropout_add(nn_drop: Dropout, x: Union[Array, ndarray, bool, number], residual: Union[Array, ndarray, bool, number]) → Union[Array, ndarray, bool, number]

The dropout_add function is a helper that applies the dropout layer nn_drop to x and adds residual to the result. When dropout is deterministic, as it is during evaluation, x passes through unchanged and the function returns x + residual. During training, dropout randomly zeroes elements of x, while the residual is added unchanged, so gradients still flow through the skip connection.

Parameters
  • nn_drop – nn.Dropout: The dropout layer to apply to x

  • x – chex.Array: Input to the dropout layer

  • residual – chex.Array: Residual tensor added to the dropout output

  • deterministic – bool: Whether dropout is disabled (True during evaluation)

Returns

A tensor that is the sum of the residual and the output of the dropout layer
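The pattern can be sketched as follows. This is not the EasyDeL implementation: it replaces the Flax dropout layer with a small NumPy inverted-dropout function, and the names dropout, dropout_add_sketch, and rate are hypothetical.

```python
import numpy as np


def dropout(x, rate, deterministic, rng=None):
    """Inverted dropout: zero entries with probability `rate`, rescale the rest."""
    if deterministic or rate == 0.0:
        return x  # evaluation mode: identity
    rng = np.random.default_rng() if rng is None else rng
    keep = rng.random(x.shape) >= rate
    return np.where(keep, x / (1.0 - rate), 0.0)


def dropout_add_sketch(x, residual, rate, deterministic, rng=None):
    """Apply dropout to x only, then add the untouched residual."""
    return residual + dropout(x, rate, deterministic, rng)


x = np.ones((2, 3))
residual = np.full((2, 3), 2.0)

# Evaluation: dropout is a no-op, so the result is exactly x + residual.
eval_out = dropout_add_sketch(x, residual, rate=0.5, deterministic=True)

# Training: each entry of x is either dropped (0) or rescaled (x / 0.5).
train_out = dropout_add_sketch(
    x, residual, rate=0.5, deterministic=False, rng=np.random.default_rng(0)
)
```

Because the residual bypasses dropout, every entry of train_out still contains the full residual value regardless of which entries of x were dropped.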