easydel.modules.falcon.modeling_falcon_flax

class easydel.modules.falcon.modeling_falcon_flax.FalconAttention(*args: Any, **kwargs: Any)

Bases: FlaxAttentionModule

class easydel.modules.falcon.modeling_falcon_flax.FalconBlock(*args: Any, **kwargs: Any)

Bases: Module

class easydel.modules.falcon.modeling_falcon_flax.FalconForCausalLM(*args: Any, **kwargs: Any)

Bases: EasyDeLBaseModule

class easydel.modules.falcon.modeling_falcon_flax.FalconMlp(*args: Any, **kwargs: Any)

Bases: Module

class easydel.modules.falcon.modeling_falcon_flax.FalconModel(*args: Any, **kwargs: Any)

Bases: EasyDeLBaseModule

easydel.modules.falcon.modeling_falcon_flax.built_bloom_alibi(attention_mask, num_attention_heads)

The built_bloom_alibi function builds the ALiBi (Attention with Linear Biases) tensor in the style used by BLOOM. Each attention head is assigned a fixed slope, and the bias for a token is that slope multiplied by the token's position among the non-padding tokens. The resulting tensor is added to the attention scores, so the model encodes token position without learned positional embeddings.

Parameters
  • attention_mask – Mask of shape (batch_size, sequence_length) that marks real tokens and padding tokens in the input sequence

  • num_attention_heads – Number of attention heads in the model

Returns

A tensor of shape (batch_size, num_attention_heads, 1, sequence_length)
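The computation described above can be sketched in JAX as follows. This is a minimal illustration that assumes the standard ALiBi slope recipe (a geometric sequence per head) and a 0/1 attention mask; the names `alibi_slopes` and `build_alibi_sketch` are hypothetical and are not part of EasyDeL, whose implementation may differ in dtype handling and other details.

```python
import math

import jax.numpy as jnp


def alibi_slopes(num_heads: int) -> jnp.ndarray:
    """Per-head slopes following the usual ALiBi geometric sequence (assumed recipe)."""
    closest_pow2 = 2 ** math.floor(math.log2(num_heads))
    base = 2.0 ** (-(2.0 ** -(math.log2(closest_pow2) - 3)))
    slopes = jnp.power(base, jnp.arange(1, closest_pow2 + 1, dtype=jnp.float32))
    if closest_pow2 != num_heads:
        # Head counts that are not a power of two get extra interleaved slopes.
        extra_base = 2.0 ** (-(2.0 ** -(math.log2(2 * closest_pow2) - 3)))
        num_extra = min(closest_pow2, num_heads - closest_pow2)
        extra_powers = jnp.arange(1, 1 + 2 * num_extra, 2, dtype=jnp.float32)
        slopes = jnp.concatenate([slopes, jnp.power(extra_base, extra_powers)])
    return slopes


def build_alibi_sketch(attention_mask: jnp.ndarray, num_heads: int) -> jnp.ndarray:
    """Hypothetical stand-in that returns a bias of shape (batch, heads, 1, seq_len)."""
    batch_size, seq_len = attention_mask.shape
    slopes = alibi_slopes(num_heads)
    # Position of each token among the non-masked tokens; masked positions become 0.
    positions = (jnp.cumsum(attention_mask, axis=-1) - 1) * attention_mask
    alibi = slopes[None, :, None] * positions[:, None, :].astype(jnp.float32)
    return alibi.reshape(batch_size, num_heads, 1, seq_len)


mask = jnp.array([[1, 1, 1, 0]])  # one sequence, last token is padding
alibi = build_alibi_sketch(mask, num_heads=4)
```

With four heads the first slope is 0.25, so the first head's bias over positions 0, 1, 2 is 0, 0.25, 0.5, and the padded position gets 0.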

easydel.modules.falcon.modeling_falcon_flax.dropout_add(nn_drop: Dropout, x: Union[Array, ndarray, bool, number], residual: Union[Array, ndarray, bool, number]) → Union[Array, ndarray, bool, number]

The dropout_add function is a helper that applies the dropout layer to x and adds the residual to the result. When deterministic is True (evaluation), dropout is disabled and the function returns x + residual. During training, dropout is applied only to x; the residual is always added in full, which keeps an undisturbed path for gradients through the skip connection.

Parameters
  • nn_drop – nn.Dropout: The dropout layer to apply

  • x – chex.Array: Input passed through the dropout layer

  • residual – chex.Array: Residual tensor added to the dropout output

  • deterministic – bool: If True, dropout is disabled (evaluation mode)

Returns

A tensor equal to the residual plus the dropout output of x