easydel.modules.falcon.modeling_falcon_flax


class easydel.modules.falcon.modeling_falcon_flax.FalconAttention(*args: Any, **kwargs: Any)
Bases: FlaxAttentionModule

class easydel.modules.falcon.modeling_falcon_flax.FalconBlock(*args: Any, **kwargs: Any)
Bases: Module

class easydel.modules.falcon.modeling_falcon_flax.FalconForCausalLM(*args: Any, **kwargs: Any)
Bases: EasyDeLBaseModule

class easydel.modules.falcon.modeling_falcon_flax.FalconMlp(*args: Any, **kwargs: Any)
Bases: Module

class easydel.modules.falcon.modeling_falcon_flax.FalconModel(*args: Any, **kwargs: Any)
Bases: EasyDeLBaseModule

easydel.modules.falcon.modeling_falcon_flax.built_bloom_alibi(attention_mask, num_attention_heads)

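The built_bloom_alibi function builds the ALiBi (Attention with Linear Biases) tensor used by Bloom-style attention. Rather than adding position embeddings to the token representations, ALiBi adds a fixed, non-learned bias to the attention scores: each head has its own slope, and the bias grows linearly with the key position, so heads with steeper slopes penalize distant tokens more strongly. The attention mask is used so that positions are counted over real (non-padding) tokens. Because the bias depends only on the key position, the result has a singleton query dimension and broadcasts across all query positions.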

Parameters

attention_mask – Padding mask of shape (batch_size, sequence_length), with 1 for real tokens and 0 for padding
num_attention_heads – Number of attention heads; one ALiBi slope is computed per head

Returns

A tensor of shape (batch_size, num_attention_heads, 1, sequence_length)
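For illustration, a minimal JAX sketch of how such a tensor can be built. It follows the standard ALiBi slope schedule as used by Hugging Face's Bloom `build_alibi_tensor`; that built_bloom_alibi works exactly this way internally is an assumption, and `alibi_slopes` and `build_alibi_sketch` are hypothetical names, not EasyDeL functions.

```python
import math

import jax.numpy as jnp


def alibi_slopes(num_heads: int) -> jnp.ndarray:
    """Per-head ALiBi slopes: a geometric sequence starting at 2**(-8/n)."""
    closest = 2 ** math.floor(math.log2(num_heads))
    base = 2.0 ** (-8.0 / closest)
    slopes = base ** jnp.arange(1, closest + 1, dtype=jnp.float32)
    if closest != num_heads:
        # Non-power-of-two head counts: append extra slopes taken from the
        # odd powers of the schedule for 2 * closest heads.
        extra_base = 2.0 ** (-4.0 / closest)
        num_remaining = min(closest, num_heads - closest)
        extra = extra_base ** jnp.arange(1, 2 * num_remaining + 1, 2, dtype=jnp.float32)
        slopes = jnp.concatenate([slopes, extra])
    return slopes


def build_alibi_sketch(attention_mask: jnp.ndarray, num_heads: int) -> jnp.ndarray:
    """Hypothetical stand-in for built_bloom_alibi; returns (batch, heads, 1, seq)."""
    batch_size, seq_len = attention_mask.shape
    # Position of each real token, counted from the first unpadded one;
    # padding positions are multiplied by 0.
    positions = (jnp.cumsum(attention_mask, axis=-1) - 1) * attention_mask
    slopes = alibi_slopes(num_heads)
    # (1, heads, 1) * (batch, 1, seq) -> (batch, heads, seq)
    alibi = slopes[None, :, None] * positions[:, None, :].astype(jnp.float32)
    # Insert the singleton query axis so the bias broadcasts over queries.
    return alibi.reshape(batch_size, num_heads, 1, seq_len)


mask = jnp.array([[0, 1, 1, 1]])  # one left-padded sequence
bias = build_alibi_sketch(mask, num_heads=2)  # shape (1, 2, 1, 4)
```

Only the key position appears in the bias because softmax is invariant to adding the same constant to every score in a row: the relative bias m_h * (j - i) differs from m_h * j only by -m_h * i, which is identical for all keys j of a given query i. That is why the tensor can keep a singleton query axis and broadcast over (batch, heads, query, key) attention scores.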

easydel.modules.falcon.modeling_falcon_flax.dropout_add(nn_drop: Dropout, x: Union[Array, ndarray, bool, number], residual: Union[Array, ndarray, bool, number]) → Union[Array, ndarray, bool, number]

The dropout_add function is a small helper for residual connections: it applies the dropout layer nn_drop to x and adds residual to the result, returning residual + nn_drop(x). Only x passes through dropout; the residual path is always added back unchanged. Whether dropout is active (training) or a no-op (evaluation) depends on how the dropout layer's deterministic flag is set.

Parameters

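nn_drop – nn.Dropout: The dropout layer applied to x
x – chex.Array: The tensor to which dropout is applied
residual – chex.Array: The residual tensor added to the dropout output
deterministic – bool: Whether dropout is disabled (True, e.g. during evaluation) or active (False). This parameter does not appear in the signature above; in that version the flag is presumably configured on nn_drop itself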

Returns

A tensor equal to residual + nn_drop(x), i.e. the residual added to the dropout-applied input
