easydel.modules.falcon.modeling_falcon

class easydel.modules.falcon.modeling_falcon.FalconAttention(*args: Any, **kwargs: Any)

Bases: UnifiedAttention

Falcon attention built on top of the unified attention backend.

projection_mapping: ClassVar = {'key_projection': 'k_proj', 'mla_kv_a_layernorm': 'kv_a_layernorm', 'mla_kv_a_proj_with_mqa': 'kv_a_proj_with_mqa', 'mla_kv_b_proj': 'kv_b_proj', 'mla_q_a_layernorm': 'q_a_layernorm', 'mla_q_a_proj': 'q_a_proj', 'mla_q_b_proj': 'q_b_proj', 'mla_q_proj': 'q_proj', 'output_projection': 'dense', 'query_key_value_projection': 'query_key_value', 'query_projection': 'q_proj', 'value_projection': 'v_proj'}

class easydel.modules.falcon.modeling_falcon.FalconBlock(*args: Any, **kwargs: Any)

Bases: Module

Single Falcon transformer block with attention and MLP.

class easydel.modules.falcon.modeling_falcon.FalconForCausalLM(*args: Any, **kwargs: Any)

Bases: BaseCausalLMModule[FalconModel, FalconConfig]

Falcon model with a language modeling head for causal language modeling tasks.

class easydel.modules.falcon.modeling_falcon.FalconMlp(*args: Any, **kwargs: Any)

Bases: Module

Gated feed-forward network for Falcon decoder blocks.

class easydel.modules.falcon.modeling_falcon.FalconModel(*args: Any, **kwargs: Any)

Bases: EasyDeLBaseModule

Falcon decoder-only transformer with embeddings, blocks, and final norm.

get_decoder()

Returns the decoder part of the model’s graph definition.

get_embedding()

Returns the embedding layer of the module.

get_encoder()

Returns the encoder part of the model’s graph definition. Decoder-only models such as Falcon do not have an encoder.

get_lm_head()

Returns the language model head of the module. Base models such as FalconModel do not have a language model head.

easydel.modules.falcon.modeling_falcon.built_bloom_alibi(attention_mask, num_attention_heads)

Builds the ALiBi (Attention with Linear Biases) tensor in the style used by BLOOM. ALiBi encodes position without learned positional embeddings: each attention head is assigned a fixed slope, and a bias that grows linearly with token position is added to the attention scores. The attention mask is used to derive token positions, so padding tokens can be excluded from the position count.

Parameters
  • attention_mask – Mask identifying the padding tokens in the input sequence

  • num_attention_heads – Number of attention heads in the model

Returns

A tensor of shape (batch_size, num_attention_heads, 1, sequence_length)
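The construction described above can be sketched as follows. This is a minimal NumPy illustration, assuming the function follows the usual BLOOM-style recipe (a geometric slope schedule per head, multiplied by mask-derived positions); the names alibi_slopes and build_alibi_sketch are hypothetical and are not part of EasyDeL, whose implementation may differ in details such as dtype handling.

```python
import math

import numpy as np


def alibi_slopes(num_heads: int) -> np.ndarray:
    """Per-head slopes from a BLOOM-style geometric schedule (hypothetical helper)."""
    closest_pow2 = 2 ** math.floor(math.log2(num_heads))
    # For a power-of-two head count n, the base equals 2 ** (-8 / n).
    base = 2.0 ** (-(2.0 ** -(math.log2(closest_pow2) - 3)))
    slopes = base ** np.arange(1, closest_pow2 + 1)
    if closest_pow2 != num_heads:
        # Remaining heads take every other slope from the next power of two.
        extra_base = 2.0 ** (-(2.0 ** -(math.log2(2 * closest_pow2) - 3)))
        num_extra = min(closest_pow2, num_heads - closest_pow2)
        extra = extra_base ** np.arange(1, 2 * num_extra + 1, 2)
        slopes = np.concatenate([slopes, extra])
    return slopes


def build_alibi_sketch(attention_mask, num_heads: int) -> np.ndarray:
    """Return a bias of shape (batch, num_heads, 1, seq_len)."""
    mask = np.asarray(attention_mask, dtype=np.float32)  # (batch, seq_len)
    # Position of each real token, counting only unmasked tokens.
    # Padding positions are zeroed by the final multiplication.
    positions = (np.cumsum(mask, axis=-1) - 1) * mask
    slopes = alibi_slopes(num_heads)  # (num_heads,)
    # Broadcast slopes over batch and sequence, positions over heads.
    return slopes[None, :, None, None] * positions[:, None, None, :]
```

Calling build_alibi_sketch([[1, 1, 1, 0]], 4) yields an array of shape (1, 4, 1, 4), matching the documented return shape; the singleton third axis lets the bias broadcast over query positions when it is added to the attention scores.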

easydel.modules.falcon.modeling_falcon.dropout_add(nn_drop: Dropout, x: Union[Array, ndarray, bool, number], residual: Union[Array, ndarray, bool, number]) -> Union[Array, ndarray, bool, number]

Helper that applies the dropout layer to x and adds the residual to the result, that is, residual + dropout(x). The residual is added whether or not dropout is active: when dropout is deterministic (evaluation), it acts as the identity and the function reduces to residual + x.

Parameters
  • nn_drop – nn.Dropout: The dropout layer to apply

  • x – chex.Array: The input passed through the dropout layer

  • residual – chex.Array: The residual added to the dropout output

  • deterministic – bool: Whether dropout is disabled, as at evaluation time. This parameter is listed here but does not appear in the signature above.

Returns

The sum of the residual and the dropout output for x
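The behaviour described above can be illustrated with a small stand-alone sketch. It is written in NumPy and replaces the Dropout module with explicit rate, deterministic, and rng arguments; these arguments and the name dropout_add_sketch are assumptions made for illustration and do not mirror the EasyDeL signature.

```python
import numpy as np


def dropout_add_sketch(x, residual, rate, deterministic, rng=None):
    """Compute residual + dropout(x) using inverted dropout (illustrative only)."""
    x = np.asarray(x, dtype=np.float32)
    residual = np.asarray(residual, dtype=np.float32)
    if deterministic or rate == 0.0:
        # Dropout is the identity here, so only the residual add remains.
        return residual + x
    rng = rng if rng is not None else np.random.default_rng()
    # Keep each element with probability (1 - rate) and rescale the survivors
    # so the expected value of the dropped tensor matches x.
    keep = rng.random(x.shape) >= rate
    dropped = np.where(keep, x / (1.0 - rate), 0.0)
    return residual + dropped
```

With deterministic=True the result is exactly residual + x; otherwise each output element is either the residual alone or the residual plus a rescaled copy of x.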