easydel.modules.roberta.__init__

class easydel.modules.roberta.__init__.RobertaConfig(vocab_size=50265, hidden_size=768, num_hidden_layers=12, num_attention_heads=12, intermediate_size=3072, hidden_act='gelu', hidden_dropout_prob=0.1, attention_probs_dropout_prob=0.1, max_position_embeddings=514, type_vocab_size=1, initializer_range=0.02, layer_norm_eps=1e-05, pad_token_id=1, bos_token_id=0, eos_token_id=2, position_embedding_type='absolute', use_cache=True, classifier_dropout=None, gradient_checkpointing='nothing_saveable', **kwargs)

Bases: EasyDeLBaseConfig

Configuration objects inherit from [EasyDeLBaseConfig] and can be used to control the model outputs. Read the documentation from [EasyDeLBaseConfig] for more information.

Parameters
  • vocab_size (int, optional, defaults to 50265) – Vocabulary size of the RoBERTa model. Defines the number of different tokens that can be represented by the input_ids passed when calling RobertaModel.

  • hidden_size (int, optional, defaults to 768) – Dimensionality of the encoder layers and the pooler layer.

  • num_hidden_layers (int, optional, defaults to 12) – Number of hidden layers in the Transformer encoder.

  • num_attention_heads (int, optional, defaults to 12) – Number of attention heads for each attention layer in the Transformer encoder.

  • intermediate_size (int, optional, defaults to 3072) – Dimensionality of the “intermediate” (i.e., feed-forward) layer in the Transformer encoder.

  • hidden_act (str or function, optional, defaults to "gelu") – The non-linear activation function (function or string) in the encoder and pooler. If string, "gelu", "relu", "swish" and "gelu_new" are supported.

  • hidden_dropout_prob (float, optional, defaults to 0.1) – The dropout probability for all fully connected layers in the embeddings, encoder, and pooler.

  • attention_probs_dropout_prob (float, optional, defaults to 0.1) – The dropout ratio for the attention probabilities.

  • max_position_embeddings (int, optional, defaults to 514) – The maximum sequence length that this model might ever be used with. Typically set this to something large just in case (e.g., 512 or 1024 or 2048).

  • type_vocab_size (int, optional, defaults to 1) – The vocabulary size of the token_type_ids passed when calling RobertaModel.

  • initializer_range (float, optional, defaults to 0.02) – The standard deviation of the truncated_normal_initializer for initializing all weight matrices.

  • layer_norm_eps (float, optional, defaults to 1e-5) – The epsilon used by the layer normalization layers.

  • position_embedding_type (str, optional, defaults to "absolute") – Type of position embedding. Choose one of "absolute", "relative_key", "relative_key_query". For positional embeddings use "absolute". For more information on "relative_key", please refer to [Self-Attention with Relative Position Representations (Shaw et al.)](https://arxiv.org/abs/1803.02155). For more information on "relative_key_query", please refer to Method 4 in [Improve Transformer Models with Better Relative Position Embeddings (Huang et al.)](https://arxiv.org/abs/2009.13658).

  • use_cache (bool, optional, defaults to True) – Whether or not the model should return the last key/values attentions (not used by all models). Only relevant if config.is_decoder=True.

  • classifier_dropout (float, optional) – The dropout ratio for the classification head.

  • gradient_checkpointing (str, optional, defaults to "nothing_saveable") – What to save during gradient checkpointing. Choose one of "nothing_saveable", "first_half_saveable", "full_saveable".
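
Several of the defaults above are related to one another, and a small sketch can make that concrete. The class below is a hypothetical stand-in (RobertaConfigSketch is not part of EasyDeL) that copies a few documented defaults and derives the per-head dimension. It also assumes that position ids start at pad_token_id + 1, as in the reference RoBERTa implementation, which would explain why max_position_embeddings defaults to 514 rather than 512. Whether EasyDeL follows the same position convention is an assumption here, not something this page states.

```python
from dataclasses import dataclass


@dataclass
class RobertaConfigSketch:
    """Hypothetical stand-in mirroring a few documented defaults; not the EasyDeL class."""

    hidden_size: int = 768
    num_attention_heads: int = 12
    max_position_embeddings: int = 514
    pad_token_id: int = 1

    @property
    def head_dim(self) -> int:
        # Multi-head attention splits hidden_size evenly across the heads.
        if self.hidden_size % self.num_attention_heads != 0:
            raise ValueError("hidden_size must be divisible by num_attention_heads")
        return self.hidden_size // self.num_attention_heads

    @property
    def max_sequence_length(self) -> int:
        # Assumption: positions 0..pad_token_id are reserved, so real tokens
        # are numbered from pad_token_id + 1 upwards.
        return self.max_position_embeddings - self.pad_token_id - 1


def position_ids(input_ids, pad_token_id):
    """Number non-padding tokens from pad_token_id + 1; padding keeps pad_token_id."""
    ids, count = [], 0
    for token in input_ids:
        if token == pad_token_id:
            ids.append(pad_token_id)
        else:
            count += 1
            ids.append(count + pad_token_id)
    return ids


config = RobertaConfigSketch()
print(config.head_dim)             # 64
print(config.max_sequence_length)  # 512
# Four real tokens followed by two padding tokens (token id 1).
print(position_ids([0, 31414, 232, 2, 1, 1], config.pad_token_id))  # [2, 3, 4, 5, 1, 1]
```

Under that assumption, a model built with the defaults can embed at most 512 non-padding tokens per sequence.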

attach_custom_arguments(gradient_checkpointing='nothing_saveable', **kwargs)

get_partition_rules(fully_sharded_data_parallel: bool = True)

Get the partition rules for the model.

Parameters

fully_sharded_data_parallel (bool, optional, defaults to True) – Whether to use fully sharded data parallelism.

Returns

The partition rules.

Return type

tp.Tuple[tp.Tuple[str, PartitionSpec]]
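
The return type above is a tuple of (pattern, spec) pairs. A minimal sketch of how such rules are commonly consumed follows: each parameter path is tested against the patterns in order and the first match decides the sharding. Everything in it is an assumption made for illustration. The rule patterns, axis names, and parameter paths are invented, plain tuples stand in for jax.sharding.PartitionSpec so the snippet runs without JAX, and match_rule is a hypothetical helper rather than an EasyDeL function.

```python
import re

# Hypothetical rules in the documented shape: (regex over the parameter path, spec).
# Plain tuples stand in for jax.sharding.PartitionSpec; the axis names are made up.
RULES = (
    (r"word_embeddings/embedding", ("tp", "fsdp")),
    (r"attention/.*/kernel", ("fsdp", "tp")),
    (r".*", ()),  # catch-all: an empty spec means the parameter is replicated
)


def match_rule(rules, param_path):
    """Return the spec of the first rule whose regex matches the parameter path."""
    for pattern, spec in rules:
        if re.search(pattern, param_path):
            return spec
    raise ValueError(f"no partition rule matches {param_path!r}")


# Hypothetical parameter paths, joined with "/" as is common for Flax pytrees.
paths = [
    "embeddings/word_embeddings/embedding",
    "encoder/layer/0/attention/self/query/kernel",
    "encoder/layer/0/output/LayerNorm/scale",
]
resolved = {path: match_rule(RULES, path) for path in paths}
for path, spec in resolved.items():
    print(path, "->", spec)
```

Because the first match wins, specific patterns need to come before the catch-all rule.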

model_type: str = 'roberta'

class easydel.modules.roberta.__init__.RobertaForCausalLM(*args: Any, **kwargs: Any)

Bases: EasyDeLBaseModule

class easydel.modules.roberta.__init__.RobertaForMultipleChoice(*args: Any, **kwargs: Any)

Bases: EasyDeLBaseModule

class easydel.modules.roberta.__init__.RobertaForQuestionAnswering(*args: Any, **kwargs: Any)

Bases: EasyDeLBaseModule

class easydel.modules.roberta.__init__.RobertaForSequenceClassification(*args: Any, **kwargs: Any)

Bases: EasyDeLBaseModule

class easydel.modules.roberta.__init__.RobertaForTokenClassification(*args: Any, **kwargs: Any)

Bases: EasyDeLBaseModule