easydel.modules.roberta.__init__


class easydel.modules.roberta.__init__.RobertaConfig(vocab_size=50265, hidden_size=768, num_hidden_layers=12, num_attention_heads=12, intermediate_size=3072, hidden_act='gelu', hidden_dropout_prob=0.1, attention_probs_dropout_prob=0.1, max_position_embeddings=514, type_vocab_size=1, initializer_range=0.02, layer_norm_eps=1e-05, pad_token_id=1, bos_token_id=0, eos_token_id=2, position_embedding_type='absolute', use_cache=True, classifier_dropout=None, gradient_checkpointing='nothing_saveable', **kwargs)

Bases: EasyDeLBaseConfig

Configuration objects inherit from [EasyDeLBaseConfig] and can be used to control the model outputs. See the [EasyDeLBaseConfig] documentation for more information.

Parameters

vocab_size (int, optional, defaults to 50265) – Vocabulary size of the RoBERTa model. Defines the number of different tokens that can be represented by the input_ids passed when calling RobertaModel.
hidden_size (int, optional, defaults to 768) – Dimensionality of the encoder layers and the pooler layer.
num_hidden_layers (int, optional, defaults to 12) – Number of hidden layers in the Transformer encoder.
num_attention_heads (int, optional, defaults to 12) – Number of attention heads for each attention layer in the Transformer encoder.
intermediate_size (int, optional, defaults to 3072) – Dimensionality of the “intermediate” (i.e., feed-forward) layer in the Transformer encoder.
hidden_act (str or function, optional, defaults to "gelu") – The non-linear activation function (function or string) in the encoder and pooler. If string, "gelu", "relu", "swish" and "gelu_new" are supported.
hidden_dropout_prob (float, optional, defaults to 0.1) – The dropout probability for all fully connected layers in the embeddings, encoder, and pooler.
attention_probs_dropout_prob (float, optional, defaults to 0.1) – The dropout ratio for the attention probabilities.
max_position_embeddings (int, optional, defaults to 514) – The maximum sequence length that this model might ever be used with. This is typically set to something large just in case (e.g., 512, 1024, or 2048).
type_vocab_size (int, optional, defaults to 1) – The vocabulary size of the token_type_ids passed when calling RobertaModel.
initializer_range (float, optional, defaults to 0.02) – The standard deviation of the truncated_normal_initializer for initializing all weight matrices.
layer_norm_eps (float, optional, defaults to 1e-5) – The epsilon used by the layer normalization layers.
position_embedding_type (str, optional, defaults to "absolute") – Type of position embedding. Choose one of "absolute", "relative_key", "relative_key_query". For positional embeddings use "absolute". For more information on "relative_key", please refer to [Self-Attention with Relative Position Representations (Shaw et al.)](https://arxiv.org/abs/1803.02155). For more information on "relative_key_query", please refer to Method 4 in [Improve Transformer Models with Better Relative Position Embeddings (Huang et al.)](https://arxiv.org/abs/2009.13658).
use_cache (bool, optional, defaults to True) – Whether or not the model should return the last key/value attention states (not used by all models). Only relevant if config.is_decoder=True.
classifier_dropout (float, optional) – The dropout ratio for the classification head.
gradient_checkpointing (str, optional, defaults to "nothing_saveable") – What to save during gradient checkpointing. Choose one of "nothing_saveable", "first_half_saveable", "full_saveable".
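
To illustrate how a few of these parameters relate to one another, here is a minimal sketch. `RobertaConfigSketch` is a hypothetical stand-in that copies some defaults from the signature above; it is not the EasyDeL class. It also assumes two conventions from the reference RoBERTa implementation that this page does not state: `hidden_size` is split evenly across attention heads, and position ids start at `pad_token_id + 1`.

```python
from dataclasses import dataclass


@dataclass
class RobertaConfigSketch:
    """Hypothetical stand-in for a few RobertaConfig defaults (not the EasyDeL class)."""

    hidden_size: int = 768
    num_attention_heads: int = 12
    intermediate_size: int = 3072
    max_position_embeddings: int = 514
    pad_token_id: int = 1

    def head_dim(self) -> int:
        # Assumption: the hidden dimension is divided evenly among the heads.
        if self.hidden_size % self.num_attention_heads != 0:
            raise ValueError("hidden_size must be divisible by num_attention_heads")
        return self.hidden_size // self.num_attention_heads

    def usable_positions(self) -> int:
        # Assumption: position ids start at pad_token_id + 1, so the first
        # pad_token_id + 1 rows of the position table are not used for tokens.
        return self.max_position_embeddings - self.pad_token_id - 1


cfg = RobertaConfigSketch()
print(cfg.head_dim())          # 64
print(cfg.usable_positions())  # 512
```

Under those assumptions, the default of 514 position embeddings corresponds to sequences of up to 512 tokens.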

attach_custom_arguments(gradient_checkpointing='nothing_saveable', **kwargs)

get_partition_rules(fully_sharded_data_parallel: bool = True)

Get the partition rules for the model.

Parameters: fully_sharded_data_parallel (bool, optional, defaults to True) – Whether to use fully sharded data parallelism.
Returns: The partition rules.
Return type: tp.Tuple[tp.Tuple[str, PartitionSpec]]
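
The return type suggests a sequence of (pattern, spec) pairs. The sketch below shows one plausible way such rules could be consumed: each parameter name is tested against the patterns in order, and the first match wins. This is an assumption about usage, not a statement of how EasyDeL applies the rules. The rule patterns, axis names, and parameter names are hypothetical, and plain tuples stand in for `PartitionSpec` so the example runs without JAX.

```python
import re
from typing import Optional, Tuple

# Stand-in for jax.sharding.PartitionSpec: a tuple of mesh axis names (or None).
Spec = Tuple[Optional[str], ...]
Rules = Tuple[Tuple[str, Spec], ...]

# Hypothetical rules; the real ones come from RobertaConfig.get_partition_rules().
RULES: Rules = (
    (r"word_embeddings/embedding", ("tp", "fsdp")),
    (r"(query|key|value)/kernel", ("fsdp", "tp")),
    (r".*", (None,)),  # catch-all: replicate everything else
)


def match_rule(rules: Rules, param_name: str) -> Spec:
    """Return the spec of the first rule whose regex matches param_name."""
    for pattern, spec in rules:
        if re.search(pattern, param_name) is not None:
            return spec
    raise ValueError(f"no partition rule matches {param_name!r}")


print(match_rule(RULES, "encoder/layer/0/attention/self/query/kernel"))
print(match_rule(RULES, "pooler/dense/bias"))
```

Because the first match wins, rule order matters: the catch-all pattern has to come last, or it would shadow the more specific rules.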

model_type: str = 'roberta'

class easydel.modules.roberta.__init__.RobertaForCausalLM(*args: Any, **kwargs: Any)

Bases: EasyDeLBaseModule

class easydel.modules.roberta.__init__.RobertaForMultipleChoice(*args: Any, **kwargs: Any)

Bases: EasyDeLBaseModule

class easydel.modules.roberta.__init__.RobertaForQuestionAnswering(*args: Any, **kwargs: Any)

Bases: EasyDeLBaseModule

class easydel.modules.roberta.__init__.RobertaForSequenceClassification(*args: Any, **kwargs: Any)

Bases: EasyDeLBaseModule

class easydel.modules.roberta.__init__.RobertaForTokenClassification(*args: Any, **kwargs: Any)

Bases: EasyDeLBaseModule
