easydel.modules.aya_vision.__init__
- class easydel.modules.aya_vision.__init__.AyaVisionConfig(vision_config=None, text_config=None, vision_feature_select_strategy='full', vision_feature_layer=-1, downsample_factor=2, adapter_layer_norm_eps=1e-06, image_token_index=255036, **kwargs)
Bases: EasyDeLBaseConfig
This is the configuration class that stores the configuration of an [AyaVisionForConditionalGeneration] model. It is used to instantiate an AyaVision model according to the specified arguments, which define the model architecture. Instantiating a configuration with the defaults yields a configuration similar to that of AyaVision, e.g. [CohereForAI/aya-vision-8b](https://huggingface.co/CohereForAI/aya-vision-8b).
Configuration objects inherit from [PretrainedConfig] and can be used to control the model outputs. Read the documentation from [PretrainedConfig] for more information.
- Parameters
vision_config (Union[AutoConfig, dict], optional, defaults to CLIPVisionConfig) – The config object or dictionary of the vision backbone.
text_config (Union[AutoConfig, dict], optional, defaults to LlamaConfig) – The config object or dictionary of the text backbone.
vision_feature_select_strategy (str, optional, defaults to “full”) – The feature selection strategy used to select the vision feature from the vision backbone. Can be one of “default” or “full”. If “default”, the CLS token is removed from the vision features. If “full”, the full vision features are used.
vision_feature_layer (int, optional, defaults to -1) – The index of the layer to select the vision feature.
downsample_factor (int, optional, defaults to 2) – The downsample factor to apply to the vision features.
adapter_layer_norm_eps (float, optional, defaults to 1e-06) – The epsilon value used for layer normalization in the adapter.
image_token_index (int, optional, defaults to 255036) – The image token index to encode the image prompt.
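As a rough illustration of how vision_feature_layer and vision_feature_select_strategy interact, the pure-Python sketch below picks one layer from a list of per-layer hidden states and optionally drops the leading CLS token. The helper name select_vision_features and the nested-list layout are assumptions made for this example only; they are not part of EasyDeL, whose implementation works on arrays and may differ in detail.

```python
from typing import List


# Hypothetical helper for illustration only; not an EasyDeL function.
def select_vision_features(
    hidden_states: List[List[List[float]]],
    vision_feature_layer: int = -1,
    vision_feature_select_strategy: str = "full",
) -> List[List[float]]:
    """Pick one layer's token features and apply the selection strategy.

    hidden_states[layer][token] is a feature vector. Token 0 is assumed
    to be the CLS token.
    """
    features = hidden_states[vision_feature_layer]
    if vision_feature_select_strategy == "default":
        return features[1:]  # drop the CLS token
    if vision_feature_select_strategy == "full":
        return features  # keep every token, CLS included
    raise ValueError(
        f"Unknown strategy: {vision_feature_select_strategy!r}"
    )


# Two layers, three tokens each (CLS plus two patches), feature size 2.
layers = [
    [[0.0, 0.0], [0.1, 0.1], [0.2, 0.2]],
    [[1.0, 1.0], [1.1, 1.1], [1.2, 1.2]],
]
full = select_vision_features(layers, -1, "full")
default = select_vision_features(layers, -1, "default")
print(len(full), len(default))
```

With vision_feature_layer=-1 the last layer is used, so "full" returns all three tokens and "default" returns only the two patch tokens.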
- get_partition_rules(*args, **kwargs)
Get the partition rules for the model.
- Returns
The partition rules.
- Return type
tp.Tuple[tp.Tuple[str, PartitionSpec]]
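The documented return type is a tuple of (pattern, PartitionSpec) pairs. The sketch below shows what such a structure might look like and one plausible way to consume it. Everything here is an assumption for illustration: the patterns, the axis names "fsdp" and "tp", the first-match-wins lookup via re.search, and the plain tuples standing in for jax.sharding.PartitionSpec. It does not reproduce the rules that AyaVisionConfig actually returns.

```python
import re
from typing import Optional, Tuple

# Stand-in for jax.sharding.PartitionSpec: a tuple of mesh-axis names or None.
Spec = Tuple[Optional[str], ...]

# Hypothetical rules; patterns and axis names are invented for this example.
EXAMPLE_RULES: Tuple[Tuple[str, Spec], ...] = (
    (r"embed_tokens/embedding", ("tp", "fsdp")),
    (r"self_attn/.*_proj/kernel", ("fsdp", "tp")),
    (r".*", (None,)),  # fallback: replicate the parameter
)


def match_rule(param_path: str, rules: Tuple[Tuple[str, Spec], ...]) -> Spec:
    """Return the spec of the first rule whose pattern matches the path."""
    for pattern, spec in rules:
        if re.search(pattern, param_path):
            return spec
    raise ValueError(f"No partition rule matches {param_path!r}")


print(match_rule("model/layers/0/self_attn/q_proj/kernel", EXAMPLE_RULES))
print(match_rule("model/norm/scale", EXAMPLE_RULES))
```

Ordering matters under this assumed first-match policy, which is why the catch-all pattern is placed last.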
- model_type: str = 'aya_vision'
- sub_configs: Dict[str, 'PretrainedConfig'] = {'text_config': <class 'easydel.modules.auto.auto_configuration.AutoEasyDeLConfig'>, 'vision_config': <class 'easydel.modules.auto.auto_configuration.AutoEasyDeLConfig'>}
- class easydel.modules.aya_vision.__init__.AyaVisionForConditionalGeneration(*args: Any, **kwargs: Any)
Bases: EasyDeLBaseModule
- get_image_features(pixel_values: Union[Array, ndarray, bool, number]) → Union[Array, ndarray, bool, number]
- loss_type = 'ForCausalLM'
- prepare_inputs_for_generation(input_ids: Union[Array, ndarray, bool, number], max_length: int, pixel_values: Optional[Union[Array, ndarray, bool, number]] = None, attention_mask: Optional[Union[Array, ndarray, bool, number]] = None)
Prepares the model inputs for a generation task.
- Parameters
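self – The model instance.
input_ids – The input token IDs.
max_length – The length of the sequence to be generated.
pixel_values – tp.Optional[chex.Array]: The image pixel values, if any.
attention_mask – tp.Optional[chex.Array]: The mask applied to the attention weights.
token_type_ids – tp.Optional[chex.Array]: The token type IDs. This entry is listed in the docstring but does not appear in the signature above.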
- Returns
A dictionary containing the past_key_values, the attention_mask, and the position IDs.
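The shape of the returned dictionary can be sketched in pure Python. This is a minimal sketch under stated assumptions, modeled on the pattern common in Flax-style generation helpers: the attention mask is extended with ones up to max_length, and position IDs are the running sum of the mask minus one. The function name sketch_prepare_inputs and the key "position_ids" are hypothetical, past_key_values is left as a None placeholder because a real cache requires the model, and pixel_values is omitted. EasyDeL's actual method may differ.

```python
from itertools import accumulate
from typing import Any, Dict, List, Optional


# Hypothetical stand-in for illustration; not EasyDeL's implementation.
def sketch_prepare_inputs(
    input_ids: List[List[int]],
    max_length: int,
    attention_mask: Optional[List[List[int]]] = None,
) -> Dict[str, Any]:
    batch_size = len(input_ids)
    seq_length = len(input_ids[0])
    if max_length < seq_length:
        raise ValueError("max_length must be at least the prompt length")
    if attention_mask is None:
        attention_mask = [[1] * seq_length for _ in range(batch_size)]

    # Position of each token: count of attended tokens so far, minus one.
    position_ids = [[c - 1 for c in accumulate(row)] for row in attention_mask]

    # Extend the mask to max_length; slots for future tokens are set to one.
    extended_mask = [
        row + [1] * (max_length - seq_length) for row in attention_mask
    ]

    return {
        "past_key_values": None,  # placeholder: a real cache needs the model
        "attention_mask": extended_mask,
        "position_ids": position_ids,
    }


inputs = sketch_prepare_inputs([[5, 6, 7]], max_length=5)
print(inputs["attention_mask"], inputs["position_ids"])
```

Extending the mask up front keeps its shape fixed across decoding steps, which is convenient when the generation loop is compiled once and reused.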