easydel.modules.qwen2_vl.modeling_qwen2_vl_flax#
- class easydel.modules.qwen2_vl.modeling_qwen2_vl_flax.PatchEmbed(*args: Any, **kwargs: Any)[source]#
Bases:
Module
- class easydel.modules.qwen2_vl.modeling_qwen2_vl_flax.PatchMerger(*args: Any, **kwargs: Any)[source]#
Bases:
Module
- class easydel.modules.qwen2_vl.modeling_qwen2_vl_flax.Qwen2VLAttention(*args: Any, **kwargs: Any)[source]#
Bases:
FlaxAttentionModule
- class easydel.modules.qwen2_vl.modeling_qwen2_vl_flax.Qwen2VLCausalLMOutputWithPast(loss: Optional[Union[Array, ndarray, bool, number]] = None, logits: Union[Array, ndarray, bool, number] = None, past_key_values: Optional[List[Union[Array, ndarray, bool, number]]] = None, hidden_states: Optional[Tuple[Union[Array, ndarray, bool, number]]] = None, attentions: Optional[Tuple[Union[Array, ndarray, bool, number]]] = None, rope_deltas: Optional[Union[Array, ndarray, bool, number]] = None)[source]#
Bases:
ModelOutput
Base class for Qwen2VL causal language model (or autoregressive) outputs.
- replace(**kwargs)#
- class easydel.modules.qwen2_vl.modeling_qwen2_vl_flax.Qwen2VLDecoderLayer(*args: Any, **kwargs: Any)[source]#
Bases:
Module
- class easydel.modules.qwen2_vl.modeling_qwen2_vl_flax.Qwen2VLForConditionalGeneration(*args: Any, **kwargs: Any)[source]#
Bases:
EasyDeLBaseModule
- config: tp.Union[EasyDeLBaseConfig, _CP]#
- dtype: jnp.dtype#
- loss_type = 'ForCausalLM'#
- param_dtype: jnp.dtype#
- precision: lax.PrecisionLike#
- prepare_inputs_for_call(image_grid_thw: Optional[Union[Array, ndarray, bool, number]] = None, video_grid_thw: Optional[Union[Array, ndarray, bool, number]] = None, image_max_grid_size: int = None, video_max_grid_size: int = None, drop_ids: bool = True, **others)[source]#
Updates the inputs before calling the model.
- prepare_inputs_for_generation(input_ids, max_length, past_key_values=None, attention_mask=None, inputs_embeds=None, position_ids=None, pixel_values=None, pixel_values_videos=None, image_grid_thw=None, video_grid_thw=None, **kwargs)[source]#
Prepares the inputs for a generation task.
- Parameters
input_ids – The input tokens.
max_length – The length of the sequence to be generated.
attention_mask (tp.Optional[chex.Array]) – Mask applied to the attention weights.
token_type_ids (tp.Optional[chex.Array]) – Token type IDs.
- Returns
A dictionary containing the past_key_values, the attention_mask, and the position IDs.
- rngs: nn.Rngs#
- class easydel.modules.qwen2_vl.modeling_qwen2_vl_flax.Qwen2VLMLP(*args: Any, **kwargs: Any)[source]#
Bases:
Module
- class easydel.modules.qwen2_vl.modeling_qwen2_vl_flax.Qwen2VLModel(*args: Any, **kwargs: Any)[source]#
Bases:
EasyDeLBaseModule
- class easydel.modules.qwen2_vl.modeling_qwen2_vl_flax.Qwen2VLVisionBlock(*args: Any, **kwargs: Any)[source]#
Bases:
Module
- class easydel.modules.qwen2_vl.modeling_qwen2_vl_flax.Qwen2VisionTransformerPretrainedModel(*args: Any, **kwargs: Any)[source]#
Bases:
EasyDeLBaseModule- config_class#
alias of
Qwen2VLVisionConfig
- class easydel.modules.qwen2_vl.modeling_qwen2_vl_flax.VisionAttention(*args: Any, **kwargs: Any)[source]#
Bases:
FlaxAttentionModule
- class easydel.modules.qwen2_vl.modeling_qwen2_vl_flax.VisionMlp(*args: Any, **kwargs: Any)[source]#
Bases:
Module
- easydel.modules.qwen2_vl.modeling_qwen2_vl_flax.apply_rotary_pos_emb_vision(array: Union[Array, ndarray, bool, number], freqs: Union[Array, ndarray, bool, number]) Union[Array, ndarray, bool, number][source]#
- easydel.modules.qwen2_vl.modeling_qwen2_vl_flax.create_attention_mask(cu_seqlens, seq_length, dtype)[source]#
Creates an attention mask matrix.
- Parameters
cu_seqlens – Cumulative sequence lengths.
seq_length – Length of each sequence.
dtype – Data type of the mask.
- Returns
Attention mask matrix.
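To illustrate how cumulative sequence lengths can describe an attention mask, here is a minimal NumPy sketch that builds a block-diagonal mask in which tokens attend only within their own segment. The name `block_diagonal_mask` is hypothetical. The additive form (0 where attention is allowed, the dtype's minimum where it is blocked), the 2-D output shape, and the reading of `seq_length` as the total packed length are assumptions, not confirmed behaviour of `create_attention_mask`.

```python
import numpy as np


def block_diagonal_mask(cu_seqlens, seq_length, dtype=np.float32):
    """Hypothetical sketch: additive block-diagonal mask from cumulative lengths.

    cu_seqlens holds segment boundaries, e.g. [0, 2, 5] means two segments
    covering positions 0..1 and 2..4. seq_length is assumed to be the total
    packed length.
    """
    # Start with everything blocked (most negative value of the dtype).
    mask = np.full((seq_length, seq_length), np.finfo(dtype).min, dtype=dtype)
    # Open up one square block per segment.
    for start, end in zip(cu_seqlens[:-1], cu_seqlens[1:]):
        mask[start:end, start:end] = 0
    return mask


mask = block_diagonal_mask(np.array([0, 2, 5]), seq_length=5)
```

With boundaries `[0, 2, 5]`, positions 0 and 1 see each other, positions 2 to 4 see each other, and every cross-segment entry stays at the blocking value.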
- easydel.modules.qwen2_vl.modeling_qwen2_vl_flax.get_rope_index(input_ids: ndarray, image_grid_thw: Optional[ndarray] = None, video_grid_thw: Optional[ndarray] = None, attention_mask: Optional[ndarray] = None, spatial_merge_size: int = 1, image_token_id: int = -1, video_token_id: int = -1, vision_start_token_id: int = -1) Tuple[ndarray, ndarray][source]#
Calculates the 3D RoPE index from the temporal, height, and width dimensions of images and videos in the LLM.
- Parameters
input_ids (np.ndarray of shape (batch_size, sequence_length)) – Indices of input sequence tokens in the vocabulary. Padding will be ignored by default should you provide it.
image_grid_thw (np.ndarray of shape (num_images, 3), optional) – The temporal, height, and width dimensions of the feature shape of each image in the LLM.
video_grid_thw (np.ndarray of shape (num_videos, 3), optional) – The temporal, height, and width dimensions of the feature shape of each video in the LLM.
attention_mask (np.ndarray of shape (batch_size, sequence_length), optional) – Mask to avoid performing attention on padding token indices. Mask values are selected in [0, 1]: 1 for tokens that are not masked, 0 for tokens that are masked.
spatial_merge_size (int) – The spatial merge size for vision embeddings.
image_token_id (int) – The token ID representing an image.
video_token_id (int) – The token ID representing a video.
vision_start_token_id (int) – The token ID representing the start of a vision sequence.
- Returns
position_ids (np.ndarray of shape (3, batch_size, sequence_length))
mrope_position_deltas (np.ndarray of shape (batch_size))
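The idea of a 3D rope index can be sketched for a single unbatched sequence made of some text tokens, one image, and more text tokens. The helpers `vision_grid_positions` and `toy_rope_index` below are hypothetical and much simpler than `get_rope_index`: they take token counts instead of `input_ids`, ignore the special token IDs and the attention mask, and assume that text tokens share one index across all three axes while vision tokens get separate temporal, height, and width indices.

```python
import numpy as np


def vision_grid_positions(t, h, w, spatial_merge_size=1, offset=0):
    """Hypothetical helper: (3, t*h'*w') temporal/height/width indices."""
    # Assumption: height and width shrink by the spatial merge size.
    gh, gw = h // spatial_merge_size, w // spatial_merge_size
    t_idx, h_idx, w_idx = np.meshgrid(
        np.arange(t), np.arange(gh), np.arange(gw), indexing="ij"
    )
    return np.stack([t_idx.ravel(), h_idx.ravel(), w_idx.ravel()]) + offset


def toy_rope_index(n_text_before, grid_thw, n_text_after, spatial_merge_size=1):
    """Hypothetical sketch: text, one image, text -> (position_ids, delta)."""
    # Text tokens: the same running index on all three axes.
    before = np.tile(np.arange(n_text_before), (3, 1))
    # Vision tokens: per-axis grid indices, shifted past the leading text.
    vision = vision_grid_positions(
        *grid_thw, spatial_merge_size=spatial_merge_size, offset=n_text_before
    )
    # Trailing text resumes after the largest index used so far.
    start = int(vision.max()) + 1
    after = np.tile(np.arange(n_text_after) + start, (3, 1))
    position_ids = np.concatenate([before, vision, after], axis=1)
    # Delta: difference between the next free position and the token count.
    delta = int(position_ids.max()) + 1 - position_ids.shape[1]
    return position_ids, delta


position_ids, delta = toy_rope_index(2, (1, 4, 4), 3, spatial_merge_size=2)
```

In this toy case the image contributes four merged tokens that span only two index values per spatial axis, so the positions advance more slowly than the token count and the delta is negative.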
- easydel.modules.qwen2_vl.modeling_qwen2_vl_flax.jax_scatter(sec_embeds, ids, fir_embeds, TKN_ID)[source]#