easydel.modules.qwen2_vl.modeling_qwen2_vl_flax#

class easydel.modules.qwen2_vl.modeling_qwen2_vl_flax.PatchEmbed(*args: Any, **kwargs: Any)[source]#

Bases: Module

class easydel.modules.qwen2_vl.modeling_qwen2_vl_flax.PatchMerger(*args: Any, **kwargs: Any)[source]#

Bases: Module

class easydel.modules.qwen2_vl.modeling_qwen2_vl_flax.Qwen2VLAttention(*args: Any, **kwargs: Any)[source]#

Bases: FlaxAttentionModule

class easydel.modules.qwen2_vl.modeling_qwen2_vl_flax.Qwen2VLCausalLMOutputWithPast(loss: Optional[Union[Array, ndarray, bool, number]] = None, logits: Union[Array, ndarray, bool, number] = None, past_key_values: Optional[List[Union[Array, ndarray, bool, number]]] = None, hidden_states: Optional[Tuple[Union[Array, ndarray, bool, number]]] = None, attentions: Optional[Tuple[Union[Array, ndarray, bool, number]]] = None, rope_deltas: Optional[Union[Array, ndarray, bool, number]] = None)[source]#

Bases: ModelOutput

Base class for Qwen2VL causal language model (or autoregressive) outputs.

attentions: Optional[Tuple[Union[Array, ndarray, bool, number]]] = None#
hidden_states: Optional[Tuple[Union[Array, ndarray, bool, number]]] = None#
logits: Union[Array, ndarray, bool, number] = None#
loss: Optional[Union[Array, ndarray, bool, number]] = None#
past_key_values: Optional[List[Union[Array, ndarray, bool, number]]] = None#
replace(**updates)#

Returns a new object replacing the specified fields with new values.

rope_deltas: Optional[Union[Array, ndarray, bool, number]] = None#
class easydel.modules.qwen2_vl.modeling_qwen2_vl_flax.Qwen2VLDecoderLayer(*args: Any, **kwargs: Any)[source]#

Bases: Module

class easydel.modules.qwen2_vl.modeling_qwen2_vl_flax.Qwen2VLForConditionalGeneration(*args: Any, **kwargs: Any)[source]#

Bases: EasyDeLBaseModule

get_decoder()[source]#
get_input_embeddings()[source]#
get_output_embeddings()[source]#
get_static_arguments()[source]#

Returns the keyword arguments that should be treated as static by jax.jit.

prepare_inputs_for_call(image_grid_thw: Optional[Union[Array, ndarray, bool, number]] = None, video_grid_thw: Optional[Union[Array, ndarray, bool, number]] = None, image_max_grid_size: int = None, video_max_grid_size: int = None, drop_ids: bool = True, **others)[source]#

Updates the inputs before the model is called.

prepare_inputs_for_generation(input_ids, max_length, past_key_values=None, attention_mask=None, inputs_embeds=None, position_ids=None, pixel_values=None, pixel_values_videos=None, image_grid_thw=None, video_grid_thw=None, **kwargs)[source]#

Prepares the inputs for a generation step.

Parameters
  • input_ids – The input token IDs.

  • max_length – The maximum length of the sequence to be generated.

  • attention_mask – tp.Optional[chex.Array]: Mask applied to the attention weights.

Returns

A dictionary containing past_key_values, attention_mask, and position_ids.

update_inputs_for_generation(model_outputs, model_kwargs)[source]#
class easydel.modules.qwen2_vl.modeling_qwen2_vl_flax.Qwen2VLMLP(*args: Any, **kwargs: Any)[source]#

Bases: Module

class easydel.modules.qwen2_vl.modeling_qwen2_vl_flax.Qwen2VLModel(*args: Any, **kwargs: Any)[source]#

Bases: EasyDeLBaseModule

class easydel.modules.qwen2_vl.modeling_qwen2_vl_flax.Qwen2VLVisionBlock(*args: Any, **kwargs: Any)[source]#

Bases: Module

class easydel.modules.qwen2_vl.modeling_qwen2_vl_flax.Qwen2VisionTransformerPretrainedModel(*args: Any, **kwargs: Any)[source]#

Bases: EasyDeLBaseModule

config_class#

alias of Qwen2VLVisionConfig

get_dtype() → dtype[source]#
rot_pos_emb(grid_thw, max_grid_size)[source]#
class easydel.modules.qwen2_vl.modeling_qwen2_vl_flax.VisionAttention(*args: Any, **kwargs: Any)[source]#

Bases: FlaxAttentionModule

class easydel.modules.qwen2_vl.modeling_qwen2_vl_flax.VisionMlp(*args: Any, **kwargs: Any)[source]#

Bases: Module

easydel.modules.qwen2_vl.modeling_qwen2_vl_flax.apply_rotary_pos_emb_vision(array: Union[Array, ndarray, bool, number], freqs: Union[Array, ndarray, bool, number]) → Union[Array, ndarray, bool, number][source]#
easydel.modules.qwen2_vl.modeling_qwen2_vl_flax.create_attention_mask(cu_seqlens, seq_length, dtype)[source]#

Creates an attention mask matrix.

Parameters
  • cu_seqlens – Cumulative sequence lengths.

  • seq_length – Length of each sequence.

  • dtype – Data type of the mask.

Returns

Attention mask matrix.
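The page does not say how the mask is laid out. The sketch below shows one common way to build such a mask from cumulative sequence lengths: a block-diagonal additive mask in which tokens attend only within their own segment. The name block_diagonal_mask is a hypothetical stand-in. Treating seq_length as the side length of the square mask, and using an additive rather than boolean mask, are assumptions rather than confirmed behavior of create_attention_mask.

```python
import numpy as np


def block_diagonal_mask(cu_seqlens, seq_length, dtype=np.float32):
    """Hypothetical stand-in: 0 inside each segment, most-negative value elsewhere."""
    # Start fully masked, then open up one square block per segment.
    mask = np.full((1, seq_length, seq_length), np.finfo(dtype).min, dtype=dtype)
    for start, end in zip(cu_seqlens[:-1], cu_seqlens[1:]):
        mask[..., start:end, start:end] = 0
    return mask


# Two segments of lengths 2 and 3, packed into 5 positions.
mask = block_diagonal_mask(np.array([0, 2, 5]), seq_length=5)
```

Here positions 0 and 1 can attend to each other, positions 2 to 4 can attend to each other, and every cross-segment entry keeps the large negative value.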

easydel.modules.qwen2_vl.modeling_qwen2_vl_flax.get_rope_index(input_ids: ndarray, image_grid_thw: Optional[ndarray] = None, video_grid_thw: Optional[ndarray] = None, attention_mask: Optional[ndarray] = None, spatial_merge_size: int = 1, image_token_id: int = -1, video_token_id: int = -1, vision_start_token_id: int = -1) → Tuple[ndarray, ndarray][source]#

Calculates the 3D RoPE position indices from the temporal, height, and width dimensions of each image and video as seen by the LLM.

Parameters
  • input_ids (np.ndarray of shape (batch_size, sequence_length)) – Indices of input sequence tokens in the vocabulary. Padding will be ignored by default should you provide it.

  • image_grid_thw (np.ndarray of shape (num_images, 3), optional) – The temporal, height, and width dimensions of each image's feature grid in the LLM.

  • video_grid_thw (np.ndarray of shape (num_videos, 3), optional) – The temporal, height, and width dimensions of each video's feature grid in the LLM.

  • attention_mask (np.ndarray of shape (batch_size, sequence_length), optional) – Mask to avoid performing attention on padding token indices. Mask values are in [0, 1]: 1 for tokens that are not masked, 0 for tokens that are masked.

  • spatial_merge_size (int) – The spatial merge size for vision embeddings.

  • image_token_id (int) – The token ID representing an image.

  • video_token_id (int) – The token ID representing a video.

  • vision_start_token_id (int) – The token ID representing the start of a vision sequence.

Returns

  • position_ids (np.ndarray of shape (3, batch_size, sequence_length))

  • mrope_position_deltas (np.ndarray of shape (batch_size))
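To make the return shapes concrete, here is a minimal sketch of the simplest case: a batch with no images or videos, where all three RoPE axes share the same 1D positions. The helper text_only_rope_index is hypothetical. Its logic follows the text-only fallback of the Hugging Face reference implementation, and whether get_rope_index matches it exactly is an assumption.

```python
import numpy as np


def text_only_rope_index(attention_mask):
    """Hypothetical helper: 3D position ids for a batch with no vision tokens."""
    # Running count of real tokens gives 0-based positions.
    positions = np.cumsum(attention_mask, axis=-1) - 1
    # Padding slots get a placeholder position of 1.
    positions = np.where(attention_mask == 0, 1, positions)
    # Temporal, height, and width axes all reuse the same positions.
    position_ids = np.broadcast_to(positions[None], (3, *positions.shape)).copy()
    # Delta = (largest position + 1) - padded sequence length, per batch row.
    max_pos = position_ids.max(axis=0).max(axis=-1)
    deltas = max_pos + 1 - attention_mask.shape[-1]
    return position_ids, deltas


# Second row has one left-padding token.
attention_mask = np.array([[1, 1, 1, 1], [0, 1, 1, 1]])
position_ids, deltas = text_only_rope_index(attention_mask)
```

With this input, position_ids has shape (3, 2, 4) and deltas has shape (2,). When vision tokens are present, the three axes diverge according to the image_grid_thw and video_grid_thw entries, which this sketch does not cover.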

easydel.modules.qwen2_vl.modeling_qwen2_vl_flax.jax_scatter(sec_embeds, ids, fir_embeds, TKN_ID)[source]#
easydel.modules.qwen2_vl.modeling_qwen2_vl_flax.precompute_vl_rotary(dim, theta, max_position)[source]#
easydel.modules.qwen2_vl.modeling_qwen2_vl_flax.rotate_half(x)[source]#

Rotates half the hidden dims of the input.
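As an illustration, the conventional rotate-half operation splits the last axis into two halves, negates the second, and swaps them. The sketch below uses NumPy for portability, and the same calls are available in jax.numpy. The name rotate_half_sketch is a stand-in, and the library function is assumed, not confirmed, to follow this convention.

```python
import numpy as np


def rotate_half_sketch(x):
    """Stand-in for rotate_half: maps [x1, x2] to [-x2, x1] along the last axis."""
    half = x.shape[-1] // 2
    x1, x2 = x[..., :half], x[..., half:]
    return np.concatenate([-x2, x1], axis=-1)


x = np.array([1.0, 2.0, 3.0, 4.0])
rotated = rotate_half_sketch(x)

# Typical use in rotary embeddings: combine with cos/sin of a rotation angle.
angle = 0.3
embedded = x * np.cos(angle) + rotate_half_sketch(x) * np.sin(angle)
```

Because each pair (x1[i], x2[i]) undergoes a plain 2D rotation, the vector norm of embedded equals that of x.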