easydel.modules.siglip.modeling_siglip#
- class easydel.modules.siglip.modeling_siglip.MultiheadAttention(*args: Any, **kwargs: Any)[source]#
Bases: Module
Simple multi-head attention used by the vision pooling head.
- class easydel.modules.siglip.modeling_siglip.SiglipAttention(*args: Any, **kwargs: Any)[source]#
Bases: AttentionModule
Multi-head self-attention module used across SigLIP encoders.
- class easydel.modules.siglip.modeling_siglip.SiglipEncoder(*args: Any, **kwargs: Any)[source]#
Bases: Module
Stack of SigLIP encoder layers that can optionally capture per-layer attention weights and hidden states.
- class easydel.modules.siglip.modeling_siglip.SiglipEncoderLayer(*args: Any, **kwargs: Any)[source]#
Bases: Module
Transformer encoder block with pre-norm attention and MLP.
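The "pre-norm" ordering named above can be sketched in a few lines of plain Python. This is only an illustration of the general pattern (normalize, apply the sublayer, add the residual), not the EasyDeL implementation: `layer_norm`, `add`, and `pre_norm_block` are hypothetical helpers, the sublayers are passed in as callables, and the sketch works on a single feature vector rather than a batched sequence.

```python
from typing import Callable, List

Vector = List[float]


def layer_norm(x: Vector, eps: float = 1e-6) -> Vector:
    """Normalize a vector to zero mean and unit variance (no learned scale or bias)."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / (var + eps) ** 0.5 for v in x]


def add(a: Vector, b: Vector) -> Vector:
    """Element-wise sum, used for the residual connections."""
    return [u + v for u, v in zip(a, b)]


def pre_norm_block(
    x: Vector,
    attn: Callable[[Vector], Vector],
    mlp: Callable[[Vector], Vector],
) -> Vector:
    """Hypothetical pre-norm block: each sublayer sees a normalized input,
    and its output is added back onto the un-normalized residual stream."""
    h = add(x, attn(layer_norm(x)))
    return add(h, mlp(layer_norm(h)))
```

In a pre-norm block the residual stream itself is never normalized, so a sublayer that outputs zeros leaves the input unchanged.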
- class easydel.modules.siglip.modeling_siglip.SiglipForImageClassification(*args: Any, **kwargs: Any)[source]#
Bases: EasyDeLBaseModule
Image-classification head on top of the SigLIP vision encoder.
- class easydel.modules.siglip.modeling_siglip.SiglipMLP(*args: Any, **kwargs: Any)[source]#
Bases: Module
Two-layer feed-forward network for SigLIP transformer blocks.
- class easydel.modules.siglip.modeling_siglip.SiglipModel(*args: Any, **kwargs: Any)[source]#
Bases: EasyDeLBaseModule
Full SigLIP contrastive model combining text and vision towers.
- get_decoder()[source]#
Returns the decoder part of the model’s graph definition. The text model acts as the decoder in this multi-modal setup.
- get_encoder()[source]#
Returns the encoder part of the model’s graph definition. The vision tower acts as the encoder in this multi-modal setup.
- get_image_features(pixel_values: Optional[Union[Array, ndarray, bool, number]] = None, output_attentions: bool | None = None, output_hidden_states: bool | None = None, interpolate_pos_encoding: bool = False) Union[Array, ndarray, bool, number][source]#
- get_lm_head()[source]#
Returns the language model head of the module. This model has no traditional language model head; it uses a projection head instead.
- get_text_features(input_ids: jaxtyping.Int[Array, 'batch seq_len'] | None = None, attention_mask: jaxtyping.Bool[Array, 'batch seq_len'] | None = None, mask_info: ejkernel.types.mask.MaskInfo | None = None, position_ids: jaxtyping.Int[Array, 'batch seq_len'] | None = None, output_attentions: bool | None = None, output_hidden_states: bool | None = None) Union[Array, ndarray, bool, number][source]#
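To show how text and image embeddings relate to the `logits_per_text` and `logits_per_image` fields of `SiglipOutput`, here is a minimal pure-Python sketch of the pairwise sigmoid formulation associated with SigLIP. It is an assumption that EasyDeL computes its logits and loss this way; the function names, and the `logit_scale` and `logit_bias` parameters, are hypothetical and chosen only for the illustration.

```python
import math
from typing import List

Matrix = List[List[float]]


def l2_normalize(v: List[float]) -> List[float]:
    """Scale a vector to unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def pairwise_logits(
    text_embeds: Matrix, image_embeds: Matrix, logit_scale: float, logit_bias: float
) -> Matrix:
    """Return logits_per_text: scaled cosine similarity of every text/image pair plus a bias.
    logits_per_image would be the transpose of this matrix."""
    texts = [l2_normalize(v) for v in text_embeds]
    images = [l2_normalize(v) for v in image_embeds]
    scale = math.exp(logit_scale)
    return [
        [scale * sum(a * b for a, b in zip(t, im)) + logit_bias for im in images]
        for t in texts
    ]


def sigmoid_loss(logits_per_text: Matrix) -> float:
    """Treat each pair as a binary problem: label +1 on the diagonal (matching pair),
    -1 elsewhere. Sum over images, then average over texts."""
    n = len(logits_per_text)
    total = 0.0
    for i, row in enumerate(logits_per_text):
        for j, logit in enumerate(row):
            label = 1.0 if i == j else -1.0
            total -= log_sigmoid(label * logit)
    return total / n
```

Unlike a softmax-based contrastive loss, every text/image pair contributes an independent binary term here, so no normalization across the batch is needed.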
- class easydel.modules.siglip.modeling_siglip.SiglipMultiheadAttentionPoolingHead(*args: Any, **kwargs: Any)[source]#
Bases: Module
Pools vision tokens with a learned probe followed by MLP refinement.
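The idea of pooling with a probe can be sketched as single-head attention in which one query vector attends over all vision tokens. This is a simplified illustration under stated assumptions, not the module's code: it omits the query/key/value projections, multiple heads, and the MLP refinement step, and `softmax` and `probe_pool` are hypothetical names.

```python
import math
from typing import List, Tuple


def softmax(scores: List[float]) -> List[float]:
    """Softmax with max-subtraction for numerical stability."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def probe_pool(
    tokens: List[List[float]], probe: List[float]
) -> Tuple[List[float], List[float]]:
    """Pool a sequence of token vectors into one vector.

    The probe plays the role of the query; the tokens serve as both keys and
    values (an assumption made to keep the sketch short)."""
    d = len(probe)
    scores = [sum(p * t for p, t in zip(probe, tok)) / math.sqrt(d) for tok in tokens]
    weights = softmax(scores)
    pooled = [sum(w * tok[k] for w, tok in zip(weights, tokens)) for k in range(d)]
    return pooled, weights
```

With an all-zero probe every token receives the same weight, so the result reduces to mean pooling; a trained probe would instead learn which tokens to emphasize.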
- class easydel.modules.siglip.modeling_siglip.SiglipOutput(loss: Optional[Union[Array, ndarray, bool, number]] = None, logits_per_image: Union[Array, ndarray, bool, number] = None, logits_per_text: Union[Array, ndarray, bool, number] = None, text_embeds: Union[Array, ndarray, bool, number] = None, image_embeds: Union[Array, ndarray, bool, number] = None, text_model_output: BaseModelOutputWithPooling = None, vision_model_output: BaseModelOutputWithPooling = None)[source]#
Bases: ModelOutput
Contrastive SigLIP output bundling text/vision logits and embeddings.
- classmethod from_dict(data: dict[str, Any]) T#
Deserializes a dictionary into a PyTree object.
- classmethod from_json(json_str: str) T#
Deserializes a JSON string into a PyTree object.
- replace(**kwargs)#
Creates a new instance with specified fields replaced.
- text_model_output: BaseModelOutputWithPooling = None#
- to_dict() dict[str, Any]#
Serializes the PyTree object to a dictionary.
- to_json(**kwargs) str#
Serializes the PyTree object to a JSON string.
- to_tuple() tuple[Any][source]#
Convert self to a tuple containing all the attributes/keys that are not None.
- vision_model_output: BaseModelOutputWithPooling = None#
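The helper methods listed for the output classes (`to_tuple`, `to_dict`, `to_json`, `from_dict`, `from_json`, `replace`) follow a familiar dataclass-like pattern. The sketch below mimics that behavior with a standard-library dataclass; `ToyOutput` is a hypothetical stand-in, not `SiglipOutput` itself, and the real classes are PyTree objects whose internals may differ.

```python
import dataclasses
import json
from dataclasses import dataclass
from typing import Any, Dict, Optional, Tuple


@dataclass(frozen=True)
class ToyOutput:
    """Hypothetical stand-in with a few of the SiglipOutput field names."""

    loss: Optional[float] = None
    logits_per_image: Optional[list] = None
    logits_per_text: Optional[list] = None

    def to_tuple(self) -> Tuple[Any, ...]:
        """Return the field values that are not None, in declaration order."""
        values = (getattr(self, f.name) for f in dataclasses.fields(self))
        return tuple(v for v in values if v is not None)

    def to_dict(self) -> Dict[str, Any]:
        return dataclasses.asdict(self)

    def to_json(self, **kwargs: Any) -> str:
        return json.dumps(self.to_dict(), **kwargs)

    @classmethod
    def from_dict(cls, data: Dict[str, Any]) -> "ToyOutput":
        return cls(**data)

    @classmethod
    def from_json(cls, json_str: str) -> "ToyOutput":
        return cls.from_dict(json.loads(json_str))

    def replace(self, **kwargs: Any) -> "ToyOutput":
        """Return a new instance with the given fields replaced; the original is untouched."""
        return dataclasses.replace(self, **kwargs)
```

Because `to_tuple` skips `None` fields, the position of a value in the tuple depends on which optional fields were populated, so accessing fields by name is usually the safer choice.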
- class easydel.modules.siglip.modeling_siglip.SiglipTextEmbeddings(*args: Any, **kwargs: Any)[source]#
Bases: Module
Token and position embeddings for the SigLIP text encoder.
- class easydel.modules.siglip.modeling_siglip.SiglipTextModel(*args: Any, **kwargs: Any)[source]#
Bases: EasyDeLBaseModule
Public text-only SigLIP model wrapper exposing the transformer backbone.
- class easydel.modules.siglip.modeling_siglip.SiglipTextModelOutput(text_embeds: Optional[Union[Array, ndarray, bool, number]] = None, last_hidden_state: Union[Array, ndarray, bool, number] = None, hidden_states: tuple[Union[jax.Array, numpy.ndarray, numpy.bool, numpy.number], ...] | None = None, attentions: tuple[Union[jax.Array, numpy.ndarray, numpy.bool, numpy.number], ...] | None = None)[source]#
Bases: ModelOutput
Outputs from the SigLIP text encoder with optional attentions.
- attentions: tuple[Union[jax.Array, numpy.ndarray, numpy.bool, numpy.number], ...] | None = None#
- classmethod from_dict(data: dict[str, Any]) T#
Deserializes a dictionary into a PyTree object.
- classmethod from_json(json_str: str) T#
Deserializes a JSON string into a PyTree object.
- replace(**kwargs)#
Creates a new instance with specified fields replaced.
- to_dict() dict[str, Any]#
Serializes the PyTree object to a dictionary.
- to_json(**kwargs) str#
Serializes the PyTree object to a JSON string.
- class easydel.modules.siglip.modeling_siglip.SiglipTextTransformer(*args: Any, **kwargs: Any)[source]#
Bases: EasyDeLBaseModule
Text-side transformer backbone providing embeddings, encoder, and projection head.
- class easydel.modules.siglip.modeling_siglip.SiglipVisionEmbeddings(*args: Any, **kwargs: Any)[source]#
Bases: Module
Patch projection and positional encoding for the SigLIP vision encoder.
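As a rough illustration of what "patch projection" operates on, the following sketch splits a square, single-channel image into non-overlapping patches and counts them. It assumes the image side is an exact multiple of the patch size and that each patch would then receive one position embedding; `num_patches` and `patchify` are hypothetical helpers, and the learned linear projection of each patch is left out.

```python
from typing import List


def num_patches(image_size: int, patch_size: int) -> int:
    """Number of non-overlapping square patches in a square image."""
    if image_size % patch_size != 0:
        # Assumption of this sketch only: sizes must divide evenly.
        raise ValueError("image_size must be a multiple of patch_size")
    return (image_size // patch_size) ** 2


def patchify(image: List[List[float]], patch_size: int) -> List[List[float]]:
    """Flatten each patch_size x patch_size tile into one vector, scanning row-major."""
    side = len(image)
    patches = []
    for top in range(0, side, patch_size):
        for left in range(0, side, patch_size):
            patches.append(
                [
                    image[r][c]
                    for r in range(top, top + patch_size)
                    for c in range(left, left + patch_size)
                ]
            )
    return patches
```

For example, a 224-pixel image with 16-pixel patches gives a 14 by 14 grid, which is the sequence length the encoder would then process under these assumptions.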
- class easydel.modules.siglip.modeling_siglip.SiglipVisionModel(*args: Any, **kwargs: Any)[source]#
Bases: Module
Convenience wrapper around the SigLIP vision transformer backbone.
- class easydel.modules.siglip.modeling_siglip.SiglipVisionModelOutput(image_embeds: Optional[Union[Array, ndarray, bool, number]] = None, last_hidden_state: Union[Array, ndarray, bool, number] = None, hidden_states: tuple[Union[jax.Array, numpy.ndarray, numpy.bool, numpy.number], ...] | None = None, attentions: tuple[Union[jax.Array, numpy.ndarray, numpy.bool, numpy.number], ...] | None = None)[source]#
Bases: ModelOutput
Outputs from the SigLIP vision tower including pooled embeddings.
- attentions: tuple[Union[jax.Array, numpy.ndarray, numpy.bool, numpy.number], ...] | None = None#
- classmethod from_dict(data: dict[str, Any]) T#
Deserializes a dictionary into a PyTree object.
- classmethod from_json(json_str: str) T#
Deserializes a JSON string into a PyTree object.
- replace(**kwargs)#
Creates a new instance with specified fields replaced.
- to_dict() dict[str, Any]#
Serializes the PyTree object to a dictionary.
- to_json(**kwargs) str#
Serializes the PyTree object to a JSON string.
- class easydel.modules.siglip.modeling_siglip.SiglipVisionTransformer(*args: Any, **kwargs: Any)[source]#
Bases: EasyDeLBaseModule
Vision-side transformer encoder used by SigLIP for patch representations.