easydel.layers.quantization.quantizers

class easydel.layers.quantization.quantizers.EasyDeLQuantizationConfig(dtype: eformer.ops.quantization._config.QuantizationType | str = QuantizationType.NF4, block_size: int = 64, simulate: bool = False, use_kernel: bool = True, pattern: str = '^(?!.*(?:embedding|norm)).*$')

Bases: QuantizationConfig

Extended quantization config with pattern support for layer selection.

This config adds one field to eformer’s QuantizationConfig: pattern, a regex that selects which layers to quantize.

dtype

The quantization type (NF4, INT8, TERNARY, BINARY).

Type

eformer.ops.quantization._config.QuantizationType | str

block_size

Block size for block-wise quantization.

Type

int
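
The idea behind block-wise quantization can be sketched in plain Python. This is a minimal illustration, not eformer's implementation: it assumes uniform absmax scaling to signed 8-bit codes (NF4 uses a non-uniform codebook instead), and the function names quantize_blockwise and dequantize_blockwise are hypothetical.

```python
def quantize_blockwise(values, block_size=64, bits=8):
    """Absmax-quantize a flat list of floats, storing one scale per block."""
    qmax = 2 ** (bits - 1) - 1  # 127 for signed 8-bit codes
    codes, scales = [], []
    for start in range(0, len(values), block_size):
        block = values[start:start + block_size]
        # Scale maps the largest magnitude in the block to qmax;
        # fall back to 1.0 for an all-zero block to avoid dividing by zero.
        scale = max(abs(v) for v in block) / qmax or 1.0
        scales.append(scale)
        codes.extend(round(v / scale) for v in block)
    return codes, scales


def dequantize_blockwise(codes, scales, block_size=64):
    """Invert quantize_blockwise up to rounding error."""
    return [c * scales[i // block_size] for i, c in enumerate(codes)]


# Two blocks with very different magnitudes each get their own scale,
# so the small values in the first block are not rounded to zero.
weights = [0.01, -0.02, 0.03, -0.04, 10.0, -20.0, 30.0, -40.0]
codes, scales = quantize_blockwise(weights, block_size=4)
restored = dequantize_blockwise(codes, scales, block_size=4)
```

A smaller block_size stores more scales (more overhead) but limits how far one outlier can degrade the precision of its neighbours.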

simulate

If True, uses a straight-through estimator (STE) without actual bit packing. This is the mode intended for quantization-aware training (QAT).

Type

bool
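
To make "bit packing" concrete, here is a small sketch of storing two 4-bit codes per byte. The layout (low nibble first) and the helper names pack_nibbles and unpack_nibbles are assumptions for illustration only and do not describe eformer's storage format. Per the description above, simulate mode skips a step of this kind.

```python
def pack_nibbles(codes):
    """Pack unsigned 4-bit codes (0..15) two per byte, low nibble first."""
    codes = list(codes)
    if len(codes) % 2:
        codes.append(0)  # pad to an even count
    return bytes(lo | (hi << 4) for lo, hi in zip(codes[0::2], codes[1::2]))


def unpack_nibbles(packed, count):
    """Recover `count` 4-bit codes from bytes produced by pack_nibbles."""
    codes = []
    for byte in packed:
        codes.append(byte & 0x0F)  # low nibble
        codes.append(byte >> 4)    # high nibble
    return codes[:count]


codes = [1, 15, 7, 0, 9]
packed = pack_nibbles(codes)             # 3 bytes instead of 5 separate values
roundtrip = unpack_nibbles(packed, len(codes))
```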

use_kernel

If True, uses optimized TPU/GPU kernels when available.

Type

bool

pattern

Regex pattern for selecting layers to quantize. The default matches every layer whose name does not contain "embedding" or "norm".

Type

str
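
The effect of the default pattern can be checked with Python's re module alone. The layer names below are hypothetical, and the use of re.match is an assumption about how names are tested; for these single-line names re.match, re.search, and re.fullmatch select the same layers.

```python
import re

DEFAULT_PATTERN = r"^(?!.*(?:embedding|norm)).*$"

# Hypothetical layer paths, for illustration only.
layer_names = [
    "model/embedding",
    "model/layers/0/self_attn/q_proj",
    "model/layers/0/input_layernorm",
    "model/layers/0/mlp/down_proj",
    "lm_head",
]


def select(pattern, names):
    """Return the names that the regex matches."""
    regex = re.compile(pattern)
    return [name for name in names if regex.match(name)]


# The negative lookahead rejects any name containing "embedding" or "norm".
default_selected = select(DEFAULT_PATTERN, layer_names)

# A narrower pattern keeps only names containing "proj".
proj_selected = select(r".*proj.*", layer_names)
```

Because the default is a plain substring test, a name such as "embed_tokens" (which does not contain "embedding") would still be selected; a custom pattern is needed to exclude it.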

Example

>>> from easydel.layers.quantization import EasyDeLQuantizationConfig, QuantizationType
>>> config = EasyDeLQuantizationConfig(
...     dtype=QuantizationType.NF4,
...     block_size=64,
...     pattern=r".*proj.*"  # Only quantize projection layers
... )

pattern: str = '^(?!.*(?:embedding|norm)).*$'

class easydel.layers.quantization.quantizers.EasyQuantizer(quantization_config: easydel.layers.quantization.quantizers.EasyDeLQuantizationConfig | eformer.ops.quantization._config.QuantizationConfig | None = None)

Bases: object

Unified quantization interface for EasyDeL models.

Uses eformer’s quantization infrastructure for efficient quantization. Supports NF4 (4-bit) and INT8 (8-bit) quantization methods.

Parameters

quantization_config – The quantization configuration. Pass None to disable quantization.

Example

>>> from easydel.layers.quantization import EasyQuantizer, EasyDeLQuantizationConfig, QuantizationType
>>> config = EasyDeLQuantizationConfig(dtype=QuantizationType.NF4, block_size=64)
>>> quantizer = EasyQuantizer(quantization_config=config)
>>> quantized_model = quantizer.quantize_linears(model)

property config: easydel.layers.quantization.quantizers.EasyDeLQuantizationConfig | eformer.ops.quantization._config.QuantizationConfig | None

Get the quantization configuration.

property pattern: str

Get the quantization pattern.

quantize_array(array: Array, simulate: bool = False) → Array

Quantize an array using eformer’s unified quantize function.

Parameters
  • array – The array to quantize.

  • simulate – If True, uses STE simulation mode (materializes immediately).

Returns

Quantized array.

quantize_linears(model: Module, /, *, quantization_pattern: str | None = None, verbose: bool = True) → Module

Quantize linear layers in a model to the configured precision.

Parameters
  • model – The model to quantize.

  • quantization_pattern – Regex pattern for layers to be quantized. Overrides the pattern in config if provided.

  • verbose – Whether to use tqdm for progress logging.

Returns

Model with quantized linear layers.
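
The override behaviour of quantization_pattern can be sketched as follows. This is a hypothetical stand-in rather than EasyDeL's code: plan_quantization, the layer names, and the use of re.match are all assumptions, and the sketch only decides which names would be quantized without touching any weights.

```python
import re


def plan_quantization(layer_names, config_pattern, quantization_pattern=None):
    """Map each layer name to True (quantize) or False (leave unchanged)."""
    # A per-call pattern, when given, takes precedence over the config's pattern.
    pattern = config_pattern if quantization_pattern is None else quantization_pattern
    regex = re.compile(pattern)
    return {name: bool(regex.match(name)) for name in layer_names}


names = ["attn/q_proj", "mlp/gate", "final_norm"]  # hypothetical layer names
config_pattern = r"^(?!.*(?:embedding|norm)).*$"

default_plan = plan_quantization(names, config_pattern)
override_plan = plan_quantization(names, config_pattern, quantization_pattern=r".*proj.*")
```

With the config pattern, everything except final_norm is selected; with the override, only attn/q_proj is.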