easydel.layers.quantization.quantizers
- class easydel.layers.quantization.quantizers.EasyDeLQuantizationConfig(dtype: eformer.ops.quantization._config.QuantizationType | str = QuantizationType.NF4, block_size: int = 64, simulate: bool = False, use_kernel: bool = True, pattern: str = '^(?!.*(?:embedding|norm)).*$')
Bases: QuantizationConfig
Extended quantization config with pattern support for layer selection.
This config extends eformer’s QuantizationConfig with an additional pattern field for selecting which layers to quantize.
- dtype
The quantization type (NF4, INT8, TERNARY, BINARY).
- Type
eformer.ops.quantization._config.QuantizationType | str
- block_size
Block size for block-wise quantization.
- Type
int
- simulate
If True, applies a straight-through estimator (STE) without actual bit packing, i.e. quantization-aware training (QAT) mode.
- Type
bool
- use_kernel
If True, uses optimized TPU/GPU kernels when available.
- Type
bool
- pattern
Regex pattern for selecting layers to quantize. Default excludes embedding and norm layers.
- Type
str
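To make the block_size attribute concrete, here is a minimal pure-Python sketch of block-wise quantization, assuming a simple absmax int8 scheme. It is not eformer's implementation (NF4, for instance, uses a non-uniform codebook), and the helper names `quantize_blockwise` and `dequantize_blockwise` are hypothetical. The point is only that each block of block_size values gets its own scale, so an outlier in one block does not degrade the precision of the others.

```python
def quantize_blockwise(values, block_size=64):
    """Absmax int8 quantization applied independently to each block (illustrative only)."""
    codes, scales = [], []
    for start in range(0, len(values), block_size):
        block = values[start:start + block_size]
        absmax = max(abs(v) for v in block)
        # Avoid dividing by zero when a block is all zeros.
        scale = absmax / 127.0 if absmax > 0 else 1.0
        scales.append(scale)
        codes.extend(round(v / scale) for v in block)
    return codes, scales


def dequantize_blockwise(codes, scales, block_size=64):
    """Rebuild approximate floats from integer codes and per-block scales."""
    return [c * scales[i // block_size] for i, c in enumerate(codes)]


# 128 values with block_size=64 gives two blocks, each with its own scale.
weights = [0.01 * i for i in range(-64, 64)]
codes, scales = quantize_blockwise(weights, block_size=64)
restored = dequantize_blockwise(codes, scales, block_size=64)
max_error = max(abs(a - b) for a, b in zip(weights, restored))
```

A smaller block_size stores more scales (more overhead) but tracks local magnitude more closely; a larger one does the opposite.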
Example
>>> from easydel.layers.quantization import EasyDeLQuantizationConfig, QuantizationType
>>> config = EasyDeLQuantizationConfig(
...     dtype=QuantizationType.NF4,
...     block_size=64,
...     pattern=r".*proj.*"  # Only quantize projection layers
... )
- pattern: str = '^(?!.*(?:embedding|norm)).*$'
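The default pattern relies on a negative lookahead, which can be hard to read at a glance. The sketch below applies it, and the `.*proj.*` pattern from the example, to a handful of layer paths using only the standard `re` module. The layer names are invented for illustration, and the use of `re.match` is an assumption; the way EasyDeL actually applies the pattern to module paths may differ.

```python
import re

DEFAULT_PATTERN = r"^(?!.*(?:embedding|norm)).*$"
PROJ_ONLY_PATTERN = r".*proj.*"

# Hypothetical layer paths, invented for this illustration.
layer_names = [
    "model/embedding",
    "model/layers/0/self_attn/q_proj",
    "model/layers/0/mlp/down_proj",
    "model/layers/0/input_layernorm",
    "lm_head",
]


def select(pattern, names):
    """Return the names that the pattern matches from the start of the string."""
    compiled = re.compile(pattern)
    return [name for name in names if compiled.match(name)]


# The lookahead rejects any name containing "embedding" or "norm".
default_selected = select(DEFAULT_PATTERN, layer_names)
# Keeps only names containing "proj".
proj_selected = select(PROJ_ONLY_PATTERN, layer_names)
```

Under these assumptions the default pattern keeps the two projection layers and `lm_head`, while `.*proj.*` keeps only the projection layers. The match is case-sensitive, so a name such as `RMSNorm` would not be excluded by the default pattern.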
- class easydel.layers.quantization.quantizers.EasyQuantizer(quantization_config: easydel.layers.quantization.quantizers.EasyDeLQuantizationConfig | eformer.ops.quantization._config.QuantizationConfig | None = None)
Bases: object
Unified quantization interface for EasyDeL models.
Uses eformer’s quantization infrastructure for efficient quantization. Supports NF4 (4-bit) and INT8 (8-bit) quantization methods.
- Parameters
quantization_config – The quantization configuration. Pass None to disable quantization.
Example
>>> from easydel.layers.quantization import EasyQuantizer, EasyDeLQuantizationConfig, QuantizationType
>>> config = EasyDeLQuantizationConfig(dtype=QuantizationType.NF4, block_size=64)
>>> quantizer = EasyQuantizer(quantization_config=config)
>>> quantized_model = quantizer.quantize_linears(model)
- property config: easydel.layers.quantization.quantizers.EasyDeLQuantizationConfig | eformer.ops.quantization._config.QuantizationConfig | None
Get the quantization configuration.
- property pattern: str
Get the quantization pattern.
- quantize_array(array: Array, simulate: bool = False) -> Array
Quantize an array using eformer’s unified quantize function.
- Parameters
array – The array to quantize.
simulate – If True, uses STE simulation mode (materializes immediately).
- Returns
Quantized array.
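As a rough illustration of what a simulated (STE) mode generally means, the following pure-Python sketch snaps values to an int8 grid but returns ordinary floats, and pairs that forward pass with a backward rule that passes gradients through unchanged. The functions `fake_quantize` and `ste_backward` are hypothetical names, the absmax int8 grid is an assumption, and none of this reflects the internals of quantize_array.

```python
def fake_quantize(values, levels=127):
    """Forward pass: snap each value to a uniform grid, returning plain floats."""
    # Fall back to 1.0 when every value is zero, to avoid a zero scale.
    absmax = max(abs(v) for v in values) or 1.0
    scale = absmax / levels
    return [round(v / scale) * scale for v in values]


def ste_backward(upstream_grads):
    """Backward pass under STE: treat rounding as the identity function."""
    return list(upstream_grads)


x = [0.30, -0.71, 1.27, 0.004]
y = fake_quantize(x)                         # same length, float values on the grid
grads = ste_backward([1.0, 1.0, 1.0, 1.0])   # gradients pass through unchanged
```

Rounding has zero gradient almost everywhere, so without the pass-through rule no learning signal would reach the underlying weights during quantization-aware training.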
- quantize_linears(model: Module, /, *, quantization_pattern: str | None = None, verbose: bool = True) -> Module
Quantize linear layers in a model to the configured precision.
- Parameters
model – The model to quantize.
quantization_pattern – Regex pattern for layers to be quantized. Overrides the pattern in config if provided.
verbose – Whether to use tqdm for progress logging.
- Returns
Model with quantized linear layers.
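The selection logic described for quantize_linears can be sketched on a toy model: walk the module tree, keep only linear layers, and use quantization_pattern in place of the config pattern when it is given. Everything below is a hypothetical stand-in written with the standard library; the nested dict, the `"linear"` tags, and the helpers `iter_paths` and `select_for_quantization` are assumptions for illustration, not EasyDeL code.

```python
import re


def iter_paths(tree, prefix=""):
    """Yield (path, kind) pairs for every leaf of a nested dict."""
    for name, child in tree.items():
        path = f"{prefix}/{name}" if prefix else name
        if isinstance(child, dict):
            yield from iter_paths(child, path)
        else:
            yield path, child


def select_for_quantization(tree, config_pattern, quantization_pattern=None):
    """Pick linear layers whose path matches the effective pattern."""
    # An explicit quantization_pattern takes precedence over the config pattern.
    pattern = config_pattern if quantization_pattern is None else quantization_pattern
    compiled = re.compile(pattern)
    return [
        path
        for path, kind in iter_paths(tree)
        if kind == "linear" and compiled.match(path)
    ]


# Toy model: leaf values are tags standing in for module types.
toy_model = {
    "embedding": "embed",
    "layers": {"0": {"q_proj": "linear", "up_proj": "linear", "norm": "norm"}},
    "lm_head": "linear",
}

config_pattern = r"^(?!.*(?:embedding|norm)).*$"
default_targets = select_for_quantization(toy_model, config_pattern)
override_targets = select_for_quantization(
    toy_model, config_pattern, quantization_pattern=r".*q_proj.*"
)
```

With the config pattern alone, all three linear layers are selected; passing `quantization_pattern=r".*q_proj.*"` narrows the selection to `layers/0/q_proj` without touching the config.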