easydel.layers.quantization.linear_nf4

class easydel.layers.quantization.linear_nf4.LinearNF4(*args: Any, **kwargs: Any)

Bases: QauntModule

A 4-bit quantized version of the linear transformation using NF4 quantization.
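As a rough illustration of what NF4 quantization means, the sketch below quantizes a single block of floats in plain Python. It assumes the usual scheme: scale the block by its absolute maximum, then map each value to the nearest of the 16 NF4 levels published with QLoRA and bitsandbytes. The helper names `quantize_block` and `dequantize_block` are hypothetical and are not part of easydel, whose implementation may differ in detail.

```python
# The 16 NF4 levels as published with QLoRA / bitsandbytes.
NF4_LEVELS = [
    -1.0, -0.6961928009986877, -0.5250730514526367, -0.39491748809814453,
    -0.28444138169288635, -0.18477343022823334, -0.09105003625154495, 0.0,
    0.07958029955625534, 0.16093020141124725, 0.24611230194568634,
    0.33791524171829224, 0.44070982933044434, 0.5626170039176941,
    0.7229568362236023, 1.0,
]


def quantize_block(block):
    """Return (codes, absmax) for one block of floats (hypothetical helper)."""
    # Scale so the largest magnitude maps to +/-1; fall back to 1.0 for an all-zero block.
    absmax = max(abs(x) for x in block) or 1.0
    # Each code is the index of the NF4 level nearest to the normalized value.
    codes = [
        min(range(16), key=lambda i: abs(NF4_LEVELS[i] - x / absmax))
        for x in block
    ]
    return codes, absmax


def dequantize_block(codes, absmax):
    """Map 4-bit codes back to approximate floats (hypothetical helper)."""
    return [NF4_LEVELS[c] * absmax for c in codes]


codes, absmax = quantize_block([0.5, -1.0, 0.0, 2.0])
restored = dequantize_block(codes, absmax)
```

Each weight is stored as a 4-bit index and each block keeps one scale, so the reconstruction is approximate except for values that land exactly on a level.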

classmethod from_linear(linear: Linear, rngs: Optional[Rngs] = None, block_size: int = 128, **kwargs) → LinearNF4
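To give a feel for the `block_size` trade-off, the following back-of-the-envelope sketch estimates storage for a quantized kernel. It assumes two 4-bit codes per byte and one scale per block stored in 4 bytes; the scale width and the function names are assumptions for illustration, not something this page states about easydel.

```python
import math


def nf4_storage_bytes(num_weights, block_size=128, scale_bytes=4):
    """Estimate bytes for packed codes plus per-block scales (hypothetical helper)."""
    packed = (num_weights + 1) // 2              # two 4-bit codes per byte
    num_blocks = math.ceil(num_weights / block_size)
    return packed + num_blocks * scale_bytes     # one scale per block (assumed width)


def nf4_bits_per_weight(block_size=128, scale_bits=32):
    """Effective bits per weight: 4 for the code plus the amortized scale."""
    return 4 + scale_bits / block_size


total = nf4_storage_bytes(4096 * 4096, block_size=128)
```

Under these assumptions a smaller block size tracks local weight ranges more closely but spends more bits on scales: 4.25 bits per weight at a block size of 128 versus 4.5 at 64.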

get_kernel(): Get the dequantized quant_kernel weights.

get_quantized_kernel(): Get the quantized quant_kernel weights and quant_scales.

static metadata()

static quantization_mapping()

to_linear(rngs: Optional[Rngs] = None) → Linear

easydel.layers.quantization.linear_nf4.dequantize_nf4(packed_values, absmax, block_size)

easydel.layers.quantization.linear_nf4.quantize_and_pack_nf4(blocks, block_size=64)
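The "pack" half of these functions refers to storing two 4-bit codes in one byte. A minimal sketch of that step follows, in plain Python. The nibble order (first code in the high nibble) and the names `pack_nibbles` and `unpack_nibbles` are assumptions for illustration; this page does not document the layout easydel actually uses.

```python
def pack_nibbles(codes):
    """Pack 4-bit codes (ints in 0..15) two per byte, high nibble first."""
    if len(codes) % 2:
        raise ValueError("need an even number of codes")
    if any(not 0 <= c <= 15 for c in codes):
        raise ValueError("codes must fit in 4 bits")
    return bytes((hi << 4) | lo for hi, lo in zip(codes[0::2], codes[1::2]))


def unpack_nibbles(packed):
    """Inverse of pack_nibbles: split each byte back into two 4-bit codes."""
    codes = []
    for byte in packed:
        codes.append(byte >> 4)     # high nibble holds the first code
        codes.append(byte & 0x0F)   # low nibble holds the second code
    return codes


packed = pack_nibbles([10, 2, 7, 15])
unpacked = unpack_nibbles(packed)
```

Dequantization would then look up each unpacked code in the NF4 level table and multiply by the `absmax` of the block it belongs to.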

easydel.layers.quantization.linear_nf4.single_dequantize_nf4(packed_values, absmax, block_size): Optimized dequantization combining unpacking and scaling in fewer operations.

easydel.layers.quantization.linear_nf4.single_quantize_and_pack_nf4(blocks, block_size=64): Combined quantization and packing for better performance. Handles normalization, quantization, and packing in a single operation.