easydel.layers.quantization.linear_nf4

class easydel.layers.quantization.linear_nf4.LinearNF4(*args: Any, **kwargs: Any)

Bases: QauntModule

A 4-bit quantized version of the linear layer, using NF4 (4-bit NormalFloat) quantization.

classmethod from_linear(linear: Linear, rngs: Optional[Rngs] = None, block_size: int = 128, **kwargs) -> LinearNF4

get_kernel()

Get the dequantized quant_kernel weights.

get_quantized_kernel()

Get the quantized quant_kernel weights and quant_scales.

static metadata()

static quantization_mapping()

to_linear(rngs: Optional[Rngs] = None) -> Linear

easydel.layers.quantization.linear_nf4.dequantize_nf4(packed_values, absmax, block_size)

easydel.layers.quantization.linear_nf4.quantize_and_pack_nf4(blocks, block_size=64)

easydel.layers.quantization.linear_nf4.single_dequantize_nf4(packed_values, absmax, block_size)

Optimized dequantization combining unpacking and scaling in fewer operations.
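
To illustrate what "unpacking and scaling" can look like, here is a minimal NumPy sketch. It is not EasyDeL's implementation: the helper name `dequantize_nf4_sketch` is hypothetical, the nibble order (first code in the high 4 bits) is an assumption, and the codebook is the set of NF4 levels published with QLoRA. The same array operations are available in `jax.numpy`.

```python
import numpy as np

# NF4 codebook: 16 levels in [-1, 1], as published with QLoRA.
NF4_LEVELS = np.array([
    -1.0, -0.6961928009986877, -0.5250730514526367, -0.39491748809814453,
    -0.28444138169288635, -0.18477343022823334, -0.09105003625154495, 0.0,
    0.07958029955625534, 0.16093020141124725, 0.24611230194568634,
    0.33791524171829224, 0.44070982933044434, 0.5626170039176941,
    0.7229568362236023, 1.0,
], dtype=np.float32)


def dequantize_nf4_sketch(packed_values, absmax, block_size):
    """Hypothetical helper: unpack two 4-bit codes per byte, then rescale."""
    packed = np.asarray(packed_values, dtype=np.uint8).reshape(-1, block_size // 2)
    # Assumed layout: first code in the high nibble, second in the low nibble.
    high = packed >> 4
    low = packed & 0x0F
    # Interleave the nibbles back into one row of codes per block.
    codes = np.stack([high, low], axis=-1).reshape(-1, block_size)
    # Look up each code in the codebook and scale by the block's absmax.
    scales = np.asarray(absmax, dtype=np.float32)[:, None]
    return NF4_LEVELS[codes] * scales
```

For example, the bytes `0x0F` and `0x7F` with `absmax = 2.0` and `block_size = 4` unpack to codes 0, 15, 7, 15, which dequantize to -2, 2, 0, 2.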

easydel.layers.quantization.linear_nf4.single_quantize_and_pack_nf4(blocks, block_size=64)

Combined quantization and packing for better performance. Handles normalization, quantization, and packing in a single operation.
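
The three steps named above (normalization, quantization, packing) can be sketched in NumPy as follows. This is an illustration under stated assumptions, not the library's code: `quantize_and_pack_nf4_sketch` is a hypothetical name, scaling is assumed to be per-block absolute maximum, codes are assumed to be packed two per byte with the first code in the high nibble, and the input size is assumed to be a multiple of an even `block_size`.

```python
import numpy as np

# NF4 codebook: 16 levels in [-1, 1], as published with QLoRA.
NF4_LEVELS = np.array([
    -1.0, -0.6961928009986877, -0.5250730514526367, -0.39491748809814453,
    -0.28444138169288635, -0.18477343022823334, -0.09105003625154495, 0.0,
    0.07958029955625534, 0.16093020141124725, 0.24611230194568634,
    0.33791524171829224, 0.44070982933044434, 0.5626170039176941,
    0.7229568362236023, 1.0,
], dtype=np.float32)


def quantize_and_pack_nf4_sketch(weights, block_size=64):
    """Hypothetical helper: returns (packed uint8 codes, per-block absmax)."""
    blocks = np.asarray(weights, dtype=np.float32).reshape(-1, block_size)
    # Normalization: scale each block into [-1, 1] by its absolute maximum.
    absmax = np.max(np.abs(blocks), axis=1)
    safe = np.where(absmax == 0, 1.0, absmax)  # avoid dividing by zero
    normalized = blocks / safe[:, None]
    # Quantization: pick the index of the nearest NF4 level for each value.
    distances = np.abs(normalized[..., None] - NF4_LEVELS)
    codes = np.argmin(distances, axis=-1).astype(np.uint8)
    # Packing: two 4-bit codes per byte, first code in the high nibble.
    packed = (codes[:, 0::2] << 4) | codes[:, 1::2]
    return packed, absmax
```

Each block of `block_size` values is stored as `block_size // 2` bytes plus one scale, so 128 weights with `block_size=64` yield a packed array of shape `(2, 32)` and two `absmax` entries.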