easydel.infra.elarge_model.elarge_model


eLargeModel - Easy Large Models master class for EasyDeL.

This module provides a unified interface for working with large language models in the EasyDeL framework, combining configuration management, model building, training orchestration, and inference engine initialization.

Key Features:
  • Unified configuration management for models, training, and inference

  • Automatic model and tokenizer initialization from HuggingFace or local paths

  • Support for multiple training paradigms (SFT, DPO, ORPO, GRPO, distillation)

  • Integration with the eSurge inference engine

  • Built-in evaluation with lm-evaluation-harness

  • Flexible dataset mixture configuration

  • Model sharding and quantization support

class easydel.infra.elarge_model.elarge_model.BuildTrainerKws[source]#

Bases: TypedDict

Type hints for optional keyword arguments when building trainers.

data_collator#

Custom data collator for batching examples

Type

Callable

formatting_func#

Function to format examples for SFT training

Type

Callable

reward_processing_classes#

Processing classes for reward models in GRPO

Type

list[Callable]

data_tokenize_fn#

Custom tokenization function for data preprocessing

Type

Callable

reference_model#

Reference model for DPO/preference optimization

Type

easydel.infra.base_module.EasyDeLBaseModule | None

reward_model#

Reward model for GRPO training

Type

easydel.infra.base_module.EasyDeLBaseModule | None

teacher_model#

Teacher model for distillation training

Type

easydel.infra.base_module.EasyDeLBaseModule | None

reward_funcs#

Custom reward functions for GRPO

Type

Any | None

data_collator: Callable#
data_tokenize_fn: Callable#
formatting_func: Callable#
reference_model: easydel.infra.base_module.EasyDeLBaseModule | None#
reward_funcs: Any | None#
reward_model: easydel.infra.base_module.EasyDeLBaseModule | None#
reward_processing_classes: list[Callable]#
teacher_model: easydel.infra.base_module.EasyDeLBaseModule | None#
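
The optional-keyword pattern described above can be sketched with a plain `TypedDict` declared with `total=False`. This is a minimal illustration, not the EasyDeL implementation: `SketchTrainerKws` and `collect_build_kwargs` are hypothetical names, and only three of the documented keys are reproduced. A `TypedDict` is not enforced at runtime, so the sketch checks key names by hand.

```python
from typing import Any, Callable, TypedDict


class SketchTrainerKws(TypedDict, total=False):
    """Hypothetical stand-in for BuildTrainerKws; every key is optional."""

    data_collator: Callable
    formatting_func: Callable
    reward_funcs: Any


def collect_build_kwargs(**build_kwargs: Any) -> SketchTrainerKws:
    """Reject keys the TypedDict does not declare, then return the rest."""
    allowed = set(SketchTrainerKws.__annotations__)
    unknown = set(build_kwargs) - allowed
    if unknown:
        raise TypeError(f"unexpected trainer kwargs: {sorted(unknown)}")
    return SketchTrainerKws(**build_kwargs)


# Only the keys that were actually passed end up in the dictionary.
kws = collect_build_kwargs(formatting_func=lambda example: example["text"])
print(sorted(kws))
```

Because the keys are optional, callers pass only what their training paradigm needs, and a static type checker can still flag misspelled names.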
class easydel.infra.elarge_model.elarge_model.eLargeModel(config: easydel.infra.elarge_model.types.ELMConfig | collections.abc.Mapping[str, Any] | str | os.PathLike | eformer.paths.GCSPath | eformer.paths.LocalPath | eformer.paths.MLUtilPath | None = None)[source]#

Bases: object

Master class for Easy Large Models (ELM) in EasyDeL.

This class provides a unified interface for:
  • Configuration management (load, save, create)

  • Model building and initialization (including teacher/reference models)

  • Training orchestration with multiple paradigms (SFT, DPO, ORPO, etc.)

  • eSurge inference engine integration

  • Tokenizer management

  • Dataset mixture configuration

  • Model evaluation with lm-evaluation-harness

config#

The normalized ELM configuration dictionary

model_name#

The model name or path from configuration

task#

The resolved task type (auto-detected or specified)

teacher_model_name#

Teacher model name for distillation (if configured)

reference_model_name#

Reference model name for DPO/ORPO (if configured)

Example

Basic model loading:

>>> elm = eLargeModel({"model": {"name_or_path": "meta-llama/Llama-2-7b"}})
>>> model = elm.build_model()

From pretrained with configuration:

>>> elm = eLargeModel.from_pretrained(
...     "meta-llama/Llama-2-7b",
...     task="causal-lm"
... )
>>> elm.set_dtype("bf16")
>>> elm.set_sharding(axis_dims=(1, 2, 1, -1))

Loading from JSON configuration:

>>> elm = eLargeModel.from_json("config.json")
>>> esurge_engine = elm.build_esurge()

Training with SFT:

>>> elm.set_trainer("sft", learning_rate=2e-5, num_train_epochs=3)
>>> elm.add_dataset("train.json", dataset_type="json", content_field="text")
>>> results = elm.train()

Evaluation:

>>> results = elm.eval(["hellaswag", "mmlu"], engine="esurge")

add_dataset(data_files: str | list[str], dataset_type: str | None = None, content_field: str = 'content', split: str = 'train', **kwargs) eLargeModel[source]#

Add a dataset to the mixture configuration.

Appends a new dataset to the existing mixture. Multiple datasets can be added and will be combined during training.

Parameters
  • data_files – Path(s) to data files. Can be: - Single file: “data.json” - Multiple files: [“data1.json”, “data2.json”] - Glob pattern: “data/*.parquet” - Remote URL: “https://example.com/data.json”

  • dataset_type – Dataset type or format. Options: - File formats: “json”, “jsonl”, “parquet”, “csv”, “text” - HuggingFace dataset ID: “imdb”, “squad”, etc. - None: Auto-detect from file extension

  • content_field – Field name containing the text content (default: “content”). For chat data, might be “messages” or “conversations”.

  • split – Dataset split to use (default: “train”). Common values: “train”, “validation”, “test”.

  • **kwargs – Additional dataset options: - weight: Sampling weight for this dataset - max_samples: Maximum samples to use - filter_fn: Function to filter samples - map_fn: Function to transform samples

Returns

Self for method chaining

Example

>>> # Add a JSON dataset
>>> elm.add_dataset("train.json", dataset_type="json", content_field="text")
>>>
>>> # Add a HuggingFace dataset
>>> elm.add_dataset("imdb", dataset_type="imdb", split="train")
>>>
>>> # Add multiple Parquet files with sampling weight
>>> elm.add_dataset(
...     "data/*.parquet",
...     dataset_type="parquet",
...     content_field="content",
...     weight=0.5
... )
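
The "auto-detect from file extension" behaviour of `dataset_type=None` can be pictured with the sketch below. It is an assumption about how such detection might work, not EasyDeL's actual logic: `guess_dataset_type` and the extension table are hypothetical, and only the file formats listed above are covered.

```python
import os

# Hypothetical extension table; the real auto-detection rules may differ.
_EXT_TO_TYPE = {
    ".json": "json",
    ".jsonl": "jsonl",
    ".parquet": "parquet",
    ".csv": "csv",
    ".txt": "text",
}


def guess_dataset_type(data_files):
    """Guess a dataset_type from the first path's extension, or return None."""
    first = data_files if isinstance(data_files, str) else data_files[0]
    ext = os.path.splitext(first)[1].lower()
    return _EXT_TO_TYPE.get(ext)


print(guess_dataset_type("data/*.parquet"))
print(guess_dataset_type(["data1.json", "data2.json"]))
print(guess_dataset_type("imdb"))  # no extension, so nothing can be inferred
```

A name without an extension (such as a HuggingFace dataset ID) cannot be inferred this way, which is why passing `dataset_type` explicitly is the safer choice in that case.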
build_dataset()[source]#

Build dataset from mixture configuration.

Creates a dataset from the configured mixture of data sources. Supports multiple formats (JSON, Parquet, CSV) and can combine multiple data sources into a single dataset.

Returns

The loaded and processed dataset ready for training, or None if no mixture is configured

Return type

Dataset

Example

>>> elm = eLargeModel()
>>> elm.add_dataset("train.json", dataset_type="json", content_field="text")
>>> elm.add_dataset("valid/*.parquet", dataset_type="parquet", content_field="content")
>>> dataset = elm.build_dataset()
>>> print(f"Dataset size: {len(dataset)}")
build_esurge() eSurge[source]#

Build the eSurge inference engine.

Creates an eSurge engine instance configured with the current settings. Automatically builds the model if not already built.

Returns

eSurge instance ready for batch inference

Example

>>> elm.set_esurge(max_num_seqs=32, hbm_utilization=0.9)
>>> engine = elm.build_esurge()
>>> # Use engine for batch inference
>>> results = engine.generate(prompts, max_tokens=100)
build_model(force_rebuild: bool = False) EasyDeLBaseModule[source]#

Build the EasyDeL model from configuration.

Loads the model using the configured settings including dtype, sharding, and quantization. The model is cached after first build unless force_rebuild is True.

Parameters

force_rebuild – Force rebuilding even if model is already cached. Useful when configuration has changed.

Returns

EasyDeLBaseModule instance ready for training or inference

Raises
  • ValueError – If model name/path is not set

  • RuntimeError – If model loading fails

Example

>>> elm = eLargeModel.from_pretrained("meta-llama/Llama-2-7b")
>>> elm.set_dtype("bf16")
>>> model = elm.build_model()
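
The build-once-then-cache behaviour controlled by `force_rebuild` can be sketched independently of any model code. `CachedBuilder` is a hypothetical helper written for this illustration; it only assumes that the cached object is reused until a rebuild is explicitly requested.

```python
class CachedBuilder:
    """Hypothetical helper that memoizes an expensive build step."""

    def __init__(self, loader):
        self._loader = loader  # callable that performs the real load
        self._cached = None    # result of the most recent build
        self.load_count = 0    # how many times the loader actually ran

    def build(self, force_rebuild: bool = False):
        """Return the cached object, loading it first if needed or forced."""
        if self._cached is None or force_rebuild:
            self._cached = self._loader()
            self.load_count += 1
        return self._cached


builder = CachedBuilder(lambda: object())
first = builder.build()
second = builder.build()                   # served from the cache
third = builder.build(force_rebuild=True)  # the loader runs again
print(first is second, third is first, builder.load_count)
```

The same idea explains why a configuration change made after the first build has no effect until a rebuild is forced or the cache is cleared.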
build_reference_model() easydel.infra.base_module.EasyDeLBaseModule | None[source]#

Build the reference model for preference optimization (DPO, etc.).

Loads the reference model using the same loader configuration as the primary model. The reference model provides a baseline for computing preference losses in DPO, ORPO, and similar methods.

Returns

EasyDeLBaseModule instance for the reference model, or None if no reference model is configured

Example

>>> elm.set_reference_model("meta-llama/Llama-2-7b-hf")
>>> reference = elm.build_reference_model()
>>> # Reference model will be used automatically in DPO training
build_sharded_source() ShardedDataSource | None[source]#

Build dataset as ShardedDataSource for use with new data pipeline.

Creates a ShardedDataSource from the configured mixture of data sources. This uses the new data architecture that supports lazy transforms, efficient streaming, and better integration with trainers.

Returns

The data source ready for training, or None if no mixture is configured

Return type

ShardedDataSource

Example

>>> elm = eLargeModel()
>>> elm.add_dataset("train.json", dataset_type="json", content_field="text")
>>> elm.set_mixture(use_sharded_source=True)
>>> source = elm.build_sharded_source()
>>> for batch in source.open_shard(source.shard_names[0]):
...     process(batch)
build_teacher_model() easydel.infra.base_module.EasyDeLBaseModule | None[source]#

Build the teacher model for distillation training.

Loads the teacher model using the same loader configuration as the student model (dtype, sharding, etc.) but with the teacher model path.

Returns

EasyDeLBaseModule instance for the teacher model, or None if no teacher model is configured

Example

>>> elm.set_teacher_model("meta-llama/Llama-2-13b")
>>> teacher = elm.build_teacher_model()
>>> # Teacher model will be used automatically in distillation training
build_tokenizer(force_rebuild: bool = False) AutoTokenizer[source]#

Build or get the tokenizer for the model.

Loads the tokenizer from the model path or a separately specified tokenizer path. The tokenizer is cached after first build.

Parameters

force_rebuild – Force rebuilding even if tokenizer is already cached. Useful when switching between different tokenizers.

Returns

AutoTokenizer instance configured for the model

Raises

ValueError – If tokenizer path cannot be determined

Example

>>> tokenizer = elm.build_tokenizer()
>>> tokens = tokenizer("Hello world", return_tensors="np")
build_trainer(train_dataset: Dataset | ShardedDataSource | None = None, eval_dataset: Dataset | ShardedDataSource | None = None, reference_model: EasyDeLBaseModule | None = None, reward_model: EasyDeLBaseModule | None = None, teacher_model: EasyDeLBaseModule | None = None, reward_funcs: Any | None = None, base_state_class: type[EasyDeLState] | None = None, args_class: type[TrainingArguments] | None = None, trainer_class: type[Trainer] | None = None, **kwargs) Trainer[source]#

Build a trainer instance with the configured settings.

Creates and configures a trainer based on the trainer_type setting. Automatically builds required models and datasets if not provided.

Parameters
  • train_dataset – Training dataset (Dataset or ShardedDataSource). If None, builds from mixture config using get_train_source().

  • eval_dataset – Evaluation dataset for validation metrics.

  • reference_model – Reference model for DPO/ORPO. If None, builds from reference_model configuration if present.

  • reward_model – Reward model for GRPO. If None, builds from config.

  • teacher_model – Teacher model for distillation. If None, builds from teacher_model configuration if present.

  • reward_funcs – Custom reward functions for GRPO. Alternative to reward_model.

  • base_state_class – Custom EasyDeLState class for model state management.

  • args_class – Custom TrainingArguments class. Auto-selected if None.

  • trainer_class – Custom Trainer class. Auto-selected if None.

  • **kwargs – Additional trainer configuration overrides

Returns

Configured trainer instance ready for training

Raises

ValueError – If required models or datasets are not configured

Example

>>> # Build trainer with auto-configuration
>>> trainer = elm.build_trainer()
>>>
>>> # Build trainer with custom dataset
>>> custom_data = load_dataset("custom_data")
>>> trainer = elm.build_trainer(train_dataset=custom_data)
>>>
>>> # Build DPO trainer with custom reference model
>>> ref_model = elm.build_reference_model()
>>> trainer = elm.build_trainer(
...     trainer_type="dpo",
...     reference_model=ref_model
... )
build_training_arguments(args_class: easydel.trainers.training_configurations.TrainingArguments | None = None, **overrides)[source]#

Build TrainingArguments for the configured trainer.

Parameters
  • args_class – Optional custom TrainingArguments class. If not provided, will automatically select based on trainer_type.

  • **overrides – Override specific configuration values

Returns

TrainingArguments instance for the configured trainer type (e.g., DPOConfig for DPO training, SFTConfig for SFT)

clear_cache() None[source]#

Clear cached model, tokenizer, and inference engine instances.

This is useful when you want to reload models with different configurations or free memory after model operations.

property config: ELMConfig#

Get the normalized configuration dictionary.

Returns

The full ELM configuration including model, loader, sharding, quantization, training, and inference settings.

eval(tasks: str | list[str], engine: Union[Literal['esurge', 'auto'], Any] = 'auto', num_fewshot: int = 0, output_path: str | None = None) dict[str, Any][source]#

Run evaluation on specified tasks using lm-evaluation-harness.

This method provides a unified interface for evaluating models using the eSurge engine with the lm-evaluation-harness framework.

Parameters
  • tasks – Task name(s) to evaluate on. Can be a single task string or list of tasks. Common tasks include: - Language understanding: “hellaswag”, “winogrande”, “piqa”, “arc_easy”, “arc_challenge” - Math: “gsm8k”, “math”, “minerva_math” - Knowledge: “mmlu”, “triviaqa”, “naturalquestions” - Reasoning: “bbh”, “boolq”, “copa” - Truthfulness: “truthfulqa_mc1”, “truthfulqa_mc2” - Coding: “humaneval”, “mbpp” Full list: EleutherAI/lm-evaluation-harness

  • engine – Inference engine to use. Options: - “esurge”: Use eSurge engine (high throughput) - “auto”: Automatically select based on configuration (default) - An existing eSurge instance for custom configuration

  • num_fewshot – Number of few-shot examples to use (default: 0 for zero-shot). Different tasks may have different recommended values: - MMLU: typically 5-shot - GSM8K: typically 8-shot - HellaSwag: typically 0-shot

  • output_path – Optional path to save evaluation results as JSON. Results include detailed metrics, task versions, and configuration.

Returns

Dictionary containing evaluation results with the structure:

{
    "results": {
        task_name: {
            metric_name: value,  # e.g., "acc": 0.85, "acc_stderr": 0.02
            ...
        },
    },
    "versions": {task_name: version_string, ...},
    "config": {"model": ..., "num_fewshot": ..., ...},
}

Return type

dict[str, Any]

Example

Basic zero-shot evaluation:

>>> elm = eLargeModel.from_pretrained("meta-llama/Llama-2-7b")
>>> results = elm.eval("hellaswag")
>>> print(f"HellaSwag accuracy: {results['results']['hellaswag']['acc']:.2%}")

Few-shot evaluation with multiple tasks:

>>> elm.set_esurge(max_num_seqs=64, hbm_utilization=0.9)
>>> results = elm.eval(
...     ["gsm8k", "mmlu", "truthfulqa_mc1"],
...     engine="esurge",
...     num_fewshot=5,
...     output_path="eval_results.json"
... )
>>> for task, metrics in results["results"].items():
...     print(f"{task}: {metrics.get('acc', metrics.get('exact_match')):.2%}")

Evaluation with custom settings:

>>> elm.set_eval(
...     max_new_tokens=512,
...     temperature=0.0,  # Greedy decoding
...     batch_size=32
... )
>>> results = elm.eval(["humaneval", "mbpp"])

Raises
  • ImportError – If lm-eval is not installed (install with: pip install lm-eval)

  • ValueError – If invalid engine type or model not configured

  • RuntimeError – If evaluation fails during execution

Note

The evaluation uses settings from set_eval() for generation parameters. Default settings are optimized for deterministic evaluation (temperature=0).
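
To make the shape of the returned dictionary concrete, the sketch below walks a hand-written placeholder with the documented keys. The numbers reuse the illustrative values quoted in the structure description and are not real evaluation results. `primary_metric` is a hypothetical helper, and the JSON dump only approximates what `output_path` might produce.

```python
import json
import os
import tempfile

# Placeholder in the documented shape; the numbers are illustrative only.
results = {
    "results": {"hellaswag": {"acc": 0.85, "acc_stderr": 0.02}},
    "versions": {"hellaswag": "1"},
    "config": {"num_fewshot": 0},
}


def primary_metric(metrics):
    """Return accuracy if present, otherwise exact match, otherwise None."""
    return metrics.get("acc", metrics.get("exact_match"))


# One headline number per task, skipping tasks that report neither metric.
summary = {
    task: primary_metric(metrics)
    for task, metrics in results["results"].items()
    if primary_metric(metrics) is not None
}

# Persist the full dictionary as JSON, similar in spirit to output_path.
out_path = os.path.join(tempfile.mkdtemp(), "eval_results.json")
with open(out_path, "w", encoding="utf-8") as fh:
    json.dump(results, fh, indent=2)

print(summary)
```

Filtering out `None` before formatting avoids a `TypeError` when a task reports neither `acc` nor `exact_match`.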

classmethod from_json(json_path: str | os.PathLike | eformer.paths.GCSPath | eformer.paths.LocalPath | eformer.paths.MLUtilPath) eLargeModel[source]#

Create eLargeModel from JSON configuration file.

Parameters

json_path – Path to JSON configuration file

Returns

eLargeModel instance

classmethod from_pretrained(model_name_or_path: str, task: easydel.infra.factory.TaskType | str | None = None, **kwargs) eLargeModel[source]#

Create eLargeModel from pretrained model name or path.

Parameters
  • model_name_or_path – HuggingFace model ID or local path

  • task – Optional task type (auto-detected if not provided or AUTO_BIND)

  • **kwargs – Additional configuration options

Returns

eLargeModel instance with configuration

get_base_config(prefer: str = 'base') dict[str, Any][source]#

Get materialized base configuration.

Resolves the configuration hierarchy, materializing shared base settings across different configuration sections.

Parameters

prefer – Resolution preference when conflicts exist: - “base”: Prefer values from base configuration - “section”: Prefer values from specific sections

Returns

Base configuration dictionary with resolved values

Example

>>> # Get configuration with base values taking precedence
>>> base_config = elm.get_base_config(prefer="base")
>>> print(base_config["dtype"])  # Shows the base dtype setting
get_data_mixture_kwargs() dict[str, Any][source]#

Get kwargs for DatasetMixture initialization.

Extracts and formats the mixture configuration for use with the DatasetMixture class.

Returns

Dictionary of DatasetMixture arguments including informs, batch_size, streaming settings, and other mixture options

get_esurge_kwargs() dict[str, Any][source]#

Get kwargs for eSurge initialization.

Extracts and formats the configuration options for creating an eSurge engine instance.

Returns

Dictionary of eSurge arguments including max_model_len, max_num_seqs, hbm_utilization, and other engine settings

Example

>>> kwargs = elm.get_esurge_kwargs()
>>> # Can be used directly:
>>> from easydel.inference import eSurge
>>> engine = eSurge(model, **kwargs)
get_from_pretrained_kwargs() dict[str, Any][source]#

Get kwargs for model.from_pretrained() calls.

Extracts and formats the configuration options that should be passed to the model’s from_pretrained() method, including dtype, sharding, and quantization settings.

Returns

Dictionary of from_pretrained arguments ready to use with EasyDeL model loading functions

Example

>>> kwargs = elm.get_from_pretrained_kwargs()
>>> # Can be used directly:
>>> model = LlamaForCausalLM.from_pretrained(
...     "meta-llama/Llama-2-7b",
...     **kwargs
... )
get_train_source() ShardedDataSource | Dataset | None[source]#

Get training data as ShardedDataSource or Dataset.

Automatically selects the appropriate data format based on the use_sharded_source configuration option.

Returns

ShardedDataSource if use_sharded_source=True in mixture config, otherwise HuggingFace Dataset. Returns None if no mixture configured.

Example

>>> elm = eLargeModel()
>>> elm.add_dataset("train.json", dataset_type="json")
>>> elm.set_mixture(use_sharded_source=True)  # Use new pipeline
>>> data = elm.get_train_source()  # Returns ShardedDataSource
>>>
>>> elm.set_mixture(use_sharded_source=False)  # Use legacy pipeline
>>> data = elm.get_train_source()  # Returns HF Dataset
get_trainer_config() dict[str, Any][source]#

Get normalized trainer configuration.

This method processes the raw trainer configuration and applies defaults and normalization for the specified trainer type.

Returns

Normalized trainer configuration dictionary with all required fields populated with defaults where necessary.

property model_name: str#

Get the model name or path.

Returns

The HuggingFace model ID or local path to the model.

property reference_model_name: str | None#

Get the reference model name or path for DPO/ORPO.

Returns

The reference model path if configured, None otherwise.

set_dtype(dtype: str) eLargeModel[source]#

Set the data type for model loading.

Configures both the computation dtype and parameter dtype for the model. This affects memory usage and computation speed.

Parameters

dtype – Data type string. Supported values: - “bf16”: BFloat16 (recommended for TPU, modern GPUs) - “fp16”: Float16 (good for older GPUs) - “fp32”: Float32 (highest precision, most memory) - “fp8”: Float8 (experimental, requires compatible hardware)

Returns

Self for method chaining

Example

>>> elm.set_dtype("bf16")  # Use bfloat16 for training/inference
set_esurge(max_model_len: int | None = None, max_num_seqs: int = 16, hbm_utilization: float = 0.85, **kwargs) eLargeModel[source]#

Configure eSurge inference settings.

eSurge is a high-performance batch inference engine optimized for throughput. It uses PagedAttention for efficient memory management.

Parameters
  • max_model_len – Maximum model sequence length (input + output tokens). If None, uses model’s default max position embeddings.

  • max_num_seqs – Maximum number of sequences to process concurrently. Higher values increase throughput but require more memory.

  • hbm_utilization – HBM memory utilization ratio (0.0-1.0). Controls how much device memory to use for KV cache.

  • **kwargs – Additional eSurge options: - page_size: PagedAttention page size (default: 128) - enable_prefix_caching: Enable prefix caching optimization - kv_cache_dtype: Dtype for KV cache (None = auto) - decoding_engine: “ring” or “triton” (default: auto)

Returns

Self for method chaining

Example

>>> elm.set_esurge(
...     max_model_len=8192,
...     max_num_seqs=64,
...     hbm_utilization=0.9,
...     enable_prefix_caching=True
... )
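
As a back-of-the-envelope illustration of how `hbm_utilization` and `page_size` interact, the sketch below estimates how many KV-cache pages fit in a memory budget. It is rough arithmetic under stated assumptions, not eSurge's allocator: `estimate_kv_pages` is a hypothetical function, and it assumes the model weights are subtracted from the utilized share of device memory before pages are carved out.

```python
def estimate_kv_pages(
    device_bytes: int,
    weight_bytes: int,
    hbm_utilization: float,
    page_size: int,
    kv_bytes_per_token: int,
) -> int:
    """Rough count of KV-cache pages that fit in the memory budget."""
    # Assumed budget: the utilized share of device memory minus the weights.
    budget = int(device_bytes * hbm_utilization) - weight_bytes
    if budget <= 0:
        return 0
    # Each page holds page_size tokens' worth of keys and values.
    return budget // (page_size * kv_bytes_per_token)


# Hypothetical model shape: keys + values, 32 layers, 8 KV heads, head_dim 128, 2-byte dtype.
kv_bytes_per_token = 2 * 32 * 8 * 128 * 2

pages = estimate_kv_pages(
    device_bytes=16 * 1024**3,
    weight_bytes=8 * 1024**3,
    hbm_utilization=0.85,
    page_size=128,
    kv_bytes_per_token=kv_bytes_per_token,
)
print(pages, "pages, about", pages * 128, "cached tokens")
```

Raising `hbm_utilization` or shrinking the KV dtype enlarges the page pool, which is what allows a larger `max_num_seqs` to be served concurrently.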
set_eval(max_new_tokens: int = 2048, temperature: float = 0.0, top_p: float = 0.95, batch_size: int | None = None, use_tqdm: bool = True, **kwargs) eLargeModel[source]#

Configure evaluation settings for lm-evaluation-harness.

Sets default parameters for model evaluation on standard benchmarks. These settings apply when using the eval() method.

Parameters
  • max_new_tokens – Maximum tokens to generate per evaluation sample (default: 2048). Lower values speed up evaluation.

  • temperature – Sampling temperature (default: 0.0 for greedy decoding). 0.0 = deterministic/greedy, higher = more random.

  • top_p – Top-p (nucleus) sampling parameter (default: 0.95). Only used when temperature > 0.

  • batch_size – Evaluation batch size (default: engine-specific). Higher values increase throughput but use more memory.

  • use_tqdm – Show progress bar during evaluation (default: True)

  • **kwargs – Additional evaluation options: - top_k: Top-k sampling parameter - repetition_penalty: Penalty for repeated tokens - num_beams: Beam search width (1 = greedy) - do_sample: Whether to use sampling - early_stopping: Stop generation at first EOS

Returns

Self for method chaining

Example

>>> # Configure for deterministic evaluation
>>> elm.set_eval(
...     max_new_tokens=512,
...     temperature=0.0,
...     batch_size=64
... )
>>>
>>> # Configure for sampling-based evaluation
>>> elm.set_eval(
...     temperature=0.7,
...     top_p=0.9,
...     top_k=50
... )
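
The relationship between `temperature` and `top_p` described above (greedy at 0.0, nucleus sampling otherwise) can be sketched in pure Python. `pick_token` is a hypothetical function that illustrates the general technique; it is not the decoding code used by eSurge or lm-evaluation-harness.

```python
import math
import random


def pick_token(logits, temperature=0.0, top_p=0.95, rng=None):
    """Pick a token index: greedy when temperature is 0, else top-p sampling."""
    if temperature <= 0.0:
        # Greedy decoding: top_p is ignored, the most likely token always wins.
        return max(range(len(logits)), key=lambda i: logits[i])

    # Softmax over temperature-scaled logits (shifted by the max for stability).
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    weights = [math.exp(x - peak) for x in scaled]
    total = sum(weights)
    probs = [w / total for w in weights]

    # Keep the smallest set of most likely tokens whose mass reaches top_p.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, mass = [], 0.0
    for i in order:
        kept.append(i)
        mass += probs[i]
        if mass >= top_p:
            break

    if rng is None:
        rng = random.Random()
    return rng.choices(kept, weights=[probs[i] for i in kept], k=1)[0]


logits = [0.1, 2.0, -1.0]
print(pick_token(logits))  # greedy: index of the largest logit
print(pick_token(logits, temperature=0.7, top_p=0.9, rng=random.Random(0)))
```

With `temperature=0.0` the result is reproducible across runs, which is why the defaults favour deterministic benchmark scores.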
set_mixture(informs: list[dict] | None = None, batch_size: int = 32, streaming: bool = True, use_fast_loader: bool = True, **kwargs) eLargeModel[source]#

Configure data mixture settings for training/evaluation.

Sets up a mixture of datasets that can be combined and sampled from during training. Supports multiple data sources and formats.

Parameters
  • informs – List of dataset configurations. Each dict should contain: - type: Dataset type (“json”, “parquet”, “csv”, “text”, or HF dataset ID) - data_files: Path or pattern to data files - content_field: Field name containing the text content - split: Dataset split to use (default: “train”) - weight: Sampling weight for this dataset (optional)

  • batch_size – Batch size for data loading (default: 32)

  • streaming – Use streaming mode for large datasets (default: True). Reduces memory usage but may be slower.

  • use_fast_loader – Enable fast loading with fsspec (default: True). Provides optimized loading for remote/cloud storage.

  • **kwargs – Additional mixture options: - max_samples: Maximum samples per dataset - shuffle: Whether to shuffle data - seed: Random seed for shuffling

Returns

Self for method chaining

Example

>>> elm.set_mixture(
...     informs=[
...         {"type": "json", "data_files": "train.json", "content_field": "text", "weight": 0.7},
...         {"type": "parquet", "data_files": "valid/*.parquet", "content_field": "content", "weight": 0.3}
...     ],
...     batch_size=32,
...     streaming=True,
...     shuffle=True,
...     seed=42
... )
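
One way to read the per-dataset `weight` values is as relative sampling probabilities. The sketch below normalizes them and draws source indices with a fixed seed. It illustrates weighted mixing in general and is not the DatasetMixture implementation: `normalized_weights` and `sample_sources` are hypothetical, and defaulting a missing weight to 1.0 is an assumption.

```python
import random
from collections import Counter

informs = [
    {"type": "json", "data_files": "train.json", "weight": 0.7},
    {"type": "parquet", "data_files": "valid/*.parquet", "weight": 0.3},
]


def normalized_weights(entries):
    """Scale weights to sum to 1; entries without a weight default to 1.0."""
    raw = [float(entry.get("weight", 1.0)) for entry in entries]
    total = sum(raw)
    return [w / total for w in raw]


def sample_sources(entries, n, seed):
    """Draw n source indices in proportion to the normalized weights."""
    rng = random.Random(seed)
    return rng.choices(range(len(entries)), weights=normalized_weights(entries), k=n)


counts = Counter(sample_sources(informs, n=10_000, seed=42))
print(normalized_weights(informs), dict(counts))
```

Fixing the seed makes the draw order repeatable, which matches the role of the `seed` option listed above.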
set_model(model_name_or_path: str) eLargeModel[source]#

Set the model name or path.

Updates the primary model configuration. This will clear any cached model instance to ensure the new model is loaded on next build.

Parameters

model_name_or_path – HuggingFace model ID (e.g., “meta-llama/Llama-2-7b”) or local path to model directory

Returns

Self for method chaining

Example

>>> elm.set_model("meta-llama/Llama-2-7b-hf")
>>> elm.set_model("/path/to/local/model")
set_operation_configs(configs: collections.abc.Mapping[str, Any] | None = None, **kwargs) eLargeModel[source]#

Configure ejkernel operation overrides.

Allows overriding ejkernel’s autotune behavior for specific attention operations by providing explicit configuration objects. When a config is provided, it’s passed directly to the operation instead of using ejkernel’s autotune.

Parameters
  • configs – Dictionary mapping operation names to config objects. Valid operation names (must match OperationRegistry): - “flash_attn2”: Flash attention 2 - “ring”: Ring attention - “blocksparse”: Block sparse attention - “ragged_page_attention_v2”: Ragged page attention v2 - “ragged_page_attention_v3”: Ragged page attention v3 - “sdpa”: Scaled dot product attention - “vanilla”: Vanilla attention

  • **kwargs – Individual operation configs as keyword arguments. These are merged with the configs dict.

Returns

Self for method chaining

Example

>>> from easydel import FlashAttentionConfig, RingAttentionConfig
>>>
>>> # Using dict
>>> elm.set_operation_configs({
...     "flash_attn2": FlashAttentionConfig(platform="triton"),
...     "ring": RingAttentionConfig(),
... })
>>>
>>> # Using kwargs
>>> elm.set_operation_configs(
...     flash_attn2=FlashAttentionConfig(platform="pallas"),
... )
set_quantization(method: str | None = None, block_size: int = 128, **kwargs) eLargeModel[source]#

Configure quantization settings.

Enables model quantization to reduce memory usage and potentially improve inference speed at the cost of some accuracy.

Parameters
  • method – Quantization method.

  • block_size – Quantization block size (default: 128). Smaller blocks = better accuracy but more overhead.

  • **kwargs – Additional quantization options: - platform: Target platform (“cpu”, “cuda”, “tpu”) - compute_dtype: Dtype for computation (e.g., “fp16”) - double_quant: Enable double quantization for 4bit

Returns

Self for method chaining

Example

>>> elm.set_quantization("nf4", block_size=64)
>>> elm.set_quantization("a8bit")
set_reference_model(model_name_or_path: str) eLargeModel[source]#

Set the reference model name or path for preference optimization.

Configures a reference model used in DPO (Direct Preference Optimization) and similar preference-based training methods. The reference model provides a baseline for computing preference losses.

Parameters

model_name_or_path – HuggingFace model ID or local path for reference model. Often the same as the base model before fine-tuning.

Returns

Self for method chaining

Example

>>> elm.set_model("meta-llama/Llama-2-7b-hf")  # Model to train
>>> elm.set_reference_model("meta-llama/Llama-2-7b-hf")  # Reference
>>> elm.set_trainer("dpo", beta=0.1)
set_sharding(axis_dims: tuple[int, ...] | None = None, axis_names: tuple[str, ...] | None = None, **kwargs) eLargeModel[source]#

Configure model sharding for distributed training/inference.

Sets up model parallelism by specifying how to shard model parameters and computations across devices. Essential for training large models that don’t fit on a single device.

Parameters
  • axis_dims – Sharding axis dimensions as a tuple. Common patterns: - (1, 1, 1, -1): Data parallel only - (2, 1, 1, -1): 2-way tensor parallel - (1, 2, 1, -1): 2-way pipeline parallel - (2, 2, 1, -1): 2-way tensor + 2-way pipeline parallel

  • axis_names – Sharding axis names (e.g., (“dp”, “tp”, “pp”, “sp”)) - dp: Data parallel - tp: Tensor parallel - pp: Pipeline parallel - sp: Sequence parallel

  • **kwargs – Additional sharding options.

Returns

Self for method chaining

Example

>>> # 2-way tensor parallel, 2-way data parallel
>>> elm.set_sharding(
...     axis_dims=(2, 2, 1, -1),
...     axis_names=("dp", "tp", "pp", "sp")
... )
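
The `-1` entry in `axis_dims` is commonly read as "use whatever devices are left", in the same spirit as a `-1` in an array reshape. Assuming that convention applies here, the sketch below resolves it for a given device count. `resolve_axis_dims` is a hypothetical helper, not part of EasyDeL.

```python
import math


def resolve_axis_dims(axis_dims, num_devices):
    """Replace a single -1 entry with the device count left over by the others."""
    if axis_dims.count(-1) > 1:
        raise ValueError("at most one axis may be -1")
    known = math.prod(d for d in axis_dims if d != -1)
    if -1 not in axis_dims:
        if known != num_devices:
            raise ValueError(f"{axis_dims} does not use exactly {num_devices} devices")
        return tuple(axis_dims)
    if num_devices % known != 0:
        raise ValueError(f"{num_devices} devices cannot be split by {axis_dims}")
    return tuple(num_devices // known if d == -1 else d for d in axis_dims)


print(resolve_axis_dims((2, 2, 1, -1), num_devices=8))
print(resolve_axis_dims((1, 1, 1, -1), num_devices=4))
```

The product of the resolved dimensions always equals the device count, so a mismatch surfaces as an error before any model is loaded.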
set_teacher_model(model_name_or_path: str) eLargeModel[source]#

Set the teacher model name or path for distillation training.

Configures a teacher model used for knowledge distillation. The teacher model is typically a larger, more capable model that guides the training of the student (primary) model.

Parameters

model_name_or_path – HuggingFace model ID or local path for teacher model. Should be a model compatible with the student model’s architecture.

Returns

Self for method chaining

Example

>>> elm.set_model("meta-llama/Llama-2-7b")  # Student model
>>> elm.set_teacher_model("meta-llama/Llama-2-13b")  # Teacher model
>>> elm.set_trainer("distillation", temperature=3.0)
set_trainer(trainer_type: str, **kwargs) eLargeModel[source]#

Configure trainer settings.

Sets the training paradigm and associated hyperparameters.

Parameters
  • trainer_type – Type of trainer to use:
      - “sft”: Supervised Fine-Tuning
      - “dpo”: Direct Preference Optimization
      - “orpo”: Odds Ratio Preference Optimization
      - “grpo”: Group Relative Policy Optimization
      - “reward”: Reward model training
      - “distillation”: Knowledge distillation
      - “base”: Basic trainer for custom training loops

  • **kwargs – Trainer-specific configuration options.

    Common options:
      - learning_rate: Learning rate (default: 5e-5)
      - num_train_epochs: Number of training epochs
      - per_device_train_batch_size: Batch size per device
      - gradient_accumulation_steps: Gradient accumulation steps
      - warmup_steps: Number of warmup steps
      - output_dir: Directory to save checkpoints

    DPO-specific:
      - beta: KL regularization coefficient
      - loss_type: “sigmoid”, “ipo”, “hinge”

    Distillation-specific:
      - temperature: Distillation temperature
      - alpha: Weight for distillation loss

Returns

Self for method chaining

Example

>>> # SFT training
>>> elm.set_trainer(
...     "sft",
...     learning_rate=2e-5,
...     num_train_epochs=3,
...     per_device_train_batch_size=4
... )
>>>
>>> # DPO training
>>> elm.set_trainer(
...     "dpo",
...     beta=0.1,
...     learning_rate=1e-6
... )
property task: TaskType#

Get the resolved task type.

Returns

The task type (e.g., TaskType.CAUSAL_LM) either explicitly configured or auto-detected from the model.

property teacher_model_name: str | None#

Get the teacher model name or path for distillation.

Returns

The teacher model path if configured, None otherwise.

to_dict() dict[str, Any][source]#

Get configuration as dictionary.

Returns a copy of the full configuration dictionary that can be modified without affecting the eLargeModel instance.

Returns

Configuration dictionary with all settings

Example

>>> config_dict = elm.to_dict()
>>> print(config_dict["model"]["name_or_path"])
>>> # Modify the dict without affecting elm
>>> config_dict["model"]["dtype"] = "fp16"
to_json(json_path: str | os.PathLike | eformer.paths.GCSPath | eformer.paths.LocalPath | eformer.paths.MLUtilPath) None[source]#

Save configuration to JSON file.

Exports the current configuration to a JSON file that can be loaded later with from_json() or shared with others.

Parameters

json_path – Path where the JSON configuration file will be saved. Will create parent directories if they don’t exist.

Example

>>> elm.to_json("config.json")
>>> # Later or on another machine:
>>> elm2 = eLargeModel.from_json("config.json")
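
The save-then-reload round trip, including creation of missing parent directories, can be sketched with the standard library alone. `save_config` and `load_config` are hypothetical stand-ins for `to_json()` and `from_json()`. They operate on a plain dictionary and do not handle the GCS or other path types accepted by the real methods.

```python
import json
import tempfile
from pathlib import Path


def save_config(config: dict, json_path) -> None:
    """Write config as JSON, creating missing parent directories first."""
    path = Path(json_path)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(config, indent=2), encoding="utf-8")


def load_config(json_path) -> dict:
    """Read a JSON config file back into a dictionary."""
    return json.loads(Path(json_path).read_text(encoding="utf-8"))


config = {"model": {"name_or_path": "meta-llama/Llama-2-7b"}}
target = Path(tempfile.mkdtemp()) / "nested" / "dir" / "config.json"
save_config(config, target)
restored = load_config(target)
print(restored == config)
```

Only JSON-serializable values survive such a round trip, so Python objects such as callables or model instances would need to be configured again after loading.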
train(train_dataset: Dataset | ShardedDataSource | None = None, eval_dataset: Dataset | ShardedDataSource | None = None, base_state_class: type[EasyDeLState] | None = None, args_class: type[TrainingArguments] | None = None, trainer_class: type[Trainer] | None = None, **build_kwargs: Unpack[BuildTrainerKws])[source]#

Train the model with the configured settings.

This is a high-level convenience method that orchestrates the entire training pipeline:

1. Validates configuration
2. Builds the model if not already built
3. Creates the dataset from mixture configuration if not provided
4. Builds reference/teacher models if needed
5. Creates the appropriate trainer
6. Runs training and returns results

Parameters
  • train_dataset – Optional training dataset (Dataset or ShardedDataSource). If None, will build from mixture configuration.

  • eval_dataset – Optional evaluation dataset for validation during training.

  • base_state_class – Optional custom EasyDeLState class for model state management. Use for custom model implementations.

  • args_class – Optional custom TrainingArguments class. If None, will auto-select based on trainer_type.

  • trainer_class – Optional custom Trainer class. If None, will auto-select based on trainer_type.

  • **build_kwargs – Additional kwargs for trainer building:
      - data_collator: Custom data collator function
      - formatting_func: Function to format examples (SFT)
      - reward_processing_classes: Processing classes for rewards (GRPO)
      - data_tokenize_fn: Custom tokenization function
      - reference_model: Override reference model
      - reward_model: Override reward model
      - teacher_model: Override teacher model
      - reward_funcs: Custom reward functions

Returns

Training results from the trainer, including metrics and final model state

Example

Basic SFT training:

>>> elm = eLargeModel.from_pretrained("meta-llama/Llama-2-7b")
>>> elm.add_dataset("train.json", dataset_type="json")
>>> elm.set_trainer("sft", learning_rate=2e-5, num_train_epochs=3)
>>> results = elm.train()

DPO training with custom datasets:

>>> train_data = load_dataset("preference_data", split="train")
>>> eval_data = load_dataset("preference_data", split="test")
>>> elm.set_trainer("dpo", beta=0.1)
>>> elm.set_reference_model("meta-llama/Llama-2-7b")
>>> results = elm.train(train_dataset=train_data, eval_dataset=eval_data)

Custom trainer with formatting function:

>>> def format_fn(examples):
...     return [f"Question: {q}\n\nAnswer: {a}"
...             for q, a in zip(examples["question"], examples["answer"])]
>>> results = elm.train(formatting_func=format_fn)
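
A formatting function of the kind passed as `formatting_func` can be tried on a small in-memory batch before training starts. The sketch below assumes a batched, column-oriented input (a dictionary of lists) and a blank line between question and answer. Both are assumptions made for illustration, and the sample row is invented.

```python
def format_fn(examples):
    """Turn a batch of question/answer columns into one prompt string per row."""
    return [
        f"Question: {q}\n\nAnswer: {a}"
        for q, a in zip(examples["question"], examples["answer"])
    ]


# Invented sample batch in column-oriented form.
batch = {"question": ["What is 2 + 2?"], "answer": ["4"]}
texts = format_fn(batch)
print(texts[0])
```

Checking the output on one or two rows is a cheap way to catch a wrong field name before a long training run.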

update_config(updates: Mapping[str, Any]) eLargeModel[source]#

Update configuration with new values.

Performs a deep merge of the updates into the existing configuration, preserving nested structures. The configuration is re-normalized after updating to ensure consistency.

Parameters

updates – Dictionary with configuration updates. Can include nested structures like {“model”: {“dtype”: “bf16”}, “esurge”: {“max_num_seqs”: 32}}

Returns

Self for method chaining

Example

>>> elm.update_config({
...     "loader": {"dtype": "bf16"},
...     "esurge": {"max_model_len": 4096}
... })
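
The deep-merge behaviour described above, where nested sections are merged rather than replaced, can be sketched as a small recursive function. `deep_merge` is a hypothetical helper, not EasyDeL's own merge routine, and it omits the re-normalization step.

```python
import copy
from collections.abc import Mapping


def deep_merge(base, updates):
    """Return a new dict where nested mappings in updates are merged into base."""
    merged = copy.deepcopy(dict(base))
    for key, value in updates.items():
        if isinstance(value, Mapping) and isinstance(merged.get(key), Mapping):
            # Both sides are mappings: merge key by key instead of overwriting.
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


config = {
    "loader": {"dtype": "fp32"},
    "esurge": {"max_num_seqs": 16, "hbm_utilization": 0.85},
}
updated = deep_merge(config, {"loader": {"dtype": "bf16"}, "esurge": {"max_model_len": 4096}})
print(updated)
```

Sibling keys such as `max_num_seqs` survive the update, whereas a shallow `dict.update` would have replaced the whole `esurge` section.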
validate() None[source]#

Validate the current configuration.

Checks that all required fields are present and have valid values. This is automatically called before training or building engines.

Raises

ValueError – If configuration is invalid (e.g., missing model name, invalid dtype, incompatible settings)
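
The kinds of checks listed above can be illustrated with a small stand-alone validator. `validate_config` is hypothetical, the key layout (`model.name_or_path`, `loader.dtype`) follows the examples on this page and is otherwise an assumption, and the accepted dtype strings are the ones listed under `set_dtype()`.

```python
SUPPORTED_DTYPES = {"bf16", "fp16", "fp32", "fp8"}  # values listed under set_dtype()


def validate_config(config: dict) -> None:
    """Raise ValueError for a missing model name or an unsupported dtype."""
    name = config.get("model", {}).get("name_or_path")
    if not name:
        raise ValueError("model.name_or_path must be set")
    dtype = config.get("loader", {}).get("dtype")
    if dtype is not None and dtype not in SUPPORTED_DTYPES:
        raise ValueError(f"unsupported dtype: {dtype!r}")


validate_config({"model": {"name_or_path": "meta-llama/Llama-2-7b"}})  # passes silently

try:
    validate_config({"model": {}, "loader": {"dtype": "bf16"}})
except ValueError as err:
    print("invalid:", err)
```

Failing early with a `ValueError` keeps configuration mistakes from surfacing only after an expensive model load.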