easydel.infra.elarge_model.elarge_model


eLargeModel - Easy Large Models master class for EasyDeL.

This module provides a unified interface for working with large language models in the EasyDeL framework, combining configuration management, model building, training orchestration, and inference engine initialization.

Key Features:

- Unified configuration management for models, training, and inference
- Automatic model and tokenizer initialization from HuggingFace or local paths
- Support for multiple training paradigms (SFT, DPO, ORPO, GRPO, distillation)
- Integration with the eSurge inference engine
- Built-in evaluation with lm-evaluation-harness
- Flexible dataset mixture configuration
- Model sharding and quantization support

class easydel.infra.elarge_model.elarge_model.BuildTrainerKws[source]#

Bases: TypedDict

Type hints for optional keyword arguments when building trainers.

data_collator#

Custom data collator for batching examples

Type: Callable

formatting_func#

Function to format examples for SFT training

Type: Callable

reward_processing_classes#

Processing classes for reward models in GRPO

Type: list[Callable]

data_tokenize_fn#

Custom tokenization function for data preprocessing

Type: Callable

reference_model#

Reference model for DPO/preference optimization

Type: easydel.infra.base_module.EasyDeLBaseModule | None

reward_model#

Reward model for GRPO training

Type: easydel.infra.base_module.EasyDeLBaseModule | None

teacher_model#

Teacher model for distillation training

Type: easydel.infra.base_module.EasyDeLBaseModule | None

reward_funcs#

Custom reward functions for GRPO

Type: Any | None

data_collator: Callable#

data_tokenize_fn: Callable#

formatting_func: Callable#

reference_model: easydel.infra.base_module.EasyDeLBaseModule | None#

reward_funcs: Any | None#

reward_model: easydel.infra.base_module.EasyDeLBaseModule | None#

reward_processing_classes: list[Callable]#

teacher_model: easydel.infra.base_module.EasyDeLBaseModule | None#
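
Because BuildTrainerKws is a TypedDict, it describes the shape of a plain keyword-argument dictionary rather than a class you instantiate. The sketch below is a stand-in, not the EasyDeL definition: TrainerKwargsSketch, format_example, and received_keys are hypothetical names, only three of the keys listed above are mirrored, and total=False (every key optional) is an assumption based on the phrase "optional keyword arguments".

```python
from typing import Any, Callable, List, TypedDict


class TrainerKwargsSketch(TypedDict, total=False):
    """Hypothetical stand-in mirroring a few BuildTrainerKws keys."""

    data_collator: Callable
    formatting_func: Callable
    reward_funcs: Any


def format_example(example: dict) -> str:
    """Turn one record into a single training string (illustrative only)."""
    return f"Question: {example['question']}\nAnswer: {example['answer']}"


def received_keys(**kwargs: Any) -> List[str]:
    """Stand-in for a trainer builder: report which optional keys arrived."""
    return sorted(kwargs)


# At runtime a TypedDict is an ordinary dict, so it unpacks with ** as usual.
build_kwargs: TrainerKwargsSketch = {"formatting_func": format_example}
print(received_keys(**build_kwargs))
```

Type checkers can then flag a misspelled key such as formating_func before the dictionary ever reaches a trainer.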

Bases: object

Master class for Easy Large Models (ELM) in EasyDeL.

This class provides a unified interface for:

- Configuration management (load, save, create)
- Model building and initialization (including teacher/reference models)
- Training orchestration with multiple paradigms (SFT, DPO, ORPO, etc.)
- eSurge inference engine integration
- Tokenizer management
- Dataset mixture configuration
- Model evaluation with lm-evaluation-harness

config#: The normalized ELM configuration dictionary

model_name#: The model name or path from configuration

task#: The resolved task type (auto-detected or specified)

teacher_model_name#: Teacher model name for distillation (if configured)

reference_model_name#: Reference model name for DPO/ORPO (if configured)

Example

Basic model loading:

>>> elm = eLargeModel({"model": {"name_or_path": "meta-llama/Llama-2-7b"}})
>>> model = elm.build_model()

From pretrained with configuration:

>>> elm = eLargeModel.from_pretrained(
...     "meta-llama/Llama-2-7b",
...     task="causal-lm"
... )
>>> elm.set_dtype("bf16")
>>> elm.set_sharding(axis_dims=(1, 2, 1, -1))

Loading from JSON configuration:

>>> elm = eLargeModel.from_json("config.json")
>>> esurge_engine = elm.build_esurge()

Training with SFT:

>>> elm.set_trainer("sft", learning_rate=2e-5, num_train_epochs=3)
>>> elm.add_dataset("train.json", dataset_type="json", content_field="text")
>>> results = elm.train()

Evaluation:

>>> results = elm.eval(["hellaswag", "mmlu"], engine="esurge")

add_dataset(data_files: str | list[str], dataset_type: str | None = None, content_field: str = 'content', split: str = 'train', **kwargs) → eLargeModel[source]#

Add a dataset to the mixture configuration.

Appends a new dataset to the existing mixture. Multiple datasets can be added and will be combined during training.

Parameters

data_files – Path(s) to data files. Can be:
    - Single file: “data.json”
    - Multiple files: [“data1.json”, “data2.json”]
    - Glob pattern: “data/*.parquet”
    - Remote URL: “https://example.com/data.json”
dataset_type – Dataset type or format. Options:
    - File formats: “json”, “jsonl”, “parquet”, “csv”, “text”
    - HuggingFace dataset ID: “imdb”, “squad”, etc.
    - None: Auto-detect from file extension
content_field – Field name containing the text content (default: “content”). For chat data, might be “messages” or “conversations”.
split – Dataset split to use (default: “train”). Common values: “train”, “validation”, “test”.
**kwargs – Additional dataset options:
    - weight: Sampling weight for this dataset
    - max_samples: Maximum samples to use
    - filter_fn: Function to filter samples
    - map_fn: Function to transform samples

Returns

Self for method chaining

Example

>>> # Add a JSON dataset
>>> elm.add_dataset("train.json", dataset_type="json", content_field="text")
>>>
>>> # Add a HuggingFace dataset
>>> elm.add_dataset("imdb", dataset_type="imdb", split="train")
>>>
>>> # Add multiple Parquet files with sampling weight
>>> elm.add_dataset(
...     "data/*.parquet",
...     dataset_type="parquet",
...     content_field="content",
...     weight=0.5
... )
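
When dataset_type is None, the type is auto-detected from the file extension. How EasyDeL performs that lookup is not shown on this page; the following is a minimal sketch of the idea, where infer_dataset_type and the extension table are hypothetical and the ".txt" to "text" mapping in particular is an assumption.

```python
from pathlib import PurePosixPath
from typing import Optional

# Assumed extension-to-type table; the real lookup may differ.
_EXTENSION_TO_TYPE = {
    ".json": "json",
    ".jsonl": "jsonl",
    ".parquet": "parquet",
    ".csv": "csv",
    ".txt": "text",
}


def infer_dataset_type(data_file: str) -> Optional[str]:
    """Guess a dataset type from the file suffix; None if unrecognized."""
    # PurePosixPath only parses the string, so globs and URLs work too.
    suffix = PurePosixPath(data_file).suffix.lower()
    return _EXTENSION_TO_TYPE.get(suffix)


for candidate in ("train.json", "data/*.parquet", "https://example.com/data.json", "imdb"):
    print(candidate, "->", infer_dataset_type(candidate))
```

A bare name such as "imdb" has no suffix, which is one reason a HuggingFace dataset ID has to be passed explicitly as dataset_type.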

build_dataset()[source]#

Build dataset from mixture configuration.

Creates a dataset from the configured mixture of data sources. Supports multiple formats (JSON, Parquet, CSV) and can combine multiple data sources into a single dataset.

Returns

The loaded and processed dataset ready for training, or None if no mixture is configured

Return type

Dataset

Example

>>> elm = eLargeModel()
>>> elm.add_dataset("train.json", dataset_type="json", content_field="text")
>>> elm.add_dataset("valid/*.parquet", dataset_type="parquet", content_field="content")
>>> dataset = elm.build_dataset()
>>> print(f"Dataset size: {len(dataset)}")

build_esurge() → eSurge[source]#

Build the eSurge inference engine.

Creates an eSurge engine instance configured with the current settings. Automatically builds the model if not already built.

Returns: eSurge instance ready for batch inference

Example

>>> elm.set_esurge(max_num_seqs=32, hbm_utilization=0.9)
>>> engine = elm.build_esurge()
>>> # Use engine for batch inference
>>> results = engine.generate(prompts, max_tokens=100)

build_model(force_rebuild: bool = False) → EasyDeLBaseModule[source]#

Build the EasyDeL model from configuration.

Loads the model using the configured settings including dtype, sharding, and quantization. The model is cached after first build unless force_rebuild is True.

Parameters

force_rebuild – Force rebuilding even if model is already cached. Useful when configuration has changed.

Returns

EasyDeLBaseModule instance ready for training or inference

Raises

ValueError – If model name/path is not set
RuntimeError – If model loading fails

Example

>>> elm = eLargeModel.from_pretrained("meta-llama/Llama-2-7b")
>>> elm.set_dtype("bf16")
>>> model = elm.build_model()
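
The build-once-then-cache behaviour described for force_rebuild (and undone by clear_cache()) can be pictured with a small stand-alone class. This is only a sketch of the pattern under that description; CachedBuilder and expensive_load are hypothetical names and nothing here loads a real model.

```python
from typing import Callable, Generic, List, Optional, TypeVar

T = TypeVar("T")


class CachedBuilder(Generic[T]):
    """Build an object once and reuse it until a rebuild is forced."""

    def __init__(self, loader: Callable[[], T]) -> None:
        self._loader = loader
        self._cached: Optional[T] = None

    def build(self, force_rebuild: bool = False) -> T:
        # Run the loader only on the first call or when explicitly forced.
        if self._cached is None or force_rebuild:
            self._cached = self._loader()
        return self._cached

    def clear_cache(self) -> None:
        self._cached = None


load_calls: List[int] = []


def expensive_load() -> dict:
    """Pretend loader that records how often it ran."""
    load_calls.append(1)
    return {"load_number": len(load_calls)}


builder = CachedBuilder(expensive_load)
first = builder.build()
second = builder.build()                    # served from the cache
third = builder.build(force_rebuild=True)   # loader runs again
print(len(load_calls))
```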

build_reference_model() → easydel.infra.base_module.EasyDeLBaseModule | None[source]#

Build the reference model for preference optimization (DPO, etc.).

Loads the reference model using the same loader configuration as the primary model. The reference model provides a baseline for computing preference losses in DPO, ORPO, and similar methods.

Returns: EasyDeLBaseModule instance for the reference model, or None if no reference model is configured

Example

>>> elm.set_reference_model("meta-llama/Llama-2-7b-hf")
>>> reference = elm.build_reference_model()
>>> # Reference model will be used automatically in DPO training

build_sharded_source() → ShardedDataSource | None[source]#

Build dataset as ShardedDataSource for use with new data pipeline.

Creates a ShardedDataSource from the configured mixture of data sources. This uses the new data architecture that supports lazy transforms, efficient streaming, and better integration with trainers.

Returns

The data source ready for training, or None if no mixture is configured

Return type

ShardedDataSource

Example

>>> elm = eLargeModel()
>>> elm.add_dataset("train.json", dataset_type="json", content_field="text")
>>> elm.set_mixture(use_sharded_source=True)
>>> source = elm.build_sharded_source()
>>> for batch in source.open_shard(source.shard_names[0]):
...     process(batch)

build_teacher_model() → easydel.infra.base_module.EasyDeLBaseModule | None[source]#

Build the teacher model for distillation training.

Loads the teacher model using the same loader configuration as the student model (dtype, sharding, etc.) but with the teacher model path.

Returns: EasyDeLBaseModule instance for the teacher model, or None if no teacher model is configured

Example

>>> elm.set_teacher_model("meta-llama/Llama-2-13b")
>>> teacher = elm.build_teacher_model()
>>> # Teacher model will be used automatically in distillation training

build_tokenizer(force_rebuild: bool = False) → AutoTokenizer[source]#

Build or get the tokenizer for the model.

Loads the tokenizer from the model path or a separately specified tokenizer path. The tokenizer is cached after first build.

Parameters: force_rebuild – Force rebuilding even if tokenizer is already cached. Useful when switching between different tokenizers.
Returns: AutoTokenizer instance configured for the model
Raises: ValueError – If tokenizer path cannot be determined

Example

>>> tokenizer = elm.build_tokenizer()
>>> tokens = tokenizer("Hello world", return_tensors="np")

Build a trainer instance with the configured settings.

Creates and configures a trainer based on the trainer_type setting. Automatically builds required models and datasets if not provided.

Parameters

train_dataset – Training dataset (Dataset or ShardedDataSource). If None, builds from mixture config using get_train_source().
eval_dataset – Evaluation dataset for validation metrics.
reference_model – Reference model for DPO/ORPO. If None, builds from reference_model configuration if present.
reward_model – Reward model for GRPO. If None, builds from config.
teacher_model – Teacher model for distillation. If None, builds from teacher_model configuration if present.
reward_funcs – Custom reward functions for GRPO. Alternative to reward_model.
base_state_class – Custom EasyDeLState class for model state management.
args_class – Custom TrainingArguments class. Auto-selected if None.
trainer_class – Custom Trainer class. Auto-selected if None.
**kwargs – Additional trainer configuration overrides

Returns

Configured trainer instance ready for training

Raises

ValueError – If required models or datasets are not configured

Example

>>> # Build trainer with auto-configuration
>>> trainer = elm.build_trainer()
>>>
>>> # Build trainer with custom dataset
>>> custom_data = load_dataset("custom_data")
>>> trainer = elm.build_trainer(train_dataset=custom_data)
>>>
>>> # Build DPO trainer with custom reference model
>>> ref_model = elm.build_reference_model()
>>> trainer = elm.build_trainer(
...     trainer_type="dpo",
...     reference_model=ref_model
... )

build_training_arguments(args_class: easydel.trainers.training_configurations.TrainingArguments | None = None, **overrides)[source]#

Build TrainingArguments for the configured trainer.

Parameters

args_class – Optional custom TrainingArguments class. If not provided, will automatically select based on trainer_type.
**overrides – Override specific configuration values

Returns

TrainingArguments instance for the configured trainer type (e.g., DPOConfig for DPO training, SFTConfig for SFT)

clear_cache() → None[source]#

Clear cached model, tokenizer, and inference engine instances.

This is useful when you want to reload models with different configurations or free memory after model operations.

property config: ELMConfig#

Get the normalized configuration dictionary.

Returns: The full ELM configuration including model, loader, sharding, quantization, training, and inference settings.

eval(tasks: str | list[str], engine: Union[Literal['esurge', 'auto'], Any] = 'auto', num_fewshot: int = 0, output_path: str | None = None) → dict[str, Any][source]#

Run evaluation on specified tasks using lm-evaluation-harness.

This method provides a unified interface for evaluating models using the eSurge engine with the lm-evaluation-harness framework.

Parameters

tasks – Task name(s) to evaluate on. Can be a single task string or list of tasks. Common tasks include:
    - Language understanding: “hellaswag”, “winogrande”, “piqa”, “arc_easy”, “arc_challenge”
    - Math: “gsm8k”, “math”, “minerva_math”
    - Knowledge: “mmlu”, “triviaqa”, “naturalquestions”
    - Reasoning: “bbh”, “boolq”, “copa”
    - Truthfulness: “truthfulqa_mc1”, “truthfulqa_mc2”
    - Coding: “humaneval”, “mbpp”
    Full list: EleutherAI/lm-evaluation-harness
engine – Inference engine to use. Options:
    - “esurge”: Use eSurge engine (high throughput)
    - “auto”: Automatically select based on configuration (default)
    - An existing eSurge instance for custom configuration
num_fewshot – Number of few-shot examples to use (default: 0 for zero-shot). Different tasks may have different recommended values:
    - MMLU: typically 5-shot
    - GSM8K: typically 8-shot
    - HellaSwag: typically 0-shot
output_path – Optional path to save evaluation results as JSON. Results include detailed metrics, task versions, and configuration.

Returns

Dictionary containing evaluation results with structure:

{
    "results": {
        task_name: {
            metric_name: value,  # e.g., "acc": 0.85, "acc_stderr": 0.02
            ...
        }
    },
    "versions": {task_name: version_string, ...},
    "config": {"model": ..., "num_fewshot": ..., ...}
}

Return type

dict[str, Any]

Example

Basic zero-shot evaluation:

>>> elm = eLargeModel.from_pretrained("meta-llama/Llama-2-7b")
>>> results = elm.eval("hellaswag")
>>> print(f"HellaSwag accuracy: {results['results']['hellaswag']['acc']:.2%}")

Few-shot evaluation with multiple tasks:

>>> elm.set_esurge(max_num_seqs=64, hbm_utilization=0.9)
>>> results = elm.eval(
...     ["gsm8k", "mmlu", "truthfulqa_mc1"],
...     engine="esurge",
...     num_fewshot=5,
...     output_path="eval_results.json"
... )
>>> for task, metrics in results["results"].items():
...     print(f"{task}: {metrics.get('acc', metrics.get('exact_match')):.2%}")

Evaluation with custom settings:

>>> elm.set_eval(
...     max_new_tokens=512,
...     temperature=0.0,  # Greedy decoding
...     batch_size=32
... )
>>> results = elm.eval(["humaneval", "mbpp"])

Raises

ImportError – If lm-eval is not installed (install with: pip install lm-eval)
ValueError – If invalid engine type or model not configured
RuntimeError – If evaluation fails during execution

Note

The evaluation uses settings from set_eval() for generation parameters. Default settings are optimized for deterministic evaluation (temperature=0).
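
To work with the returned dictionary without running an evaluation, the sketch below walks a hand-written sample that follows the structure documented under Returns. The numbers are made up purely to show the shape and are not measured results; primary_metric is a hypothetical helper, and the exact metric key names depend on the installed lm-evaluation-harness version.

```python
from typing import Any, Dict, Mapping, Optional, Sequence

# Made-up values that only illustrate the documented structure.
sample_results: Dict[str, Any] = {
    "results": {
        "hellaswag": {"acc": 0.5, "acc_stderr": 0.01},
        "gsm8k": {"exact_match": 0.25},
    },
    "versions": {"hellaswag": "1", "gsm8k": "3"},
    "config": {"num_fewshot": 0},
}


def primary_metric(
    metrics: Mapping[str, float],
    preferred: Sequence[str] = ("acc", "exact_match"),
) -> Optional[float]:
    """Return the first preferred metric that is present, else None."""
    for name in preferred:
        if name in metrics:
            return metrics[name]
    return None


summary = {task: primary_metric(m) for task, m in sample_results["results"].items()}
for task, value in summary.items():
    # Guard against tasks that report neither metric before formatting.
    print(f"{task}: {value:.2%}" if value is not None else f"{task}: n/a")
```

Checking for None before formatting avoids the TypeError that a bare `:.2%` format would raise for a task that reports neither metric.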

classmethod from_json(json_path: str | os.PathLike | eformer.paths.GCSPath | eformer.paths.LocalPath | eformer.paths.MLUtilPath) → eLargeModel[source]#

Create eLargeModel from JSON configuration file.

Parameters: json_path – Path to JSON configuration file
Returns: eLargeModel instance

classmethod from_pretrained(model_name_or_path: str, task: easydel.infra.factory.TaskType | str | None = None, **kwargs) → eLargeModel[source]#

Create eLargeModel from pretrained model name or path.

Parameters

model_name_or_path – HuggingFace model ID or local path
task – Optional task type (auto-detected if not provided or AUTO_BIND)
**kwargs – Additional configuration options

Returns

eLargeModel instance with configuration

get_base_config(prefer: str = 'base') → dict[str, Any][source]#

Get materialized base configuration.

Resolves the configuration hierarchy, materializing shared base settings across different configuration sections.

Parameters: prefer – Resolution preference when conflicts exist: - “base”: Prefer values from base configuration - “section”: Prefer values from specific sections
Returns: Base configuration dictionary with resolved values

Example

>>> # Get configuration with base values taking precedence
>>> base_config = elm.get_base_config(prefer="base")
>>> print(base_config["dtype"])  # Shows the base dtype setting

get_data_mixture_kwargs() → dict[str, Any][source]#

Get kwargs for DatasetMixture initialization.

Extracts and formats the mixture configuration for use with the DatasetMixture class.

Returns: Dictionary of DatasetMixture arguments including informs, batch_size, streaming settings, and other mixture options

get_esurge_kwargs() → dict[str, Any][source]#

Get kwargs for eSurge initialization.

Extracts and formats the configuration options for creating an eSurge engine instance.

Returns: Dictionary of eSurge arguments including max_model_len, max_num_seqs, hbm_utilization, and other engine settings

Example

>>> kwargs = elm.get_esurge_kwargs()
>>> # Can be used directly:
>>> from easydel.inference import eSurge
>>> engine = eSurge(model, **kwargs)

get_from_pretrained_kwargs() → dict[str, Any][source]#

Get kwargs for model.from_pretrained() calls.

Extracts and formats the configuration options that should be passed to the model’s from_pretrained() method, including dtype, sharding, and quantization settings.

Returns: Dictionary of from_pretrained arguments ready to use with EasyDeL model loading functions

Example

>>> kwargs = elm.get_from_pretrained_kwargs()
>>> # Can be used directly:
>>> model = LlamaForCausalLM.from_pretrained(
...     "meta-llama/Llama-2-7b",
...     **kwargs
... )

get_train_source() → ShardedDataSource | Dataset | None[source]#

Get training data as ShardedDataSource or Dataset.

Automatically selects the appropriate data format based on the use_sharded_source configuration option.

Returns: ShardedDataSource if use_sharded_source=True in mixture config, otherwise HuggingFace Dataset. Returns None if no mixture configured.

Example

>>> elm = eLargeModel()
>>> elm.add_dataset("train.json", dataset_type="json")
>>> elm.set_mixture(use_sharded_source=True)  # Use new pipeline
>>> data = elm.get_train_source()  # Returns ShardedDataSource
>>>
>>> elm.set_mixture(use_sharded_source=False)  # Use legacy pipeline
>>> data = elm.get_train_source()  # Returns HF Dataset

get_trainer_config() → dict[str, Any][source]#

Get normalized trainer configuration.

This method processes the raw trainer configuration and applies defaults and normalization for the specified trainer type.

Returns: Normalized trainer configuration dictionary with all required fields populated with defaults where necessary.

property model_name: str#

Get the model name or path.

Returns: The HuggingFace model ID or local path to the model.

property reference_model_name: str | None#

Get the reference model name or path for DPO/ORPO.

Returns: The reference model path if configured, None otherwise.

set_dtype(dtype: str) → eLargeModel[source]#

Set the data type for model loading.

Configures both the computation dtype and parameter dtype for the model. This affects memory usage and computation speed.

Parameters: dtype – Data type string. Supported values:
    - “bf16”: BFloat16 (recommended for TPU, modern GPUs)
    - “fp16”: Float16 (good for older GPUs)
    - “fp32”: Float32 (highest precision, most memory)
    - “fp8”: Float8 (experimental, requires compatible hardware)
Returns: Self for method chaining

Example

>>> elm.set_dtype("bf16")  # Use bfloat16 for training/inference
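
The effect of dtype on memory usage is simple arithmetic: parameter count times bytes per parameter. The sketch below uses the standard storage widths of the four dtype strings listed above; weight_memory_gib is a hypothetical helper, the 7-billion parameter count is a round illustrative figure, and the estimate covers weights only (no activations, optimizer state, or KV cache).

```python
# Storage width in bytes for each dtype string listed above.
BYTES_PER_PARAM = {"fp32": 4, "bf16": 2, "fp16": 2, "fp8": 1}


def weight_memory_gib(num_params: int, dtype: str) -> float:
    """Estimate the memory needed to hold the weights alone, in GiB."""
    if dtype not in BYTES_PER_PARAM:
        raise ValueError(f"Unsupported dtype string: {dtype!r}")
    return num_params * BYTES_PER_PARAM[dtype] / 2**30


for dtype in ("fp32", "bf16", "fp8"):
    print(f"{dtype}: {weight_memory_gib(7_000_000_000, dtype):.1f} GiB")
```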

set_esurge(max_model_len: int | None = None, max_num_seqs: int = 16, hbm_utilization: float = 0.85, **kwargs) → eLargeModel[source]#

Configure eSurge inference settings.

eSurge is a high-performance batch inference engine optimized for throughput. It uses PagedAttention for efficient memory management.

Parameters

max_model_len – Maximum model sequence length (input + output tokens). If None, uses model’s default max position embeddings.
max_num_seqs – Maximum number of sequences to process concurrently. Higher values increase throughput but require more memory.
hbm_utilization – HBM memory utilization ratio (0.0-1.0). Controls how much device memory to use for KV cache.
**kwargs – Additional eSurge options:
    - page_size: PagedAttention page size (default: 128)
    - enable_prefix_caching: Enable prefix caching optimization
    - kv_cache_dtype: Dtype for KV cache (None = auto)
    - decoding_engine: “ring” or “triton” (default: auto)

Returns

Self for method chaining

Example

>>> elm.set_esurge(
...     max_model_len=8192,
...     max_num_seqs=64,
...     hbm_utilization=0.9,
...     enable_prefix_caching=True
... )
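
To get a feel for how hbm_utilization and page_size interact, the following back-of-the-envelope sketch estimates how many KV-cache pages fit on a device. It is not eSurge's allocator: it assumes hbm_utilization caps total device memory use and that whatever remains after the weights goes to the KV cache. Both function names are hypothetical, and the memory sizes are round numbers chosen for easy arithmetic.

```python
def kv_bytes_per_token(num_layers: int, num_kv_heads: int, head_dim: int, dtype_bytes: int) -> int:
    """Bytes of KV cache that one token occupies across all layers."""
    # Factor 2: one key vector and one value vector per layer and head.
    return 2 * num_layers * num_kv_heads * head_dim * dtype_bytes


def estimate_num_pages(
    device_bytes: int,
    weight_bytes: int,
    hbm_utilization: float,
    page_size: int,
    bytes_per_token: int,
) -> int:
    """Estimate how many fixed-size KV pages fit in the remaining budget."""
    if not 0.0 < hbm_utilization <= 1.0:
        raise ValueError("hbm_utilization must be in (0.0, 1.0]")
    budget = int(device_bytes * hbm_utilization) - weight_bytes
    if budget <= 0:
        return 0
    return budget // (page_size * bytes_per_token)


# 32 layers, 32 KV heads of dimension 128, 2-byte (bf16/fp16) cache entries.
per_token = kv_bytes_per_token(num_layers=32, num_kv_heads=32, head_dim=128, dtype_bytes=2)
pages = estimate_num_pages(
    device_bytes=16 * 2**30,
    weight_bytes=4 * 2**30,
    hbm_utilization=0.5,
    page_size=128,
    bytes_per_token=per_token,
)
print(per_token, pages, pages * 128)  # bytes per token, pages, cacheable tokens
```

Under these assumptions, raising hbm_utilization or lowering max_model_len both increase the number of sequences whose cache fits at once.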

set_eval(max_new_tokens: int = 2048, temperature: float = 0.0, top_p: float = 0.95, batch_size: int | None = None, use_tqdm: bool = True, **kwargs) → eLargeModel[source]#

Configure evaluation settings for lm-evaluation-harness.

Sets default parameters for model evaluation on standard benchmarks. These settings apply when using the eval() method.

Parameters

max_new_tokens – Maximum tokens to generate per evaluation sample (default: 2048). Lower values speed up evaluation.
temperature – Sampling temperature (default: 0.0 for greedy decoding). 0.0 = deterministic/greedy, higher = more random.
top_p – Top-p (nucleus) sampling parameter (default: 0.95). Only used when temperature > 0.
batch_size – Evaluation batch size (default: engine-specific). Higher values increase throughput but use more memory.
use_tqdm – Show progress bar during evaluation (default: True)
**kwargs – Additional evaluation options:
    - top_k: Top-k sampling parameter
    - repetition_penalty: Penalty for repeated tokens
    - num_beams: Beam search width (1 = greedy)
    - do_sample: Whether to use sampling
    - early_stopping: Stop generation at first EOS

Returns

Self for method chaining

Example

>>> # Configure for deterministic evaluation
>>> elm.set_eval(
...     max_new_tokens=512,
...     temperature=0.0,
...     batch_size=64
... )
>>>
>>> # Configure for sampling-based evaluation
>>> elm.set_eval(
...     temperature=0.7,
...     top_p=0.9,
...     top_k=50
... )
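
The relationship between temperature and top_p described above (greedy at 0.0, nucleus filtering only when sampling) can be illustrated with plain Python. This is a generic sketch of temperature scaling and top-p filtering, not the eSurge sampler; softmax, nucleus, and pick_token are hypothetical helper names.

```python
import math
import random
from typing import List, Sequence


def softmax(logits: Sequence[float], temperature: float) -> List[float]:
    """Temperature-scaled softmax, shifted by the max for stability."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def nucleus(probs: Sequence[float], top_p: float) -> List[int]:
    """Indices of the smallest high-probability set whose mass reaches top_p."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept: List[int] = []
    mass = 0.0
    for i in order:
        kept.append(i)
        mass += probs[i]
        if mass >= top_p:
            break
    return kept


def pick_token(logits: Sequence[float], temperature: float, top_p: float, rng: random.Random) -> int:
    """Greedy argmax at temperature 0, otherwise sample inside the nucleus."""
    if temperature == 0.0:
        return max(range(len(logits)), key=lambda i: logits[i])
    probs = softmax(logits, temperature)
    kept = nucleus(probs, top_p)
    return rng.choices(kept, weights=[probs[i] for i in kept], k=1)[0]


print(pick_token([1.0, 3.0, 2.0], temperature=0.0, top_p=0.95, rng=random.Random(0)))
```

With temperature=0.0 the top_p argument is never consulted, which matches the note that top_p is only used when temperature > 0.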

set_mixture(informs: list[dict] | None = None, batch_size: int = 32, streaming: bool = True, use_fast_loader: bool = True, **kwargs) → eLargeModel[source]#

Configure data mixture settings for training/evaluation.

Sets up a mixture of datasets that can be combined and sampled from during training. Supports multiple data sources and formats.

Parameters

informs – List of dataset configurations. Each dict should contain:
    - type: Dataset type (“json”, “parquet”, “csv”, “text”, or HF dataset ID)
    - data_files: Path or pattern to data files
    - content_field: Field name containing the text content
    - split: Dataset split to use (default: “train”)
    - weight: Sampling weight for this dataset (optional)
batch_size – Batch size for data loading (default: 32)
streaming – Use streaming mode for large datasets (default: True). Reduces memory usage but may be slower.
use_fast_loader – Enable fast loading with fsspec (default: True). Provides optimized loading for remote/cloud storage.
**kwargs – Additional mixture options:
    - max_samples: Maximum samples per dataset
    - shuffle: Whether to shuffle data
    - seed: Random seed for shuffling

Returns

Self for method chaining

Example

>>> elm.set_mixture(
...     informs=[
...         {"type": "json", "data_files": "train.json", "content_field": "text", "weight": 0.7},
...         {"type": "parquet", "data_files": "valid/*.parquet", "content_field": "content", "weight": 0.3}
...     ],
...     batch_size=32,
...     streaming=True,
...     shuffle=True,
...     seed=42
... )
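
One way to read the weight and seed options is as per-example sampling probabilities drawn from a seeded generator. Whether the mixture actually samples this way is an assumption; the sketch below only demonstrates the idea with the standard library, and normalize_weights and sample_sources are hypothetical names.

```python
import random
from collections import Counter
from typing import List, Sequence


def normalize_weights(weights: Sequence[float]) -> List[float]:
    """Scale non-negative weights so they sum to 1."""
    total = sum(weights)
    if total <= 0:
        raise ValueError("at least one weight must be positive")
    return [w / total for w in weights]


def sample_sources(names: Sequence[str], weights: Sequence[float], num_draws: int, seed: int) -> List[str]:
    """Choose which source each of num_draws examples comes from."""
    rng = random.Random(seed)  # a fixed seed makes the draw order reproducible
    return rng.choices(list(names), weights=normalize_weights(weights), k=num_draws)


names = ["train.json", "valid/*.parquet"]
draws = sample_sources(names, [0.7, 0.3], num_draws=1000, seed=42)
print(Counter(draws))
```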

set_model(model_name_or_path: str) → eLargeModel[source]#

Set the model name or path.

Updates the primary model configuration. This will clear any cached model instance to ensure the new model is loaded on next build.

Parameters: model_name_or_path – HuggingFace model ID (e.g., “meta-llama/Llama-2-7b”) or local path to model directory
Returns: Self for method chaining

Example

>>> elm.set_model("meta-llama/Llama-2-7b-hf")
>>> elm.set_model("/path/to/local/model")

set_operation_configs(configs: collections.abc.Mapping[str, Any] | None = None, **kwargs) → eLargeModel[source]#

Configure ejkernel operation overrides.

Allows overriding ejkernel’s autotune behavior for specific attention operations by providing explicit configuration objects. When a config is provided, it’s passed directly to the operation instead of using ejkernel’s autotune.

Parameters

configs – Dictionary mapping operation names to config objects. Valid operation names (must match OperationRegistry):
    - “flash_attn2”: Flash attention 2
    - “ring”: Ring attention
    - “blocksparse”: Block sparse attention
    - “ragged_page_attention_v2”: Ragged page attention v2
    - “ragged_page_attention_v3”: Ragged page attention v3
    - “sdpa”: Scaled dot product attention
    - “vanilla”: Vanilla attention
**kwargs – Individual operation configs as keyword arguments. These are merged with the configs dict.

Returns

Self for method chaining

Example

>>> from easydel import FlashAttentionConfig, RingAttentionConfig
>>>
>>> # Using dict
>>> elm.set_operation_configs({
...     "flash_attn2": FlashAttentionConfig(platform="triton"),
...     "ring": RingAttentionConfig(),
... })
>>>
>>> # Using kwargs
>>> elm.set_operation_configs(
...     flash_attn2=FlashAttentionConfig(platform="pallas"),
... )
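
The merging of the configs dict with keyword arguments can be sketched as follows. This is an illustration rather than the library's code: merge_operation_configs is a hypothetical name, the rule that keyword arguments win on a conflict is an assumption, and the plain strings stand in for real config objects.

```python
from typing import Any, Dict, Mapping, Optional

# Operation names copied from the list above.
VALID_OPERATIONS = frozenset(
    {
        "flash_attn2",
        "ring",
        "blocksparse",
        "ragged_page_attention_v2",
        "ragged_page_attention_v3",
        "sdpa",
        "vanilla",
    }
)


def merge_operation_configs(configs: Optional[Mapping[str, Any]] = None, **kwargs: Any) -> Dict[str, Any]:
    """Combine a configs mapping with keyword overrides and validate names."""
    merged = dict(configs or {})
    merged.update(kwargs)  # assumption: keyword arguments win on conflict
    unknown = sorted(set(merged) - VALID_OPERATIONS)
    if unknown:
        raise ValueError(f"Unknown operation name(s): {unknown}")
    return merged


print(merge_operation_configs({"ring": "ring-config"}, flash_attn2="flash-config"))
```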

set_quantization(method: str | None = None, block_size: int = 128, **kwargs) → eLargeModel[source]#

Configure quantization settings.

Enables model quantization to reduce memory usage and potentially improve inference speed at the cost of some accuracy.

Parameters

method – Quantization method.
block_size – Quantization block size (default: 128). Smaller blocks = better accuracy but more overhead.
**kwargs – Additional quantization options:
    - platform: Target platform (“cpu”, “cuda”, “tpu”)
    - compute_dtype: Dtype for computation (e.g., “fp16”)
    - double_quant: Enable double quantization for 4bit

Returns

Self for method chaining

Example

>>> elm.set_quantization("nf4", block_size=64)
>>> elm.set_quantization("a8bit")
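
The block_size trade-off (smaller blocks give better accuracy but more overhead) can be seen with a toy quantizer. The sketch below uses plain absmax int8 quantization with one scale per block; it is not NF4 or any scheme EasyDeL necessarily implements, and all function names and sample values are hypothetical.

```python
from typing import List, Sequence, Tuple

Block = Tuple[float, List[int]]


def quantize_blocks(values: Sequence[float], block_size: int) -> List[Block]:
    """Quantize each block to int8 codes with its own absmax scale."""
    blocks: List[Block] = []
    for start in range(0, len(values), block_size):
        block = values[start:start + block_size]
        # Fall back to 1.0 for an all-zero block to avoid dividing by zero.
        scale = max(abs(v) for v in block) / 127 or 1.0
        blocks.append((scale, [round(v / scale) for v in block]))
    return blocks


def dequantize_blocks(blocks: Sequence[Block]) -> List[float]:
    """Rebuild approximate values from scales and integer codes."""
    return [scale * code for scale, codes in blocks for code in codes]


def mean_abs_error(values: Sequence[float], block_size: int) -> float:
    restored = dequantize_blocks(quantize_blocks(values, block_size))
    return sum(abs(a - b) for a, b in zip(values, restored)) / len(values)


# Small and large magnitudes side by side, so a shared scale hurts the small ones.
values = [0.01, -0.02, 0.015, 0.005, 8.0, -6.0, 7.5, 3.0]
for block_size in (8, 4):
    num_scales = len(quantize_blocks(values, block_size))
    print(block_size, num_scales, mean_abs_error(values, block_size))
```

With one block of 8, the small values share a scale set by 8.0 and all round to zero; with blocks of 4 they get their own scale, at the cost of storing twice as many scales.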

set_reference_model(model_name_or_path: str) → eLargeModel[source]#

Set the reference model name or path for preference optimization.

Configures a reference model used in DPO (Direct Preference Optimization) and similar preference-based training methods. The reference model provides a baseline for computing preference losses.

Parameters: model_name_or_path – HuggingFace model ID or local path for reference model. Often the same as the base model before fine-tuning.
Returns: Self for method chaining

Example

>>> elm.set_model("meta-llama/Llama-2-7b-hf")  # Model to train
>>> elm.set_reference_model("meta-llama/Llama-2-7b-hf")  # Reference
>>> elm.set_trainer("dpo", beta=0.1)
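
To make the role of the reference model and of beta concrete, here is the standard sigmoid DPO loss for a single preference pair, written in plain Python. It follows the published DPO formulation and is not taken from EasyDeL's trainer; dpo_sigmoid_loss and the sample log-probabilities are illustrative.

```python
import math


def _softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def dpo_sigmoid_loss(
    policy_chosen_logp: float,
    policy_rejected_logp: float,
    ref_chosen_logp: float,
    ref_rejected_logp: float,
    beta: float,
) -> float:
    """-log sigmoid(beta * (chosen margin - rejected margin)) for one pair."""
    # Each margin measures how far the policy moved away from the reference.
    chosen_margin = policy_chosen_logp - ref_chosen_logp
    rejected_margin = policy_rejected_logp - ref_rejected_logp
    logit = beta * (chosen_margin - rejected_margin)
    return _softplus(-logit)  # identical to -log(sigmoid(logit))


# A policy identical to the reference sits exactly at log(2).
print(dpo_sigmoid_loss(-2.0, -2.0, -2.0, -2.0, beta=0.1))
# A policy that raised the chosen answer and lowered the rejected one scores lower.
print(dpo_sigmoid_loss(-1.0, -5.0, -2.0, -2.0, beta=0.1))
```

A larger beta makes the same log-probability shift move the loss further, which is why it acts as the strength of the pull toward (or away from) the reference.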

set_sharding(axis_dims: tuple[int, ...] | None = None, axis_names: tuple[str, ...] | None = None, **kwargs) → eLargeModel[source]#

Configure model sharding for distributed training/inference.

Sets up model parallelism by specifying how to shard model parameters and computations across devices. Essential for training large models that don’t fit on a single device.

Parameters

axis_dims – Sharding axis dimensions as a tuple. Common patterns:
    - (1, 1, 1, -1): Data parallel only
    - (2, 1, 1, -1): 2-way tensor parallel
    - (1, 2, 1, -1): 2-way pipeline parallel
    - (2, 2, 1, -1): 2-way tensor + 2-way pipeline parallel
axis_names – Sharding axis names (e.g., (“dp”, “tp”, “pp”, “sp”)):
    - dp: Data parallel
    - tp: Tensor parallel
    - pp: Pipeline parallel
    - sp: Sequence parallel
**kwargs – Additional sharding options.

Returns

Self for method chaining

Example

>>> # 2-way tensor parallel, 2-way data parallel
>>> elm.set_sharding(
...     axis_dims=(2, 2, 1, -1),
...     axis_names=("dp", "tp", "pp", "sp")
... )
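
The -1 entry in axis_dims reads like the -1 in an array reshape: a placeholder that absorbs whatever devices remain after the fixed axes are accounted for. That interpretation is an assumption here, and resolve_axis_dims is a hypothetical helper that only does the arithmetic, without touching any device mesh.

```python
import math
from typing import Tuple


def resolve_axis_dims(axis_dims: Tuple[int, ...], num_devices: int) -> Tuple[int, ...]:
    """Replace a single -1 with the device count left over by the other axes."""
    if axis_dims.count(-1) > 1:
        raise ValueError("at most one axis may be -1")
    known = math.prod(d for d in axis_dims if d != -1)
    if -1 not in axis_dims:
        if known != num_devices:
            raise ValueError("axis_dims do not multiply to the device count")
        return tuple(axis_dims)
    if known <= 0 or num_devices % known != 0:
        raise ValueError("fixed axes do not evenly divide the device count")
    return tuple(num_devices // known if d == -1 else d for d in axis_dims)


print(resolve_axis_dims((2, 2, 1, -1), num_devices=8))
print(resolve_axis_dims((1, 1, 1, -1), num_devices=8))
```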

set_teacher_model(model_name_or_path: str) → eLargeModel[source]#

Set the teacher model name or path for distillation training.

Configures a teacher model used for knowledge distillation. The teacher model is typically a larger, more capable model that guides the training of the student (primary) model.

Parameters: model_name_or_path – HuggingFace model ID or local path for teacher model. Should be a model compatible with the student model’s architecture.
Returns: Self for method chaining

Example

>>> elm.set_model("meta-llama/Llama-2-7b")  # Student model
>>> elm.set_teacher_model("meta-llama/Llama-2-13b")  # Teacher model
>>> elm.set_trainer("distillation", temperature=3.0)
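
The temperature and alpha options correspond to the two knobs of classic knowledge distillation: temperature softens both distributions, and alpha balances the soft teacher-matching term against the ordinary hard-label term. The sketch below implements that textbook formulation for one example; how EasyDeL combines the terms internally is not documented on this page, so treat the weighting and the temperature-squared factor as assumptions.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float], temperature: float) -> List[float]:
    """Temperature-scaled softmax, shifted by the max for stability."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """KL(p || q) for two discrete distributions over the same support."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def distillation_loss(
    student_logits: Sequence[float],
    teacher_logits: Sequence[float],
    label: int,
    temperature: float,
    alpha: float,
) -> float:
    """alpha * T^2 * KL(teacher || student) + (1 - alpha) * cross-entropy."""
    soft_teacher = softmax(teacher_logits, temperature)
    soft_student = softmax(student_logits, temperature)
    soft_term = temperature**2 * kl_divergence(soft_teacher, soft_student)
    hard_term = -math.log(softmax(student_logits, 1.0)[label])
    return alpha * soft_term + (1.0 - alpha) * hard_term


print(distillation_loss([0.0, 0.0, 0.0], [2.0, 0.5, -1.0], label=0, temperature=3.0, alpha=0.5))
```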

set_trainer(trainer_type: str, **kwargs) → eLargeModel[source]#

Configure trainer settings.

Sets the training paradigm and associated hyperparameters.

Parameters

trainer_type – Type of trainer to use:
    - “sft”: Supervised Fine-Tuning
    - “dpo”: Direct Preference Optimization
    - “orpo”: Odds Ratio Preference Optimization
    - “grpo”: Group Relative Policy Optimization
    - “reward”: Reward model training
    - “distillation”: Knowledge distillation
    - “base”: Basic trainer for custom training loops
**kwargs – Trainer-specific configuration options.
    Common options:
    - learning_rate: Learning rate (default: 5e-5)
    - num_train_epochs: Number of training epochs
    - per_device_train_batch_size: Batch size per device
    - gradient_accumulation_steps: Gradient accumulation steps
    - warmup_steps: Number of warmup steps
    - output_dir: Directory to save checkpoints
    DPO-specific:
    - beta: KL regularization coefficient
    - loss_type: “sigmoid”, “ipo”, “hinge”
    Distillation-specific:
    - temperature: Distillation temperature
    - alpha: Weight for distillation loss

Returns

Self for method chaining

Example

>>> # SFT training
>>> elm.set_trainer(
...     "sft",
...     learning_rate=2e-5,
...     num_train_epochs=3,
...     per_device_train_batch_size=4
... )
>>>
>>> # DPO training
>>> elm.set_trainer(
...     "dpo",
...     beta=0.1,
...     learning_rate=1e-6
... )

property task: TaskType#

Get the resolved task type.

Returns: The task type (e.g., TaskType.CAUSAL_LM) either explicitly configured or auto-detected from the model.

property teacher_model_name: str | None#

Get the teacher model name or path for distillation.

Returns: The teacher model path if configured, None otherwise.

to_dict() → dict[str, Any][source]#

Get configuration as dictionary.

Returns a copy of the full configuration dictionary that can be modified without affecting the eLargeModel instance.

Returns: Configuration dictionary with all settings

Example

>>> config_dict = elm.to_dict()
>>> print(config_dict["model"]["name_or_path"])
>>> # Modify the dict without affecting elm
>>> config_dict["model"]["dtype"] = "fp16"

to_json(json_path: str | os.PathLike | eformer.paths.GCSPath | eformer.paths.LocalPath | eformer.paths.MLUtilPath) → None[source]#

Save configuration to JSON file.

Exports the current configuration to a JSON file that can be loaded later with from_json() or shared with others.

Parameters: json_path – Path where the JSON configuration file will be saved. Will create parent directories if they don’t exist.

Example

>>> elm.to_json("config.json")
>>> # Later or on another machine:
>>> elm2 = eLargeModel.from_json("config.json")
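
The save-then-reload round trip, including creation of missing parent directories, looks like this with the standard library alone. It is a local-filesystem sketch only (the GCS path types in the signature are not covered), and save_config and load_config are hypothetical stand-ins for to_json() and from_json().

```python
import json
import tempfile
from pathlib import Path
from typing import Any, Dict, Union


def save_config(config: Dict[str, Any], json_path: Union[str, Path]) -> None:
    """Write the config as JSON, creating parent directories if needed."""
    path = Path(json_path)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(config, indent=2, sort_keys=True), encoding="utf-8")


def load_config(json_path: Union[str, Path]) -> Dict[str, Any]:
    """Read a config dictionary back from a JSON file."""
    return json.loads(Path(json_path).read_text(encoding="utf-8"))


config = {
    "model": {"name_or_path": "meta-llama/Llama-2-7b"},
    "esurge": {"max_num_seqs": 32},
}

with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "nested" / "dir" / "config.json"
    save_config(config, target)
    restored = load_config(target)

print(restored == config)
```

Only JSON-serializable values survive such a round trip, so objects like callables or config instances would need separate handling.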

Train the model with the configured settings.

This is a high-level convenience method that orchestrates the entire training pipeline:

1. Validates configuration
2. Builds the model if not already built
3. Creates the dataset from mixture configuration if not provided
4. Builds reference/teacher models if needed
5. Creates the appropriate trainer
6. Runs training and returns results

Parameters

train_dataset – Optional training dataset (Dataset or ShardedDataSource). If None, will build from mixture configuration.
eval_dataset – Optional evaluation dataset for validation during training.
base_state_class – Optional custom EasyDeLState class for model state management. Use for custom model implementations.
args_class – Optional custom TrainingArguments class. If None, will auto-select based on trainer_type.
trainer_class – Optional custom Trainer class. If None, will auto-select based on trainer_type.
**build_kwargs – Additional kwargs for trainer building:
    - data_collator: Custom data collator function
    - formatting_func: Function to format examples (SFT)
    - reward_processing_classes: Processing classes for rewards (GRPO)
    - data_tokenize_fn: Custom tokenization function
    - reference_model: Override reference model
    - reward_model: Override reward model
    - teacher_model: Override teacher model
    - reward_funcs: Custom reward functions

Returns

Training results from the trainer, including metrics and final model state

Example

Basic SFT training:

>>> elm = eLargeModel.from_pretrained("meta-llama/Llama-2-7b")
>>> elm.add_dataset("train.json", dataset_type="json")
>>> elm.set_trainer("sft", learning_rate=2e-5, num_train_epochs=3)
>>> results = elm.train()

DPO training with custom datasets:

>>> train_data = load_dataset("preference_data", split="train")
>>> eval_data = load_dataset("preference_data", split="test")
>>> elm.set_trainer("dpo", beta=0.1)
>>> elm.set_reference_model("meta-llama/Llama-2-7b")
>>> results = elm.train(train_dataset=train_data, eval_dataset=eval_data)

Custom trainer with formatting function:

>>> def format_fn(examples):
...     return [f"Question: {q}\nAnswer: {a}"
...             for q, a in zip(examples["question"], examples["answer"])]
>>> results = elm.train(formatting_func=format_fn)

update_config(updates: Mapping[str, Any]) → eLargeModel[source]#

Update configuration with new values.

Performs a deep merge of the updates into the existing configuration, preserving nested structures. The configuration is re-normalized after updating to ensure consistency.

Parameters: updates – Dictionary with configuration updates. Can include nested structures like {"model": {"dtype": "bf16"}, "esurge": {"max_num_seqs": 32}}
Returns: Self for method chaining

Example

>>> elm.update_config({
...     "loader": {"dtype": "bf16"},
...     "esurge": {"max_model_len": 4096}
... })
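
A deep merge differs from dict.update() in that nested sections are combined key by key instead of being replaced wholesale. The sketch below shows one common way to write it; deep_merge is a hypothetical helper and the re-normalization step mentioned above is not modelled.

```python
import copy
from typing import Any, Dict, Mapping


def deep_merge(base: Mapping[str, Any], updates: Mapping[str, Any]) -> Dict[str, Any]:
    """Return a new dict with updates merged recursively into base."""
    merged = copy.deepcopy(dict(base))
    for key, value in updates.items():
        if isinstance(value, Mapping) and isinstance(merged.get(key), dict):
            # Both sides are mappings: merge them instead of overwriting.
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


base = {
    "loader": {"dtype": "fp32", "precision": "high"},
    "esurge": {"max_num_seqs": 16},
}
updates = {"loader": {"dtype": "bf16"}, "esurge": {"max_model_len": 4096}}
merged = deep_merge(base, updates)
print(merged)
```

Note that "precision" and "max_num_seqs" survive the merge, whereas base.update(updates) would have dropped both.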

validate() → None[source]#

Validate the current configuration.

Checks that all required fields are present and have valid values. This is automatically called before training or building engines.

Raises: ValueError – If configuration is invalid (e.g., missing model name, invalid dtype, incompatible settings)
