easydel.infra.elarge_model.elarge_model#
eLargeModel - Easy Large Models master class for EasyDeL.
This module provides a unified interface for working with large language models in the EasyDeL framework, combining configuration management, model building, training orchestration, and inference engine initialization.
- Key Features:
Unified configuration management for models, training, and inference
Automatic model and tokenizer initialization from HuggingFace or local paths
Support for multiple training paradigms (SFT, DPO, ORPO, GRPO, distillation)
Integration with the eSurge inference engine
Built-in evaluation with lm-evaluation-harness
Flexible dataset mixture configuration
Model sharding and quantization support
- class easydel.infra.elarge_model.elarge_model.BuildTrainerKws[source]#
Bases: TypedDict
Type hints for optional keyword arguments when building trainers.
- data_collator#
Custom data collator for batching examples
- Type
Callable
- formatting_func#
Function to format examples for SFT training
- Type
Callable
- reward_processing_classes#
Processing classes for reward models in GRPO
- Type
list[Callable]
- data_tokenize_fn#
Custom tokenization function for data preprocessing
- Type
Callable
- reference_model#
Reference model for DPO/preference optimization
- Type
EasyDeLBaseModule | None
- reward_model#
Reward model for GRPO training
- Type
EasyDeLBaseModule | None
- teacher_model#
Teacher model for distillation training
- Type
EasyDeLBaseModule | None
- reward_funcs#
Custom reward functions for GRPO
- Type
Any | None
- data_collator: Callable#
- data_tokenize_fn: Callable#
- formatting_func: Callable#
- reference_model: easydel.infra.base_module.EasyDeLBaseModule | None#
- reward_model: easydel.infra.base_module.EasyDeLBaseModule | None#
- reward_processing_classes: list[Callable]#
- teacher_model: easydel.infra.base_module.EasyDeLBaseModule | None#
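Because these keys are described as optional, a caller can pass only the arguments its trainer needs. A minimal sketch of that pattern follows, assuming the TypedDict is declared with `total=False`; `DemoTrainerKws` and `demo_train` are hypothetical stand-ins, not EasyDeL classes.

```python
from __future__ import annotations

from typing import Any, Callable, TypedDict

try:  # typing.Unpack exists on Python 3.11+
    from typing import Unpack
except ImportError:  # older interpreters need the typing_extensions backport
    from typing_extensions import Unpack


class DemoTrainerKws(TypedDict, total=False):
    """Hypothetical stand-in mirroring a few BuildTrainerKws keys."""

    data_collator: Callable
    formatting_func: Callable
    reward_funcs: Any | None


def demo_train(**build_kwargs: Unpack[DemoTrainerKws]) -> list[str]:
    # total=False makes every key optional, so only supplied keys show up here.
    return sorted(build_kwargs)


def format_fn(examples: dict) -> list[str]:
    # Toy formatter used only to have something to pass in.
    return [str(text) for text in examples["text"]]


provided = demo_train(formatting_func=format_fn)
print(provided)
```

A static type checker can then flag misspelled or unknown keyword names at the call site, while the runtime behavior stays that of ordinary `**kwargs`.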
- class easydel.infra.elarge_model.elarge_model.eLargeModel(config: easydel.infra.elarge_model.types.ELMConfig | collections.abc.Mapping[str, Any] | str | os.PathLike | eformer.paths.GCSPath | eformer.paths.LocalPath | eformer.paths.MLUtilPath | None = None)[source]#
Bases: object
Master class for Easy Large Models (ELM) in EasyDeL.
This class provides a unified interface for:
  - Configuration management (load, save, create)
  - Model building and initialization (including teacher/reference models)
  - Training orchestration with multiple paradigms (SFT, DPO, ORPO, etc.)
  - eSurge inference engine integration
  - Tokenizer management
  - Dataset mixture configuration
  - Model evaluation with lm-evaluation-harness
- config#
The normalized ELM configuration dictionary
- model_name#
The model name or path from configuration
- task#
The resolved task type (auto-detected or specified)
- teacher_model_name#
Teacher model name for distillation (if configured)
- reference_model_name#
Reference model name for DPO/ORPO (if configured)
Example
Basic model loading:
>>> elm = eLargeModel({"model": {"name_or_path": "meta-llama/Llama-2-7b"}})
>>> model = elm.build_model()
From pretrained with configuration:
>>> elm = eLargeModel.from_pretrained(
...     "meta-llama/Llama-2-7b",
...     task="causal-lm"
... )
>>> elm.set_dtype("bf16")
>>> elm.set_sharding(axis_dims=(1, 2, 1, -1))
Loading from JSON configuration:
>>> elm = eLargeModel.from_json("config.json")
>>> esurge_engine = elm.build_esurge()
Training with SFT:
>>> elm.set_trainer("sft", learning_rate=2e-5, num_train_epochs=3)
>>> elm.add_dataset("train.json", dataset_type="json", content_field="text")
>>> results = elm.train()
Evaluation:
>>> results = elm.eval(["hellaswag", "mmlu"], engine="esurge")
- add_dataset(data_files: str | list[str], dataset_type: str | None = None, content_field: str = 'content', split: str = 'train', **kwargs) eLargeModel[source]#
Add a dataset to the mixture configuration.
Appends a new dataset to the existing mixture. Multiple datasets can be added and will be combined during training.
- Parameters
data_files – Path(s) to data files. Can be:
  - Single file: "data.json"
  - Multiple files: ["data1.json", "data2.json"]
  - Glob pattern: "data/*.parquet"
  - Remote URL: "https://example.com/data.json"
dataset_type – Dataset type or format. Options:
  - File formats: "json", "jsonl", "parquet", "csv", "text"
  - HuggingFace dataset ID: "imdb", "squad", etc.
  - None: Auto-detect from file extension
content_field – Field name containing the text content (default: "content"). For chat data, this might be "messages" or "conversations".
split – Dataset split to use (default: "train"). Common values: "train", "validation", "test".
**kwargs – Additional dataset options:
  - weight: Sampling weight for this dataset
  - max_samples: Maximum samples to use
  - filter_fn: Function to filter samples
  - map_fn: Function to transform samples
- Returns
Self for method chaining
Example
>>> # Add a JSON dataset
>>> elm.add_dataset("train.json", dataset_type="json", content_field="text")
>>>
>>> # Add a HuggingFace dataset
>>> elm.add_dataset("imdb", dataset_type="imdb", split="train")
>>>
>>> # Add multiple Parquet files with sampling weight
>>> elm.add_dataset(
...     "data/*.parquet",
...     dataset_type="parquet",
...     content_field="content",
...     weight=0.5
... )
- build_dataset()[source]#
Build dataset from mixture configuration.
Creates a dataset from the configured mixture of data sources. Supports multiple formats (JSON, Parquet, CSV) and can combine multiple data sources into a single dataset.
- Returns
The loaded and processed dataset ready for training, or None if no mixture is configured
- Return type
Dataset
Example
>>> elm = eLargeModel()
>>> elm.add_dataset("train.json", dataset_type="json", content_field="text")
>>> elm.add_dataset("valid/*.parquet", dataset_type="parquet", content_field="content")
>>> dataset = elm.build_dataset()
>>> print(f"Dataset size: {len(dataset)}")
- build_esurge() eSurge[source]#
Build the eSurge inference engine.
Creates an eSurge engine instance configured with the current settings. Automatically builds the model if not already built.
- Returns
eSurge instance ready for batch inference
Example
>>> elm.set_esurge(max_num_seqs=32, hbm_utilization=0.9)
>>> engine = elm.build_esurge()
>>> # Use engine for batch inference
>>> results = engine.generate(prompts, max_tokens=100)
- build_model(force_rebuild: bool = False) EasyDeLBaseModule[source]#
Build the EasyDeL model from configuration.
Loads the model using the configured settings including dtype, sharding, and quantization. The model is cached after first build unless force_rebuild is True.
- Parameters
force_rebuild – Force rebuilding even if model is already cached. Useful when configuration has changed.
- Returns
EasyDeLBaseModule instance ready for training or inference
- Raises
ValueError – If model name/path is not set
RuntimeError – If model loading fails
Example
>>> elm = eLargeModel.from_pretrained("meta-llama/Llama-2-7b")
>>> elm.set_dtype("bf16")
>>> model = elm.build_model()
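The cache-then-rebuild behavior described for `force_rebuild` can be illustrated with a small stand-alone sketch. `CachedBuilder` and its `loader` argument are hypothetical and only mimic the documented behavior; they are not EasyDeL's implementation.

```python
from typing import Any, Callable, Optional


class CachedBuilder:
    """Hypothetical stand-in showing a build-once, reuse-afterwards cache."""

    def __init__(self, loader: Callable[[], Any]) -> None:
        self._loader = loader
        self._model: Optional[Any] = None
        self.load_count = 0

    def build_model(self, force_rebuild: bool = False) -> Any:
        # Load only when nothing is cached yet or a rebuild is requested.
        if self._model is None or force_rebuild:
            self._model = self._loader()
            self.load_count += 1
        return self._model

    def clear_cache(self) -> None:
        # Drop the cached instance so the next build loads again.
        self._model = None


builder = CachedBuilder(loader=lambda: object())
first = builder.build_model()
second = builder.build_model()                    # served from the cache
third = builder.build_model(force_rebuild=True)   # loads a fresh instance
```

Under this pattern, changing a setting such as the dtype after the first build has no effect on the cached object until a rebuild is forced or the cache is cleared.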
- build_reference_model() easydel.infra.base_module.EasyDeLBaseModule | None[source]#
Build the reference model for preference optimization (DPO, etc.).
Loads the reference model using the same loader configuration as the primary model. The reference model provides a baseline for computing preference losses in DPO, ORPO, and similar methods.
- Returns
EasyDeLBaseModule instance for the reference model, or None if no reference model is configured
Example
>>> elm.set_reference_model("meta-llama/Llama-2-7b-hf")
>>> reference = elm.build_reference_model()
>>> # Reference model will be used automatically in DPO training
- build_sharded_source() ShardedDataSource | None[source]#
Build dataset as ShardedDataSource for use with new data pipeline.
Creates a ShardedDataSource from the configured mixture of data sources. This uses the new data architecture that supports lazy transforms, efficient streaming, and better integration with trainers.
- Returns
The data source ready for training, or None if no mixture is configured
- Return type
ShardedDataSource
Example
>>> elm = eLargeModel()
>>> elm.add_dataset("train.json", dataset_type="json", content_field="text")
>>> elm.set_mixture(use_sharded_source=True)
>>> source = elm.build_sharded_source()
>>> for batch in source.open_shard(source.shard_names[0]):
...     process(batch)
- build_teacher_model() easydel.infra.base_module.EasyDeLBaseModule | None[source]#
Build the teacher model for distillation training.
Loads the teacher model using the same loader configuration as the student model (dtype, sharding, etc.) but with the teacher model path.
- Returns
EasyDeLBaseModule instance for the teacher model, or None if no teacher model is configured
Example
>>> elm.set_teacher_model("meta-llama/Llama-2-13b")
>>> teacher = elm.build_teacher_model()
>>> # Teacher model will be used automatically in distillation training
- build_tokenizer(force_rebuild: bool = False) AutoTokenizer[source]#
Build or get the tokenizer for the model.
Loads the tokenizer from the model path or a separately specified tokenizer path. The tokenizer is cached after first build.
- Parameters
force_rebuild – Force rebuilding even if tokenizer is already cached. Useful when switching between different tokenizers.
- Returns
AutoTokenizer instance configured for the model
- Raises
ValueError – If tokenizer path cannot be determined
Example
>>> tokenizer = elm.build_tokenizer()
>>> tokens = tokenizer("Hello world", return_tensors="np")
- build_trainer(train_dataset: Dataset | ShardedDataSource | None = None, eval_dataset: Dataset | ShardedDataSource | None = None, reference_model: EasyDeLBaseModule | None = None, reward_model: EasyDeLBaseModule | None = None, teacher_model: EasyDeLBaseModule | None = None, reward_funcs: Any | None = None, base_state_class: type[EasyDeLState] | None = None, args_class: type[TrainingArguments] | None = None, trainer_class: type[Trainer] | None = None, **kwargs) Trainer[source]#
Build a trainer instance with the configured settings.
Creates and configures a trainer based on the trainer_type setting. Automatically builds required models and datasets if not provided.
- Parameters
train_dataset – Training dataset (Dataset or ShardedDataSource). If None, builds from mixture config using get_train_source().
eval_dataset – Evaluation dataset for validation metrics.
reference_model – Reference model for DPO/ORPO. If None, builds from reference_model configuration if present.
reward_model – Reward model for GRPO. If None, builds from config.
teacher_model – Teacher model for distillation. If None, builds from teacher_model configuration if present.
reward_funcs – Custom reward functions for GRPO. Alternative to reward_model.
base_state_class – Custom EasyDeLState class for model state management.
args_class – Custom TrainingArguments class. Auto-selected if None.
trainer_class – Custom Trainer class. Auto-selected if None.
**kwargs – Additional trainer configuration overrides
- Returns
Configured trainer instance ready for training
- Raises
ValueError – If required models or datasets are not configured
Example
>>> # Build trainer with auto-configuration
>>> trainer = elm.build_trainer()
>>>
>>> # Build trainer with custom dataset
>>> from datasets import load_dataset
>>> custom_data = load_dataset("custom_data")
>>> trainer = elm.build_trainer(train_dataset=custom_data)
>>>
>>> # Build DPO trainer with custom reference model
>>> ref_model = elm.build_reference_model()
>>> trainer = elm.build_trainer(
...     trainer_type="dpo",
...     reference_model=ref_model
... )
- build_training_arguments(args_class: easydel.trainers.training_configurations.TrainingArguments | None = None, **overrides)[source]#
Build TrainingArguments for the configured trainer.
- Parameters
args_class – Optional custom TrainingArguments class. If not provided, will automatically select based on trainer_type.
**overrides – Override specific configuration values
- Returns
TrainingArguments instance for the configured trainer type (e.g., DPOConfig for DPO training, SFTConfig for SFT)
- clear_cache() None[source]#
Clear cached model, tokenizer, and inference engine instances.
This is useful when you want to reload models with different configurations or free memory after model operations.
- property config: ELMConfig#
Get the normalized configuration dictionary.
- Returns
The full ELM configuration including model, loader, sharding, quantization, training, and inference settings.
- eval(tasks: str | list[str], engine: Union[Literal['esurge', 'auto'], Any] = 'auto', num_fewshot: int = 0, output_path: str | None = None) dict[str, Any][source]#
Run evaluation on specified tasks using lm-evaluation-harness.
This method provides a unified interface for evaluating models using the eSurge engine with the lm-evaluation-harness framework.
- Parameters
tasks – Task name(s) to evaluate on. Can be a single task string or list of tasks. Common tasks include:
  - Language understanding: "hellaswag", "winogrande", "piqa", "arc_easy", "arc_challenge"
  - Math: "gsm8k", "math", "minerva_math"
  - Knowledge: "mmlu", "triviaqa", "naturalquestions"
  - Reasoning: "bbh", "boolq", "copa"
  - Truthfulness: "truthfulqa_mc1", "truthfulqa_mc2"
  - Coding: "humaneval", "mbpp"
Full list: EleutherAI/lm-evaluation-harness
engine – Inference engine to use. Options:
  - "esurge": Use eSurge engine (high throughput)
  - "auto": Automatically select based on configuration (default)
  - An existing eSurge instance for custom configuration
num_fewshot – Number of few-shot examples to use (default: 0 for zero-shot). Different tasks may have different recommended values:
  - MMLU: typically 5-shot
  - GSM8K: typically 8-shot
  - HellaSwag: typically 0-shot
output_path – Optional path to save evaluation results as JSON. Results include detailed metrics, task versions, and configuration.
- Returns
Dictionary containing evaluation results with structure:
{
    "results": {
        task_name: {
            metric_name: value,  # e.g., "acc": 0.85, "acc_stderr": 0.02
            ...
        }
    },
    "versions": {task_name: version_string, ...},
    "config": {"model": ..., "num_fewshot": ..., ...}
}
- Return type
dict[str, Any]
Example
Basic zero-shot evaluation:
>>> elm = eLargeModel.from_pretrained("meta-llama/Llama-2-7b")
>>> results = elm.eval("hellaswag")
>>> print(f"HellaSwag accuracy: {results['results']['hellaswag']['acc']:.2%}")
Few-shot evaluation with multiple tasks:
>>> elm.set_esurge(max_num_seqs=64, hbm_utilization=0.9)
>>> results = elm.eval(
...     ["gsm8k", "mmlu", "truthfulqa_mc1"],
...     engine="esurge",
...     num_fewshot=5,
...     output_path="eval_results.json"
... )
>>> for task, metrics in results["results"].items():
...     print(f"{task}: {metrics.get('acc', metrics.get('exact_match')):.2%}")
Evaluation with custom settings:
>>> elm.set_eval(
...     max_new_tokens=512,
...     temperature=0.0,  # Greedy decoding
...     batch_size=32
... )
>>> results = elm.eval(["humaneval", "mbpp"])
- Raises
ImportError – If lm-eval is not installed (install with: pip install lm-eval)
ValueError – If invalid engine type or model not configured
RuntimeError – If evaluation fails during execution
Note
The evaluation uses settings from set_eval() for generation parameters. Default settings are optimized for deterministic evaluation (temperature=0).
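Not every task reports an "acc" metric, so post-processing the returned dictionary benefits from a guarded lookup. The sketch below assumes the result layout described under Returns; `summarize_results` is a hypothetical helper and the sample values are placeholders, not measured scores.

```python
from typing import Any


def summarize_results(results: dict[str, Any]) -> dict[str, float]:
    """Pick one headline metric per task, skipping tasks that report neither."""
    summary: dict[str, float] = {}
    for task, metrics in results["results"].items():
        # Fall back to exact_match when a task does not report accuracy.
        value = metrics.get("acc", metrics.get("exact_match"))
        if value is not None:
            summary[task] = value
    return summary


# Placeholder data shaped like the documented return value.
sample = {
    "results": {
        "hellaswag": {"acc": 0.85, "acc_stderr": 0.02},
        "gsm8k": {"exact_match": 0.5},
        "custom_task": {"f1": 0.4},
    },
    "versions": {},
    "config": {"num_fewshot": 0},
}

summary = summarize_results(sample)
for task, value in summary.items():
    print(f"{task}: {value:.2%}")
```

Skipping tasks without either metric avoids formatting `None` with a percentage specifier, which would raise a `TypeError`.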
- classmethod from_json(json_path: str | os.PathLike | eformer.paths.GCSPath | eformer.paths.LocalPath | eformer.paths.MLUtilPath) eLargeModel[source]#
Create eLargeModel from JSON configuration file.
- Parameters
json_path – Path to JSON configuration file
- Returns
eLargeModel instance
- classmethod from_pretrained(model_name_or_path: str, task: easydel.infra.factory.TaskType | str | None = None, **kwargs) eLargeModel[source]#
Create eLargeModel from pretrained model name or path.
- Parameters
model_name_or_path – HuggingFace model ID or local path
task – Optional task type (auto-detected if not provided or AUTO_BIND)
**kwargs – Additional configuration options
- Returns
eLargeModel instance with configuration
- get_base_config(prefer: str = 'base') dict[str, Any][source]#
Get materialized base configuration.
Resolves the configuration hierarchy, materializing shared base settings across different configuration sections.
- Parameters
prefer – Resolution preference when conflicts exist:
  - "base": Prefer values from base configuration
  - "section": Prefer values from specific sections
- Returns
Base configuration dictionary with resolved values
Example
>>> # Get configuration with base values taking precedence
>>> base_config = elm.get_base_config(prefer="base")
>>> print(base_config["dtype"])  # Shows the base dtype setting
- get_data_mixture_kwargs() dict[str, Any][source]#
Get kwargs for DatasetMixture initialization.
Extracts and formats the mixture configuration for use with the DatasetMixture class.
- Returns
Dictionary of DatasetMixture arguments including informs, batch_size, streaming settings, and other mixture options
- get_esurge_kwargs() dict[str, Any][source]#
Get kwargs for eSurge initialization.
Extracts and formats the configuration options for creating an eSurge engine instance.
- Returns
Dictionary of eSurge arguments including max_model_len, max_num_seqs, hbm_utilization, and other engine settings
Example
>>> kwargs = elm.get_esurge_kwargs()
>>> # Can be used directly:
>>> from easydel.inference import eSurge
>>> engine = eSurge(model, **kwargs)
- get_from_pretrained_kwargs() dict[str, Any][source]#
Get kwargs for model.from_pretrained() calls.
Extracts and formats the configuration options that should be passed to the model’s from_pretrained() method, including dtype, sharding, and quantization settings.
- Returns
Dictionary of from_pretrained arguments ready to use with EasyDeL model loading functions
Example
>>> kwargs = elm.get_from_pretrained_kwargs()
>>> # Can be used directly:
>>> model = LlamaForCausalLM.from_pretrained(
...     "meta-llama/Llama-2-7b",
...     **kwargs
... )
- get_train_source() ShardedDataSource | Dataset | None[source]#
Get training data as ShardedDataSource or Dataset.
Automatically selects the appropriate data format based on the use_sharded_source configuration option.
- Returns
ShardedDataSource if use_sharded_source=True in mixture config, otherwise HuggingFace Dataset. Returns None if no mixture configured.
Example
>>> elm = eLargeModel()
>>> elm.add_dataset("train.json", dataset_type="json")
>>> elm.set_mixture(use_sharded_source=True)  # Use new pipeline
>>> data = elm.get_train_source()  # Returns ShardedDataSource
>>>
>>> elm.set_mixture(use_sharded_source=False)  # Use legacy pipeline
>>> data = elm.get_train_source()  # Returns HF Dataset
- get_trainer_config() dict[str, Any][source]#
Get normalized trainer configuration.
This method processes the raw trainer configuration and applies defaults and normalization for the specified trainer type.
- Returns
Normalized trainer configuration dictionary with all required fields populated with defaults where necessary.
- property model_name: str#
Get the model name or path.
- Returns
The HuggingFace model ID or local path to the model.
- property reference_model_name: str | None#
Get the reference model name or path for DPO/ORPO.
- Returns
The reference model path if configured, None otherwise.
- set_dtype(dtype: str) eLargeModel[source]#
Set the data type for model loading.
Configures both the computation dtype and parameter dtype for the model. This affects memory usage and computation speed.
- Parameters
dtype – Data type string. Supported values:
  - "bf16": BFloat16 (recommended for TPU, modern GPUs)
  - "fp16": Float16 (good for older GPUs)
  - "fp32": Float32 (highest precision, most memory)
  - "fp8": Float8 (experimental, requires compatible hardware)
- Returns
Self for method chaining
Example
>>> elm.set_dtype("bf16") # Use bfloat16 for training/inference
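Setters that return self can be chained into a single expression. The following sketch shows the pattern with a hypothetical `MiniConfig` class; it is not the eLargeModel implementation and stores settings in a plain dictionary purely for illustration.

```python
from __future__ import annotations


class MiniConfig:
    """Hypothetical stand-in for a chainable configuration object."""

    def __init__(self) -> None:
        self.settings: dict[str, object] = {}

    def set_dtype(self, dtype: str) -> MiniConfig:
        self.settings["dtype"] = dtype
        return self  # returning self is what enables chaining

    def set_sharding(self, axis_dims: tuple[int, ...]) -> MiniConfig:
        self.settings["axis_dims"] = axis_dims
        return self


cfg = MiniConfig().set_dtype("bf16").set_sharding(axis_dims=(1, 1, 1, -1))
print(cfg.settings)
```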
- set_esurge(max_model_len: int | None = None, max_num_seqs: int = 16, hbm_utilization: float = 0.85, **kwargs) eLargeModel[source]#
Configure eSurge inference settings.
eSurge is a high-performance batch inference engine optimized for throughput. It uses PagedAttention for efficient memory management.
- Parameters
max_model_len – Maximum model sequence length (input + output tokens). If None, uses model’s default max position embeddings.
max_num_seqs – Maximum number of sequences to process concurrently. Higher values increase throughput but require more memory.
hbm_utilization – HBM memory utilization ratio (0.0-1.0). Controls how much device memory to use for KV cache.
**kwargs – Additional eSurge options:
  - page_size: PagedAttention page size (default: 128)
  - enable_prefix_caching: Enable prefix caching optimization
  - kv_cache_dtype: Dtype for KV cache (None = auto)
  - decoding_engine: "ring" or "triton" (default: auto)
- Returns
Self for method chaining
Example
>>> elm.set_esurge(
...     max_model_len=8192,
...     max_num_seqs=64,
...     hbm_utilization=0.9,
...     enable_prefix_caching=True
... )
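To get a feel for how `hbm_utilization` and `page_size` interact, a back-of-the-envelope estimate of KV-cache pages can help. The sketch below is a rough approximation under stated assumptions: the utilization budget is taken to cover model weights plus KV cache, activations are ignored, and `estimate_kv_pages` is a hypothetical helper. This page does not describe how eSurge itself accounts for memory.

```python
def estimate_kv_pages(
    hbm_bytes: int,
    hbm_utilization: float,
    weight_bytes: int,
    num_layers: int,
    num_kv_heads: int,
    head_dim: int,
    dtype_bytes: int,
    page_size: int,
) -> int:
    """Rough count of KV-cache pages that fit in the assumed memory budget."""
    # Assumption: the usable budget is the utilization share minus the weights.
    budget = int(hbm_bytes * hbm_utilization) - weight_bytes
    # Each token stores one key and one value vector per layer and KV head.
    bytes_per_token = 2 * num_layers * num_kv_heads * head_dim * dtype_bytes
    bytes_per_page = bytes_per_token * page_size
    return max(budget, 0) // bytes_per_page


# Illustrative numbers only: a 32-layer model with 32 KV heads of size 128 in
# 2-byte precision, 14 GiB of weights, on a device with 32 GiB of memory.
GIB = 1024**3
pages = estimate_kv_pages(
    hbm_bytes=32 * GIB,
    hbm_utilization=0.9,
    weight_bytes=14 * GIB,
    num_layers=32,
    num_kv_heads=32,
    head_dim=128,
    dtype_bytes=2,
    page_size=128,
)
print(f"approximate pages: {pages}, tokens: {pages * 128}")
```

Raising `max_num_seqs` or `max_model_len` increases the number of tokens that may be resident at once, so the page estimate gives an upper bound to compare those settings against.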
- set_eval(max_new_tokens: int = 2048, temperature: float = 0.0, top_p: float = 0.95, batch_size: int | None = None, use_tqdm: bool = True, **kwargs) eLargeModel[source]#
Configure evaluation settings for lm-evaluation-harness.
Sets default parameters for model evaluation on standard benchmarks. These settings apply when using the eval() method.
- Parameters
max_new_tokens – Maximum tokens to generate per evaluation sample (default: 2048). Lower values speed up evaluation.
temperature – Sampling temperature (default: 0.0 for greedy decoding). 0.0 = deterministic/greedy, higher = more random.
top_p – Top-p (nucleus) sampling parameter (default: 0.95). Only used when temperature > 0.
batch_size – Evaluation batch size (default: engine-specific). Higher values increase throughput but use more memory.
use_tqdm – Show progress bar during evaluation (default: True)
**kwargs – Additional evaluation options:
  - top_k: Top-k sampling parameter
  - repetition_penalty: Penalty for repeated tokens
  - num_beams: Beam search width (1 = greedy)
  - do_sample: Whether to use sampling
  - early_stopping: Stop generation at first EOS
- Returns
Self for method chaining
Example
>>> # Configure for deterministic evaluation
>>> elm.set_eval(
...     max_new_tokens=512,
...     temperature=0.0,
...     batch_size=64
... )
>>>
>>> # Configure for sampling-based evaluation
>>> elm.set_eval(
...     temperature=0.7,
...     top_p=0.9,
...     top_k=50
... )
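The interaction between `temperature` and `top_p` can be made concrete with a small pure-Python sketch of the textbook definitions: a temperature of 0.0 collapses to greedy selection, and nucleus filtering keeps the smallest set of tokens whose probability mass reaches `top_p`. The function `next_token_distribution` is hypothetical and is not the sampler used by eSurge.

```python
import math


def next_token_distribution(
    logits: list[float], temperature: float, top_p: float
) -> dict[int, float]:
    """Return the token-index -> probability map a sampler would draw from."""
    if temperature == 0.0:
        # Greedy decoding: all probability mass on the highest logit.
        best = max(range(len(logits)), key=logits.__getitem__)
        return {best: 1.0}

    # Temperature-scaled softmax, shifted by the max for numerical stability.
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]

    # Nucleus filtering: keep the most likely tokens until top_p mass is covered.
    ranked = sorted(range(len(probs)), key=probs.__getitem__, reverse=True)
    kept: list[int] = []
    cumulative = 0.0
    for idx in ranked:
        kept.append(idx)
        cumulative += probs[idx]
        if cumulative >= top_p:
            break

    # Renormalize so the surviving probabilities sum to one.
    mass = sum(probs[i] for i in kept)
    return {i: probs[i] / mass for i in kept}


greedy = next_token_distribution([2.0, 1.0, 0.1], temperature=0.0, top_p=0.95)
nucleus = next_token_distribution([2.0, 1.0, 0.1], temperature=1.0, top_p=0.8)
print(greedy, nucleus)
```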
- set_mixture(informs: list[dict] | None = None, batch_size: int = 32, streaming: bool = True, use_fast_loader: bool = True, **kwargs) eLargeModel[source]#
Configure data mixture settings for training/evaluation.
Sets up a mixture of datasets that can be combined and sampled from during training. Supports multiple data sources and formats.
- Parameters
informs – List of dataset configurations. Each dict should contain:
  - type: Dataset type ("json", "parquet", "csv", "text", or HF dataset ID)
  - data_files: Path or pattern to data files
  - content_field: Field name containing the text content
  - split: Dataset split to use (default: "train")
  - weight: Sampling weight for this dataset (optional)
batch_size – Batch size for data loading (default: 32)
streaming – Use streaming mode for large datasets (default: True). Reduces memory usage but may be slower.
use_fast_loader – Enable fast loading with fsspec (default: True). Provides optimized loading for remote/cloud storage.
**kwargs – Additional mixture options:
  - max_samples: Maximum samples per dataset
  - shuffle: Whether to shuffle data
  - seed: Random seed for shuffling
- Returns
Self for method chaining
Example
>>> elm.set_mixture(
...     informs=[
...         {"type": "json", "data_files": "train.json", "content_field": "text", "weight": 0.7},
...         {"type": "parquet", "data_files": "valid/*.parquet", "content_field": "content", "weight": 0.3}
...     ],
...     batch_size=32,
...     streaming=True,
...     shuffle=True,
...     seed=42
... )
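One common reading of per-dataset weights is as relative sampling probabilities. The sketch below simulates that interpretation with the standard library; it is an assumption about what `weight` means, and `sample_sources` is a hypothetical helper rather than part of the DatasetMixture class.

```python
import random
from collections import Counter


def sample_sources(weights: dict[str, float], n: int, seed: int = 42) -> Counter:
    """Draw n source names with probability proportional to their weight."""
    rng = random.Random(seed)  # seeded so repeated runs give the same draws
    names = list(weights)
    picks = rng.choices(names, weights=[weights[name] for name in names], k=n)
    return Counter(picks)


counts = sample_sources({"train.json": 0.7, "valid/*.parquet": 0.3}, n=10_000)
share = counts["train.json"] / sum(counts.values())
print(f"train.json share: {share:.1%}")
```

With weights of 0.7 and 0.3, roughly seven in ten sampled examples come from the first source under this interpretation.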
- set_model(model_name_or_path: str) eLargeModel[source]#
Set the model name or path.
Updates the primary model configuration. This will clear any cached model instance to ensure the new model is loaded on next build.
- Parameters
model_name_or_path – HuggingFace model ID (e.g., “meta-llama/Llama-2-7b”) or local path to model directory
- Returns
Self for method chaining
Example
>>> elm.set_model("meta-llama/Llama-2-7b-hf")
>>> elm.set_model("/path/to/local/model")
- set_operation_configs(configs: collections.abc.Mapping[str, Any] | None = None, **kwargs) eLargeModel[source]#
Configure ejkernel operation overrides.
Allows overriding ejkernel’s autotune behavior for specific attention operations by providing explicit configuration objects. When a config is provided, it’s passed directly to the operation instead of using ejkernel’s autotune.
- Parameters
configs – Dictionary mapping operation names to config objects. Valid operation names (must match OperationRegistry):
  - "flash_attn2": Flash attention 2
  - "ring": Ring attention
  - "blocksparse": Block sparse attention
  - "ragged_page_attention_v2": Ragged page attention v2
  - "ragged_page_attention_v3": Ragged page attention v3
  - "sdpa": Scaled dot product attention
  - "vanilla": Vanilla attention
**kwargs – Individual operation configs as keyword arguments. These are merged with the configs dict.
- Returns
Self for method chaining
Example
>>> from easydel import FlashAttentionConfig, RingAttentionConfig
>>>
>>> # Using dict
>>> elm.set_operation_configs({
...     "flash_attn2": FlashAttentionConfig(platform="triton"),
...     "ring": RingAttentionConfig(),
... })
>>>
>>> # Using kwargs
>>> elm.set_operation_configs(
...     flash_attn2=FlashAttentionConfig(platform="pallas"),
... )
- set_quantization(method: str | None = None, block_size: int = 128, **kwargs) eLargeModel[source]#
Configure quantization settings.
Enables model quantization to reduce memory usage and potentially improve inference speed at the cost of some accuracy.
- Parameters
method – Quantization method, for example "nf4" or "a8bit".
block_size – Quantization block size (default: 128). Smaller blocks = better accuracy but more overhead.
**kwargs – Additional quantization options:
  - platform: Target platform ("cpu", "cuda", "tpu")
  - compute_dtype: Dtype for computation (e.g., "fp16")
  - double_quant: Enable double quantization for 4bit
- Returns
Self for method chaining
Example
>>> elm.set_quantization("nf4", block_size=64)
>>> elm.set_quantization("a8bit")
- set_reference_model(model_name_or_path: str) eLargeModel[source]#
Set the reference model name or path for preference optimization.
Configures a reference model used in DPO (Direct Preference Optimization) and similar preference-based training methods. The reference model provides a baseline for computing preference losses.
- Parameters
model_name_or_path – HuggingFace model ID or local path for reference model. Often the same as the base model before fine-tuning.
- Returns
Self for method chaining
Example
>>> elm.set_model("meta-llama/Llama-2-7b-hf")  # Model to train
>>> elm.set_reference_model("meta-llama/Llama-2-7b-hf")  # Reference
>>> elm.set_trainer("dpo", beta=0.1)
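How a reference model acts as a baseline is easiest to see in the standard sigmoid DPO objective, where the loss depends on how much more the policy prefers the chosen response than the reference does. The sketch below implements that published formula on scalar log-probabilities; `dpo_sigmoid_loss` is a hypothetical helper and is not taken from the EasyDeL trainer.

```python
import math


def dpo_sigmoid_loss(
    policy_chosen_logp: float,
    policy_rejected_logp: float,
    ref_chosen_logp: float,
    ref_rejected_logp: float,
    beta: float = 0.1,
) -> float:
    """Sigmoid DPO loss for one preference pair of sequence log-probabilities."""
    # Log-ratio of policy to reference, for the chosen and rejected responses.
    chosen_ratio = policy_chosen_logp - ref_chosen_logp
    rejected_ratio = policy_rejected_logp - ref_rejected_logp
    z = beta * (chosen_ratio - rejected_ratio)
    # -log(sigmoid(z)) computed in a numerically stable way for either sign.
    if z > 0:
        return math.log1p(math.exp(-z))
    return -z + math.log1p(math.exp(z))


# A policy identical to the reference sits at log(2), the neutral starting point.
neutral = dpo_sigmoid_loss(-10.0, -12.0, -10.0, -12.0)
# A policy that favors the chosen response more than the reference scores lower.
improved = dpo_sigmoid_loss(-8.0, -14.0, -10.0, -12.0)
print(neutral, improved)
```

A larger `beta` scales the margin, so the same difference between policy and reference changes the loss more sharply.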
- set_sharding(axis_dims: tuple[int, ...] | None = None, axis_names: tuple[str, ...] | None = None, **kwargs) eLargeModel[source]#
Configure model sharding for distributed training/inference.
Sets up model parallelism by specifying how to shard model parameters and computations across devices. Essential for training large models that don’t fit on a single device.
- Parameters
axis_dims – Sharding axis dimensions as a tuple. Common patterns:
  - (1, 1, 1, -1): Data parallel only
  - (2, 1, 1, -1): 2-way tensor parallel
  - (1, 2, 1, -1): 2-way pipeline parallel
  - (2, 2, 1, -1): 2-way tensor + 2-way pipeline parallel
axis_names – Sharding axis names (e.g., ("dp", "tp", "pp", "sp")):
  - dp: Data parallel
  - tp: Tensor parallel
  - pp: Pipeline parallel
  - sp: Sequence parallel
**kwargs – Additional sharding options.
- Returns
Self for method chaining
Example
>>> # 2-way tensor parallel, 2-way data parallel
>>> elm.set_sharding(
...     axis_dims=(2, 2, 1, -1),
...     axis_names=("dp", "tp", "pp", "sp")
... )
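A `-1` entry in `axis_dims` is most naturally read like the `-1` in an array reshape: that axis takes whatever devices remain after the fixed axes are accounted for. The sketch below works through that arithmetic under this assumption; `resolve_axis_dims` is a hypothetical helper, and pairing each entry positionally with `axis_names` is likewise assumed rather than confirmed here.

```python
import math


def resolve_axis_dims(axis_dims: tuple[int, ...], num_devices: int) -> tuple[int, ...]:
    """Replace a single -1 entry with the device count left over by the others."""
    if axis_dims.count(-1) > 1:
        raise ValueError("at most one axis may be -1")
    fixed = math.prod(d for d in axis_dims if d != -1)
    if num_devices % fixed != 0:
        raise ValueError(f"{num_devices} devices cannot be split into groups of {fixed}")
    if -1 not in axis_dims and fixed != num_devices:
        raise ValueError("axis sizes must multiply to the device count")
    return tuple(num_devices // fixed if d == -1 else d for d in axis_dims)


axis_names = ("dp", "tp", "pp", "sp")
resolved = resolve_axis_dims((2, 2, 1, -1), num_devices=8)
print(dict(zip(axis_names, resolved)))
```

The product of the resolved sizes always equals the device count, which is a quick sanity check before launching a job.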
- set_teacher_model(model_name_or_path: str) eLargeModel[source]#
Set the teacher model name or path for distillation training.
Configures a teacher model used for knowledge distillation. The teacher model is typically a larger, more capable model that guides the training of the student (primary) model.
- Parameters
model_name_or_path – HuggingFace model ID or local path for teacher model. Should be a model compatible with the student model’s architecture.
- Returns
Self for method chaining
Example
>>> elm.set_model("meta-llama/Llama-2-7b")  # Student model
>>> elm.set_teacher_model("meta-llama/Llama-2-13b")  # Teacher model
>>> elm.set_trainer("distillation", temperature=3.0)
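The roles of a distillation temperature and a loss weight are clearest in the classic soft-target formulation: a temperature-softened KL term between teacher and student, blended with the ordinary cross-entropy on the true label. The sketch below implements that textbook form for a single token; `distillation_loss` is a hypothetical helper, and whether EasyDeL uses exactly this weighting is an assumption.

```python
import math


def softmax(logits: list[float], temperature: float = 1.0) -> list[float]:
    """Temperature-scaled softmax, shifted by the max for numerical stability."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(
    student_logits: list[float],
    teacher_logits: list[float],
    label: int,
    temperature: float = 3.0,
    alpha: float = 0.5,
) -> float:
    """Blend of soft-target KL (weight alpha) and hard-label cross-entropy."""
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    # KL(teacher || student) on the softened distributions.
    kl = sum(pt * (math.log(pt) - math.log(ps)) for pt, ps in zip(p_teacher, p_student))
    # Standard cross-entropy against the ground-truth label at temperature 1.
    hard = -math.log(softmax(student_logits)[label])
    # The temperature**2 factor keeps the soft-term gradients on a comparable scale.
    return alpha * temperature**2 * kl + (1.0 - alpha) * hard


matched = distillation_loss([1.0, 2.0, 3.0], [1.0, 2.0, 3.0], label=2, alpha=1.0)
mismatched = distillation_loss([3.0, 2.0, 1.0], [1.0, 2.0, 3.0], label=2, alpha=1.0)
print(matched, mismatched)
```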
- set_trainer(trainer_type: str, **kwargs) eLargeModel[source]#
Configure trainer settings.
Sets the training paradigm and associated hyperparameters.
- Parameters
trainer_type – Type of trainer to use:
  - "sft": Supervised Fine-Tuning
  - "dpo": Direct Preference Optimization
  - "orpo": Odds Ratio Preference Optimization
  - "grpo": Group Relative Policy Optimization
  - "reward": Reward model training
  - "distillation": Knowledge distillation
  - "base": Basic trainer for custom training loops
**kwargs – Trainer-specific configuration options.
Common options:
  - learning_rate: Learning rate (default: 5e-5)
  - num_train_epochs: Number of training epochs
  - per_device_train_batch_size: Batch size per device
  - gradient_accumulation_steps: Gradient accumulation steps
  - warmup_steps: Number of warmup steps
  - output_dir: Directory to save checkpoints
DPO-specific:
  - beta: KL regularization coefficient
  - loss_type: "sigmoid", "ipo", "hinge"
Distillation-specific:
  - temperature: Distillation temperature
  - alpha: Weight for distillation loss
- Returns
Self for method chaining
Example
>>> # SFT training
>>> elm.set_trainer(
...     "sft",
...     learning_rate=2e-5,
...     num_train_epochs=3,
...     per_device_train_batch_size=4
... )
>>>
>>> # DPO training
>>> elm.set_trainer(
...     "dpo",
...     beta=0.1,
...     learning_rate=1e-6
... )
- property task: TaskType#
Get the resolved task type.
- Returns
The task type (e.g., TaskType.CAUSAL_LM) either explicitly configured or auto-detected from the model.
- property teacher_model_name: str | None#
Get the teacher model name or path for distillation.
- Returns
The teacher model path if configured, None otherwise.
- to_dict() dict[str, Any][source]#
Get configuration as dictionary.
Returns a copy of the full configuration dictionary that can be modified without affecting the eLargeModel instance.
- Returns
Configuration dictionary with all settings
Example
>>> config_dict = elm.to_dict()
>>> print(config_dict["model"]["name_or_path"])
>>> # Modify the dict without affecting elm
>>> config_dict["model"]["dtype"] = "fp16"
- to_json(json_path: str | os.PathLike | eformer.paths.GCSPath | eformer.paths.LocalPath | eformer.paths.MLUtilPath) None[source]#
Save configuration to JSON file.
Exports the current configuration to a JSON file that can be loaded later with from_json() or shared with others.
- Parameters
json_path – Path where the JSON configuration file will be saved. Will create parent directories if they don’t exist.
Example
>>> elm.to_json("config.json")
>>> # Later or on another machine:
>>> elm2 = eLargeModel.from_json("config.json")
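The save-then-reload round trip, including creation of missing parent directories, can be sketched with the standard library alone. `save_config` and `load_config` are hypothetical helpers that operate on a plain dictionary; they only illustrate the documented behavior and do not handle the GCS or MLUtil path types.

```python
import json
import tempfile
from pathlib import Path


def save_config(config: dict, json_path) -> None:
    """Write a configuration dictionary as JSON, creating parent directories."""
    path = Path(json_path)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(config, indent=2), encoding="utf-8")


def load_config(json_path) -> dict:
    """Read a configuration dictionary back from a JSON file."""
    return json.loads(Path(json_path).read_text(encoding="utf-8"))


with tempfile.TemporaryDirectory() as tmp:
    # The "nested" directory does not exist yet and is created on save.
    target = Path(tmp) / "nested" / "config.json"
    original = {"model": {"name_or_path": "meta-llama/Llama-2-7b"}}
    save_config(original, target)
    restored = load_config(target)

print(restored == original)
```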
- train(train_dataset: Dataset | ShardedDataSource | None = None, eval_dataset: Dataset | ShardedDataSource | None = None, base_state_class: type[EasyDeLState] | None = None, args_class: type[TrainingArguments] | None = None, trainer_class: type[Trainer] | None = None, **build_kwargs: Unpack[BuildTrainerKws])[source]#
Train the model with the configured settings.
This is a high-level convenience method that orchestrates the entire training pipeline:
1. Validates configuration
2. Builds the model if not already built
3. Creates the dataset from mixture configuration if not provided
4. Builds reference/teacher models if needed
5. Creates the appropriate trainer
6. Runs training and returns results
- Parameters
train_dataset – Optional training dataset (Dataset or ShardedDataSource). If None, will build from mixture configuration.
eval_dataset – Optional evaluation dataset for validation during training.
base_state_class – Optional custom EasyDeLState class for model state management. Use for custom model implementations.
args_class – Optional custom TrainingArguments class. If None, will auto-select based on trainer_type.
trainer_class – Optional custom Trainer class. If None, will auto-select based on trainer_type.
**build_kwargs – Additional kwargs for trainer building:
  - data_collator: Custom data collator function
  - formatting_func: Function to format examples (SFT)
  - reward_processing_classes: Processing classes for rewards (GRPO)
  - data_tokenize_fn: Custom tokenization function
  - reference_model: Override reference model
  - reward_model: Override reward model
  - teacher_model: Override teacher model
  - reward_funcs: Custom reward functions
- Returns
Training results from the trainer, including metrics and final model state
Example
Basic SFT training:
>>> elm = eLargeModel.from_pretrained("meta-llama/Llama-2-7b")
>>> elm.add_dataset("train.json", dataset_type="json")
>>> elm.set_trainer("sft", learning_rate=2e-5, num_train_epochs=3)
>>> results = elm.train()
DPO training with custom datasets:
>>> from datasets import load_dataset
>>> train_data = load_dataset("preference_data", split="train")
>>> eval_data = load_dataset("preference_data", split="test")
>>> elm.set_trainer("dpo", beta=0.1)
>>> elm.set_reference_model("meta-llama/Llama-2-7b")
>>> results = elm.train(train_dataset=train_data, eval_dataset=eval_data)
Custom trainer with formatting function:
>>> def format_fn(examples):
...     return [f"Question: {q}\nAnswer: {a}"
...             for q, a in zip(examples["question"], examples["answer"])]
>>> results = elm.train(formatting_func=format_fn)
- update_config(updates: Mapping[str, Any]) eLargeModel[source]#
Update configuration with new values.
Performs a deep merge of the updates into the existing configuration, preserving nested structures. The configuration is re-normalized after updating to ensure consistency.
- Parameters
updates – Dictionary with configuration updates. Can include nested structures like {"model": {"dtype": "bf16"}, "esurge": {"max_num_seqs": 32}}
- Returns
Self for method chaining
Example
>>> elm.update_config({
...     "loader": {"dtype": "bf16"},
...     "esurge": {"max_model_len": 4096}
... })
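A deep merge differs from `dict.update` in that nested sections are combined key by key instead of being replaced wholesale. The sketch below shows one way to write such a merge; `deep_merge` is a hypothetical helper, and the re-normalization step mentioned above is not modeled.

```python
import copy
from collections.abc import Mapping
from typing import Any


def deep_merge(base: Mapping[str, Any], updates: Mapping[str, Any]) -> dict[str, Any]:
    """Return a new dict where nested mappings are merged rather than replaced."""
    merged = copy.deepcopy(dict(base))
    for key, value in updates.items():
        if isinstance(value, Mapping) and isinstance(merged.get(key), Mapping):
            # Both sides are mappings: recurse so sibling keys survive.
            merged[key] = deep_merge(merged[key], value)
        else:
            # Scalars, lists, or new keys simply overwrite.
            merged[key] = copy.deepcopy(value)
    return merged


config = {
    "loader": {"dtype": "fp32", "precision": "default"},
    "esurge": {"max_num_seqs": 16},
}
updated = deep_merge(config, {"loader": {"dtype": "bf16"}, "esurge": {"max_model_len": 4096}})
print(updated)
```

Here the `precision` key under `loader` is preserved, whereas a shallow `update` would have discarded it along with the rest of the section.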
- validate() None[source]#
Validate the current configuration.
Checks that all required fields are present and have valid values. This is automatically called before training or building engines.
- Raises
ValueError – If configuration is invalid (e.g., missing model name, invalid dtype, incompatible settings)