easydel.trainers.group_relative_policy_optimization.__init__#
- class easydel.trainers.group_relative_policy_optimization.__init__.GRPOConfig(auto_shard_states: bool = True, aux_loss_enabled: bool = False, backend: tp.Optional[str] = None, clip_grad: tp.Optional[float] = None, custom_scheduler: tp.Optional[tp.Callable[[int], tp.Any]] = None, dataloader_num_workers: tp.Optional[int] = 0, dataloader_pin_memory: tp.Optional[bool] = False, do_eval: bool = False, do_last_save: bool = True, do_train: bool = True, eval_batch_size: tp.Optional[int] = None, evaluation_steps: tp.Optional[int] = None, extra_optimizer_kwargs: dict = <factory>, frozen_parameters: tp.Optional[str] = None, gradient_accumulation_steps: int = 1, ids_to_pop_from_dataset: tp.Optional[tp.List[str]] = <factory>, is_fine_tuning: bool = True, init_tx: bool = True, jax_distributed_config: tp.Optional[dict] = None, learning_rate: float = 1e-06, learning_rate_end: tp.Optional[float] = None, log_all_workers: bool = False, log_grad_norms: bool = True, report_metrics: bool = True, log_steps: int = 10, loss_config: tp.Optional[LossConfig] = None, low_mem_usage: bool = True, max_evaluation_steps: tp.Optional[int] = None, max_sequence_length: tp.Optional[int] = 4096, max_training_steps: tp.Optional[int] = None, model_name: str = 'GRPOTrainer', model_parameters: tp.Optional[dict] = None, metrics_to_show_in_rich_pbar: tp.Optional[tp.List[str]] = None, num_train_epochs: int = 10, offload_dataset: bool = False, offload_device_type: str = 'cpu', offload_device_index: int = 0, optimizer: AVAILABLE_OPTIMIZERS = EasyDeLOptimizers.ADAMW, performance_mode: bool = False, pruning_module: tp.Any = None, process_zero_is_admin: bool = True, progress_bar_type: tp.Literal['tqdm', 'rich', 'json'] = 'tqdm', remove_ckpt_after_load: bool = False, remove_unused_columns: ~typing.Optional[bool] = False, report_steps: int = 5, save_directory: str = 'EasyDeL-Checkpoints', save_optimizer_state: bool = True, save_steps: tp.Optional[int] = None, save_total_limit: tp.Optional[int] = None, scheduler: AVAILABLE_SCHEDULERS = EasyDeLSchedulers.NONE, sparsify_module: bool = False, sparse_module_type: AVAILABLE_SPARSE_MODULE_TYPES = 'bcoo', state_apply_fn_kwarguments_to_model: tp.Optional[dict] = None, step_partition_spec: PartitionSpec = PartitionSpec(('dp', 'fsdp'), 'sp'), step_start_point: tp.Optional[int] = None, shuffle_train_dataset: bool = True, total_batch_size: int = 32, training_time_limit: tp.Optional[str] = None, train_on_inputs: bool = True, truncation_mode: tp.Literal['keep_end', 'keep_start'] = 'keep_end', tx_mu_dtype: tp.Optional[jnp.dtype] = None, track_memory: bool = False, use_data_collactor: bool = True, use_wandb: bool = True, verbose: bool = True, wandb_entity: tp.Optional[str] = None, warmup_steps: int = 0, weight_decay: float = 0.01, max_prompt_length: int = 512, max_completion_length: int = 256, dataset_num_proc: ~typing.Optional[int] = None, beta: float = 0.04, sync_ref_model: bool = False, ref_model_mixup_alpha: float = 0.9, ref_model_sync_steps: int = 64, tools: ~typing.Optional[~typing.List[~typing.Union[dict, ~typing.Callable]]] = None, skip_apply_chat_template: bool = False)[source]#
Bases:
TrainingArgumentsConfiguration class for the GRPOTrainer.
- beta: float = 0.04#
- dataset_num_proc: Optional[int] = None#
- extra_optimizer_kwargs: dict#
- ids_to_pop_from_dataset: tp.Optional[tp.List[str]]#
- learning_rate: float = 1e-06#
- max_completion_length: int = 256#
- max_prompt_length: int = 512#
- model_name: str = 'GRPOTrainer'#
- ref_model_mixup_alpha: float = 0.9#
- ref_model_sync_steps: int = 64#
- remove_unused_columns: Optional[bool] = False#
- skip_apply_chat_template: bool = False#
- sync_ref_model: bool = False#
- tools: Optional[List[Union[dict, Callable]]] = None#
- class easydel.trainers.group_relative_policy_optimization.__init__.GRPOTrainer(arguments: GRPOConfig, vinference: vInference, model: Optional[Union[EasyDeLBaseModule, EasyDeLState]], reward_funcs: Union[EasyDeLBaseModule, EasyDeLState, Callable[[list, list], list[float]], list[Union[easydel.infra.base_module.EasyDeLBaseModule, easydel.infra.base_state.EasyDeLState, Callable[[list, list], list[float]]]]], train_dataset: Optional[Any] = None, eval_dataset: Optional[Union[Any, Dict[str, Any]]] = None, processing_class: Optional[Any] = None, reward_processing_classes: Optional[Any] = None, data_tokenize_fn: Optional[Callable] = None)[source]#
Bases:
Trainer- arguments: GRPOConfig#
- checkpoint_manager: tp.Any#
- checkpoint_path: tp.Optional[tp.Union[str, os.PathLike]]#
- config: EasyDeLBaseConfig#
- configure_functions() TrainerConfigureFunctionOutput[source]#
Configures and JIT-compiles the training and evaluation step functions.
- This method sets up the necessary functions for training and evaluation, including:
Initialization of the model state.
Sharding of the model parameters and optimizer state.
JIT-compilation of the training and evaluation step functions.
- Returns
An object containing the configured functions and other relevant information.
- Return type
- data_collator: tp.Optional[tp.Callable]#
- dataloader_eval: tp.Optional[tp.Iterator[np.ndarray]]#
- dataloader_train: tp.Iterator[np.ndarray]#
- dataset_eval: tp.Optional[Dataset]#
- dataset_train: tp.Optional[Dataset]#
- dtype: tp.Any#
- evalu_tracker: CompilationTracker#
- finetune: bool#
- max_evaluation_steps: int#
- max_training_steps: int#
- memory_monitor: tp.Any#
- model_state: EasyDeLState#
- on_step_end(state: EasyDeLState, metrics: Any, step: int) Tuple[EasyDeLState, Any][source]#
hook process to call in start of the step.
- param_dtype: tp.Any#
- pruning_module: tp.Any#
- scheduler: optax.Schedule#
- sharded_evaluation_step_function: JitWrapped#
- sharded_training_step_function: JitWrapped#
- state: tp.Any#
- state_named_sharding: tp.Any#
- state_partition_spec: tp.Any#
- state_shape: tp.Any#
- train_tracker: CompilationTracker#
- tx: optax.GradientTransformation#
- wandb_runtime: tp.Any#