
viberl.utils.training

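Training utilities for the VibeRL framework.

This module keeps the legacy training interface available for backward compatibility. Both functions are thin wrappers that build a viberl.trainer.Trainer and delegate to it; train_agent also emits a DeprecationWarning.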

Functions:

Name            Description
train_agent     Generic training function for RL agents with periodic evaluation.
evaluate_agent  Generic evaluation function for RL agents.

train_agent

train_agent(
    env: Env,
    agent: Agent,
    num_episodes: int = 1000,
    max_steps: int = 1000,
    render_interval: int | None = None,
    save_interval: int | None = None,
    save_path: str | None = None,
    verbose: bool = True,
    log_dir: str | None = None,
    eval_interval: int = 100,
    eval_episodes: int = 10,
    log_interval: int = 1000,
) -> list[float]

Generic training function for RL agents with periodic evaluation.

Deprecated since version 1.0: use viberl.trainer.Trainer instead.
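
train_agent follows a common deprecation-shim pattern: it warns with DeprecationWarning and stacklevel=2, then forwards its arguments to the replacement class. The following is a minimal, self-contained sketch of that pattern. NewTrainer and legacy_train are hypothetical stand-ins, not viberl's Trainer or train_agent.

```python
import warnings


class NewTrainer:
    """Hypothetical stand-in for the replacement class (not viberl's Trainer)."""

    def __init__(self, max_steps: int = 1000) -> None:
        self.max_steps = max_steps

    def train(self, num_episodes: int = 1000) -> list[float]:
        # Placeholder result: one zero reward per episode.
        return [0.0] * num_episodes


def legacy_train(num_episodes: int = 1000, max_steps: int = 1000) -> list[float]:
    """Hypothetical deprecated wrapper that delegates to NewTrainer."""
    warnings.warn(
        'legacy_train is deprecated; use NewTrainer instead.',
        DeprecationWarning,
        stacklevel=2,  # attribute the warning to the caller's line, not this one
    )
    trainer = NewTrainer(max_steps=max_steps)
    return trainer.train(num_episodes=num_episodes)


if __name__ == '__main__':
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter('always')  # record every warning for the demo
        rewards = legacy_train(num_episodes=3)
    print(len(rewards), caught[0].category.__name__)  # 3 DeprecationWarning
```

Because of stacklevel=2, the warning is attributed to the calling line. Python's default filters show DeprecationWarning only when that line is in `__main__`, so a direct call from a script surfaces it, while calls made from other library modules stay silent unless warnings are enabled (for example with `python -W default`).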

Parameters:

Name             Type        Default   Description
env              Env         required  Gymnasium environment
agent            Agent       required  RL agent with select_action, store_transition, and update_policy methods
num_episodes     int         1000      Number of training episodes
max_steps        int         1000      Maximum steps per episode
render_interval  int | None  None      Render every N episodes
save_interval    int | None  None      Save model every N episodes
save_path        str | None  None      Path to save models
verbose          bool        True      Print training progress
log_dir          str | None  None      Directory for TensorBoard logs
eval_interval    int         100       Evaluate every N episodes
eval_episodes    int         10        Number of evaluation episodes
log_interval     int         1000      Logging interval, forwarded unchanged to Trainer.train

Returns:

Type         Description
list[float]  List of episode rewards

Source code in viberl/utils/training.py
def train_agent(
    env: gym.Env,
    agent: Agent,
    num_episodes: int = 1000,
    max_steps: int = 1000,
    render_interval: int | None = None,
    save_interval: int | None = None,
    save_path: str | None = None,
    verbose: bool = True,
    log_dir: str | None = None,
    eval_interval: int = 100,
    eval_episodes: int = 10,
    log_interval: int = 1000,
) -> list[float]:
    """
    Generic training function for RL agents with periodic evaluation.

    .. deprecated:: 1.0
        Use :class:`viberl.trainer.Trainer` instead.

    Args:
        env: Gymnasium environment
        agent: RL agent with select_action, store_transition, and update_policy methods
        num_episodes: Number of training episodes
        max_steps: Maximum steps per episode
        render_interval: Render every N episodes
        save_interval: Save model every N episodes
        save_path: Path to save models
        verbose: Print training progress
        log_dir: Directory for TensorBoard logs
        eval_interval: Evaluate every N episodes
        eval_episodes: Number of evaluation episodes

    Returns:
        List of episode rewards
    """
    warnings.warn(
        'train_agent is deprecated and will be removed in a future version. '
        'Use viberl.trainer.Trainer instead.',
        DeprecationWarning,
        stacklevel=2,
    )

    # Create trainer using new interface
    trainer = Trainer(
        env=env,
        agent=agent,
        max_steps=max_steps,
        log_dir=log_dir,
    )

    # Train using new trainer
    return trainer.train(
        num_episodes=num_episodes,
        eval_interval=eval_interval,
        eval_episodes=eval_episodes,
        save_interval=save_interval,
        save_path=save_path,
        render_interval=render_interval,
        log_interval=log_interval,
        verbose=verbose,
    )
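
When migrating, note from the source above that train_agent's arguments are split between the Trainer constructor (env, agent, max_steps, log_dir) and Trainer.train (all the others). The following small helper performs that split on a dict of legacy keyword arguments. The helper and its names are hypothetical; only the partition itself is taken from the code above.

```python
from typing import Any

# Partition taken from the train_agent source: these go to Trainer(...).
CONSTRUCTOR_KEYS = frozenset({'env', 'agent', 'max_steps', 'log_dir'})
# ...and these are forwarded to trainer.train(...).
TRAIN_KEYS = frozenset({
    'num_episodes', 'eval_interval', 'eval_episodes', 'save_interval',
    'save_path', 'render_interval', 'log_interval', 'verbose',
})


def split_legacy_kwargs(kwargs: dict[str, Any]) -> tuple[dict[str, Any], dict[str, Any]]:
    """Split train_agent keyword arguments into (constructor kwargs, train kwargs)."""
    unknown = set(kwargs) - CONSTRUCTOR_KEYS - TRAIN_KEYS
    if unknown:
        raise TypeError(f'unexpected train_agent arguments: {sorted(unknown)}')
    ctor = {k: v for k, v in kwargs.items() if k in CONSTRUCTOR_KEYS}
    train = {k: v for k, v in kwargs.items() if k in TRAIN_KEYS}
    return ctor, train
```

With viberl installed, the two dicts would be used as `Trainer(**ctor).train(**train)`, which mirrors what train_agent does internally.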

evaluate_agent

evaluate_agent(
    env: Env, agent: Agent, num_episodes: int = 10, render: bool = False, max_steps: int = 1000
) -> tuple[list[float], list[int]]

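Generic evaluation function for RL agents. It builds a Trainer and delegates to Trainer.evaluate; unlike train_agent, it does not emit a DeprecationWarning.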

Parameters:

Name          Type   Default   Description
env           Env    required  Gymnasium environment
agent         Agent  required  RL agent with select_action method
num_episodes  int    10        Number of evaluation episodes
render        bool   False     Whether to render the environment
max_steps     int    1000      Maximum steps per episode

Returns:

Type                           Description
tuple[list[float], list[int]]  Tuple of (episode_rewards, episode_lengths)
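
evaluate_agent returns two parallel lists, with one entry per episode. The following self-contained sketch shows such an evaluation loop. It assumes a Gymnasium-style environment and, as a simplification, takes the policy as a plain callable, whereas viberl passes an Agent and delegates to Trainer.evaluate. CountdownEnv and evaluate_sketch are hypothetical names. Episodes are cut off at max_steps, so every recorded length is at most max_steps.

```python
from typing import Any, Callable


class CountdownEnv:
    """Toy env: reward 1.0 per step, terminates after `horizon` steps."""

    def __init__(self, horizon: int) -> None:
        self.horizon = horizon
        self.t = 0

    def reset(self) -> tuple[int, dict[str, Any]]:
        self.t = 0
        return self.t, {}

    def step(self, action: int) -> tuple[int, float, bool, bool, dict[str, Any]]:
        self.t += 1
        return self.t, 1.0, self.t >= self.horizon, False, {}


def evaluate_sketch(env: Any, policy: Callable[[Any], int], num_episodes: int = 10,
                    max_steps: int = 1000) -> tuple[list[float], list[int]]:
    rewards: list[float] = []
    lengths: list[int] = []
    for _ in range(num_episodes):
        state, _ = env.reset()
        total, steps = 0.0, 0
        while steps < max_steps:  # hard cap on episode length
            state, reward, terminated, truncated, _ = env.step(policy(state))
            total += reward
            steps += 1
            if terminated or truncated:
                break
        rewards.append(total)
        lengths.append(steps)
    return rewards, lengths


rewards, lengths = evaluate_sketch(CountdownEnv(horizon=7), policy=lambda s: 0,
                                   num_episodes=3, max_steps=5)
print(rewards, lengths)  # [5.0, 5.0, 5.0] [5, 5, 5]
```

Here the environment would run for 7 steps, but max_steps=5 truncates every episode, so all three lengths are 5.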

Source code in viberl/utils/training.py
def evaluate_agent(
    env: gym.Env,
    agent: Agent,
    num_episodes: int = 10,
    render: bool = False,
    max_steps: int = 1000,
) -> tuple[list[float], list[int]]:
    """
    Generic evaluation function for RL agents.

    Args:
        env: Gymnasium environment
        agent: RL agent with select_action method
        num_episodes: Number of evaluation episodes
        render: Whether to render the environment
        max_steps: Maximum steps per episode

    Returns:
        Tuple of (episode_rewards, episode_lengths)
    """
    trainer = Trainer(env=env, agent=agent, max_steps=max_steps)
    return trainer.evaluate(num_episodes=num_episodes, render=render)
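
A typical follow-up is to reduce the returned (episode_rewards, episode_lengths) tuple to summary statistics. The following stdlib-only sketch does this. The summarize name is hypothetical, and the input lists in the example call are made-up placeholders standing in for evaluate_agent's output, not real results.

```python
import statistics


def summarize(episode_rewards: list[float], episode_lengths: list[int]) -> dict[str, float]:
    """Reduce (episode_rewards, episode_lengths) to mean/std summary values."""
    return {
        'mean_reward': statistics.fmean(episode_rewards),
        # pstdev returns 0.0 for a single episode, whereas stdev would raise
        'std_reward': statistics.pstdev(episode_rewards),
        'mean_length': statistics.fmean(episode_lengths),
    }


# Placeholder values, not real evaluation results.
print(summarize([10.0, 20.0, 30.0], [100, 200, 300]))
```

Both functions raise statistics.StatisticsError on empty lists, so call this only with num_episodes of at least 1.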