# VibeRL Documentation

Welcome to VibeRL, a Reinforcement Learning framework built with type safety and modern Python practices.
## Quick Start

### Installation

```bash
# Clone the repository
git clone https://github.com/0xWelt/VibeRL.git
cd VibeRL

# Install using [uv](https://docs.astral.sh/uv/)
uv pip install -e "."

# Or install with development dependencies
uv pip install -e ".[dev]"
```
### Your First Training

```bash
# Train a REINFORCE agent
viberl-train --alg reinforce --episodes 1000 --grid-size 10 --trajectory-batch 4

# Train a DQN agent
viberl-train --alg dqn --episodes 2000 --grid-size 15 --memory-size 10000 --trajectory-batch 8

# Train a PPO agent
viberl-train --alg ppo --episodes 1000 --grid-size 12 --ppo-epochs 4 --trajectory-batch 16
```
### Evaluate and Demo

```bash
# Evaluate a trained model
viberl-eval --model-path experiments/reinforce_snake_20241231_120000/final_model.pth --episodes 10 --render

# Run demo with random actions
viberl-demo --episodes 5 --grid-size 15

# Monitor training with TensorBoard
tensorboard --logdir experiments/
```
## Experiment Management

### Directory Structure

When training agents, VibeRL automatically creates organized experiment directories:
```
experiments/
├── reinforce_snake_20241231_120000/
│   ├── final_model.pth
│   ├── best_model.pth
│   ├── config.json
│   ├── metrics.json
│   ├── tensorboard/
│   │   └── events.out.tfevents.*
│   └── logs/
│       └── training.log
├── dqn_snake_20250101_143000/
│   └── ...
└── ppo_snake_20250102_090000/
    └── ...
```
Each experiment directory contains:

- Model files: `final_model.pth`, `best_model.pth` (highest reward)
- Configuration: `config.json` with all training parameters
- Metrics: `metrics.json` with training statistics
- TensorBoard logs: Real-time training metrics
- Training logs: Detailed log files
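
Everything in a run directory can be read back with standard tooling. The sketch below is illustrative only: it reuses the example run name from the tree above and assumes the `.pth` files are ordinary PyTorch checkpoints.

```python
import json
from pathlib import Path

import torch

run_dir = Path("experiments/reinforce_snake_20241231_120000")

# config.json and metrics.json are plain JSON files.
config = json.loads((run_dir / "config.json").read_text())
metrics = json.loads((run_dir / "metrics.json").read_text())
print(config)

# Assumption: the saved models are standard torch checkpoints.
checkpoint = torch.load(run_dir / "final_model.pth", map_location="cpu")
```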
### TensorBoard Integration

Monitor your training progress in real time with TensorBoard:

```bash
# Start TensorBoard for all experiments
tensorboard --logdir experiments/

# Start TensorBoard for a specific algorithm
tensorboard --logdir experiments/reinforce_*

# Start TensorBoard for a specific run
tensorboard --logdir experiments/reinforce_snake_20241231_120000/tensorboard/
```
Access TensorBoard at http://localhost:6006 to view:

- Episode rewards and lengths
- Loss curves (policy, value, total)
- Learning rates and hyperparameters
- Action distributions
- Custom metrics per algorithm
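
The same event files can also be read programmatically with TensorBoard's `EventAccumulator` (a standard TensorBoard API, not something VibeRL provides); the available scalar tags depend on what a given run logged:

```python
from tensorboard.backend.event_processing.event_accumulator import EventAccumulator

# Point at one run's tensorboard/ directory (example run name from above).
acc = EventAccumulator("experiments/reinforce_snake_20241231_120000/tensorboard/")
acc.Reload()

# Print every scalar series the run logged, with its number of points.
for tag in acc.Tags()["scalars"]:
    print(tag, len(acc.Scalars(tag)))
```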
### Weights & Biases Integration

Track experiments with Weights & Biases for enhanced experiment management:

```bash
# Enable wandb logging during training
viberl-train --alg dqn --episodes 1000 --wandb --name my_experiment

# All CLI arguments are automatically logged to wandb
viberl-train --alg ppo --episodes 500 --lr 3e-4 --wandb --name ppo_tuning
```
Features include:

- Automatic hyperparameter tracking
- Real-time metric visualization
- Experiment comparison
- Artifact storage for models and logs
- Collaborative experiment sharing
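
Note that wandb must be authenticated once per machine before the `--wandb` flag will upload anything; run `wandb login` (or set the `WANDB_API_KEY` environment variable) first. This is standard wandb setup rather than anything specific to VibeRL.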
## Python API

### Basic Training

```python
from viberl.agents.reinforce import REINFORCEAgent
from viberl.envs import SnakeGameEnv
from viberl.utils.training import train_agent

# Create environment
env = SnakeGameEnv(grid_size=10)

# Create agent
agent = REINFORCEAgent(
    state_size=100,  # 10x10 grid
    action_size=4,   # 4 directions
    learning_rate=0.001
)

# Train the agent
train_agent(
    agent=agent,
    env=env,
    episodes=1000,
    save_path="models/reinforce_snake.pth"
)
```
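
Once training finishes, the same `env` and `agent` objects can be used for a quick greedy rollout. The snippet below is a sketch: it assumes `agent.act(state, training=False)` disables exploration and returns the same `Action` object used in the custom loop of the next section.

```python
# Greedy evaluation rollout (sketch; API assumptions noted above).
state, _ = env.reset()
total_reward = 0.0
while True:
    action = agent.act(state, training=False)  # assumed: no exploration
    state, reward, done, truncated, _ = env.step(action.action)
    total_reward += reward
    if done or truncated:
        break
print(f"Evaluation reward: {total_reward}")
```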
### Custom Training Loop

```python
import numpy as np
from loguru import logger

from viberl.agents.dqn import DQNAgent
from viberl.envs import SnakeGameEnv
from viberl.typing import Trajectory, Transition

env = SnakeGameEnv(grid_size=10)
agent = DQNAgent(state_size=100, action_size=4)

for episode in range(1000):
    state, _ = env.reset()
    transitions = []

    while True:
        action = agent.act(state, training=True)
        next_state, reward, done, truncated, info = env.step(action.action)
        transitions.append(Transition(
            state=state, action=action, reward=reward,
            next_state=next_state, done=done
        ))
        state = next_state
        if done or truncated:
            break

    trajectory = Trajectory.from_transitions(transitions)
    metrics = agent.learn(trajectories=[trajectory])

    if episode % 100 == 0:
        logger.info(f"Episode {episode}, Reward: {trajectory.total_reward}")
```
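
The CLI's `--trajectory-batch` option corresponds to handing several trajectories to a single `learn()` call. The sketch below shows that pattern by factoring the rollout above into a `collect_trajectory` helper (the helper is hypothetical, not part of VibeRL's API):

```python
def collect_trajectory(env, agent) -> Trajectory:
    """Hypothetical helper: run one episode and return it as a Trajectory
    (same rollout logic as the loop above)."""
    state, _ = env.reset()
    transitions = []
    while True:
        action = agent.act(state, training=True)
        next_state, reward, done, truncated, _ = env.step(action.action)
        transitions.append(Transition(
            state=state, action=action, reward=reward,
            next_state=next_state, done=done
        ))
        state = next_state
        if done or truncated:
            break
    return Trajectory.from_transitions(transitions)


# Mirror `viberl-train ... --trajectory-batch 4`: collect 4 episodes per update.
trajectory_batch = [collect_trajectory(env, agent) for _ in range(4)]
metrics = agent.learn(trajectories=trajectory_batch)
```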
### Experiment Management API

```python
from loguru import logger

from viberl.agents.ppo import PPOAgent
from viberl.envs import SnakeGameEnv
from viberl.utils.experiment_manager import ExperimentManager

# Create experiment with automatic directory structure
experiment = ExperimentManager(
    algorithm="ppo",
    grid_size=12,
    learning_rate=0.001,
    experiment_dir="experiments"
)

# Access experiment paths
logger.info(f"Model will be saved to: {experiment.model_path}")
logger.info(f"TensorBoard logs: {experiment.tensorboard_path}")
logger.info(f"Config file: {experiment.config_path}")

# Use with training
env = SnakeGameEnv(grid_size=12)
agent = PPOAgent(
    state_size=144,  # 12x12 grid
    action_size=4,
    learning_rate=0.001,
    device=experiment.device
)

# Train with automatic logging
training_metrics = experiment.train_agent(
    agent=agent,
    env=env,
    episodes=1000,
    log_frequency=100
)

# View TensorBoard (run in terminal)
# tensorboard --logdir experiments/ppo_*
```
## Features

- Modern Type System: Pydantic-based Action, Transition, Trajectory classes
- Three Algorithms: REINFORCE, DQN, and PPO with a unified interface
- Batch Training: Collect multiple trajectories per training iteration
- Type Safety: Full type annotations throughout
- CLI Interface: Complete training, evaluation, and demo commands
- Experiment Management: Automatic directory structure with TensorBoard and Weights & Biases logging
- 50+ Tests: Comprehensive test suite
## Architecture

The framework follows a clean architecture:

- `viberl/typing.py`: Modern type system
- `viberl/agents/`: RL algorithms (REINFORCE, DQN, PPO)
- `viberl/envs/`: Environments (SnakeGameEnv)
- `viberl/networks/`: Neural network implementations
- `viberl/utils/`: Training utilities, experiment management, and unified logging
- `viberl/cli.py`: Command-line interface
## Algorithms

### REINFORCE

Policy gradient method using Monte Carlo returns.
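
For orientation, the core update can be written in a few lines. This is a textbook-style sketch of the objective, not VibeRL's actual implementation (the function and its signature are illustrative):

```python
import torch


def reinforce_loss(log_probs: torch.Tensor, rewards: list[float], gamma: float = 0.99) -> torch.Tensor:
    """Maximize sum_t log pi(a_t | s_t) * G_t, where G_t is the discounted
    Monte Carlo return from step t (written here as a loss to minimize)."""
    returns, g = [], 0.0
    for r in reversed(rewards):
        g = r + gamma * g
        returns.insert(0, g)
    returns = torch.tensor(returns)
    # Normalizing returns is a common variance-reduction trick.
    returns = (returns - returns.mean()) / (returns.std() + 1e-8)
    return -(log_probs * returns).sum()
```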
### DQN

Deep Q-Network with experience replay and target networks.
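
The defining piece is the one-step bootstrap target computed from a separate target network; again an illustrative sketch rather than VibeRL's code:

```python
import torch


def dqn_targets(rewards, next_states, dones, target_net, gamma: float = 0.99) -> torch.Tensor:
    """y = r + gamma * max_a Q_target(s', a), with bootstrapping switched off
    on terminal transitions (`dones` is a 0/1 float tensor)."""
    with torch.no_grad():
        next_q = target_net(next_states).max(dim=1).values
    return rewards + gamma * next_q * (1.0 - dones)
```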
### PPO

Proximal Policy Optimization with clipping and multiple epochs.
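
The clipped surrogate objective is the part worth remembering; an illustrative sketch, not VibeRL's exact code:

```python
import torch


def ppo_clipped_objective(log_probs, old_log_probs, advantages, clip_eps: float = 0.2) -> torch.Tensor:
    """min(r_t * A_t, clip(r_t, 1 - eps, 1 + eps) * A_t), where
    r_t = pi(a_t | s_t) / pi_old(a_t | s_t), averaged over the batch."""
    ratio = torch.exp(log_probs - old_log_probs)
    clipped = torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps)
    return torch.min(ratio * advantages, clipped * advantages).mean()
```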
## License

This project is licensed under the MIT License. See the LICENSE file for details.