# VibeRL

A modern reinforcement learning framework for education and research, built with type safety and modern Python practices.
## Quick Start

### Installation

```bash
# Install with uv
uv pip install -e ".[dev]"

# Or with pip
pip install -e ".[dev]"
```
### Basic Usage

```bash
# Train REINFORCE
viberl-train --alg reinforce --episodes 1000 --grid-size 10

# Train PPO with parallel environments
viberl-train --alg ppo --episodes 1000 --num-envs 4 --trajectory-batch 8

# Evaluate trained model
viberl-eval --model-path experiments/*/final_model.pth --episodes 10 --render

# Run demo
viberl-demo --episodes 5 --grid-size 15
```
### Python API

```python
from viberl.agents import REINFORCEAgent
from viberl.envs import SnakeGameEnv
from viberl.trainer import Trainer

# Create environment and agent
env = SnakeGameEnv(grid_size=10)
agent = REINFORCEAgent(state_size=100, action_size=4, learning_rate=0.001)

# Train
trainer = Trainer(env=env, agent=agent, num_envs=4, batch_size=8)
trainer.train(num_episodes=1000)
```
## Features

### 🔧 Modern Development Stack
- Gymnasium: Standard RL environment interface ensuring compatibility with the entire RL ecosystem. VibeRL follows Gymnasium's API standards and supports both single and parallel environments through `AsyncVectorEnv` for efficient sampling
- Pydantic: Runtime type validation using Python type hints that ensures data integrity across actions, transitions, and trajectories (see the sketch after this list)
- Pytest: Comprehensive testing framework with 50+ unit tests covering all algorithms, environments, and utilities
- TensorBoard: Real-time training visualization dashboard showing loss curves, reward trends, and hyperparameter sweeps
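As a hedged illustration of the Pydantic-based validation mentioned above, the sketch below defines a hypothetical `Transition` model; the model name, field names, and types are assumptions for illustration, not VibeRL's actual schema.

```python
from pydantic import BaseModel, Field


# Hypothetical transition schema; VibeRL's actual models may differ.
class Transition(BaseModel):
    state: list[float]
    action: int = Field(ge=0)  # discrete action index, must be non-negative
    reward: float
    next_state: list[float]
    done: bool


# Validation happens at construction time; malformed data raises a ValidationError.
t = Transition(state=[0.0] * 100, action=2, reward=1.0, next_state=[0.0] * 100, done=False)
```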
### 🤖 Reinforcement Learning Algorithms
- REINFORCE: Policy gradient method with Monte Carlo returns (see the sketch after this list)
- DQN: Deep Q-Network with experience replay
- PPO: Proximal Policy Optimization with clipping
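The following is a minimal sketch of the discounted Monte Carlo returns and policy-gradient loss behind REINFORCE; the function and variable names are illustrative and not taken from VibeRL's code.

```python
import torch


def reinforce_loss(log_probs: torch.Tensor, rewards: list[float], gamma: float = 0.99) -> torch.Tensor:
    """Policy-gradient loss for one episode: -sum_t log pi(a_t|s_t) * G_t."""
    returns = []
    g = 0.0
    for r in reversed(rewards):  # accumulate discounted Monte Carlo returns G_t
        g = r + gamma * g
        returns.append(g)
    returns.reverse()
    returns_t = torch.tensor(returns)
    # Normalizing returns is a common variance-reduction trick.
    returns_t = (returns_t - returns_t.mean()) / (returns_t.std() + 1e-8)
    return -(log_probs * returns_t).sum()
```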
### Core Framework Features
- Type Safety: Full type annotations throughout
- Parallel Training: AsyncVectorEnv support with configurable `num_envs` (see the sketch after this list)
- CLI Interface: Complete training, evaluation, and demo commands
- Experiment Management: Automatic directory structure with TensorBoard logging
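A hedged sketch of what parallel sampling with Gymnasium's `AsyncVectorEnv` looks like; it uses a standard Gymnasium environment ID since VibeRL's registration name for the Snake environment is not shown here.

```python
import gymnasium as gym

num_envs = 4  # analogous to the --num-envs CLI flag

# AsyncVectorEnv steps several environment copies in parallel worker processes.
envs = gym.vector.AsyncVectorEnv(
    [lambda: gym.make("CartPole-v1") for _ in range(num_envs)]
)

obs, info = envs.reset(seed=0)
for _ in range(10):
    actions = envs.action_space.sample()  # one action per sub-environment
    obs, rewards, terminations, truncations, infos = envs.step(actions)
envs.close()
```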
## Architecture

### Component Overview
| Component | Purpose | Key Classes |
|---|---|---|
| Agents | RL Algorithms | `REINFORCEAgent`, `DQNAgent`, `PPOAgent` |
| Environments | Simulation | `SnakeGameEnv` |
| Networks | Neural Networks | `PolicyNetwork`, `ValueNetwork` |
| Sampling | Data Collection | `VectorEnvSampler` |
| Training | Training Loop | `Trainer` |
### Training Pipeline

```mermaid
graph TD
    CLI[CLI] --> Trainer[Trainer]
    Trainer --> Sampler[VectorEnvSampler]
    Trainer --> Agent[Agent]
    Sampler --> Env[AsyncVectorEnv]
    Env --> Agent
    Agent --> Sampler
```
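To make the data flow in the diagram concrete, here is a heavily hedged sketch of the loop it implies; the `collect` and `update` method names are assumptions for illustration, not VibeRL's actual interfaces.

```python
# Illustrative only: names and signatures are assumptions, not VibeRL's API.
def train(sampler, agent, num_iterations: int) -> None:
    for _ in range(num_iterations):
        # The sampler drives the AsyncVectorEnv, querying the agent for actions
        # and returning completed trajectories.
        trajectories = sampler.collect(agent)
        # The agent consumes the trajectories to update its networks.
        agent.update(trajectories)
```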
### Algorithms
- REINFORCE: Policy gradient with Monte Carlo returns
- DQN: Deep Q-Network with experience replay
- PPO: Proximal Policy Optimization with clipping (see the sketch below)
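A minimal sketch of PPO's clipped surrogate objective; the tensor names are illustrative and this is not VibeRL's implementation.

```python
import torch


def ppo_clip_loss(
    new_log_probs: torch.Tensor,
    old_log_probs: torch.Tensor,
    advantages: torch.Tensor,
    clip_eps: float = 0.2,
) -> torch.Tensor:
    """Clipped surrogate objective: maximize min(r*A, clip(r, 1-eps, 1+eps)*A)."""
    ratio = torch.exp(new_log_probs - old_log_probs)  # pi_new / pi_old
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    return -torch.min(unclipped, clipped).mean()  # negated for gradient descent
```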
## Development

```bash
# Run tests
pytest -n 8

# Format code
ruff format viberl/
ruff check viberl/ --fix

# Build docs
mkdocs serve
```
## Citation

```bibtex
@software{viberl2025,
  title={VibeRL: Modern Reinforcement Learning with Vibe Coding},
  author={0xWelt},
  year={2025},
  url={https://github.com/0xWelt/VibeRL},
}
```