
VibeRL


A modern Reinforcement Learning framework for education and research, built with type safety and modern Python practices.

Quick Start

Installation

# Install in editable mode with dev extras using uv (run from the repository root)
uv pip install -e ".[dev]"

# Or with pip
pip install -e ".[dev]"

Basic Usage

# Train REINFORCE
viberl-train --alg reinforce --episodes 1000 --grid-size 10

# Train PPO with parallel environments
viberl-train --alg ppo --episodes 1000 --num-envs 4 --trajectory-batch 8

# Evaluate trained model
viberl-eval --model-path experiments/*/final_model.pth --episodes 10 --render

# Run demo
viberl-demo --episodes 5 --grid-size 15

Python API

from viberl.agents import REINFORCEAgent
from viberl.envs import SnakeGameEnv
from viberl.trainer import Trainer

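# Create environment and agent
env = SnakeGameEnv(grid_size=10)
# state_size=100 matches the 10 x 10 grid; action_size=4 covers the four movement directions
agent = REINFORCEAgent(state_size=100, action_size=4, learning_rate=0.001)

# Train using 4 parallel environments and a batch size of 8
trainer = Trainer(env=env, agent=agent, num_envs=4, batch_size=8)
trainer.train(num_episodes=1000)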

Features

🔧 Modern Development Stack

  • Gymnasium: Standard RL environment interface that keeps VibeRL compatible with the wider RL ecosystem. VibeRL environments follow Gymnasium's API and support both single and parallel execution, using AsyncVectorEnv for efficient sampling
  • Pydantic: Runtime validation of type-annotated data models, ensuring the integrity of actions, transitions, and trajectories
  • Pytest: Test suite of 50+ unit tests covering all algorithms, environments, and utilities
  • TensorBoard: Real-time training visualization dashboard showing loss curves, reward trends, and hyperparameter sweeps
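The Pydantic bullet above can be illustrated with a minimal sketch. The Transition and Trajectory models below are hypothetical, not VibeRL's actual classes, and they assume Pydantic v2 and a Snake game with four discrete actions. The point is that malformed data is rejected when the object is created rather than surfacing later inside a training step.

```python
from pydantic import BaseModel, Field, ValidationError


class Transition(BaseModel):
    """Hypothetical single-step record; field names are illustrative."""

    state: list[float]
    action: int = Field(ge=0, lt=4)  # assumes 4 discrete moves (up/down/left/right)
    reward: float
    next_state: list[float]
    done: bool


class Trajectory(BaseModel):
    """Hypothetical episode container built from validated transitions."""

    transitions: list[Transition]

    @property
    def total_reward(self) -> float:
        return sum(t.reward for t in self.transitions)


if __name__ == "__main__":
    ok = Transition(state=[0.0, 1.0], action=2, reward=1.0, next_state=[1.0, 0.0], done=False)
    print(Trajectory(transitions=[ok, ok]).total_reward)  # 2.0
    try:
        # action=7 is outside [0, 4), so validation fails at construction time
        Transition(state=[0.0], action=7, reward=0.0, next_state=[0.0], done=True)
    except ValidationError as err:
        print("rejected with", len(err.errors()), "error(s)")
```

Declaring constraints such as the action range on the model keeps them next to the data definition, so every component that builds a Transition gets the same check.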

🤖 Reinforcement Learning Algorithms

  • REINFORCE: Policy gradient method with Monte Carlo returns
  • DQN: Deep Q-Network with experience replay
  • PPO: Proximal Policy Optimization with clipping
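Each of the three one-line descriptions above rests on a small core computation. The sketch below shows them in plain Python without PyTorch, so it illustrates only the math: the backward recursion for REINFORCE's Monte Carlo returns, a fixed-capacity replay buffer of the kind DQN samples from, and PPO's per-sample clipped surrogate objective. The function and class names and the defaults gamma = 0.99 and clip_eps = 0.2 are assumptions for illustration, not VibeRL's API.

```python
import random
from collections import deque
from typing import Any


def discounted_returns(rewards: list[float], gamma: float = 0.99) -> list[float]:
    """REINFORCE: Monte Carlo return G_t = r_t + gamma * G_{t+1}, computed backwards."""
    returns: list[float] = []
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
        returns.append(g)
    returns.reverse()
    return returns


class ReplayBuffer:
    """DQN-style experience replay: keeps the most recent `capacity` transitions."""

    def __init__(self, capacity: int) -> None:
        self.buffer: deque[Any] = deque(maxlen=capacity)  # oldest items drop out first

    def push(self, transition: Any) -> None:
        self.buffer.append(transition)

    def sample(self, batch_size: int) -> list[Any]:
        # Uniform sampling without replacement breaks temporal correlation in a batch
        return random.sample(list(self.buffer), batch_size)

    def __len__(self) -> int:
        return len(self.buffer)


def ppo_clipped_objective(ratio: float, advantage: float, clip_eps: float = 0.2) -> float:
    """PPO: min(r * A, clip(r, 1 - eps, 1 + eps) * A) for one sample (to be maximized)."""
    clipped_ratio = max(min(ratio, 1.0 + clip_eps), 1.0 - clip_eps)
    return min(ratio * advantage, clipped_ratio * advantage)
```

For example, discounted_returns([1.0, 1.0, 1.0], gamma=0.5) gives [1.75, 1.5, 1.0]. In a real agent these operate on tensors, and the PPO objective is averaged over a batch and its negative used as the loss.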

Core Framework Features

  • Type Safety: Full type annotations throughout
  • Parallel Training: AsyncVectorEnv support with configurable num_envs
  • CLI Interface: Complete training, evaluation, and demo commands
  • Experiment Management: Automatic directory structure with TensorBoard logging
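The experiment-management bullet, together with the experiments/*/final_model.pth path passed to viberl-eval above, suggests one directory per training run. A minimal sketch of such a layout follows; the <alg>_<timestamp> naming, the tb_logs subdirectory, and both helper functions are hypothetical and not VibeRL's confirmed structure.

```python
from __future__ import annotations

from datetime import datetime
from pathlib import Path


def make_experiment_dir(root: str | Path, alg: str, now: datetime | None = None) -> Path:
    """Create <root>/<alg>_<YYYYmmdd_HHMMSS>/tb_logs and return the run directory.

    Hypothetical naming scheme; it keeps the final model directly inside the
    run directory so that a glob like experiments/*/final_model.pth matches.
    """
    stamp = (now or datetime.now()).strftime("%Y%m%d_%H%M%S")
    run_dir = Path(root) / f"{alg}_{stamp}"
    # A second run with the same name raises FileExistsError instead of reusing the logs
    (run_dir / "tb_logs").mkdir(parents=True, exist_ok=False)
    return run_dir


def final_model_path(run_dir: Path) -> Path:
    """Where a trained model would be saved, matching experiments/*/final_model.pth."""
    return run_dir / "final_model.pth"
```

TensorBoard scalars could then be written with torch.utils.tensorboard.SummaryWriter(log_dir=str(run_dir / "tb_logs")) and browsed across runs with tensorboard --logdir experiments.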

Architecture

Component Overview

| Component    | Purpose         | Key Classes                          |
|--------------|-----------------|--------------------------------------|
| Agents       | RL algorithms   | REINFORCEAgent, DQNAgent, PPOAgent   |
| Environments | Simulation      | SnakeGameEnv                         |
| Networks     | Neural networks | PolicyNetwork, ValueNetwork          |
| Sampling     | Data collection | VectorEnvSampler                     |
| Training     | Training loop   | Trainer                              |

Training Pipeline

graph TD
    CLI[CLI] --> Trainer[Trainer]
    Trainer --> Sampler[VectorEnvSampler]
    Trainer --> Agent[Agent]
    Sampler --> Env[AsyncVectorEnv]
    Env --> Agent
    Agent --> Sampler
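The pipeline above describes a loop: the Trainer asks the sampler for trajectories, the sampler steps a vectorized environment with the agent's actions, and the collected trajectories go back to the agent for an update. The pure-Python sketch below mimics that flow so it runs without Gymnasium or PyTorch. ToyVectorEnv is a hypothetical stand-in for AsyncVectorEnv that returns Gymnasium's five-value step tuple (obs, rewards, terminated, truncated, info) and, as a simplification, resets finished sub-environments within the same step. RandomAgent, collect_trajectories, and train are likewise illustrative names, not VibeRL's VectorEnvSampler or Trainer.

```python
import random


class ToyVectorEnv:
    """Hypothetical stand-in for AsyncVectorEnv: num_envs copies stepped in lockstep.

    Each episode lasts exactly episode_len steps and the observation is the step counter.
    """

    def __init__(self, num_envs: int, episode_len: int = 5) -> None:
        self.num_envs = num_envs
        self.episode_len = episode_len
        self.t = [0] * num_envs

    def reset(self) -> tuple[list[int], dict]:
        self.t = [0] * self.num_envs
        return list(self.t), {}

    def step(self, actions: list[int]):
        rewards, terminated, truncated = [], [], []
        for i, action in enumerate(actions):
            self.t[i] += 1
            done = self.t[i] >= self.episode_len
            rewards.append(1.0 if action == 0 else 0.0)  # arbitrary toy reward
            terminated.append(done)
            truncated.append(False)
            if done:
                self.t[i] = 0  # same-step reset of this sub-environment (simplification)
        return list(self.t), rewards, terminated, truncated, {}


class RandomAgent:
    """Hypothetical agent exposing act() and learn(); picks one of 4 actions uniformly."""

    def __init__(self, seed: int = 0) -> None:
        self.rng = random.Random(seed)
        self.updates = 0

    def act(self, obs: list[int]) -> list[int]:
        return [self.rng.randrange(4) for _ in obs]

    def learn(self, trajectories: list[list[tuple[int, int, float]]]) -> None:
        self.updates += 1  # a real agent would compute a loss and step an optimizer here


def collect_trajectories(env, agent, batch_size):
    """Sampler role: step all envs in lockstep until batch_size episodes have finished."""
    obs, _ = env.reset()
    in_progress = [[] for _ in range(env.num_envs)]
    finished = []
    while len(finished) < batch_size:
        actions = agent.act(obs)
        next_obs, rewards, terminated, truncated, _ = env.step(actions)
        for i in range(env.num_envs):
            # Record the observation the action was taken from, not the post-step one
            in_progress[i].append((obs[i], actions[i], rewards[i]))
            if terminated[i] or truncated[i]:
                finished.append(in_progress[i])
                in_progress[i] = []
        obs = next_obs
    return finished[:batch_size]


def train(env, agent, num_updates, batch_size):
    """Trainer role: alternate between sampling a batch and updating the agent."""
    for _ in range(num_updates):
        agent.learn(collect_trajectories(env, agent, batch_size))
```

In this toy setting, where every episode has the same length, num_envs=4 and batch_size=8 means each sub-environment contributes two episodes per update; how VibeRL itself divides a trajectory batch across environments is not specified here.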

Development

# Run tests in parallel (the -n flag is provided by pytest-xdist)
pytest -n 8

# Format and lint code
ruff format viberl/
ruff check viberl/ --fix

# Serve docs locally with live reload
mkdocs serve

Citation

@software{viberl2025,
  title={VibeRL: Modern Reinforcement Learning with Vibe Coding},
  author={0xWelt},
  year={2025},
  url={https://github.com/0xWelt/VibeRL},
}

License

MIT License