viberl.agents.dqn
DQN: Deep Q-Network combining Q-learning with deep neural networks for human-level control.
Algorithm Overview:
DQN combines traditional Q-learning with deep neural networks to learn optimal action-value functions in environments with high-dimensional state spaces. It addresses key challenges of applying deep learning to reinforcement learning through experience replay and target networks.
Key Concepts:
- Deep Q-Learning: Uses neural networks to approximate Q-values \(Q(s,a;\theta)\)
- Experience Replay: Stores and samples experiences to break correlation between samples (sketched, together with epsilon-greedy selection, after this list)
- Target Network: Separate frozen network provides stable target Q-values
- Epsilon-Greedy: Balances exploration and exploitation during training
- Temporal Difference: Uses TD error for Q-value updates
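The sketch below illustrates how the replay buffer and epsilon-greedy selection interact. It is a minimal, framework-agnostic illustration, not the viberl implementation; `q_values_fn` is a hypothetical stand-in for the Q-network forward pass.

```python
import random
from collections import deque

import numpy as np


def q_values_fn(state: np.ndarray) -> np.ndarray:
    """Hypothetical stand-in for a Q-network forward pass (illustration only)."""
    return np.zeros(4)


memory = deque(maxlen=10000)  # experience replay buffer
epsilon = 1.0                 # current exploration rate


def select_action(state: np.ndarray, num_actions: int) -> int:
    """Epsilon-greedy: explore with probability epsilon, otherwise act greedily."""
    if random.random() < epsilon:
        return random.randrange(num_actions)
    return int(np.argmax(q_values_fn(state)))


def store(state, action, reward, next_state, done) -> None:
    """Append one transition; the oldest transitions are discarded when full."""
    memory.append((state, action, reward, next_state, done))


def sample_batch(batch_size: int = 64) -> list:
    """Uniform sampling breaks the correlation between consecutive steps."""
    return random.sample(memory, batch_size)
```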
Mathematical Foundation:
Optimization Objective (mean squared TD error against the target-network parameters \(\theta^-\)):

\[
L(\theta) = \mathbb{E}_{(s, a, r, s') \sim \mathcal{D}}\Big[\big(r + \gamma \max_{a'} Q(s', a'; \theta^-) - Q(s, a; \theta)\big)^2\Big]
\]

Bellman Optimality Equation:

\[
Q^*(s, a) = \mathbb{E}\big[r + \gamma \max_{a'} Q^*(s', a') \,\big|\, s, a\big]
\]
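The objective above can be computed directly from a sampled batch. The following PyTorch sketch assumes `q_network` and `target_network` map a batch of states to per-action Q-values; it illustrates the loss and is not necessarily the exact viberl code.

```python
import torch
import torch.nn.functional as F


def dqn_loss(q_network, target_network, batch, gamma: float = 0.99) -> torch.Tensor:
    """Mean squared TD error of a sampled batch against the frozen target network."""
    states, actions, rewards, next_states, dones = batch

    # Q(s, a; theta) for the actions that were actually taken.
    q_values = q_network(states).gather(1, actions.unsqueeze(1)).squeeze(1)

    # Target: r + gamma * max_a' Q(s', a'; theta^-); no bootstrapping past
    # terminal states and no gradient through the target network.
    with torch.no_grad():
        next_q = target_network(next_states).max(dim=1).values
        targets = rewards + gamma * next_q * (1.0 - dones)

    return F.mse_loss(q_values, targets)
```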
Reference: Mnih, V., Kavukcuoglu, K., Silver, D., et al. Human-level control through deep reinforcement learning. Nature 518, 529-533 (2015).
Classes:
| Name | Description |
| --- | --- |
| `DQNAgent` | DQN agent implementation with deep Q-learning and experience replay. |
DQNAgent
```python
DQNAgent(
    state_size: int,
    action_size: int,
    learning_rate: float = 0.001,
    gamma: float = 0.99,
    epsilon_start: float = 1.0,
    epsilon_end: float = 0.01,
    epsilon_decay: float = 0.995,
    memory_size: int = 10000,
    batch_size: int = 64,
    target_update: int = 10,
    hidden_size: int = 128,
    num_hidden_layers: int = 2,
)
```
Bases: Agent
DQN agent implementation with deep Q-learning and experience replay.
This agent implements the Deep Q-Network algorithm using neural networks to approximate Q-values, with experience replay and target networks for stability.
Parameters:
| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `state_size` | int | Dimension of the state space. Must be positive. | *required* |
| `action_size` | int | Number of possible actions. Must be positive. | *required* |
| `learning_rate` | float | Learning rate for the Adam optimizer. Must be positive. | `0.001` |
| `gamma` | float | Discount factor for future rewards. Should be in (0, 1]. | `0.99` |
| `epsilon_start` | float | Initial exploration rate. Should be in [0, 1]. | `1.0` |
| `epsilon_end` | float | Final exploration rate. Should be in [0, 1]. | `0.01` |
| `epsilon_decay` | float | Decay rate for exploration. Should be in (0, 1]. | `0.995` |
| `memory_size` | int | Size of the experience replay buffer. Must be positive. | `10000` |
| `batch_size` | int | Batch size for training. Must be positive. | `64` |
| `target_update` | int | Frequency of target network updates. Must be positive. | `10` |
| `hidden_size` | int | Number of neurons in each hidden layer. Must be positive. | `128` |
| `num_hidden_layers` | int | Number of hidden layers in the Q-network. Must be non-negative. | `2` |
Raises:
| Type | Description |
| --- | --- |
| `ValueError` | If any parameter is invalid. |
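A brief construction example; the state and action dimensions below are illustrative (e.g. a CartPole-like task), and the remaining values are simply the documented defaults:

```python
from viberl.agents.dqn import DQNAgent

# Illustrative dimensions: 4 state features, 2 discrete actions.
agent = DQNAgent(
    state_size=4,
    action_size=2,
    learning_rate=0.001,
    gamma=0.99,
    epsilon_start=1.0,
    epsilon_end=0.01,
    epsilon_decay=0.995,
    memory_size=10000,
    batch_size=64,
    target_update=10,
)
```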
Methods:
| Name | Description |
| --- | --- |
| `act` | Select action using epsilon-greedy policy. |
| `learn` | Update Q-network using Q-learning with experience replay. |
| `save` | Save the agent's neural network parameters to a file. |
| `load` | Load the agent's neural network parameters from a file. |
Attributes:
| Name | Type | Description |
| --- | --- | --- |
| `learning_rate` | | Learning rate for the Adam optimizer. |
| `gamma` | | Discount factor for future rewards. |
| `epsilon_start` | | Initial exploration rate. |
| `epsilon_end` | | Final exploration rate. |
| `epsilon_decay` | | Decay rate for exploration. |
| `memory_size` | | Size of the experience replay buffer. |
| `batch_size` | | Batch size for training. |
| `target_update` | | Frequency of target network updates. |
| `epsilon` | | Current exploration rate, initialized to `epsilon_start`. |
| `q_network` | | Online Q-network used for action selection. |
| `target_network` | | Second Q-network providing stable target Q-values. |
| `optimizer` | | Adam optimizer over the Q-network parameters. |
| `memory` | | Experience replay buffer (deque of capacity `memory_size`). |
Source code in viberl/agents/dqn.py
- `learning_rate` *(instance attribute)*: `learning_rate = learning_rate`
- `gamma` *(instance attribute)*: `gamma = gamma`
- `epsilon_start` *(instance attribute)*: `epsilon_start = epsilon_start`
- `epsilon_end` *(instance attribute)*: `epsilon_end = epsilon_end`
- `epsilon_decay` *(instance attribute)*: `epsilon_decay = epsilon_decay`
- `memory_size` *(instance attribute)*: `memory_size = memory_size`
- `batch_size` *(instance attribute)*: `batch_size = batch_size`
- `target_update` *(instance attribute)*: `target_update = target_update`
- `epsilon` *(instance attribute)*: `epsilon = epsilon_start`
- `q_network` *(instance attribute)*: `q_network = QNetwork(state_size, action_size, hidden_size, num_hidden_layers)`
- `target_network` *(instance attribute)*: `target_network = QNetwork(state_size, action_size, hidden_size, num_hidden_layers)`
- `optimizer` *(instance attribute)*: `optimizer = Adam(parameters(), lr=learning_rate)`
- `memory` *(instance attribute)*: `memory = deque(maxlen=memory_size)`
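The target network starts as a second `QNetwork` with the same architecture as `q_network`. A typical way to keep it synchronized every `target_update` learning steps is sketched below; this is an assumed pattern using standard PyTorch `state_dict` copying, not a verbatim excerpt of the implementation.

```python
# Assumed synchronization pattern: copy the online network's weights into the
# target network every `target_update` learning steps.
learn_steps = 0

def maybe_sync_target(agent) -> None:
    global learn_steps
    learn_steps += 1
    if learn_steps % agent.target_update == 0:
        agent.target_network.load_state_dict(agent.q_network.state_dict())
```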
act
`act(state: ndarray, training: bool = True) -> Action`
Select action using epsilon-greedy policy.
Parameters:
| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `state` | ndarray | Current state observation. | *required* |
| `training` | bool | Whether in training mode (affects exploration). | `True` |
Returns:
| Type | Description |
| --- | --- |
| `Action` | Action containing the selected action. |
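A usage sketch for action selection, continuing the construction example above; the placeholder observation simply matches `state_size`:

```python
import numpy as np

state = np.zeros(4, dtype=np.float32)  # placeholder observation

# Training mode: epsilon-greedy, so exploratory actions are possible.
action = agent.act(state, training=True)

# Evaluation mode: act greedily with respect to the learned Q-values.
greedy_action = agent.act(state, training=False)
```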
Source code in viberl/agents/dqn.py
learn
`learn(trajectories: list[Trajectory]) -> dict[str, float]`
Update Q-network using Q-learning with experience replay.
Parameters:
| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `trajectories` | list[Trajectory] | List of trajectories to learn from. | *required* |
Returns:
| Type | Description |
| --- | --- |
| `dict[str, float]` | Dictionary containing loss, epsilon, and memory size. |
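A usage sketch for the learning step. The `collect_trajectory` helper below is hypothetical, since building a `Trajectory` is outside the scope of this page; only the documented call signature and return value are relied on.

```python
# Hypothetical helper: rolls out one episode with the agent and packages the
# result as a viberl Trajectory. It is not part of the documented API.
trajectory = collect_trajectory(agent)

# The update consumes a list of trajectories and returns training diagnostics
# (loss, epsilon, and replay-memory size).
metrics = agent.learn([trajectory])
print(metrics)
```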
Source code in viberl/agents/dqn.py
save
`save(filepath: str) -> None`
Save the agent's neural network parameters to a file.
Parameters:
| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `filepath` | str | Path where the model will be saved. | *required* |
Source code in viberl/agents/dqn.py
load
`load(filepath: str) -> None`
Load the agent's neural network parameters from a file.
Parameters:
| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `filepath` | str | Path from which to load the model. | *required* |
Source code in viberl/agents/dqn.py
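A save/load round trip, continuing the example above. The file name is illustrative, and it is assumed that the restoring agent must be constructed with a matching architecture:

```python
# Persist the trained Q-network parameters, then restore them into a fresh
# agent built with the same architecture.
agent.save("dqn_agent.pt")

restored = DQNAgent(state_size=4, action_size=2)
restored.load("dqn_agent.pt")
```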