
Deep Q-Network (DQN) Reinforcement Learning

[Interactive demo: a grid-world environment (agent, goal, traps, walls, with a Q-value intensity overlay) in which the agent learns to navigate to the goal; a view of the DQN model architecture and its activations; training-metric charts (reward & steps, loss & epsilon); and controls for environment setup, DQN parameters, visualization options, and a live training status readout (episode, step, current epsilon, replay buffer fill, loss, total reward).]

How Deep Q-Networks (DQN) Work

Deep Q-Network (DQN) is a reinforcement learning algorithm that combines Q-learning with deep neural networks to handle high-dimensional state spaces. It was introduced by DeepMind in 2015 and represents a breakthrough in reinforcement learning, enabling agents to learn directly from raw sensory inputs.

Key Components of DQN:

  1. Neural Network as Function Approximator: Instead of maintaining a Q-table, DQN uses a neural network to approximate the Q-function, allowing it to handle much larger state spaces.
  2. Experience Replay: DQN stores experiences (state, action, reward, next state) in a replay buffer and samples random batches for training, which breaks correlations between consecutive samples and improves data efficiency.
  3. Target Network: A separate "target" network is used for generating the targets in the Q-learning update, which is periodically updated with the weights of the main network to improve stability.
  4. Epsilon-Greedy Exploration: DQN starts with a high exploration rate (epsilon) that gradually decreases over time, balancing exploration and exploitation (a code sketch of these components follows this list).
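
To make these pieces concrete, below is a minimal sketch in Python (assuming PyTorch): a small Q-network, a fixed-capacity replay buffer, and an epsilon-greedy action selector. The names (QNetwork, ReplayBuffer, epsilon_greedy), layer sizes, and default capacity are illustrative assumptions, not the demo's actual implementation.

# Minimal, illustrative sketches of the core DQN components (PyTorch assumed).
import random
from collections import deque

import torch
import torch.nn as nn


class QNetwork(nn.Module):
    """Small MLP mapping a state vector to one Q-value per action."""

    def __init__(self, state_dim: int, n_actions: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, hidden),
            nn.ReLU(),
            nn.Linear(hidden, n_actions),
        )

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        return self.net(state)


class ReplayBuffer:
    """Fixed-capacity buffer of (state, action, reward, next_state, done) tuples."""

    def __init__(self, capacity: int = 10_000):
        self.buffer = deque(maxlen=capacity)

    def push(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size: int):
        # Random mini-batch breaks correlations between consecutive transitions.
        batch = random.sample(self.buffer, batch_size)
        states, actions, rewards, next_states, dones = zip(*batch)
        return states, actions, rewards, next_states, dones

    def __len__(self):
        return len(self.buffer)


def epsilon_greedy(q_net: QNetwork, state: torch.Tensor, epsilon: float, n_actions: int) -> int:
    """With probability epsilon pick a random action, otherwise the greedy one."""
    if random.random() < epsilon:
        return random.randrange(n_actions)
    with torch.no_grad():
        return int(q_net(state.unsqueeze(0)).argmax(dim=1).item())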

DQN Algorithm:

  1. Initialize replay memory D to capacity N
  2. Initialize action-value function Q with random weights θ
  3. Initialize target action-value function Q̂ with weights θ⁻ = θ
  4. For each episode:
    • Initialize state s₁
    • For each step of the episode:
      • With probability ε select a random action aₜ, otherwise select aₜ = argmaxₐ Q(sₜ, a; θ)
      • Execute action aₜ in the environment and observe reward rₜ and next state sₜ₊₁
      • Store transition (sₜ, aₜ, rₜ, sₜ₊₁) in replay memory D
      • Sample random mini-batch of transitions from D
      • Set yⱼ = rⱼ if the episode terminates at step j+1, otherwise yⱼ = rⱼ + γ maxₐ′ Q̂(sⱼ₊₁, a′; θ⁻)
      • Perform a gradient descent step on (yⱼ − Q(sⱼ, aⱼ; θ))² with respect to θ
      • Every C steps update target network parameters: θ⁻ = θ (a code sketch of this inner-loop update follows)
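
The inner-loop update above can be sketched as a single training-step function, reusing the illustrative QNetwork and ReplayBuffer from the previous snippet. The hyperparameter values (GAMMA, BATCH_SIZE, TARGET_UPDATE_EVERY) are assumed placeholders, not the demo's settings.

# Illustrative DQN training step (PyTorch assumed); hyperparameters are placeholders.
import torch
import torch.nn.functional as F

GAMMA = 0.95                 # discount factor (assumed value)
BATCH_SIZE = 64              # mini-batch size (assumed value)
TARGET_UPDATE_EVERY = 1000   # C: steps between target-network syncs (assumed value)


def dqn_train_step(q_net, target_net, buffer, optimizer, step_count: int):
    """One gradient step on the squared TD error (y_j − Q(s_j, a_j; θ))²."""
    if len(buffer) < BATCH_SIZE:
        return None  # not enough experience collected yet

    states, actions, rewards, next_states, dones = buffer.sample(BATCH_SIZE)
    states = torch.stack([torch.as_tensor(s, dtype=torch.float32) for s in states])
    next_states = torch.stack([torch.as_tensor(s, dtype=torch.float32) for s in next_states])
    actions = torch.as_tensor(actions, dtype=torch.int64)
    rewards = torch.as_tensor(rewards, dtype=torch.float32)
    dones = torch.as_tensor(dones, dtype=torch.float32)

    # Q(s_j, a_j; θ) for the actions actually taken.
    q_values = q_net(states).gather(1, actions.unsqueeze(1)).squeeze(1)

    # y_j = r_j if the episode terminated,
    #       r_j + γ · max_a' Q̂(s_{j+1}, a'; θ⁻) otherwise.
    with torch.no_grad():
        next_q = target_net(next_states).max(dim=1).values
        targets = rewards + GAMMA * next_q * (1.0 - dones)

    loss = F.mse_loss(q_values, targets)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

    # Every C steps copy the online weights into the target network.
    if step_count % TARGET_UPDATE_EVERY == 0:
        target_net.load_state_dict(q_net.state_dict())

    return loss.item()

In this sketch the function would be called once per environment step, with step_count tracking the global number of updates so the target network is synchronized every C = TARGET_UPDATE_EVERY steps.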

Improvements and Variations:

  • Double DQN: Addresses the overestimation bias in Q-learning by using the online network to select actions and the target network to evaluate them (a sketch of this target computation follows this list).
  • Dueling DQN: Separates the value and advantage functions, allowing the network to learn which states are valuable without having to learn the effect of each action.
  • Prioritized Experience Replay: Samples transitions with higher expected learning progress more frequently.
  • Noisy DQN: Uses noisy linear layers for directed exploration instead of epsilon-greedy.
  • Rainbow DQN: Combines multiple improvements for state-of-the-art performance.
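
As an example of how small the Double DQN change is, here is a hedged sketch of its target computation, written against the same illustrative networks as the earlier snippets: the online network picks the greedy action and the target network scores it.

# Illustrative Double DQN target computation (PyTorch assumed).
import torch


def double_dqn_targets(q_net, target_net, next_states, rewards, dones, gamma=0.95):
    """Online net selects the action, target net evaluates it."""
    with torch.no_grad():
        # Action selection with the online network: a* = argmax_a Q(s', a; θ)
        best_actions = q_net(next_states).argmax(dim=1, keepdim=True)
        # Action evaluation with the target network: Q̂(s', a*; θ⁻)
        next_q = target_net(next_states).gather(1, best_actions).squeeze(1)
        # Terminal transitions (done = 1) keep only the immediate reward.
        return rewards + gamma * next_q * (1.0 - dones)

Swapping the max-based target in the earlier training-step sketch for this function is the entire modification; the rest of the loop stays the same.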

Advantages over traditional Q-Learning:

  • Can handle high-dimensional state spaces (e.g., pixels from a game screen)
  • Better generalization to unseen states through function approximation
  • Experience replay improves data efficiency and breaks correlations
  • More stable learning through target networks
  • Ability to learn complex strategies in challenging environments