`src.rl.dqn`¶

Module Contents¶

Classes¶

DQN_Agent

Introduction¶

The DQN_Agent class implements a Deep Q-Network (DQN) agent for reinforcement learning. The agent combines experience replay, epsilon-greedy exploration, and gradient clipping to learn a policy for a given environment. It supports parallelized environments and provides methods for training, action selection, and evaluation.

API¶

class src.rl.dqn.DQN_Agent(config, network: dict, learning_rates: float)[source]¶

Bases: src.rl.basic_agent.Basic_Agent

Introduction¶

Original paper¶

“Playing Atari with Deep Reinforcement Learning.”

Args¶

config: Configuration object containing all necessary parameters for the experiment. For details, see config.py.
network (dict): A dictionary of neural networks used by the agent, with keys as network names (e.g., ‘actor’, ‘critic’) and values as the corresponding network instances.
learning_rates (float): Learning rate for the optimizer.

Attributes¶

gamma (float): Discount factor for future rewards. (default: 0.8)
n_act (int): Number of possible actions in the environment. (default: 3)
epsilon (float): Exploration rate for the epsilon-greedy policy. (default: 0.1)
max_grad_norm (float): Maximum gradient norm for gradient clipping. (default: infinity)
memory_size (int): Size of the replay buffer. (default: 100)
batch_size (int): Batch size for training. (default: 64)
warm_up_size (int): Minimum number of experiences required in the replay buffer before training starts. (default: the training batch size)
device (str): Device to run computations on (e.g., ‘cpu’ or ‘cuda’).
replay_buffer (ReplayBuffer): Replay buffer for storing experiences.
network (list): List of network names used by the agent.
optimizer (torch.optim.Optimizer): Optimizer for training the networks. (default: ‘AdamW’)
criterion (torch.nn.Module): Loss function used for training. (default: ‘MSELoss’)
learning_time (int): Counter for the number of training steps.
cur_checkpoint (int): Counter for the current checkpoint index.
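
The interplay of memory_size, batch_size, and warm_up_size can be illustrated with a minimal, standard-library sketch. The class SimpleReplayBuffer and the helper ready_to_train are hypothetical names introduced here for illustration; they are not the library's ReplayBuffer, whose interface is not documented on this page.

```python
import random
from collections import deque


class SimpleReplayBuffer:
    """Hypothetical fixed-size buffer (assumption, not the library's ReplayBuffer)."""

    def __init__(self, memory_size):
        # A deque with maxlen silently drops the oldest transition once full.
        self.storage = deque(maxlen=memory_size)

    def append(self, state, action, reward, next_state, done):
        self.storage.append((state, action, reward, next_state, done))

    def __len__(self):
        return len(self.storage)

    def sample(self, batch_size, rng=random):
        # Uniform sampling without replacement from the stored transitions.
        return rng.sample(list(self.storage), batch_size)


def ready_to_train(buffer, warm_up_size):
    """Training starts only once the buffer holds at least warm_up_size items."""
    return len(buffer) >= warm_up_size


# Values taken from the documented defaults: memory_size=100, batch_size=64.
buffer = SimpleReplayBuffer(memory_size=100)
for step in range(150):
    buffer.append(step, 0, 1.0, step + 1, False)

batch = buffer.sample(batch_size=64) if ready_to_train(buffer, warm_up_size=64) else []
```

Defaulting warm_up_size to the batch size, as the attribute list states, guarantees that the first sampled batch can always be filled.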

Methods¶

set_network(networks, learning_rates): Sets up the neural networks, optimizers, and loss functions for the agent.
update_setting(config): Updates the agent’s configuration and resets training-related attributes.
get_action(state, epsilon_greedy=False): Selects an action based on the current state using the epsilon-greedy policy.
train_episode(envs, seeds, para_mode, compute_resource, tb_logger, required_info): Trains the agent for one episode in a parallelized environment.
rollout_episode(env, seed, required_info): Executes a single rollout episode in the environment and collects results.
log_to_tb_train(tb_logger, mini_step, grad_norms, loss, Return, Reward, predict_Q, target_Q, extra_info): Logs training metrics to TensorBoard.

Initialization

Initializes the DQN agent with the given configuration, networks, and learning rates, and stores the initial agent in the checkpoint directory.

Args:¶

config: Configuration object containing all necessary parameters for the experiment.
network (dict): A dictionary of neural networks used by the agent.
learning_rates (float): Learning rate for the optimizer.

get_step()[source]¶

Returns the current training step.

Returns:¶

int: The current training step.

set_network(networks: dict, learning_rates: float)[source]¶

Sets up the neural networks, optimizers, and loss functions for the agent.

Args:¶

networks (dict): A dictionary of neural networks used by the agent.
learning_rates (float): Learning rate for the optimizer.

Raises:¶

ValueError: If the length of the learning rates list does not match the number of networks.
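
The signature types learning_rates as a float, while the ValueError above refers to a list of learning rates. One plausible reading, sketched below, is that a scalar is shared by all networks and a list must supply one rate per network. The helper resolve_learning_rates is hypothetical and only illustrates that check; the actual logic in set_network may differ.

```python
def resolve_learning_rates(network_names, learning_rates):
    """Return one learning rate per network name (hypothetical helper)."""
    if isinstance(learning_rates, (int, float)):
        # Assumption: a single scalar is shared by every network.
        return {name: float(learning_rates) for name in network_names}
    rates = list(learning_rates)
    if len(rates) != len(network_names):
        raise ValueError(
            f"got {len(rates)} learning rates for {len(network_names)} networks"
        )
    return dict(zip(network_names, rates))


shared = resolve_learning_rates(["actor", "critic"], 1e-3)
per_net = resolve_learning_rates(["actor", "critic"], [1e-3, 5e-4])
```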

update_setting(config)[source]¶

Updates the agent’s configuration and resets training-related attributes.

Args:¶

config: Configuration object containing updated parameters.

get_action(state, epsilon_greedy=False)[source]¶

Selects an action based on the current state using the epsilon-greedy policy.

Args:¶

state (torch.Tensor): The current state.
epsilon_greedy (bool): Whether to use epsilon-greedy exploration.

Returns:¶

np.ndarray: The selected action(s).
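
As a rough illustration of epsilon-greedy selection, the sketch below picks a uniformly random action with probability epsilon and the highest-valued action otherwise. The function select_action and its plain-list q_values argument are assumptions made for this example; the real method takes a state tensor and queries the agent's network, and the exact effect of its epsilon_greedy flag is inferred from the description above.

```python
import random


def select_action(q_values, epsilon, epsilon_greedy=False, rng=random):
    """Pick an action index from a list of Q-values (illustrative only)."""
    n_act = len(q_values)
    # Explore only when the flag is set and the coin flip falls below epsilon.
    if epsilon_greedy and rng.random() < epsilon:
        return rng.randrange(n_act)
    # Otherwise exploit: index of the largest Q-value.
    return max(range(n_act), key=lambda a: q_values[a])


q_values = [0.1, 0.9, 0.3]
greedy_action = select_action(q_values, epsilon=0.1)
exploring_action = select_action(q_values, epsilon=0.1, epsilon_greedy=True)
```

With the documented default of epsilon_greedy=False, this sketch always returns the greedy action, which is the behavior one would typically want during evaluation.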

train_episode(envs, seeds: Optional[Union[int, List[int], numpy.ndarray]], para_mode: Literal['dummy', 'subproc', 'ray', 'ray-subproc'] = 'dummy', compute_resource={}, tb_logger=None, required_info={})[source]¶

Trains the agent for one episode in a parallelized environment.

Args:¶

envs: List of environments for training.
seeds: Seeds for reproducibility.
para_mode (str): Parallelization mode for the environments.
compute_resource (dict): Resources for computation (e.g., CPUs, GPUs).
tb_logger: TensorBoard logger for logging training metrics.
required_info (dict): Additional information required from the environment.

Returns:¶

tuple: A boolean indicating whether training has ended and a dictionary with training metrics.
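
The core of a DQN training step is the temporal-difference target and the loss against the predicted Q-values. The sketch below shows that computation with plain Python lists, assuming the standard one-step target (reward plus gamma times the maximum next-state Q-value, with no bootstrap on terminal transitions) and a mean squared error criterion. The functions td_targets and mse_loss are hypothetical; the agent itself operates on batched tensors, and this page does not say which network supplies the next-state Q-values.

```python
def td_targets(rewards, next_q_values, dones, gamma):
    """One-step TD targets: r + gamma * max_a Q(s', a), or r if the episode ended."""
    targets = []
    for reward, next_q, done in zip(rewards, next_q_values, dones):
        bootstrap = 0.0 if done else gamma * max(next_q)
        targets.append(reward + bootstrap)
    return targets


def mse_loss(predicted, targets):
    """Mean squared error between predicted and target Q-values."""
    return sum((p - t) ** 2 for p, t in zip(predicted, targets)) / len(targets)


# Two transitions with n_act=3 and the documented default gamma=0.8.
rewards = [1.0, 0.0]
next_q_values = [[0.5, 2.0, 1.0], [3.0, 1.0, 0.0]]
dones = [False, True]

targets = td_targets(rewards, next_q_values, dones, gamma=0.8)
loss = mse_loss([2.0, 1.0], targets)
```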

rollout_episode(env, seed=None, required_info=['normalizer', 'gbest'])[source]¶

Executes a single rollout episode in the environment and collects results.

Args:¶

env: The environment for the rollout.
seed (int, optional): Seed for reproducibility.
required_info (list): Names of the additional information required from the environment. (default: ['normalizer', 'gbest'])

Returns:¶

dict: A dictionary containing results of the rollout episode, including return and environment-specific metrics.

log_to_tb_train(tb_logger, mini_step, grad_norms, loss, Return, Reward, predict_Q, target_Q, extra_info={})[source]¶

Logs training metrics to TensorBoard.

Args:¶

tb_logger: TensorBoard logger for logging training metrics.
mini_step (int): Current mini-batch step.
grad_norms (tuple): Gradient norms for the networks.
loss (torch.Tensor): Training loss.
Return (torch.Tensor): Episode return.
Reward (torch.Tensor): Target reward.
predict_Q (torch.Tensor): Predicted Q-values.
target_Q (torch.Tensor): Target Q-values.
extra_info (dict): Additional information to log.
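
The grad_norms logged here relate to the max_grad_norm attribute. A minimal sketch of clipping by global norm follows, using a flat list of gradient values. The function clip_by_global_norm is hypothetical and simplified; whether the agent uses PyTorch's torch.nn.utils.clip_grad_norm_ (which likewise returns the norm measured before clipping) is an assumption.

```python
import math


def clip_by_global_norm(grads, max_grad_norm=math.inf):
    """Scale grads so their L2 norm is at most max_grad_norm; return both."""
    total_norm = math.sqrt(sum(g * g for g in grads))
    if total_norm > max_grad_norm:
        scale = max_grad_norm / total_norm
        grads = [g * scale for g in grads]
    # The norm measured before clipping is what one would typically log.
    return grads, total_norm


clipped, norm = clip_by_global_norm([3.0, 4.0], max_grad_norm=1.0)
unclipped, same_norm = clip_by_global_norm([3.0, 4.0])
```

With the documented default of infinity, the comparison never triggers, so the gradients pass through unchanged while the norm is still available for logging.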
