src.rl.ddqn¶
Module Contents¶
Classes¶
API¶
- class src.rl.ddqn.DDQN_Agent(config, networks: dict, learning_rates: float)[source]¶
Bases: src.rl.basic_agent.Basic_Agent
Introduction¶
The DDQN_Agent class implements a Double Deep Q-Network (DDQN) agent for reinforcement learning. This agent leverages experience replay, target networks, and epsilon-greedy exploration to learn optimal policies in a given environment.
Original paper¶
“Deep Reinforcement Learning with Double Q-learning.” Proceedings of the AAAI Conference on Artificial Intelligence, 2016.
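To make the Double Q-learning idea concrete, the following is a minimal NumPy sketch of the target computation described in the cited paper: the online network selects the greedy next action and the target network evaluates it. The function name `double_dqn_target` and its argument layout are illustrative assumptions, not part of `DDQN_Agent`, whose actual implementation may differ.

```python
import numpy as np


def double_dqn_target(rewards, dones, q_online_next, q_target_next, gamma=0.99):
    """Compute Double DQN targets for a batch (hypothetical helper).

    q_online_next and q_target_next have shape (batch, n_act) and hold the
    next-state Q-values from the online and target networks respectively.
    """
    # The online network selects the greedy next action ...
    best_actions = np.argmax(q_online_next, axis=1)
    # ... and the target network evaluates that selected action.
    next_q = q_target_next[np.arange(len(best_actions)), best_actions]
    # Terminal transitions (dones == 1) do not bootstrap from the next state.
    return rewards + gamma * (1.0 - dones) * next_q


rewards = np.array([1.0, 0.0])
dones = np.array([0.0, 1.0])
q_online_next = np.array([[0.2, 0.9], [0.5, 0.1]])
q_target_next = np.array([[1.0, 2.0], [3.0, 4.0]])
targets = double_dqn_target(rewards, dones, q_online_next, q_target_next)
```

Decoupling action selection from action evaluation is what distinguishes this target from the standard DQN target, which takes the maximum over the target network's own Q-values.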
Args¶
config: Configuration object containing all necessary parameters for the experiment. For details, see config.py.
networks (dict): A dictionary of neural networks used by the agent, with keys as network names (e.g., ‘actor’, ‘critic’) and values as the corresponding network instances.
learning_rates (float): Learning rate for the optimizer.
Attributes¶
gamma (float): Discount factor for future rewards. (default: 0.99)
n_act (int): Number of possible actions in the environment. (default: 4)
epsilon (float): Epsilon value for epsilon-greedy exploration. (default: 0.1)
max_grad_norm (float): Maximum gradient norm for gradient clipping. (default: infinity)
memory_size (int): Size of the replay buffer. (default: 100000)
batch_size (int): Batch size for training. (default: 64)
warm_up_size (int): Minimum number of experiences required in the replay buffer before training starts. (default: 10000)
target_update_interval (int): Interval for updating the target network. (default: 1000)
device (str): Device to run the computations on (e.g., ‘cpu’ or ‘cuda’).
replay_buffer (ReplayBuffer): Replay buffer for storing experiences.
network (list): List of network names used by the agent.
optimizer (torch.optim.Optimizer): Optimizer for training the networks. (default: ‘Adam’)
criterion (torch.nn.Module): Loss function used for training. (default: ‘MSELoss’)
learning_time (int): Counter for the number of training steps.
cur_checkpoint (int): Counter for the current checkpoint index.
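The way memory_size, warm_up_size, and target_update_interval typically interact in a DQN-style agent can be sketched as below. This is an illustration under assumptions: `SimpleReplayBuffer`, `ready_to_train`, and `should_sync_target` are hypothetical names, and the real `ReplayBuffer` class and the agent's internal scheduling may work differently.

```python
import random
from collections import deque


class SimpleReplayBuffer:
    """Fixed-capacity FIFO buffer (hypothetical stand-in for ReplayBuffer)."""

    def __init__(self, memory_size=100000):
        # deque(maxlen=...) drops the oldest transition once the buffer is full.
        self.storage = deque(maxlen=memory_size)

    def append(self, transition):
        self.storage.append(transition)

    def sample(self, batch_size):
        # Uniform sampling without replacement.
        return random.sample(list(self.storage), batch_size)

    def __len__(self):
        return len(self.storage)


def ready_to_train(buffer, warm_up_size=10000):
    """Training starts only once enough experience has been collected."""
    return len(buffer) >= warm_up_size


def should_sync_target(learning_time, target_update_interval=1000):
    """Sync the target network every target_update_interval training steps."""
    return learning_time > 0 and learning_time % target_update_interval == 0


# Tiny capacity so the eviction behaviour is visible.
buffer = SimpleReplayBuffer(memory_size=3)
for step in range(5):
    # (state, action, reward, next_state, done) with placeholder values.
    buffer.append((step, 0, 0.0, step + 1, False))
```

With the documented defaults, this sketch would collect 10000 transitions before the first gradient step and refresh the target network after every 1000 training steps.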
Methods¶
__init__(config, networks, learning_rates): Initializes the DDQN agent with the given configuration, networks, and learning rates.
set_network(networks, learning_rates): Sets up the networks, optimizer, and loss function for the agent.
get_step(): Returns the current training step.
update_setting(config): Updates the agent’s configuration settings.
get_action(state, epsilon_greedy=False): Selects an action based on the current state and exploration strategy.
train_episode(envs, seeds, para_mode, compute_resource, tb_logger, required_info): Trains the agent for one episode in a parallelized environment.
rollout_episode(env, seed, required_info): Executes a single episode in the environment without training.
log_to_tb_train(tb_logger, mini_step, grad_norms, loss, Return, Reward, predict_Q, target_Q, extra_info): Logs training metrics to TensorBoard.
Initialization
Initializes the DDQN agent with the given configuration, networks, and learning rates, and stores the initial agent in the checkpoint directory.
Args:¶
config: Configuration object containing all necessary parameters for the experiment.
networks (dict): A dictionary of neural networks used by the agent.
learning_rates (float): Learning rate for the optimizer.
- set_network(networks: dict, learning_rates: float)[source]¶
Sets up the networks, optimizer, and loss function for the agent.
Args:¶
networks (dict): A dictionary of neural networks used by the agent.
learning_rates (float): Learning rate for the optimizer.
Raises:¶
AssertionError: If required network attributes (e.g., model) are not set.
ValueError: If the length of the learning_rates list does not match the number of networks provided.
- update_setting(config)[source]¶
Updates the agent’s configuration settings.
Args:¶
config: Configuration object containing updated parameters.
- get_action(state, epsilon_greedy=False)[source]¶
Selects an action based on the current state and exploration strategy.
Args:¶
state (torch.Tensor): The current state.
epsilon_greedy (bool): Whether to use epsilon-greedy exploration.
Returns:¶
numpy.ndarray: The selected action(s).
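As an illustration of the exploration strategy that get_action refers to, here is a minimal NumPy sketch of epsilon-greedy selection over a batch of Q-values. The helper `epsilon_greedy_action` and its `rng` parameter are assumptions made for this example; the method itself takes a state tensor and queries the agent's network, which is not reproduced here.

```python
import numpy as np


def epsilon_greedy_action(q_values, epsilon=0.1, rng=None):
    """Pick one action per row of q_values, shape (batch, n_act).

    Hypothetical helper: with probability epsilon a uniformly random action
    is chosen, otherwise the action with the highest Q-value.
    """
    rng = np.random.default_rng() if rng is None else rng
    q_values = np.asarray(q_values)
    n_act = q_values.shape[-1]
    greedy = np.argmax(q_values, axis=-1)
    # Draw one uniform number per row to decide between explore and exploit.
    explore = rng.random(greedy.shape) < epsilon
    random_actions = rng.integers(0, n_act, size=greedy.shape)
    return np.where(explore, random_actions, greedy)


q_batch = np.array([[0.1, 0.5, 0.2, 0.0],
                    [0.9, 0.1, 0.3, 0.2]])
# epsilon=0.0 disables exploration, so the result is purely greedy.
greedy_actions = epsilon_greedy_action(q_batch, epsilon=0.0)
# epsilon=1.0 always explores; a seeded generator keeps the draw repeatable.
random_actions = epsilon_greedy_action(q_batch, epsilon=1.0,
                                       rng=np.random.default_rng(0))
```

Setting epsilon to zero corresponds to the documented default epsilon_greedy=False, where only the greedy action is returned.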
- train_episode(envs, seeds: Optional[Union[int, List[int], numpy.ndarray]], para_mode: Literal['dummy', 'subproc', 'ray', 'ray-subproc'] = 'dummy', compute_resource={}, tb_logger=None, required_info={})[source]¶
Trains the agent for one episode in a parallelized environment.
Args:¶
envs: List of environments for training.
seeds: Seeds for reproducibility.
para_mode (str): Parallelization mode for the environments.
compute_resource (dict): Resources for computation (e.g., CPUs, GPUs).
tb_logger: TensorBoard logger for logging training metrics.
required_info (dict): Additional information required from the environment.
Returns:¶
tuple: A boolean indicating whether training has ended and a dictionary with training metrics.
- rollout_episode(env, seed=None, required_info={})[source]¶
Executes a single episode in the environment without training.
Args:¶
env: The environment for the rollout.
seed (int, optional): Seed for reproducibility.
required_info (dict): Additional information required from the environment.
Returns:¶
dict: A dictionary containing episode results such as return, cost, and metadata.
- log_to_tb_train(tb_logger, mini_step, grad_norms, loss, Return, Reward, predict_Q, target_Q, extra_info={})[source]¶
Logs training metrics to TensorBoard.
Args:¶
tb_logger: TensorBoard logger for logging training metrics.
mini_step (int): Current mini-batch step.
grad_norms (tuple): Gradient norms for the networks.
loss (torch.Tensor): Training loss.
Return (torch.Tensor): Episode return.
Reward (torch.Tensor): Target reward.
predict_Q (torch.Tensor): Predicted Q-values.
target_Q (torch.Tensor): Target Q-values.
extra_info (dict): Additional information to log.