src.rl.qlearning
Module Contents
Classes
QLearning_Agent: Tabular Q-Learning agent for reinforcement learning.
API
- class src.rl.qlearning.QLearning_Agent(config)
Bases: src.rl.basic_agent.Basic_Agent
Introduction
The QLearning_Agent class implements a tabular Q-Learning agent for reinforcement learning. It learns optimal policies in discrete state and action spaces, supports parallelized environments and epsilon-greedy exploration, and provides methods for training, action selection, and evaluation.
Original paper
“Learning from Delayed Rewards” (Chapter 5 introduces Q-Learning). The first journal publication is “Q-Learning” (1992, in the Machine Learning journal).
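For reference, the standard tabular Q-Learning update moves Q(s, a) toward the one-step target r + gamma * max over a' of Q(s', a'). The sketch below illustrates that rule with NumPy. The function name q_update and the array layout are illustrative assumptions, not this class's implementation, which stores its table as a torch.Tensor.

```python
import numpy as np

def q_update(q_table, state, action, reward, next_state, done, gamma=0.8, lr=1.0):
    """Apply one tabular Q-Learning update in place and return the TD error."""
    # Bootstrap from the best next action unless the episode has ended.
    bootstrap = 0.0 if done else gamma * np.max(q_table[next_state])
    td_error = reward + bootstrap - q_table[state, action]
    q_table[state, action] += lr * td_error
    return td_error

# 4 states x 4 actions, matching the documented defaults for n_state and n_act.
q = np.zeros((4, 4))
err = q_update(q, state=0, action=1, reward=1.0, next_state=2, done=False)
```

With a learning rate of 1, as in the documented default for lr_model, each update overwrites the entry with its target, which is only appropriate when transitions and rewards are deterministic.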
Args
config: Configuration object containing all parameters needed for the experiment. See config.py for details.
Attributes
gamma (float): Discount factor for future rewards. (default: 0.8)
n_act (int): Number of possible actions. (default: 4)
n_state (int): Number of possible states. (default: 4)
epsilon (float): Exploration rate for epsilon-greedy policy. (default: None)
lr_model (float): Learning rate for updating the Q-table. (default: 1)
q_table (torch.Tensor): Q-table storing the state-action values.
learning_time (int): Counter for the number of learning steps taken.
cur_checkpoint (int): Counter for the current checkpoint index.
config (object): Configuration object passed during initialization.
Methods
__init__(config): Initializes the Q-Learning agent with the given configuration.
update_setting(config): Updates the agent’s settings and resets learning time and checkpoints.
get_action(state, epsilon_greedy=False): Selects an action based on the current state using an epsilon-greedy policy.
train_episode(envs, seeds, para_mode, compute_resource, tb_logger, required_info): Trains the agent for one episode in the given environment(s).
rollout_episode(env, seed, required_info): Executes a single episode in the environment without training and returns the results.
log_to_tb_train(tb_logger, mini_step, loss, Return, Reward, extra_info): Logs training metrics and additional information to TensorBoard.
Initialization
Initializes the Q-Learning agent with the given configuration and stores the initial agent in the checkpoint directory.
Args:
config: Configuration object containing all necessary parameters for the experiment.
- update_setting(config)
Updates the agent’s settings and resets learning time and checkpoints.
Args:
config: Configuration object containing updated parameters.
- get_action(state, epsilon_greedy=False)
Selects an action for the current state. Epsilon-greedy exploration is applied when epsilon_greedy is True.
Args:
state (torch.Tensor): The current state.
epsilon_greedy (bool): Whether to use epsilon-greedy exploration. (default: False)
Returns:
np.ndarray: The selected action(s).
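As an illustration of the selection rule described above, here is a minimal NumPy sketch of greedy versus epsilon-greedy action selection over a batch of states. The function name select_actions, the rng parameter, and the use of integer state indices are assumptions made for this example and may differ from how get_action handles its torch.Tensor input.

```python
import numpy as np

def select_actions(q_table, states, epsilon=0.1, epsilon_greedy=False, rng=None):
    """Return one action per state: greedy, or epsilon-greedy when requested."""
    rng = np.random.default_rng() if rng is None else rng
    states = np.asarray(states)
    # Greedy choice: the highest-valued action in each state's row.
    actions = np.argmax(q_table[states], axis=1)
    if epsilon_greedy:
        # With probability epsilon, swap the greedy action for a uniform random one.
        explore = rng.random(states.shape[0]) < epsilon
        random_actions = rng.integers(0, q_table.shape[1], size=states.shape[0])
        actions = np.where(explore, random_actions, actions)
    return actions

q = np.array([[0.0, 1.0, 0.0, 0.0],
              [0.5, 0.0, 0.0, 2.0]])
greedy = select_actions(q, [0, 1])
```

Passing a seeded generator, for example np.random.default_rng(0), makes the exploratory choices reproducible.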
- train_episode(envs, seeds: Optional[Union[int, List[int], numpy.ndarray]], para_mode: Literal['dummy', 'subproc', 'ray', 'ray-subproc'] = 'dummy', compute_resource={}, tb_logger=None, required_info={})
Trains the agent for one episode in the given environment(s).
Args:
envs: List of environments for training.
seeds: Seeds for reproducibility.
para_mode (str): Parallelization mode for the environments: 'dummy', 'subproc', 'ray', or 'ray-subproc'. (default: 'dummy')
compute_resource (dict): Resources for computation (e.g., CPUs, GPUs).
tb_logger: TensorBoard logger for logging training metrics.
required_info (dict): Additional information required from the environment.
Returns:
tuple: A boolean indicating whether training has ended and a dictionary with training metrics.
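To make the shape of a training episode concrete, the following sketch runs one epsilon-greedy Q-Learning episode on a toy environment and returns a (training_ended, metrics) pair similar in spirit to the return value described above. Everything here is hypothetical: ChainEnv, train_one_episode, the step_budget stopping rule, and the metric keys are invented for illustration and do not reflect the library's environments, parallelization modes, or actual metrics.

```python
import numpy as np

class ChainEnv:
    """Hypothetical 4-state chain: action 1 moves right, any other action stays put."""
    n_state, n_act = 4, 4

    def reset(self):
        self.state = 0
        return self.state

    def step(self, action):
        if action == 1:
            self.state += 1
        done = self.state == self.n_state - 1
        reward = 1.0 if done else 0.0
        return self.state, reward, done

def train_one_episode(env, q_table, rng, gamma=0.8, lr=1.0, epsilon=0.2,
                      max_steps=200, step_budget=1000, steps_so_far=0):
    """Run one epsilon-greedy episode, updating q_table in place."""
    state, episode_return = env.reset(), 0.0
    for _ in range(max_steps):
        # Explore with probability epsilon, otherwise act greedily.
        if rng.random() < epsilon:
            action = int(rng.integers(env.n_act))
        else:
            action = int(np.argmax(q_table[state]))
        next_state, reward, done = env.step(action)
        # One-step Q-Learning target; no bootstrapping from a terminal state.
        target = reward + (0.0 if done else gamma * np.max(q_table[next_state]))
        q_table[state, action] += lr * (target - q_table[state, action])
        state = next_state
        episode_return += reward
        steps_so_far += 1
        if done:
            break
    # Assumed stopping rule: training ends once the step budget is used up.
    training_ended = steps_so_far >= step_budget
    return training_ended, {"return": episode_return, "learn_steps": steps_so_far}

rng = np.random.default_rng(0)
q = np.zeros((ChainEnv.n_state, ChainEnv.n_act))
ended, info = train_one_episode(ChainEnv(), q, rng)
```

Calling train_one_episode repeatedly with the same q array, and feeding info["learn_steps"] back in as steps_so_far, would continue training until the assumed step budget is reached.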
- rollout_episode(env, seed=None, required_info={})
Executes a single episode in the environment without training and returns the results.
Args:
env: The environment for the rollout.
seed (int, optional): Seed for reproducibility.
required_info (dict): Additional information required from the environment.
Returns:
dict: A dictionary containing evaluation results such as cumulative rewards, environment costs, and metadata.
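The difference between a rollout and a training episode is that the Q-table is only read, never updated. A minimal sketch of that idea follows. CountdownEnv, the rollout function, and the result keys are hypothetical stand-ins chosen for the example and are not the library's environment interface or its actual result fields.

```python
import numpy as np

class CountdownEnv:
    """Hypothetical toy environment: the state counts down from 3 to 0."""

    def reset(self, seed=None):
        # The seed is accepted for interface symmetry; this toy env is deterministic.
        self.state = 3
        return self.state

    def step(self, action):
        # Action 0 earns a reward of 1; every action advances the countdown.
        reward = 1.0 if action == 0 else 0.0
        self.state -= 1
        return self.state, reward, self.state == 0

def rollout(env, q_table, seed=None):
    """Play one greedy episode; the Q-table is read but never modified."""
    state, done, rewards = env.reset(seed=seed), False, []
    while not done:
        action = int(np.argmax(q_table[state]))
        state, reward, done = env.step(action)
        rewards.append(reward)
    return {"return": float(sum(rewards)), "steps": len(rewards), "seed": seed}

q = np.zeros((4, 2))
q[:, 0] = 1.0  # make action 0 the greedy choice in every state
before = q.copy()
result = rollout(CountdownEnv(), q, seed=7)
```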
- log_to_tb_train(tb_logger, mini_step, loss, Return, Reward, extra_info={})
Logs training metrics and additional information to TensorBoard.
Args:
tb_logger: TensorBoard logger for logging training metrics.
mini_step (int): Current mini-batch step.
loss (torch.Tensor): Training loss.
Return (torch.Tensor): Episode return.
Reward (torch.Tensor): Target reward.
extra_info (dict): Additional information to log.
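A logging helper of this kind typically writes each metric as a scalar keyed by the current step. The sketch below shows one possible shape, assuming the logger exposes add_scalar(tag, scalar_value, global_step) as torch.utils.tensorboard.SummaryWriter does. RecordingLogger is a stand-in so the example runs without TensorBoard, and the tag names are hypothetical, not the ones this method writes.

```python
class RecordingLogger:
    """Stand-in logger that records add_scalar calls instead of writing event files."""

    def __init__(self):
        self.records = []

    def add_scalar(self, tag, scalar_value, global_step=None):
        self.records.append((tag, float(scalar_value), global_step))

def log_training_step(tb_logger, mini_step, loss, ret, reward, extra_info=None):
    """Write the core training scalars, then any extra scalars, at mini_step."""
    if tb_logger is None:  # logging is optional
        return
    tb_logger.add_scalar("train/loss", loss, mini_step)
    tb_logger.add_scalar("train/return", ret, mini_step)
    tb_logger.add_scalar("train/reward", reward, mini_step)
    for name, value in (extra_info or {}).items():
        tb_logger.add_scalar(f"extra/{name}", value, mini_step)

logger = RecordingLogger()
log_training_step(logger, mini_step=5, loss=0.25, ret=3.0, reward=1.0,
                  extra_info={"epsilon": 0.1})
```

Swapping RecordingLogger for a real SummaryWriter should work unchanged, since only add_scalar is used.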