src.rl.utils¶
Module Contents¶
Classes¶
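Memory | A class to store and manage the memory required for reinforcement learning algorithms. It keeps track of actions, states, log probabilities, and rewards during an episode and provides functionality to clear the stored memory.
ReplayBuffer | A utility for storing and sampling experiences in reinforcement learning.
ReplayBuffer_torch | A replay buffer for reinforcement learning implemented with PyTorch.
MultiAgent_ReplayBuffer | A replay buffer designed for multi-agent reinforcement learning.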
Functions¶
clip_grad_norms | Clips the gradient norm of each parameter group to max_norm and returns the gradient norms from before and after clipping.
save_class | Saves a Python object (class instance) to a file in pickle format.
API¶
- class src.rl.utils.Memory[source]¶
Introduction¶
A class to store and manage the memory required for reinforcement learning algorithms. It keeps track of actions, states, log probabilities, and rewards during an episode and provides functionality to clear the stored memory.
Methods:¶
__init__(): Initializes the memory by creating empty lists for actions, states, log probabilities, and rewards.
clear_memory(): Clears the stored memory by deleting the lists of actions, states, log probabilities, and rewards.
Initialization
Initializes the memory by creating empty lists for actions, states, log probabilities, and rewards.
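The following is a minimal sketch of how such a per-episode memory could look. The class name MemorySketch and the attribute names (actions, states, logprobs, rewards) are assumptions chosen for illustration; the actual attribute names in src.rl.utils.Memory are not shown on this page.

```python
class MemorySketch:
    """Hypothetical stand-in for Memory; attribute names are assumptions."""

    def __init__(self):
        # One list per quantity collected during an episode.
        self.actions = []
        self.states = []
        self.logprobs = []
        self.rewards = []

    def clear_memory(self):
        # Empty every list in place so the same object can be reused
        # for the next episode.
        del self.actions[:]
        del self.states[:]
        del self.logprobs[:]
        del self.rewards[:]


memory = MemorySketch()
memory.states.append([0.0, 1.0])
memory.actions.append(1)
memory.logprobs.append(-0.69)
memory.rewards.append(1.0)
steps_before_clear = len(memory.rewards)
memory.clear_memory()
```

Clearing in place keeps any external references to the lists valid, which is one reasonable design choice; rebinding each attribute to a new empty list would work as well.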
- class src.rl.utils.ReplayBuffer(max_size)[source]¶
Introduction¶
The ReplayBuffer class is a utility for storing and sampling experiences in reinforcement learning. It uses a fixed-size buffer to store transitions (state, action, reward, next state, done) and provides methods to append new experiences and sample mini-batches for training. This class is essential for implementing experience replay, which helps stabilize and improve the learning process in reinforcement learning algorithms.
Args¶
max_size (int): The maximum number of experiences the buffer can hold.
Attributes¶
buffer (collections.deque): A deque object that stores the experiences with a fixed maximum size.
Methods¶
append(exp): Adds a new experience to the buffer.
sample(batch_size): Samples a mini-batch of experiences from the buffer.
__len__(): Returns the current number of experiences stored in the buffer.
Initialization
Initializes the ReplayBuffer with a fixed maximum size.
Args:¶
max_size (int): The maximum number of experiences the buffer can hold.
- append(exp)[source]¶
Adds a new experience to the buffer.
Args:¶
exp (tuple): A tuple representing a transition (state, action, reward, next state, done).
- sample(batch_size)[source]¶
Samples a mini-batch of experiences from the buffer.
Args:¶
batch_size (int): The number of experiences to sample.
Returns:¶
tuple: A tuple containing batches of observations, actions, rewards, next observations, and done flags.
Raises:¶
ValueError: If the requested batch size exceeds the number of stored experiences.
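As an illustration of the behaviour described above, here is a minimal sketch of a deque-backed buffer. ReplayBufferSketch is a hypothetical name, and the choice of uniform sampling without replacement and of plain tuples as the return type are assumptions; the real class may convert batches to arrays or tensors.

```python
import random
from collections import deque


class ReplayBufferSketch:
    """Hypothetical stand-in for ReplayBuffer, using only the standard library."""

    def __init__(self, max_size):
        # A deque with maxlen silently drops the oldest item when full.
        self.buffer = deque(maxlen=max_size)

    def append(self, exp):
        self.buffer.append(exp)

    def sample(self, batch_size):
        if batch_size > len(self.buffer):
            raise ValueError("batch_size exceeds the number of stored experiences")
        # Assumption: uniform sampling without replacement.
        batch = random.sample(list(self.buffer), batch_size)
        # Regroup the transitions field by field.
        obs, actions, rewards, next_obs, dones = zip(*batch)
        return obs, actions, rewards, next_obs, dones

    def __len__(self):
        return len(self.buffer)


buf = ReplayBufferSketch(max_size=3)
for t in range(5):
    # Toy transition: (state, action, reward, next state, done).
    buf.append((t, 0, 1.0, t + 1, False))

obs, actions, rewards, next_obs, dones = buf.sample(2)
```

After the loop only the three most recent transitions remain, so every sampled observation comes from steps 2, 3, or 4.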
- class src.rl.utils.ReplayBuffer_torch(capacity, state_dim, device)[source]¶
Introduction¶
The ReplayBuffer_torch class implements a replay buffer for reinforcement learning using PyTorch. It is designed to store and sample transitions (state, action, reward, next_state, done) efficiently, enabling agents to learn from past experiences. The buffer supports fixed capacity and operates in a circular manner, overwriting old transitions when full.
Args¶
capacity (int): The maximum number of transitions the buffer can store.
state_dim (int): The dimensionality of the state space.
device (torch.device): The device (CPU or GPU) on which the buffer’s tensors are stored.
Attributes¶
capacity (int): The maximum number of transitions the buffer can store.
device (torch.device): The device (CPU or GPU) on which the buffer’s tensors are stored.
position (int): The current position in the buffer where the next transition will be stored.
size (int): The current number of transitions stored in the buffer.
states (torch.Tensor): A tensor storing the states of transitions.
actions (torch.Tensor): A tensor storing the actions of transitions.
rewards (torch.Tensor): A tensor storing the rewards of transitions.
next_states (torch.Tensor): A tensor storing the next states of transitions.
dones (torch.Tensor): A tensor storing the done flags of transitions.
Methods¶
append(state, action, reward, next_state, done): Adds a new transition to the buffer. Overwrites the oldest transition if the buffer is full.
sample(batch_size): Samples a batch of transitions from the buffer.
__len__(): Returns the current number of transitions stored in the buffer.
Initialization
Initializes the ReplayBuffer_torch with a fixed capacity and state dimensionality.
Args:¶
capacity (int): The maximum number of transitions the buffer can store.
state_dim (int): The dimensionality of the state space.
device (torch.device): The device (CPU or GPU) on which the buffer’s tensors are stored.
- append(state, action, reward, next_state, done)[source]¶
Adds a new transition to the buffer. Overwrites the oldest transition if the buffer is full.
Args:¶
state (torch.Tensor): The current state.
action (int): The action taken.
reward (float): The reward received.
next_state (torch.Tensor): The next state.
done (bool): Whether the episode is done.
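The circular overwrite described above comes down to a write index that wraps around with a modulo. The sketch below shows only that bookkeeping in plain Python: CircularBufferSketch is a hypothetical name, and it stores tuples in a preallocated list instead of the per-field PyTorch tensors on a device that ReplayBuffer_torch is documented to use.

```python
class CircularBufferSketch:
    """Hypothetical illustration of the position/size bookkeeping only."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.position = 0  # slot the next append will write to
        self.size = 0      # number of valid transitions stored
        self.slots = [None] * capacity

    def append(self, state, action, reward, next_state, done):
        # Write at the current position; once the buffer is full this
        # slot holds the oldest transition, which gets overwritten.
        self.slots[self.position] = (state, action, reward, next_state, done)
        self.position = (self.position + 1) % self.capacity
        self.size = min(self.size + 1, self.capacity)

    def __len__(self):
        return self.size


ring = CircularBufferSketch(capacity=3)
for i in range(4):
    ring.append([i], i, float(i), [i + 1], False)
```

With a capacity of 3, the fourth append wraps around and replaces the transition in slot 0, while size stays capped at the capacity.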
- class src.rl.utils.MultiAgent_ReplayBuffer(max_size)[source]¶
Introduction¶
The MultiAgent_ReplayBuffer class is designed for multi-agent reinforcement learning. It stores transitions for multiple agents and supports sampling chunks of transitions for training.
Args¶
max_size (int): The maximum number of transitions the buffer can hold.
Attributes¶
buffer (collections.deque): A deque object that stores the transitions with a fixed maximum size.
Methods¶
append(transition): Adds a new transition to the buffer.
sample_chunk(batch_size, chunk_size): Samples chunks of transitions for training.
__len__(): Returns the current number of transitions stored in the buffer.
Initialization
Initializes the MultiAgent_ReplayBuffer with a fixed maximum size.
Args:¶
max_size (int): The maximum number of transitions the buffer can hold.
- append(transition)[source]¶
Adds a new transition to the buffer.
Args:¶
transition (tuple): A tuple representing a transition for multiple agents.
- sample_chunk(batch_size, chunk_size)[source]¶
Samples chunks of transitions for training.
Args:¶
batch_size (int): The number of chunks to sample.
chunk_size (int): The size of each chunk.
Returns:¶
tuple: Tensors representing sampled chunks of transitions for states, actions, rewards, next states, and done flags.
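This page does not say how chunks are chosen. One plausible reading, sketched below, is that each chunk is a run of chunk_size consecutive transitions starting at a random index. That interpretation, the function name sample_chunks_sketch, and the nested-list return type (instead of tensors) are all assumptions for illustration.

```python
import random
from collections import deque


def sample_chunks_sketch(buffer, batch_size, chunk_size):
    """Hypothetical: draw batch_size runs of chunk_size consecutive transitions."""
    if chunk_size > len(buffer):
        raise ValueError("chunk_size exceeds the number of stored transitions")
    last_start = len(buffer) - chunk_size
    # random.randint includes both endpoints, so every valid start can occur.
    starts = [random.randint(0, last_start) for _ in range(batch_size)]
    return [[buffer[i] for i in range(s, s + chunk_size)] for s in starts]


# Toy transitions: (state, action, reward, next state, done).
transitions = deque(((t, 0, 0.0, t + 1, False) for t in range(10)), maxlen=100)
chunks = sample_chunks_sketch(transitions, batch_size=4, chunk_size=3)
```

Sampling contiguous runs preserves temporal order inside each chunk, which matters for recurrent policies; whether the real sample_chunk does this should be checked against the source.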
- src.rl.utils.clip_grad_norms(param_groups, max_norm=math.inf)[source]¶
Clips the gradient norm of each parameter group to max_norm and returns the gradient norms from before and after clipping.
Args:¶
param_groups (list): A list of parameter groups, typically from an optimizer.
max_norm (float): The maximum allowable norm for gradients.
Returns:¶
tuple: A tuple containing lists of gradient norms before and after clipping.
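To make the arithmetic concrete, here is a plain-Python sketch of per-group L2-norm clipping. It is not the implementation of clip_grad_norms: the function name clip_group_norms_sketch is hypothetical, and each group is modelled as a flat list of floats, whereas the real function takes optimizer parameter groups and presumably operates on PyTorch gradients.

```python
import math


def clip_group_norms_sketch(grad_groups, max_norm=math.inf):
    """Hypothetical: rescale each group of gradients so its L2 norm is at most max_norm."""
    norms_before, norms_after = [], []
    for grads in grad_groups:
        norm = math.sqrt(sum(g * g for g in grads))
        norms_before.append(norm)
        if norm > max_norm:
            # Scale every gradient by the same factor so the direction is kept.
            scale = max_norm / norm
            grads[:] = [g * scale for g in grads]
            norm = math.sqrt(sum(g * g for g in grads))
        norms_after.append(norm)
    return norms_before, norms_after


groups = [[3.0, 4.0], [0.3, 0.4]]
before, after = clip_group_norms_sketch(groups, max_norm=1.0)
```

The first group has norm 5 and is scaled down to norm 1, while the second group already has norm 0.5 and is left unchanged. With the default max_norm of math.inf, no group is ever rescaled, so the call only reports the norms.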
- src.rl.utils.save_class(dir, file_name, saving_class)[source]¶
Introduction¶
Saves a Python object (class instance) to a file in pickle format.
Args:¶
dir (str): The directory where the file will be saved. If the directory does not exist, it will be created.
file_name (str): The name of the file (without extension) to save the object.
saving_class (object): The Python object (class instance) to be saved.
Raises:¶
OSError: If there is an issue creating the directory or writing the file.
Notes:¶
The saved file will have a .pkl extension.
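A minimal sketch of the documented behaviour (create the directory if needed, append .pkl, pickle the object) might look like the following. The name save_object_sketch, its parameter names, and the returned path are assumptions for illustration and are not part of save_class.

```python
import os
import pickle
import tempfile


def save_object_sketch(directory, file_name, obj):
    """Hypothetical stand-in for save_class; returns the written path."""
    # Create the target directory (and parents) if it does not exist yet.
    os.makedirs(directory, exist_ok=True)
    path = os.path.join(directory, file_name + ".pkl")
    with open(path, "wb") as f:
        pickle.dump(obj, f)
    return path


with tempfile.TemporaryDirectory() as tmp:
    target_dir = os.path.join(tmp, "checkpoints")
    path = save_object_sketch(target_dir, "agent", {"lr": 0.001})
    # Load the object back to confirm the round trip.
    with open(path, "rb") as f:
        restored = pickle.load(f)
```

Files written this way should only be loaded from trusted sources, since unpickling can execute arbitrary code.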