src.rl.utils

Module Contents

Classes

Memory

Introduction

A class to store and manage the memory required for reinforcement learning algorithms. It keeps track of actions, states, log probabilities, and rewards during an episode and provides functionality to clear the stored memory.

ReplayBuffer

Introduction

The ReplayBuffer class is a utility for storing and sampling experiences in reinforcement learning. It stores transitions (state, action, reward, next state, done) in a fixed-size buffer and provides methods to append new experiences and sample mini-batches for training. It implements experience replay: training on randomly sampled past transitions, rather than only on the most recent ones, reduces the correlation between consecutive updates and helps stabilize learning.

ReplayBuffer_torch

Introduction

The ReplayBuffer_torch class implements a replay buffer for reinforcement learning using PyTorch. It stores and samples transitions (state, action, reward, next_state, done) efficiently, enabling agents to learn from past experiences. The buffer has a fixed capacity and operates as a circular buffer, overwriting the oldest transitions when full.

MultiAgent_ReplayBuffer

Introduction

The MultiAgent_ReplayBuffer class is designed for multi-agent reinforcement learning. It stores transitions for multiple agents and supports sampling chunks of transitions for training.

Functions

clip_grad_norms

Clips the gradient norm of every parameter group to max_norm and returns the gradient norms before and after clipping.

save_class

Introduction

Saves a Python object (class instance) to a file in pickle format.

API

class src.rl.utils.Memory

Introduction

A class to store and manage the memory required for reinforcement learning algorithms. It keeps track of actions, states, log probabilities, and rewards during an episode and provides functionality to clear the stored memory.

Methods:

  • __init__(): Initializes the memory by creating empty lists for actions, states, log probabilities, and rewards.

  • clear_memory(): Clears the stored memory by deleting the lists of actions, states, log probabilities, and rewards.

Initialization

Initializes the memory by creating empty lists for actions, states, log probabilities, and rewards.

clear_memory()

Clears the stored memory by deleting the lists of actions, states, log probabilities, and rewards.
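A minimal sketch of a class matching this description is shown below. The attribute names `actions`, `states`, `logprobs`, and `rewards` are assumptions, since the documentation does not list them. Here `clear_memory()` empties each list in place with `del lst[:]`, so any other reference to the same list objects also sees the cleared state.

```python
from typing import Any, List


class Memory:
    """Per-episode storage (sketch; attribute names are assumed, not documented)."""

    def __init__(self) -> None:
        # One list per quantity collected during an episode.
        self.actions: List[Any] = []
        self.states: List[Any] = []
        self.logprobs: List[Any] = []
        self.rewards: List[float] = []

    def clear_memory(self) -> None:
        # Delete the contents of each list in place instead of rebinding new lists.
        del self.actions[:]
        del self.states[:]
        del self.logprobs[:]
        del self.rewards[:]


memory = Memory()
memory.states.append([0.0, 1.0])
memory.actions.append(1)
memory.logprobs.append(-0.69)
memory.rewards.append(1.0)
memory.clear_memory()
print(len(memory.rewards))  # 0
```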

class src.rl.utils.ReplayBuffer(max_size)

Introduction

The ReplayBuffer class is a utility for storing and sampling experiences in reinforcement learning. It stores transitions (state, action, reward, next state, done) in a fixed-size buffer and provides methods to append new experiences and sample mini-batches for training. It implements experience replay: training on randomly sampled past transitions, rather than only on the most recent ones, reduces the correlation between consecutive updates and helps stabilize learning.

Args

  • max_size (int): The maximum number of experiences the buffer can hold.

Attributes

  • buffer (collections.deque): A deque object that stores the experiences with a fixed maximum size.

Methods

  • append(exp): Adds a new experience to the buffer.

  • sample(batch_size): Samples a mini-batch of experiences from the buffer.

  • __len__(): Returns the current number of experiences stored in the buffer.

Initialization

Initializes the ReplayBuffer with a fixed maximum size.

Args:

  • max_size (int): The maximum number of experiences the buffer can hold.

append(exp)

Adds a new experience to the buffer.

Args:

  • exp (tuple): A tuple representing a transition (state, action, reward, next state, done).
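Because the buffer is a `collections.deque` with a fixed maximum size, appending to a full buffer silently discards the oldest experience. The snippet below illustrates this standard `deque(maxlen=...)` behavior on its own, using a tiny `max_size` and made-up transition tuples for illustration.

```python
from collections import deque

max_size = 3
buffer = deque(maxlen=max_size)

# Each experience is a (state, action, reward, next_state, done) tuple (toy values).
for t in range(5):
    buffer.append((t, 0, 1.0, t + 1, False))

# Only the three most recent experiences remain; t = 0 and t = 1 were evicted.
print([exp[0] for exp in buffer])  # [2, 3, 4]
print(len(buffer))                 # 3
```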

sample(batch_size)

Samples a mini-batch of experiences from the buffer.

Args:

  • batch_size (int): The number of experiences to sample.

Returns:

  • tuple: A tuple containing batches of observations, actions, rewards, next observations, and done flags.

Raises:

  • ValueError: If the requested batch size exceeds the number of stored experiences.
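One common way to implement this kind of sampling is sketched below as a standalone function, `sample_batch`, a hypothetical name that is not part of `src.rl.utils`. It checks the requested size up front (raising `ValueError` as documented), draws experiences uniformly without replacement with `random.sample`, and transposes the list of tuples into one tuple per field with `zip(*batch)`. Whether the real method converts the fields to arrays or tensors is not stated, so this sketch returns plain tuples.

```python
import random
from collections import deque
from typing import Deque, Tuple


def sample_batch(buffer: Deque[tuple], batch_size: int) -> Tuple[tuple, ...]:
    """Uniformly sample experiences and group them by field (hypothetical helper)."""
    if not 0 < batch_size <= len(buffer):
        raise ValueError(
            f"batch_size must be between 1 and {len(buffer)}, got {batch_size}"
        )
    # Sampling without replacement: each experience appears at most once per batch.
    batch = random.sample(list(buffer), batch_size)
    # Transpose [(s, a, r, s2, d), ...] into (states, actions, rewards, next_states, dones).
    states, actions, rewards, next_states, dones = zip(*batch)
    return states, actions, rewards, next_states, dones


buffer = deque(maxlen=100)
for t in range(10):
    buffer.append((t, t % 2, float(t), t + 1, t == 9))

states, actions, rewards, next_states, dones = sample_batch(buffer, 4)
print(len(states))  # 4
```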

__len__()

Returns the current number of experiences stored in the buffer.

Returns:

  • int: The number of experiences in the buffer.

class src.rl.utils.ReplayBuffer_torch(capacity, state_dim, device)

Introduction

The ReplayBuffer_torch class implements a replay buffer for reinforcement learning using PyTorch. It stores and samples transitions (state, action, reward, next_state, done) efficiently, enabling agents to learn from past experiences. The buffer has a fixed capacity and operates as a circular buffer, overwriting the oldest transitions when full.

Args

  • capacity (int): The maximum number of transitions the buffer can store.

  • state_dim (int): The dimensionality of the state space.

  • device (torch.device): The device (CPU or GPU) on which the buffer’s tensors are stored.

Attributes

  • capacity (int): The maximum number of transitions the buffer can store.

  • device (torch.device): The device (CPU or GPU) on which the buffer’s tensors are stored.

  • position (int): The current position in the buffer where the next transition will be stored.

  • size (int): The current number of transitions stored in the buffer.

  • states (torch.Tensor): A tensor storing the states of transitions.

  • actions (torch.Tensor): A tensor storing the actions of transitions.

  • rewards (torch.Tensor): A tensor storing the rewards of transitions.

  • next_states (torch.Tensor): A tensor storing the next states of transitions.

  • dones (torch.Tensor): A tensor storing the done flags of transitions.

Methods

  • append(state, action, reward, next_state, done): Adds a new transition to the buffer. Overwrites the oldest transition if the buffer is full.

  • sample(batch_size): Samples a batch of transitions from the buffer.

  • __len__(): Returns the current number of transitions stored in the buffer.

Initialization

Initializes the ReplayBuffer_torch with a fixed capacity and state dimensionality.

Args:

  • capacity (int): The maximum number of transitions the buffer can store.

  • state_dim (int): The dimensionality of the state space.

  • device (torch.device): The device (CPU or GPU) on which the buffer’s tensors are stored.

append(state, action, reward, next_state, done)

Adds a new transition to the buffer. Overwrites the oldest transition if the buffer is full.

Args:

  • state (torch.Tensor): The current state.

  • action (int): The action taken.

  • reward (float): The reward received.

  • next_state (torch.Tensor): The next state.

  • done (bool): Whether the episode is done.
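The overwrite-when-full behavior follows from the position and size attributes listed above. A common bookkeeping scheme is shown here in plain Python, with a list standing in for the preallocated tensors (an assumption made only for illustration): the write pointer advances modulo capacity, and the size stops growing once it reaches capacity.

```python
capacity = 4
slots = [None] * capacity  # stands in for one preallocated tensor row per transition
position, size = 0, 0

for step in range(6):
    slots[position] = step                  # write into the current slot
    position = (position + 1) % capacity    # wrap around to the start when full
    size = min(size + 1, capacity)          # size stops growing at capacity

# Steps 4 and 5 overwrote the two oldest entries (steps 0 and 1).
print(slots, position, size)  # [4, 5, 2, 3] 2 4
```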

sample(batch_size)

Samples a batch of transitions from the buffer.

Args:

  • batch_size (int): The number of transitions to sample.

Returns:

  • tuple: A tuple of tensors (states, actions, rewards, next_states, dones) representing a batch of sampled transitions.

__len__()

Returns the current number of transitions stored in the buffer.

Returns:

  • int: The number of transitions in the buffer.
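Putting the documented constructor arguments, attributes, and methods together, the following is a minimal sketch of a tensor-backed circular buffer. The class name `ReplayBufferTorchSketch`, the dtypes (`float32` states and rewards, `long` actions, `bool` done flags), and uniform sampling with replacement via `torch.randint` are assumptions; the documentation does not specify them. Sampling requires at least one stored transition.

```python
import torch


class ReplayBufferTorchSketch:
    """Circular replay buffer on preallocated tensors (sketch; dtypes and sampling assumed)."""

    def __init__(self, capacity: int, state_dim: int, device: torch.device) -> None:
        self.capacity = capacity
        self.device = device
        self.position = 0  # slot that the next append() writes to
        self.size = 0      # number of filled slots, capped at capacity
        # Preallocate storage once so append() never reallocates memory.
        self.states = torch.zeros((capacity, state_dim), dtype=torch.float32, device=device)
        self.actions = torch.zeros(capacity, dtype=torch.long, device=device)
        self.rewards = torch.zeros(capacity, dtype=torch.float32, device=device)
        self.next_states = torch.zeros((capacity, state_dim), dtype=torch.float32, device=device)
        self.dones = torch.zeros(capacity, dtype=torch.bool, device=device)

    def append(self, state, action, reward, next_state, done) -> None:
        i = self.position
        self.states[i] = torch.as_tensor(state, dtype=torch.float32, device=self.device)
        self.actions[i] = action
        self.rewards[i] = reward
        self.next_states[i] = torch.as_tensor(next_state, dtype=torch.float32, device=self.device)
        self.dones[i] = done
        # Advance the write pointer, wrapping around to overwrite the oldest transition.
        self.position = (self.position + 1) % self.capacity
        self.size = min(self.size + 1, self.capacity)

    def sample(self, batch_size: int):
        # Assumed scheme: uniform sampling with replacement over the filled slots only.
        idx = torch.randint(0, self.size, (batch_size,), device=self.device)
        return (self.states[idx], self.actions[idx], self.rewards[idx],
                self.next_states[idx], self.dones[idx])

    def __len__(self) -> int:
        return self.size


buf = ReplayBufferTorchSketch(capacity=5, state_dim=3, device=torch.device("cpu"))
for t in range(7):
    buf.append(torch.full((3,), float(t)), t % 2, 1.0, torch.full((3,), float(t + 1)), False)
states, actions, rewards, next_states, dones = buf.sample(4)
print(len(buf), states.shape)  # 5 torch.Size([4, 3])
```

Preallocating fixed-size tensors on the target device avoids per-step allocation and host-to-device copies at sampling time, which is the usual motivation for a tensor-backed buffer over a deque of tuples.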

class src.rl.utils.MultiAgent_ReplayBuffer(max_size)

Introduction

The MultiAgent_ReplayBuffer class is designed for multi-agent reinforcement learning. It stores transitions for multiple agents and supports sampling chunks of transitions for training.

Args

  • max_size (int): The maximum number of transitions the buffer can hold.

Attributes

  • buffer (collections.deque): A deque object that stores the transitions with a fixed maximum size.

Methods

  • append(transition): Adds a new transition to the buffer.

  • sample_chunk(batch_size, chunk_size): Samples chunks of transitions for training.

  • __len__(): Returns the current number of transitions stored in the buffer.

Initialization

Initializes the MultiAgent_ReplayBuffer with a fixed maximum size.

Args:

  • max_size (int): The maximum number of transitions the buffer can hold.

append(transition)

Adds a new transition to the buffer.

Args:

  • transition (tuple): A tuple representing a transition for multiple agents.

sample_chunk(batch_size, chunk_size)

Samples chunks of transitions for training.

Args:

  • batch_size (int): The number of chunks to sample.

  • chunk_size (int): The size of each chunk.

Returns:

  • tuple: Tensors representing sampled chunks of transitions for states, actions, rewards, next states, and done flags.
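The documentation does not define what a chunk is. A common interpretation, used for example when training recurrent agents, is a run of chunk_size consecutive transitions starting at a random position, so that temporal order is preserved within each chunk. The sketch below implements that interpretation as a standalone function, `sample_chunk_sketch` (hypothetical), assuming each stored transition is a `(states, actions, rewards, next_states, dones)` tuple with one entry per agent. It returns nested Python lists shaped `[batch_size][chunk_size][n_agents]...`; the documented method returns tensors, which could be built from such lists with `torch.tensor(...)`.

```python
import random
from collections import deque
from typing import Deque, List, Tuple


def sample_chunk_sketch(buffer: Deque[tuple], batch_size: int, chunk_size: int) -> Tuple[List, ...]:
    """Sample batch_size runs of chunk_size consecutive transitions (hypothetical helper)."""
    if not 0 < chunk_size <= len(buffer):
        raise ValueError("chunk_size must be between 1 and len(buffer)")
    # Latest valid start index so that the whole chunk fits inside the buffer.
    max_start = len(buffer) - chunk_size
    starts = [random.randint(0, max_start) for _ in range(batch_size)]

    fields: Tuple[List, ...] = ([], [], [], [], [])  # states, actions, rewards, next_states, dones
    for start in starts:
        chunk = [buffer[i] for i in range(start, start + chunk_size)]
        # Regroup the chunk field by field, keeping the transitions in time order.
        for field_idx, out in enumerate(fields):
            out.append([transition[field_idx] for transition in chunk])
    return fields


n_agents = 2
buffer = deque(maxlen=50)
for t in range(20):
    obs = [[float(t)] * 3 for _ in range(n_agents)]  # one 3-dim observation per agent
    next_obs = [[float(t + 1)] * 3 for _ in range(n_agents)]
    buffer.append((obs, [0] * n_agents, [1.0] * n_agents, next_obs, [0] * n_agents))

states, actions, rewards, next_states, dones = sample_chunk_sketch(buffer, batch_size=4, chunk_size=5)
print(len(states), len(states[0]), len(states[0][0]))  # 4 5 2
```

This sketch does not stop a chunk from crossing an episode boundary; the done flags in each chunk would be needed to mask such cases.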

__len__()

Returns the current number of transitions stored in the buffer.

Returns:

  • int: The number of transitions in the buffer.

src.rl.utils.clip_grad_norms(param_groups, max_norm=math.inf)

Clips the gradient norm of every parameter group to max_norm and returns the gradient norms before and after clipping.

Args:

  • param_groups (list): A list of parameter groups, typically from an optimizer.

  • max_norm (float): The maximum allowable norm for gradients.

Returns:

  • tuple: A tuple containing lists of gradient norms before and after clipping.
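A minimal sketch consistent with this signature and return value, built on PyTorch's `torch.nn.utils.clip_grad_norm_`, which rescales gradients in place and returns the total norm measured before clipping. The name `clip_grad_norms_sketch`, the use of the L2 norm, and reporting the post-clipping norm as `min(norm, max_norm)` are assumptions. With the default `max_norm=math.inf`, no gradient is rescaled, so the call only measures the norms.

```python
import math
from typing import Iterable, List, Tuple

import torch


def clip_grad_norms_sketch(
    param_groups: Iterable[dict], max_norm: float = math.inf
) -> Tuple[List[float], List[float]]:
    """Clip each optimizer param group separately (sketch; norm type assumed to be L2)."""
    grad_norms: List[float] = []
    for group in param_groups:
        # Rescales this group's gradients in place; returns the norm measured before clipping.
        total_norm = torch.nn.utils.clip_grad_norm_(group["params"], max_norm, norm_type=2)
        grad_norms.append(float(total_norm))
    # Approximate norm after clipping (PyTorch divides by norm + 1e-6, so it is slightly lower).
    clipped_norms = [min(norm, max_norm) for norm in grad_norms]
    return grad_norms, clipped_norms


model = torch.nn.Linear(4, 2)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
loss = model(torch.randn(8, 4)).pow(2).sum()
loss.backward()

norms, clipped = clip_grad_norms_sketch(optimizer.param_groups, max_norm=0.5)
print(norms, clipped)
```

Clipping per group rather than over all parameters at once lets each group (for example an actor and a critic registered as separate groups) keep its own norm budget.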

src.rl.utils.save_class(dir, file_name, saving_class)

Introduction

Saves a Python object (class instance) to a file in pickle format.

Args:

  • dir (str): The directory where the file will be saved. If the directory does not exist, it will be created.

  • file_name (str): The file name, without extension, under which the object is saved.

  • saving_class (object): The Python object (class instance) to be saved.

Raises:

  • OSError: If there is an issue creating the directory or writing the file.

Notes:

  • The saved file will have a .pkl extension.
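A minimal sketch of the described behavior, using only the standard library. The name `save_class_sketch` is hypothetical, and unlike the documented function (which does not mention a return value) it returns the path it wrote, for convenience. The parameter is kept as `dir` to mirror the documented signature, even though it shadows the built-in `dir()` inside the function.

```python
import os
import pickle


def save_class_sketch(dir: str, file_name: str, saving_class: object) -> str:
    """Pickle an object to <dir>/<file_name>.pkl (hypothetical helper; returns the path)."""
    # Create the target directory (and any missing parents) if it does not exist yet.
    os.makedirs(dir, exist_ok=True)
    path = os.path.join(dir, file_name + ".pkl")
    # Pickle needs a binary-mode file handle.
    with open(path, "wb") as f:
        pickle.dump(saving_class, f)
    return path


import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = save_class_sketch(os.path.join(tmp, "checkpoints"), "buffer", {"max_size": 1000})
    with open(path, "rb") as f:
        restored = pickle.load(f)
print(restored)  # {'max_size': 1000}
```

Only load such files from trusted sources, since unpickling can execute arbitrary code.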