src.baseline.metabbo.madac

Module Contents

Classes

MultiAgentQNet

MADAC

Introduction

Multi-agent dynamic algorithm configuration in which one agent works for one type of configuration hyperparameter. It formulates the dynamic configuration of a complex algorithm with multiple types of hyperparameters as a contextual multi-agent Markov decision process and solves it with a cooperative multi-agent RL (MARL) algorithm.

Config

API

class src.baseline.metabbo.madac.MultiAgentQNet(input_shape, agent_configs)

Bases: torch.nn.Module

Initialization

Args:

  • input_shape (int): Dimension of the input features.

  • agent_configs (list of dict): One configuration dictionary per agent, with the keys ‘name’, ‘n_actions’, and ‘n_valid_actions’.
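One plausible reading of ‘n_actions’ versus ‘n_valid_actions’ is that each agent's output head has ‘n_actions’ slots (for example, padded to a common width) of which only the first ‘n_valid_actions’ are legal. The sketch below assumes that interpretation and masks the padded Q-values before the greedy choice. It uses plain Python lists in place of the network's tensors, and the agent names and Q-values are hypothetical.

```python
import math
from typing import Dict, List

# Hypothetical agent_configs in the documented format.
agent_configs: List[Dict[str, object]] = [
    {"name": "operator", "n_actions": 4, "n_valid_actions": 2},
    {"name": "step_size", "n_actions": 4, "n_valid_actions": 4},
]


def masked_greedy(q_values: List[float], n_valid_actions: int) -> int:
    """Argmax over the first n_valid_actions entries.

    Assumption: indices >= n_valid_actions are padding, so their Q-values are
    replaced by -inf and can never be selected.
    """
    masked = [q if i < n_valid_actions else -math.inf for i, q in enumerate(q_values)]
    return max(range(len(masked)), key=masked.__getitem__)


# One row of n_actions Q-values per agent (made-up numbers).
q_per_agent = [[0.1, 0.4, 0.9, 0.2], [0.3, 0.5, 0.2, 0.6]]
actions = {
    cfg["name"]: masked_greedy(q, cfg["n_valid_actions"])
    for cfg, q in zip(agent_configs, q_per_agent)
}
print(actions)  # {'operator': 1, 'step_size': 3}
```

Note that the first agent ignores the largest raw value (0.9 at index 2) because that slot lies beyond its two valid actions.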

forward(obs)

class src.baseline.metabbo.madac.MADAC(config)

Bases: src.rl.vdn.VDN_Agent

Introduction

Multi-agent dynamic algorithm configuration in which one agent works for one type of configuration hyperparameter. It formulates the dynamic configuration of a complex algorithm with multiple types of hyperparameters as a contextual multi-agent Markov decision process and solves it with a cooperative multi-agent RL (MARL) algorithm.
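The sketch below illustrates both halves of that description with plain Python lists: each agent picks a discrete value for its own hyperparameter type, and, because MADAC builds on VDN_Agent, the agents are trained cooperatively through the additive value decomposition of VDN, Q_tot = sum_i Q_i, against one shared reward. The hyperparameter names, candidate values, and Q-values are hypothetical, and the max-based one-step target is the textbook VDN form; whether VDN_Agent uses exactly this target is an assumption.

```python
from typing import Dict, List, Sequence

# Hypothetical hyperparameter types: one agent per type, each choosing
# among its own discrete candidates (names and values are illustrative).
CANDIDATES: Dict[str, List[object]] = {
    "crossover": ["binomial", "exponential"],
    "mutation_factor": [0.3, 0.5, 0.7, 0.9],
    "population_size": [20, 50, 100],
}


def greedy_actions(q_values: Sequence[Sequence[float]]) -> List[int]:
    """Each agent independently takes the argmax of its own Q-values."""
    return [max(range(len(q_i)), key=q_i.__getitem__) for q_i in q_values]


def joint_configuration(actions: Sequence[int]) -> Dict[str, object]:
    """Map one discrete action per agent to a full algorithm configuration."""
    names = list(CANDIDATES)
    if len(actions) != len(names):
        raise ValueError(f"expected {len(names)} actions, got {len(actions)}")
    return {name: CANDIDATES[name][a] for name, a in zip(names, actions)}


def vdn_q_tot(q_values: Sequence[Sequence[float]], actions: Sequence[int]) -> float:
    """VDN decomposition: Q_tot(s, a) = sum_i Q_i(o_i, a_i)."""
    return sum(q_i[a_i] for q_i, a_i in zip(q_values, actions))


def vdn_td_target(reward: float, done: bool,
                  next_q_target: Sequence[Sequence[float]], gamma: float) -> float:
    """Shared one-step target: r + gamma * sum_i max_a Q_i^target(o'_i, a)."""
    if done:
        return reward
    return reward + gamma * sum(max(q_i) for q_i in next_q_target)


# Q-values of the three agents at the current and next step (made-up numbers).
q_now = [[0.25, 1.0], [0.0, 0.5, 0.25, 0.0], [0.5, 0.25, 0.75]]
q_next = [[1.0, 0.0], [0.0, 0.5, 0.0, 0.0], [0.0, 0.0, 1.5]]

actions = greedy_actions(q_now)  # [1, 1, 2]
print(joint_configuration(actions))
target = vdn_td_target(reward=1.0, done=False, next_q_target=q_next, gamma=0.5)
td_error = target - vdn_q_tot(q_now, actions)
print(target, td_error)  # 2.5 0.25
```

Summing the per-agent values lets a single team reward produce a gradient for every agent's Q-head, while action selection stays fully decentralized (each agent only needs its own argmax).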

Original paper

“Multi-agent dynamic algorithm configuration.” Advances in Neural Information Processing Systems 35 (2022): 20147-20161.

Official Implementation

MADAC

Args:

  • config (Namespace): A configuration object containing all necessary hyperparameters and settings for the agent and environment.
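Because the agent reads its settings from a single Namespace, a configuration might be assembled as below. The field names mirror the attributes documented for this class; every value is a placeholder, only a subset of fields is shown, and whether MADAC reads exactly these names from config (rather than deriving some of them internally) is an assumption. The closing checks encode an assumed relationship between n_agent, n_act, and available_action.

```python
from argparse import Namespace

# Placeholder values only; field names mirror the documented attributes.
config = Namespace(
    gamma=0.99,
    n_act=4,
    epsilon_start=1.0,
    epsilon_end=0.05,
    epsilon_decay_steps=10_000,
    max_grad_norm=10.0,
    memory_size=100_000,
    batch_size=64,
    warm_up_size=1_000,
    chunk_size=1,
    update_iter=1,
    device="cpu",
    n_agent=4,
    available_action=[4, 4, 2, 3],  # assumed: valid actions per agent
    optimizer="Adam",
    criterion="MSELoss",
    target_update_interval=200,
    agent_save_dir="./agent_model/MADAC",
)

# Assumed consistency checks: one entry per agent, none wider than n_act.
assert len(config.available_action) == config.n_agent
assert max(config.available_action) <= config.n_act
```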

Attributes:

  • gamma (float): Discount factor for future rewards.

  • n_act (int): Number of actions per agent.

  • epsilon_start (float): Initial value of epsilon for epsilon-greedy exploration.

  • epsilon_end (float): Final value of epsilon after decay.

  • epsilon_decay_steps (int): Number of steps over which epsilon decays.

  • max_grad_norm (float): Maximum norm for gradient clipping.

  • memory_size (int): Size of the replay buffer.

  • batch_size (int): Number of samples per training batch.

  • warm_up_size (int): Number of steps before training starts.

  • chunk_size (int): Size of sequence chunks for training.

  • update_iter (int): Number of update iterations per training step.

  • device (str): Device to use for computation (‘cuda’ or ‘cpu’).

  • n_agent (int): Number of agents in the environment.

  • available_action (list): List specifying the number of available actions for each agent.

  • optimizer (str): Optimizer to use for training.

  • criterion (str): Loss function to use for training.

  • target_update_interval (int): Frequency (in steps) to update the target network.

  • required_info (dict): Dictionary specifying required information for logging or evaluation.

  • agent_save_dir (str): Directory path for saving agent checkpoints.
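Several of these attributes control the exploration and update schedule. The sketch below shows one way they might interact. The linear shape of the epsilon decay, counting buffered transitions toward warm_up_size, and a hard (rather than soft) target-network copy every target_update_interval learning steps are all assumptions; the attribute descriptions only say that epsilon decays over epsilon_decay_steps and that the target network is updated at that interval.

```python
import random
from typing import List


def linear_epsilon(step: int, start: float, end: float, decay_steps: int) -> float:
    """Anneal epsilon linearly from `start` to `end`, then hold it at `end`."""
    if step >= decay_steps:
        return end
    return start + (step / decay_steps) * (end - start)


def epsilon_greedy(q_values: List[float], epsilon: float, rng: random.Random) -> int:
    """Explore with probability epsilon, otherwise take the greedy action."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=q_values.__getitem__)


def should_train(buffer_len: int, warm_up_size: int, batch_size: int) -> bool:
    """Assumption: training starts once the buffer holds warm_up_size
    transitions and at least one full batch."""
    return buffer_len >= max(warm_up_size, batch_size)


def should_sync_target(learn_step: int, target_update_interval: int) -> bool:
    """Assumption: hard-copy the online weights every N learning steps."""
    return learn_step > 0 and learn_step % target_update_interval == 0


rng = random.Random(0)
for step in (0, 5_000, 10_000, 20_000):
    eps = linear_epsilon(step, start=1.0, end=0.05, decay_steps=10_000)
    action = epsilon_greedy([0.1, 0.7, 0.3], eps, rng)
    print(step, round(eps, 3), action)
```

Returning `end` exactly once the decay window has passed avoids floating-point drift in the final epsilon value.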

Methods:

  • __init__(self, config): Initializes the MADAC agent with the specified configuration.

  • __str__(self): Returns the string representation of the agent (“MADAC”).

Initialization

Initializes the VDN agent with the given configuration, networks, and learning rates, and stores the initial agent in the checkpoint directory.

Args:

  • config: Configuration object containing all necessary parameters for the experiment.

  • networks (dict): A dictionary of neural networks used by the agent.

  • learning_rates (float): Learning rate for the optimizer.
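The initializer stores the initial agent in the checkpoint directory. As a rough illustration of that step, the sketch below pickles an object into agent_save_dir. The save_checkpoint and load_checkpoint helpers and the ‘checkpoint-<index>.pkl’ naming scheme are hypothetical, and a dict stands in for the agent; a PyTorch-based agent would more commonly be serialized with torch.save, which is not shown here.

```python
import os
import pickle
import tempfile
from typing import Any


def save_checkpoint(agent: Any, agent_save_dir: str, index: int) -> str:
    """Pickle `agent` into agent_save_dir and return the file path.

    The 'checkpoint-<index>.pkl' naming scheme is hypothetical.
    """
    os.makedirs(agent_save_dir, exist_ok=True)
    path = os.path.join(agent_save_dir, f"checkpoint-{index}.pkl")
    with open(path, "wb") as f:
        pickle.dump(agent, f)
    return path


def load_checkpoint(path: str) -> Any:
    """Restore an object written by save_checkpoint."""
    with open(path, "rb") as f:
        return pickle.load(f)


# Store the "initial agent" (a stand-in dict here) as checkpoint 0.
with tempfile.TemporaryDirectory() as tmp:
    path = save_checkpoint({"name": "MADAC", "learn_steps": 0}, os.path.join(tmp, "MADAC"), 0)
    print(os.path.basename(path), load_checkpoint(path))
```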

__str__()

class src.baseline.metabbo.madac.Config

Initialization