`src.baseline.metabbo.madac`¶

Module Contents¶

Classes¶

`MultiAgentQNet`
`MADAC`	Multi-agent dynamic algorithm configuration, in which each agent handles one type of configuration hyperparameter. It formulates the dynamic configuration of a complex algorithm with multiple types of hyperparameters as a contextual multi-agent Markov decision process and solves it with a cooperative multi-agent RL (MARL) algorithm.
`Config`

API¶

class src.baseline.metabbo.madac.MultiAgentQNet(input_shape, agent_configs)[source]¶

Bases: torch.nn.Module

Initialization

Args:

input_shape (int): Dimension of the input features.
agent_configs (list of dict): Configuration dictionary for each agent, with the keys ‘name’, ‘n_actions’, and ‘n_valid_actions’.

forward(obs)[source]¶
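
The shape of `agent_configs` is easiest to read from an example. The sketch below is plain Python and is not the module's implementation. It assumes that `n_actions` is a padded per-agent output width and that `n_valid_actions` counts the usable leading entries, which the docstring does not confirm. The agent names, the Q-values, and the helpers `mask_invalid_q` and `greedy_actions` are hypothetical.

```python
# Hypothetical agent_configs: the keys follow the docstring, the values are made up.
agent_configs = [
    {"name": "operator_type", "n_actions": 4, "n_valid_actions": 4},
    {"name": "neighborhood_size", "n_actions": 4, "n_valid_actions": 3},
]


def mask_invalid_q(q_values, n_valid_actions):
    """Return a copy of q_values with entries past n_valid_actions set to -inf."""
    return [
        q if i < n_valid_actions else float("-inf")
        for i, q in enumerate(q_values)
    ]


def greedy_actions(per_agent_q, configs):
    """Pick the highest-valued valid action index for each agent."""
    actions = {}
    for q_values, cfg in zip(per_agent_q, configs):
        if len(q_values) != cfg["n_actions"]:
            raise ValueError(f"{cfg['name']}: expected {cfg['n_actions']} Q-values")
        masked = mask_invalid_q(q_values, cfg["n_valid_actions"])
        actions[cfg["name"]] = max(range(len(masked)), key=masked.__getitem__)
    return actions


# Made-up per-agent Q-values, one row per entry in agent_configs.
per_agent_q = [[0.1, 0.7, 0.2, 0.4], [0.3, 0.1, 0.2, 0.9]]
chosen = greedy_actions(per_agent_q, agent_configs)
```

Under these assumptions the second agent's largest raw value (index 3) is ignored because it lies outside its valid range, so the agent falls back to index 0.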

class src.baseline.metabbo.madac.MADAC(config)[source]¶

Bases: src.rl.vdn.VDN_Agent

Introduction¶

Multi-agent dynamic algorithm configuration, in which each agent handles one type of configuration hyperparameter. It formulates the dynamic configuration of a complex algorithm with multiple types of hyperparameters as a contextual multi-agent Markov decision process and solves it with a cooperative multi-agent RL (MARL) algorithm.
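
Since `MADAC` derives from `src.rl.vdn.VDN_Agent`, the cooperative part can be pictured through value decomposition: in VDN the joint action value is the sum of the per-agent values, so each agent can act greedily on its own Q-values. The following is a minimal plain-Python sketch of that idea with made-up numbers. `joint_q` and `td_target` are hypothetical helpers, not functions from this module, and the actual training loop may differ.

```python
def joint_q(per_agent_q, actions):
    """VDN joint value: sum of each agent's Q-value for its chosen action."""
    return sum(q[a] for q, a in zip(per_agent_q, actions))


def td_target(reward, gamma, next_per_agent_q, done):
    """One-step target built from the greedy joint value of the next state."""
    if done:
        return reward
    # With an additive decomposition, the joint maximum is the sum of per-agent maxima.
    next_best = sum(max(q) for q in next_per_agent_q)
    return reward + gamma * next_best


# Two agents with different numbers of actions (illustrative values only).
q_now = [[1.0, 2.0], [0.5, 0.25, 0.75]]
q_next = [[0.0, 1.0], [2.0, 0.0, 1.0]]
actions = [1, 2]

q_tot = joint_q(q_now, actions)
target = td_target(reward=1.0, gamma=0.5, next_per_agent_q=q_next, done=False)
td_error = target - q_tot
```

The additive form is what lets a single shared reward train all agents at once: the temporal-difference error is computed on the sum and flows back to every per-agent value.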

Original paper¶

“Multi-agent dynamic algorithm configuration.” Advances in Neural Information Processing Systems 35 (2022): 20147-20161.

Official Implementation¶

MADAC

Args:¶

config (Namespace): A configuration object containing all necessary hyperparameters and settings for the agent and environment.

Attributes:¶

gamma (float): Discount factor for future rewards.
n_act (int): Number of actions per agent.
epsilon_start (float): Initial value of epsilon for epsilon-greedy exploration.
epsilon_end (float): Final value of epsilon after decay.
epsilon_decay_steps (int): Number of steps over which epsilon decays.
max_grad_norm (float): Maximum norm for gradient clipping.
memory_size (int): Size of the replay buffer.
batch_size (int): Number of samples per training batch.
warm_up_size (int): Number of steps before training starts.
chunk_size (int): Size of sequence chunks for training.
update_iter (int): Number of update iterations per training step.
device (str): Device to use for computation (‘cuda’ or ‘cpu’).
n_agent (int): Number of agents in the environment.
available_action (list): List specifying the number of available actions for each agent.
optimizer (str): Optimizer to use for training.
criterion (str): Loss function to use for training.
target_update_interval (int): Frequency (in steps) to update the target network.
required_info (dict): Dictionary specifying required information for logging or evaluation.
agent_save_dir (str): Directory path for saving agent checkpoints.
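
To show how a few of these attributes typically interact, here is a small sketch of an epsilon schedule and a target-network sync check. It assumes a linear decay from `epsilon_start` to `epsilon_end` over `epsilon_decay_steps`, which the attribute descriptions do not specify. `epsilon_at` and `should_sync_target` are hypothetical helpers, and the numbers are illustrative rather than the agent's defaults.

```python
def epsilon_at(step, epsilon_start, epsilon_end, epsilon_decay_steps):
    """Linearly interpolate from epsilon_start to epsilon_end, then hold epsilon_end."""
    if epsilon_decay_steps <= 0:
        return epsilon_end
    fraction = min(max(step, 0) / epsilon_decay_steps, 1.0)
    return epsilon_start + fraction * (epsilon_end - epsilon_start)


def should_sync_target(step, target_update_interval):
    """True on every target_update_interval-th learning step (never at step 0)."""
    return step > 0 and step % target_update_interval == 0


# Illustrative values only.
schedule = [epsilon_at(s, 1.0, 0.5, 4) for s in (0, 2, 4, 10)]
sync_steps = [s for s in range(10) if should_sync_target(s, 3)]
```

Here `schedule` moves from 1.0 to 0.5 and stays there once the decay steps are exhausted, while `sync_steps` lists the steps at which the target network would be refreshed.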

Methods:¶

__init__(self, config): Initializes the MADAC agent with the specified configuration.
__str__(self): Returns the string representation of the agent (“MADAC”).

Initialization

Initializes the VDN agent with the given configuration, networks, and learning rates, then stores the initial agent in the checkpoint directory.

Args:¶

config: Configuration object containing all necessary parameters for the experiment.
networks (dict): A dictionary of neural networks used by the agent.
learning_rates (float): Learning rate for the optimizer.

__str__()[source]¶

class src.baseline.metabbo.madac.Config[source]¶

Initialization
