src.baseline.metabbo.rldas

Module Contents

Classes

Actor

Critic

RLDAS

Introduction

This paper proposes a dynamic algorithm selection method based on deep reinforcement learning to improve performance on real-parameter optimization problems. The paper points out that evolutionary algorithms such as differential evolution (DE) perform well on these problems. However, the best-performing algorithm and configuration can differ from one problem instance to another, which makes choosing an algorithm in advance difficult. To address this, the authors design a deep reinforcement learning framework in which a learned policy adaptively selects which candidate algorithm to apply during the optimization process, based on the characteristics of the problem instance and the current search state. In experiments on a series of benchmark functions, the method outperforms traditional differential evolution algorithms.
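The selection loop described above can be pictured with a minimal, self-contained sketch. It is not the RL-DAS implementation. The candidate "algorithms" are toy Gaussian-perturbation searches standing in for DE variants, the state is a hypothetical (progress, current cost) pair, the reward is assumed to be the cost improvement over each stage, and `random_policy` stands in for the trained actor. All names (`make_perturber`, `dynamic_selection`, `random_policy`) are illustrative.

```python
import random
from typing import Callable, List, Sequence, Tuple

Solution = List[float]
Objective = Callable[[Sequence[float]], float]


def sphere(x: Sequence[float]) -> float:
    """Toy objective to minimize."""
    return sum(xi * xi for xi in x)


def make_perturber(step: float):
    """Hypothetical candidate algorithm: greedy Gaussian perturbation with a fixed step size."""
    def run(f: Objective, x: Solution, budget: int, rng: random.Random) -> Tuple[Solution, float]:
        best, best_f = list(x), f(x)
        for _ in range(budget):
            cand = [xi + rng.gauss(0.0, step) for xi in best]
            cand_f = f(cand)
            if cand_f < best_f:  # accept only improvements
                best, best_f = cand, cand_f
        return best, best_f
    return run


def random_policy(state: Tuple[float, float], n_algorithms: int, rng: random.Random) -> int:
    """Placeholder for the learned actor: picks an algorithm index uniformly at random."""
    return rng.randrange(n_algorithms)


def dynamic_selection(f, dim, algorithms, policy, intervals, budget, seed=0):
    """Run `intervals` stages; before each stage the policy chooses which algorithm to run."""
    rng = random.Random(seed)
    x = [rng.uniform(-5.0, 5.0) for _ in range(dim)]
    cost = f(x)
    history = []  # (chosen algorithm index, reward) per stage
    for t in range(intervals):
        state = (t / intervals, cost)  # hypothetical state features
        k = policy(state, len(algorithms), rng)
        x, new_cost = algorithms[k](f, x, budget, rng)
        history.append((k, cost - new_cost))  # reward: improvement in this stage (assumption)
        cost = new_cost
    return cost, history


algorithms = [make_perturber(1.0), make_perturber(0.1)]  # coarse and fine search
final_cost, history = dynamic_selection(sphere, 5, algorithms, random_policy, intervals=10, budget=50)
```

Replacing `random_policy` with a trained network is what turns this loop into a learned selector; the point of the sketch is only that the choice is remade at every stage rather than fixed once per problem.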

API

class src.baseline.metabbo.rldas.Actor(dim, optimizer_num, feature_dim, device)[source]

Bases: src.baseline.metabbo.networks.nn.Module

Initialization

Introduction

Initializes the model with multiple embedders, a final embedder, and a main model for processing input features and producing optimizer selection probabilities.

Args:

  • dim (int): The input dimension for each embedder.

  • optimizer_num (int): The number of candidate optimizers; it determines how many embedders are created.

  • feature_dim (int): The dimension of the input features for the final embedder.

  • device (torch.device or str): The device on which to place the model components (e.g., ‘cpu’ or ‘cuda’).

Attributes:

  • device: Stores the computation device.

  • embedders (nn.ModuleList): Contains a pair of sequential neural network modules for each optimizer.

  • embedder_final (nn.Sequential): Processes concatenated features from all embedders and input features.

  • model (nn.Sequential): Produces a probability distribution over optimizers using a softmax layer.
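The attribute list above can be read as the following data flow, sketched in plain Python. This is an illustration under assumptions, not the module's code: layer sizes, activations, random initialization, and the exact wiring are guesses. Here each optimizer's two embedders each read one `dim`-length vector, `embedder_final` embeds the `feature_dim`-length feature vector, and `model` maps the concatenation of all embeddings to softmax probabilities over optimizers. `ActorSketch` and its helpers are hypothetical names.

```python
import math
import random
from typing import Callable, List

Vector = List[float]


def dense(in_dim: int, out_dim: int, rng: random.Random) -> Callable[[Vector], Vector]:
    """Randomly initialized fully connected layer (hypothetical initialization)."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(in_dim)] for _ in range(out_dim)]
    bias = [0.0] * out_dim

    def apply(x: Vector) -> Vector:
        return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]

    return apply


def relu(x: Vector) -> Vector:
    return [max(0.0, v) for v in x]


def softmax(x: Vector) -> Vector:
    m = max(x)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in x]
    total = sum(exps)
    return [e / total for e in exps]


class ActorSketch:
    def __init__(self, dim: int, optimizer_num: int, feature_dim: int, hidden: int = 16, seed: int = 0):
        rng = random.Random(seed)
        # A pair of embedders per optimizer, i.e. 2 * optimizer_num in total.
        self.embedders = [dense(dim, hidden, rng) for _ in range(2 * optimizer_num)]
        self.embedder_final = dense(feature_dim, hidden, rng)
        # The head sees every embedding plus the embedded features.
        self.model = dense(hidden * (2 * optimizer_num + 1), optimizer_num, rng)

    def __call__(self, features: Vector, histories: List[Vector]) -> Vector:
        if len(histories) != len(self.embedders):
            raise ValueError("expected one history vector per embedder")
        parts = [relu(embed(h)) for embed, h in zip(self.embedders, histories)]
        parts.append([math.tanh(v) for v in self.embedder_final(features)])
        joint = [v for part in parts for v in part]  # concatenate all embeddings
        return softmax(self.model(joint))  # probabilities over optimizers


actor = ActorSketch(dim=10, optimizer_num=3, feature_dim=9)
probs = actor([0.5] * 9, [[0.1] * 10 for _ in range(6)])
```

Giving each optimizer its own embedders lets the network weigh per-optimizer information separately before the shared head compares them.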

forward(obs, fix_action=None, require_entropy=False)[source]
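The signature suggests the usual PPO actor pattern: sample an action from the predicted distribution unless `fix_action` is supplied (for example, when re-scoring actions stored during a rollout), return its log-probability, and optionally the entropy. The sketch below shows that logic on a plain list of probabilities. The return shape and the helper name `categorical_step` are assumptions, not the documented behavior of `Actor.forward`.

```python
import math
import random
from typing import List, Optional


def categorical_step(probs: List[float], fix_action: Optional[int] = None,
                     require_entropy: bool = False, rng: Optional[random.Random] = None):
    """Sample (or fix) an action index and return its log-probability, plus entropy if requested."""
    rng = rng or random.Random()
    if fix_action is None:
        action = rng.choices(range(len(probs)), weights=probs, k=1)[0]
    else:
        action = fix_action  # re-evaluate a previously taken action
    log_prob = math.log(probs[action])
    if require_entropy:
        entropy = -sum(p * math.log(p) for p in probs if p > 0)
        return action, log_prob, entropy
    return action, log_prob


fixed_action, fixed_log_prob = categorical_step([0.25] * 4, fix_action=2)
sampled_action, sampled_log_prob, entropy = categorical_step([0.5, 0.5], require_entropy=True,
                                                             rng=random.Random(0))
```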
class src.baseline.metabbo.rldas.Critic(dim, optimizer_num, feature_dim, device)[source]

Bases: src.baseline.metabbo.networks.nn.Module

Initialization

forward(obs)[source]
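`Critic.forward(obs)` is undocumented here. In PPO, the critic's value estimate V(s) typically serves as a baseline, so that the advantage is the discounted return minus V(s). A minimal sketch of that bookkeeping follows. Whether `RLDAS` uses plain discounted returns, bootstrapping, or generalized advantage estimation is not stated on this page, so the choice here is an assumption.

```python
from typing import List, Tuple


def discounted_returns(rewards: List[float], gamma: float, bootstrap: float = 0.0) -> List[float]:
    """G_t = r_t + gamma * G_{t+1}, with G_T = bootstrap (e.g. the critic's value of the last state)."""
    returns: List[float] = []
    running = bootstrap
    for r in reversed(rewards):
        running = r + gamma * running
        returns.append(running)
    returns.reverse()
    return returns


def advantages(rewards: List[float], values: List[float], gamma: float,
               bootstrap: float = 0.0) -> Tuple[List[float], List[float]]:
    """Advantage = return minus the critic's value estimate for the same state."""
    rets = discounted_returns(rewards, gamma, bootstrap)
    return [g - v for g, v in zip(rets, values)], rets
```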
class src.baseline.metabbo.rldas.RLDAS(config)[source]

Bases: src.rl.ppo.PPO_Agent

Original Paper

“Deep Reinforcement Learning for Dynamic Algorithm Selection: A Proof-of-Principle Study on Differential Evolution.” IEEE Transactions on Systems, Man, and Cybernetics: Systems (2024)

Official Implementation

RL-DAS

Application Scenario

Single-objective optimization problems (SOOP)

Args:

`config`: Configuration object containing all necessary parameters for the experiment. For details, see config.py.

Attributes:

config (object): Stores the configuration object.
actor (Actor): The actor network used for policy generation.
critic (Critic): The critic network used for value estimation.
optimizer (torch.optim.Optimizer): Optimizer for training the actor and critic networks.
learning_time (int): Tracks the number of learning steps performed.
cur_checkpoint (int): Tracks the current checkpoint for saving the model.

Methods:

__str__():
    Returns the string representation of the class.
train_episode(envs, seeds, para_mode='dummy', compute_resource={}, tb_logger=None, required_info={}):
    Trains the agent for one episode using the PPO algorithm.
    Args:
        envs (list): List of environments for training.
        seeds (Optional[Union[int, List[int], np.ndarray]]): Seeds for environment initialization.
        para_mode (Literal['dummy', 'subproc', 'ray', 'ray-subproc']): Parallelization mode for environments.
        compute_resource (dict): Resources for computation (e.g., CPUs, GPUs).
        tb_logger (Optional): TensorBoard logger for logging training metrics.
        required_info (dict): Additional information required from the environment.
    Returns:
        Tuple[bool, dict]: A tuple containing a boolean indicating if training is complete and a dictionary with training information.
rollout_episode(env, seed=None, required_info={}):
    Executes a single rollout episode in a given environment.
    Args:
        env (object): The environment for the rollout.
        seed (Optional): Seed for environment initialization.
        required_info (dict): Additional information required from the environment.
    Returns:
        dict: A dictionary containing rollout results, including costs, function evaluations, and returns.
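`train_episode` is described only as training "using the PPO algorithm". For reference, the clipped surrogate objective at the core of PPO is sketched below in plain Python. The clip range `eps=0.2` is a common default, not a value taken from this module, and any value-loss or entropy terms the real method may add are omitted.

```python
import math
from typing import Sequence


def ppo_clipped_loss(new_log_probs: Sequence[float], old_log_probs: Sequence[float],
                     advantages: Sequence[float], eps: float = 0.2) -> float:
    """Negative mean of min(ratio * A, clip(ratio, 1 - eps, 1 + eps) * A)."""
    total = 0.0
    for new_lp, old_lp, adv in zip(new_log_probs, old_log_probs, advantages):
        ratio = math.exp(new_lp - old_lp)  # pi_new(a|s) / pi_old(a|s)
        clipped = min(max(ratio, 1.0 - eps), 1.0 + eps)
        total += min(ratio * adv, clipped * adv)  # pessimistic bound
    return -total / len(advantages)  # minimize the negative objective
```

Taking the minimum of the clipped and unclipped terms means a large policy change can only make the objective worse, never better, which is what keeps each update conservative.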

Returns:

None

Raises:

None

Initialization

Initializes the PPO agent with the given configuration, networks, and learning rates, and stores the initial agent in the checkpoint directory.

Args:

  • config: Configuration object containing all necessary parameters for the experiment.

  • networks (dict): A dictionary of neural networks used by the agent.

  • learning_rates (float): Learning rate for the optimizer.
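The initializer is said to store the initial agent in the checkpoint directory, and the class tracks `learning_time` and `cur_checkpoint`. A hypothetical sketch of how those two counters could drive periodic saving follows. The directory layout, the file naming (`checkpoint-<n>.pkl`), `save_interval`, and pickling only a small state dict are all assumptions, not the behavior of `PPO_Agent`.

```python
import os
import pickle
import tempfile


class CheckpointingAgent:
    """Hypothetical agent shell showing counter-driven checkpointing."""

    def __init__(self, save_dir: str, save_interval: int):
        self.save_dir = save_dir
        self.save_interval = save_interval
        self.learning_time = 0   # number of learning steps performed
        self.cur_checkpoint = 0  # index of the next checkpoint to write
        os.makedirs(save_dir, exist_ok=True)
        self.save_checkpoint()   # store the initial (untrained) agent

    def save_checkpoint(self) -> str:
        path = os.path.join(self.save_dir, f"checkpoint-{self.cur_checkpoint}.pkl")
        with open(path, "wb") as fh:
            pickle.dump({"learning_time": self.learning_time}, fh)
        self.cur_checkpoint += 1
        return path

    def learn_step(self) -> None:
        self.learning_time += 1
        if self.learning_time % self.save_interval == 0:
            self.save_checkpoint()


with tempfile.TemporaryDirectory() as tmp_dir:
    agent = CheckpointingAgent(tmp_dir, save_interval=3)
    for _ in range(7):
        agent.learn_step()
    saved = sorted(os.listdir(tmp_dir))
```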

__str__()[source]
train_episode(envs, seeds: Optional[Union[int, List[int], np.ndarray]], para_mode: Literal['dummy', 'subproc', 'ray', 'ray-subproc'] = 'dummy', compute_resource={}, tb_logger=None, required_info={})[source]
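The `seeds` annotation accepts `None`, a single integer, or a list/array of integers, and `para_mode` is one of four strings. One hypothetical way to normalize those inputs before launching `len(envs)` environments is sketched below. Broadcasting a single integer seed to every environment and the helper names `normalize_seeds` and `check_para_mode` are assumptions, since this page does not say how `train_episode` handles them.

```python
from typing import Iterable, List, Optional, Union

PARA_MODES = ("dummy", "subproc", "ray", "ray-subproc")


def normalize_seeds(seeds: Union[None, int, Iterable[int]], n_envs: int) -> List[Optional[int]]:
    """Return exactly one seed (or None) per environment."""
    if seeds is None:
        return [None] * n_envs
    if isinstance(seeds, int):
        return [seeds] * n_envs  # assumption: one shared seed for all environments
    seed_list = [int(s) for s in seeds]  # also accepts NumPy integer arrays
    if len(seed_list) != n_envs:
        raise ValueError(f"expected {n_envs} seeds, got {len(seed_list)}")
    return seed_list


def check_para_mode(mode: str) -> str:
    """Reject any parallelization mode outside the four documented values."""
    if mode not in PARA_MODES:
        raise ValueError(f"para_mode must be one of {PARA_MODES}, got {mode!r}")
    return mode
```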
rollout_episode(env, seed=None, required_info={})[source]
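`rollout_episode` runs one episode and reports costs, function evaluations, and the return. The loop below sketches that pattern against a hypothetical `ToyEnv`. Its `reset`/`step` interface (a gym-style 4-tuple), the result keys (`"cost"`, `"fes"`, `"return"`), and the name `rollout_sketch` are assumptions chosen for illustration, and a fixed `policy` callable stands in for the trained actor.

```python
from typing import Callable, Dict, List, Optional, Tuple


class ToyEnv:
    """Hypothetical environment: each step shrinks the cost and spends a fixed number of evaluations."""

    def __init__(self, max_steps: int = 5, evals_per_step: int = 10):
        self.max_steps = max_steps
        self.evals_per_step = evals_per_step

    def reset(self, seed: Optional[int] = None) -> Tuple[int, float]:
        # seed is accepted for interface compatibility; this toy is deterministic
        self.t, self.fes, self.cost = 0, 0, 100.0
        return (self.t, self.cost)

    def step(self, action: int):
        self.t += 1
        self.fes += self.evals_per_step
        new_cost = self.cost * (0.5 if action == 0 else 0.8)
        reward = self.cost - new_cost  # improvement as reward
        self.cost = new_cost
        done = self.t >= self.max_steps
        return (self.t, self.cost), reward, done, {"cost": self.cost, "fes": self.fes}


def rollout_sketch(env, policy: Callable, seed: Optional[int] = None) -> Dict:
    """Run one episode without learning and collect per-step costs, evaluations, and the return."""
    obs = env.reset(seed)
    done, total_return = False, 0.0
    costs: List[float] = []
    info: Dict = {}
    while not done:
        obs, reward, done, info = env.step(policy(obs))
        total_return += reward
        costs.append(info["cost"])
    return {"cost": costs, "fes": info.get("fes"), "return": total_return}


result = rollout_sketch(ToyEnv(), policy=lambda obs: 0, seed=1)
```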