src.baseline.metabbo.rldas
Module Contents
Classes
API
- class src.baseline.metabbo.rldas.Actor(dim, optimizer_num, feature_dim, device)
Bases: src.baseline.metabbo.networks.nn.Module
Initialization
Introduction
Initializes the model with multiple embedders, a final embedder, and a main model for processing input features and producing optimizer selection probabilities.
Args:
dim (int): The input dimension of each embedder.
optimizer_num (int): The number of candidate optimizers; determines how many embedders are created.
feature_dim (int): The dimension of the input features for the final embedder.
device (torch.device or str): The device on which to place the model components (e.g., 'cpu' or 'cuda').
Attributes:
device: Stores the computation device.
embedders (nn.ModuleList): Contains one pair of sequential neural network modules for each optimizer.
embedder_final (nn.Sequential): Processes concatenated features from all embedders and input features.
model (nn.Sequential): Produces a probability distribution over optimizers using a softmax layer.
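The data flow described above (per-optimizer embedder pairs, a final embedder over their concatenated outputs plus the input features, and a softmax head) can be illustrated with a forward-pass-only NumPy sketch. This is not the class in src.baseline.metabbo.rldas: `ActorSketch`, the hidden width, the ReLU activations, the random weights, and the assumed input layout (one pair of `dim`-length vectors per optimizer, one per embedder in its pair, plus a `feature_dim`-length feature vector) are hypothetical choices made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the sketch is reproducible


def linear(in_dim, out_dim):
    # Hypothetical randomly initialised affine layer (weights, bias); a trained model would learn these.
    return rng.standard_normal((in_dim, out_dim)) * 0.1, np.zeros(out_dim)


def relu(x):
    return np.maximum(x, 0.0)


def softmax(z):
    # Subtract the max for numerical stability before exponentiating.
    e = np.exp(z - z.max())
    return e / e.sum()


class ActorSketch:
    """Forward-pass-only illustration of the Actor data flow (hypothetical, not the real class)."""

    def __init__(self, dim, optimizer_num, feature_dim, hidden=16):
        # One *pair* of embedders per optimizer; each maps a dim-length vector to `hidden` units.
        self.embedders = [(linear(dim, hidden), linear(dim, hidden)) for _ in range(optimizer_num)]
        # Stand-in for embedder_final: all embedder outputs concatenated with the feature vector.
        self.final = linear(2 * optimizer_num * hidden + feature_dim, hidden)
        # Stand-in for model: one logit per optimizer, turned into probabilities by softmax.
        self.head = linear(hidden, optimizer_num)

    def forward(self, features, history):
        # features: shape (feature_dim,); history: shape (optimizer_num, 2, dim) (assumed layout).
        parts = []
        for ((w_a, b_a), (w_b, b_b)), pair in zip(self.embedders, history):
            parts.append(relu(pair[0] @ w_a + b_a))
            parts.append(relu(pair[1] @ w_b + b_b))
        x = np.concatenate(parts + [features])
        w_f, b_f = self.final
        h = relu(x @ w_f + b_f)
        w_h, b_h = self.head
        return softmax(h @ w_h + b_h)


actor = ActorSketch(dim=10, optimizer_num=3, feature_dim=9)
probs = actor.forward(rng.standard_normal(9), rng.standard_normal((3, 2, 10)))
print(probs)  # three non-negative selection probabilities that sum to 1
```

Keeping the embedders separate per optimizer lets each candidate's information be encoded independently before the final embedder mixes everything, and the softmax head guarantees a valid distribution to sample the next optimizer from.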
- class src.baseline.metabbo.rldas.Critic(dim, optimizer_num, feature_dim, device)
Bases: src.baseline.metabbo.networks.nn.Module
Initialization
The critic network used for value estimation. Its constructor takes the same arguments as Actor.
- class src.baseline.metabbo.rldas.RLDAS(config)
Bases: src.rl.ppo.PPO_Agent
Introduction
This paper proposes a dynamic algorithm selection method based on deep reinforcement learning for real-parameter optimization problems. Evolutionary algorithms such as differential evolution (DE) perform well on these problems, but the best-performing algorithm, and its best parameter configuration, can differ from one problem instance to another, which makes algorithm selection challenging. To address this, the authors design a deep reinforcement learning framework that dynamically selects, during the optimization process, which of several candidate algorithms (optimizers) to apply, based on features of the problem instance and the optimization progress. Experiments on a suite of benchmark functions show that the method outperforms conventional differential evolution algorithms.
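To make the idea of dynamic algorithm selection concrete, here is a toy sketch in plain Python: at the start of each interval a policy picks one of several candidate algorithms, which then continues from the shared incumbent solution, and a reward is recorded. Everything here is a hypothetical stand-in: the candidate algorithms are simple local-search steps rather than the DE variants used in the paper, `uniform_policy` replaces the learned Actor, and the state and relative-improvement reward are assumed, simple choices. It is not the RLDAS implementation.

```python
import random


def sphere(x):
    # Toy objective f(x) = sum(x_i^2), minimised at the origin.
    return sum(v * v for v in x)


def gaussian_step(best, f, rng, scale=0.5):
    # Hypothetical candidate algorithm A: Gaussian perturbation, kept only if it improves.
    cand = [v + rng.gauss(0.0, scale) for v in best]
    return cand if f(cand) < f(best) else best


def coordinate_step(best, f, rng, step=0.1):
    # Hypothetical candidate algorithm B: nudge one random coordinate, kept only if it improves.
    cand = list(best)
    cand[rng.randrange(len(best))] += rng.choice((-step, step))
    return cand if f(cand) < f(best) else best


def uniform_policy(state, n_algorithms, rng):
    # Placeholder for a learned policy (in RLDAS, the Actor): choose uniformly at random.
    return rng.randrange(n_algorithms)


def dynamic_selection(f, dim, algorithms, policy, intervals, steps_per_interval, seed=0):
    rng = random.Random(seed)
    best = [rng.uniform(-5.0, 5.0) for _ in range(dim)]
    history = []
    for t in range(intervals):
        # At the start of each interval the policy observes a state and picks an algorithm.
        state = (t, f(best))
        k = policy(state, len(algorithms), rng)
        before = f(best)
        # The chosen algorithm continues from the shared incumbent solution.
        for _ in range(steps_per_interval):
            best = algorithms[k](best, f, rng)
        # Reward: relative improvement over the interval (an assumed, simple choice).
        reward = (before - f(best)) / before if before > 0 else 0.0
        history.append((k, reward))
    return best, history


best, history = dynamic_selection(sphere, dim=5, algorithms=[gaussian_step, coordinate_step],
                                  policy=uniform_policy, intervals=20, steps_per_interval=50)
print(sphere(best), [k for k, _ in history])
```

In a reinforcement learning setting, the `(state, k, reward)` triples collected by such a loop are what the policy is trained on, so that it learns which algorithm to switch to at each stage instead of choosing at random.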
Original Paper
“Deep Reinforcement Learning for Dynamic Algorithm Selection: A Proof-of-Principle Study on Differential Evolution.” IEEE Transactions on Systems, Man, and Cybernetics: Systems (2024)
Official Implementation
Application Scenario
Single-objective optimization problems (SOOP)
Args:
`config`: Configuration object containing all necessary parameters for the experiment. For details, see config.py.
Attributes:
config (object): Stores the configuration object.
actor (Actor): The actor network used for policy generation.
critic (Critic): The critic network used for value estimation.
optimizer (torch.optim.Optimizer): Optimizer for training the actor and critic networks.
learning_time (int): Tracks the number of learning steps performed.
cur_checkpoint (int): Tracks the current checkpoint for saving the model.
Methods:
__str__(): Returns the string representation of the class.
train_episode(envs, seeds, para_mode='dummy', compute_resource={}, tb_logger=None, required_info={}): Trains the agent for one episode using the PPO algorithm.
  Args:
    envs (list): List of environments for training.
    seeds (Optional[Union[int, List[int], np.ndarray]]): Seeds for environment initialization.
    para_mode (Literal['dummy', 'subproc', 'ray', 'ray-subproc']): Parallelization mode for environments.
    compute_resource (dict): Resources for computation (e.g., CPUs, GPUs).
    tb_logger (Optional): TensorBoard logger for logging training metrics.
    required_info (dict): Additional information required from the environment.
  Returns:
    Tuple[bool, dict]: A tuple containing a boolean indicating if training is complete and a dictionary with training information.
rollout_episode(env, seed=None, required_info={}): Executes a single rollout episode in a given environment.
  Args:
    env (object): The environment for the rollout.
    seed (Optional): Seed for environment initialization.
    required_info (dict): Additional information required from the environment.
  Returns:
    dict: A dictionary containing rollout results, including costs, function evaluations, and returns.
Returns:
None
Raises:
None
Initialization
Initializes the PPO agent with the given configuration, networks, and learning rates, and stores the initial agent in the checkpoint directory.
Args:
config: Configuration object containing all necessary parameters for the experiment.
networks (dict): A dictionary of neural networks used by the agent.
learning_rates (float): Learning rate for the optimizer.
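`train_episode` is documented as training the agent with PPO, using the actor for the policy and the critic for value estimation. As a reminder of what that objective computes, here is a minimal pure-Python sketch of the standard clipped PPO surrogate loss, a mean-squared value loss for the critic, and a discounted-return helper. The function names, the clip range of 0.2, `gamma=0.99`, and the use of plain Monte Carlo returns (rather than, for example, generalized advantage estimation) are assumptions; `src.rl.ppo.PPO_Agent` may compute advantages, normalize them, or weight these terms differently.

```python
import math


def discounted_returns(rewards, gamma=0.99):
    # Monte Carlo returns G_t = r_t + gamma * G_{t+1}, computed backwards.
    out, g = [], 0.0
    for r in reversed(rewards):
        g = r + gamma * g
        out.append(g)
    return out[::-1]


def ppo_losses(new_logp, old_logp, advantages, values, returns, clip_eps=0.2):
    # Clipped surrogate: maximise mean(min(ratio * A, clip(ratio) * A)), so the loss is its negation.
    terms = []
    for lp_new, lp_old, adv in zip(new_logp, old_logp, advantages):
        ratio = math.exp(lp_new - lp_old)  # pi_new(a|s) / pi_old(a|s)
        clipped = min(max(ratio, 1.0 - clip_eps), 1.0 + clip_eps)
        terms.append(min(ratio * adv, clipped * adv))  # pessimistic of the two estimates
    policy_loss = -sum(terms) / len(terms)
    # Critic: mean squared error between predicted values and target returns.
    value_loss = sum((v - g) ** 2 for v, g in zip(values, returns)) / len(values)
    return policy_loss, value_loss


rets = discounted_returns([1.0, 1.0, 1.0], gamma=0.5)
print(rets)  # [1.75, 1.5, 1.0]
pl, vl = ppo_losses([math.log(2.0)], [0.0], [1.0], [1.0], [3.0])
print(pl, vl)  # the ratio of 2 is clipped to 1.2, so pl is about -1.2; vl is 4.0
```

Clipping the probability ratio keeps a single update from moving the selection policy too far from the one that collected the rollout, which is the main reason PPO is a common choice for training actor-critic agents like this one.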