src.baseline.metabbo.rlemmo

Module Contents

Classes

mySequential

Actor

Critic

RLEMMO

Introduction

RLEMMO: Evolutionary Multimodal Optimization Assisted By Deep Reinforcement Learning. RLEMMO follows the meta-black-box optimization framework: it maintains a population of solutions and uses a reinforcement learning agent to adjust the search strategy of each individual to the current optimization state, improving search performance on multimodal optimization problems (MMOPs). Specifically, RLEMMO encodes landscape properties and evolution path information into each individual, then uses an attention network to share information across the population. A reward mechanism that encourages both quality and diversity allows RLEMMO to be trained effectively with a policy gradient algorithm. On the CEC2013 MMOP benchmark, RLEMMO outperforms several strong baseline algorithms, demonstrating its competitiveness on multimodal optimization problems.

API

class src.baseline.metabbo.rlemmo.mySequential[source]

Bases: torch.nn.Sequential

forward(*inputs)[source]
class src.baseline.metabbo.rlemmo.Actor(embedding_dim, hidden_dim, n_heads_actor, n_layers, normalization, hidden_dim1, hidden_dim2, output_dim, global_dim, local_dim, ind_dim)[source]

Bases: torch.nn.Module

Initialization

get_parameter_number()[source]
forward(x_in, fixed_action=None, require_entropy=False, to_critic=False, only_critic=False, sampling=True)[source]
class src.baseline.metabbo.rlemmo.Critic(input_dim, hidden_dim1, hidden_dim2)[source]

Bases: torch.nn.Module

Initialization

forward(h_features)[source]
class src.baseline.metabbo.rlemmo.RLEMMO(config)[source]

Bases: src.rl.ppo.PPO_Agent

Original Paper

“RLEMMO: Evolutionary Multimodal Optimization Assisted By Deep Reinforcement Learning.”

Official Implementation

None

Application Scenario

Multimodal optimization problems (MMOPs)
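
To make the scenario concrete, the sketch below evaluates the Equal Maxima function, a standard one-dimensional multimodal test function with five equally good global maxima on [0, 1], and counts how many distinct peaks a set of candidate solutions covers. The helper names (`equal_maxima`, `count_found_peaks`) and the `radius` and `accuracy` tolerances are illustrative assumptions and are not part of the RLEMMO code.

```python
import math

# Equal Maxima: f(x) = sin^6(5*pi*x) on [0, 1].
# Its five global maxima (value 1.0) lie at x = 0.1, 0.3, 0.5, 0.7, 0.9.
KNOWN_PEAKS = [0.1, 0.3, 0.5, 0.7, 0.9]


def equal_maxima(x: float) -> float:
    return math.sin(5.0 * math.pi * x) ** 6


def count_found_peaks(population, radius=0.01, accuracy=1e-3):
    """Count known peaks that have at least one near-optimal solution nearby."""
    found = 0
    for peak in KNOWN_PEAKS:
        for x in population:
            close = abs(x - peak) <= radius
            good = 1.0 - equal_maxima(x) <= accuracy
            if close and good:
                found += 1
                break
    return found


converged = [0.5, 0.5001, 0.4999]          # every solution sits on one peak
diverse = [0.1, 0.3001, 0.5, 0.7, 0.8999]  # solutions spread over all peaks
print(count_found_peaks(converged), count_found_peaks(diverse))
```

Both populations contain only near-optimal solutions, yet the first covers a single peak. An MMOP solver is judged on how many optima it locates, which is why a reward that considers diversity as well as quality is relevant here.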

Args:

`config`: Configuration object containing all necessary parameters for the experiment. For details, see config.py.

Attributes:

config (object): Stores the configuration object with hyperparameters and settings.
output_dim (int): Dimension of the output layer for the actor network.
global_dim (int): Dimension of the global input features.
local_dim (int): Dimension of the local input features.
ind_dim (int): Dimension of the individual input features.
learning_time (int): Tracks the number of learning steps performed.
cur_checkpoint (int): Tracks the current checkpoint for saving the model.

Methods:

__str__():
    Returns the string representation of the class.
train_episode(envs, seeds, para_mode='dummy', compute_resource={}, tb_logger=None, required_info={}):
    Trains the agent for one episode using the PPO algorithm.
    Args:
        envs (list): List of environments for training.
        seeds (Optional[Union[int, List[int], np.ndarray]]): Seeds for environment initialization.
        para_mode (Literal['dummy', 'subproc', 'ray', 'ray-subproc']): Parallelization mode for environments.
        compute_resource (dict): Dictionary specifying compute resources (e.g., num_cpus, num_gpus).
        tb_logger (object): TensorBoard logger for logging training metrics.
        required_info (dict): Additional information required from the environment.
    Returns:
        is_train_ended (bool): Indicates whether training has reached the maximum learning step.
        return_info (dict): Dictionary containing training metrics and results.
rollout_episode(env, seed=None, required_info={}):
    Executes a single rollout episode without training.
    Args:
        env (object): Environment for the rollout.
        seed (Optional[int]): Seed for environment initialization.
        required_info (dict): Additional information required from the environment.
    Returns:
        results (dict): Dictionary containing rollout metrics and results.
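
The return values documented above suggest a simple driver loop: call `train_episode` repeatedly until `is_train_ended` is true, then evaluate with `rollout_episode`. The following is a minimal sketch of that loop only. `StubAgent` is a hypothetical stand-in that mimics the documented signatures and return shapes; it is not the RLEMMO agent, and the `learn_steps` key, the `max_learning_step` argument, and the string environments are assumptions made for illustration.

```python
from typing import Any, Dict, List, Optional, Tuple


class StubAgent:
    """Hypothetical stand-in that mimics only the documented return values."""

    def __init__(self, max_learning_step: int) -> None:
        self.max_learning_step = max_learning_step
        self.learning_time = 0

    def train_episode(self, envs: List[Any], seeds=None, para_mode: str = "dummy",
                      compute_resource: Optional[dict] = None, tb_logger=None,
                      required_info: Optional[dict] = None) -> Tuple[bool, Dict[str, Any]]:
        # Pretend each call performs one learning step per environment.
        self.learning_time += len(envs)
        is_train_ended = self.learning_time >= self.max_learning_step
        return is_train_ended, {"learn_steps": self.learning_time}

    def rollout_episode(self, env: Any, seed: Optional[int] = None,
                        required_info: Optional[dict] = None) -> Dict[str, Any]:
        # No learning happens here; only results are reported.
        return {"env": env, "seed": seed}


def run_training(agent, envs, seeds, max_episodes=1000):
    """Call train_episode until the agent reports that training has ended."""
    episodes = 0
    info: Dict[str, Any] = {}
    for _ in range(max_episodes):
        ended, info = agent.train_episode(envs, seeds, para_mode="dummy")
        episodes += 1
        if ended:
            break
    return episodes, info


agent = StubAgent(max_learning_step=10)
episodes, info = run_training(agent, envs=["env_a", "env_b"], seeds=[0, 1])
results = agent.rollout_episode("env_a", seed=0)
print(episodes, info["learn_steps"], results)
```

The stub uses `None` in place of the `{}` defaults shown in the signatures so that no mutable default is shared between calls; this is a choice of the sketch, not a statement about the library.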

Returns:

None

Raises:

None

Initialization

Initializes the PPO agent with the given configuration, networks, and learning rates, then stores the initial agent in the checkpoint directory.

Args:

  • config: Configuration object containing all necessary parameters for the experiment.

  • networks (dict): A dictionary of neural networks used by the agent.

  • learning_rates (float): Learning rate for the optimizer.

__str__()[source]
train_episode(envs, seeds: Optional[Union[int, List[int], np.ndarray]], para_mode: Literal['dummy', 'subproc', 'ray', 'ray-subproc'] = 'dummy', compute_resource={}, tb_logger=None, required_info={})[source]
rollout_episode(env, seed=None, required_info={})[source]