src.baseline.metabbo.rlemmo¶
Module Contents¶
Classes¶
Introduction¶
RLEMMO: Evolutionary Multimodal Optimization Assisted By Deep Reinforcement Learning. RLEMMO adopts a meta-black-box optimization framework: it maintains a population of solutions and uses a reinforcement learning agent to adjust individual-level search strategies to the current optimization state, improving search performance on multimodal optimization problems (MMOPs). Specifically, RLEMMO encodes landscape characteristics and evolution path information into each individual, then uses an attention network to share information across the population. A new reward mechanism that encourages both quality and diversity allows RLEMMO to be trained effectively with a policy gradient algorithm. On the CEC2013 MMOP benchmark, RLEMMO outperforms several strong baseline algorithms, demonstrating its competitiveness on multimodal optimization problems.
API¶
- class src.baseline.metabbo.rlemmo.Actor(embedding_dim, hidden_dim, n_heads_actor, n_layers, normalization, hidden_dim1, hidden_dim2, output_dim, global_dim, local_dim, ind_dim)[source]¶
Bases: torch.nn.Module
Initialization
- class src.baseline.metabbo.rlemmo.Critic(input_dim, hidden_dim1, hidden_dim2)[source]¶
Bases: torch.nn.Module
Initialization
- class src.baseline.metabbo.rlemmo.RLEMMO(config)[source]¶
Bases: src.rl.ppo.PPO_Agent
Introduction¶
RLEMMO: Evolutionary Multimodal Optimization Assisted By Deep Reinforcement Learning. RLEMMO adopts a meta-black-box optimization framework: it maintains a population of solutions and uses a reinforcement learning agent to adjust individual-level search strategies to the current optimization state, improving search performance on multimodal optimization problems (MMOPs). Specifically, RLEMMO encodes landscape characteristics and evolution path information into each individual, then uses an attention network to share information across the population. A new reward mechanism that encourages both quality and diversity allows RLEMMO to be trained effectively with a policy gradient algorithm. On the CEC2013 MMOP benchmark, RLEMMO outperforms several strong baseline algorithms, demonstrating its competitiveness on multimodal optimization problems.
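The information-sharing step described above can be pictured with a very small sketch. This is not the RLEMMO network: it is plain single-head scaled dot-product self-attention with identity projections (no learned weights), written with the standard library only. The function names `softmax` and `share_population_info` and the toy feature vectors are assumptions made for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def share_population_info(features):
    """Illustrative self-attention over a population (hypothetical helper).

    features: list of per-individual feature vectors of equal length.
    Returns one mixed vector per individual: a weighted average of every
    individual's features, weighted by scaled dot-product similarity.
    """
    dim = len(features[0])
    scale = math.sqrt(dim)
    mixed = []
    for query in features:
        # Similarity of this individual to every individual (itself included).
        scores = [sum(q * k for q, k in zip(query, key)) / scale for key in features]
        weights = softmax(scores)
        # Convex combination of all feature vectors.
        mixed.append([
            sum(w * value[d] for w, value in zip(weights, features))
            for d in range(dim)
        ])
    return mixed


# Toy population of three individuals with two features each (made-up values).
population_features = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
shared = share_population_info(population_features)
```

Each output vector blends the whole population, with similar individuals contributing more. A trained attention network would add learned query, key, and value projections and multiple heads on top of this idea.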
Original Paper¶
“RLEMMO: Evolutionary Multimodal Optimization Assisted By Deep Reinforcement Learning.”
Official Implementation¶
None
Application Scenario¶
Multimodal optimization problems (MMOPs)
Args:¶
`config`: Configuration object containing all parameters needed for the experiment. See config.py for details.
Attributes:¶
config (object): Stores the configuration object with hyperparameters and settings.
output_dim (int): Dimension of the output layer for the actor network.
global_dim (int): Dimension of the global input features.
local_dim (int): Dimension of the local input features.
ind_dim (int): Dimension of the individual input features.
learning_time (int): Tracks the number of learning steps performed.
cur_checkpoint (int): Tracks the current checkpoint for saving the model.
Methods:¶
__str__(): Returns the string representation of the class.
train_episode(envs, seeds, para_mode='dummy', compute_resource={}, tb_logger=None, required_info={}): Trains the agent for one episode using the PPO algorithm.
    Args:
        envs (list): List of environments for training.
        seeds (Optional[Union[int, List[int], np.ndarray]]): Seeds for environment initialization.
        para_mode (Literal['dummy', 'subproc', 'ray', 'ray-subproc']): Parallelization mode for environments.
        compute_resource (dict): Dictionary specifying compute resources (e.g., num_cpus, num_gpus).
        tb_logger (object): TensorBoard logger for logging training metrics.
        required_info (dict): Additional information required from the environment.
    Returns:
        is_train_ended (bool): Indicates whether training has reached the maximum learning step.
        return_info (dict): Dictionary containing training metrics and results.
rollout_episode(env, seed=None, required_info={}): Executes a single rollout episode without training.
    Args:
        env (object): Environment for the rollout.
        seed (Optional[int]): Seed for environment initialization.
        required_info (dict): Additional information required from the environment.
    Returns:
        results (dict): Dictionary containing rollout metrics and results.
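The calling pattern implied by these signatures can be sketched as a loop that trains until `is_train_ended` is true and then runs one rollout. The sketch below does not import the real agent: `StubAgent` is a hypothetical stand-in that only mimics the documented method names, arguments, and return shapes, and the dictionary keys, the `max_learning_step` parameter, and the string placeholders used as environments are assumptions.

```python
class StubAgent:
    """Hypothetical stand-in mimicking the documented method signatures."""

    def __init__(self, max_learning_step=3):
        self.learning_time = 0  # mirrors the documented attribute name
        self.max_learning_step = max_learning_step  # assumed stopping threshold

    # The documented defaults are {}; None is used here to avoid sharing
    # mutable default arguments inside this stub.
    def train_episode(self, envs, seeds, para_mode='dummy', compute_resource=None,
                      tb_logger=None, required_info=None):
        self.learning_time += 1
        is_train_ended = self.learning_time >= self.max_learning_step
        # Placeholder metrics; the real keys are not specified in this page.
        return_info = {'learn_steps': self.learning_time}
        return is_train_ended, return_info

    def rollout_episode(self, env, seed=None, required_info=None):
        # Placeholder result; the real contents are not specified in this page.
        return {'env': env, 'seed': seed}


agent = StubAgent(max_learning_step=3)
train_envs = ['env_a', 'env_b']  # placeholders for real environment objects

episodes = 0
is_train_ended = False
while not is_train_ended:
    # One call trains for one episode and reports whether training should stop.
    is_train_ended, info = agent.train_episode(
        envs=train_envs, seeds=[0, 1], para_mode='dummy'
    )
    episodes += 1

# Evaluation uses a single environment and does not update the agent.
results = agent.rollout_episode(env='env_a', seed=42)
```

With the real class, the agent would be constructed from a `config` object as documented above, and `envs` would hold actual environment instances.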
Returns:¶
None
Raises:¶
None
Initialization
Initializes the PPO agent with the given configuration, networks, and learning rates, and stores the initial agent in the checkpoint directory.
Args:¶
config: Configuration object containing all necessary parameters for the experiment.
networks (dict): A dictionary of neural networks used by the agent.
learning_rates (float): Learning rate for the optimizer.