src.baseline.metabbo.rldeafl

Module Contents

Classes

mySequential

Feature_Extractor

Actor

Actor is a neural network module designed to select mutation and crossover parameters for a differential evolution algorithm.

Critic

RLDEAFL

Introduction

RLDE-AFL is a reinforcement learning-based adaptive differential evolution algorithm. It learns features of optimization problems automatically by integrating a learnable feature extraction module into differential evolution, and it uses a reinforcement learning-driven configuration strategy to adapt to the characteristics of different problems. Specifically, RLDE-AFL uses an attention-based neural network and a mantissa-based embedding method to transform the solution population and its objective values into expressive problem features. It also introduces a comprehensive configuration space containing multiple differential evolution operators, and uses reinforcement learning to adaptively select these operators and tune their parameters, improving the algorithm's behavioral diversity and optimization performance.

API

class src.baseline.metabbo.rldeafl.mySequential[source]

Bases: torch.nn.Sequential

forward(*input)[source]
class src.baseline.metabbo.rldeafl.Feature_Extractor(node_dim=3, hidden_dim=16, n_heads=1, ffh=16, n_layers=1, use_positional_encoding=True, is_train=False, device=None)[source]

Bases: torch.nn.Module

Initialization

set_on_train()[source]
set_off_train()[source]
get_parameter_number()[source]
forward(state)[source]
encode_y(ys, constant=32.0)[source]

Encodes the input tensor ys by separating it into mantissa and exponent parts using PyTorch.

Parameters:

  • ys: Input tensor of shape (bs, n), where bs is the batch size and n is the population size.

  • constant: A scaling constant for the exponent part, default is 32.0.

Returns:

  • encoded_y: The mantissa part of the encoded ys, scaled by 1/10, shape (bs, n).

  • encoded_e: The exponent part of the encoded ys, scaled by the given constant, shape (bs, n).
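The mantissa/exponent split described above can be illustrated on plain Python floats. This is a minimal sketch, not the library's implementation: it assumes a base-10 decomposition, assumes that "scaled by the given constant" means division by `constant`, and maps zero to `(0.0, 0.0)`. The names `encode_scalar_sketch` and `encode_population_sketch` are hypothetical, and the real method operates on PyTorch tensors rather than nested lists.

```python
import math


def encode_scalar_sketch(y, constant=32.0):
    """Split one value into a scaled mantissa and a scaled base-10 exponent.

    Hypothetical helper; the zero handling and the division by `constant`
    are assumptions, not taken from the library.
    """
    if y == 0.0:
        return 0.0, 0.0
    exponent = math.floor(math.log10(abs(y)))  # e.g. 250.0 -> 2
    mantissa = y / (10.0 ** exponent)          # signed, 1 <= |mantissa| < 10
    return mantissa / 10.0, exponent / constant


def encode_population_sketch(ys, constant=32.0):
    """Apply the scalar encoding to a (bs, n) nested list of objective values."""
    encoded_y, encoded_e = [], []
    for row in ys:
        pairs = [encode_scalar_sketch(y, constant) for y in row]
        encoded_y.append([p[0] for p in pairs])
        encoded_e.append([p[1] for p in pairs])
    return encoded_y, encoded_e
```

Under these assumptions, objective values that differ by many orders of magnitude map to two bounded features, which is the usual motivation for this kind of embedding.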

_run(state)[source]

Processes the input tensors and applies attention mechanisms based on the selected order.

Parameters:

  • xs: Input NumPy array of shape (bs, n, d), where bs is the batch size, n is the population size, and d is the problem dimension.

  • ys: Input NumPy array of shape (bs, n), containing the objective values of the population.

Returns:

  • out: Processed output tensor, depending on the attention mechanism used, with shape (bs, n, hidden_dim).

class src.baseline.metabbo.rldeafl.Actor(input_dim, mu_operator, cr_operator, n_mutation, n_crossover)[source]

Bases: torch.nn.Module

Actor is a neural network module designed to select mutation and crossover parameters for a differential evolution algorithm.

This network consists of:

  • An operator selection network to choose a mutation operator.

  • Networks to predict mutation and crossover parameters (means and standard deviations).

Attributes:

  • input_dim: The dimension of the input state.

  • n_operator: The number of mutation operators to choose from.

  • n_mutation: The number of mutation parameters to output.

  • n_crossover: The number of crossover parameters to output.

  • output_dim: The total number of outputs, including the selected operator and mutation/crossover parameters.

  • max_sigma: The maximum value for the standard deviation of mutation/crossover parameters.

  • min_sigma: The minimum value for the standard deviation of mutation/crossover parameters.
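The attribute list above suggests a policy that picks one operator and samples continuous parameters from Gaussians whose standard deviations are bounded by `min_sigma` and `max_sigma`. The following is a minimal sketch of that idea using only the standard library. Every detail is an assumption for illustration: the softmax over operator logits, the sigmoid squashing of means and standard deviations, the default bounds, and the names `bounded_sigma` and `sample_action_sketch` are not taken from the library.

```python
import math
import random


def _sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def bounded_sigma(raw, min_sigma=0.01, max_sigma=0.7):
    """Map an unbounded network output into [min_sigma, max_sigma] (assumed scheme)."""
    return min_sigma + (max_sigma - min_sigma) * _sigmoid(raw)


def sample_action_sketch(op_logits, mu_raw, sigma_raw,
                         min_sigma=0.01, max_sigma=0.7, rng=None):
    """Sample an operator index and Gaussian parameters; return their joint log-prob."""
    rng = rng or random.Random()

    # Categorical choice over operators via a numerically stable softmax.
    top = max(op_logits)
    exps = [math.exp(logit - top) for logit in op_logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    operator = rng.choices(range(len(probs)), weights=probs, k=1)[0]
    log_prob = math.log(probs[operator])

    # One Gaussian per continuous parameter, with a bounded standard deviation.
    params = []
    for m_raw, s_raw in zip(mu_raw, sigma_raw):
        mu = _sigmoid(m_raw)
        sigma = bounded_sigma(s_raw, min_sigma, max_sigma)
        value = rng.gauss(mu, sigma)
        log_prob += (-0.5 * ((value - mu) / sigma) ** 2
                     - math.log(sigma) - 0.5 * math.log(2.0 * math.pi))
        params.append(value)
    return operator, params, log_prob
```

Bounding the standard deviation from below keeps the log-probability finite, and bounding it from above limits how far sampled parameters can stray from the predicted mean.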

Initialization

get_parameter_number()[source]
get_action(x)[source]
forward(x, fixed_action=None, require_entropy=False)[source]
class src.baseline.metabbo.rldeafl.Critic(input_dim)[source]

Bases: torch.nn.Module

Initialization

forward(state)[source]
get_parameter_number()[source]
class src.baseline.metabbo.rldeafl.RLDEAFL(config)[source]

Bases: src.rl.ppo.PPO_Agent

Introduction

RLDE-AFL is a reinforcement learning-based adaptive differential evolution algorithm. It learns features of optimization problems automatically by integrating a learnable feature extraction module into differential evolution, and it uses a reinforcement learning-driven configuration strategy to adapt to the characteristics of different problems. Specifically, RLDE-AFL uses an attention-based neural network and a mantissa-based embedding method to transform the solution population and its objective values into expressive problem features. It also introduces a comprehensive configuration space containing multiple differential evolution operators, and uses reinforcement learning to adaptively select these operators and tune their parameters, improving the algorithm's behavioral diversity and optimization performance.

Original Paper

“Reinforcement Learning-based Self-adaptive Differential Evolution through Automated Landscape Feature Learning.”

Official Implementation

RLDE-AFL

Application Scenario

Single-objective optimization problems (SOOP)

Args:

`config`: Configuration object containing all necessary parameters for the experiment. For details, see config.py.

Attributes:

config (object): Stores the configuration object with hyperparameters and settings.
fe (Feature_Extractor): Feature extractor for processing input states.
actor (Actor): Actor network for generating actions.
critic (Critic): Critic network for estimating value functions.
optimizer (torch.optim.Optimizer): Optimizer for training the agent.
learning_time (int): Tracks the number of learning steps performed.
cur_checkpoint (int): Tracks the current checkpoint for saving the model.

Methods:

__str__():
    Returns the string representation of the class.
train_episode(envs, seeds, para_mode='dummy', compute_resource={}, tb_logger=None, required_info={}):
    Trains the agent for one episode using the PPO algorithm.
    Args:
        envs (list): List of environments for training.
        seeds (Optional[Union[int, List[int], np.ndarray]]): Seeds for environment initialization.
        para_mode (Literal['dummy', 'subproc', 'ray', 'ray-subproc']): Parallelization mode for environments.
        compute_resource (dict): Resources for computation (e.g., CPUs, GPUs).
        tb_logger (object): TensorBoard logger for tracking metrics.
        required_info (dict): Additional information required from the environment.
    Returns:
        Tuple[bool, dict]: A tuple containing a boolean indicating if training has ended and a dictionary with training information.
rollout_episode(env, seed=None, required_info={}):
    Executes a single rollout episode in a given environment.
    Args:
        env (object): The environment for the rollout.
        seed (Optional[int]): Seed for environment initialization.
        required_info (dict): Additional information required from the environment.
    Returns:
        dict: A dictionary containing rollout results such as costs, returns, and additional information.

Returns:

None

Raises:

None

Initialization

Initializes the PPO agent with the given configuration, networks, and learning rates. Stores the initial agent in the checkpoint directory.

Args:

  • config: Configuration object containing all necessary parameters for the experiment.

  • networks (dict): A dictionary of neural networks used by the agent.

  • learning_rates (float): Learning rate for the optimizer.

__str__()[source]
train_episode(envs, seeds: Optional[Union[int, List[int], np.ndarray]], para_mode: Literal['dummy', 'subproc', 'ray', 'ray-subproc'] = 'dummy', compute_resource={}, tb_logger=None, required_info={})[source]
rollout_episode(env, seed=None, required_info={})[source]
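
The documented return shape of `train_episode`, a `(bool, dict)` tuple whose first element signals that training has ended, lends itself to a simple driver loop. The sketch below uses a stand-in agent so that it runs without the library. `StubAgent`, `train_until_done`, the `max_learning_steps` threshold, and the dictionary keys are all hypothetical and only mimic the documented signatures; a real run would construct `RLDEAFL(config)` and pass actual environments.

```python
class StubAgent:
    """Stand-in that mimics only the documented call signatures and return shapes."""

    def __init__(self, max_learning_steps):
        self.learning_time = 0
        self.max_learning_steps = max_learning_steps  # hypothetical stop criterion

    def train_episode(self, envs, seeds, para_mode='dummy',
                      compute_resource=None, tb_logger=None, required_info=None):
        # Pretend each environment contributes one learning step per episode.
        self.learning_time += len(envs)
        is_end = self.learning_time >= self.max_learning_steps
        return is_end, {'learn_steps': self.learning_time}  # hypothetical key

    def rollout_episode(self, env, seed=None, required_info=None):
        return {'cost': [1.0, 0.5], 'return': 0.0}  # hypothetical keys and values


def train_until_done(agent, envs, seeds):
    """Call train_episode repeatedly until the agent reports that training has ended."""
    episodes = 0
    while True:
        is_end, info = agent.train_episode(envs, seeds, para_mode='dummy')
        episodes += 1
        if is_end:
            return episodes, info
```

The stub uses `None` in place of the `{}` defaults shown in the signatures, which avoids sharing a mutable default between calls.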