`src.baseline.metabbo.surrrlde`

Module Contents

Classes

SurrRLDE

Introduction

The paper “Surrogate Learning in Meta-Black-Box Optimization: A Preliminary Study” introduces Surr-RLDE, a novel framework that combines surrogate modeling and reinforcement learning to enhance Meta-Black-Box Optimization (MetaBBO). The authors propose a two-stage learning process: (1) surrogate learning, in which a Kolmogorov-Arnold Network (KAN) is trained with a relative-order-aware loss to approximate the objective function accurately, and (2) policy learning, in which reinforcement learning dynamically configures the mutation operators of a Differential Evolution (DE) algorithm. By integrating the surrogate model into policy training, Surr-RLDE significantly reduces evaluation costs while maintaining competitive performance.
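To make the policy-learning stage more concrete, the sketch below shows how a discrete action chosen by an RL agent could select one of several DE mutation operators. The operator pool (DE/rand/1, DE/best/1, DE/current-to-best/1), the scale factor F, and the names `mutate` and `OPERATORS` are illustrative assumptions, not the operator set or API of Surr-RLDE; minimization is assumed.

```python
import numpy as np

# Hypothetical operator pool: three classic DE mutation strategies.
# The operators actually configured by Surr-RLDE may differ.
def _others(n, i, k, rng):
    # Draw k distinct population indices, all different from i.
    return rng.choice([j for j in range(n) if j != i], size=k, replace=False)

def rand_1(pop, best, i, F, rng):
    r1, r2, r3 = _others(len(pop), i, 3, rng)
    return pop[r1] + F * (pop[r2] - pop[r3])

def best_1(pop, best, i, F, rng):
    r1, r2 = _others(len(pop), i, 2, rng)
    return best + F * (pop[r1] - pop[r2])

def current_to_best_1(pop, best, i, F, rng):
    r1, r2 = _others(len(pop), i, 2, rng)
    return pop[i] + F * (best - pop[i]) + F * (pop[r1] - pop[r2])

OPERATORS = [rand_1, best_1, current_to_best_1]  # action index -> operator

def mutate(pop, fitness, action, F=0.5, rng=None):
    """Build one mutant per individual with the operator selected by `action`."""
    rng = np.random.default_rng() if rng is None else rng
    best = pop[np.argmin(fitness)]  # minimization assumed
    op = OPERATORS[action]
    return np.stack([op(pop, best, i, F, rng) for i in range(len(pop))])

# Usage: 10 individuals in 3-D on the sphere function, action 2 = current-to-best/1.
rng = np.random.default_rng(0)
pop = rng.uniform(-5, 5, size=(10, 3))
fitness = np.sum(pop ** 2, axis=1)
mutants = mutate(pop, fitness, action=2, rng=rng)
print(mutants.shape)  # (10, 3)
```

In this framing the number of actions equals the size of the operator pool, so the agent's Q-network would output one value per operator.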

API

class src.baseline.metabbo.surrrlde.SurrRLDE(config)

Bases: src.rl.ddqn.DDQN_Agent

Introduction

Original Paper

“Surrogate Learning in Meta-Black-Box Optimization: A Preliminary Study.” The Genetic and Evolutionary Computation Conference (GECCO 2025)

Official Implementation

Surr-RLDE

Application Scenario

Single-objective optimization problems (SOOP). In this implementation, the built-in DDQN_Agent is used as the parent class, since SurrRLDE is based on DDQN.
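Assuming DDQN here denotes Double DQN, the agent's learning target decouples action selection (online network) from action evaluation (target network). The minimal NumPy sketch below computes that target, with plain arrays standing in for network outputs; `ddqn_targets` is a hypothetical helper, not part of `src.rl.ddqn`.

```python
import numpy as np

def ddqn_targets(rewards, next_q_online, next_q_target, dones, gamma=0.99):
    """Double DQN targets: the online network picks a', the target network scores it."""
    best_next = np.argmax(next_q_online, axis=1)          # selection (online net)
    rows = np.arange(len(best_next))
    next_values = next_q_target[rows, best_next]          # evaluation (target net)
    return rewards + gamma * (1.0 - dones) * next_values  # no bootstrap after terminal

rewards = np.array([1.0, 0.0])
next_q_online = np.array([[0.1, 0.9], [0.5, 0.2]])
next_q_target = np.array([[0.3, 0.4], [0.7, 0.6]])
dones = np.array([0.0, 1.0])
print(ddqn_targets(rewards, next_q_online, next_q_target, dones, gamma=0.9))
```

In the first row the online network prefers action 1, whose target-network value is 0.4, so the target is 1 + 0.9 × 0.4 = 1.36; the second row is terminal, so its target is just the reward, 0.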

Raises:

No exceptions are raised explicitly, but exceptions may propagate from tensor operations, environment interactions, or model updates.

Initialization

Args:

-config: Configuration object containing all parameters needed for the experiment. See config.py for details.

Built-in Attributes:

-config (object): Stores the configuration object.
-device (str): Device used for computation (‘cpu’ or ‘cuda’).
-memory_size (int): Size of the replay buffer.
-n_act (int): Number of possible actions.
-epsilon (float): Initial epsilon value for the epsilon-greedy policy.
-gamma (float): Discount factor for future rewards.
-max_learning_step (int): Maximum number of learning steps.
-cur_checkpoint (int): Index of the current checkpoint used when saving the model.
-replay_buffer (ReplayBuffer_torch): Replay buffer that stores experiences.
-model (MLP): Neural network model for Q-value prediction.
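The memory_size and replay_buffer attributes describe a bounded experience store. The sketch below is a plain-Python stand-in, assuming FIFO eviction and uniform sampling; `SimpleReplayBuffer` and its method names are hypothetical and are not the interface of `ReplayBuffer_torch`, which presumably stores torch tensors.

```python
import random
from collections import deque

class SimpleReplayBuffer:
    """Fixed-capacity FIFO replay buffer; the oldest transitions are evicted when full."""

    def __init__(self, memory_size):
        self.buffer = deque(maxlen=memory_size)

    def append(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        # Uniform sampling without replacement, regrouped field by field:
        # (states, actions, rewards, next_states, dones).
        batch = random.sample(list(self.buffer), batch_size)
        return tuple(zip(*batch))

    def __len__(self):
        return len(self.buffer)

buf = SimpleReplayBuffer(memory_size=3)
for t in range(5):
    buf.append(t, 0, 0.0, t + 1, False)
states, actions, rewards, next_states, dones = buf.sample(2)
print(len(buf), sorted(tr[0] for tr in buf.buffer))  # 3 [2, 3, 4]
```

Using `deque(maxlen=...)` keeps eviction O(1); a tensor-backed buffer would typically preallocate arrays and overwrite a ring index instead.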

__str__()

get_epsilon(step, start=0.5, end=0.05)

Calculates the epsilon value for the epsilon-greedy policy based on the current step.

-step (int): Current training step.
-start (float): Starting epsilon value.
-end (float): Minimum epsilon value.
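The decay shape used by get_epsilon is not documented here. A linear schedule is one common choice; the sketch below assumes it, with a hypothetical `decay_steps` horizon (the real method may tie the horizon to max_learning_step or decay differently).

```python
def linear_epsilon(step, start=0.5, end=0.05, decay_steps=10_000):
    """Anneal epsilon linearly from `start` to `end` over `decay_steps`, then hold at `end`."""
    frac = min(max(step, 0) / decay_steps, 1.0)  # progress in [0, 1]
    return end + (start - end) * (1.0 - frac)

for step in (0, 5_000, 10_000, 50_000):
    print(step, round(linear_epsilon(step), 4))
```

Writing the update as `end + (start - end) * (1 - frac)` makes the value land exactly on `end` once the schedule is finished, so the floor never drifts because of floating-point rounding.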

get_action(state, epsilon_greedy=False)

Selects an action for the current state using an epsilon-greedy policy.

-state (array-like): Current state.
-epsilon_greedy (bool): Whether to use the epsilon-greedy policy.
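Epsilon-greedy selection can be sketched independently of the network. Below, `q_values` stands in for the output of self.model on a state, and the `epsilon_greedy` flag mirrors the documented parameter; `epsilon_greedy_action` is a hypothetical helper, and where the real method obtains epsilon (for example from get_epsilon) is an assumption.

```python
import numpy as np

def epsilon_greedy_action(q_values, epsilon, epsilon_greedy=True, rng=None):
    """Greedy action over `q_values`; with prob. `epsilon` (if enabled) a random one."""
    rng = np.random.default_rng() if rng is None else rng
    if epsilon_greedy and rng.random() < epsilon:
        return int(rng.integers(len(q_values)))  # explore: uniform over all actions
    return int(np.argmax(q_values))              # exploit: highest estimated Q-value

q = np.array([0.1, 0.7, 0.2])
rng = np.random.default_rng(0)
print(epsilon_greedy_action(q, epsilon=0.3, epsilon_greedy=False))  # 1
draws = [epsilon_greedy_action(q, epsilon=0.3, rng=rng) for _ in range(1000)]
print(np.bincount(draws, minlength=3))
```

With the flag off the choice is purely greedy, which matches the default `epsilon_greedy=False` in the documented signature and is the usual setting for evaluation.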

train_episode(envs, seeds: Optional[Union[int, List[int], np.ndarray]], para_mode: Literal['dummy', 'subproc', 'ray', 'ray-subproc'] = 'dummy', compute_resource={}, tb_logger=None, required_info={})

Trains the agent for one episode.

-envs (list): List of environments.
-seeds (int, list, or np.ndarray): Seeds for environment initialization.
-para_mode (str): Parallelization mode (‘dummy’, ‘subproc’, ‘ray’, ‘ray-subproc’).
-compute_resource (dict): Computational resources to use.
-tb_logger (object): TensorBoard logger for training metrics.
-required_info (dict): Additional information required from the environment.
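The seeds parameter accepts an int, a list of ints, or an ndarray (and None, per the Optional annotation). How train_episode maps that value onto the individual environments is not documented on this page; the hypothetical `normalize_seeds` helper below sketches one plausible mapping, assuming a single int is offset once per environment.

```python
from typing import List, Optional, Union

import numpy as np

SeedArg = Optional[Union[int, List[int], np.ndarray]]

def normalize_seeds(seeds: SeedArg, n_envs: int) -> List[Optional[int]]:
    """Expand the `seeds` argument into exactly one seed (or None) per environment."""
    if seeds is None:
        return [None] * n_envs                          # let each env seed itself
    if isinstance(seeds, (int, np.integer)):
        # Hypothetical policy: offset a single base seed so envs differ.
        return [int(seeds) + i for i in range(n_envs)]
    flat = [int(s) for s in np.asarray(seeds).ravel()]  # list or ndarray
    if len(flat) != n_envs:
        raise ValueError(f"expected {n_envs} seeds, got {len(flat)}")
    return flat

print(normalize_seeds(7, 3))                 # [7, 8, 9]
print(normalize_seeds(np.array([3, 4]), 2))  # [3, 4]
```

Normalizing up front keeps the per-environment loop identical across the ‘dummy’, ‘subproc’, and ‘ray’ modes, since each worker then receives a plain int or None.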
