src.environment.optimizer.gleet_optimizer¶
Module Contents¶
Classes¶
GLEET_Optimizer: GLEET is a Generalizable Learning-based Exploration-Exploitation Tradeoff framework that uses reinforcement learning to explicitly control the exploration-exploitation tradeoff hyper-parameters of a given EC algorithm when solving a class of problems.
API¶
- class src.environment.optimizer.gleet_optimizer.GLEET_Optimizer(config)[source]¶
Bases: src.environment.optimizer.learnable_optimizer.Learnable_Optimizer
Introduction¶
GLEET is a Generalizable Learning-based Exploration-Exploitation Tradeoff framework that uses reinforcement learning to explicitly control the exploration-exploitation tradeoff hyper-parameters of a given EC algorithm when solving a class of problems.
Original paper¶
“Auto-configuring Exploration-Exploitation Tradeoff in Evolutionary Computation via Deep Reinforcement Learning.” Proceedings of the Genetic and Evolutionary Computation Conference (2024).
Official Implementation¶
Initialization
Introduction¶
Initializes the optimizer with the provided configuration and sets up internal parameters for optimization.
Args:¶
config (object): Config object containing optimizer settings.
The config attributes required by GLEET_Optimizer are:
log_interval (int): Interval at which logs are recorded. Default is config.maxFEs/config.n_logpoint.
n_logpoint (int): Number of log points to record. Default is 50.
full_meta_data (bool): Flag indicating whether to use full meta data. Default is False.
maxFEs (int): Maximum number of function evaluations.
__FEs (int): Counter for the number of function evaluations. Default is 0.
__config (object): Stores the config object from src/config.py.
PS (int): Population size. Default is 100.
Built-in Attribute:¶
self.__config (object): Stores the configuration object.
self.w_decay (bool): Flag to determine weight decay usage. Default is True.
self.w (float): Inertia weight, set based on w_decay. Default is 0.9 if w_decay is True, otherwise 0.729.
self.c (float): Acceleration coefficient. Default is 4.1.
self.reward_scale (int): Scaling factor for rewards. Default is 100.
self.ps (int): Population size. Default is 100.
self.no_improve (int): Counter for iterations without improvement. Default is 0.
self.boarder_method (str): Method for handling boundaries. Default is ‘clipping’.
self.reward_func (str): Reward function type. Default is ‘direct’.
self.fes (Any): Tracks function evaluations. Default is None.
self.cost (Any): Tracks cost. Default is None.
self.log_index (Any): Logging index. Default is None.
self.log_interval (int): Interval for logging progress.
Returns:¶
None
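The defaults listed above can be illustrated with a minimal sketch. It is not the class constructor: the SimpleNamespace stand-in for the config object and the maxFEs value of 20000 are assumptions made only for this example.

```python
from types import SimpleNamespace

# Hypothetical config object; maxFEs=20000 is an illustrative value only.
config = SimpleNamespace(maxFEs=20000, n_logpoint=50, full_meta_data=False)

# Default log interval as documented: config.maxFEs / config.n_logpoint.
log_interval = config.maxFEs / config.n_logpoint

# Inertia weight depends on the weight-decay flag, as documented.
w_decay = True
w = 0.9 if w_decay else 0.729

# Acceleration coefficient default from the attribute list.
c = 4.1
```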
- __str__()[source]¶
Introduction¶
Returns a string representation of the GLEET optimizer instance.
Returns:¶
str: The name of the optimizer, “GLEET_Optimizer”.
- initialize_particles(problem)[source]¶
Introduction¶
Initializes the particles for a particle swarm optimization (PSO) algorithm by generating random positions and velocities, evaluating initial costs, and setting up personal and global bests.
Args:¶
problem (object): The problem object, which must provide the attributes lb (lower bounds) and ub (upper bounds) and be compatible with the get_costs method.
Returns:¶
None: This method updates the internal state of the optimizer by initializing the particles attribute with positions, velocities, costs, and bests.
Notes:¶
The method uses the optimizer’s random number generator (self.rng) and assumes the existence of attributes such as ps (population size, i.e. the number of particles), dim (problem dimensionality), and max_velocity.
The particles dictionary stores all relevant information for each particle, including current and best positions, costs, velocities, and the global best.
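A minimal sketch of this kind of swarm initialization is shown below. It is an illustration under assumptions, not the method's actual code: the free function init_particles, the uniform sampling ranges, the 'velocity' key, and the sphere cost function are all hypothetical; the remaining dictionary keys follow the names listed for get_cat_xy.

```python
import numpy as np

def init_particles(lb, ub, ps, dim, max_velocity, cost_fn, rng):
    """Sample positions and velocities, then set personal and global bests."""
    position = rng.uniform(lb, ub, size=(ps, dim))
    velocity = rng.uniform(-max_velocity, max_velocity, size=(ps, dim))
    cost = cost_fn(position)
    g = int(np.argmin(cost))  # index of the best initial particle
    return {
        "current_position": position.copy(),
        "c_cost": cost.copy(),
        "pbest_position": position.copy(),
        "pbest": cost.copy(),
        "gbest_position": position[g].copy(),
        "gbest_val": float(cost[g]),
        "velocity": velocity,  # key name is an assumption
    }

rng = np.random.default_rng(0)
sphere = lambda x: np.sum(x ** 2, axis=-1)  # toy cost function
particles = init_particles(-5.0, 5.0, ps=4, dim=3, max_velocity=1.0,
                           cost_fn=sphere, rng=rng)
```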
- get_cat_xy()[source]¶
Introduction¶
Concatenates the current, personal best, and global best positions and their corresponding cost/fitness values for all particles in the optimizer.
Returns:¶
np.ndarray: A concatenated NumPy array containing the current positions and costs, personal best positions and values, and global best positions and values for all particles.
Notes:¶
The method assumes that the self.particles dictionary contains the keys: ‘current_position’, ‘c_cost’, ‘pbest_position’, ‘pbest’, ‘gbest_position’, and ‘gbest_val’.
The concatenation is performed along the last axis for position-value pairs and along the first axis to combine all groups.
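The two-stage concatenation described in the notes can be sketched as follows. The array shapes are assumptions for illustration: positions are taken as (ps, dim), costs as (ps,), the global best position as (dim,), and the global best value as a scalar, which yields a result of shape (2 * ps + 1, dim + 1). The actual method may lay out the global best differently.

```python
import numpy as np

def cat_xy(p):
    """Pair each position with its value, then stack the three groups."""
    cur = np.concatenate([p["current_position"], p["c_cost"][:, None]], axis=-1)
    pbest = np.concatenate([p["pbest_position"], p["pbest"][:, None]], axis=-1)
    gbest = np.concatenate([p["gbest_position"], [p["gbest_val"]]], axis=-1)[None, :]
    return np.concatenate([cur, pbest, gbest], axis=0)

# Toy swarm with ps=2 particles in dim=2.
toy = {
    "current_position": np.array([[0.0, 1.0], [2.0, 3.0]]),
    "c_cost": np.array([1.0, 13.0]),
    "pbest_position": np.array([[0.0, 1.0], [2.0, 3.0]]),
    "pbest": np.array([1.0, 13.0]),
    "gbest_position": np.array([0.0, 1.0]),
    "gbest_val": 1.0,
}
xy = cat_xy(toy)
```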
- init_population(problem)[source]¶
Introduction¶
Initializes the population and related state variables for the optimizer, preparing it for a new optimization run.
Args:¶
problem (object): An object representing the optimization problem, expected to have attributes ub (upper bounds) and lb (lower bounds) for the search space.
Built-in Attribute:¶
self.fes (int): Function evaluation steps, initialized to 0.
self.per_no_improve (np.ndarray): Array to track the number of iterations without improvement for each particle, initialized to zeros.
self.max_velocity (np.ndarray): Maximum velocity for each particle, calculated based on the problem’s bounds.
self.max_dist (float): Maximum distance in the search space, calculated based on the problem’s bounds.
self.no_improve (int): Counter for the number of iterations without improvement, initialized to 0.
self.log_index (int): Index for logging progress, initialized to 1.
self.cost (list): List to store the best cost found at each logging interval, initialized with the global best value.
self.pbest_feature (np.ndarray): Array to store the personal best features of the particles.
self.gbest_feature (np.ndarray): Array to store the global best features of the particles.
self.meta_X (list): List to store the positions of the particles for meta-data logging, if configured.
self.meta_Cost (list): List to store the costs of the particles for meta-data logging, if configured.
Returns:¶
np.ndarray: The concatenated state of the population, including both the population state and additional features, with shape (ps, 27).
Notes:¶
Resets various counters and state variables to their initial values.
Initializes particle positions and velocities.
Optionally stores meta-data if configured.
Prepares features for exploration and exploitation tracking.
- get_costs(position, problem)[source]¶
Introduction¶
Calculates the cost(s) for a given position or set of positions in the search space, updating the function evaluation count.
Args:¶
position (np.ndarray): The position(s) in the search space for which the cost is to be evaluated. Shape is typically (n_samples, n_dimensions).
problem (object): The optimization problem instance, which must provide an eval method and an optional optimum attribute.
Returns:¶
np.ndarray or float: The evaluated cost(s) for the given position(s). If problem.optimum is defined, returns the difference between the evaluated value and the optimum.
Notes:¶
Increments the fes (function evaluation steps) counter by the number of positions evaluated.
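The behaviour described above (count evaluations, then subtract the optimum when one is defined) might look like the sketch below. The CostCounter and ShiftedSphere classes are hypothetical stand-ins for the optimizer and a problem instance; only the eval method, the optimum attribute, and the fes counter come from the description.

```python
import numpy as np

class ShiftedSphere:
    """Toy problem: sphere function shifted so that its optimum value is 10."""
    optimum = 10.0

    def eval(self, x):
        return np.sum(np.asarray(x) ** 2, axis=-1) + 10.0

class CostCounter:
    """Hypothetical holder for the evaluation counter."""

    def __init__(self):
        self.fes = 0

    def get_costs(self, position, problem):
        position = np.atleast_2d(position)
        self.fes += position.shape[0]  # one evaluation per row
        cost = problem.eval(position)
        optimum = getattr(problem, "optimum", None)
        if optimum is not None:
            cost = cost - optimum  # report the gap to the known optimum
        return cost

counter = CostCounter()
costs = counter.get_costs(np.array([[1.0, 2.0], [0.0, 0.0], [3.0, 4.0]]),
                          ShiftedSphere())
```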
- observe()[source]¶
Introduction¶
Computes and returns a set of normalized features representing the current state of the particle swarm optimizer. These features are used for monitoring or as input to learning-based optimization strategies.
Returns:¶
np.ndarray: A 2D array of shape (ps, 9), where each row contains the following normalized features for each particle:
fea0: Current cost normalized by maximum cost.
fea1: Difference between current cost and global best value, normalized by maximum cost.
fea2: Difference between current cost and personal best, normalized by maximum cost.
fea3: Remaining function evaluations normalized by maximum evaluations.
fea4: Number of iterations without improvement for each particle, normalized by maximum steps.
fea5: Number of iterations without improvement for the whole swarm, normalized by maximum steps.
fea6: Euclidean distance between current position and global best position, normalized by maximum distance.
fea7: Euclidean distance between current position and personal best position, normalized by maximum distance.
fea8: Cosine similarity between the vectors from current to personal best and from current to global best.
Notes:¶
Handles division by zero and NaN values in cosine similarity calculation.
Assumes all required attributes (such as self.particles, self.max_cost, etc.) are properly initialized.
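The distance and cosine-similarity features (fea6 to fea8) are the ones most exposed to division by zero, for example when a particle sits exactly on the global best. The sketch below shows one way to compute them safely. The function name direction_features is hypothetical, and mapping an undefined cosine to 0 is an assumption; the documentation only states that zero division and NaN values are handled.

```python
import numpy as np

def direction_features(position, pbest_position, gbest_position, max_dist):
    """Return (fea6, fea7, fea8) for every particle."""
    to_pbest = pbest_position - position           # shape (ps, dim)
    to_gbest = gbest_position[None, :] - position  # shape (ps, dim)
    d_pbest = np.linalg.norm(to_pbest, axis=-1)
    d_gbest = np.linalg.norm(to_gbest, axis=-1)
    dot = np.sum(to_pbest * to_gbest, axis=-1)
    denom = d_pbest * d_gbest
    # Only divide where both vectors are non-zero; elsewhere keep 0.
    cos = np.divide(dot, denom, out=np.zeros_like(dot), where=denom > 0)
    return d_gbest / max_dist, d_pbest / max_dist, cos

position = np.array([[0.0, 0.0], [2.0, 0.0]])
pbest_position = np.array([[1.0, 0.0], [2.0, 0.0]])
gbest_position = np.array([2.0, 0.0])
fea6, fea7, fea8 = direction_features(position, pbest_position,
                                      gbest_position, max_dist=4.0)
```

In this toy case the second particle coincides with both its personal best and the global best, so its cosine similarity is undefined and falls back to 0.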
- gp_cat()[source]¶
Introduction¶
Concatenates the personal best features and the repeated global best feature for all particles.
Returns:¶
np.ndarray: A concatenated array of shape (ps, 18), where ps is the number of particles. The array consists of each particle’s personal best features and the global best feature repeated for each particle.
Notes:¶
Assumes self.pbest_feature is an array of shape (ps, n_features).
Assumes self.gbest_feature is an array of shape (n_features,).
The concatenation is performed along the last axis.
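A short sketch of this concatenation follows. The value n_features = 9 is inferred from the documented (ps, 18) output shape rather than stated in the notes, and the feature values are placeholders.

```python
import numpy as np

ps, n_features = 4, 9  # n_features=9 is inferred from the (ps, 18) output

# Placeholder feature arrays with the documented shapes.
pbest_feature = np.arange(ps * n_features, dtype=float).reshape(ps, n_features)
gbest_feature = np.ones(n_features)

# Repeat the global best feature once per particle, then join on the last axis.
gbest_repeated = np.repeat(gbest_feature[None, :], ps, axis=0)
gp = np.concatenate([pbest_feature, gbest_repeated], axis=-1)
```

Joining a (ps, 9) observation with this (ps, 18) array would give the (ps, 27) state shape reported for init_population, although whether the state is assembled exactly this way is an inference from the shapes alone.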
- cal_reward_direct(new_gbest, pre_gbest)[source]¶
Introduction¶
Calculates the direct reward based on the improvement of the global best cost in an optimization process.
Args:¶
new_gbest (float or np.ndarray): The new global best cost(s) after an optimization step.
pre_gbest (float or np.ndarray): The previous global best cost(s) before the optimization step.
Returns:¶
float or np.ndarray: The normalized bonus reward(s) computed as the improvement in global best cost divided by self.max_cost.
Raises:¶
AssertionError: If any computed reward is less than 0, indicating that the new global best is not better than the previous one.
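As a rough sketch of the computation described here, assuming minimization and passing max_cost as an explicit argument instead of reading it from the instance:

```python
import numpy as np

def reward_direct(new_gbest, pre_gbest, max_cost):
    """Improvement of the global best cost, normalized by max_cost."""
    reward = (pre_gbest - new_gbest) / max_cost
    # The global best should never get worse, so the reward is non-negative.
    assert np.all(np.asarray(reward) >= 0)
    return reward

r_direct = reward_direct(new_gbest=2.0, pre_gbest=5.0, max_cost=10.0)
r_batch = reward_direct(np.array([1.0, 4.0]), np.array([3.0, 4.0]), 10.0)
```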
- cal_reward_11(new_gbest, pre_gbest)[source]¶
Introduction¶
Calculates a reward based on the comparison between the new global best value and the previous global best value.
Args:¶
new_gbest (float): The new global best value obtained.
pre_gbest (float): The previous global best value.
Returns:¶
int: Returns 1 if the new global best is better than (i.e., less than) the previous global best, otherwise returns -1.
- cal_reward_relative(new_gbest, pre_gbest)[source]¶
Introduction¶
Calculates the relative reward based on the change in global best values.
Args:¶
new_gbest (float): The new global best value after an optimization step.
pre_gbest (float): The previous global best value before the optimization step.
Returns:¶
float: The relative improvement in the global best value, computed as (pre_gbest - new_gbest) / pre_gbest.
Raises:¶
ZeroDivisionError: If pre_gbest is zero, as division by zero is not allowed.
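The formula and its failure mode can be checked with a small standalone sketch. The function name reward_relative is hypothetical, and the inputs are plain Python floats.

```python
def reward_relative(new_gbest, pre_gbest):
    """Relative improvement: (pre_gbest - new_gbest) / pre_gbest."""
    return (pre_gbest - new_gbest) / pre_gbest

r_rel = reward_relative(new_gbest=8.0, pre_gbest=10.0)

# A zero previous best makes the ratio undefined for Python floats.
try:
    reward_relative(1.0, 0.0)
    raised = False
except ZeroDivisionError:
    raised = True
```

Note that NumPy floating-point scalars divide by zero differently: they emit a RuntimeWarning and return inf or nan instead of raising, so a caller passing NumPy values may not see the ZeroDivisionError.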
- cal_reward_triangle(new_gbest, pre_gbest)[source]¶
Introduction¶
Calculates the reward based on the improvement of the global best cost (gbest) using a triangular reward function.
Args:¶
new_gbest (float): The new global best cost after an optimization step.
pre_gbest (float): The previous global best cost before the optimization step.
Returns:¶
float: The calculated reward, which is non-negative and reflects the improvement in gbest.
Raises:¶
AssertionError: If the computed reward is negative, indicating an unexpected calculation error.
- update(action, problem)[source]¶
Introduction¶
Updates the state of the particle swarm optimizer (PSO) for one iteration based on the given action and problem definition. This includes updating particle velocities and positions, handling boundary conditions, evaluating costs, updating personal and global bests, managing stagnation counters, calculating rewards, and preparing the next state for further optimization or reinforcement learning.
Args:¶
action (np.ndarray): The action(s) to be applied to the particles, typically representing control parameters or decisions for the optimizer.
problem (object): The optimization problem instance, which must provide lower and upper bounds (lb, ub), and optionally an optimum attribute.
Returns:¶
next_state (np.ndarray): The updated state representation of the particle population after the current iteration.
reward (float): The reward signal calculated based on the improvement in global best value.
is_end (bool): Flag indicating whether the optimization process has reached its termination condition.
info (dict): Additional information (currently empty, but can be extended for logging or debugging).
Raises:¶
None explicitly, but may raise exceptions if input shapes are inconsistent or if required attributes are missing from problem.
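The core of one iteration (velocity update, velocity limit, and boundary clipping) can be sketched with a standard PSO step. This is a generic illustration, not the method's code. In particular, splitting the acceleration coefficient c = 4.1 between the cognitive and social terms according to a per-particle action in [0, 1] is an assumption about how the action is used, and pso_step is a hypothetical helper.

```python
import numpy as np

def pso_step(pos, vel, pbest_pos, gbest_pos, w, c1, c2, lb, ub, max_velocity, rng):
    """One PSO move with velocity limiting and 'clipping' boundary handling."""
    r1 = rng.random(pos.shape)
    r2 = rng.random(pos.shape)
    new_vel = (w * vel
               + c1 * r1 * (pbest_pos - pos)            # cognitive pull
               + c2 * r2 * (gbest_pos[None, :] - pos))  # social pull
    new_vel = np.clip(new_vel, -max_velocity, max_velocity)
    new_pos = np.clip(pos + new_vel, lb, ub)
    return new_pos, new_vel

rng = np.random.default_rng(0)
ps, dim, lb, ub = 5, 3, -5.0, 5.0
pos = rng.uniform(lb, ub, size=(ps, dim))
vel = np.zeros((ps, dim))
pbest_pos = pos.copy()
gbest_pos = pos[0].copy()

# Assumed action semantics: one value in [0, 1] per particle splits c.
action = rng.random((ps, 1))
c = 4.1
c1 = action * c
c2 = c - c1

new_pos, new_vel = pso_step(pos, vel, pbest_pos, gbest_pos, w=0.9, c1=c1, c2=c2,
                            lb=lb, ub=ub, max_velocity=1.0, rng=rng)
```

A full update would additionally re-evaluate costs, refresh the personal and global bests, update the stagnation counters, compute the reward, and set is_end once the evaluation budget is exhausted, as listed in the introduction.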