src.environment.optimizer.gleet_optimizer

Module Contents

Classes

GLEET_Optimizer

API

class src.environment.optimizer.gleet_optimizer.GLEET_Optimizer(config)

Bases: src.environment.optimizer.learnable_optimizer.Learnable_Optimizer

Introduction

GLEET (Generalizable Learning-based Exploration-Exploitation Tradeoff) is a framework that uses reinforcement learning to explicitly control the exploration-exploitation tradeoff hyper-parameters of a given evolutionary computation (EC) algorithm in order to solve a class of problems.

Original paper

“Auto-configuring Exploration-Exploitation Tradeoff in Evolutionary Computation via Deep Reinforcement Learning.” Proceedings of the Genetic and Evolutionary Computation Conference (2024).

Official Implementation

GLEET

Initialization

Introduction

Initializes the optimizer with the provided configuration and sets up internal parameters for optimization.

Args:

  • config (object): Config object containing optimizer settings.

    • GLEET_Optimizer requires the following attributes:

      • log_interval (int): Interval, in function evaluations, at which logs are recorded. Default is config.maxFEs/config.n_logpoint.

      • n_logpoint (int): Number of log points to record. Default is 50.

      • full_meta_data (bool): Whether to record full meta-data. Default is False.

      • maxFEs (int): Maximum number of function evaluations.

      • __FEs (int): Counter for the number of function evaluations. Default is 0.

      • __config (object): Stores the config object from src/config.py.

      • PS (int): Population size. Default is 100.

Built-in Attribute:

  • self.__config (object): Stores the configuration object.

  • self.w_decay (bool): Whether the inertia weight decays during the run. Default is True.

  • self.w (float): Inertia weight. Initialized to 0.9 if w_decay is True, otherwise 0.729.

  • self.c (float): Acceleration coefficient. Default is 4.1.

  • self.reward_scale (int): Scaling factor for rewards. Default is 100.

  • self.ps (int): Population size (number of particles). Default is 100.

  • self.no_improve (int): Counter for iterations without improvement. Default is 0.

  • self.boarder_method (str): Boundary-handling method. Default is ‘clipping’.

  • self.reward_func (str): Reward function type. Default is ‘direct’.

  • self.fes (Any): Function-evaluation counter. Default is None; init_population sets it to 0.

  • self.cost (Any): Best cost recorded at each logging point. Default is None; init_population initializes it.

  • self.log_index (Any): Logging index. Default is None; init_population sets it to 1.

  • self.log_interval (int): Interval for logging progress.

Returns:

  • None
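As a minimal sketch, the attributes above can be supplied through any object with matching fields. The example assumes a types.SimpleNamespace stand-in for the config object from src/config.py (hypothetical; the real class may carry more fields and compute some of these itself) and derives the documented defaults for log_interval and the initial inertia weight.

```python
from types import SimpleNamespace

# Hypothetical stand-in for the config object built in src/config.py
config = SimpleNamespace(
    maxFEs=20000,          # evaluation budget (example value)
    n_logpoint=50,         # documented default
    full_meta_data=False,  # documented default
    PS=100,                # documented default population size
)

# Documented default: log_interval = maxFEs / n_logpoint
config.log_interval = config.maxFEs / config.n_logpoint

# Documented rule for the initial inertia weight
w_decay = True
w = 0.9 if w_decay else 0.729

print(config.log_interval, w)  # 400.0 0.9
```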

__str__()

Introduction

Returns a string representation of the GLEET optimizer instance.

Returns:

  • str: The name of the optimizer, “GLEET_Optimizer”.

initialize_particles(problem)

Introduction

Initializes the particles for a particle swarm optimization (PSO) algorithm by generating random positions and velocities, evaluating initial costs, and setting up personal and global bests.

Args:

  • problem (object): The problem object, which must have attributes lb (lower bounds) and ub (upper bounds) and be compatible with the get_costs method.

Returns:

  • None: This method updates the internal state of the optimizer by initializing the particles attribute with positions, velocities, costs, and bests.

Notes:

  • The method uses the optimizer’s random number generator (self.rng) and assumes that attributes such as ps (population size), dim (problem dimensionality), and max_velocity already exist.

  • The particles dictionary stores all relevant information for each particle, including current and best positions, costs, velocities, and the global best.
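The initialization described above can be sketched with NumPy as follows. This is an illustration under assumptions, not the module's code: positions are drawn uniformly in [lb, ub], velocities uniformly in [-max_velocity, max_velocity], max_velocity is taken as 0.1 * (ub - lb), and the keys 'velocity' and 'gbest_index' are hypothetical names. The other keys follow those listed for get_cat_xy.

```python
import numpy as np

def initialize_particles(rng, lb, ub, ps, dim, max_velocity, cost_fn):
    """Sketch of PSO initialization; some key names are hypothetical."""
    # Uniform random positions inside the box [lb, ub]
    position = rng.uniform(lb, ub, size=(ps, dim))
    # Uniform random velocities in [-max_velocity, max_velocity]
    velocity = rng.uniform(-max_velocity, max_velocity, size=(ps, dim))
    cost = cost_fn(position)                # one cost per particle, shape (ps,)
    gbest_index = int(np.argmin(cost))
    return {
        'current_position': position.copy(),
        'c_cost': cost.copy(),
        'pbest_position': position.copy(),  # personal best starts at the initial point
        'pbest': cost.copy(),
        'gbest_position': position[gbest_index].copy(),
        'gbest_val': cost[gbest_index],
        'velocity': velocity,               # hypothetical key name
        'gbest_index': gbest_index,         # hypothetical key name
    }

rng = np.random.default_rng(0)
lb, ub = -5.0, 5.0
particles = initialize_particles(
    rng, lb, ub, ps=100, dim=10, max_velocity=0.1 * (ub - lb),
    cost_fn=lambda x: np.sum(x ** 2, axis=-1))  # sphere function
```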

get_cat_xy()

Introduction

Concatenates the current, personal best, and global best positions and their corresponding cost/fitness values for all particles in the optimizer.

Returns:

  • np.ndarray: A concatenated NumPy array containing the current positions and costs, personal best positions and values, and global best positions and values for all particles.

Notes:

  • The method assumes that the self.particles dictionary contains the keys: ‘current_position’, ‘c_cost’, ‘pbest_position’, ‘pbest’, ‘gbest_position’, and ‘gbest_val’.

  • The concatenation is performed along the last axis for position-value pairs and along the first axis to combine all groups.
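A sketch of this concatenation, assuming c_cost and pbest have shape (ps,), the position arrays have shape (ps, dim), and the global best is appended once as a single row rather than repeated per particle (an assumption; the actual layout may differ). Under those assumptions the result has shape (2 * ps + 1, dim + 1).

```python
import numpy as np

def get_cat_xy(particles):
    # Current positions with their costs: (ps, dim + 1)
    cur = np.concatenate((particles['current_position'],
                          particles['c_cost'][:, None]), axis=-1)
    # Personal-best positions with their values: (ps, dim + 1)
    pbest = np.concatenate((particles['pbest_position'],
                            particles['pbest'][:, None]), axis=-1)
    # Global-best position with its value as one row: (1, dim + 1)
    gbest = np.concatenate((particles['gbest_position'],
                            [particles['gbest_val']]), axis=-1)[None, :]
    # Stack the three groups along the first axis
    return np.concatenate((cur, pbest, gbest), axis=0)

particles = {
    'current_position': np.zeros((4, 3)), 'c_cost': np.ones(4),
    'pbest_position': np.zeros((4, 3)), 'pbest': np.ones(4),
    'gbest_position': np.zeros(3), 'gbest_val': 1.0,
}
xy = get_cat_xy(particles)
print(xy.shape)  # (9, 4)
```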

init_population(problem)

Introduction

Initializes the population and related state variables for the optimizer, preparing it for a new optimization run.

Args:

  • problem (object): An object representing the optimization problem, expected to have attributes ub (upper bounds) and lb (lower bounds) for the search space.

Built-in Attribute:

  • self.fes (int): Function evaluation steps, initialized to 0.

  • self.per_no_improve (np.ndarray): Array to track the number of iterations without improvement for each particle, initialized to zeros.

  • self.max_velocity (np.ndarray): Maximum velocity for each particle, calculated based on the problem’s bounds.

  • self.max_dist (float): Maximum distance in the search space, calculated based on the problem’s bounds.

  • self.no_improve (int): Counter for the number of iterations without improvement, initialized to 0.

  • self.log_index (int): Index for logging progress, initialized to 1.

  • self.cost (list): List to store the best cost found at each logging interval, initialized with the global best value.

  • self.pbest_feature (np.ndarray): Array to store the personal best features of the particles.

  • self.gbest_feature (np.ndarray): Feature vector associated with the swarm’s global best (a single vector shared by all particles).

  • self.meta_X (list): List to store the positions of the particles for meta-data logging, if configured.

  • self.meta_Cost (list): List to store the costs of the particles for meta-data logging, if configured.

Returns:

  • np.ndarray: The concatenated population state of shape (ps, 27), which matches the (ps, 9) features from observe() concatenated with the (ps, 18) personal and global best features from gp_cat().

Notes:

  • Resets various counters and state variables to their initial values.

  • Initializes particle positions and velocities.

  • Optionally stores meta-data if configured.

  • Prepares features for exploration and exploitation tracking.

get_costs(position, problem)

Introduction

Calculates the cost(s) for a given position or set of positions in the search space, updating the function evaluation count.

Args:

  • position (np.ndarray): The position(s) in the search space for which the cost is to be evaluated. Shape is typically (n_samples, n_dimensions).

  • problem (object): The optimization problem instance, which must provide an eval method and may define an optimum attribute.

Returns:

  • np.ndarray or float: The evaluated cost(s) for the given position(s). If problem.optimum is defined, returns the difference between the evaluated value and the optimum.

Notes:

  • Increments the fes (function evaluation steps) counter by the number of positions evaluated.
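The evaluate-and-count behaviour can be sketched as below. Sphere and CostCounter are hypothetical stand-ins for a problem and the optimizer; the sketch assumes problem.eval accepts an (n_samples, n_dimensions) array and that a missing or None optimum means raw costs are returned.

```python
import numpy as np

class Sphere:
    """Hypothetical problem exposing the interface get_costs expects."""
    optimum = 0.0

    def eval(self, x):
        return np.sum(x ** 2, axis=-1)

class CostCounter:
    """Hypothetical holder for the fes counter."""
    def __init__(self):
        self.fes = 0

    def get_costs(self, position, problem):
        # One evaluation per row; a single 1-D position counts as one
        self.fes += 1 if position.ndim == 1 else position.shape[0]
        cost = problem.eval(position)
        optimum = getattr(problem, 'optimum', None)
        if optimum is not None:
            cost = cost - optimum  # report the error relative to the optimum
        return cost

counter = CostCounter()
c = counter.get_costs(np.ones((5, 3)), Sphere())
print(c, counter.fes)  # [3. 3. 3. 3. 3.] 5
```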

observe()

Introduction

Computes and returns a set of normalized features representing the current state of the particle swarm optimizer. These features are used for monitoring or as input to learning-based optimization strategies.

Returns:

  • np.ndarray: A 2D array of shape (ps, 9), where each row contains the following normalized features for each particle:

    • fea0: Current cost normalized by maximum cost.

    • fea1: Difference between current cost and global best value, normalized by maximum cost.

    • fea2: Difference between current cost and personal best, normalized by maximum cost.

    • fea3: Remaining function evaluations normalized by maximum evaluations.

    • fea4: Number of iterations without improvement for each particle, normalized by maximum steps.

    • fea5: Number of iterations without improvement for the whole swarm, normalized by maximum steps.

    • fea6: Euclidean distance between current position and global best position, normalized by maximum distance.

    • fea7: Euclidean distance between current position and personal best position, normalized by maximum distance.

    • fea8: Cosine similarity between the vectors from current to personal best and from current to global best.

Notes:

  • Handles division by zero and NaN values in cosine similarity calculation.

  • Assumes all required attributes (such as self.particles, self.max_cost, etc.) are properly initialized.
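The nine features can be computed in a vectorized way as sketched below. The function signature is hypothetical, and the normalizers max_cost, max_fes, max_step and max_dist are passed in explicitly rather than read from self. Mapping an undefined cosine similarity (a zero-length vector) to 0 is an assumed convention for the NaN handling mentioned above.

```python
import numpy as np

def observe(cur_pos, c_cost, pbest_pos, pbest, gbest_pos, gbest_val,
            per_no_improve, no_improve, fes, max_fes, max_cost,
            max_step, max_dist):
    ps = cur_pos.shape[0]
    fea0 = c_cost / max_cost
    fea1 = (c_cost - gbest_val) / max_cost
    fea2 = (c_cost - pbest) / max_cost
    fea3 = np.full(ps, (max_fes - fes) / max_fes)
    fea4 = per_no_improve / max_step
    fea5 = np.full(ps, no_improve / max_step)
    to_g = gbest_pos[None, :] - cur_pos   # current -> global best
    to_p = pbest_pos - cur_pos            # current -> personal best
    fea6 = np.linalg.norm(to_g, axis=-1) / max_dist
    fea7 = np.linalg.norm(to_p, axis=-1) / max_dist
    # Cosine similarity; a zero-length vector gives 0/0, mapped to 0 below
    denom = np.linalg.norm(to_p, axis=-1) * np.linalg.norm(to_g, axis=-1)
    with np.errstate(divide='ignore', invalid='ignore'):
        fea8 = np.sum(to_p * to_g, axis=-1) / denom
    fea8 = np.nan_to_num(fea8, nan=0.0, posinf=0.0, neginf=0.0)
    return np.stack((fea0, fea1, fea2, fea3, fea4, fea5, fea6, fea7, fea8),
                    axis=-1)

rng = np.random.default_rng(1)
cur = rng.uniform(-1, 1, (4, 2))
cost = np.sum(cur ** 2, axis=-1)
obs = observe(cur, cost, cur.copy(), cost.copy(), cur[np.argmin(cost)],
              cost.min(), np.zeros(4), 0, fes=4, max_fes=100,
              max_cost=cost.max(), max_step=10, max_dist=np.sqrt(8.0))
```

Here max_dist is the diagonal of the box [-1, 1]^2; since every particle sits on its personal best, fea2, fea7 and fea8 are all zero.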

gp_cat()

Introduction

Concatenates the personal best features and the repeated global best feature for all particles.

Returns:

  • np.ndarray: A concatenated array of shape (ps, 18), where ps is the number of particles. The array consists of each particle’s personal best features and the global best feature repeated for each particle.

Notes:

  • Assumes self.pbest_feature is an array of shape (ps, n_features).

  • Assumes self.gbest_feature is an array of shape (n_features,).

  • The concatenation is performed along the last axis.
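A small NumPy sketch of gp_cat. The per-particle feature width of 9 is inferred from the documented (ps, 18) output. The last step further assumes that the (ps, 27) state returned by init_population is the (ps, 9) output of observe() followed by this block, which is consistent with the documented shapes but not stated explicitly.

```python
import numpy as np

ps, n_features = 5, 9
rng = np.random.default_rng(0)
pbest_feature = rng.random((ps, n_features))  # one row per particle
gbest_feature = rng.random(n_features)        # single swarm-level vector

# Repeat the global-best feature once per particle, then concatenate
gp = np.concatenate((pbest_feature,
                     np.repeat(gbest_feature[None, :], ps, axis=0)), axis=-1)

# Assumed state layout: observe() features followed by gp_cat()
obs = rng.random((ps, 9))
state = np.concatenate((obs, gp), axis=-1)
print(gp.shape, state.shape)  # (5, 18) (5, 27)
```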

cal_reward_direct(new_gbest, pre_gbest)

Introduction

Calculates the direct reward based on the improvement of the global best cost in an optimization process.

Args:

  • new_gbest (float or np.ndarray): The new global best cost(s) after an optimization step.

  • pre_gbest (float or np.ndarray): The previous global best cost(s) before the optimization step.

Returns:

  • float or np.ndarray: The normalized reward(s), computed as the improvement in global best cost divided by self.max_cost, i.e. (pre_gbest - new_gbest) / self.max_cost.

Raises:

  • AssertionError: If any computed reward is negative, i.e. the new global best is worse than the previous one.
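A standalone sketch of the direct reward, passing max_cost as an argument instead of reading self.max_cost (a simplification for illustration):

```python
import numpy as np

def cal_reward_direct(new_gbest, pre_gbest, max_cost):
    # Improvement in global best cost, normalized by max_cost
    bonus = (pre_gbest - new_gbest) / max_cost
    # Under minimization the global best never gets worse, so a
    # negative value signals an inconsistent state
    assert np.all(bonus >= 0), 'reward should be non-negative'
    return bonus

r = cal_reward_direct(new_gbest=30.0, pre_gbest=50.0, max_cost=200.0)
print(r)  # 0.1
```

A step with no improvement yields a reward of exactly 0, so the assertion only fires when the new global best is worse than the previous one.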

cal_reward_11(new_gbest, pre_gbest)

Introduction

Calculates a reward based on the comparison between the new global best value and the previous global best value.

Args:

  • new_gbest (float): The new global best value obtained.

  • pre_gbest (float): The previous global best value.

Returns:

  • int: 1 if the new global best is better than (i.e., less than) the previous global best; otherwise -1.

cal_reward_relative(new_gbest, pre_gbest)

Introduction

Calculates the relative reward based on the change in global best values.

Args:

  • new_gbest (float): The new global best value after an optimization step.

  • pre_gbest (float): The previous global best value before the optimization step.

Returns:

  • float: The relative improvement in the global best value, computed as (pre_gbest - new_gbest) / pre_gbest.

Raises:

  • ZeroDivisionError: If pre_gbest is zero and is a plain Python number. With NumPy scalars or arrays, division by zero instead returns inf or nan and emits a RuntimeWarning.
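For comparison, the ±1 reward and the relative reward reduce to one-liners. These are sketches of the documented formulas, written as free functions:

```python
def cal_reward_11(new_gbest, pre_gbest):
    # +1 for a strict improvement (lower cost), -1 otherwise
    return 1 if new_gbest < pre_gbest else -1

def cal_reward_relative(new_gbest, pre_gbest):
    # Fraction of the previous best cost removed in this step
    return (pre_gbest - new_gbest) / pre_gbest

print(cal_reward_11(30.0, 50.0), cal_reward_11(50.0, 50.0))  # 1 -1
print(cal_reward_relative(30.0, 50.0))                       # 0.4
```

Because get_costs reports costs relative to problem.optimum when it is defined, the global best can approach 0 on a nearly solved problem, which is exactly where the relative reward becomes ill-conditioned.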

cal_reward_triangle(new_gbest, pre_gbest)

Introduction

Calculates the reward based on the improvement of the global best cost (gbest) using a triangular reward function.

Args:

  • new_gbest (float): The new global best cost after an optimization step.

  • pre_gbest (float): The previous global best cost before the optimization step.

Returns:

  • float: The calculated reward, which is non-negative and reflects the improvement in gbest.

Raises:

  • AssertionError: If the computed reward is negative, indicating an unexpected calculation error.
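The documentation does not spell out the triangular formula. The sketch below shows one plausible form and is purely an assumption: each cost is mapped to a progress value p = (max_cost - gbest) / max_cost, and the reward is the growth of the area p^2 / 2, so an improvement made close to the optimum (larger p) earns more than the same absolute improvement made early. The function signature with an explicit max_cost is hypothetical.

```python
def cal_reward_triangle(new_gbest, pre_gbest, max_cost):
    """Hypothetical triangular reward; not confirmed against the source."""
    reward = 0.0
    if new_gbest < pre_gbest:
        p_old = (max_cost - pre_gbest) / max_cost  # progress before the step
        p_new = (max_cost - new_gbest) / max_cost  # progress after the step
        reward = 0.5 * (p_new ** 2 - p_old ** 2)   # growth of the area p^2 / 2
    assert reward >= 0, 'reward should be non-negative'
    return reward

early = cal_reward_triangle(80.0, 100.0, max_cost=100.0)  # about 0.02
late = cal_reward_triangle(0.0, 20.0, max_cost=100.0)     # about 0.18
```

Both calls reduce the cost by 20, yet the late improvement earns roughly nine times the reward under this assumed shape.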

update(action, problem)

Introduction

Updates the state of the particle swarm optimizer (PSO) for one iteration based on the given action and problem definition. This includes updating particle velocities and positions, handling boundary conditions, evaluating costs, updating personal and global bests, managing stagnation counters, calculating rewards, and preparing the next state for further optimization or reinforcement learning.

Args:

  • action (np.ndarray): The per-particle action(s) produced by the reinforcement-learning agent, which set the exploration-exploitation tradeoff hyper-parameters used in the velocity update.

  • problem (object): The optimization problem instance, which must provide lower and upper bounds (lb, ub) and an eval method, and may define an optimum attribute.

Returns:

  • next_state (np.ndarray): The updated state representation of the particle population after the current iteration.

  • reward (float): The reward signal calculated based on the improvement in global best value.

  • is_end (bool): Flag indicating whether the optimization process has reached its termination condition.

  • info (dict): Additional information (currently empty, but can be extended for logging or debugging).

Raises:

  • None explicitly, but may raise exceptions if input shapes are inconsistent or if required attributes are missing from problem.
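A single PSO step of the kind update describes can be sketched as below. The mapping from action to coefficients is an assumption: each particle's action a in [0, 1] splits the total acceleration coefficient c (4.1 by default) into a cognitive part c1 = a * c and a social part c2 = c - c1, so larger actions pull particles more toward their personal bests. Velocity clipping to max_velocity is also assumed, boundary handling uses clipping to match the default boarder_method, and the function name pso_step and the 'velocity' key are hypothetical.

```python
import numpy as np

def pso_step(p, action, rng, cost_fn, lb, ub, w=0.729, c=4.1, max_velocity=1.0):
    """Hypothetical PSO step over the particle-dict keys documented above."""
    ps, dim = p['current_position'].shape
    c1 = action[:, None] * c          # cognitive coefficient per particle
    c2 = c - c1                       # social coefficient per particle
    r1 = rng.random((ps, dim))
    r2 = rng.random((ps, dim))
    vel = (w * p['velocity']
           + c1 * r1 * (p['pbest_position'] - p['current_position'])
           + c2 * r2 * (p['gbest_position'][None, :] - p['current_position']))
    vel = np.clip(vel, -max_velocity, max_velocity)
    pos = np.clip(p['current_position'] + vel, lb, ub)  # 'clipping' boundary rule
    cost = cost_fn(pos)

    # Personal bests move wherever the new cost is strictly lower
    improved = cost < p['pbest']
    p['pbest_position'] = np.where(improved[:, None], pos, p['pbest_position'])
    p['pbest'] = np.where(improved, cost, p['pbest'])

    # Global best update; the return value is the un-normalized improvement
    pre_gbest = p['gbest_val']
    i = int(np.argmin(p['pbest']))
    if p['pbest'][i] < pre_gbest:
        p['gbest_val'] = p['pbest'][i]
        p['gbest_position'] = p['pbest_position'][i].copy()
    p['current_position'], p['velocity'], p['c_cost'] = pos, vel, cost
    return pre_gbest - p['gbest_val']

rng = np.random.default_rng(0)
sphere = lambda x: np.sum(x ** 2, axis=-1)
pos = rng.uniform(-5, 5, (20, 4))
cost = sphere(pos)
g = int(np.argmin(cost))
p = {'current_position': pos, 'c_cost': cost, 'pbest_position': pos.copy(),
     'pbest': cost.copy(), 'gbest_position': pos[g].copy(),
     'gbest_val': cost[g], 'velocity': np.zeros_like(pos)}
start = p['gbest_val']
for _ in range(50):
    gain = pso_step(p, np.full(20, 0.5), rng, sphere, -5.0, 5.0)
    assert gain >= 0  # the global best never gets worse
```

Stagnation counters, inertia-weight decay, logging and construction of the next (ps, 27) state are omitted; dividing the returned improvement by max_cost would give the 'direct' reward.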