src.environment.optimizer.rlemmo_optimizer¶
Module Contents¶
Classes¶
RLEMMO_Optimizer¶ A reinforcement learning-based evolutionary multi-modal optimizer (RLEMMO) that extends Learnable_Optimizer.
API¶
- class src.environment.optimizer.rlemmo_optimizer.RLEMMO_Optimizer(config)[source]¶
Bases: src.environment.optimizer.learnable_optimizer.Learnable_Optimizer
RLEMMO_Optimizer¶
A reinforcement learning-based evolutionary multi-modal optimizer (RLEMMO) that extends Learnable_Optimizer. This optimizer is designed for multi-modal optimization problems and leverages a population-based approach with multiple mutation strategies, neighborhood structures, and reward mechanisms.
Introduction¶
RLEMMO_Optimizer maintains a population of candidate solutions and applies various evolutionary operators (actions) to explore and exploit the search space. It uses neighborhood information, clustering, and reinforcement learning-inspired mechanisms to adaptively guide the search process. The optimizer is suitable for problems with multiple global optima and supports meta-data collection for analysis.
Initialization
Introduction¶
Initializes the RLEMMO_Optimizer with the provided configuration and sets default hyperparameters and attributes.
Args:¶
config (dict): Configuration dictionary containing optimizer settings.
The attributes needed for the RLEMMO are the following:
Attributes:¶
ps (int): Population size, default is 100.
k_neighbors (int): Number of neighbors for local search, default is 4.
n_action (int): Number of possible actions, default is 5.
FF (float): Differential evolution scaling factor, default is 0.5.
CR (float): Crossover probability, default is 0.9.
eps (float): Epsilon value for exploration, default is 0.2.
min_samples (int): Minimum samples for clustering or selection, default is 3.
reward_scale (int): Scaling factor for rewards, default is 1000.
fes (Any): Function evaluation counter, initialized as None.
cost (Any): Cost value, initialized as None.
pr (Any): Probability-related attribute, initialized as None.
sr (Any): Success rate or similar metric, initialized as None.
log_index (Any): Logging index, initialized as None.
log_interval (Any): Logging interval, initialized as None.
archive (Any): Archive for storing solutions, initialized as None.
archive_val (Any): Archive for storing solution values, initialized as None.
Raises:¶
None
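As a rough illustration of how FF and CR typically enter a differential evolution step, the sketch below implements a generic DE/rand/1 mutation with binomial crossover. The page does not list which mutation strategies RLEMMO_Optimizer actually uses, so the function name de_rand_1_bin and the choice of strategy are assumptions made for this example only.

```python
import numpy as np

def de_rand_1_bin(pop, FF, CR, rng):
    """Generic DE/rand/1/bin step (illustrative; not taken from RLEMMO)."""
    ps, dim = pop.shape
    trial = pop.copy()
    for i in range(ps):
        # Pick three distinct individuals, all different from i.
        candidates = [j for j in range(ps) if j != i]
        r1, r2, r3 = rng.choice(candidates, size=3, replace=False)
        # FF scales the difference vector added to the base individual.
        mutant = pop[r1] + FF * (pop[r2] - pop[r3])
        # CR is the per-coordinate probability of taking the mutant value.
        cross = rng.random(dim) < CR
        cross[rng.integers(dim)] = True  # force at least one mutant coordinate
        trial[i, cross] = mutant[cross]
    return trial

rng = np.random.default_rng(0)
pop = rng.uniform(-5.0, 5.0, size=(6, 4))
trial = de_rand_1_bin(pop, FF=0.5, CR=0.9, rng=rng)
```

With the documented defaults (FF = 0.5, CR = 0.9), most coordinates of each trial vector come from the mutant and the difference vector is halved before being added.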
- __str__()[source]¶
Introduction¶
Returns a string representation of the RLEMMO_Optimizer object.
Returns:¶
str: The name of the optimizer, “RLEMMO_Optimizer”.
- get_costs(position, problem)[source]¶
Introduction¶
Calculates the cost values for a given set of positions using the provided optimization problem.
Args:¶
position (np.ndarray): An array representing one or more candidate solutions (positions) to be evaluated.
problem (object): An optimization problem instance that must have eval and optimum attributes.
Built-in Attribute:¶
self.fes (int): Increments the function evaluation counter by the number of positions evaluated.
Returns:¶
np.ndarray: The cost values for each position, computed as the difference between the evaluated value and the problem’s optimum.
Raises:¶
AttributeError: If problem does not have the required eval method or optimum attribute.
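The behavior described above can be sketched as follows. SphereProblem and CostCounter are hypothetical stand-ins written for this example; only the eval/optimum contract, the subtraction of the optimum, and the fes increment come from the description.

```python
import numpy as np

class SphereProblem:
    """Hypothetical problem exposing the eval method and optimum attribute."""
    optimum = 0.0

    def eval(self, x):
        # Sum of squares for each row of a (n, dim) array.
        return np.sum(np.asarray(x) ** 2, axis=-1)

class CostCounter:
    """Minimal holder for the fes counter; not the real optimizer class."""
    def __init__(self):
        self.fes = 0

    def get_costs(self, position, problem):
        position = np.atleast_2d(position)
        self.fes += position.shape[0]  # one function evaluation per row
        return problem.eval(position) - problem.optimum

counter = CostCounter()
costs = counter.get_costs(
    np.array([[1.0, 2.0], [0.0, 0.0], [3.0, 4.0]]), SphereProblem()
)
```

Because the optimum is subtracted, a cost of zero means the position attains the known optimum of the problem.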
- find_nei(pop_dist)[source]¶
Introduction¶
Constructs a neighbor matrix for a population based on pairwise distances, marking the k-nearest neighbors for each individual.
Args:¶
pop_dist (np.ndarray): A 2D square numpy array of shape (ps, ps) representing pairwise distances between population members.
Returns:¶
np.ndarray: A binary matrix of shape (ps, ps) where entry (i, j) is 1 if individual j is among the k-nearest neighbors of individual i, otherwise 0.
Notes:¶
The diagonal of pop_dist is set to infinity to exclude self-neighbors.
The number of neighbors considered is determined by self.k_neighbors.
The population size is determined by self.ps.
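A minimal NumPy sketch of this k-nearest-neighbor construction is shown below. It is a standalone function, so it takes k_neighbors as an argument instead of reading it from self, and it works on a copy of pop_dist; whether the real method modifies its input in place is not stated here.

```python
import numpy as np

def find_nei(pop_dist, k_neighbors):
    """Binary (ps, ps) matrix marking each row's k nearest neighbors."""
    ps = pop_dist.shape[0]
    dist = pop_dist.astype(float).copy()   # leave the caller's array untouched
    np.fill_diagonal(dist, np.inf)         # an individual is never its own neighbor
    nearest = np.argsort(dist, axis=1)[:, :k_neighbors]
    neighbor_matrix = np.zeros((ps, ps), dtype=int)
    rows = np.repeat(np.arange(ps), k_neighbors)
    neighbor_matrix[rows, nearest.ravel()] = 1
    return neighbor_matrix

# Four points on a line at 0, 1, 3 and 7.
points = np.array([[0.0], [1.0], [3.0], [7.0]])
pop_dist = np.abs(points - points.T)
nm = find_nei(pop_dist, k_neighbors=2)
```

Note that the resulting matrix is generally not symmetric: in this example the point at 7 lists the point at 3 as a neighbor, but not the other way around.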
- initialize_individuals(problem)[source]¶
Introduction¶
Initializes the population of individuals for the optimizer, setting up their positions, costs, neighborhood relationships, and best-known solutions.
Args:¶
problem (object): An object representing the optimization problem, which must have the attributes dim (int, dimensionality of the problem), ub (np.ndarray or float, upper bounds), and lb (np.ndarray or float, lower bounds).
Side Effects:¶
Initializes and stores the following attributes in self.individuals:
‘current_position’: np.ndarray of shape (ps, dim), the positions of all individuals.
‘c_cost’: np.ndarray of shape (ps,), the cost of each individual.
‘pop_dist’: np.ndarray of shape (ps, ps), pairwise distances between individuals.
‘neighbor_matrix’: np.ndarray of shape (ps, ps), neighborhood relationships.
‘gbest_position’: np.ndarray of shape (dim,), the position of the global best individual.
‘gbest_val’: float, the cost of the global best individual.
‘no_improve’: int, counter for global no improvement.
‘lbest_position’: list of np.ndarray, best positions in each individual’s neighborhood.
‘lbest_val’: list of float, best costs in each individual’s neighborhood.
‘local_no_improve’: np.ndarray of shape (ps,), counters for local no improvement.
‘per_no_improve’: np.ndarray of shape (ps,), counters for personal no improvement.
Sets self.max_cost to the maximum cost in the initial population.
Sets self.gbest_val to the global best cost.
Returns:¶
None
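The sketch below shows one way such a population dictionary could be assembled. Uniform sampling within the bounds, Euclidean pairwise distances, and minimization are assumptions of this example, and init_individuals is a hypothetical helper; the neighborhood-related entries are left out for brevity.

```python
import numpy as np

def init_individuals(dim, lb, ub, ps, cost_fn, rng):
    """Build a partial 'individuals' dict (illustrative assumptions only)."""
    # Assumption: positions are drawn uniformly inside [lb, ub].
    position = lb + rng.random((ps, dim)) * (ub - lb)
    c_cost = cost_fn(position)
    # Assumption: pairwise Euclidean distances via broadcasting.
    diff = position[:, None, :] - position[None, :, :]
    pop_dist = np.linalg.norm(diff, axis=-1)
    best = int(np.argmin(c_cost))          # assumption: lower cost is better
    return {
        'current_position': position,
        'c_cost': c_cost,
        'pop_dist': pop_dist,
        'gbest_position': position[best].copy(),
        'gbest_val': float(c_cost[best]),
        'no_improve': 0,
        'local_no_improve': np.zeros(ps),
        'per_no_improve': np.zeros(ps),
    }

rng = np.random.default_rng(0)
ind = init_individuals(
    dim=3, lb=-5.0, ub=5.0, ps=10,
    cost_fn=lambda x: np.sum(x ** 2, axis=1), rng=rng,
)
max_cost = float(ind['c_cost'].max())      # analogue of self.max_cost
```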
- init_population(problem)[source]¶
Introduction¶
Initializes the population and related state variables for the optimizer based on the provided problem instance.
Args:¶
problem (object): An object representing the optimization problem, expected to have attributes such as maxfes, dim, ub, and lb.
Returns:¶
np.ndarray: The initial state of the population, including population state, exploration state, and exploitation state.
Notes:¶
Sets up internal counters and logging intervals.
Initializes individuals and their costs.
Calculates and stores initial performance metrics (pr, sr).
Optionally collects meta-data if configured.
- observe()[source]¶
Introduction¶
Computes and returns a set of state features for each individual in the population, capturing various statistics and relationships relevant to the optimizer’s environment. These features are used for monitoring or further decision-making in the optimization process.
Returns:¶
np.ndarray: A 2D array of shape (ps, 22), where each row corresponds to an individual and each column represents a specific feature describing the individual’s state in the population.
Built-in Attribute:¶
self.individuals (dict): Contains arrays for current positions, costs, distances, neighbor matrices, best positions, and improvement counters for all individuals.
self.ps (int): Population size.
self.max_fes (int): Maximum number of function evaluations.
self.fes (int): Current number of function evaluations.
self.max_dist (float): Maximum possible distance in the search space.
self.max_cost (float): Maximum possible cost value.
self.k_neighbors (int): Number of neighbors considered for each individual.
Raises:¶
AssertionError: If the neighbor count does not match self.k_neighbors or if any NaN values are present in the resulting state array.
- mydbscan(problem)[source]¶
Introduction¶
Applies the DBSCAN clustering algorithm to the current population of individuals, normalized within the problem bounds.
Args:¶
problem: The optimization problem object, which has lb (lower bounds) and ub (upper bounds) attributes.
Returns:¶
numpy.ndarray: An array of cluster labels assigned to each individual in the population.
Raises:¶
AttributeError: If problem does not have lb or ub attributes.
ValueError: If the population shape is incompatible with the problem bounds.
- update(action, problem)[source]¶
Introduction¶
Updates the optimizer’s population based on the provided actions and problem instance, applying evolutionary operators, updating global and local bests, and calculating rewards and termination conditions.
Args:¶
action (np.ndarray): An array of actions to apply to each individual in the population.
problem (object): The optimization problem instance, providing bounds and cost evaluation.
Returns:¶
next_state (np.ndarray): The observed state of the population after the update.
reward (float): The calculated reward for the current update step, scaled by reward_scale.
is_end (bool): Flag indicating whether the optimization process has reached its end condition.
info (dict): Additional information (currently empty).
Raises:¶
ValueError: If an invalid action is encountered in the action array.
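The init_population and update signatures suggest an environment-style interaction loop, sketched below with a stand-in class. StubOptimizer is hypothetical and contains no optimization logic. The integer action encoding in [0, n_action), the (ps, 22) state shape borrowed from observe(), and the fixed step budget are all assumptions of this example.

```python
import numpy as np

class StubOptimizer:
    """Stand-in mirroring the documented signatures; not RLEMMO itself."""
    def __init__(self, ps=100, n_action=5, max_steps=3):
        self.ps, self.n_action, self.max_steps = ps, n_action, max_steps
        self.steps = 0

    def init_population(self, problem):
        self.steps = 0
        return np.zeros((self.ps, 22))     # placeholder initial state

    def update(self, action, problem):
        # Assumption: one integer action index per individual.
        if np.any((action < 0) | (action >= self.n_action)):
            raise ValueError("invalid action")
        self.steps += 1
        next_state = np.zeros((self.ps, 22))
        is_end = self.steps >= self.max_steps
        return next_state, 0.0, is_end, {}

rng = np.random.default_rng(0)
opt = StubOptimizer()
state = opt.init_population(problem=None)
is_end, n_steps = False, 0
while not is_end:
    # A random policy stands in for the learned agent.
    action = rng.integers(opt.n_action, size=opt.ps)
    state, reward, is_end, info = opt.update(action, problem=None)
    n_steps += 1
```

In actual use, the random policy would be replaced by the trained agent and the stub by an RLEMMO_Optimizer instance paired with a real problem object.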