src.environment.optimizer.rlepso_optimizer

Module Contents

Classes

RLEPSO_Optimizer

Introduction

RLEPSO is a particle swarm optimization (PSO) algorithm that uses reinforcement learning to set the coefficients of an ensemble of PSO update strategies.

API

class src.environment.optimizer.rlepso_optimizer.RLEPSO_Optimizer(config)[source]

Bases: src.environment.optimizer.learnable_optimizer.Learnable_Optimizer

Introduction

RLEPSO is a particle swarm optimization (PSO) algorithm that uses reinforcement learning to set the coefficients of an ensemble of PSO update strategies.

Original paper

“RLEPSO: Reinforcement learning based Ensemble particle swarm optimizer.” Proceedings of the 2021 4th International Conference on Algorithms, Computing and Artificial Intelligence. (2021).

Initialization

Introduction

Initializes the RLEPSO optimizer with the provided configuration, setting up key hyperparameters and internal state variables.

Args:

  • config (object): Configuration object containing optimizer settings such as population size, weight decay, logging interval, and maximum function evaluations.

Built-in Attribute:

  • self.__config (object): Stores the configuration object.

  • self.__w_decay (bool): Indicates whether inertia weight decay is enabled.

  • self.__w (float): Inertia weight; its initial value depends on whether weight decay is enabled.

  • self.__NP (int): Number of particles in the population. Default is 100.

  • self.__pci (np.ndarray): Array of learning probabilities for each particle.

  • self.__n_group (int): Number of groups for grouping particles.

  • self.__no_improve (int): Counter for iterations with no improvement.

  • self.__per_no_improve (np.ndarray): Array tracking no-improvement counts per particle.

  • self.fes (Any): Count of function evaluations used so far (initialized as None).

  • self.cost (Any): Cost state (initialized as None).

  • self.log_index (Any): Logging index (initialized as None).

  • self.log_interval (int): Interval for logging progress.

  • self.__max_fes (int): Maximum number of function evaluations.

  • self.__is_done (bool): Flag indicating if optimization is complete.

Returns:

  • None

__str__()[source]

Returns a string representation of the RLEPSO_Optimizer instance.

Returns:

  • str: The name of the optimizer ("RLEPSO_Optimizer").

init_population(problem)[source]

Introduction

Initializes the particle population for the RLEPSO optimizer, setting up positions, velocities, and tracking variables for the optimization process.

Args:

  • problem (object): An object representing the optimization problem. It must expose the attributes lb (lower bounds) and ub (upper bounds) and be compatible with the cost evaluation method (__get_costs).

Returns:

  • dict: The initial state of the optimizer, including particle positions, velocities, personal and global bests, and other relevant metadata.

Side Effects:

  • Updates internal attributes such as particle positions, velocities, costs, and logging variables.

  • Optionally stores meta-data if configured.

Notes:

  • Assumes that self.rng is a random number generator and self.__get_costs is a method for evaluating the cost of particle positions.

  • Resets counters for stagnation and improvement tracking.
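
The initialization described above can be sketched as follows. This is a minimal illustration, not the library's implementation: the function name init_population_sketch, the dictionary keys, and the velocity cap of 10% of the search range are assumptions made for the example.

```python
import numpy as np


def init_population_sketch(lb, ub, n_particles, rng):
    """Sample positions uniformly inside [lb, ub] and small random velocities."""
    lb = np.asarray(lb, dtype=float)
    ub = np.asarray(ub, dtype=float)
    dim = lb.shape[0]
    # Assumed velocity cap (10% of the search range); not taken from the source.
    max_velocity = 0.1 * (ub - lb)
    position = rng.uniform(lb, ub, size=(n_particles, dim))
    velocity = rng.uniform(-max_velocity, max_velocity, size=(n_particles, dim))
    return {
        "current_position": position,
        "velocity": velocity,
        # Personal bests start at the initial positions.
        "pbest_position": position.copy(),
    }


rng = np.random.default_rng(0)
particles = init_population_sketch([-5.0, -5.0], [5.0, 5.0], n_particles=4, rng=rng)
```

Passing the generator in explicitly keeps the sketch reproducible, which mirrors the note that the optimizer relies on self.rng.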

__get_costs(problem, position)[source]

Introduction

Computes the cost(s) of a given position for the specified optimization problem, accounting for the number of function evaluations and the problem’s optimum if available.

Args:

  • problem: An object representing the optimization problem, expected to provide an eval(position) method and an optimum attribute.

  • position: The candidate solution(s) whose cost is to be evaluated.

Built-in Attribute:

  • self.fes (int): Increments by the number of particles (self.__NP) to track function evaluations.

Returns:

  • cost: The evaluated cost(s) for the given position(s), adjusted by the problem’s optimum if it exists.

Raises:

  • Any exception raised by problem.eval(position) if evaluation fails.
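
A minimal sketch of this evaluation step is shown below. SphereProblem is a hypothetical stand-in for a problem object, and counting one evaluation per row of position is an assumption of the sketch (the attribute description above states that the counter grows by self.__NP).

```python
import numpy as np


class SphereProblem:
    """Hypothetical problem exposing the eval/optimum interface described above."""

    optimum = 0.5  # known best objective value; may be None when unknown

    def eval(self, position):
        # Shifted sphere function, so the minimum value equals `optimum`.
        return np.sum(np.asarray(position) ** 2, axis=-1) + 0.5


def get_costs_sketch(problem, position, fes):
    """Return (cost, updated_fes) for a batch of candidate positions."""
    fes += position.shape[0]  # assumption: one evaluation per row
    cost = problem.eval(position)
    if problem.optimum is not None:
        cost = cost - problem.optimum  # report the gap to the optimum
    return cost, fes


cost, fes = get_costs_sketch(SphereProblem(), np.array([[1.0, 2.0], [0.0, 0.0]]), fes=0)
```

Subtracting the optimum makes costs comparable across problems: a value of 0 always means the known optimum was reached.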

__get_v_clpso()[source]

Introduction

Computes the velocity update for particles using the CLPSO (Comprehensive Learning Particle Swarm Optimization) strategy.

Args:

  • None

Built-in Attribute:

  • self.rng: Random number generator for reproducibility.

  • self.__NP (int): Number of particles in the swarm.

  • self.__dim (int): Dimensionality of the search space.

  • self.__pci (np.ndarray): Learning probability for each particle.

  • self.__particles (dict): Contains ‘pbest_position’ and ‘current_position’ arrays for all particles.

Returns:

  • np.ndarray: The updated velocity matrix for all particles according to the CLPSO strategy.

Raises:

  • None
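
As a rough sketch of the comprehensive-learning idea: for every particle and dimension, the particle learns from another particle's personal best with probability pci, and from its own personal best otherwise. The function name clpso_component and the target argument (standing in for tournament-selected positions) are assumptions of this example, not the library's signature.

```python
import numpy as np


def clpso_component(current, pbest_position, target, pci, rng):
    """CLPSO-style velocity term; `target` holds per-dimension exemplar candidates."""
    n, dim = current.shape
    # Per particle and dimension, decide whether to learn from another particle.
    learn_from_other = rng.random((n, dim)) < pci[:, None]
    exemplar = np.where(learn_from_other, target, pbest_position)
    return rng.random((n, dim)) * (exemplar - current)


rng = np.random.default_rng(0)
pbest_position = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 3.0]])
current = pbest_position + 0.5
target = pbest_position[::-1]  # stand-in for tournament-selected positions
pci = np.array([0.05, 0.25, 0.5])
v_clpso = clpso_component(current, pbest_position, target, pci, rng)
```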

__tournament_selection()[source]

Introduction

Performs tournament selection among particle personal bests to select target positions for each particle and dimension in the swarm.

Args:

  • None

Built-in Attribute:

  • self.rng: Random number generator for reproducibility.

  • self.__NP: Number of particles in the swarm.

  • self.__dim: Dimensionality of the problem.

  • self.__particles: Dictionary containing particle information, including ‘pbest_position’ and ‘pbest’.

Returns:

  • np.ndarray: Selected target positions for each particle and dimension, shape (self.__NP, self.__dim).

Raises:

  • None
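
One way to realize such a selection is a binary tournament per particle and dimension: draw two random particles, keep the one with the lower personal-best cost, and copy that particle's personal-best coordinate. The tournament size of two and the function name are assumptions of this sketch.

```python
import numpy as np


def tournament_selection_sketch(pbest_position, pbest_cost, rng):
    """Pick, per particle and dimension, the better of two random personal bests."""
    n, dim = pbest_position.shape
    first = rng.integers(0, n, size=(n, dim))
    second = rng.integers(0, n, size=(n, dim))
    # Lower cost wins (minimization).
    winner = np.where(pbest_cost[first] < pbest_cost[second], first, second)
    # target[i, d] = pbest_position[winner[i, d], d]
    return pbest_position[winner, np.arange(dim)[None, :]]


rng = np.random.default_rng(0)
pbest_position = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 3.0]])
pbest_cost = np.array([0.0, 2.0, 13.0])
target = tournament_selection_sketch(pbest_position, pbest_cost, rng)
```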

__get_v_fdr()[source]

Introduction

Computes the velocity update component based on the Fitness-Distance-Ratio (FDR) for each particle in the swarm. In FDR-based PSO, each particle is attracted, per dimension, toward the personal best of the particle that offers the largest fitness improvement per unit of distance.

Args:

  • None

Built-in Attribute:

  • self.__particles (dict): Contains particle information, including ‘pbest_position’ and ‘pbest’.

  • self.__NP (int): Number of particles in the swarm.

  • self.__dim (int): Dimensionality of the search space.

  • self.rng (np.random.Generator): Random number generator for stochastic operations.

Returns:

  • np.ndarray: An array of shape (self.__NP, self.__dim) representing the FDR-based velocity component for each particle.

Raises:

  • None
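
A minimal sketch of an FDR-style term follows. It assumes minimization and, since only 'pbest_position' and 'pbest' are listed above, it measures both fitness gain and distance between personal bests. The function name and the small constant eps are assumptions of the example.

```python
import numpy as np


def fdr_component(pbest_position, pbest_cost, rng, eps=1e-5):
    """FDR-style velocity term built from personal bests only (assumed design)."""
    n, dim = pbest_position.shape
    # gain[i, j]: how much lower particle j's best cost is than particle i's.
    gain = pbest_cost[:, None] - pbest_cost[None, :]
    # dist[i, j, d]: per-dimension distance between the two personal bests.
    dist = np.abs(pbest_position[:, None, :] - pbest_position[None, :, :])
    ratio = gain[:, :, None] / (dist + eps)
    best_j = np.argmax(ratio, axis=1)  # shape (n, dim)
    target = pbest_position[best_j, np.arange(dim)[None, :]]
    return rng.random((n, dim)) * (target - pbest_position)


rng = np.random.default_rng(0)
pbest_position = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 3.0]])
pbest_cost = np.array([0.0, 2.0, 13.0])
v_fdr = fdr_component(pbest_position, pbest_cost, rng)
```

In this sketch the particle with the lowest cost has no better neighbor, so its own personal best wins the ratio and its FDR term is zero.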

__get_coe(actions)[source]

Introduction

Computes and returns coefficient arrays for each group based on the provided actions array. The coefficients include inertia weight, mutation coefficient, and four additional coefficients (c1, c2, c3, c4), which are calculated for each group of particles in the optimizer.

Args:

  • actions (np.ndarray): A 1D numpy array of shape (self.__n_group * 7,) containing action values for each group. Each group is associated with 7 action values.

Returns:

  • dict: A dictionary containing the following keys and their corresponding numpy arrays:

    • ‘w’: Inertia weights, shape (self.__NP, 1)

    • ‘c_mutation’: Mutation coefficients, shape (self.__NP,)

    • ‘c1’: First coefficient, shape (self.__NP, 1)

    • ‘c2’: Second coefficient, shape (self.__NP, 1)

    • ‘c3’: Third coefficient, shape (self.__NP, 1)

    • ‘c4’: Fourth coefficient, shape (self.__NP, 1)

Raises:

  • AssertionError: If the shape of actions does not match (self.__n_group * 7,).
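
The shape handling above can be illustrated with a short sketch. The reshape into (n_group, 7), the assignment of action slots to coefficients, and the rule that one slot acts as a total budget shared among c1 to c4 are all hypothetical choices for this example; the source does not specify the mapping.

```python
import numpy as np


def coefficients_sketch(actions, n_group, n_particles):
    """Expand per-group action values into per-particle coefficient arrays."""
    assert actions.shape == (n_group * 7,)
    per_group = actions.reshape(n_group, 7)
    group_size = n_particles // n_group  # assumes n_particles divisible by n_group
    per_particle = np.repeat(per_group, group_size, axis=0)  # (n_particles, 7)
    # Hypothetical mapping: slot 0 -> w, slot 1 -> c_mutation,
    # slot 2 -> total budget split among c1..c4 in proportion to slots 3..6.
    weights = per_particle[:, 3:7]
    share = weights / (weights.sum(axis=1, keepdims=True) + 1e-5)
    c = per_particle[:, 2:3] * share
    return {
        "w": per_particle[:, 0:1],
        "c_mutation": per_particle[:, 1],
        "c1": c[:, 0:1],
        "c2": c[:, 1:2],
        "c3": c[:, 2:3],
        "c4": c[:, 3:4],
    }


rng = np.random.default_rng(0)
coe = coefficients_sketch(rng.random(14), n_group=2, n_particles=4)
```

Keeping the (n_particles, 1) shape for w and c1 to c4 lets them broadcast directly against (n_particles, dim) velocity arrays.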

__reinit(filter, problem)[source]

Introduction

Reinitializes selected particles in the swarm based on a filter mask, updating their positions, velocities, and personal/global bests as part of the RLEPSO optimization process.

Args:

  • filter (np.ndarray): Boolean mask indicating which particles to reinitialize.

  • problem (object): Optimization problem instance containing lower and upper bounds (lb, ub) and other problem-specific attributes.

Returns:

  • None

Side Effects:

  • Updates the internal state of the optimizer, including particle positions, velocities, personal bests, global best, and function evaluation count (fes).

Notes:

  • If no particles are selected by the filter, the method returns immediately.

  • The method uses random number generation for reinitialization, which depends on the optimizer’s RNG state.
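
A simplified sketch of mask-based reinitialization is given below. It only resamples positions and resets velocities; updating personal and global bests and the evaluation counter is omitted. The function name and the choice to zero the velocities are assumptions of the example.

```python
import numpy as np


def reinit_sketch(position, velocity, mask, lb, ub, rng):
    """Resample the particles selected by the boolean mask; leave the rest untouched."""
    if not np.any(mask):
        return position, velocity  # nothing selected: return immediately
    position = position.copy()
    velocity = velocity.copy()
    n_selected = int(mask.sum())
    position[mask] = rng.uniform(lb, ub, size=(n_selected, position.shape[1]))
    velocity[mask] = 0.0  # assumption: restart selected particles at rest
    return position, velocity


rng = np.random.default_rng(0)
old_position = np.full((3, 2), 9.0)
old_velocity = np.ones((3, 2))
mask = np.array([True, False, True])
new_position, new_velocity = reinit_sketch(old_position, old_velocity, mask, -5.0, 5.0, rng)
```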

__get_state()[source]

Introduction

Returns the current state of the optimizer as a normalized value.

Returns:

  • np.ndarray: A NumPy array containing a single float value representing the ratio of the current function evaluations (self.fes) to the maximum allowed function evaluations (self.__max_fes).

Notes:

  • This method is intended for internal use to track the progress of the optimizer.
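
The state described above is a single progress ratio, which can be sketched in a few lines. The function name is hypothetical.

```python
import numpy as np


def get_state_sketch(fes, max_fes):
    """Fraction of the evaluation budget already used, wrapped in a 1-element array."""
    return np.array([fes / max_fes])


state = get_state_sketch(fes=5000, max_fes=20000)
```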

update(action, problem)[source]

Introduction

Advances the RLEPSO optimizer by one iteration: updates particle velocities, positions, and personal and global bests, and handles reinitialization and logging. It then calculates the reward and determines whether the optimization process should terminate.

Args:

  • action (np.ndarray): Action array from which the per-group coefficients are derived (see __get_coe, which expects shape (self.__n_group * 7,)); values are typically in the range (0, 1).

  • problem (object): Problem instance containing the objective function, lower and upper bounds, and other problem-specific information.

Returns:

  • next_state (np.ndarray): The next state representation after the update.

  • reward (int): Reward signal indicating improvement (1 if global best improved, -1 otherwise).

  • is_end (bool): Flag indicating whether the optimization process has reached its end condition.

  • info (dict): Additional information (currently empty).

Notes:

  • Updates particle velocities and positions using multiple velocity components (CLPSO, FDR, pbest, gbest).

  • Applies velocity and position clamping to respect problem bounds.

  • Updates personal and global bests based on new costs.

  • Handles reinitialization of particles based on patience and mutation coefficients.

  • Logs progress and meta-data if configured.

  • Calculates reward based on improvement of the global best value.

  • Checks for termination based on function evaluation limits or problem-specific optimum.
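
Two of the steps listed above, combining the velocity components with clamping and computing the reward, can be sketched as follows. The pairing of c1 to c4 with the CLPSO, FDR, pbest, and gbest terms is an assumption of this example, as are the function names.

```python
import numpy as np


def combine_velocity(velocity, components, coe, max_velocity):
    """Weighted sum of the previous velocity and four attraction terms, then clamp."""
    new_velocity = (
        coe["w"] * velocity
        + coe["c1"] * components["clpso"]   # assumed pairing of coefficients
        + coe["c2"] * components["fdr"]
        + coe["c3"] * components["pbest"]
        + coe["c4"] * components["gbest"]
    )
    return np.clip(new_velocity, -max_velocity, max_velocity)


def reward_sketch(previous_gbest, new_gbest):
    """+1 when the global best cost improved, -1 otherwise."""
    return 1 if new_gbest < previous_gbest else -1


ones = np.ones((3, 2))
coe = {key: np.ones((3, 1)) for key in ("w", "c1", "c2", "c3", "c4")}
components = {key: ones for key in ("clpso", "fdr", "pbest", "gbest")}
new_velocity = combine_velocity(ones, components, coe, max_velocity=2.0)
```

Here every term contributes 1, so the unclamped sum of 5 is cut back to the cap of 2 in each entry.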