src.environment.optimizer.rlpso_optimizer

Module Contents

Classes

RLPSO_Optimizer

Introduction

RLPSO uses reinforcement learning to improve the convergence of PSO: the uniformly distributed random number in the update rule is replaced with a random number drawn from a selected normal distribution.

Functions

API

class src.environment.optimizer.rlpso_optimizer.RLPSO_Optimizer(config)[source]

Bases: src.environment.optimizer.learnable_optimizer.Learnable_Optimizer

Introduction

RLPSO uses reinforcement learning to improve the convergence of PSO: the uniformly distributed random number in the update rule is replaced with a random number drawn from a selected normal distribution.
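The difference between the two sampling schemes can be illustrated with a minimal sketch. The function names and the `mean`/`std` parameters are hypothetical, chosen for illustration only; how the agent actually selects the normal distribution is not specified here.

```python
import numpy as np

def uniform_factor(rng, dim):
    """Classic PSO: random factor drawn from U(0, 1)."""
    return rng.random(dim)

def normal_factor(rng, dim, mean, std):
    """RLPSO-style: random factor drawn from a normal distribution.

    `mean` and `std` stand in for whatever distribution the RL agent
    selects (hypothetical parameters, not taken from the module).
    """
    return rng.normal(loc=mean, scale=std, size=dim)

rng = np.random.default_rng(0)
x = np.zeros(3)        # current position
gbest = np.ones(3)     # global best position
c = 2.05               # acceleration coefficient

# Social term of the velocity update under each scheme.
social_classic = c * uniform_factor(rng, 3) * (gbest - x)
social_rl = c * normal_factor(rng, 3, mean=0.5, std=0.1) * (gbest - x)
```

Unlike the uniform factor, the normal factor is not confined to [0, 1), so the learned distribution can shrink, enlarge, or even reverse the pull toward the global best.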

Original paper

“Employing reinforcement learning to enhance particle swarm optimization methods.” Engineering Optimization (2022).

Initialization

Introduction

Initializes the RLPSO optimizer with the provided configuration, setting up key hyperparameters and internal state.

Args:

  • config (object): Configuration object containing optimizer parameters such as inertia weight decay, acceleration coefficient, population size, maximum function evaluations, and logging interval.

Built-in Attribute:

  • self.__config (object): Stores the configuration object.

  • self.__w_decay (bool): Indicates whether inertia weight decay is enabled. Default is True.

  • self.__w (float): Inertia weight value, initialized based on w_decay. Default is 0.9 if w_decay is True, otherwise 0.729.

  • self.__c (float): Acceleration coefficient. Default is 2.05.

  • self.__NP (int): Population size. Default is 100.

  • self.__max_fes (int): Maximum number of function evaluations.

  • self.fes (Any): Tracks the number of function evaluations (initialized as None).

  • self.cost (Any): Tracks the cost or fitness value (initialized as None).

  • self.log_index (Any): Tracks the logging index (initialized as None).

  • self.log_interval (int): Interval for logging progress.

Returns:

  • None
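As a rough sketch of the defaults listed above, the inertia weight could be derived from the decay flag as follows. The `config` attribute names used here (`w_decay`, `c`, `NP`, `max_fes`, `log_interval`) and the numeric values for `max_fes` and `log_interval` are assumptions for illustration; the real configuration object may use different names and values.

```python
from types import SimpleNamespace

def initial_inertia(w_decay: bool) -> float:
    """Return the starting inertia weight described in the attribute list:
    0.9 when decay is enabled, otherwise the fixed value 0.729."""
    return 0.9 if w_decay else 0.729

# Hypothetical configuration object; attribute names are illustrative only.
config = SimpleNamespace(
    w_decay=True,      # enable inertia weight decay
    c=2.05,            # acceleration coefficient
    NP=100,            # population size
    max_fes=20000,     # maximum function evaluations (example value)
    log_interval=400,  # logging interval (example value)
)

w = initial_inertia(config.w_decay)
```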

__str__()[source]

Returns a string representation of the RLPSO_Optimizer instance.

Returns:

  • str: The name of the optimizer, “RLPSO_Optimizer”.

init_population(problem)[source]

Introduction

Initializes the particle population for the RLPSO (Reinforcement Learning Particle Swarm Optimization) algorithm, setting up positions, velocities, and tracking variables for optimization.

Args:

  • problem (object): The optimization problem object, which provides lower and upper bounds (lb, ub) for the search space.

Built-in Attribute:

  • self.__dim (int): The dimensionality of the problem, set to problem.dim.

  • self.__particles (dict): A dictionary to hold particle attributes such as current position, cost, personal best position, global best position, and velocity.

  • self.__max_velocity (float): The maximum velocity for particles, calculated based on the problem’s bounds.

  • self.fes (int): The number of function evaluations, initialized to 0.

  • self.__max_cost (float): The maximum cost value among the particles, initialized based on the initial costs.

  • self.__cur_index (int): The current index of the particle being updated, initialized to 0.

  • self.log_index (int): The index for logging progress, initialized to 1.

  • self.cost (list): A list to store the global best cost values at specified intervals.

  • self.meta_X (list): A list to store the positions of particles for full meta-data logging.

  • self.meta_Cost (list): A list to store the costs of particles for full meta-data logging.

  • self.meta_tmp_x (list): A temporary list to store positions during a single iteration for full meta-data logging.

  • self.meta_tmp_cost (list): A temporary list to store costs during a single iteration for full meta-data logging.

Returns:

  • object: The initial state of the optimizer, as returned by self.__get_state(self.__cur_index).

Notes:

  • Assumes that self.rng is a random number generator and that self.__get_costs and self.__get_state are defined elsewhere in the class.

  • The method is intended to be called at the start of the optimization process.
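The initialization described above might look roughly like the following standalone sketch. The dictionary keys, the `evaluate` callable, and the rule `max_velocity = 0.1 * (ub - lb)` are assumptions made for illustration, not details confirmed by the module.

```python
import numpy as np

def init_particles(rng, n_particles, dim, lb, ub, evaluate):
    """Build a particle dictionary with positions, velocities and bests.

    Assumes scalar bounds `lb` and `ub`; `evaluate` maps an (n, dim)
    array to n costs. Key names are hypothetical.
    """
    positions = rng.uniform(lb, ub, size=(n_particles, dim))
    max_velocity = 0.1 * (ub - lb)  # assumed fraction of the search range
    velocities = rng.uniform(-max_velocity, max_velocity,
                             size=(n_particles, dim))
    costs = evaluate(positions)
    best = int(np.argmin(costs))
    particles = {
        "current_position": positions.copy(),
        "c_cost": costs.copy(),
        "pbest_position": positions.copy(),
        "pbest": costs.copy(),
        "gbest_position": positions[best].copy(),
        "gbest_val": float(costs[best]),
        "velocity": velocities,
    }
    # The largest initial cost is tracked separately (cf. self.__max_cost).
    return particles, max_velocity, float(np.max(costs))
```

Every particle's personal best starts at its initial position, and the global best is the best of those initial positions.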

__get_state(index)[source]

Introduction

Retrieves the current state representation for a given particle by concatenating the global best position and the current position of the specified particle.

Args:

  • index (int): The index of the particle whose state is to be retrieved.

Returns:

  • np.ndarray: A 1D array representing the concatenated state of the global best position and the current position of the specified particle.

Raises:

  • IndexError: If index is out of bounds for the current positions array.
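The state construction can be sketched as a plain function. The argument names are illustrative; the actual method reads these arrays from the optimizer's internal particle dictionary.

```python
import numpy as np

def get_state(gbest_position, current_positions, index):
    """Concatenate the global best position with particle `index`'s
    current position into one 1D state vector of length 2 * dim."""
    return np.concatenate((gbest_position, current_positions[index]), axis=-1)

gbest = np.array([0.0, 0.0])
positions = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
state = get_state(gbest, positions, 1)
```

Indexing past the last particle raises `IndexError` from NumPy, which matches the documented failure mode.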

__get_costs(problem, position)[source]

Introduction

Computes the cost(s) of a given position or set of positions for a specified optimization problem, updating the function evaluation count.

Args:

  • problem: An object representing the optimization problem, expected to have an eval(position) method and an optimum attribute.

  • position (np.ndarray): The position(s) in the search space for which the cost is to be evaluated. Can be a 1D or 2D numpy array.

Built-in Attribute:

  • self.fes (int): Tracks the number of function evaluations performed.

Returns:

  • float or np.ndarray: The computed cost(s) for the given position(s). If problem.optimum is set, returns the difference between the evaluated value and the optimum.

Raises:

  • AttributeError: If problem does not have the required eval method or optimum attribute.
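A minimal sketch of this behavior is shown below, under stated assumptions. `Sphere` and `CostCounter` are hypothetical stand-ins for a problem object and for the optimizer; the counting rule (one evaluation per row of a 2D input) is an assumption.

```python
import numpy as np

class Sphere:
    """Toy problem with an `eval` method and an `optimum` attribute."""
    optimum = 0.0

    def eval(self, x):
        x = np.asarray(x, dtype=float)
        return np.sum(x ** 2, axis=-1)

class CostCounter:
    """Stand-in for the optimizer: evaluates costs and counts evaluations."""

    def __init__(self):
        self.fes = 0

    def get_costs(self, problem, position):
        position = np.asarray(position)
        # Assumed rule: a 1D input is one evaluation, a 2D input one per row.
        self.fes += 1 if position.ndim == 1 else position.shape[0]
        cost = problem.eval(position)
        if problem.optimum is not None:
            cost = cost - problem.optimum  # report the gap to the optimum
        return cost

counter = CostCounter()
single = counter.get_costs(Sphere(), np.array([1.0, 2.0]))
batch = counter.get_costs(Sphere(), np.ones((4, 2)))
```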

update(action, problem)[source]

Introduction

Updates the state of the RLPSO (Reinforcement Learning Particle Swarm Optimization) optimizer for a single particle based on the provided action and problem definition. This includes updating the particle's velocity, position, personal best, and the global best, as well as calculating the reward and logging progress.

Args:

  • action (np.ndarray or float): The action to be taken, typically representing a random factor for velocity update.

  • problem (object): The optimization problem instance, which must provide lower and upper bounds (lb, ub), and optionally an optimum value.

Returns:

  • state (np.ndarray): The updated state representation for the next step.

  • reward (float): The reward signal computed from the improvement in cost.

  • is_done (bool): Whether the optimization process has reached its termination condition.

  • info (dict): Additional information (currently empty, reserved for future use).

Notes:

  • The method linearly decreases the inertia coefficient if enabled.

  • Velocity and position are updated according to the PSO update rules, with velocity clipping and position boundary handling.

  • Updates personal and global bests if improvements are found.

  • Logs global best values at specified intervals.

  • Handles full meta-data logging if configured.

  • Ensures the cost log is filled up to the required number of log points upon completion.
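The per-particle step outlined in the notes above can be sketched as follows. This is an illustration under assumptions, not the module's implementation: the dictionary keys are hypothetical, the action is used as the random factor of the social term, bounds are handled by simple clipping, the reward is the raw cost improvement, and inertia decay and logging are left out.

```python
import numpy as np

def update_particle(p, j, action, w, c, max_velocity, lb, ub, evaluate, rng):
    """One single-particle PSO step; returns (reward, new_cost)."""
    rand1 = rng.random()               # uniform factor for the cognitive term
    rand2 = float(np.squeeze(action))  # agent-supplied factor (assumed role)
    x = p["current_position"][j]
    v = p["velocity"][j]
    new_v = (w * v
             + c * rand1 * (p["pbest_position"][j] - x)
             + c * rand2 * (p["gbest_position"] - x))
    new_v = np.clip(new_v, -max_velocity, max_velocity)  # velocity clipping
    new_x = np.clip(x + new_v, lb, ub)                   # boundary handling
    pre_cost = p["c_cost"][j]
    new_cost = float(evaluate(new_x))
    p["velocity"][j] = new_v
    p["current_position"][j] = new_x
    p["c_cost"][j] = new_cost
    if new_cost < p["pbest"][j]:       # personal best improved
        p["pbest"][j] = new_cost
        p["pbest_position"][j] = new_x
    if new_cost < p["gbest_val"]:      # global best improved
        p["gbest_val"] = new_cost
        p["gbest_position"] = new_x.copy()
    return float(pre_cost - new_cost), new_cost  # assumed reward: improvement

# Small usage example on a 2D sphere function.
rng = np.random.default_rng(0)
sphere = lambda z: np.sum(z ** 2)
pos = rng.uniform(-5.0, 5.0, size=(4, 2))
cost = np.array([sphere(r) for r in pos])
best = int(np.argmin(cost))
swarm = {
    "current_position": pos.copy(), "c_cost": cost.copy(),
    "pbest_position": pos.copy(), "pbest": cost.copy(),
    "gbest_position": pos[best].copy(), "gbest_val": float(cost[best]),
    "velocity": np.zeros((4, 2)),
}
old_gbest = swarm["gbest_val"]
reward, new_cost = update_particle(swarm, 0, np.array([0.5]), w=0.9, c=2.05,
                                   max_velocity=1.0, lb=-5.0, ub=5.0,
                                   evaluate=sphere, rng=rng)
```

A positive reward means the particle moved to a lower-cost position; the global best never gets worse because it is only overwritten on strict improvement.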

src.environment.optimizer.rlpso_optimizer.clipping(x: Union[numpy.ndarray, Iterable], lb: Union[numpy.ndarray, Iterable, int, float, None], ub: Union[numpy.ndarray, Iterable, int, float, None]) → numpy.ndarray[source]