src.baseline.metabbo.rlepso

Module Contents

Classes

Actor

Critic

RLEPSO

Introduction

The RLEPSO paper proposes a reinforcement-learning-driven ensemble particle swarm optimization algorithm (RLEPSO). The algorithm uses reinforcement learning to adaptively select among different PSO variants, with the aim of improving both exploration and convergence. Specifically, RLEPSO dynamically adjusts the probability with which each PSO variant is used, to better balance exploration and exploitation. It also combines multiple PSO variants in an ensemble so that the strengths of each variant can be used.
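The idea of choosing among PSO variants by usage probability can be illustrated with a minimal sketch. This is not the repository's implementation: the function names, the two update strategies, and the constants `w` and `c` are assumptions chosen for illustration, and the probabilities are fixed here, whereas in an RL-driven setup a policy would supply them.

```python
import random


def sphere(x):
    """Toy objective: sum of squares, with minimum 0 at the origin."""
    return sum(v * v for v in x)


def ensemble_pso(objective, dim=5, swarm=20, iters=100, probs=(0.5, 0.5), seed=0):
    """Minimize `objective` with a two-strategy PSO ensemble (illustrative only).

    `probs` holds the usage probability of each velocity-update strategy.
    Here it is a fixed placeholder; a learned policy could provide it instead.
    """
    rng = random.Random(seed)
    pos = [[rng.uniform(-5, 5) for _ in range(dim)] for _ in range(swarm)]
    vel = [[0.0] * dim for _ in range(swarm)]
    pbest = [p[:] for p in pos]
    pbest_cost = [objective(p) for p in pos]
    g = min(range(swarm), key=pbest_cost.__getitem__)
    gbest, gbest_cost = pbest[g][:], pbest_cost[g]
    w, c = 0.7, 1.5  # assumed inertia weight and acceleration coefficient
    for _ in range(iters):
        for i in range(swarm):
            # Pick a strategy index according to the usage probabilities.
            strategy = rng.choices(range(len(probs)), weights=probs)[0]
            if strategy == 0:
                guide = gbest  # strategy 0: follow the global best
            else:
                guide = pbest[rng.randrange(swarm)]  # strategy 1: follow a random personal best
            for d in range(dim):
                vel[i][d] = (w * vel[i][d]
                             + c * rng.random() * (pbest[i][d] - pos[i][d])
                             + c * rng.random() * (guide[d] - pos[i][d]))
                # Keep the particle inside the search box [-5, 5].
                pos[i][d] = max(-5.0, min(5.0, pos[i][d] + vel[i][d]))
            cost = objective(pos[i])
            if cost < pbest_cost[i]:
                pbest[i], pbest_cost[i] = pos[i][:], cost
                if cost < gbest_cost:
                    gbest, gbest_cost = pos[i][:], cost
    return gbest, gbest_cost
```

Calling `ensemble_pso(sphere, probs=(0.8, 0.2))` shifts the swarm toward the global-best strategy; changing `probs` is the single knob that trades exploitation against exploration in this sketch.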

API

class src.baseline.metabbo.rlepso.Actor(config)[source]

Bases: torch.nn.Module

Initialization

forward(x_in, fixed_action=None, require_entropy=False)[source]
class src.baseline.metabbo.rlepso.Critic(config)[source]

Bases: torch.nn.Module

Initialization

forward(h_features)[source]
class src.baseline.metabbo.rlepso.RLEPSO(config)[source]

Bases: src.rl.ppo.PPO_Agent

Original Paper

“RLEPSO: Reinforcement learning based Ensemble particle swarm optimizer.” Proceedings of the 2021 4th International Conference on Algorithms, Computing and Artificial Intelligence. (2021)

Official Implementation

None

Application Scenario

Single-objective optimization problems (SOOP)

Args:

`config`: Configuration object containing all necessary parameters for the experiment. For details, see config.py.

Attributes:

config (object): Configuration object containing hyperparameters and settings.
actor (Actor): The actor network responsible for policy generation.
critic (Critic): The critic network responsible for value estimation.
optimizer (torch.optim.Optimizer): Optimizer used for training the networks.
learning_time (int): Counter for the number of learning steps performed.
device (str): Device used for computation ('cpu' or 'cuda').

Methods:

__str__():
    Returns the string representation of the class.
train_episode(envs, seeds, para_mode='dummy', compute_resource={}, tb_logger=None, required_info={}):
    Trains the agent for one episode using the PPO algorithm.
    Args:
        envs (list): List of environments for training.
        seeds (Optional[Union[int, List[int], np.ndarray]]): Seeds for environment initialization.
        para_mode (Literal['dummy', 'subproc', 'ray', 'ray-subproc']): Parallelization mode for environments.
        compute_resource (dict): Resources for computation (e.g., CPUs, GPUs).
        tb_logger (object): TensorBoard logger for logging training metrics.
        required_info (dict): Additional information required from the environment.
    Returns:
        tuple: A tuple containing a boolean indicating if training has ended and a dictionary with training information.
    Raises:
        None.
rollout_episode(env, seed=None, required_info={}):
    Executes a single rollout episode in a given environment.
    Args:
        env (object): The environment for the rollout.
        seed (Optional[int]): Seed for environment initialization.
        required_info (dict): Additional information required from the environment.
    Returns:
        dict: A dictionary containing rollout results, including costs, function evaluations, and returns.
    Raises:
        None.
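
The shape of a rollout result can be illustrated with a toy example. Everything below is an assumption made for illustration: `ToyEnv`, `rollout_episode_sketch`, the policy callable, and the dictionary keys `cost`, `fes`, and `return` are hypothetical and do not reproduce the repository's environment or agent.

```python
import random


class ToyEnv:
    """Hypothetical stand-in for an optimization environment."""

    def __init__(self, horizon=10, evals_per_step=20):
        self.horizon = horizon
        self.evals_per_step = evals_per_step
        self.rng = random.Random()

    def seed(self, seed):
        self.rng = random.Random(seed)

    def reset(self):
        self.t = 0
        self.fes = 0  # function evaluations consumed so far
        self.best_cost = 100.0
        self.cost_history = [self.best_cost]
        return [self.best_cost]

    def step(self, action):
        # `action` in [0, 1] scales how strongly the best cost can shrink.
        self.t += 1
        self.fes += self.evals_per_step
        new_cost = self.best_cost * (1.0 - 0.5 * action * self.rng.random())
        # Reward is the cost improvement relative to the initial cost.
        reward = (self.best_cost - new_cost) / self.cost_history[0]
        self.best_cost = new_cost
        self.cost_history.append(new_cost)
        return [self.best_cost], reward, self.t >= self.horizon


def rollout_episode_sketch(env, policy, seed=None):
    """Run one episode without learning and collect summary statistics."""
    if seed is not None:
        env.seed(seed)
    state = env.reset()
    total_reward, done = 0.0, False
    while not done:
        action = policy(state)
        state, reward, done = env.step(action)
        total_reward += reward
    # Key names are assumptions mirroring "costs, function evaluations, and returns".
    return {"cost": list(env.cost_history), "fes": env.fes, "return": total_reward}
```

For instance, `rollout_episode_sketch(ToyEnv(), lambda state: 1.0, seed=3)` yields a dictionary whose `cost` list has one entry per step plus the initial cost.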

Returns:

None.

Raises:

None.

Initialization

Initializes the PPO agent with the given configuration, networks, and learning rates, and stores the initial agent in the checkpoint directory.

Args:

  • config: Configuration object containing all necessary parameters for the experiment.

  • networks (dict): A dictionary of neural networks used by the agent.

  • learning_rates (float): Learning rate for the optimizer.

__str__()[source]
train_episode(envs, seeds: Optional[Union[int, List[int], np.ndarray]], para_mode: Literal['dummy', 'subproc', 'ray', 'ray-subproc'] = 'dummy', compute_resource={}, tb_logger=None, required_info={})[source]
rollout_episode(env, seed=None, required_info={})[source]
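
Because `seeds` in `train_episode` may be `None`, a single integer, or a sequence, a caller may want to normalize it to one seed per environment before use. The helper below is a hypothetical sketch and not part of this module. The rule for a single integer (environment `i` receives `seed + i`) is an assumption borrowed from a common vectorized-environment convention, not confirmed behavior of this agent.

```python
from typing import List, Optional, Sequence, Union


def expand_seeds(seeds: Optional[Union[int, Sequence[int]]],
                 num_envs: int) -> List[Optional[int]]:
    """Return one seed per environment (hypothetical helper).

    None     -> every environment is left unseeded.
    int      -> environment i receives seeds + i (assumed convention).
    sequence -> used as given; its length must equal num_envs.
             Any iterable of integers works, including a NumPy array.
    """
    if seeds is None:
        return [None] * num_envs
    if isinstance(seeds, int):
        return [seeds + i for i in range(num_envs)]
    seed_list = [int(s) for s in seeds]
    if len(seed_list) != num_envs:
        raise ValueError(
            f"expected {num_envs} seeds, got {len(seed_list)}"
        )
    return seed_list
```

A call such as `expand_seeds(7, 3)` gives each of three environments its own seed, while a mismatched sequence length raises `ValueError` early instead of failing inside a worker process.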