src.baseline.metabbo.rlepso
Module Contents
Classes
RLEPSO: A reinforcement-learning-driven ensemble particle swarm optimizer that adaptively selects among different PSO variants.
API
- class src.baseline.metabbo.rlepso.RLEPSO(config)
Bases: src.rl.ppo.PPO_Agent
Introduction
RLEPSO is a reinforcement-learning-driven ensemble particle swarm optimization algorithm. It uses reinforcement learning to adaptively select among different PSO variants, with the aim of improving the algorithm's exploration ability and convergence. Specifically, RLEPSO dynamically adjusts the probability of using each PSO variant to better balance exploration and exploitation. It also combines multiple PSO variants in an ensemble to draw on the strengths of each variant.
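The idea of choosing among PSO variants by probability can be illustrated with a toy example. The sketch below is not the RLEPSO implementation: the two velocity rules, the coefficients, the `sphere` test function, and the fixed `probs` vector (which stands in for probabilities that a learned policy would supply) are all assumptions made for illustration.

```python
import numpy as np

def sphere(x):
    # Simple single-objective test function: sum of squares per row.
    return np.sum(x ** 2, axis=1)

def ensemble_pso(probs, dim=5, n_particles=20, n_iters=50, seed=0):
    """Toy ensemble PSO: each particle samples one of two velocity rules per step.

    `probs` stands in for the selection probabilities that an RL policy
    would produce; here it is a fixed, hand-chosen vector (assumption).
    """
    rng = np.random.default_rng(seed)
    pos = rng.uniform(-5.0, 5.0, size=(n_particles, dim))
    vel = np.zeros_like(pos)
    pbest, pbest_cost = pos.copy(), sphere(pos)
    init_best = pbest_cost.min()

    for _ in range(n_iters):
        gbest = pbest[np.argmin(pbest_cost)]
        r1 = rng.random((n_particles, dim))
        r2 = rng.random((n_particles, dim))
        # Variant 0: canonical PSO (personal-best and global-best attraction).
        v0 = 0.7 * vel + 1.5 * r1 * (pbest - pos) + 1.5 * r2 * (gbest - pos)
        # Variant 1: cognition-only PSO (personal-best attraction only).
        v1 = 0.7 * vel + 2.0 * r1 * (pbest - pos)
        # Each particle picks a variant according to `probs`.
        choice = rng.choice(2, size=n_particles, p=probs)
        vel = np.where(choice[:, None] == 0, v0, v1)
        pos = pos + vel
        # Keep the best position each particle has seen so far.
        cost = sphere(pos)
        improved = cost < pbest_cost
        pbest[improved] = pos[improved]
        pbest_cost[improved] = cost[improved]

    return init_best, pbest_cost.min()

init_best, final_best = ensemble_pso(probs=[0.8, 0.2])
```

Because personal bests are only overwritten on improvement, `final_best` can never exceed `init_best`. Changing `probs` shifts how often each variant is applied, which is the quantity the description above says is adjusted dynamically.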
Original Paper
“RLEPSO: Reinforcement learning based Ensemble particle swarm optimizer.” Proceedings of the 2021 4th International Conference on Algorithms, Computing and Artificial Intelligence. (2021)
Official Implementation
None
Application Scenario
single-objective optimization problems (SOOP)
Args:
`config`: Configuration object containing all necessary parameters for the experiment. For details, see config.py.
Attributes:
config (object): Configuration object containing hyperparameters and settings.
actor (Actor): The actor network responsible for policy generation.
critic (Critic): The critic network responsible for value estimation.
optimizer (torch.optim.Optimizer): Optimizer used for training the networks.
learning_time (int): Counter for the number of learning steps performed.
device (str): Device used for computation ('cpu' or 'cuda').
Methods:
__str__(): Returns the string representation of the class.
train_episode(envs, seeds, para_mode='dummy', compute_resource={}, tb_logger=None, required_info={}): Trains the agent for one episode using the PPO algorithm.
    Args:
        envs (list): List of environments for training.
        seeds (Optional[Union[int, List[int], np.ndarray]]): Seeds for environment initialization.
        para_mode (Literal['dummy', 'subproc', 'ray', 'ray-subproc']): Parallelization mode for environments.
        compute_resource (dict): Resources for computation (e.g., CPUs, GPUs).
        tb_logger (object): TensorBoard logger for logging training metrics.
        required_info (dict): Additional information required from the environment.
    Returns:
        tuple: A tuple containing a boolean indicating if training has ended and a dictionary with training information.
    Raises:
        None.
rollout_episode(env, seed=None, required_info={}): Executes a single rollout episode in a given environment.
    Args:
        env (object): The environment for the rollout.
        seed (Optional[int]): Seed for environment initialization.
        required_info (dict): Additional information required from the environment.
    Returns:
        dict: A dictionary containing rollout results, including costs, function evaluations, and returns.
    Raises:
        None.
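The documented return value of `train_episode`, a boolean "training has ended" flag plus an information dictionary, suggests a simple driver loop. The sketch below illustrates only that calling pattern. `StubAgent`, `train_until_done`, the `max_learning_steps` argument, and the `learn_steps` dictionary key are hypothetical stand-ins, not part of the library; a real run would construct `RLEPSO(config)` and pass real environments instead.

```python
from typing import Dict, List, Optional, Tuple

class StubAgent:
    """Hypothetical stand-in that mimics only the documented call signature."""

    def __init__(self, max_learning_steps: int = 3):
        self.learning_time = 0
        self.max_learning_steps = max_learning_steps

    def train_episode(self, envs: List[object], seeds=None, para_mode: str = "dummy",
                      compute_resource: Optional[dict] = None, tb_logger=None,
                      required_info: Optional[dict] = None) -> Tuple[bool, Dict]:
        # A real agent would roll out the environments and run PPO updates here.
        self.learning_time += 1
        is_train_ended = self.learning_time >= self.max_learning_steps
        # The key name below is an assumption made for this sketch.
        return is_train_ended, {"learn_steps": self.learning_time}

def train_until_done(agent, envs, seed=0, max_episodes=100):
    """Call train_episode repeatedly until it reports that training has ended."""
    infos = []
    for _ in range(max_episodes):
        ended, info = agent.train_episode(envs, seeds=seed, para_mode="dummy")
        infos.append(info)
        if ended:
            break
    return infos

infos = train_until_done(StubAgent(), envs=[object()])
```

The `max_episodes` cap is a defensive choice for the sketch so that the loop terminates even if an agent never reports that training has ended.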
Returns:
None.
Raises:
None.
Initialization
Initializes the PPO agent with the given configuration, networks, and learning rates. Stores the initial agent in the checkpoint directory.
Args:
config: Configuration object containing all necessary parameters for the experiment.
networks (dict): A dictionary of neural networks used by the agent.
learning_rates (float): Learning rate for the optimizer.