src.baseline.metabbo.psorlns

Module Contents

Classes

PSORLNS

Introduction

PSORLNS: A reinforcement learning-based neighborhood search operator for multi-modal optimization and its applications. PSORLNS is a reinforcement learning-based neighborhood search operator for multi-modal optimization problems. It uses reinforcement learning to adaptively adjust the search range, balancing exploration and exploitation to improve search performance on these problems.

API

class src.baseline.metabbo.psorlns.PSORLNS(config)[source]

Bases: src.rl.dqn.DQN_Agent

Introduction

PSORLNS: A reinforcement learning-based neighborhood search operator for multi-modal optimization and its applications. PSORLNS is a reinforcement learning-based neighborhood search operator for multi-modal optimization problems. It uses reinforcement learning to adaptively adjust the search range, balancing exploration and exploitation to improve search performance on these problems.

Original Paper

“A reinforcement learning-based neighborhood search operator for multi-modal optimization and its applications.”

Official Implementation

None

Application Scenario

Multi-modal optimization problems (MMOP)

Args:

`config`: Configuration object containing all parameters needed for the experiment. See config.py for details.

Attributes:

config (object): Stores the configuration object.
ps (int): Number of parallel environments.
replay_buffer (object): Replay buffer for storing experience tuples.
learning_time (int): Counter for the number of learning steps.
cur_checkpoint (int): Counter for the current checkpoint.
device (str): Device to be used for computation ('cpu' or 'cuda').
model (object): Neural network model used for Q-value estimation.
optimizer (object): Optimizer for training the model.
criterion (object): Loss function used for training.
gamma (float): Discount factor for future rewards.
batch_size (int): Batch size for training.
warm_up_size (int): Minimum number of samples in the replay buffer before training starts.
max_grad_norm (float): Maximum gradient norm for gradient clipping.
n_act (int): Number of possible actions.
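The training attributes listed above (gamma, batch_size, warm_up_size, max_grad_norm) usually interact as in a standard DQN update. The following is a minimal pure-Python sketch of that interaction, not the PSORLNS implementation: the numeric values, the tuple layout of the buffer, and the helper names `td_target`, `clip_by_norm`, and `sample_targets` are all assumptions made for illustration.

```python
import random
from collections import deque

# Hypothetical values standing in for the documented attributes.
gamma = 0.99          # discount factor for future rewards
batch_size = 4        # transitions sampled per learning step
warm_up_size = 8      # minimum buffer size before learning starts
max_grad_norm = 1.0   # threshold for gradient-norm clipping

# Assumed tuple layout: (state, action, reward, next_q_values, done)
replay_buffer = deque(maxlen=1000)


def td_target(reward, next_q_values, done):
    """One-step DQN target: r + gamma * max_a' Q(s', a'), no bootstrap at episode end."""
    bootstrap = 0.0 if done else gamma * max(next_q_values)
    return reward + bootstrap


def clip_by_norm(grads, max_norm):
    """Rescale a gradient vector so its L2 norm does not exceed max_norm."""
    norm = sum(g * g for g in grads) ** 0.5
    if norm <= max_norm or norm == 0.0:
        return list(grads)
    return [g * max_norm / norm for g in grads]


def sample_targets(rng):
    """Return TD targets for a sampled batch, or None while the buffer is warming up."""
    if len(replay_buffer) < warm_up_size:
        return None
    batch = rng.sample(list(replay_buffer), batch_size)
    return [td_target(r, nq, d) for (_, _, r, nq, d) in batch]
```

In a tensor-based agent the same steps would typically be vectorized and the clipping applied by the deep learning framework; the scalar version here only makes the role of each attribute explicit.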

Methods:

__str__():
    Returns the string representation of the class.
train_episode(envs, seeds, para_mode='dummy', compute_resource={}, tb_logger=None, required_info={}):
    Trains the agent for one episode.
    Args:
        envs (list): List of environments.
        seeds (int, list, or np.ndarray): Seeds for environment initialization.
        para_mode (str): Parallelization mode ('dummy', 'subproc', 'ray', 'ray-subproc').
        compute_resource (dict): Resources for computation (e.g., num_cpus, num_gpus).
        tb_logger (object): TensorBoard logger.
        required_info (dict): Additional information required from the environment.
    Returns:
        is_train_ended (bool): Whether the training has ended.
        return_info (dict): Information about the training episode.
rollout_episode(env, seed=None, required_info={}):
    Executes a single rollout episode.
    Args:
        env (object): Environment object.
        seed (int, optional): Seed for environment initialization.
        required_info (dict): Additional information required from the environment.
    Returns:
        results (dict): Results of the rollout, including rewards and metadata.
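Because train_episode returns an `is_train_ended` flag together with an info dictionary, a caller would typically invoke it in a loop until the flag becomes true. The sketch below illustrates that pattern with a stand-in class, `StubAgent`, which mimics only the documented signature and return shape. It is not the real PSORLNS agent, and the stopping budget, the `learn_steps` key, and the seed-derivation scheme are assumptions.

```python
from typing import Dict, List, Tuple


class StubAgent:
    """Stand-in that mimics only the documented train_episode signature (not the real PSORLNS)."""

    def __init__(self, max_learning_steps: int) -> None:
        self.learning_time = 0
        self.max_learning_steps = max_learning_steps  # hypothetical stopping budget

    def train_episode(self, envs, seeds, para_mode="dummy", compute_resource=None,
                      tb_logger=None, required_info=None) -> Tuple[bool, Dict]:
        # Pretend each environment contributes one learning step per episode.
        self.learning_time += len(envs)
        is_train_ended = self.learning_time >= self.max_learning_steps
        return_info = {"learn_steps": self.learning_time}  # hypothetical key
        return is_train_ended, return_info


def run_training(agent, envs: List, base_seed: int) -> int:
    """Call train_episode until it reports that training has ended; return the episode count."""
    episodes = 0
    done = False
    while not done:
        # One seed per environment; deriving them from base_seed is an assumption.
        seeds = [base_seed + episodes * len(envs) + i for i in range(len(envs))]
        done, _info = agent.train_episode(envs, seeds, para_mode="dummy")
        episodes += 1
    return episodes
```

With a real agent, `envs` would be a list of environment objects, and `para_mode` would be chosen from the documented options according to the available compute resources.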

Returns:

None for the class itself. Individual methods return specific outputs as described above.

Raises:

AssertionError: If the state shape does not match the expected dimensions during training.
Exception: If there are issues with tensor reshaping or environment interactions.

Initialization

Initializes the DQN agent with the given configuration, networks, and learning rates, then stores the initial agent in the checkpoint directory.

Args:

  • config: Configuration object containing all necessary parameters for the experiment.

  • network (dict): A dictionary of neural networks used by the agent.

  • learning_rates (float): Learning rate for the optimizer.

__str__()[source]
train_episode(envs, seeds: Optional[Union[int, List[int], np.ndarray]], para_mode: Literal['dummy', 'subproc', 'ray', 'ray-subproc'] = 'dummy', compute_resource={}, tb_logger=None, required_info={})[source]
rollout_episode(env, seed=None, required_info={})[source]