src.baseline.metabbo.rlhpsde

Module Contents

Classes

RLHPSDE

Introduction

The paper “Differential evolution with hybrid parameters and mutation strategies based on reinforcement learning” presents an approach to improving Differential Evolution (DE), a widely used optimization algorithm. The authors propose a hybrid framework that uses reinforcement learning (RL) to adjust DE’s key parameters and mutation strategies dynamically during the search. This adaptive mechanism is intended to balance exploration and exploitation more effectively on complex optimization problems. The paper evaluates the method on benchmark functions and real-world applications and reports better performance than traditional DE variants.

API

class src.baseline.metabbo.rlhpsde.RLHPSDE(config)[source]

Bases: src.rl.qlearning.QLearning_Agent

Original Paper

“Differential evolution with hybrid parameters and mutation strategies based on reinforcement learning.” Swarm and Evolutionary Computation (2022): 101194

Official Implementation

None

Application Scenario

Single-objective optimization problems (SOOP)

Args:

`config`: Configuration object containing all parameters needed for the experiment. See config.py for details.

Attributes:

config (object): Configuration object passed during initialization.
device (str): Device to be used for computation ('cpu' or 'gpu').
__alpha_max (float): Maximum learning rate for the agent.
__alpha_decay (bool): Flag indicating whether learning rate decay is enabled.
__max_learning_step (int): Maximum number of learning steps allowed.
q_table (torch.Tensor): Q-table used for storing state-action values.
learning_time (int): Counter for the number of learning steps completed.
cur_checkpoint (int): Current checkpoint index for saving the agent's state.
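
The learning-rate attributes above suggest a step size that can shrink as learning progresses. The following is a minimal sketch only: the linear decay schedule, the discount factor `gamma`, and the helper names `decayed_alpha` and `q_update` are assumptions made for illustration and are not taken from the RLHPSDE source. Plain Python lists stand in for the `torch.Tensor` Q-table.

```python
def decayed_alpha(alpha_max, learning_time, max_learning_step, alpha_decay=True):
    """Hypothetical linear schedule: alpha shrinks from alpha_max toward 0."""
    if not alpha_decay:
        return alpha_max
    remaining = max(0.0, 1.0 - learning_time / max_learning_step)
    return alpha_max * remaining


def q_update(q_table, state, action, reward, next_state, alpha, gamma=0.9):
    """Standard tabular Q-learning update, applied in place."""
    td_target = reward + gamma * max(q_table[next_state])
    q_table[state][action] += alpha * (td_target - q_table[state][action])
    return q_table


# Two states, three actions, all values initialised to zero.
q_table = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
alpha = decayed_alpha(alpha_max=1.0, learning_time=50, max_learning_step=100)
q_update(q_table, state=0, action=1, reward=2.0, next_state=1, alpha=alpha)
```

Under this assumed schedule the step size reaches zero once `learning_time` equals `max_learning_step`, so later experience no longer changes the table.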

Methods:

__str__():
    Returns the string representation of the class.
__get_action(state):
    Determines the action to take based on the current state using the Q-table and softmax probabilities.
    Args:
        state (torch.Tensor): Current state of the environment.
    Returns:
        numpy.ndarray: Selected action(s) for the given state.
train_episode(envs, seeds, para_mode='dummy', asynchronous=None, num_cpus=1, num_gpus=0, tb_logger=None, required_info={}):
    Trains the agent for one episode in the given environment(s).
    Args:
        envs (list): List of environments for training.
        seeds (int, list, or np.ndarray): Seed(s) for environment randomization.
        para_mode (str): Parallelization mode for environments ('dummy', 'subproc', 'ray', 'ray-subproc').
        asynchronous (str or None): Asynchronous mode for environment execution.
        num_cpus (int or None): Number of CPUs to use for parallelization.
        num_gpus (int): Number of GPUs to use for computation.
        tb_logger (object): TensorBoard logger for logging training metrics.
        required_info (dict): Additional information to retrieve from the environment.
    Returns:
        tuple: A tuple containing:
            - bool: Whether the training has ended.
            - dict: Information about the training episode, including returns, losses, and environment attributes.
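
The action selection described for `__get_action` can be sketched as sampling from a softmax over one row of the Q-table. This is an illustrative sketch, not the class's own code: the function names `softmax` and `select_action`, the list-based Q-table, and the use of Python's `random` module are assumptions. The documented method takes a `torch.Tensor` state and returns a `numpy.ndarray`.

```python
import math
import random


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    shift = max(values)
    exps = [math.exp(v - shift) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def select_action(q_table, state, rng=random):
    """Sample an action index with probability proportional to exp(Q[state][a])."""
    probs = softmax(q_table[state])
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


# Two states, three actions; state 1 prefers the last action.
q_table = [[0.0, 0.0, 0.0], [1.0, 2.0, 3.0]]
rng = random.Random(0)
action = select_action(q_table, state=1, rng=rng)
```

Sampling from the softmax, rather than always taking the highest-valued action, keeps lower-valued actions reachable while still favouring the ones with larger Q-values.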

Returns:

__str__(): str: The string representation of the class.
__get_action(state): numpy.ndarray: Selected action(s) for the given state.
train_episode(): tuple: A tuple containing training status and episode information.

Raises:

None

Initialization

Initializes the Q-Learning agent with the given configuration and stores the initial agent in the checkpoint directory.

Args:

  • config: Configuration object containing all necessary parameters for the experiment.

__str__()[source]
__get_action(state)[source]
train_episode(envs, seeds: Optional[Union[int, List[int], np.ndarray]], para_mode: Literal['dummy', 'subproc', 'ray', 'ray-subproc'] = 'dummy', asynchronous: Literal[None, 'idle', 'restart', 'continue'] = None, num_cpus: Optional[Union[int, None]] = 1, num_gpus: int = 0, tb_logger=None, required_info={})[source]