src.baseline.metabbo.l2t

Module Contents

Classes

Actor

Critic

Memory

L2T

Introduction

L2T: Learning to Transfer for Evolutionary Multitasking

Existing implicit evolutionary multitasking (EMT) methods adapt poorly because they rely on a limited set of evolutionary operators and cannot fully exploit the evolutionary state for effective knowledge transfer. To overcome these limitations, the paper proposes a “Learning to Transfer” (L2T) framework that improves the adaptability of EMT by automatically discovering efficient knowledge transfer strategies. The framework models knowledge transfer as a learning agent making a sequence of decisions during the EMT process. It comprises the action design, the state representation, the reward function, and an interactive environment built on multi-task optimization problems. Using an actor-critic network structure and the proximal policy optimization (PPO) algorithm, the agent learns efficient knowledge transfer strategies and can be integrated with various evolutionary algorithms to improve their ability to solve new multi-task optimization problems. The paper's experiments report that L2T significantly improves the adaptability and performance of implicit EMT across a variety of multi-task optimization problems.

API

class src.baseline.metabbo.l2t.Actor(n_state, n_action, hidden_dim=64)

Bases: torch.nn.Module

Initialization

forward(state, fixed_action=None)

class src.baseline.metabbo.l2t.Critic(n_state, hidden_dim=64)

Bases: torch.nn.Module

Initialization

forward(x)

class src.baseline.metabbo.l2t.Memory

Initialization

clear_memory()

class src.baseline.metabbo.l2t.L2T(config)

Bases: src.rl.ppo.PPO_Agent

Original Paper

“Learning to Transfer for Evolutionary Multitasking.”

Official Implementation

None

Application Scenario

Multi-task optimization problems (MTOP)

Args:

`config`: Configuration object containing all necessary parameters for the experiment. For details, see config.py.

Attributes:

config (object): Configuration object containing hyperparameters and settings.
task_cnt (int): Number of tasks based on the training or testing problem.
gamma (float): Discount factor for rewards.
n_step (int): Number of steps for n-step returns.
K_epochs (int): Number of epochs for PPO updates.
eps_clip (float): Clipping parameter for PPO.
max_grad_norm (float): Maximum gradient norm for clipping.
device (str): Device to run the computations ('cpu' or 'cuda').
actor (Actor): Actor network for policy generation.
critic (Critic): Critic network for value estimation.
optimizer (torch.optim.Optimizer): Optimizer for training the networks.
learning_time (int): Counter for the number of learning steps.
cur_checkpoint (int): Current checkpoint index for saving the model.
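
As a rough illustration of how `gamma`, `n_step`, and `eps_clip` are typically used in PPO, the following sketch computes bootstrapped n-step returns and the clipped surrogate objective for a single sample. It is a minimal, framework-free example of the standard PPO formulas, not the code in this module: the function names `n_step_returns` and `clipped_surrogate` are hypothetical, and the actual L2T update may differ in detail.

```python
import math


def n_step_returns(rewards, bootstrap_value, gamma):
    """Discounted returns for one n-step segment (hypothetical helper).

    rewards: rewards r_t, ..., r_{t+n-1} collected over n steps.
    bootstrap_value: critic estimate V(s_{t+n}) for the state after the segment
        (use 0.0 if the episode ended).
    gamma: discount factor.
    """
    returns = []
    running = bootstrap_value
    # Walk backwards so each return reuses the one after it.
    for reward in reversed(rewards):
        running = reward + gamma * running
        returns.append(running)
    returns.reverse()
    return returns


def clipped_surrogate(new_logp, old_logp, advantage, eps_clip):
    """Standard PPO clipped objective for one sample (to be maximized)."""
    ratio = math.exp(new_logp - old_logp)
    clipped_ratio = max(1.0 - eps_clip, min(ratio, 1.0 + eps_clip))
    # Taking the minimum makes the objective a pessimistic bound.
    return min(ratio * advantage, clipped_ratio * advantage)


if __name__ == "__main__":
    print(n_step_returns([1.0, 1.0, 1.0], bootstrap_value=0.0, gamma=0.5))
    print(clipped_surrogate(math.log(2.0), 0.0, advantage=1.0, eps_clip=0.2))
```

In a PPO update, the same batch of transitions is usually reused for `K_epochs` passes, and gradients are clipped to `max_grad_norm` before each optimizer step.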

Methods:

__str__():
    Returns the string representation of the class.
train_episode(envs, seeds, para_mode='dummy', compute_resource={}, tb_logger=None, required_info={}):
    Trains the agent for one episode using PPO.
    # Args:
        envs (list): List of environments for training.
        seeds (Optional[Union[int, List[int], np.ndarray]]): Seeds for environment initialization.
        para_mode (str): Parallelization mode for environments.
        compute_resource (dict): Resources for computation (e.g., CPUs, GPUs).
        tb_logger (object): TensorBoard logger for logging training metrics.
        required_info (dict): Additional information required from the environment.
    # Returns:
        tuple: A boolean indicating if training ended and a dictionary with training information.
    # Raises:
        None.
rollout_episode(env, seed=None, required_info={}):
    Executes a single rollout episode in the environment.
    # Args:
        env (object): Environment for the rollout.
        seed (Optional[int]): Seed for environment initialization.
        required_info (dict): Additional information required from the environment.
    # Returns:
        dict: Results of the rollout including return, cost, and other metadata.
    # Raises:
        None.
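
The documented return shapes suggest a simple outer loop: call `train_episode` repeatedly until its boolean flag signals that training has ended, then evaluate with `rollout_episode`. The sketch below shows that pattern with a stand-in class. `StubAgent`, `train_until_done`, the stopping rule, and the dictionary keys (`learn_steps`, `return`, `cost`) are all assumptions for illustration; only the method names, arguments, and the `(bool, dict)` / `dict` return shapes come from the descriptions above.

```python
class StubAgent:
    """Hypothetical stand-in that mimics only the documented return shapes."""

    def __init__(self, max_learning_steps):
        self.learning_time = 0
        self.max_learning_steps = max_learning_steps

    def train_episode(self, envs, seeds, para_mode="dummy",
                      compute_resource=None, tb_logger=None, required_info=None):
        # Assumed stopping rule: finish once a learning-step budget is used up.
        self.learning_time += len(envs)
        is_done = self.learning_time >= self.max_learning_steps
        info = {"learn_steps": self.learning_time}  # key name is an assumption
        return is_done, info

    def rollout_episode(self, env, seed=None, required_info=None):
        # Assumed keys; the real result also carries other metadata.
        return {"return": 0.0, "cost": []}


def train_until_done(agent, envs, seeds, max_episodes=1000):
    """Call train_episode until it reports that training has ended."""
    history = []
    for _ in range(max_episodes):
        is_done, info = agent.train_episode(envs, seeds, para_mode="dummy")
        history.append(info)
        if is_done:
            break
    return history


if __name__ == "__main__":
    agent = StubAgent(max_learning_steps=6)
    history = train_until_done(agent, envs=["env_a", "env_b"], seeds=[0, 1])
    print(len(history), history[-1])
    print(agent.rollout_episode("env_a", seed=0))
```

The `max_episodes` cap is a defensive choice for the sketch, so that the loop terminates even if the agent never reports completion.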

Returns:

None.

Raises:

None.

Initialization

Initializes the PPO agent with the given configuration, networks, and learning rates. Stores the initial agent in the checkpoint directory.

Args:

  • config: Configuration object containing all necessary parameters for the experiment.

  • networks (dict): A dictionary of neural networks used by the agent.

  • learning_rates (float): Learning rate for the optimizer.

__str__()

train_episode(envs, seeds: Optional[Union[int, List[int], src.rl.utils.np.ndarray]], para_mode: Literal['dummy', 'subproc', 'ray', 'ray-subproc'] = 'dummy', compute_resource={}, tb_logger=None, required_info={})

rollout_episode(env, seed=None, required_info={})