src.baseline.metabbo.gleet

Module Contents

Classes

mySequential

Actor

Critic

GLEET

Introduction

GLEET is a Generalizable Learning-based Exploration-Exploitation Tradeoff framework. It uses reinforcement learning to explicitly control the exploration-exploitation tradeoff hyper-parameters of a given evolutionary computation (EC) algorithm when solving a class of problems.
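The sketch below shows one way an externally supplied, per-particle action could shift a particle swarm optimization (PSO) update between exploration and exploitation. It is not GLEET's implementation. The PSO variant, the hypothetical `tradeoff` action, the split of an acceleration budget `c_total` between the personal-best and global-best terms, and all default values are assumptions made for illustration. Treating the personal-best pull as "exploration" and the global-best pull as "exploitation" is a common reading, not something stated on this page.

```python
import random
from typing import List, Optional, Sequence, Tuple

Vector = List[float]


def pso_step(
    positions: List[Vector],
    velocities: List[Vector],
    pbest: List[Vector],
    gbest: Vector,
    tradeoff: Sequence[float],
    w: float = 0.7,
    c_total: float = 4.0,
    rng: Optional[random.Random] = None,
) -> Tuple[List[Vector], List[Vector]]:
    """One PSO update whose per-particle acceleration split comes from outside.

    tradeoff[i] in [0, 1] is a hypothetical action: the share of c_total spent
    on the pull toward particle i's personal best rather than the global best.
    """
    if rng is None:
        rng = random.Random()
    new_x, new_v = [], []
    for x, v, p, t in zip(positions, velocities, pbest, tradeoff):
        t = min(max(t, 0.0), 1.0)  # keep the action inside [0, 1]
        c1, c2 = c_total * t, c_total * (1.0 - t)
        v_next = [
            w * vd + c1 * rng.random() * (pd - xd) + c2 * rng.random() * (gd - xd)
            for xd, vd, pd, gd in zip(x, v, p, gbest)
        ]
        new_v.append(v_next)
        new_x.append([xd + vd for xd, vd in zip(x, v_next)])
    return new_x, new_v
```

A learned policy would emit `tradeoff` for every particle at each generation, for example from features describing the population; here it is simply passed in.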

API

class src.baseline.metabbo.gleet.mySequential

Bases: torch.nn.Sequential

forward(*inputs)
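`mySequential` has no docstring. A common reason to subclass `torch.nn.Sequential` with `forward(*inputs)` is to let stages accept and return several tensors. Assuming that is the purpose here, the dispatch logic looks like the torch-free sketch below, where plain callables stand in for `nn.Module` stages and the class name `MultiInputSequential` is hypothetical.

```python
from typing import Any, Callable


class MultiInputSequential:
    """Hypothetical torch-free analogue of a multi-input nn.Sequential."""

    def __init__(self, *stages: Callable[..., Any]) -> None:
        self.stages = stages

    def __call__(self, *inputs: Any) -> Any:
        return self.forward(*inputs)

    def forward(self, *inputs: Any) -> Any:
        out: Any = inputs  # start from the tuple of positional inputs
        for stage in self.stages:
            # Unpack tuple outputs into the next stage; pass anything else as-is.
            out = stage(*out) if isinstance(out, tuple) else stage(out)
        return out


# Two stages that thread an extra value (a plain number here) through the chain.
seq = MultiInputSequential(lambda x, h: (x + h, h), lambda x, h: x * 2)
print(seq(1, 2))  # → 6
```

The cost of this convention is that a stage returning a tuple as a single value will have it unpacked, so all stages must agree on it.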
class src.baseline.metabbo.gleet.Actor(embedding_dim, hidden_dim, n_heads_actor, n_heads_decoder, n_layers, normalization, v_range, node_dim, hidden_dim1, hidden_dim2, max_sigma=0.7, min_sigma=0.001)

Bases: torch.nn.Module

Initialization

get_parameter_number()
forward(x_in, fixed_action=None, require_entropy=False, to_critic=False, only_critic=False)
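The constructor's `max_sigma`/`min_sigma` arguments and the `fixed_action`/`require_entropy` flags of `forward` suggest, although the page does not say so, a Gaussian policy head with a bounded standard deviation. The sketch below assumes exactly that: per-individual means and raw (unbounded) sigma outputs, a sigmoid squash into `[min_sigma, max_sigma]`, and a log-probability summed over individuals. The function names are hypothetical, and the network that would produce `mus` and `raw_sigmas` (whose `n_heads_*` and `n_layers` arguments point to a multi-head attention stack) is not reproduced.

```python
import math
import random
from typing import List, Optional, Sequence, Tuple


def _sigmoid(z: float) -> float:
    # Numerically stable logistic function (avoids overflow in math.exp).
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def bounded_sigma(raw: float, min_sigma: float = 0.001, max_sigma: float = 0.7) -> float:
    # Map an unbounded network output into [min_sigma, max_sigma].
    return min_sigma + (max_sigma - min_sigma) * _sigmoid(raw)


def gaussian_policy(
    mus: Sequence[float],
    raw_sigmas: Sequence[float],
    fixed_action: Optional[Sequence[float]] = None,
    require_entropy: bool = False,
    min_sigma: float = 0.001,
    max_sigma: float = 0.7,
    rng: Optional[random.Random] = None,
) -> Tuple[List[float], float, Optional[float]]:
    sigmas = [bounded_sigma(r, min_sigma, max_sigma) for r in raw_sigmas]
    if fixed_action is None:
        # Sample a fresh action per individual (acting in the environment).
        rng = rng if rng is not None else random.Random()
        actions = [rng.gauss(mu, s) for mu, s in zip(mus, sigmas)]
    else:
        # Re-evaluate stored actions (e.g. during a PPO update).
        actions = list(fixed_action)
    # Joint log-probability of independent per-individual Gaussians.
    log_prob = sum(
        -((a - mu) ** 2) / (2 * s * s) - math.log(s) - 0.5 * math.log(2 * math.pi)
        for a, mu, s in zip(actions, mus, sigmas)
    )
    entropy = (
        sum(0.5 + 0.5 * math.log(2 * math.pi) + math.log(s) for s in sigmas)
        if require_entropy
        else None
    )
    return actions, log_prob, entropy
```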
class src.baseline.metabbo.gleet.Critic(input_dim, hidden_dim1, hidden_dim2)

Bases: torch.nn.Module

Initialization

forward(h_features)
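`Critic(input_dim, hidden_dim1, hidden_dim2)` reads like a value head with two hidden layers. Assuming a plain MLP with ReLU activations that maps one feature vector to a scalar state value, the forward pass amounts to the dependency-free sketch below. The activation, the initialization, and whether `h_features` is pooled over the population beforehand are not documented here, and `MLPCritic` is a hypothetical name. In PyTorch the same shape would be three `nn.Linear` layers.

```python
import random
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def _linear(x: Sequence[float], weight: Matrix, bias: Sequence[float]) -> List[float]:
    # y_j = sum_i x_i * W[j][i] + b_j
    return [sum(xi * wji for xi, wji in zip(x, row)) + bj for row, bj in zip(weight, bias)]


class MLPCritic:
    """Hypothetical value head: input_dim -> hidden_dim1 -> hidden_dim2 -> 1."""

    def __init__(self, input_dim: int, hidden_dim1: int, hidden_dim2: int, seed: int = 0) -> None:
        rng = random.Random(seed)
        dims = [input_dim, hidden_dim1, hidden_dim2, 1]
        # One (weight, bias) pair per layer; weight has shape (d_out, d_in).
        self.layers: List[Tuple[Matrix, List[float]]] = [
            ([[rng.uniform(-0.1, 0.1) for _ in range(d_in)] for _ in range(d_out)], [0.0] * d_out)
            for d_in, d_out in zip(dims[:-1], dims[1:])
        ]

    def forward(self, h_features: Sequence[float]) -> float:
        h = list(h_features)
        for k, (weight, bias) in enumerate(self.layers):
            h = _linear(h, weight, bias)
            if k < len(self.layers) - 1:
                h = [max(0.0, v) for v in h]  # ReLU on hidden layers only
        return h[0]  # scalar value estimate
```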
class src.baseline.metabbo.gleet.GLEET(config)

Bases: src.rl.ppo.PPO_Agent


Original paper

“Auto-configuring Exploration-Exploitation Tradeoff in Evolutionary Computation via Deep Reinforcement Learning.” Proceedings of the Genetic and Evolutionary Computation Conference (2024).

Official Implementation

GLEET

Args:

  • config (object): Configuration object containing hyperparameters and settings for the agent, neural networks, and training process.

Methods:

  • __str__(): Returns the string representation of the agent.

  • train_episode(envs, seeds, para_mode='dummy', compute_resource={}, tb_logger=None, required_info={}): Trains the agent for one episode using parallel environments and PPO updates.

    Args:

    • envs: List of environments to train on.

    • seeds (Optional[Union[int, List[int], np.ndarray]]): Random seeds for environment reproducibility.

    • para_mode (str): Parallelization mode ('dummy', 'subproc', 'ray', 'ray-subproc').

    • compute_resource (dict): Resource allocation for CPUs/GPUs.

    • tb_logger: TensorBoard logger for tracking training metrics.

    • required_info (dict): Additional environment attributes to collect.

    Returns:

    • Tuple[bool, dict]: A flag indicating whether training has ended, and an information dictionary containing returns, loss, and other metrics.

  • rollout_episode(env, seed=None, required_info={}): Executes a single rollout in one environment without updating the agent.

    Args:

    • env: The environment to interact with.

    • seed: Random seed for reproducibility.

    • required_info (dict): Additional environment attributes to collect.

    Returns:

    • dict: Results including cost, function evaluations, returns, and optional metadata.
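`train_episode` is described as applying PPO updates. The exact loss is not documented on this page, but the core of PPO is the clipped surrogate objective of Schulman et al. (2017). A framework-free sketch follows, with `clip_eps=0.1` as an assumed default (the real value would presumably come from `config`). In the agent itself, the log-probabilities would come from the `Actor` and the advantages from returns and `Critic` value estimates.

```python
import math
from typing import Sequence


def ppo_clipped_loss(
    new_log_probs: Sequence[float],
    old_log_probs: Sequence[float],
    advantages: Sequence[float],
    clip_eps: float = 0.1,
) -> float:
    """Negative mean of the PPO clipped surrogate objective (to be minimized)."""
    terms = []
    for new_lp, old_lp, adv in zip(new_log_probs, old_log_probs, advantages):
        ratio = math.exp(new_lp - old_lp)  # pi_new(a|s) / pi_old(a|s)
        clipped_ratio = min(max(ratio, 1.0 - clip_eps), 1.0 + clip_eps)
        # Pessimistic bound: keep the smaller of the unclipped and clipped terms.
        terms.append(min(ratio * adv, clipped_ratio * adv))
    return -sum(terms) / len(terms)
```

Taking the minimum means a large probability ratio cannot earn extra objective beyond the clip range, which keeps each update close to the policy that collected the data.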

Attributes:

  • config: Configuration object with all agent and network hyperparameters.

  • actor: The actor neural network for policy approximation.

  • critic: The critic neural network for value estimation.

  • optimizer: Optimizer for training the networks.

  • learning_time: Counter for training steps.

  • device: Device used for computation ('cpu' or 'cuda').

Raises:

  • Any exceptions raised by underlying neural network operations, environment interactions, or resource allocation.

Initialization

Initializes the PPO agent with the given configuration, networks, and learning rates, and stores the initial agent in the checkpoint directory.

Args:

  • config: Configuration object containing all necessary parameters for the experiment.

  • networks (dict): A dictionary of neural networks used by the agent.

  • learning_rates (float): Learning rate for the optimizer.
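Storing the initial agent gives a reproducible starting point for later comparison. A minimal sketch of that step, assuming the agent is picklable and that checkpoints are named `checkpoint-<step>.pkl` (both the helper names and the file naming are hypothetical), could look like this. For agents holding PyTorch modules, `torch.save` on the modules' state dicts is the more usual choice.

```python
import os
import pickle
from typing import Any


def save_checkpoint(agent: Any, checkpoint_dir: str, step: int = 0) -> str:
    """Pickle `agent` to <checkpoint_dir>/checkpoint-<step>.pkl and return the path."""
    os.makedirs(checkpoint_dir, exist_ok=True)  # create the directory if needed
    path = os.path.join(checkpoint_dir, f"checkpoint-{step}.pkl")
    with open(path, "wb") as f:
        pickle.dump(agent, f)
    return path


def load_checkpoint(path: str) -> Any:
    """Restore an object previously written by save_checkpoint."""
    with open(path, "rb") as f:
        return pickle.load(f)
```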

__str__()
train_episode(envs, seeds: Optional[Union[int, List[int], np.ndarray]], para_mode: Literal['dummy', 'subproc', 'ray', 'ray-subproc'] = 'dummy', compute_resource={}, tb_logger=None, required_info={})
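The `seeds` parameter accepts `None`, a single `int`, a list, or a NumPy array. One plausible way to normalize it to one seed per environment before dispatching work to the parallel environments is sketched below. The helper name `per_env_seeds` is hypothetical, and reusing a single integer for every environment (rather than, say, offsetting it per environment) is an assumption.

```python
from numbers import Integral
from typing import List, Optional, Sequence, Union


def per_env_seeds(
    seeds: Optional[Union[int, Sequence[int]]], n_envs: int
) -> List[Optional[int]]:
    """Expand the `seeds` argument into one seed (or None) per environment."""
    if seeds is None:
        return [None] * n_envs
    if isinstance(seeds, Integral):
        # Assumption: a single integer is reused for every environment.
        return [int(seeds)] * n_envs
    # Lists and 1-D NumPy arrays both support len() and iteration.
    if len(seeds) != n_envs:
        raise ValueError(f"expected {n_envs} seeds, got {len(seeds)}")
    return [int(s) for s in seeds]
```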
rollout_episode(env, seed=None, required_info={})
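`rollout_episode` runs one evaluation episode without updating the agent. The loop below sketches that contract against a hypothetical environment with gym-style `reset()`/`step()` methods and `cost`/`fes` attributes. The toy environment, the result keys, the `policy` callable, and the reading of `required_info` as a mapping from result key to environment attribute name are all assumptions. In a PyTorch agent such a loop would normally also run under `torch.no_grad()`.

```python
from typing import Any, Callable, Dict, List, Optional, Tuple


class ToyEnv:
    """Hypothetical environment: each action in (0, 1) scales the best cost."""

    def __init__(self, max_fes: int = 3) -> None:
        self.max_fes = max_fes

    def reset(self) -> float:
        self.fes = 0
        self.best = 10.0
        self.cost: List[float] = [self.best]
        return self.best

    def step(self, action: float) -> Tuple[float, float, bool, Dict[str, Any]]:
        self.fes += 1
        candidate = self.best * action
        reward = max(self.best - candidate, 0.0)  # improvement as reward
        self.best = min(self.best, candidate)
        self.cost.append(self.best)
        return self.best, reward, self.fes >= self.max_fes, {}


def rollout_episode(
    env: Any,
    policy: Callable[[Any], Any],
    seed: Optional[int] = None,
    required_info: Optional[Dict[str, str]] = None,
) -> Dict[str, Any]:
    if seed is not None and hasattr(env, "seed"):
        env.seed(seed)  # only if the environment exposes a seeding hook
    state, done, episode_return = env.reset(), False, 0.0
    while not done:
        action = policy(state)  # acting only: no gradients, no parameter updates
        state, reward, done, _ = env.step(action)
        episode_return += reward
    results: Dict[str, Any] = {"cost": env.cost, "fes": env.fes, "return": episode_return}
    for key, attr in (required_info or {}).items():
        results[key] = getattr(env, attr)
    return results
```

The sketch uses `None` instead of the documented `{}` default for `required_info`, which avoids Python's shared-mutable-default pitfall.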