src.baseline.metabbo.glhf

Module Contents

Classes

SMBND

GBMutModel

GBLearnCrRate

Policy

GLHF

Introduction

GLHF: General Learned Evolutionary Algorithm Via Hyper Functions

Functions

sortIndiv

Purpose: sort the individuals in a batch of populations by the value of the fitness dimension.

sortIndivBND

API

class src.baseline.metabbo.glhf.SMBND(device)[source]

Bases: torch.nn.Module

Initialization

forward(batchpop1, batchpop2, minimize=True)[source]

Implements the selection operation. Minimization is the default; pass minimize=False for maximization problems. batchpop1 is the offspring population (offpop).
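The selection rule described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the module's torch implementation: the names `select_one_to_one` and `select_batch` are hypothetical, populations are nested lists rather than tensors, fitness is assumed to be the last entry of each individual, and ties are assumed to go to the offspring.

```python
from typing import List

Individual = List[float]        # assumed layout: d decision variables, then fitness
Population = List[Individual]


def select_one_to_one(offpop: Population, parents: Population,
                      minimize: bool = True) -> Population:
    """Keep, position by position, the better of offspring and parent."""
    if len(offpop) != len(parents):
        raise ValueError("both populations must have the same size")
    survivors = []
    for child, parent in zip(offpop, parents):
        child_fit, parent_fit = child[-1], parent[-1]
        # Assumption: ties are resolved in favour of the offspring.
        if minimize:
            child_wins = child_fit <= parent_fit
        else:
            child_wins = child_fit >= parent_fit
        survivors.append(list(child if child_wins else parent))
    return survivors


def select_batch(batchpop1: List[Population], batchpop2: List[Population],
                 minimize: bool = True) -> List[Population]:
    """Apply the pairwise rule to every population in the batch."""
    return [select_one_to_one(off, par, minimize)
            for off, par in zip(batchpop1, batchpop2)]


# One population of two individuals, each with two variables plus fitness.
offspring = [[[0.1, 0.2, 3.0], [0.5, 0.5, 1.0]]]
parents = [[[0.0, 0.0, 2.0], [1.0, 1.0, 4.0]]]
selected = select_batch(offspring, parents)
```

With minimize=True the first slot keeps the parent (fitness 2.0 beats 3.0) and the second slot keeps the offspring (1.0 beats 4.0); with minimize=False the choices are reversed.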

class src.baseline.metabbo.glhf.GBMutModel(device, hdim=1000)[source]

Bases: torch.nn.Module

Initialization

forward(x)[source]
class src.baseline.metabbo.glhf.GBLearnCrRate(hdim=100)[source]

Bases: torch.nn.Module

Initialization

forward(x)[source]
class src.baseline.metabbo.glhf.Policy(popsize=100, selmod='1-to-1', cr_policy='learned', muthdim=1000, crhdim=4, device='cpu')[source]

Bases: torch.nn.Module

Initialization

setAdapter(adapter)[source]
RoulleteSelectWithElite(pop, popsize)[source]

Roulette-wheel selection with elitism. pop has shape (b, n, d+1).
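Roulette-wheel selection with elitism can be sketched as follows for a single population. This is an illustration only: the function name `roulette_select_with_elite`, the list layout, the assumption that fitness is the last entry and is minimized, and the particular weighting scheme (distance to the worst fitness) are all assumptions and may differ from what the method actually does. A batch of shape (b, n, d+1) would be handled by applying the function to each population.

```python
import random
from typing import List, Optional

Individual = List[float]        # assumed layout: d decision variables, then fitness


def roulette_select_with_elite(pop: List[Individual], popsize: int,
                               rng: Optional[random.Random] = None
                               ) -> List[Individual]:
    """Return popsize individuals: the best one, plus popsize - 1 roulette draws."""
    if not pop or popsize < 1:
        raise ValueError("need a non-empty population and popsize >= 1")
    rng = rng or random.Random()
    fits = [ind[-1] for ind in pop]
    # Elitism: the individual with the lowest fitness always survives.
    elite = min(pop, key=lambda ind: ind[-1])
    # Assumed weighting: the further below the worst fitness, the larger the
    # slice of the wheel. The epsilon keeps every weight strictly positive.
    worst = max(fits)
    weights = [worst - f + 1e-12 for f in fits]
    drawn = rng.choices(pop, weights=weights, k=popsize - 1)
    return [list(elite)] + [list(ind) for ind in drawn]


pop = [[0.0, 0.0, 5.0], [1.0, 1.0, 2.0], [2.0, 2.0, 9.0]]
new_pop = roulette_select_with_elite(pop, popsize=3, rng=random.Random(0))
```

Placing the elite in the first slot is a choice made for this sketch; the draws are made with replacement, so the same individual can appear more than once.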

genGBMutToken(x, ranks)[source]
genGBMutToken2(x, ranks)[source]
genCrRankToken(fitness)[source]

Input: shape (b, n).

genCrRankTokenWithoutFit(father, off)[source]

Input: shape (b, n). Variant that does not use fitness.

clearMutstate()[source]
forward(batchPop=None)[source]

Input: a population that already has fitness values, with shape (batch, n, d).

class src.baseline.metabbo.glhf.GLHF(config)[source]

Bases: src.rl.basic_agent.Basic_Agent

Introduction

GLHF: General Learned Evolutionary Algorithm Via Hyper Functions

Original paper

“GLHF: General Learned Evolutionary Algorithm Via Hyper Functions.” arXiv preprint arXiv:2405.03728 (2024).

Official Implementation

GLHF

Args:

  • config (object): Configuration object containing hyperparameters and settings for the agent, such as optimizer type, learning rate, device, and save directories.

Attributes:

  • Pom (Policy): The policy model used by the agent.

  • optimizer (torch.optim.Optimizer): The optimizer for training the policy.

  • learning_time (int): Counter for the number of training steps taken.

  • cur_checkpoint (int): Counter for the current checkpoint index.

  • config (object): The configuration object with agent settings.

Methods:

  • __str__(): Returns the string representation of the agent.

  • train_episode(…): Trains the agent for one episode in parallel environments.

  • rollout_episode(…): Evaluates the agent in a single environment without training.

  • log_to_tb_train(…): Logs training metrics to TensorBoard.

train_episode

Trains the agent for one episode using parallel environments. Handles environment setup, policy optimization, checkpointing, and logging.

Args:
  • envs: List of environments to train on.

  • seeds (Optional[Union[int, List[int], np.ndarray]]): Seeds for environment reproducibility.

  • para_mode (str): Parallelization mode (‘dummy’, ‘subproc’, ‘ray’, ‘ray-subproc’).

  • compute_resource (dict): Resources for parallelization (e.g., number of CPUs/GPUs).

  • tb_logger: TensorBoard logger for recording metrics.

  • required_info (dict): Additional environment attributes to record.

Returns:
  • is_train_ended (bool): Whether the training has reached the maximum step.

  • return_info (dict): Dictionary containing returns, losses, learning steps, and additional info.

rollout_episode

Evaluates the agent in a single environment without updating the policy.

Args:
  • env: The environment to evaluate in.

  • seed: Seed for reproducibility.

  • required_info (dict): Additional environment attributes to record.

Returns:
  • results (dict): Dictionary containing cost, function evaluations, return, and optional metadata.

log_to_tb_train

Logs training statistics and metrics to TensorBoard.

Args:
  • tb_logger: TensorBoard logger.

  • mini_step (int): Current training step.

  • grad_norms: Gradient norms before and after clipping.

  • loss_1, loss_2, loss: Loss components.

  • Return: Episode returns.

  • reward: Rewards for the current step.

  • extra_info (dict): Additional metrics to log.

Raises:

  • ValueError: If invalid configuration or environment state is encountered during training.

Initialization

Initialize the basic_agent with config.

__str__()[source]
train_episode(envs, seeds: Optional[Union[int, List[int], numpy.ndarray]], para_mode: Literal['dummy', 'subproc', 'ray', 'ray-subproc'] = 'dummy', compute_resource={}, tb_logger=None, required_info={})[source]
rollout_episode(env, seed=None, required_info={})[source]
log_to_tb_train(tb_logger, mini_step, grad_norms, loss_1, loss_2, loss, Return, reward, extra_info={})[source]
src.baseline.metabbo.glhf.sortIndiv(batchPop)[source]

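Purpose: sort the individuals in a batch of populations by the value of the fitness dimension.

Input: batchPop, a batch of populations with shape (batchSize, dim+1, L*L). Returns: the sorted populations as a matrix of shape (batch, dim+1, w, h).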
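The sort-and-reshape behaviour described above can be sketched in plain Python. This is an assumption-laden illustration rather than the module's torch code: the helper names `sort_population`, `to_grid`, and `sort_batch` are hypothetical, each population is a nested list with dim+1 rows and one column per individual, the fitness is assumed to sit in the last row, and the sort is assumed to be ascending.

```python
from typing import List

Matrix = List[List[float]]      # (dim+1) rows, one column per individual


def sort_population(pop: Matrix, fitness_row: int = -1) -> Matrix:
    """Reorder the columns (individuals) by ascending fitness."""
    n = len(pop[0])
    order = sorted(range(n), key=lambda j: pop[fitness_row][j])
    return [[row[j] for j in order] for row in pop]


def to_grid(pop: Matrix, width: int) -> List[Matrix]:
    """Fold each row of n = h * width entries into h rows of length width."""
    n = len(pop[0])
    if n % width != 0:
        raise ValueError("number of individuals must be divisible by width")
    return [[row[i:i + width] for i in range(0, n, width)] for row in pop]


def sort_batch(batch: List[Matrix], width: int,
               fitness_row: int = -1) -> List[List[Matrix]]:
    """(batch, dim+1, n) -> (batch, dim+1, h, width), sorted by fitness."""
    return [to_grid(sort_population(pop, fitness_row), width) for pop in batch]


# One population, one decision variable, four individuals (L = 2).
batch = [[[10.0, 20.0, 30.0, 40.0],    # decision variable
          [3.0, 1.0, 4.0, 2.0]]]       # fitness (assumed to be the last row)
result = sort_batch(batch, width=2)
```

Here the fitness row becomes [[1.0, 2.0], [3.0, 4.0]] and the variable row follows the same column order, giving [[20.0, 40.0], [10.0, 30.0]].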

src.baseline.metabbo.glhf.sortIndivBND(batchPop)[source]