src.baseline.metabbo.b2opt

Module Contents

Classes

AttnWithFit

BaseModel

OB

Policy

B2OPT

Introduction

B2Opt: Learning to Optimize Black-box Optimization with Little Budget.

API

class src.baseline.metabbo.b2opt.AttnWithFit(popSize=100, hiddenDim=100)[source]

Bases: torch.nn.Module

Initialization

forward(x, fitx)[source]

getStrategy(fitx, dim)[source]

class src.baseline.metabbo.b2opt.BaseModel[source]

Bases: torch.nn.Module

Initialization

sortpop(x, fitness)[source]

Description: Input: x with shape (n, dim) and f(x) with shape (n). Output: the sorted x and f(x).
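The description above can be illustrated with a minimal sketch in plain Python. It is not the library's implementation: the helper name `sort_population` is hypothetical, lists stand in for tensors, and ascending order (best individual first under minimization) is an assumption, since the docstring does not state the sort direction.

```python
from typing import List, Sequence, Tuple


def sort_population(
    x: Sequence[Sequence[float]], fitness: Sequence[float]
) -> Tuple[List[List[float]], List[float]]:
    """Reorder individuals and their fitness values together.

    Assumption: ascending fitness order; the documented method does
    not say which direction it sorts in.
    """
    if len(x) != len(fitness):
        raise ValueError("x and fitness must have the same length")
    # Indices that sort fitness ascending; sorted() is stable, so ties keep input order.
    order = sorted(range(len(fitness)), key=lambda i: fitness[i])
    sorted_x = [list(x[i]) for i in order]
    sorted_fitness = [fitness[i] for i in order]
    return sorted_x, sorted_fitness


population = [[0.5, 1.0], [0.0, 0.1], [2.0, -1.0]]  # shape (n=3, dim=2)
costs = [1.25, 0.01, 5.0]                            # shape (n=3)
sorted_pop, sorted_costs = sort_population(population, costs)
```

The key point is that one permutation is computed from the fitness values and applied to both arrays, so each row of x stays paired with its own f(x).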

class src.baseline.metabbo.b2opt.OB(dim=64, hidden_dim=100, popSize=10, temid=0)[source]

Bases: src.baseline.metabbo.b2opt.BaseModel

Initialization

forward(x, xfit)[source]

class src.baseline.metabbo.b2opt.Policy(popSize=10, dim=64, hidden_dim=100, ems=10, ws=False)[source]

Bases: src.baseline.metabbo.b2opt.BaseModel

Initialization

forward(x, cost, pointer)[source]

class src.baseline.metabbo.b2opt.B2OPT(config)[source]

Bases: src.rl.basic_agent.Basic_Agent

Introduction

B2Opt: Learning to Optimize Black-box Optimization with Little Budget.

Original paper

"B2Opt: Learning to Optimize Black-box Optimization with Little Budget". arXiv preprint arXiv:2304.11787 (2023).

Official Implementation

B2Opt

Raises:

  • No exceptions are raised explicitly, but underlying methods may raise exceptions related to environment interaction, tensor operations, or file I/O.

Initialization

Args:

  • config (object): Configuration object containing hyperparameters and settings for the agent, such as optimizer type, learning rate, device, save directory, and environment dimensions.

Built-in Attributes:

  • Opt: The policy network.

  • optimizer: The optimizer instance (Adam).

  • scheduler: Learning rate scheduler.

  • learning_time (int): Number of training steps completed.

  • cur_checkpoint (int): Current checkpoint index.

  • lr (float): Learning rate, set to 1e-2 in B2Opt.

  • lr_step_size (int): Learning rate decay period, 100 steps by default.

  • lr_decay (float): Learning rate decay factor, set to 0.9.
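The three learning-rate attributes suggest a step schedule. As a sketch only, assuming the scheduler multiplies the rate by lr_decay once every lr_step_size steps (the behaviour of `torch.optim.lr_scheduler.StepLR`; the scheduler class is not named in this reference), the rate after t steps would be lr · lr_decay^⌊t / lr_step_size⌋. The function name `step_decay_lr` is hypothetical.

```python
def step_decay_lr(
    step: int,
    base_lr: float = 1e-2,   # documented lr
    step_size: int = 100,    # documented lr_step_size
    decay: float = 0.9,      # documented lr_decay
) -> float:
    """Learning rate after `step` updates under an assumed step-decay schedule."""
    if step < 0:
        raise ValueError("step must be non-negative")
    # Integer division counts how many full decay periods have elapsed.
    return base_lr * decay ** (step // step_size)


# Rates at a few training steps under the assumed schedule.
schedule = {t: step_decay_lr(t) for t in (0, 99, 100, 250)}
```

Under this assumption the rate stays at 1e-2 for the first 100 steps, drops to 9e-3 at step 100, and is about 8.1e-3 at step 250.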

__str__()[source]

train_episode(envs, seeds: Optional[Union[int, List[int], numpy.ndarray]], para_mode: Literal['dummy', 'subproc', 'ray', 'ray-subproc'] = 'dummy', compute_resource={}, tb_logger=None, required_info={})[source]

Trains the agent for one episode across parallel environments.

  • envs: List of environments.

  • seeds (Optional[Union[int, List[int], np.ndarray]]): Random seeds for reproducibility.

  • para_mode (str): Parallelization mode (‘dummy’, ‘subproc’, ‘ray’, ‘ray-subproc’).

  • compute_resource (dict): Resource allocation for CPUs/GPUs.

  • tb_logger: TensorBoard logger for training metrics.

  • required_info (dict): Additional environment attributes to log.

  • Returns: (is_train_ended (bool), return_info (dict))

rollout_episode(env, seed=None, required_info={})[source]

Evaluates the agent in one or more environments without training.

  • env: Environment instance.

  • seed (Optional[int]): Random seed.

  • required_info (dict): Additional environment attributes to log.

  • Returns: results (dict) with evaluation metrics.

log_to_tb_train(tb_logger, mini_step, grad_norms, loss, Return, extra_info={})[source]

Logs training metrics to TensorBoard.

  • tb_logger: TensorBoard logger.

  • mini_step (int): Current training step.

  • grad_norms (tuple): Gradient norms before and after clipping.

  • loss (torch.Tensor): Training loss.

  • Return (torch.Tensor): Episode returns.

  • extra_info (dict): Additional metrics to log.
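The grad_norms argument is described as the gradient norms before and after clipping. The sketch below shows, in plain Python, how such a pair could be produced by global L2-norm clipping. It is an illustration under stated assumptions, not the agent's code: the helper `clip_grad_norm` and the threshold `max_norm` are hypothetical, and nested lists stand in for parameter gradient tensors.

```python
import math
from typing import List, Tuple


def clip_grad_norm(
    grads: List[List[float]], max_norm: float
) -> Tuple[List[List[float]], float, float]:
    """Scale all gradients so their global L2 norm is at most `max_norm`.

    Returns the clipped gradients plus the norm before and after clipping.
    """
    # Global norm over every element of every parameter's gradient.
    norm_before = math.sqrt(sum(g * g for grad in grads for g in grad))
    # Only shrink, never enlarge; leave all-zero gradients untouched.
    scale = min(1.0, max_norm / norm_before) if norm_before > 0 else 1.0
    clipped = [[g * scale for g in grad] for grad in grads]
    norm_after = norm_before * scale
    return clipped, norm_before, norm_after


# One parameter with gradient (3, 4): norm 5 before, 1 after clipping to max_norm=1.
clipped, before, after = clip_grad_norm([[3.0, 4.0]], max_norm=1.0)
grad_norms = (before, after)  # same (before, after) layout as the documented tuple
```

Logging both values makes it easy to see how often clipping is active: when the two norms are equal, the gradients passed through unchanged.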