src.baseline.metabbo.b2opt
Module Contents
Classes
API
- class src.baseline.metabbo.b2opt.AttnWithFit(popSize=100, hiddenDim=100)
Bases:
torch.nn.Module
Initialization
- class src.baseline.metabbo.b2opt.OB(dim=64, hidden_dim=100, popSize=10, temid=0)
Bases:
src.baseline.metabbo.b2opt.BaseModel
Initialization
- class src.baseline.metabbo.b2opt.Policy(popSize=10, dim=64, hidden_dim=100, ems=10, ws=False)
Bases:
src.baseline.metabbo.b2opt.BaseModel
Initialization
- class src.baseline.metabbo.b2opt.B2OPT(config)
Bases:
src.rl.basic_agent.Basic_Agent
Introduction
B2Opt: Learning to Optimize Black-box Optimization with Little Budget.
Original paper
“B2Opt: Learning to Optimize Black-box Optimization with Little Budget.” arXiv preprint arXiv:2304.11787 (2023).
Official Implementation
Raises:
None explicitly, but underlying methods may raise exceptions related to environment interaction, tensor operations, or file I/O.
Initialization
Args:
config (object): Configuration object containing hyperparameters and settings for the agent, such as the optimizer type, learning rate, device, save directory, and environment dimensions.
Built-in Attributes:
Opt: The policy network.
optimizer: The optimizer instance (Adam).
scheduler: The learning rate scheduler.
learning_time (int): Number of training steps completed.
cur_checkpoint (int): Current checkpoint index.
lr: Learning rate; set to 1e-2 in B2Opt.
lr_step_size: Learning rate decay period; 100 steps by default.
lr_decay: Decay rate applied to lr; set to 0.9.
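Together, `lr`, `lr_step_size` and `lr_decay` describe a step-decay schedule: the learning rate starts at `lr` and is multiplied by `lr_decay` once every `lr_step_size` steps. The sketch below computes that schedule in plain Python, assuming the scheduler follows the same rule as PyTorch's `torch.optim.lr_scheduler.StepLR` (lr0 * gamma ** (step // step_size)). The helper `step_decay_lr` is hypothetical and is not part of this module.

```python
def step_decay_lr(step: int, lr: float = 1e-2, lr_step_size: int = 100, lr_decay: float = 0.9) -> float:
    """Learning rate after `step` scheduler steps under step decay.

    Hypothetical helper, not the B2OPT API. It assumes StepLR-style
    semantics: the rate is multiplied by `lr_decay` once every
    `lr_step_size` steps and stays constant in between.
    """
    if step < 0:
        raise ValueError("step must be non-negative")
    # Integer division counts how many full decay periods have elapsed.
    return lr * lr_decay ** (step // lr_step_size)


if __name__ == "__main__":
    for s in (0, 99, 100, 250):
        print(s, step_decay_lr(s))
```

If the agent does use `StepLR`, the equivalent PyTorch construction would be `StepLR(optimizer, step_size=100, gamma=0.9)`; check the module source to confirm which scheduler is instantiated.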
- train_episode(envs, seeds: Optional[Union[int, List[int], numpy.ndarray]], para_mode: Literal['dummy', 'subproc', 'ray', 'ray-subproc'] = 'dummy', compute_resource={}, tb_logger=None, required_info={})
Trains the agent for one episode across parallel environments.
envs: List of environments.
seeds (Optional[Union[int, List[int], np.ndarray]]): Random seeds for reproducibility.
para_mode (str): Parallelization mode (‘dummy’, ‘subproc’, ‘ray’, ‘ray-subproc’).
compute_resource (dict): Resource allocation for CPUs/GPUs.
tb_logger: TensorBoard logger for training metrics.
required_info (dict): Additional environment attributes to log.
Returns: (is_train_ended (bool), return_info (dict))
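Because `seeds` accepts `None`, a single integer, a list, or a NumPy array, it has to be mapped onto the list of environments somehow, and `para_mode` is restricted to four string values. The following sketch shows one plausible way to validate both arguments. The helpers `normalize_seeds` and `check_para_mode` are hypothetical, and the rule that a single integer is shared by every environment is an assumption, not confirmed behaviour of `train_episode`.

```python
import numbers
from typing import Iterable, List, Optional, Union

# Parallelization modes listed in the train_episode signature.
VALID_PARA_MODES = ("dummy", "subproc", "ray", "ray-subproc")


def check_para_mode(para_mode: str) -> str:
    """Reject any para_mode outside the documented Literal values."""
    if para_mode not in VALID_PARA_MODES:
        raise ValueError(f"para_mode must be one of {VALID_PARA_MODES}, got {para_mode!r}")
    return para_mode


def normalize_seeds(seeds: Optional[Union[int, Iterable[int]]], n_envs: int) -> List[Optional[int]]:
    """Expand `seeds` into one entry per environment (hypothetical helper).

    Assumptions: None leaves every environment unseeded, a single integer
    is shared by all environments, and a list or 1-D NumPy array must
    supply exactly one seed per environment.
    """
    if seeds is None:
        return [None] * n_envs
    if isinstance(seeds, numbers.Integral):  # also accepts NumPy integer scalars
        return [int(seeds)] * n_envs
    seed_list = [int(s) for s in seeds]
    if len(seed_list) != n_envs:
        raise ValueError(f"expected {n_envs} seeds, got {len(seed_list)}")
    return seed_list
```

For example, `normalize_seeds([1, 2, 3], n_envs=3)` yields one seed per environment, while a mismatched length raises `ValueError` early instead of failing inside a worker process.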
- rollout_episode(env, seed=None, required_info={})
Evaluates the agent on a single environment or on multiple environments without training.
env: Environment instance.
seed (Optional[int]): Random seed.
required_info (dict): Additional environment attributes to log.
Returns: results (dict) with evaluation metrics.
- log_to_tb_train(tb_logger, mini_step, grad_norms, loss, Return, extra_info={})
Logs training metrics to TensorBoard.
tb_logger: TensorBoard logger.
mini_step (int): Current training step.
grad_norms (tuple): Gradient norms before and after clipping.
loss (torch.Tensor): Training loss.
Return (torch.Tensor): Episode returns.
extra_info (dict): Additional metrics to log.
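The `grad_norms` argument pairs the gradient norm before clipping with the norm after clipping. The pure-Python sketch below shows how such a pair arises under global L2-norm clipping, the rule PyTorch's `torch.nn.utils.clip_grad_norm_` applies (scale by `max_norm / (total_norm + 1e-6)` when that factor is below 1). The function name, the flat list-of-floats gradient representation, and the `max_norm` values are assumptions for illustration. The agent itself presumably works on tensor gradients.

```python
import math
from typing import List, Tuple


def clip_by_global_norm(
    grads: List[float], max_norm: float, eps: float = 1e-6
) -> Tuple[List[float], Tuple[float, float]]:
    """Clip a flat gradient vector and report (norm_before, norm_after).

    Hypothetical helper. If the global L2 norm exceeds max_norm, every
    component is scaled by max_norm / (norm + eps); otherwise the
    gradients are returned unchanged.
    """
    norm_before = math.sqrt(sum(g * g for g in grads))
    clip_coef = max_norm / (norm_before + eps)
    if clip_coef < 1.0:
        grads = [g * clip_coef for g in grads]
    norm_after = math.sqrt(sum(g * g for g in grads))
    return grads, (norm_before, norm_after)


if __name__ == "__main__":
    _, grad_norms = clip_by_global_norm([3.0, 4.0], max_norm=1.0)
    print(grad_norms)  # the kind of tuple log_to_tb_train expects
```

Logging both values makes it visible how often clipping is active: when the two norms differ, the update was rescaled.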