`src.environment.problem.SOO.HPO_B.hpob_dataset`¶

Problem Difficulty Classification¶

By default, the dataset is split into a fixed training set and a testing set.

Set Type	Source Data	Number of Problems
Training Set	`meta_train_data`	758
Testing Set	`meta_vali_data` + `meta_test_data`	177

Note: If difficulty is set to ‘all’, the training and testing sets are merged, containing all 935 problems.

Module Contents¶

Classes¶

HPOB_Dataset

Introduction¶

HPO-B is an AutoML hyperparameter optimization benchmark that covers a wide range of hyperparameter optimization tasks for 16 different model types (e.g., SVM, XGBoost), resulting in a total of 935 problem instances. The dimension of these problem instances ranges from 2 to 16. HPO-B also features problems with ill-conditioned landscapes, such as large flat regions.

Functions¶

`get_data`	Loads and returns training, validation, and test datasets along with Bayesian Optimization (BO) initializations and surrogate statistics based on the specified mode and directory paths.
`load_data`	Loads HPOB benchmark datasets according to specified parameters, supporting different dataset versions, test/train splits, and augmented training data.
`get_bst`	Loads a pre-trained XGBoost surrogate model and retrieves its associated normalization statistics for a given search space and dataset.

API¶

class src.environment.problem.SOO.HPO_B.hpob_dataset.HPOB_Dataset(data, batch_size=1)[source]¶

Bases: torch.utils.data.Dataset

Introduction¶

Original paper¶

“HPO-B: A Large-Scale Reproducible Benchmark for Black-Box HPO based on OpenML.” arXiv preprint arXiv:2106.06257 (2021).

Official Implementation¶

HPO-B

License¶

None

Initialization

Introduction¶

Initializes the dataset object for handling batches of data items, determining the maximum dimension among items, and setting up batch pointers and indices.

Args:¶

data (list): A list of data items, each expected to have a dim attribute.
batch_size (int, optional): The number of items per batch. Defaults to 1.

Built-in Attribute:¶

self.data (list): Stores the input data.
self.maxdim (int): The maximum dimension found among all data items. Defaults to 0.
self.batch_size (int): The batch size for processing data.
self.N (int): The total number of data items.
self.ptr (list): List of starting indices for each batch.
self.index (np.ndarray): Array of indices for the data items.

Returns:¶

None
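
The bookkeeping described above can be sketched as follows. This is a minimal illustration, not the actual constructor: `ToyProblem` and `ToyDataset` are hypothetical stand-ins, `ptr` is assumed to hold every `batch_size`-th offset, and plain lists replace the NumPy array so the example has no dependencies.

```python
from dataclasses import dataclass


@dataclass
class ToyProblem:
    """Hypothetical stand-in for a problem instance; only `dim` matters here."""
    dim: int


class ToyDataset:
    """Assumed re-creation of the documented attributes, not the real class."""

    def __init__(self, data, batch_size=1):
        self.data = data
        self.batch_size = batch_size
        self.N = len(data)
        # Largest dimension among the items; stays 0 for an empty list.
        self.maxdim = max((item.dim for item in data), default=0)
        # Assumed layout: starting offset of each batch (0, batch_size, ...).
        self.ptr = list(range(0, self.N, batch_size))
        # Access order of the items; identity until shuffled.
        self.index = list(range(self.N))


toy = ToyDataset([ToyProblem(2), ToyProblem(16), ToyProblem(5)], batch_size=2)
print(toy.maxdim, toy.N, toy.ptr, toy.index)
```

With three items and a batch size of 2, this sketch yields two batch offsets, the second of which starts a shorter final batch.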

static get_datasets(datapath=None, train_batch_size=1, test_batch_size=1, upperbound=None, difficulty=None, user_train_list=None, user_test_list=None, cost_normalize=False)[source]¶

Introduction¶

Loads and processes the HPO-B benchmark datasets, returning train and test sets as HPOB_Dataset objects. Handles dataset downloading, extraction, and filtering based on user-specified lists or difficulty level.

Args:¶

datapath (str, optional): Path to the root directory containing the HPO-B data. If None, defaults to a subdirectory in the current working directory.
train_batch_size (int, optional): Batch size for the training dataset. Defaults to 1.
test_batch_size (int, optional): Batch size for the test dataset. Defaults to 1.
upperbound (float, optional): Upper bound for the problem domain. Used to set the search space limits.
difficulty (str, optional): If set to ‘all’, merges train and test sets. Otherwise, uses user-specified lists for filtering.
user_train_list (list of str, optional): List of problem identifiers to include in the training set.
user_test_list (list of str, optional): List of problem identifiers to include in the test set.
cost_normalize (bool, optional): Whether to normalize the cost values in the problems. Defaults to False.

Returns:¶

tuple: A tuple containing:
- HPOB_Dataset: The training dataset.
- HPOB_Dataset: The test dataset.

Raises:¶

NotImplementedError: If neither user_train_list nor user_test_list is provided when required for filtering.
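
One possible reading of the selection rules is sketched below on plain identifier lists. Every branch is an assumption made for illustration: `select_problems` is a hypothetical helper, the behaviour when only one user list is given is guessed, and the download, extraction, and `NotImplementedError` paths of the real method are left out.

```python
def select_problems(train_ids, test_ids, difficulty=None,
                    user_train_list=None, user_test_list=None):
    """Toy version of the selection rules; not the library's implementation."""
    pool = train_ids + test_ids
    if difficulty == "all":
        # Assumed merged case: both returned sets cover the whole pool.
        return pool, pool
    if user_train_list is not None or user_test_list is not None:
        # Keep only the requested identifiers; a missing list selects nothing.
        wanted_train = set(user_train_list or [])
        wanted_test = set(user_test_list or [])
        return ([p for p in pool if p in wanted_train],
                [p for p in pool if p in wanted_test])
    # No difficulty and no lists: fall back to the fixed default split.
    return train_ids, test_ids


default_train = ["a", "b"]  # placeholder identifiers
default_test = ["c"]
print(select_problems(default_train, default_test, difficulty="all"))
print(select_problems(default_train, default_test,
                      user_train_list=["c"], user_test_list=["a"]))
```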

__getitem__(item)[source]¶

Introduction¶

Retrieves a batch of data samples corresponding to the given index.

Args:¶

item (int): The index of the batch to retrieve.

Returns:¶

list: A list containing the data samples for the specified batch.

Raises:¶

IndexError: If item is out of range of the available batches.
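
Batch retrieval of this kind can be sketched with a free function. The helper `get_batch` is hypothetical, and it assumes that `ptr` stores batch start offsets and that `index` gives the current access order; the real method may differ in detail.

```python
def get_batch(data, index, ptr, batch_size, item):
    """Return the items of batch `item`, following the access order in `index`."""
    start = ptr[item]  # raises IndexError when `item` is out of range
    stop = min(start + batch_size, len(data))  # the last batch may be shorter
    return [data[index[i]] for i in range(start, stop)]


data = ["p0", "p1", "p2", "p3", "p4"]
index = [4, 3, 2, 1, 0]  # a reversed access order, as a shuffle might produce
ptr = [0, 2, 4]          # batch offsets for batch_size=2

print(get_batch(data, index, ptr, 2, 0))  # ['p4', 'p3']
print(get_batch(data, index, ptr, 2, 2))  # ['p0']
```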

__len__()[source]¶

Returns the number of elements in the dataset.

Returns:¶

int: The total number of elements in the dataset.

__add__(other: src.environment.problem.SOO.HPO_B.hpob_dataset.HPOB_Dataset)[source]¶

Introduction¶

Combines two HPOB_Dataset instances by concatenating their data attributes.

Args:¶

other (HPOB_Dataset): Another dataset instance to be added.

Returns:¶

HPOB_Dataset: A new dataset containing the combined data from both instances.

Raises:¶

AttributeError: If other does not have a data attribute.
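
The concatenation can be illustrated with a stripped-down stand-in class. `ToyDataset` is hypothetical, and keeping the left operand's batch size is an assumption, since the documentation does not say which batch size the combined dataset uses.

```python
class ToyDataset:
    """Hypothetical stand-in that only models the `data` concatenation."""

    def __init__(self, data, batch_size=1):
        self.data = data
        self.batch_size = batch_size

    def __add__(self, other):
        # Reading other.data raises AttributeError for unsuitable operands.
        # Assumption: the result inherits the left operand's batch size.
        return ToyDataset(self.data + other.data, self.batch_size)


merged = ToyDataset(["a", "b"], batch_size=2) + ToyDataset(["c"])
print(merged.data, merged.batch_size)
```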

shuffle()[source]¶

Introduction¶

Randomly shuffles the indices of the dataset to change the order of data access.

Built-in Attribute:¶

self.N (int): The total number of data points in the dataset.

Returns:¶

None

Side Effects:¶

Updates self.index with a new permutation of indices from 0 to self.N - 1.
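
The side effect amounts to replacing the access order with a random permutation. The sketch below uses the standard library's `random` module for portability; the actual method presumably relies on NumPy or PyTorch, and `shuffled_index` is a hypothetical helper.

```python
import random


def shuffled_index(n, seed=None):
    """Return a random permutation of 0 .. n - 1 as a list."""
    rng = random.Random(seed)  # a seed makes the order reproducible
    index = list(range(n))
    rng.shuffle(index)         # in-place Fisher-Yates shuffle
    return index


new_index = shuffled_index(10, seed=0)
print(new_index)
```

Every index still appears exactly once, so batches drawn afterwards cover the same items in a different order.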

src.environment.problem.SOO.HPO_B.hpob_dataset.get_data(mode, surrogates_dir, root_dir)[source]¶

Introduction¶

Loads and returns training, validation, and test datasets along with Bayesian Optimization (BO) initializations and surrogate statistics based on the specified mode and directory paths.

Args:¶

mode (str): The mode specifying which dataset version or configuration to load. Supported values are “v1”, “v2”, “v3”, “v3-test”, and “v3-train-augmented”.
surrogates_dir (str): Directory path where the surrogate statistics file (“summary-stats.json”) is located.
root_dir (str): Root directory path containing the dataset files.

Returns:¶

train_set: The training dataset, or None if not applicable for the selected mode.
vali_set: The validation dataset, or None if not applicable for the selected mode.
test_set: The test dataset.
bo_initializations: Initializations for Bayesian Optimization.
surrogates_stats (dict): Dictionary containing surrogate statistics loaded from “summary-stats.json”.

Raises:¶

ValueError: If an invalid mode is provided.
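
How a mode string might translate into `load_data` arguments is sketched below. The mapping is an assumption inferred from the parameter descriptions on this page (for example, that “v3-test” loads test data only), and `load_kwargs_for_mode` is a hypothetical helper rather than part of the module.

```python
def load_kwargs_for_mode(mode):
    """Map a mode string to assumed keyword arguments for load_data."""
    if mode == "v3-test":
        # Test-only: train and validation sets would come back as None.
        return {"version": "v3", "only_test": True, "augmented_train": False}
    if mode == "v3-train-augmented":
        return {"version": "v3", "only_test": False, "augmented_train": True}
    if mode in ("v1", "v2", "v3"):
        return {"version": mode, "only_test": False, "augmented_train": False}
    raise ValueError(f"Invalid mode: {mode!r}")


for name in ("v1", "v3", "v3-test", "v3-train-augmented"):
    print(name, load_kwargs_for_mode(name))
```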

src.environment.problem.SOO.HPO_B.hpob_dataset.load_data(rootdir='', version='v3', only_test=True, augmented_train=False)[source]¶

Introduction¶

Loads HPOB benchmark datasets according to specified parameters, supporting different dataset versions, test/train splits, and augmented training data.

Args:¶

rootdir (str, optional): Path to the directory containing the benchmark data files. Defaults to “”.
version (str, optional): HPOB dataset version to use. Options are “v1”, “v2”, or “v3”. Defaults to “v3”.
only_test (bool, optional): If True, loads only the test data (valid only for version “v3”). Defaults to True.
augmented_train (bool, optional): If True, loads the augmented training data (valid only for version “v3”). Defaults to False.

Returns:¶

meta_train_data (dict or None): The meta-training dataset, or None if only_test is True.
meta_validation_data (dict or None): The meta-validation dataset, or None if only_test is True.
meta_test_data (dict): The meta-testing dataset.
bo_initializations (dict): The Bayesian optimization initializations.

Raises:¶

FileNotFoundError: If any of the required dataset files are missing in the specified root directory.
json.JSONDecodeError: If any of the dataset files are not valid JSON.
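
The two error cases can be reproduced with the standard library alone. In this sketch the helpers `read_json` and `describe_failure`, the file names, and the file contents are all made up for illustration; the real dataset file names are not assumed here.

```python
import json
import os
import tempfile


def read_json(rootdir, filename):
    """Read one JSON file from rootdir, letting both error types propagate."""
    with open(os.path.join(rootdir, filename), "r", encoding="utf-8") as handle:
        return json.load(handle)


def describe_failure(rootdir, filename):
    """Classify the outcome of loading a file as 'ok', 'missing' or 'invalid'."""
    try:
        read_json(rootdir, filename)
    except FileNotFoundError:
        return "missing"
    except json.JSONDecodeError:
        return "invalid"
    return "ok"


with tempfile.TemporaryDirectory() as rootdir:
    with open(os.path.join(rootdir, "good.json"), "w", encoding="utf-8") as handle:
        json.dump({"example": [1, 2, 3]}, handle)
    with open(os.path.join(rootdir, "bad.json"), "w", encoding="utf-8") as handle:
        handle.write("{not json")
    results = [describe_failure(rootdir, name)
               for name in ("good.json", "bad.json", "absent.json")]

print(results)  # ['ok', 'invalid', 'missing']
```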

src.environment.problem.SOO.HPO_B.hpob_dataset.get_bst(surrogates_dir, search_space_id, dataset_id, surrogates_stats)[source]¶

Introduction¶

Loads a pre-trained XGBoost surrogate model and retrieves its associated normalization statistics for a given search space and dataset.

Args:¶

surrogates_dir (str): Directory path where surrogate models are stored.
search_space_id (str): Identifier for the search space.
dataset_id (str): Identifier for the dataset.
surrogates_stats (dict): Dictionary containing normalization statistics for each surrogate model.

Returns:¶

bst_surrogate (xgboost.Booster): Loaded XGBoost surrogate model.
y_min (float): Minimum target value used for normalization.
y_max (float): Maximum target value used for normalization.

Raises:¶

AssertionError: If y_min is None for the specified surrogate model.
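
The statistics lookup and a typical use of the returned bounds are sketched below without loading any XGBoost model. The key pattern `surrogate-<search_space_id>-<dataset_id>`, the helper names, the placeholder identifiers, and the min-max formula are assumptions for illustration only.

```python
def lookup_stats(surrogates_stats, search_space_id, dataset_id):
    """Fetch (y_min, y_max) for one surrogate from the statistics dictionary."""
    # Assumed key pattern; the real naming scheme is not documented here.
    name = f"surrogate-{search_space_id}-{dataset_id}"
    y_min = surrogates_stats[name]["y_min"]
    y_max = surrogates_stats[name]["y_max"]
    assert y_min is not None, "y_min is None"
    return y_min, y_max


def normalize(y, y_min, y_max):
    """Min-max scale a raw value into [0, 1] (assumed use of the bounds)."""
    return (y - y_min) / (y_max - y_min)


# Placeholder identifiers and values, not taken from the benchmark.
stats = {"surrogate-space0-data0": {"y_min": 2.0, "y_max": 6.0}}
y_min, y_max = lookup_stats(stats, "space0", "data0")
print(normalize(5.0, y_min, y_max))  # 0.75
```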
