GP-based Bayesian Optimization

boptim.py

Utility functions for the Gaussian process-based Bayesian optimization for selecting the next query points in images and image-like data.

Author: Maxim Ziatdinov (email: maxim.ziatdinov@ai4microcopy.com)

class boptimizer(X_seed, y_seed, X_full, target_function, acquisition_function='cb', exploration_steps=10, batch_size=100, batch_update=False, kernel='RBF', lengthscale=None, sparse=False, indpoints=None, gp_iterations=1000, seed=0, **kwargs)

Gaussian process-based Bayesian optimization for selecting next point(s) to measure/evaluate. The Bayesian optimization strategy consists of: i) defining prior and posterior distributions over the objective (target) function \(f\) using GP; ii) using the posterior to derive an acquistion function \(\alpha (x)\); iii) using the acquisition function to derive the next query point according to \(x_{next}=argmax(\alpha (x))\); iv) evaluating \(f\) in \(x_{next}\) and updating the posterior.

Parameters
  • X_seed (ndarray) – Initial seed of (sparse) grid indices with dimensions \(c \times N \times M\) or \(c \times N \times M \times L\) where c is equal to the number of coordinates (for example, for xyz coordinates, c = 3). Missing coordinates are NaNs. Can be potentially larger than 3D but this has not beed tested yet.

  • y_seed (ndarray) – Initial seed with sparse “observations” with dimensions \(N \times M\) or \(N \times M \times L\). Typically, for 2D image N and M are image height and width, whereas for 3D hyperspectral data N and M are spatial dimensions and L is a spectorcopic dimension (e.g. voltage or wavelength). Can be potentially larger than 3D but this has not beed tested yet.

  • X_full (ndarray) – Full grid indices (for prediction with a trained GP model) with dimensions \(N \times M\) or \(N \times M \times L\). Can be potentially larger than 3D but this has not beed tested yet.

  • target_function (python function) – Target (or objective) function. Takes a list of indices and returns a function value as a float.

  • acquisition_function (str or python function) – Acquisition function choise.’cb’ is confidence bound, ‘ei’ is expected improvement, ‘poi’ is a probability of improvement. One can also pass a custom function, which takes GP model, full grid and sparse grid as parameters and returns acquisition function values with GP prediction (mean + sd)

  • exploration_steps (int) – Number of exploration-exploitation steps (the exploitation-exploration trade-off is determined by the acquisition function)

  • batch_size (int) – Number of query points in one batch. Returns a single query point when batch_update parameter is set to False (default)

  • batch_update (bool) – Filters the query points based on the kernel lengthscale. Returns a batch of points when set to True. The number of points in the batch may be different from batch_size as points are filtered based on the lengthscale. In this case the random indices from the initial batch will be automatically added to the output. Defaults to False.

  • kernel (str) – Kernel type (‘RBF’, ‘Matern52’, ‘RationalQuadratic’). Defaults to ‘RBF’.

  • lengthscale (list of int or list of 2 lists with int) – Determines lower (1st value or 1st list) and upper (2nd value or 2nd list) bounds for kernel lengthscales. For list with two integers, the kernel will have only one lenghtscale, even if the dataset is multi-dimensional. For lists of two lists, the number of elements in each list must be equal to the dataset dimensionality.

  • sparse (bool) – Uses inducing points-based sparse GP regression when set to True. False by default.

  • indpoints (int) – Number of inducing points for SparseGPRegression. Defaults to total_number_of_points // 10.

  • learning_rate (float) – Learning rate for GP model training

  • gp_iterations (int) – Number of SVI training iteratons for GP model

  • seed (int) – for reproducibility

  • **alpha (float or int) – alpha coefficient in the ‘confidence bound’ acquisition function (Default: 0)

  • **beta (float or int) – beta coefficient in the ‘confidence bound’ acquisition function (Default: 1)

  • **xi (float) – xi coefficient in ‘expected improvement’ and ‘probability of improvement’ acquisition functions (Default: 0.01)

  • **use_gpu (bool) – Uses GPU hardware accelerator when set to ‘True’. Notice that for large datasets training model without GPU is extremely slow. At the same time, for small and (very) sparse data there may be no much benefit.

  • **precision (str) – “single” ot “double” floating point precision

  • **jitter (float) – Float between 1e-4 and 1e-6 for numerical stability

  • **isotropic (bool) – Use single lenghtscale for all dimensions in GP (Default: False)

  • **mask (ndarray) – Mask of ones and NaNs (NaNs are values that are not counted when searching for acquisition function maximum).

  • **dscale (int) – Distance parameter used in boptimizer.checkvalues. It is used in conjuction with ‘gamma’ and ‘memory’ parameters to select the next query point using the information about the location of previous points. It can be understood as a short-term memory, where the point at step \(s\) is chosen no closer than \(d\) to the point selected at step \(s-1\), \(\gamma d\) to the point selected at step \(s-2\), \(\gamma^2 d\) at step \(s-3\), and so on. Defaults to 0.

  • **batch_dscale (int) – Distance paramater used in boptimizer.update_points. It is used to return a batch of points for a given step, which are no closer to each other than batch_dscale value. Defauts to kernel average lenghtscale at this step. When used in combination with dscale, it will select the first queiry point in the batch based on the information about the location of previous points. THe remianing points will be selected based on kernel lengthscale value without taking into account the information about the already visited points.

  • **batch_out_max (int) – maximum number of points to output when batch_update=True. Note that the number of output points can be smaller than this value.

  • **gamma (float) – gamma coefficient, value between 0 and 1. Used in boptimizer.checkvalues together with a ‘dscale’ parameter to determine how close the next query point can be to the previously selected points.

  • **memory (int) – Number of previously selected points to account for when using ‘dscale’.

  • **exit_strategy (0 or 1) – Exit strategy for boptimizer.checkvalues when none of the points satisfies the imposed selection criteria. 1 means that a random value will be chosen, while 0 means that the last point in the list will be chosen (the length of the list is defined by ‘batch_size’ parameter)

  • **extent (list of lists) – Define bounds for multi-dimensional data. For example, for 2D data, the extent parameter is [[xmin, xmax], [ymin, ymax]]

  • **save_checkpoints (bool) – save results to disk after each iteration

  • **filename (str) – name of a file to save checkpoints to

  • **verbose (int) – Level of verbosity (0, 1, or 2)

update_posterior()

Updates GP posterior

evaluate_function(indices, y_measured=None)

Evaluates target function in the new point(s)

next_point()

Calculates next query point(s)

update_points(acqfunc_values, indices, dscale)

Takes arrays of query points (indices and values) corresponding to first n max values of the acquisition function and returns a batch two lists with updated query points (indices and values) whose separation distance is determined by kernel lengthscale. Notice that the updated lists will contain less points than the original ones.

Parameters
  • acqfunc_values (ndarray) – (N,) numpy array with values of acquisition function

  • indices (ndarray) – (N, c) numpy array with corresponding indices, where c ia a number of dimensions of the dataset

  • dscale (float) – kernel lengthscale

Returns

Tuple with computed indices and corresponding values

checkvalues(idx_list, val_list)

Checks if a current point was already used or if the euclidian distances from the current point to the previous points are above a value controlled by parameter ‘alpha’, and if so then it selects the next-in-line point.

Parameters
  • idx_list (list of lists with ints) – Indices for max values of acquisition function; the list is ordered (max -> min)

  • val_list (list with floats) – Standard deviation values for each index in idx_list

Returns

The first element in the input list (idx_list) if no previous occurences found. Otherwise, returns the next/closest value from the list.

single_step(*args)

Single Bayesian optimization step

run()

Run GP-based Bayesian optimization loop

save_results(*args)

Save indermediary and/or final results

acqfunc.py

Acquisition functions

confidence_bound(gpmodel, X_full, **kwargs)

Confidence bound acquisition function (a modification of upper confidence bound)

Parameters
  • gpmodel (gpim reconstructor object) – Surrogate function that allows computing mean and standard deviation

  • X_full (ndarray) – Full grid indices

  • **alpha (float) – \(\alpha\) coefficient in \(\alpha \mu + \beta \sigma\)

  • **beta (float) – \(\beta\) coefficient in \(\alpha \mu + \beta \sigma\)

Returns

Acquisition function values and GP prediction (mean + stddev)

expected_improvement(gpmodel, X_full, X_sparse, **kwargs)

Expected improvement acquisition function

Parameters
  • gpmodel (gpim reconstructor object) – Surrogate function that allows computing mean and standard deviation

  • X_full (ndarray) – Full grid indices

  • X_sparse (ndarray) – Sparse grid indices

  • **xi (float) – xi constant value

Returns

Acquisition function values and GP prediction (mean + stddev)

probability_of_improvement(gpmodel, X_full, X_sparse, **kwargs)

Probability of improvement acquisition function

Parameters
  • gpmodel (gpim reconstructor object) – Surrogate function that allows computing mean and standard deviation

  • X_full (ndarray) – Full grid indices

  • X_sparse (ndarray) – Sparse grid indices

  • **xi (float) – xi constant value

Returns

Acquisition function values and GP prediction (mean + stddev)