GP for 2D-4D images

gpr.py

Gaussian process regression: model training, prediction, and uncertainty exploration. This module serves as a high-level wrapper for the sparse Gaussian process modules of the Pyro probabilistic programming library (https://pyro.ai/) for easy work with scientific image (2D) and hyperspectral (3D) data. Author: Maxim Ziatdinov (email: maxim.ziatdinov@ai4microcopy.com)

class reconstructor(X, y, Xtest=None, kernel='RBF', lengthscale=None, sparse=False, indpoints=None, learning_rate=0.05, iterations=1000, use_gpu=False, verbose=1, seed=0, **kwargs)

Class for Gaussian process-based reconstruction of sparse 2D images and 3D spectroscopic datasets, and exploration/exploitation routines for the selection of the next query point. A usage sketch follows the parameter list below.

Parameters
  • X (ndarray) – Grid indices with dimensions \(c \times N \times M\) or \(c \times N \times M \times L\) where c is equal to the number of coordinates (for example, for xyz coordinates, c = 3)

  • y (ndarray) – Observations (data points) with dimensions \(N \times M\) or \(N \times M \times L\). Typically, for a 2D image N and M are the image height and width, whereas for 3D hyperspectral data N and M are spatial dimensions and L is a spectroscopic dimension (e.g. voltage or wavelength).

  • Xtest (ndarray) – “Test” points (for prediction with a trained GP model) with dimensions \(N \times M\) or \(N \times M \times L\)

  • kernel (str) – Kernel type (‘RBF’, ‘Matern52’, ‘RationalQuadratic’)

  • lengthscale (list of int or list of two lists with int) – Determines lower (1st value or 1st list) and upper (2nd value or 2nd list) bounds for kernel lengthscales. For a list with two integers, the kernel will have only one lengthscale, even if the dataset is multi-dimensional. For a list of two lists, the number of elements in each list must be equal to the dataset dimensionality.

  • sparse (bool) – Perform sparse GP regression when set to True.

  • indpoints (int) – Number of inducing points for SparseGPRegression. Defaults to total_number_of_points // 10.

  • learning_rate (float) – Learning rate for model training

  • iterations (int) – Number of SVI training iterations

  • use_gpu (bool) – Uses a GPU hardware accelerator when set to True. Note that for large datasets, training the model without a GPU is extremely slow.

  • verbose (int) – Level of verbosity (0, 1, or 2)

  • seed (int) – for reproducibility

  • **amplitude (float) – kernel variance or amplitude squared

  • **precision (str) – Choose between single (‘single’) and double (‘double’) precision

  • **jitter (float) – Float between 1e-6 and 1e-4 for numerical stability
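
The following minimal sketch shows how this class might be instantiated on a sparsely sampled 2D image. The import path and the use of NaN to mark unmeasured pixels are assumptions (check your installation and data-preparation conventions); everything else follows the parameters documented above:

    import numpy as np
    from gpim.gpr import reconstructor  # assumed import path; adjust to your installation

    # Synthetic 2D image (N x M = 50 x 50) with ~70% of pixels removed.
    # Marking unmeasured pixels with NaN is an assumption about the expected input format.
    np.random.seed(0)
    img = (np.sin(np.linspace(0, 4 * np.pi, 50))[:, None]
           * np.cos(np.linspace(0, 4 * np.pi, 50))[None, :])
    sparse_img = img.copy()
    sparse_img[np.random.rand(*img.shape) > 0.3] = np.nan

    # Grid indices with dimensions c x N x M (c = 2 coordinates for a 2D image)
    X = np.asarray(np.meshgrid(np.arange(50), np.arange(50), indexing='ij'), dtype=float)

    gp = reconstructor(
        X, sparse_img, Xtest=X,              # predict back onto the same (full) grid
        kernel='RBF',
        lengthscale=[[1., 1.], [10., 10.]],  # per-dimension lower and upper bounds
        sparse=True, indpoints=250,          # sparse GP with 250 inducing points
        learning_rate=0.05, iterations=500,
        use_gpu=False, verbose=1, seed=0)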

train(**kwargs)

Trains the sparse GP regression model

Parameters
  • **learning_rate (float) – learning rate

  • **iterations (int) – number of SVI training iterations

predict(Xtest=None, **kwargs)

Uses the trained GP regression model to make predictions

Parameters

Xtest (ndarray) – “Test” points (for prediction with a trained GP model) with dimensions \(N \times M\) or \(N \times M \times L\). Defaults to the Xtest passed at initialization; if that is also None, the training data X is used.

Returns

Predictive mean and standard deviation
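
Continuing the sketch above (gp is the reconstructor instance created earlier), a denser test grid can be passed to predict() to interpolate between measured pixels. The unpacking into mean and standard deviation follows the Returns description; the final reshape assumes a flattened return (as documented for step) and can be dropped if your version already returns grid-shaped arrays:

    # Predict on a 2x denser grid after training
    X_dense = np.asarray(
        np.meshgrid(np.arange(0, 50, 0.5), np.arange(0, 50, 0.5), indexing='ij'),
        dtype=float)
    gp.train()
    mean_dense, sd_dense = gp.predict(Xtest=X_dense)
    mean_img = mean_dense.reshape(100, 100)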

run(**kwargs)

Trains the initialized model and calculates the predictive mean and variance

Parameters
  • **learning_rate (float) – learning rate for GP regression model training

  • **steps (int) – number of SVI training iterations

Returns

Predictive mean, standard deviation, and a dictionary with the evolution of hyperparameters as a function of SVI steps
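
As a sketch, run() collapses the separate train() and predict() calls above into a single call; the unpacking follows the Returns description:

    # One-call alternative to train() + predict()
    mean, sd, hyperparams = gp.run()
    # hyperparams: evolution of the kernel hyperparameters over the SVI steps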

step(acquisition_function=None, batch_size=100, batch_update=False, lscale=None, **kwargs)

Performs a single train-predict step for exploration analysis, returning a new point with the maximum value of the acquisition function

Parameters
  • acquisition_function (python function) – Function that takes two parameters, mean and sd, and applies some math operation to them (e.g. \(\mu - 2\sigma\))

  • batch_size (int) – Number of query points to return

  • batch_update (bool) – Filters the query points based on the specified lengthscale

  • lscale (float) – Lengthscale determining the (Euclidean) separation distance between query points. Defaults to the kernel lengthscale

  • **learning_rate (float) – Learning rate for GP regression model training

  • **steps (int) – Number of SVI training iterations

Returns

Lists of indices and values for points with maximum uncertainty, predictive mean and standard deviation (as flattened numpy arrays)
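
A sketch of how step() might be used in an exploration setting, using gp from the earlier example. The acquisition function here (pure uncertainty) is illustrative, and the four-way unpacking of the return value follows the Returns description above:

    # Acquisition: explore points with the largest predictive uncertainty
    def acq(mean, sd):
        return sd

    indices, values, mean, sd = gp.step(
        acquisition_function=acq, batch_size=10,
        batch_update=True, lscale=2.0)
    # indices/values: proposed query points and their acquisition values;
    # mean/sd: flattened predictive mean and standard deviation

In an active-learning experiment one would then measure the proposed points, add them to the observations y, and repeat the step.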

skgpr.py

Gaussian process regression model with structured kernel interpolation or a spectral mixture kernel. Serves as a high-level wrapper for GPyTorch’s (https://gpytorch.ai) Gaussian process modules with structured kernel interpolation and spectral mixture kernel methods.

Author: Maxim Ziatdinov (email: maxim.ziatdinov@ai4microcopy.com)

class skreconstructor(X, y, Xtest=None, kernel='RBF', lengthscale=None, ski=True, learning_rate=0.1, iterations=50, use_gpu=1, verbose=1, seed=0, **kwargs)

GP regression model with structured kernel interpolation or spectral mixture kernel for 2D/3D/4D image data reconstruction

Parameters
  • X (ndarray) – Grid indices with dimension \(c \times N \times M\), \(c \times N \times M \times L\) or \(c \times N \times M \times L \times K\), where c is equal to the number of coordinates (for example, for xyz coordinates, c = 3)

  • y (ndarray) – Observations (data points) with dimension \(N \times M\), \(N \times M \times L\) or \(N \times M \times L \times K\). Typically, for a 2D image N and M are the image height and width. For 3D hyperspectral data N and M are spatial dimensions and L is a “spectroscopic” dimension (e.g. voltage or wavelength). For 4D datasets, both L and K are “spectroscopic” dimensions.

  • Xtest (ndarray) – “Test” points (for prediction with a trained GP model) with dimension \(N \times M\), \(N \times M \times L\) or \(N \times M \times L \times K\)

  • kernel (str) – Kernel type (‘RBF’ or ‘Matern52’)

  • lengthscale (list of two lists with int) – Determines lower (1st list) and upper (2nd list) bounds for kernel lengthscales. The number of elements in each list is equal to the dataset dimensionality.

  • ski (bool) – Perform structured kernel interpolation GP. Set to True by default.

  • iterations (int) – Number of training steps

  • learning_rate (float) – Learning rate for model training

  • use_gpu (bool) – Uses a GPU hardware accelerator when set to True

  • verbose (int) – Level of verbosity (0, 1, or 2)

  • seed (int) – for reproducibility

  • **grid_points_ratio (float) – Ratio of inducing points to overall points

  • **n_mixtures (int) – number of mixtures for spectral mixture kernel

  • **isotropic (bool) – Use a single kernel lengthscale for all dimensions

  • **max_root (int) – Maximum number of Lanczos iterations to perform in the prediction stage

  • **num_batches (int) – Number of batches for splitting the Xtest array (for large datasets, you may not have enough GPU memory to process the entire dataset at once)

  • **precision (str) – Choose between single (‘single’) and double (‘double’) precision
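
A minimal sketch for a sparse 3D hyperspectral dataset with structured kernel interpolation. The import path, the NaN convention for unmeasured points, and the assumption that predict() returns a mean and a standard deviation (as in gpr.predict above) are not confirmed by this page:

    import numpy as np
    from gpim.skgpr import skreconstructor  # assumed import path

    # Sparse 3D hyperspectral cube (N x M x L); unmeasured points set to NaN (assumption)
    N, M, L = 32, 32, 16
    cube = np.random.randn(N, M, L)
    cube[np.random.rand(N, M, L) > 0.3] = np.nan

    # Grid indices with dimension c x N x M x L (c = 3)
    X = np.asarray(np.meshgrid(np.arange(N), np.arange(M), np.arange(L),
                               indexing='ij'), dtype=float)

    skgp = skreconstructor(
        X, cube, Xtest=X, kernel='Matern52',
        lengthscale=[[1., 1., 1.], [16., 16., 8.]],
        ski=True, grid_points_ratio=0.25,
        learning_rate=0.1, iterations=50, use_gpu=False, verbose=1)

    skgp.train()
    mean, sd = skgp.predict()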

train(**kwargs)

Trains the GP regression model

Parameters
  • **learning_rate (float) – learning rate

  • **iterations (int) – number of SVI training iterations

predict(Xtest=None, **kwargs)

Makes a prediction with the trained GP regression model

Parameters
  • Xtest (ndarray) – “Test” points (for prediction with a trained GP model) with dimension \(N \times M\), \(N \times M \times L\) or \(N \times M \times L \times K\)

  • max_root (int) – Maximum number of Lanczos iterations to perform in the prediction stage

  • num_batches (int) – Number of batches for splitting the Xtest array (for large datasets, you may not have enough GPU memory to process the entire dataset at once)

run()

Combines the train and predict methods

step(acquisition_function=None, batch_size=100, batch_update=False, lscale=None, **kwargs)

Performs a single train-predict step and computes the next query point with the maximum value of the acquisition function. Notice that it doesn’t seem to work properly with a structured kernel.

Parameters
  • acquisition_function (python function) – Function that takes two parameters, mean and sd, and applies some math operation to them (e.g. \(\mu - 2\sigma\))

  • batch_size (int) – Number of query points to return

  • batch_update (bool) – Filters the query points based on the specified lengthscale

  • lscale (float) – Lengthscale determining the (Euclidean) separation distance between query points. Defaults to the kernel lengthscale

  • **learning_rate (float) – Learning rate for GP regression model training

  • **steps (int) – Number of SVI training iterations

Returns

Lists of indices and values for points with maximum uncertainty, predictive mean and standard deviation (as flattened numpy arrays)

class skgprmodel(X, y, kernel, likelihood, input_dim=3, grid_points_ratio=1.0, do_ski=False)

GP regression model with structured kernel interpolation or spectral mixture kernel.

Parameters
  • X (ndarray) – Grid indices with dimension \(n \times c\), where n is the number of observation points and c is equal to the number of coordinates (for example, for xyz coordinates, c = 3)

  • y (ndarray) – Observations (data points) with dimension n

  • kernel (gpytorch kernel object) – Kernel

  • likelihood (gpytorch likelihood object) – The Gaussian likelihood

  • input_dim (int) – Number of input dimensions (equal to number of feature vector columns)

  • grid_points_ratio (float) – Ratio of inducing points to overall points

forward(x)

Forward pass
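
For users assembling the model directly, the sketch below builds skgprmodel from GPyTorch kernel and likelihood objects following the documented signature. Passing torch tensors (standard GPyTorch practice) instead of ndarrays, the choice of kernel, and the import path are assumptions:

    import torch
    import gpytorch
    from gpim.skgpr import skgprmodel  # assumed import path

    # n = 500 observation points with c = 3 coordinates each, and n observations
    train_x = torch.randn(500, 3)
    train_y = torch.randn(500)

    kernel = gpytorch.kernels.RBFKernel(ard_num_dims=3)
    likelihood = gpytorch.likelihoods.GaussianLikelihood()

    model = skgprmodel(train_x, train_y, kernel, likelihood,
                       input_dim=3, grid_points_ratio=1.0, do_ski=True)

    # forward(x) is invoked through model(x) and returns a distribution
    # over the inputs (standard GPyTorch convention)
    model.eval(); likelihood.eval()
    with torch.no_grad():
        pred = likelihood(model(torch.randn(20, 3)))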

vgpr.py

Gaussian process regression model for vector-valued functions. Serves as a high-level wrapper for GPyTorch’s (https://gpytorch.ai) Gaussian processes with correlated and independent output dimensions. Author: Maxim Ziatdinov (email: maxim.ziatdinov@ai4microcopy.com)

class vreconstructor(X, y, Xtest=None, kernel='RBF', lengthscale=None, independent=False, learning_rate=0.1, iterations=50, use_gpu=1, verbose=1, seed=0, **kwargs)

Multi-output GP regression model for vector-valued 2D/3D/4D functions.

Parameters
  • X (ndarray) – Grid indices with dimension \(c \times N \times M\), \(c \times N \times M \times L\) or \(c \times N \times M \times L \times K\), where c is equal to the number of coordinates (for example, for xyz coordinates, c = 3)

  • y (ndarray) – Observations (data points) with dimension \(N \times M \times d\), \(N \times M \times L \times d\) or \(N \times M \times L \times K \times d\), where d is the number of output dimensions. Typically, for a 2D image N and M are the image height and width. For 3D hyperspectral data N and M are spatial dimensions and L is a “spectroscopic” dimension (e.g. voltage or wavelength). For 4D datasets, both L and K are “spectroscopic” dimensions.

  • Xtest (ndarray) – “Test” points (for prediction with a trained GP model) with dimension \(N \times M\), \(N \times M \times L\) or \(N \times M \times L \times K\)

  • kernel (str) – Kernel type (‘RBF’ or ‘Matern52’)

  • lengthscale (list of two lists with int) – Determines lower (1st list) and upper (2nd list) bounds for kernel lengthscales. The number of elements in each list is equal to the dataset dimensionality.

  • independent (bool) – Indicates whether output dimensions are independent or correlated

  • iterations (int) – Number of training steps

  • learning_rate (float) – Learning rate for model training

  • use_gpu (bool) – Uses a GPU hardware accelerator when set to True

  • verbose (int) – Level of verbosity (0, 1, or 2)

  • seed (int) – for reproducibility

  • **isotropic (bool) – Use a single kernel lengthscale for all dimensions

  • **max_root (int) – Maximum number of Lanczos iterations to perform in the prediction stage

  • **num_batches (int) – Number of batches for splitting the Xtest array (for large datasets, you may not have enough GPU memory to process the entire dataset at once)
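
A sketch for vector-valued 2D data with d = 2 correlated output channels. As before, the import path and the NaN convention are assumptions, and predict() is assumed to return a predictive mean and standard deviation as in gpr.predict; the y layout follows the parameter description above:

    import numpy as np
    from gpim.vgpr import vreconstructor  # assumed import path

    # Vector-valued 2D data: N x M x d, with unmeasured pixels set to NaN (assumption)
    N, M, d = 40, 40, 2
    y = np.random.randn(N, M, d)
    y[np.random.rand(N, M) > 0.3] = np.nan   # mask whole pixels across both channels

    # Grid indices: c x N x M (c = 2)
    X = np.asarray(np.meshgrid(np.arange(N), np.arange(M), indexing='ij'), dtype=float)

    vgp = vreconstructor(
        X, y, Xtest=X, kernel='RBF',
        lengthscale=[[1., 1.], [15., 15.]],
        independent=False,                   # model correlated output dimensions
        learning_rate=0.1, iterations=100, use_gpu=False)

    vgp.train()
    mean, sd = vgp.predict()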

train(**kwargs)

Trains the GP regression model

Parameters
  • **learning_rate (float) – learning rate

  • **iterations (int) – number of SVI training iterations

predict(Xtest=None, **kwargs)

Makes a prediction with the trained GP regression model

Parameters
  • Xtest (ndarray) – “Test” points (for prediction with a trained GP model) with dimension \(N \times M\), \(N \times M \times L\) or \(N \times M \times L \times K\)

  • max_root (int) – Maximum number of Lanczos iterations to perform in the prediction stage

  • num_batches (int) – Number of batches for splitting the Xtest array (for large datasets, you may not have enough GPU memory to process the entire dataset at once)

run()

Combines the train and predict methods

class vgprmodel(X, y, kernel, likelihood, num_tasks)

GP regression model for vector-valued functions with correlated output dimensions

Parameters
  • X (ndarray) – Grid indices with dimension \(n \times c\), where n is the number of observation points and c is equal to the number of coordinates (for example, for xyz coordinates, c = 3)

  • y (ndarray) – Observations (data points) with dimension \(n \times d\), where d is the number of function components

  • kernel (gpytorch kernel object) – ‘RBF’ or ‘Matern52’ kernels

  • likelihood (gpytorch likelihood object) – The Gaussian likelihood

  • num_tasks (int) – Number of tasks (equal to number of outputs)
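
A sketch of constructing vgprmodel directly from the documented signature. The use of torch tensors and of GPyTorch's MultitaskGaussianLikelihood follows standard GPyTorch multitask conventions and, like the import path, is an assumption:

    import torch
    import gpytorch
    from gpim.vgpr import vgprmodel  # assumed import path

    n, c, d = 200, 2, 3          # n points, c coordinates, d output components (tasks)
    train_x = torch.randn(n, c)
    train_y = torch.randn(n, d)

    kernel = gpytorch.kernels.RBFKernel(ard_num_dims=c)
    likelihood = gpytorch.likelihoods.MultitaskGaussianLikelihood(num_tasks=d)

    model = vgprmodel(train_x, train_y, kernel, likelihood, num_tasks=d)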

forward(x)

class ivgprmodel(X, y, kernel, likelihood, num_tasks)

GP regression model for vector-valued functions with independent output dimensions

Parameters
  • X (ndarray) – Grid indices with dimension \(n \times c\), where n is the number of observation points and c is equal to the number of coordinates (for example, for xyz coordinates, c = 3)

  • y (ndarray) – Observations (data points) with dimension \(n \times d\), where d is the number of function components

  • kernel (gpytorch kernel object) – ‘RBF’ or ‘Matern52’ kernels

  • likelihood (gpytorch likelihood object) – The Gaussian likelihood

  • num_tasks (int) – Number of tasks (equal to number of outputs)

forward(x)