GP for 2D-4D images¶
gpr.py¶
Gaussian process regression: model training, prediction and uncertainty exploration. This module serves as a high-level wrapper for the sparse Gaussian process module from the Pyro probabilistic programming library (https://pyro.ai/) for easy work with scientific image (2D) and hyperspectral (3D) data. Author: Maxim Ziatdinov (email: maxim.ziatdinov@ai4microcopy.com)

class
reconstructor
(X, y, Xtest=None, kernel='RBF', lengthscale=None, sparse=False, indpoints=None, learning_rate=0.05, iterations=1000, use_gpu=False, verbose=1, seed=0, **kwargs)¶ Class for Gaussian process-based reconstruction of sparse 2D images and 3D spectroscopic datasets, and exploration/exploitation routines for the selection of the next query point.
 Parameters
X (ndarray) – Grid indices with dimensions \(c \times N \times M\) or \(c \times N \times M \times L\) where c is equal to the number of coordinates (for example, for xyz coordinates, c = 3)
y (ndarray) – Observations (data points) with dimensions \(N \times M\) or \(N \times M \times L\). Typically, for a 2D image N and M are image height and width, whereas for 3D hyperspectral data N and M are spatial dimensions and L is a spectroscopic dimension (e.g. voltage or wavelength).
Xtest (ndarray) – “Test” points (for prediction with a trained GP model) with dimensions \(N \times M\) or \(N \times M \times L\)
kernel (str) – Kernel type (‘RBF’, ‘Matern52’, ‘RationalQuadratic’)
lengthscale (list of int or list of two lists with int) – Determines lower (1st value or 1st list) and upper (2nd value or 2nd list) bounds for kernel lengthscales. For a list with two integers, the kernel will have only one lengthscale, even if the dataset is multidimensional. For a list of two lists, the number of elements in each list must be equal to the dataset dimensionality.
sparse (bool) – Perform sparse GP regression when set to True.
indpoints (int) – Number of inducing points for SparseGPRegression. Defaults to total_number_of_points // 10.
learning_rate (float) – Learning rate for model training
iterations (int) – Number of SVI training iterations
use_gpu (bool) – Uses GPU hardware accelerator when set to ‘True’. Note that for large datasets, training a model without a GPU is extremely slow.
verbose (int) – Level of verbosity (0, 1, or 2)
seed (int) – for reproducibility
**amplitude (float) – kernel variance or amplitude squared
**precision (str) – Choose between single (‘single’) and double (‘double’) precision
**jitter (float) – Float between 1e-4 and 1e-6 for numerical stability
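
The input shapes described above can be sketched with NumPy as follows. This is only an illustration: the image size, the fraction of missing pixels, the bound values and all variable names are made up, and marking unmeasured pixels with NaN is an assumption about how sparse observations are encoded, which the parameter list does not state.

```python
import numpy as np

# Illustrative 2D "image": N x M observations (height x width).
N, M = 32, 48
rng = np.random.default_rng(0)
y = rng.normal(size=(N, M))

# Assumption: unmeasured pixels are marked with NaN.
mask = rng.random((N, M)) < 0.7        # hide roughly 70% of the pixels
y_sparse = y.copy()
y_sparse[mask] = np.nan

# Grid indices with shape c x N x M; c = 2 for a 2D image.
X = np.mgrid[:N, :M].astype(float)

# Lengthscale bounds, following the two documented forms:
lengthscale_shared = [1, 20]                # one lengthscale for all dimensions
lengthscale_per_dim = [[1, 1], [20, 20]]    # lower and upper bounds per dimension

print(X.shape, y_sparse.shape)
```

Arrays prepared this way would then be passed as `X`, `y` and `lengthscale`; the per-dimension form needs as many entries in each inner list as there are coordinates in `X`.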

train
(**kwargs)¶ Training sparse GP regression model
 Parameters
**learning_rate (float) – learning rate
**iterations (int) – number of SVI training iterations

predict
(Xtest=None, **kwargs)¶ Uses trained GP regression model to make predictions
 Parameters
Xtest (ndarray) – “Test” points (for prediction with a trained GP model) with dimensions \(N \times M\) or \(N \times M \times L\). Uses Xtest from __init__ by default. If Xtest is None, uses training data X.
 Returns
Predictive mean and standard deviation

run
(**kwargs)¶ Trains the initialized model and calculates predictive mean and variance
 Parameters
**learning_rate (float) – learning rate for GP regression model training
**steps (int) – number of SVI training iterations
 Returns
Predictive mean, standard deviation and dictionary with hyperparameters evolution as a function of SVI steps

step
(acquisition_function=None, batch_size=100, batch_update=False, lscale=None, **kwargs)¶ Performs a single train-predict step for exploration analysis, returning a new point with the maximum value of the acquisition function
 Parameters
acquisition_function (python function) – Function that takes two parameters, mean and sd, and applies some math operation to them (e.g. \(\mu - 2 \times \sigma\))
batch_size (int) – Number of query points to return
batch_update – Filters the query points based on the specified lengthscale
lscale (float) – Lengthscale determining the separation (euclidean) distance between query points. Defaults to the kernel lengthscale
**learning_rate (float) – Learning rate for GP regression model training
**steps (int) – Number of SVI training iterations
 Returns
Lists of indices and values for points with maximum uncertainty, predictive mean and standard deviation (as flattened numpy arrays)
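
As a rough sketch of the idea behind this method, the snippet below defines an acquisition function of the documented two-argument form and picks the grid points where it is largest. The helper `top_queries`, its arguments and the toy numbers are hypothetical; this is not the library's implementation, only an illustration of ranking points by an acquisition value computed from a flattened mean and standard deviation.

```python
import numpy as np

def acquisition(mean, sd):
    """Example acquisition function of the form mu - 2 * sigma."""
    return mean - 2 * sd

def top_queries(mean, sd, shape, batch_size=3):
    """Return grid indices and values of the largest acquisition values."""
    acq = acquisition(np.asarray(mean), np.asarray(sd))
    order = np.argsort(acq)[::-1][:batch_size]      # flat indices, best first
    indices = np.column_stack(np.unravel_index(order, shape))
    return indices, acq[order]

# Flattened predictive mean and standard deviation for a 2 x 3 grid.
mean = np.array([0.0, 1.0, 5.0, 2.0, 3.0, 4.0])
sd = np.array([0.5, 0.1, 1.0, 0.2, 0.1, 2.0])
indices, values = top_queries(mean, sd, shape=(2, 3), batch_size=2)
print(indices, values)
```

Any function with the same `(mean, sd)` signature could be swapped in for `acquisition`, for example one returning `sd` alone for pure uncertainty-driven exploration.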
skgpr.py¶
Gaussian process regression model with a structured kernel interpolation or a spectral mixture kernel. Serves as a high-level wrapper for GPyTorch’s (https://gpytorch.ai) Gaussian process modules with structured kernel interpolation and spectral mixture kernel methods.
Author: Maxim Ziatdinov (email: maxim.ziatdinov@ai4microcopy.com)

class
skreconstructor
(X, y, Xtest=None, kernel='RBF', lengthscale=None, ski=True, learning_rate=0.1, iterations=50, use_gpu=1, verbose=1, seed=0, **kwargs)¶ GP regression model with structured kernel interpolation or spectral mixture kernel for 2D/3D/4D image data reconstruction
 Parameters
X (ndarray) – Grid indices with dimension \(c \times N \times M\), \(c \times N \times M \times L\) or \(c \times N \times M \times L \times K\), where c is equal to the number of coordinates (for example, for xyz coordinates, c = 3)
y (ndarray) – Observations (data points) with dimension \(N \times M\), \(N \times M \times L\) or \(N \times M \times L \times K\). Typically, for 2D image N and M are image height and width. For 3D hyperspectral data N and M are spatial dimensions and L is a “spectroscopic” dimension (e.g. voltage or wavelength). For 4D datasets, both L and K are “spectroscopic” dimensions.
Xtest (ndarray) – “Test” points (for prediction with a trained GP model) with dimension \(N \times M\), \(N \times M \times L\) or \(N \times M \times L \times K\)
kernel (str) – Kernel type (‘RBF’ or ‘Matern52’)
lengthscale (list of int or list of two lists with int) – Determines lower (1st list) and upper (2nd list) bounds for kernel lengthscales. The number of elements in each list is equal to the dataset dimensionality.
ski (bool) – Perform structured kernel interpolation GP. Set to True by default.
iterations (int) – Number of training steps
learning_rate (float) – Learning rate for model training
use_gpu (bool) – Uses GPU hardware accelerator when set to ‘True’
verbose (int) – Level of verbosity (0, 1, or 2)
seed (int) – for reproducibility
**grid_points_ratio (float) – Ratio of inducing points to overall points
**n_mixtures (int) – number of mixtures for spectral mixture kernel
**isotropic (bool) – one kernel lengthscale in all dimensions
**max_root (int) – Maximum number of Lanczos iterations to perform in prediction stage
**num_batches (int) – Number of batches for splitting the Xtest array (for large datasets, you may not have enough GPU memory to process the entire dataset at once)
**precision (str) – Choose between single (‘single’) and double (‘double’) precision
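
The purpose of `num_batches` can be illustrated with a small NumPy sketch: the test points are split into chunks, each chunk is predicted separately, and the results are concatenated. The names `predict_in_batches` and `fake_predict` are hypothetical stand-ins, and the flattened \(n \times c\) layout of the test array is an assumption; the library's own batching may differ.

```python
import numpy as np

def predict_in_batches(predict_fn, Xtest_flat, num_batches):
    """Apply `predict_fn` to chunks of rows and concatenate the results."""
    chunks = np.array_split(Xtest_flat, num_batches)
    return np.concatenate([predict_fn(chunk) for chunk in chunks])

def fake_predict(x):
    """Stand-in for a trained model's prediction routine (hypothetical)."""
    return x.sum(axis=1)

Xtest_flat = np.arange(20, dtype=float).reshape(10, 2)   # n x c test points
mean = predict_in_batches(fake_predict, Xtest_flat, num_batches=3)
print(mean.shape)
```

`np.array_split` accepts batch counts that do not divide the number of rows evenly, so the chunks may differ in size by one row.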

train
(**kwargs)¶ Training GP regression model
 Parameters
**learning_rate (float) – learning rate
**iterations (int) – number of SVI training iterations

predict
(Xtest=None, **kwargs)¶ Makes a prediction with trained GP regression model
 Parameters
Xtest (ndarray) – “Test” points (for prediction with a trained GP model) with dimension \(N \times M\), \(N \times M \times L\) or \(N \times M \times L \times K\)
max_root (int) – Maximum number of Lanczos iterations to perform in prediction stage
num_batches (int) – Number of batches for splitting the Xtest array (for large datasets, you may not have enough GPU memory to process the entire dataset at once)

run
()¶ Combines train and predict methods

step
(acquisition_function=None, batch_size=100, batch_update=False, lscale=None, **kwargs)¶ Performs a single train-predict step and computes the next query point with the maximum value of the acquisition function. Note that it does not seem to work properly with a structured kernel.
 Parameters
acquisition_function (python function) – Function that takes two parameters, mean and sd, and applies some math operation to them (e.g. \(\mu - 2 \times \sigma\))
batch_size (int) – Number of query points to return
batch_update – Filters the query points based on the specified lengthscale
lscale (float) – Lengthscale determining the separation (euclidean) distance between query points. Defaults to the kernel lengthscale
**learning_rate (float) – Learning rate for GP regression model training
**steps (int) – Number of SVI training iterations
 Returns
Lists of indices and values for points with maximum uncertainty, predictive mean and standard deviation (as flattened numpy arrays)
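
The filtering that `batch_update` and `lscale` describe can be pictured as keeping only query points separated by at least `lscale` in Euclidean distance. The greedy rule below is one plausible way to do this and is an assumption, not the library's confirmed algorithm; `filter_by_distance` and the candidate coordinates are hypothetical.

```python
import math

def filter_by_distance(points, lscale):
    """Greedily keep points that are at least `lscale` apart.

    `points` is assumed to be ordered from best to worst acquisition value,
    so an earlier point always wins over a later one that is too close.
    """
    kept = []
    for p in points:
        if all(math.dist(p, q) >= lscale for q in kept):
            kept.append(p)
    return kept

candidates = [(10, 10), (11, 10), (10, 12), (20, 20), (21, 21)]
selected = filter_by_distance(candidates, lscale=3.0)
print(selected)
```

With a separation of 3.0, the candidates clustered around (10, 10) and (20, 20) collapse to one query point each.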

class
skgprmodel
(X, y, kernel, likelihood, input_dim=3, grid_points_ratio=1.0, do_ski=False)¶ GP regression model with structured kernel interpolation or spectral mixture kernel.
 Parameters
X (ndarray) – Grid indices with dimension \(n \times c\), where n is the number of observation points and c is equal to the number of coordinates (for example, for xyz coordinates, c = 3)
y (ndarray) – Observations (data points) with dimension n
kernel (gpytorch kernel object) – Kernel
likelihood (gpytorch likelihood object) – The Gaussian likelihood
input_dim (int) – Number of input dimensions (equal to number of feature vector columns)
grid_points_ratio (float) – Ratio of inducing points to overall points
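
The \(n \times c\) layout expected here differs from the \(c \times N \times M\) grid used by the reconstructor classes. A minimal sketch of the conversion is shown below; `grid_to_features` is a hypothetical helper, and dropping NaN-valued observations is an assumption about how missing data is handled.

```python
import numpy as np

def grid_to_features(X, y):
    """Flatten c x N x M (x ...) grid indices to n x c, dropping NaN targets."""
    c = X.shape[0]
    X_flat = X.reshape(c, -1).T        # one row per grid point, c columns
    y_flat = y.reshape(-1)             # same C-order as X_flat rows
    keep = ~np.isnan(y_flat)           # assumption: NaN marks a missing point
    return X_flat[keep], y_flat[keep]

X = np.mgrid[:3, :4].astype(float)     # 2 x 3 x 4 grid indices
y = np.arange(12, dtype=float).reshape(3, 4)
y[1, 2] = np.nan                       # one unobserved pixel
X_train, y_train = grid_to_features(X, y)
print(X_train.shape, y_train.shape)
```

The number of columns in the flattened array corresponds to `input_dim`.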

forward
(x)¶ Forward path
vgpr.py¶
Gaussian process regression model for vector-valued functions. Serves as a high-level wrapper for GPyTorch’s (https://gpytorch.ai) Gaussian processes with correlated and independent output dimensions. Author: Maxim Ziatdinov (email: maxim.ziatdinov@ai4microcopy.com)

class
vreconstructor
(X, y, Xtest=None, kernel='RBF', lengthscale=None, independent=False, learning_rate=0.1, iterations=50, use_gpu=1, verbose=1, seed=0, **kwargs)¶ Multi-output GP regression model for vector-valued 2D/3D/4D functions.
 Parameters
X (ndarray) – Grid indices with dimension \(c \times N \times M\), \(c \times N \times M \times L\) or \(c \times N \times M \times L \times K\), where c is equal to the number of coordinates (for example, for xyz coordinates, c = 3)
y (ndarray) – Observations (data points) with dimension \(N \times M \times d\), \(N \times M \times L \times d\) or \(N \times M \times L \times K \times d\), where d is the number of output dimensions. Typically, for a 2D image N and M are image height and width. For 3D hyperspectral data N and M are spatial dimensions and L is a “spectroscopic” dimension (e.g. voltage or wavelength). For 4D datasets, both L and K are “spectroscopic” dimensions.
Xtest (ndarray) – “Test” points (for prediction with a trained GP model) with dimension \(N \times M\), \(N \times M \times L\) or \(N \times M \times L \times K\)
kernel (str) – Kernel type (‘RBF’ or ‘Matern52’)
lengthscale (list of int or list of two lists with int) – Determines lower (1st list) and upper (2nd list) bounds for kernel lengthscales. The number of elements in each list is equal to the dataset dimensionality.
independent (bool) – Indicates whether output dimensions are independent or correlated
iterations (int) – Number of training steps
learning_rate (float) – Learning rate for model training
use_gpu (bool) – Uses GPU hardware accelerator when set to ‘True’
verbose (int) – Level of verbosity (0, 1, or 2)
seed (int) – for reproducibility
**isotropic (bool) – one kernel lengthscale in all dimensions
**max_root (int) – Maximum number of Lanczos iterations to perform in prediction stage
**num_batches (int) – Number of batches for splitting the Xtest array (for large datasets, you may not have enough GPU memory to process the entire dataset at once)

train
(**kwargs)¶ Training GP regression model
 Parameters
**learning_rate (float) – learning rate
**iterations (int) – number of SVI training iterations

predict
(Xtest=None, **kwargs)¶ Makes a prediction with trained GP regression model
 Parameters
Xtest (ndarray) – “Test” points (for prediction with a trained GP model) with dimension \(N \times M\), \(N \times M \times L\) or \(N \times M \times L \times K\)
max_root (int) – Maximum number of Lanczos iterations to perform in prediction stage
num_batches (int) – Number of batches for splitting the Xtest array (for large datasets, you may not have enough GPU memory to process the entire dataset at once)

run
()¶ Combines train and predict methods

class
vgprmodel
(X, y, kernel, likelihood, num_tasks)¶ GP regression model for vector-valued functions with correlated output dimensions
 Parameters
X (ndarray) – Grid indices with dimension \(n \times c\), where n is the number of observation points and c is equal to the number of coordinates (for example, for xyz coordinates, c = 3)
y (ndarray) – Observations (data points) with dimension \(n \times d\), where d is number of the function components
kernel (gpytorch kernel object) – ‘RBF’ or ‘Matern52’ kernels
likelihood (gpytorch likelihood object) – The Gaussian likelihood
num_tasks (int) – Number of tasks (equal to number of outputs)
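
For a vector-valued function, the \(n \times c\) inputs and \(n \times d\) observations can be obtained by flattening the grid while keeping the component axis last. The sketch below assumes that the observations are stored with the d components on the last axis; the toy vector field and all variable names are illustrative only.

```python
import numpy as np

N, M, d = 4, 5, 2

# Illustrative vector field on an N x M grid with d = 2 components
# (assumption: components are stored along the last axis).
rows, cols = np.mgrid[:N, :M]
y = np.stack([np.sin(rows), np.cos(cols)], axis=-1)   # N x M x d

X = np.mgrid[:N, :M].astype(float)                    # c x N x M
X_flat = X.reshape(X.shape[0], -1).T                  # n x c
y_flat = y.reshape(-1, d)                             # n x d

num_tasks = y_flat.shape[1]                           # one task per output
print(X_flat.shape, y_flat.shape, num_tasks)
```

Row k of `X_flat` and row k of `y_flat` refer to the same grid point, since both are flattened in the same row-major order.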

forward
(x)¶

class
ivgprmodel
(X, y, kernel, likelihood, num_tasks)¶ GP regression model for vector-valued functions with independent output dimensions
 Parameters
X (ndarray) – Grid indices with dimension \(n \times c\), where n is the number of observation points and c is equal to the number of coordinates (for example, for xyz coordinates, c = 3)
y (ndarray) – Observations (data points) with dimension \(n \times d\), where d is number of the function components
kernel (gpytorch kernel object) – ‘RBF’ or ‘Matern52’ kernels
likelihood (gpytorch likelihood object) – The Gaussian likelihood
num_tasks (int) – Number of tasks (equal to number of outputs)

forward
(x)¶