Utility functions
gprutils.py
Utility functions for the analysis of sparse image and hyperspectral data with Gaussian processes.
Author: Maxim Ziatdinov (email: maxim.ziatdinov@ai4microcopy.com)

prepare_training_data(X, y=None, vector_valued=False, **kwargs)
Reshapes and converts data to torch tensors for GP analysis
 Parameters
X (ndarray) – Grid indices with dimensions \(c \times N \times M \times L\), where c is equal to the number of coordinates (for example, for xyz coordinates, c = 3)
y (ndarray) – Observations (data points) with dimensions \(N \times M \times L\)
**precision (str) – Choose between single (‘single’) and double (‘double’) precision
 Returns
PyTorch tensors with dimensions \(N \times M \times L \times c\) and \(N \times M \times L\)
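
The reshaping described above can be sketched with NumPy alone. This is not the library's implementation: the helper name `flatten_grid` is hypothetical, the sketch assumes that the product notation in the Returns entry denotes flattened arrays of shape (N·M·L, c) and (N·M·L,), and it stops short of the conversion to torch tensors.

```python
import numpy as np

def flatten_grid(X, y=None, precision="double"):
    """Hypothetical helper: flatten a (c, N, M, L) grid to (N*M*L, c)."""
    dtype = np.float64 if precision == "double" else np.float32
    c = X.shape[0]
    # Collapse the spatial axes, then move the coordinate axis last
    X_flat = X.reshape(c, -1).T.astype(dtype)
    # Flatten observations in the same (row-major) order as the grid
    y_flat = None if y is None else y.reshape(-1).astype(dtype)
    return X_flat, y_flat

# Toy xyz grid (c = 3) with N, M, L = 4, 5, 6
X = np.stack(np.meshgrid(np.arange(4.0), np.arange(5.0), np.arange(6.0),
                         indexing="ij"))
y = np.random.rand(4, 5, 6)
X_flat, y_flat = flatten_grid(X, y)
print(X_flat.shape, y_flat.shape)
```

Because both arrays are flattened in the same order, row k of `X_flat` holds the coordinates of observation `y_flat[k]`.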

prepare_test_data(X, **kwargs)
Reshapes and converts data to torch tensors for GP analysis
 Parameters
X (ndarray) – Grid indices with dimensions \(c \times N \times M \times L\) where c is equal to the number of coordinates (for example, for xyz coordinates, c = 3)
**precision (str) – Choose between single (‘single’) and double (‘double’) precision
 Returns
PyTorch tensor with dimensions \(N \times M \times L \times c\)

get_grid_indices(R, dense_x=1.0)
Returns full and sparse grid indices for 2D and 3D arrays
 Parameters
R (ndarray) – Sparse grid measurements as 2D or 3D numpy array
dense_x (float) – Determines grid density (can be increased at prediction stage)

get_full_grid(R, extent=None, dense_x=1.0)
Creates grid indices for 2D-4D numpy arrays
 Parameters
R (ndarray) – Grid measurements as 2D-4D numpy array
extent (list of lists) – Define multidimensional data bounds. For example, for 2D data, the extent parameter is [[xmin, xmax], [ymin, ymax]]
dense_x (float) – Determines grid density (can be increased at prediction stage)
 Returns
Grid indices as numpy array
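
As a rough illustration of what such a grid looks like, the sketch below builds coordinate arrays with NumPy. The helper `full_grid_sketch` is hypothetical and is not the gprutils code; in particular, it assumes that `dense_x` acts as a grid step in index units (so values below 1 give a denser grid) and that the default bounds are the index ranges of R.

```python
import numpy as np

def full_grid_sketch(R, extent=None, dense_x=1.0):
    """Hypothetical helper: coordinate grid of shape (R.ndim, ...) for array R."""
    if extent is None:
        # Assumed default: bounds are the index ranges of R
        extent = [[0, n] for n in R.shape]
    # Assumed convention: dense_x is the step in index units
    axes = [
        np.linspace(lo, hi, int(round(n / dense_x)), endpoint=False)
        for (lo, hi), n in zip(extent, R.shape)
    ]
    # Stack one coordinate array per dimension along a leading axis
    return np.stack(np.meshgrid(*axes, indexing="ij"))

R = np.zeros((4, 6))
grid = full_grid_sketch(R)
dense = full_grid_sketch(R, extent=[[0.0, 1.0], [0.0, 3.0]], dense_x=0.5)
print(grid.shape, dense.shape)
```

The leading axis of the result indexes the coordinate (x, y, ...), which matches the \(c \times N \times M\) layout expected by the data preparation functions.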

get_sparse_grid(R, extent=None)
Returns sparse grid for sparse image data
 Parameters
R (ndarray) – Sparse grid measurements (missing values are NaNs)
 Returns
Sparse grid indices
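
One plausible way to picture a sparse grid is a full index grid in which the coordinates of unmeasured points are blanked out. The snippet below is a minimal sketch under that assumption (marking missing coordinates with NaN is a choice made here, not something the documentation states), using only NumPy.

```python
import numpy as np

# Sparse 2D measurements: NaN marks points that were not measured
R = np.array([[1.0, np.nan, 3.0],
              [np.nan, 5.0, 6.0]])

# Full index grid with shape (2, N, M)
X_full = np.stack(
    np.meshgrid(*[np.arange(n, dtype=float) for n in R.shape], indexing="ij")
)

# Blank out the coordinates of every unmeasured point (assumed convention)
X_sparse = X_full.copy()
X_sparse[:, np.isnan(R)] = np.nan
print(X_sparse)
```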

to_constrained_interval(state_dict, lscale, amp)
Transforms the kernel’s unconstrained lengthscale and variance to their constrained domains (intervals)
 Parameters
state_dict (dict) – Kernel’s state dictionary; can be obtained from self.spgr.kernel.state_dict
lscale (list) – List of two lists with lower and upper bound(s) for the lengthscale prior. The number of elements in each list is usually equal to the number of (independent) input dimensions
amp (list) – List with two floats corresponding to lower and upper bounds for variance (square of amplitude) prior
 Returns
Lengthscale and variance in the constrained domain (interval)
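
To make the idea of mapping an unconstrained value into an interval concrete, here is a minimal NumPy sketch. It assumes a sigmoid-based bijection, which is a common choice for interval constraints in PyTorch-based code; whether gprutils uses exactly this mapping is not stated here. The function `to_interval` and all numerical values are hypothetical.

```python
import numpy as np

def to_interval(u, lower, upper):
    """Map unconstrained values u into (lower, upper) with a sigmoid."""
    u, lower, upper = (np.asarray(v, dtype=float) for v in (u, lower, upper))
    return lower + (upper - lower) / (1.0 + np.exp(-u))

# Bounds in the format described above: [lower bounds], [upper bounds]
lscale = [[1.0, 1.0], [20.0, 30.0]]
amp = [1e-4, 10.0]

# Illustrative unconstrained values (made up for this example)
l_unconstrained = np.array([0.0, 2.0])
a_unconstrained = -1.5

l_constrained = to_interval(l_unconstrained, lscale[0], lscale[1])
a_constrained = to_interval(a_unconstrained, amp[0], amp[1])
print(l_constrained, a_constrained)
```

An unconstrained value of 0 lands at the midpoint of its interval, and large positive or negative values approach the upper or lower bound without reaching it.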

corrupt_data_xy(X_true, R_true, prob=0.5, replace_w_zeros=False)
Replaces a fraction of 2D or 3D image data with NaNs; see gprutils.corrupt_image2d and gprutils.corrupt_image3d
 Parameters
X_true (ndarray) – Grid indices for 2D image or 3D hyperspectral data (3D and 4D numpy arrays, respectively)
R_true (ndarray) – Observations as 2D image or 3D hyperspectral data
prob (float) – Fraction of data to corrupt (takes values between 0 and 1)
replace_w_zeros (bool) – Corrupts data with zeros instead of NaNs
 Returns
ndarrays of grid indices (3D or 4D) and observations (2D or 3D)

corrupt_image2d(X_true, R_true, prob, replace_w_zeros)
Replaces a fraction of 2D image data with NaNs.
 Parameters
X_true (ndarray) – 3D array with grid indices for 2D image
R_true (ndarray) – 2D image with observations
prob (float) – Fraction of data to corrupt (takes values between 0 and 1)
replace_w_zeros (bool) – Corrupts data with zeros instead of NaNs
 Returns
3D ndarray of grid coordinates and 2D ndarray of observations in which a fraction of the points is replaced with NaNs.
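
The corruption step can be sketched as follows. This is an illustration, not the gprutils implementation: the name `corrupt2d_sketch` and the `seed` argument are hypothetical, each pixel is dropped independently with probability `prob` (so the removed fraction is only approximately `prob`), and applying the same fill value to the grid coordinates is an assumption.

```python
import numpy as np

def corrupt2d_sketch(X_true, R_true, prob, replace_w_zeros=False, seed=0):
    """Hypothetical helper: drop each pixel independently with probability prob."""
    rng = np.random.default_rng(seed)
    mask = rng.random(R_true.shape) < prob  # True where a pixel is removed
    fill = 0.0 if replace_w_zeros else np.nan
    # astype returns copies, so the inputs are left untouched
    R = R_true.astype(float)
    X = X_true.astype(float)
    R[mask] = fill
    X[:, mask] = fill  # assumed: coordinates are masked the same way
    return X, R

R_true = np.arange(12.0).reshape(3, 4)
X_true = np.stack(np.meshgrid(np.arange(3.0), np.arange(4.0), indexing="ij"))
X, R = corrupt2d_sketch(X_true, R_true, prob=0.5)
Xz, Rz = corrupt2d_sketch(X_true, R_true, prob=0.5, replace_w_zeros=True)
print(R)
```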

corrupt_image3d(X_true, R_true, prob, replace_w_zeros)
Replaces a fraction of 3D hyperspectral data with NaNs. Corruption is applied in the xy plane only: for every corrupted (x, y) point, all z values associated with that point are removed.
 Parameters
X_true (ndarray) – 4D array with grid indices for 3D hyperspectral data
R_true (ndarray) – 3D hyperspectral data with observations
prob (float) – Fraction of data to corrupt (takes values between 0 and 1)
replace_w_zeros (bool) – Corrupts data with zeros instead of NaNs
 Returns
4D ndarray of grid coordinates and 3D ndarray of observations in which a fraction of the points is replaced with NaNs (note that for every corrupted (x, y) point we remove all z values associated with this point)
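
The "remove all z values for a corrupted (x, y) point" behavior amounts to drawing one mask over the xy plane and applying it to whole spectra. A minimal NumPy sketch with made-up data (the random mask and the 0.5 threshold are illustrative assumptions, not the library's code):

```python
import numpy as np

rng = np.random.default_rng(0)
R_true = rng.random((4, 5, 6))  # (x, y, z) hyperspectral cube

# One keep/remove decision per (x, y) pixel, not per voxel
mask_xy = rng.random(R_true.shape[:2]) < 0.5

R = R_true.copy()
# A 2D boolean mask over the first two axes selects entire spectra
R[mask_xy] = np.nan
print(np.isnan(R).all(axis=-1))
```

Every spectrum in `R` is therefore either fully observed or fully missing, which mimics a measurement where some locations were never visited.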

open_edge_points(R, R_true, s=6)
Opens measured curves at the edges of FOV
 Parameters
R (ndarray) – empty/sparse data
R_true (ndarray) – “ground truth”
s (int) – step value, which determines the density of opened edge points
 Returns
3D ndarray with opened edge points

plot_kernel_hyperparams(hyperparams)
Plots evolution of kernel hyperparameters as a function of training steps
 Parameters
hyperparams (dict) – dictionary with kernel hyperparameters (see gpreg.gpr.reconstructor)

plot_mixture_hyperparams(hyperparams)
Plots evolution of spectral mixture kernel hyperparameters as a function of training iterations
 Parameters
hyperparams (dict) – dictionary with kernel hyperparameters (see gpreg.skgpr.skreconstructor)

plot_raw_data(raw_data, slice_number, pos, spec_window=2, norm=False, **kwargs)
Plots hyperspectral data as 2D image integrated over a certain range of energy/frequency and selected individual spectroscopic curves
 Parameters
raw_data (3D ndarray) – hyperspectral cube (the first two dimensions are xy coordinates and the last dimension is a “spectroscopic” dimension)
slice_number (int) – slice from datacube to visualize
pos (list of lists) – list with [x, y] coordinates of points where single spectroscopic curves will be extracted and visualized
spec_window (int) – window to integrate over in frequency dimension (for 2D “slices”)
**cmap (str) – cmap for 2D image (“slice”) plot
**z_vec (1D ndarray) – spectroscopic measurements values (e.g. frequency, bias)
**z_vec_label (str) – spectroscopic measurements label (e.g. frequency, bias voltage)
**z_vec_units (str) – spectroscopic measurements units (e.g. Hz, V)
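
The quantities that such a plot displays can be computed directly with NumPy. The sketch below is an assumption-laden illustration rather than the function's code: the window is taken as `slice_number ± spec_window`, integration is done by summation, and `pos` entries are used as indices into the first two axes in the order given.

```python
import numpy as np

raw_data = np.random.rand(8, 8, 50)  # made-up hyperspectral cube
slice_number, spec_window = 20, 2
pos = [[1, 2], [5, 6]]

# 2D "slice": integrate over a window around slice_number (assumed bounds)
lo, hi = slice_number - spec_window, slice_number + spec_window
image = raw_data[:, :, lo:hi].sum(axis=-1)

# Individual spectroscopic curves at the requested positions
curves = [raw_data[x, y, :] for x, y in pos]
print(image.shape, len(curves), curves[0].shape)
```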

plot_reconstructed_data2d(R, mean, save_fig=False, **kwargs)
Plots original and GP-reconstructed data for 2D images
 Parameters
R (2D ndarray) – Input image for GP regression
mean (1D ndarray) – Predictive mean, usually an output of gpr.reconstructor or skgpr.skreconstructor. The array is flattened (the actual dimensions are the same as for R)
**cmap (str) – cmap for 2D image plot
**savedir (str) – directory to save output figure
**filepath (str) – name of input file (to create a unique filename for plot)
**sparsity (float) – indicates % of data points removed (used only for figure title)
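
Since the predictive mean arrives flattened, it has to be reshaped to the dimensions of R before it can be shown as an image. A minimal sketch, assuming row-major (C-order) flattening and using a stand-in array in place of a real GP prediction:

```python
import numpy as np

# Sparse input image: NaN marks missing pixels
R = np.full((4, 5), np.nan)
R[::2, ::2] = 1.0

# Stand-in for a flattened predictive mean (not a real GP output)
mean = np.arange(R.size, dtype=float)

# Restore the image dimensions for side-by-side display with R
mean_2d = mean.reshape(R.shape)
print(mean_2d.shape)
```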

plot_reconstructed_data3d(R, mean, sd, slice_number, pos, spec_window=2, save_fig=False, **kwargs)
Plots original and GP-reconstructed data for 3D images
 Parameters
R (3D ndarray) – Input image for GP regression
mean (1D ndarray) – Predictive mean, usually an output of gpr.reconstructor or skgpr.skreconstructor. The array is flattened (the actual dimensions are the same as for R)
sd (1D ndarray) – Standard deviation (can be flattened; actual dimensions are the same as in R)
slice_number (int) – slice from datacube to visualize
pos (list of lists) – list with [x, y] coordinates of points where single spectroscopic curves will be extracted and visualized
spec_window (int) – window to integrate over in frequency dimension (for 2D “slices”)
**cmap (str) – colormap for 2D image (“slices”) plots
**savedir (str) – directory to save output figure
**sparsity (float) – indicates % of data points removed (used only for figure title)
**filepath (str) – path/name of input file (to create a unique filename for plot)
**z_vec (1D ndarray) – spectroscopic measurements values (e.g. frequency, bias)
**z_vec_label (str) – spectroscopic measurements label (e.g. frequency, bias voltage)
**z_vec_units (str) – spectroscopic measurements units (e.g. Hz, V)

plot_exploration_results(R_all, mean_all, sd_all, R_true, episodes, slice_number, pos, dist_edge, spec_window=2, mask_predictions=False, **kwargs)
Plots predictions at different stages (“episodes”) of maximum-uncertainty-based sample exploration with GP
 Parameters
R_all (list of ndarrays) – Observed data points at each exploration step
mean_all (list of ndarrays) – Predictive mean at each exploration step
sd_all (list of ndarrays) – Integrated (along energy dimension) SD at each exploration step
R_true (ndarray) – 3D array with ground truth data (full observations) for simulated experiment OR a 3D array of zeros/NaNs for real experiment
episodes (list of ints) – list with the numbers indicating which iteration steps to visualize
slice_number (int) – slice from datacube to visualize
pos (list of lists) – list with [x, y] coordinates of points where single spectroscopic curves will be extracted and visualized
dist_edge (list with two integers) – this should be the same as in exploration analysis
spec_window (int) – window to integrate over in frequency dimension (for 2D “slices”)
mask_predictions (bool) – mask edge regions not used in max uncertainty evaluation in predictive mean plots
**sparsity (float) – indicates % of data points removed (used only for figure title)
**z_vec (1D ndarray) – spectroscopic measurements values (e.g. frequency, bias)
**z_vec_label (str) – spectroscopic measurements label (e.g. frequency, bias voltage)
**z_vec_units (str) – spectroscopic measurements units (e.g. Hz, V)

plot_inducing_points(hyperparams, **kwargs)
Plots inducing points evolution during training

plot_inducing_points_2d(hyperparams, **kwargs)
Plots 2D trajectories of inducing points
 Parameters
hyperparams (dict) – Dictionary of hyperparameters
**plot_from (int) – plot from specific step
**plot_to (int) – plot till specific step
**slice_step (int) – plot every nth inducing point

plot_inducing_points_3d(hyperparams, **kwargs)
Plots 3D trajectories of inducing points during model training
 Parameters
hyperparams (dict) – dictionary of hyperparameters
**plot_from (int) – plot from specific step
**plot_to (int) – plot till specific step
**slice_step (int) – plot every nth inducing point

plot_query_points(inds_all, **kwargs)
Plots the exploration path (all the query points) in GP-based Bayesian optimization. Currently supports only 2D data.
 Parameters
inds_all (list) – list of indices
**cmap (str) – colormap