Deconvolution and Decomposition

class deconvolution.Evaluation(proportion_truth, proportion_estimated_list, methods, out_dir='', cluster=None, type_list=None, colors=None, coordinates=None, min_spot_distance=112)[source]

Bases: object

static JSD(proportion_truth: ndarray, proportion_estimated: ndarray, metric_type='Spot')[source]

Jensen–Shannon divergence.

Parameters:
  • proportion_truth – Ground truth of the cell proportion.

  • proportion_estimated – Estimated proportion.

  • metric_type – How the metric is calculated.
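As an illustration of what a per-spot Jensen–Shannon divergence computes, here is a minimal NumPy sketch. It is not the library's implementation: the function name jsd_per_spot, the base-2 logarithm, the eps guard, and the rows-are-spots layout are all assumptions made for this example.

```python
import numpy as np


def jsd_per_spot(p_true, p_est, eps=1e-12):
    """Jensen-Shannon divergence (base 2) between matching rows.

    Hypothetical helper for illustration; rows are assumed to be spots
    and columns cell types.
    """
    # Normalize each row so it sums to 1, guarding against all-zero rows.
    p = p_true / np.maximum(p_true.sum(axis=1, keepdims=True), eps)
    q = p_est / np.maximum(p_est.sum(axis=1, keepdims=True), eps)
    m = 0.5 * (p + q)  # mixture distribution

    def kl(a, b):
        # Entries with a == 0 contribute 0 by convention (ratio set to 1).
        ratio = np.where(a > 0, a / np.maximum(b, eps), 1.0)
        return np.sum(a * np.log2(ratio), axis=1)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


truth = np.array([[0.5, 0.5, 0.0], [1.0, 0.0, 0.0]])
estimate = np.array([[0.5, 0.5, 0.0], [0.0, 1.0, 0.0]])
jsd = jsd_per_spot(truth, estimate)  # one value per spot
```

With base-2 logarithms the divergence lies in [0, 1]: identical rows give 0 and rows with disjoint support give 1.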

static absolute_error(proportion_truth: ndarray, proportion_estimated: ndarray, metric_type='Spot')[source]
static correct_fraction(proportion_truth: ndarray, proportion_estimated: ndarray, metric_type='Spot')[source]
static correlation(proportion_truth: ndarray, proportion_estimated: ndarray, metric_type='Spot')[source]
static cosine(proportion_truth: ndarray, proportion_estimated: ndarray, metric_type='Spot')[source]
evaluate_metric(metric='Cosine similarity', metric_type='Spot', region=None)[source]

Evaluate the proportions based on the metric.

Parameters:
  • metric – Name of the metric.

  • metric_type – How the metric is calculated. ‘Spot’: the metric is calculated for each spot; ‘Cell type’: the metric is calculated for each cell type; ‘Individual’: the metric is calculated for each individual proportion estimate.

  • region – The region that is being evaluated.
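The three metric_type options differ only in how the proportion matrix is reduced. The sketch below illustrates this with the absolute error. It assumes a matrix with spots as rows and cell types as columns and uses the mean as the reduction; the function name and both of those choices are assumptions for this example, not the library's confirmed behavior.

```python
import numpy as np


def absolute_error_sketch(truth, estimate, metric_type="Spot"):
    """Hypothetical helper: absolute error reduced according to metric_type."""
    err = np.abs(truth - estimate)  # shape (n_spot, n_type)
    if metric_type == "Spot":
        return err.mean(axis=1)     # one value per spot
    if metric_type == "Cell type":
        return err.mean(axis=0)     # one value per cell type
    if metric_type == "Individual":
        return err.ravel()          # one value per (spot, cell type) entry
    raise ValueError(f"Unknown metric_type: {metric_type!r}")


truth = np.array([[0.6, 0.4], [0.2, 0.8], [1.0, 0.0]])
estimate = np.array([[0.5, 0.5], [0.2, 0.8], [0.8, 0.2]])

per_spot = absolute_error_sketch(truth, estimate, "Spot")
per_type = absolute_error_sketch(truth, estimate, "Cell type")
per_entry = absolute_error_sketch(truth, estimate, "Individual")
```

Here per_spot has 3 values, per_type has 2, and per_entry has 6, which is also why a box plot of the same metric can look quite different across metric_type settings.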

plot_metric(save=False, region=None, metric='Cosine similarity', metric_type='Spot', cell_types=None, suffix='', show=True)[source]

Plot the box plot of each method based on the metric.

The number of boxes equals the number of methods.

Parameters:
  • save – If true, save the figure.

  • region – Regions of the tissue.

  • metric – Name of the metric.

  • metric_type – How the metric is calculated. ‘Spot’: the metric is calculated for each spot; ‘Cell type’: the metric is calculated for each cell type; ‘Individual’: the metric is calculated for each individual proportion estimate.

  • cell_types – If metric_type is ‘Cell type’ and cell_types is not None, then only plot the results corresponding to the cell_types.

  • suffix – Suffix of the saved file name.

  • show – Whether to show the figure.

plot_metric_all(save=False, metric='Absolute error', region=None)[source]
plot_metric_spot_type(save=False, metric='Absolute error')[source]

Similar to plot_metric_spot, but the figures are separated for each cell type.

static square_error(proportion_truth: ndarray, proportion_estimated: ndarray, metric_type='Spot')[source]
deconvolution.archive_assign_type_out_gp(nucleus_df, cell_proportion, spot_centers, type_list, max_distance=100, return_gp=False)[source]

Assign the cell type to the cells outside the spot.

Parameters:
  • nucleus_df – Dataframe of the nucleus. Part of spotiphy.segmentation.Segmentation.

  • spot_centers – Centers of the spots.

  • cell_proportion – Proportion of each cell type in each spot.

  • type_list – List of the cell types.

  • max_distance – If the distance between a nucleus and the closest spot is larger than max_distance, the cell type will not be assigned to this nucleus.

  • return_gp – Whether to return the fitted GP models.

Returns:

nucleus_df with assigned spot

deconvolution.assign_type_out(nucleus_df, cell_proportion, spot_centers, type_list, max_distance=100, band_width=100)[source]

Assign the cell type to the cells outside the spot.

Parameters:
  • nucleus_df – Dataframe of the nucleus. Part of spotiphy.segmentation.Segmentation.

  • spot_centers – Centers of the spots.

  • cell_proportion – Proportion of each cell type in each spot.

  • type_list – List of the cell types.

  • max_distance – If the distance between a nucleus and the closest spot is larger than max_distance, the cell type will not be assigned to this nucleus.

  • band_width – Bandwidth of the kernel.

Returns:

nucleus_df with assigned spot
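To make the roles of max_distance and band_width concrete, the following sketch smooths spot-level proportions onto nuclei with a kernel. The Gaussian kernel, the function name, and the use of NaN for unassigned nuclei are assumptions chosen for illustration; the source does not state which kernel or assignment rule assign_type_out uses.

```python
import numpy as np


def kernel_type_probabilities(nucleus_xy, spot_centers, cell_proportion,
                              max_distance=100, band_width=100):
    """Hypothetical helper: kernel-weighted cell-type probabilities per nucleus."""
    # Pairwise nucleus-to-spot distances, shape (n_nucleus, n_spot).
    diff = nucleus_xy[:, None, :] - spot_centers[None, :, :]
    dist = np.linalg.norm(diff, axis=2)

    # Nuclei farther than max_distance from every spot stay unassigned (NaN).
    prob = np.full((nucleus_xy.shape[0], cell_proportion.shape[1]), np.nan)
    near = dist.min(axis=1) <= max_distance

    # Gaussian kernel weights (an assumed kernel), then a weighted average
    # of the spot proportions, renormalized so each row sums to 1.
    weight = np.exp(-0.5 * (dist[near] / band_width) ** 2)
    weighted = weight @ cell_proportion
    prob[near] = weighted / weighted.sum(axis=1, keepdims=True)
    return prob


spot_centers = np.array([[0.0, 0.0], [200.0, 0.0]])
cell_proportion = np.array([[1.0, 0.0], [0.0, 1.0]])  # spot 0: type A, spot 1: type B
nucleus_xy = np.array([[100.0, 0.0], [0.0, 50.0], [1000.0, 1000.0]])

prob = kernel_type_probabilities(nucleus_xy, spot_centers, cell_proportion)
```

In this toy layout the first nucleus is equidistant from both spots and gets equal probabilities, the second leans toward the type of the nearer spot, and the third is beyond max_distance and is left unassigned.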

deconvolution.assign_type_spot(nucleus_df, n_cell_df, cell_number, type_list)[source]

Assign the cell type to the cells inside the spot.

Parameters:
  • nucleus_df – Dataframe of the nucleus. Part of spotiphy.segmentation.Segmentation.

  • n_cell_df – Dataframe of the number of cells in each spot. Part of spotiphy.segmentation.Segmentation.

  • cell_number – Number of each cell type in each spot.

  • type_list – List of the cell types.

Returns:

nucleus_df with assigned spot

deconvolution.decomposition(adata_st: AnnData, adata_sc: AnnData, key_type: str, cell_proportion: ndarray, save=True, out_dir='', threshold=0.1, n_cell=None, spot_location: ndarray | None = None, filtering_gene=False, filename='ST_decomposition.h5ad', verbose=0, use_original_proportion=False)[source]

Decompose ST.

Parameters:
  • adata_st – Original spatial transcriptomics data.

  • adata_sc – Original single-cell data.

  • key_type – The key that is used to extract cell type information from adata_sc.obs.

  • cell_proportion – Proportion of each cell type obtained by the deconvolution.

  • save – If True, save the generated adata_st as a file.

  • out_dir – Output directory.

  • threshold – If n_cell is None, discard cell types with a proportion less than threshold.

  • n_cell – Number of cells in each spot.

  • spot_location – Coordinates of the spots.

  • filtering_gene – Whether to filter the genes in sc_reference.initialization.

  • filename – Name of the saved file.

  • verbose – Whether to print the time spent.

  • use_original_proportion – Whether the original proportion is used to estimate the iscRNA. Note that even when the original proportion is used, some cells in iscRNA are still filtered.

Returns:

adata_st_decomposed, an AnnData similar to scRNA, but obtained by decomposing ST.
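The threshold parameter can be pictured as a per-spot filter on the proportion matrix. The sketch below zeroes out cell types whose proportion falls below the threshold and renormalizes the rest. The renormalization step and the function name are assumptions for illustration; the source only states that such cell types are discarded when n_cell is None.

```python
import numpy as np


def filter_proportion_sketch(cell_proportion, threshold=0.1):
    """Hypothetical helper: drop small proportions, then renormalize each spot."""
    kept = np.where(cell_proportion < threshold, 0.0, cell_proportion)
    # Guard against spots where every cell type fell below the threshold;
    # such rows stay all-zero instead of producing a division by zero.
    row_sum = np.maximum(kept.sum(axis=1, keepdims=True), 1e-12)
    return kept / row_sum


cell_proportion = np.array([
    [0.05, 0.55, 0.40],   # first cell type is below the threshold
    [0.34, 0.33, 0.33],   # nothing is discarded
])
filtered = filter_proportion_sketch(cell_proportion, threshold=0.1)
```

After filtering, the first spot keeps only its two dominant cell types and the second spot is unchanged.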

deconvolution.deconvolute(X, sc_ref, device='cuda', n_epoch=8000, adam_params=None, batch_prior=2, plot=False, fig_size=(4.8, 3.6), dpi=200)[source]

Deconvolution of the proportion of genes contributed by each cell type.

Parameters:
  • X – Spatial transcriptomics data. n_spot*n_gene.

  • sc_ref – Single cell reference. n_type*n_gene.

  • device – The device used for the deconvolution.

  • plot – Whether to plot the ELBO loss.

  • n_epoch – Number of training epochs.

  • adam_params – Parameters for the adam optimizer.

  • batch_prior – Parameter of the prior distribution of the batch effect: 2^(Uniform(0, batch_prior))

  • fig_size – Size of the figure.

  • dpi – Dots per inch (DPI) of the figure.

Returns:

Parameters in the generative model.
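To get a feel for the batch_prior parameter, the following sketch draws samples from the stated prior 2^(Uniform(0, batch_prior)) with NumPy. It only illustrates the range implied by the formula; how the library samples or parameterizes this prior internally is not shown in the source, and the variable names here are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

batch_prior = 2   # default value in the signature above
n_gene = 1000     # hypothetical number of genes

# Draw the exponent uniformly from [0, batch_prior), then raise 2 to it.
exponent = rng.uniform(0.0, batch_prior, size=n_gene)
batch_effect = 2.0 ** exponent
```

With the default batch_prior=2, each sampled factor lies between 1 and 4, so a larger batch_prior allows a wider multiplicative gap between the single-cell reference and the spatial data.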

deconvolution.estimation_proportion(X, adata_sc, sc_ref, type_list, key_type, device='cuda', n_epoch=8000, adam_params=None, batch_prior=2, plot=False, fig_size=(4.8, 3.6), dpi=200)[source]

Estimate the proportion of each cell type in each spot.

Parameters:
  • X – Spatial transcriptomics data. n_spot*n_gene.

  • adata_sc (anndata.AnnData) – scRNA data.

  • sc_ref – Single cell reference. n_type*n_gene.

  • type_list – List of the cell types.

  • key_type – Column name of the cell types in adata_sc.

  • device – The device used for the deconvolution.

  • plot – Whether to plot the ELBO loss.

  • n_epoch – Number of training epochs.

  • adam_params – Parameters for the adam optimizer.

  • batch_prior – Parameter of the prior distribution of the batch effect: 2^(Uniform(0, batch_prior))

  • fig_size – Size of the figure.

  • dpi – Dots per inch (DPI) of the figure.

Returns:

Parameters in the generative model.

deconvolution.plot_proportion(img, proportion, spot_location, radius, cmap_name='viridis', alpha=0.4, save_path='proportion.png', vmax=0.98, spot_scale=1.3, show_figure=False, int_ticks=False, bar_location=(5800, 8100))[source]

Plot the proportion of a cell type.

Parameters:
  • img – 3-channel image with integer values in [0, 255].

  • proportion – Proportion of a cell type.

  • spot_location – Location of the spots.

  • radius – Radius of the spot.

  • cmap_name – Name of the colormap (cmap).

  • alpha – Level of transparency of the background image.

  • save_path – If not None, save the image to the path.

  • vmax – Quantile of the maximum value in the color bar.

  • spot_scale – Scale of the spot in the figure.

  • show_figure – Whether to show the figure.

  • int_ticks – Whether the ticks must be integers.

deconvolution.proportion_to_count(p, n, multiple_spots=False)[source]

Convert the cell proportion to the absolute cell number.

Parameters:
  • p – Cell proportion.

  • n – Number of cells.

  • multiple_spots – Whether the data relates to multiple spots.

Returns:

Cell count of each cell type.
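Converting proportions to integer counts requires a rounding rule so that the counts still sum to n. One common choice is the largest-remainder method, sketched below for a single spot. This is an assumed rounding rule used for illustration, with a hypothetical function name; the source does not say how proportion_to_count rounds.

```python
import numpy as np


def proportion_to_count_sketch(p, n):
    """Hypothetical helper: largest-remainder rounding of n * p to integers."""
    p = np.asarray(p, dtype=float)
    p = p / p.sum()                      # make sure the proportions sum to 1
    raw = p * n                          # ideal, non-integer counts
    count = np.floor(raw).astype(int)    # start from the rounded-down counts
    remainder = raw - count
    n_missing = int(n - count.sum())     # cells still to be handed out
    if n_missing > 0:
        # Give one extra cell to the types with the largest remainders.
        top = np.argsort(-remainder)[:n_missing]
        count[top] += 1
    return count


counts = proportion_to_count_sketch([0.5, 0.3, 0.2], n=7)
```

The ideal counts here are 3.5, 2.1 and 1.4, so the single leftover cell goes to the first cell type and the total remains 7.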

deconvolution.simulation(adata_st: AnnData, adata_sc: AnnData, key_type: str, cell_proportion: ndarray, n_cell=10, batch_effect_sigma=0.1, zero_proportion=0.3, additive_noise_sigma=0.05, save=True, out_dir='', filename='ST_Simulated.h5ad', verbose=0)[source]

Simulation of the spatial transcriptomics data based on a real spatial sample and deconvolution results of Spotiphy.

Parameters:
  • adata_st – Original spatial transcriptomics data.

  • adata_sc – Original single-cell data.

  • key_type – The key that is used to extract cell type information from adata_sc.obs.

  • cell_proportion – Proportion of each cell type obtained by the deconvolution.

  • n_cell – Number of cells in each spot, either a key of adata_st.obs or a positive integer.

  • batch_effect_sigma – Sigma of the log-normal distribution used to generate the batch effect.

  • zero_proportion – Proportion of gene expression values set to 0. Because some entries of the original X are already 0, the final proportion of zero gene reads is larger than zero_proportion.

  • additive_noise_sigma – Sigma of the log-normal distribution used to generate the additive noise.

  • save – If True, save the generated adata_st as a file.

  • out_dir – Output directory.

  • filename – Name of the saved file.

  • verbose – Whether to print the time spent.

Returns:

Simulated ST AnnData.
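The note on zero_proportion can be checked with a toy example: zeroing a fixed fraction of entries in a matrix that already contains zeros leaves more zeros than the requested fraction. The sketch below uses a Poisson toy matrix and picks the entries to zero uniformly at random. Both choices, and all variable names, are assumptions for illustration and not the library's simulation procedure.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy count matrix (n_spot x n_gene); many entries are already 0.
X = rng.poisson(1.0, size=(50, 200)).astype(float)

zero_proportion = 0.3
n_drop = int(round(zero_proportion * X.size))

# Pick exactly n_drop entries, uniformly over all entries, and set them to 0.
drop_idx = rng.choice(X.size, size=n_drop, replace=False)
X_sim = X.copy()
X_sim.flat[drop_idx] = 0.0

zero_before = np.mean(X == 0)      # fraction of zeros in the original matrix
zero_after = np.mean(X_sim == 0)   # fraction of zeros after the dropout step
```

Some of the selected entries were already 0 and the unselected zeros remain, so zero_after ends up at least as large as both zero_before and zero_proportion.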