API Reference

Model classes

scdef.scDEF

Bases: object

Single-cell Deep Exponential Families model.

This model learns multi-level gene signatures describing the input scRNA-seq data from an AnnData object.

Parameters:

Name Type Description Default
adata AnnData

AnnData object containing the gene expression data. scDEF learns a model from counts, so they must be present in either adata.X or in adata.layers.

required
counts_layer Optional[str]

layer from adata.layers to get the count data from.

None
layer_sizes Optional[list]

number of factors per scDEF layer.

[100, 60, 30, 10, 1]
batch_key Optional[str]

key in adata.obs containing batch annotations for batch correction. If None, or not found, no batch correction is performed.

'batch'
seed Optional[int]

random seed for JAX

1
logginglevel Optional[int]

verbosity level for logger

INFO
layer_shapes Optional[list]

prior parameters for the z shape to use in each scDEF layer

None
brd_strength Optional[float]

BRD prior concentration parameter

1000.0
brd_mean Optional[float]

BRD prior mean parameter

0.01
use_brd Optional[bool]

whether to use the BRD prior for factor relevance estimation

True
cell_scale_shape Optional[float]

concentration level in the cell scale prior

1.0
gene_scale_shape Optional[float]

concentration level in the gene scale prior

1.0
factor_shapes Optional[list]

prior parameters for the W shape to use in each scDEF layer

None
factor_rates Optional[list]

prior parameters for the W rate to use in each scDEF layer

None
layer_diagonals Optional[list]

prior diagonal strengths for the W parameters in each scDEF layer

None
batch_cpal Optional[str]

default color palette for batch annotations

'Dark2'
layer_cpal Optional[list]

default color palettes for scDEF layers

None
lightness_mult Optional[float]

multiplier to define lightness of color palette at each scDEF layer

0.15

filter_factors(thres=None, iqr_mult=0.0, min_cells=0.005, filter_up=True)

Filter out irrelevant factors based on the BRD posterior or the cell attachments.

Parameters:

Name Type Description Default
thres Optional[float]

minimum factor BRD value

None
iqr_mult Optional[float]

multiplier of the difference between the third quartile and the median BRD values to set the threshold

0.0
min_cells Optional[float]

minimum number of cells that each factor must have attached to it for it to be kept. If between 0 and 1, it is interpreted as a fraction of all cells; otherwise, as an absolute number of cells

0.005
filter_up Optional[bool]

whether to remove factors in upper layers via inter-layer attachments

True
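
The threshold parameters above can be illustrated with a small sketch. It is not scDEF's implementation: the helper names `brd_threshold` and `min_cells_to_count` are hypothetical, and the formula `median + iqr_mult * (Q3 - median)` is only one plausible reading of the `iqr_mult` description.

```python
import statistics


def brd_threshold(brd_values, thres=None, iqr_mult=0.0):
    """Hypothetical BRD cutoff: an explicit `thres` wins, otherwise
    median + iqr_mult * (third quartile - median)."""
    if thres is not None:
        return thres
    median = statistics.median(brd_values)
    # quantiles(n=4) returns the three quartile cut points [Q1, Q2, Q3]
    q3 = statistics.quantiles(brd_values, n=4, method="inclusive")[2]
    return median + iqr_mult * (q3 - median)


def min_cells_to_count(min_cells, n_cells):
    """Interpret `min_cells` as a fraction of all cells if it lies
    strictly between 0 and 1, otherwise as an absolute count."""
    if 0 < min_cells < 1:
        return min_cells * n_cells
    return min_cells


# Toy BRD values for five factors (made up for illustration)
brd = [0.1, 0.2, 0.3, 0.4, 5.0]
cutoff = brd_threshold(brd, iqr_mult=0.0)   # equals the median here
kept = [value for value in brd if value >= cutoff]
```

With `iqr_mult=0.0` the cutoff in this sketch is simply the median BRD value, so larger multipliers make the filter stricter.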

get_hierarchy(simplified=True)

Get a dictionary describing the polytree contained in the scDEF graph.

Parameters:

Name Type Description Default
simplified Optional[bool]

whether to collapse single-child nodes

True

Returns:

Name Type Description
hierarchy Mapping[str, Sequence[str]]

the dictionary containing the hierarchy
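
To make the `simplified` option concrete, here is a minimal sketch of collapsing single-child nodes in a parent-to-children dictionary. The function `collapse_single_children` and the node names (`h2_0`, `h1_0`, ...) are hypothetical, and scDEF may resolve single-child chains differently (for example, by keeping the lower node's name instead of the upper one).

```python
def collapse_single_children(hierarchy):
    """Return a new hierarchy in which every chain parent -> only child
    is merged into the parent (the upper node's name is kept)."""

    def resolve(node):
        # Walk down while the current node has exactly one child
        children = hierarchy.get(node, [])
        while len(children) == 1:
            children = hierarchy.get(children[0], [])
        return list(children)

    # Roots are nodes that never appear as someone's child
    all_children = {c for kids in hierarchy.values() for c in kids}
    stack = [node for node in hierarchy if node not in all_children]

    simplified = {}
    while stack:
        node = stack.pop()
        kids = resolve(node)
        simplified[node] = kids
        stack.extend(kids)
    return simplified


full = {
    "h2_0": ["h1_0", "h1_1"],
    "h1_0": ["h0_0"],            # single child: h0_0 is absorbed
    "h1_1": ["h0_1", "h0_2"],
    "h0_0": [], "h0_1": [], "h0_2": [],
}
simple = collapse_single_children(full)
```

In this toy input, `h1_0` has only one child, so `h0_0` disappears from the simplified dictionary and `h1_0` becomes a leaf.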

learn(n_epoch=[1000, 1000], lr=0.1, annealing=1.0, num_samples=10, batch_size=None, layerwise=False)

Fit a variational approximation to the posterior over scDEF parameters.

Parameters:

Name Type Description Default
n_epoch Optional[Union[int, list]]

number of epochs (full passes over the data). Can be a list of ints for multi-step learning.

[1000, 1000]
lr Optional[Union[float, list]]

learning rate. Can be a list of floats for multi-step learning.

0.1
annealing Optional[Union[float, list]]

scale factor for the entropy term. Can be a list of floats for multi-step learning.

1.0
num_samples Optional[int]

number of Monte Carlo samples to use in the ELBO approximation.

10
batch_size Optional[int]

number of data points to use per iteration. If None, uses all. Useful for data sets that do not fit in GPU memory.

None
layerwise Optional[bool]

whether to optimize the model parameters in a step-wise manner: first learn only Layers 0 and 1, then Layer 2, then Layer 3, and so on. The lengths of the n_epoch and lr schedules are ignored: only the first value of each is used, and every step runs for that n_epoch value.

False
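
The scalar-or-list arguments of `learn` can be pictured as one schedule with an entry per learning step. The sketch below is an assumption about how such arguments might be aligned, not scDEF's code: `expand_schedule` and `layerwise_plan` are hypothetical helpers, and broadcasting scalars to the longest list is a guess.

```python
def expand_schedule(n_epoch=(1000, 1000), lr=0.1, annealing=1.0):
    """Align n_epoch, lr and annealing into one (n_epoch, lr, annealing)
    tuple per learning step, repeating scalars as needed."""

    def as_list(value):
        return list(value) if isinstance(value, (list, tuple)) else [value]

    schedules = [as_list(n_epoch), as_list(lr), as_list(annealing)]
    n_steps = max(len(s) for s in schedules)

    expanded = []
    for schedule in schedules:
        if len(schedule) == 1:
            schedule = schedule * n_steps      # repeat the single value
        elif len(schedule) != n_steps:
            raise ValueError("schedules must have length 1 or match")
        expanded.append(schedule)
    return list(zip(*expanded))


def layerwise_plan(n_layers, n_epoch):
    """Layer-wise variant: only the first n_epoch value is used, and each
    step introduces the next layer (Layers 0 and 1 first)."""
    first = n_epoch[0] if isinstance(n_epoch, (list, tuple)) else n_epoch
    plan = [([0, 1], first)]
    plan += [([layer], first) for layer in range(2, n_layers)]
    return plan


default_steps = expand_schedule()
two_rates = expand_schedule(n_epoch=500, lr=[0.1, 0.01])
plan = layerwise_plan(4, [1000, 500])
```

Under these assumptions the defaults describe two steps of 1000 epochs each at learning rate 0.1, while the layer-wise plan ignores the second `n_epoch` entry entirely.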

make_graph(hierarchy=None, show_all=False, factor_annotations=None, top_factor=None, show_signatures=True, enrichments=None, top_genes=None, show_batch_counts=False, filled=None, wedged=None, color_edges=True, show_confidences=False, mc_samples=100, n_cells_label=False, n_cells=False, node_size_max=2.0, node_size_min=0.05, scale_level=False, show_label=True, gene_score=None, gene_cmap='viridis', **fontsize_kwargs)

Make Graphviz-formatted scDEF graph.

Parameters:

Name Type Description Default
hierarchy Optional[dict]

a dictionary containing the polytree to draw instead of the whole graph

None
show_all Optional[bool]

whether to show all factors, including those removed by filtering

False
factor_annotations Optional[dict]

factor annotations to include in the node labels

None
top_factor Optional[str]

only include factors below this factor

None
show_signatures Optional[bool]

whether to show the ranked gene signatures in the node labels

True
enrichments Optional[DataFrame]

enrichment results from gseapy to include in the node labels

None
top_genes Optional[int]

number of genes from each signature to be shown in the node labels

None
show_batch_counts Optional[bool]

whether to show the number of cells from each batch that attach to each factor

False
filled Optional[str]

key from self.adata.obs to use to fill the nodes with

None
wedged Optional[str]

key from self.adata.obs to use to wedge the nodes with

None
color_edges Optional[bool]

whether to color the graph edges according to the upper factors

True
show_confidences Optional[bool]

whether to show the confidence score for each signature

False
mc_samples Optional[int]

number of Monte Carlo samples to take from the posterior to compute signature confidences

100
n_cells_label Optional[bool]

whether to show the number of cells that attach to the factor

False
n_cells Optional[bool]

whether to scale the node sizes by the number of cells that attach to the factor

False
node_size_max Optional[int]

maximum node size when scaled by cell numbers

2.0
node_size_min Optional[int]

minimum node size when scaled by cell numbers

0.05
scale_level Optional[bool]

whether to scale node sizes per level instead of across all levels

False
show_label Optional[bool]

whether to show labels on nodes

True
gene_score Optional[str]

color the nodes by the score they attribute to a gene, normalized by layer. Overrides filled and wedged

None
gene_cmap Optional[str]

colormap to use for gene_score

'viridis'
**fontsize_kwargs

keyword arguments to adjust the fontsizes according to the gene scores

{}
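
The interaction between `n_cells`, `node_size_min`, `node_size_max` and `scale_level` can be sketched as follows. This assumes a linear min-max mapping from cell counts to node sizes, which the reference does not confirm; `scale_node_sizes`, `scale_per_level` and the factor names are hypothetical.

```python
def scale_node_sizes(cell_counts, node_size_min=0.05, node_size_max=2.0):
    """Map cell counts linearly onto [node_size_min, node_size_max]."""
    if not cell_counts:
        return {}
    lo, hi = min(cell_counts.values()), max(cell_counts.values())
    span = hi - lo
    sizes = {}
    for name, count in cell_counts.items():
        # If all counts are equal, give every node the maximum size
        frac = (count - lo) / span if span > 0 else 1.0
        sizes[name] = node_size_min + frac * (node_size_max - node_size_min)
    return sizes


def scale_per_level(counts_by_level, **size_kwargs):
    """Analogue of scale_level=True: rescale within each level separately."""
    return {
        level: scale_node_sizes(counts, **size_kwargs)
        for level, counts in counts_by_level.items()
    }


# Made-up attachment counts for three factors
counts = {"h0_0": 0, "h0_1": 50, "h1_0": 100}
across_levels = scale_node_sizes(counts)
within_levels = scale_per_level({0: {"h0_0": 0, "h0_1": 50}, 1: {"h1_0": 100}})
```

Scaling per level lets the largest factor in every layer reach `node_size_max`, whereas scaling across all levels keeps sizes comparable between layers.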

plot_multilevel_paga(neighbors_rep='X_factors', layers=None, figsize=(16, 4), reuse_pos=True, fontsize=12, show=True, **paga_kwargs)

Plot a PAGA graph from each scDEF layer.

Parameters:

Name Type Description Default
neighbors_rep Optional[str]

the self.obsm key to use to compute the PAGA graphs

'X_factors'
layers Optional[list]

which layers to plot

None
figsize Optional[tuple]

figure size

(16, 4)
reuse_pos Optional[bool]

whether to initialize each PAGA graph with the graph from the layer above

True
show Optional[bool]

whether to show the plot

True
**paga_kwargs

keyword arguments to adjust the PAGA layouts

{}

plot_obs_scores(obs_keys, hierarchy=None, mode='fracs', **kwargs)

Plot the association between a set of cell annotations and factors.

Parameters:

Name Type Description Default
obs_keys Sequence[str]

the keys in self.adata.obs to use

required
hierarchy Optional[dict]

the polytree to restrict the associations to

None
mode Literal['f1', 'fracs', 'weights']

whether to compute scores based on assignments or weights

'fracs'
**kwargs

plotting keyword arguments

{}

plot_pathway_scores(pathways, top_genes=20, **kwargs)

Plot the association between a set of PROGENy pathways and the factor gene signatures.

Parameters:

Name Type Description Default
pathways DataFrame

a pandas DataFrame containing PROGENy pathways

required
top_genes Optional[int]

number of genes to consider in the score computations

20
**kwargs

plotting keyword arguments

{}

plot_signatures_scores(obs_keys, markers, top_genes=10, hierarchy=None, **kwargs)

Plot the association between a set of cell annotations and a set of gene signatures.

Parameters:

Name Type Description Default
obs_keys Sequence[str]

the keys in self.adata.obs to use

required
markers Mapping[str, Sequence[str]]

a dictionary with keys corresponding to self.adata.obs[obs_keys] and values to gene lists

required
top_genes Optional[int]

number of genes to consider in the score computations

10
hierarchy Optional[dict]

the polytree to restrict the associations to

None
**kwargs

plotting keyword arguments

{}
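
As an illustration of the expected `markers` layout and of the role of `top_genes`, the sketch below scores one ranked signature against each marker list. The scoring rule (fraction of the top-ranked genes that are markers) is an assumption chosen for simplicity and is not necessarily the score scDEF computes; `marker_overlap` and the example signature are hypothetical.

```python
def marker_overlap(signature, marker_genes, top_genes=10):
    """Fraction of the first `top_genes` genes of a ranked signature
    that belong to `marker_genes`."""
    top = signature[:top_genes]
    if not top:
        return 0.0
    marker_set = set(marker_genes)
    hits = sum(1 for gene in top if gene in marker_set)
    return hits / len(top)


# Keys correspond to annotation values, values to gene lists
markers = {
    "T cell": ["CD3D", "CD3E", "IL7R"],
    "B cell": ["MS4A1", "CD79A"],
}

# A made-up ranked gene signature for one factor
signature = ["CD3D", "IL7R", "LTB", "CD3E", "MALAT1"]

scores = {
    label: marker_overlap(signature, genes, top_genes=4)
    for label, genes in markers.items()
}
```

A smaller `top_genes` focuses the comparison on the strongest genes of each signature; a larger value is more forgiving of marker genes ranked lower.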

scdef.iscDEF

Bases: scDEF

Informed scDEF model.

This model extends the basic scDEF by using gene sets to guide the factors. iscDEF can either set the given sets as top layer factors and learn higher-resolution structure, or use them as the lowest resolution and learn a hierarchy that relates them. All the methods from scDEF are available in iscDEF.

Parameters:

Name Type Description Default
adata AnnData

AnnData object containing the gene expression data. scDEF learns a model from counts, so they must be present in either adata.X or in adata.layers.

required
markers_dict Mapping[str, Sequence[str]]

dictionary containing named gene lists.

required
add_other Optional[bool]

whether to add factors for cells which don't express any of the sets in markers_dict.

False
markers_layer Optional[int]

scDEF layer at which the gene sets are defined. If > 0, this defines the number of layers.

0
n_factors_per_set Optional[int]

number of lower level factors per gene set.

3
n_layers Optional[int]

default number of scDEF layers, including a top layer of size 1 if markers_layer is 0.

4
cn_small_scale

scale for low connectivity

required
cn_big_scale

scale for large connectivity

required
cn_small_strength Optional[float]

strength for weak connectivity

100.0
cn_big_strength Optional[float]

strength for large connectivity

1.0
gs_small_scale Optional[float]

scale for genes not in set

0.1
gs_big_scale Optional[float]

scale for genes in set

10.0
marker_strength Optional[float]

strength for marker genes

100.0
nonmarker_strength Optional[float]

strength for non-marker genes

1.0
other_strength Optional[float]

strength for marker genes of other sets

100.0
**kwargs

keyword arguments for base scDEF.

{}
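
The `gs_small_scale` and `gs_big_scale` parameters can be read as assigning each gene a prior scale depending on whether it belongs to a gene set. The sketch below only illustrates that reading: `gene_scale_table` is a hypothetical helper, the gene names are examples, and how iscDEF actually combines these scales with the strength parameters is not shown here.

```python
def gene_scale_table(markers_dict, gene_names,
                     gs_small_scale=0.1, gs_big_scale=10.0):
    """For every gene set, list one scale per gene: the big scale for
    genes in the set and the small scale for all other genes."""
    table = {}
    for set_name, genes in markers_dict.items():
        gene_set = set(genes)
        table[set_name] = [
            gs_big_scale if gene in gene_set else gs_small_scale
            for gene in gene_names
        ]
    return table


# Named gene lists, as expected by the markers_dict argument
markers_dict = {
    "T cell": ["CD3D", "CD3E"],
    "B cell": ["MS4A1"],
}

# Stand-in for the gene names of the AnnData object
gene_names = ["CD3D", "CD3E", "MS4A1", "ACTB"]

scales = gene_scale_table(markers_dict, gene_names)
```

Genes outside every set, such as `ACTB` in this toy input, receive the small scale in all rows.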

Benchmarking utilities

scdef.benchmark.run_multiple_resolutions(method, ad, resolution_sweep, layer_prefix='h', **kwargs)

Run a clustering and gene signature learning method at multiple resolutions.

Parameters:

Name Type Description Default
method Callable

the function that runs the method. Must take a resolution parameter as argument and return a list containing at least an AnnData object, a matrix containing the latent space, and a list of genes per cluster.

required
ad AnnData

the data to run the method on.

required
resolution_sweep Sequence[float]

list of resolution parameters to use.

required

Returns:

Name Type Description
outs Mapping

dictionary containing all the outputs from the method across all resolutions. Keys: ["latents", "signatures", "assignments", "scores", "sizes", "simplified_hierarchy"]