API Reference
Model classes
scdef.scDEF
Bases: object
Single-cell Deep Exponential Families model.
This model learns multi-level gene signatures describing the input scRNA-seq data from an AnnData object.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
adata | AnnData | AnnData object containing the gene expression data. scDEF learns a model from counts, so they must be present in either adata.X or in adata.layers. | required |
counts_layer | Optional[str] | layer in adata.layers to get the count data from. | None |
layer_sizes | Optional[list] | number of factors per scDEF layer. | [100, 60, 30, 10, 1] |
batch_key | Optional[str] | key in adata.obs containing batch annotations for batch correction. If None, or not found, no batch correction is performed. | 'batch' |
seed | Optional[int] | random seed for JAX | 1 |
logginglevel | Optional[int] | verbosity level for the logger | INFO |
layer_shapes | Optional[list] | prior parameters for the z shape to use in each scDEF layer | None |
brd_strength | Optional[float] | BRD prior concentration parameter | 1000.0 |
brd_mean | Optional[float] | BRD prior mean parameter | 0.01 |
use_brd | Optional[bool] | whether to use the BRD prior for factor relevance estimation | True |
cell_scale_shape | Optional[float] | concentration level in the cell scale prior | 1.0 |
gene_scale_shape | Optional[float] | concentration level in the gene scale prior | 1.0 |
factor_shapes | Optional[list] | prior parameters for the W shape to use in each scDEF layer | None |
factor_rates | Optional[list] | prior parameters for the W rate to use in each scDEF layer | None |
layer_diagonals | Optional[list] | prior diagonal strengths for the W parameters in each scDEF layer | None |
batch_cpal | Optional[str] | default color palette for batch annotations | 'Dark2' |
layer_cpal | Optional[list] | default color palettes for scDEF layers | None |
lightness_mult | Optional[float] | multiplier that sets the lightness of the color palette at each scDEF layer | 0.15 |
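Because scDEF learns from counts, it can help to confirm that the matrix you point it at (adata.X, or adata.layers[counts_layer]) really holds non-negative whole numbers before building the model. The helper below is a hypothetical pre-flight check, not part of scdef. It works on a plain nested list so it runs without extra dependencies; with real data you would adapt it to the dense or sparse matrix type stored in your AnnData object.

```python
from typing import Sequence


def looks_like_counts(matrix: Sequence[Sequence[float]]) -> bool:
    """Return True if every entry is a non-negative whole number.

    Hypothetical check (not part of scdef): scDEF models counts, so
    log-normalized or scaled values should fail this test.
    """
    for row in matrix:
        for value in row:
            # Reject negative values and values with a fractional part.
            if value < 0 or value != int(value):
                return False
    return True


raw = [[0, 3, 1], [2, 0, 7]]
lognorm = [[0.0, 1.386, 0.693], [1.099, 0.0, 2.079]]
print(looks_like_counts(raw))      # True
print(looks_like_counts(lognorm))  # False
```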
filter_factors(thres=None, iqr_mult=0.0, min_cells=0.005, filter_up=True)
Filter out irrelevant factors based on the BRD posterior or the cell attachments.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
thres | Optional[float] | minimum factor BRD value | None |
iqr_mult | Optional[float] | multiplier of the difference between the third quartile and the median BRD values to set the threshold | 0.0 |
min_cells | Optional[float] | minimum number of cells that must be attached to a factor for it to be kept. If between 0 and 1, it is interpreted as a fraction of the cells; otherwise, as an absolute number of cells | 0.005 |
filter_up | Optional[bool] | whether to remove factors in upper layers via inter-layer attachments | True |
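The min_cells rule above can be sketched as follows. This is a hypothetical re-implementation of the documented semantics only, not scdef's code: it assumes "between 0 and 1" means strictly between, that a factor is kept when its attachment count is at least the cutoff, and it takes the per-factor attachment counts as given (how scdef computes attachments is not modeled).

```python
from typing import List, Sequence


def min_cells_cutoff(min_cells: float, n_cells: int) -> float:
    """Translate min_cells into an absolute number of cells.

    Assumption: a value strictly between 0 and 1 is a fraction of all
    cells; any other value is an absolute count.
    """
    if 0 < min_cells < 1:
        return min_cells * n_cells
    return float(min_cells)


def factors_to_keep(cells_per_factor: Sequence[int], min_cells: float, n_cells: int) -> List[int]:
    """Indices of factors with at least the required number of attached cells."""
    cutoff = min_cells_cutoff(min_cells, n_cells)
    return [i for i, n in enumerate(cells_per_factor) if n >= cutoff]


attached = [620, 4, 370, 6]  # cells attached to each factor, 1000 cells in total
print(factors_to_keep(attached, 0.005, 1000))  # fraction: cutoff is 5 cells
print(factors_to_keep(attached, 10, 1000))     # absolute: cutoff is 10 cells
```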
get_hierarchy(simplified=True)
Get a dictionary containing the polytree contained in the scDEF graph.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
simplified | Optional[bool] | whether to collapse single-child nodes | True |
Returns:
Name | Type | Description |
---|---|---|
hierarchy | Mapping[str, Sequence[str]] | the dictionary containing the hierarchy |
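The returned Mapping[str, Sequence[str]] can be walked like any adjacency dictionary. The sketch below assumes that keys are parent factors and values list their children, and that leaf factors may be absent as keys; the factor names are made up for illustration. Since the structure is a polytree, a factor with several parents would be printed under each of them.

```python
from typing import List, Mapping, Sequence


def find_roots(hierarchy: Mapping[str, Sequence[str]]) -> List[str]:
    """Nodes that appear as keys but never as a child of another node."""
    children = {child for kids in hierarchy.values() for child in kids}
    return [node for node in hierarchy if node not in children]


def render_tree(hierarchy: Mapping[str, Sequence[str]], node: str, depth: int = 0) -> List[str]:
    """Depth-first, indented listing of a node and its descendants."""
    lines = ["  " * depth + node]
    for child in hierarchy.get(node, []):
        lines.extend(render_tree(hierarchy, child, depth + 1))
    return lines


# Hypothetical factor names, for illustration only.
hierarchy = {"hh0": ["h0", "h1"], "h0": ["0", "3"], "h1": ["1"]}
for root in find_roots(hierarchy):
    print("\n".join(render_tree(hierarchy, root)))
```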
learn(n_epoch=[1000, 1000], lr=0.1, annealing=1.0, num_samples=10, batch_size=None, layerwise=False)
Fit a variational approximation to the posterior over scDEF parameters.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
n_epoch | Optional[Union[int, list]] | number of epochs (full passes over the data). Can be a list of ints for multi-step learning. | [1000, 1000] |
lr | Optional[Union[float, list]] | learning rate. Can be a list of floats for multi-step learning. | 0.1 |
annealing | Optional[Union[float, list]] | scale factor for the entropy term. Can be a list of floats for multi-step learning. | 1.0 |
num_samples | Optional[int] | number of Monte Carlo samples to use in the ELBO approximation. | 10 |
batch_size | Optional[int] | number of data points to use per iteration. If None, all data points are used. Useful for data sets that do not fit in GPU memory. | None |
layerwise | Optional[bool] | whether to optimize the model parameters in a step-wise manner: first learn only layers 0 and 1, then layer 2, then layer 3, and so on. The n_epoch and lr schedules are ignored beyond their first value, and each step uses that first n_epoch value. | False |
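n_epoch, lr and annealing each accept a scalar or a list, with lists describing a multi-step schedule. How scdef aligns lists of different lengths is not documented here, so the sketch below assumes that scalars are repeated for every step and that all lists must have the same length. The helper build_schedule is hypothetical and only shows how the three settings pair up step by step; the layerwise mode is not modeled.

```python
from typing import List, Tuple, Union

Scalar = Union[int, float]


def _as_steps(value, n_steps: int) -> List[Scalar]:
    """Repeat a scalar n_steps times, or check that a list has n_steps entries."""
    if isinstance(value, (list, tuple)):
        if len(value) != n_steps:
            raise ValueError(f"expected {n_steps} values, got {len(value)}")
        return list(value)
    return [value] * n_steps


def build_schedule(n_epoch, lr, annealing) -> List[Tuple[Scalar, Scalar, Scalar]]:
    """Pair up per-step (n_epoch, lr, annealing) settings (hypothetical helper)."""
    lengths = [len(v) for v in (n_epoch, lr, annealing) if isinstance(v, (list, tuple))]
    n_steps = max(lengths, default=1)  # all scalars -> a single step
    return list(
        zip(_as_steps(n_epoch, n_steps), _as_steps(lr, n_steps), _as_steps(annealing, n_steps))
    )


print(build_schedule([1000, 1000], 0.1, 1.0))
# [(1000, 0.1, 1.0), (1000, 0.1, 1.0)]
print(build_schedule([1000, 1000], [0.1, 0.01], 1.0))
# [(1000, 0.1, 1.0), (1000, 0.01, 1.0)]
```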
make_graph(hierarchy=None, show_all=False, factor_annotations=None, top_factor=None, show_signatures=True, enrichments=None, top_genes=None, show_batch_counts=False, filled=None, wedged=None, color_edges=True, show_confidences=False, mc_samples=100, n_cells_label=False, n_cells=False, node_size_max=2.0, node_size_min=0.05, scale_level=False, show_label=True, gene_score=None, gene_cmap='viridis', **fontsize_kwargs)
Make a Graphviz-formatted scDEF graph.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
hierarchy | Optional[dict] | a dictionary containing the polytree to draw instead of the whole graph | None |
show_all | Optional[bool] | whether to show all factors, including those removed by filtering | False |
factor_annotations | Optional[dict] | factor annotations to include in the node labels | None |
top_factor | Optional[str] | only include factors below this factor | None |
show_signatures | Optional[bool] | whether to show the ranked gene signatures in the node labels | True |
enrichments | Optional[DataFrame] | enrichment results from gseapy to include in the node labels | None |
top_genes | Optional[int] | number of genes from each signature to show in the node labels | None |
show_batch_counts | Optional[bool] | whether to show the number of cells from each batch that attach to each factor | False |
filled | Optional[str] | key in self.adata.obs used to fill the nodes | None |
wedged | Optional[str] | key in self.adata.obs used to wedge the nodes | None |
color_edges | Optional[bool] | whether to color the graph edges according to the upper factors | True |
show_confidences | Optional[bool] | whether to show the confidence score for each signature | False |
mc_samples | Optional[int] | number of Monte Carlo samples to take from the posterior to compute signature confidences | 100 |
n_cells_label | Optional[bool] | whether to show the number of cells that attach to the factor | False |
n_cells | Optional[bool] | whether to scale the node sizes by the number of cells that attach to the factor | False |
node_size_max | Optional[int] | maximum node size when scaled by cell numbers | 2.0 |
node_size_min | Optional[int] | minimum node size when scaled by cell numbers | 0.05 |
scale_level | Optional[bool] | whether to scale node sizes per level instead of across all levels | False |
show_label | Optional[bool] | whether to show labels on nodes | True |
gene_score | Optional[str] | color the nodes by the score they attribute to a gene, normalized by layer. Overrides filled and wedged | None |
gene_cmap | Optional[str] | colormap to use for gene_score | 'viridis' |
**fontsize_kwargs | | keyword arguments to adjust the font sizes according to the gene scores | {} |
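To make the interplay of n_cells, node_size_min, node_size_max and scale_level concrete, here is a sketch that assumes a linear min-max mapping from per-factor cell counts to node sizes. The exact formula scdef uses is not stated above, so treat node_sizes as a hypothetical illustration: with scale_level=False one range is shared by all levels, and with scale_level=True each level is rescaled on its own.

```python
from typing import Dict, List, Sequence


def _rescale(n: float, lo: float, hi: float, size_min: float, size_max: float) -> float:
    """Linearly map n from [lo, hi] onto [size_min, size_max]."""
    if hi == lo:
        return size_max  # all counts equal: draw every node at full size
    return size_min + (n - lo) / (hi - lo) * (size_max - size_min)


def node_sizes(
    counts_by_level: Dict[int, Sequence[int]],
    size_min: float = 0.05,
    size_max: float = 2.0,
    scale_level: bool = False,
) -> Dict[int, List[float]]:
    """Node sizes from per-factor cell counts, scaled per level or across all levels."""
    everything = [n for counts in counts_by_level.values() for n in counts]
    global_lo, global_hi = min(everything), max(everything)
    sizes = {}
    for level, counts in counts_by_level.items():
        if scale_level:
            lo, hi = min(counts), max(counts)
        else:
            lo, hi = global_lo, global_hi
        sizes[level] = [_rescale(n, lo, hi, size_min, size_max) for n in counts]
    return sizes


counts = {0: [10, 30], 1: [40]}  # cells attached to each factor, by level
print(node_sizes(counts))                    # one scale shared by all levels
print(node_sizes(counts, scale_level=True))  # each level scaled on its own
```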
plot_multilevel_paga(neighbors_rep='X_factors', layers=None, figsize=(16, 4), reuse_pos=True, fontsize=12, show=True, **paga_kwargs)
Plot a PAGA graph from each scDEF layer.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
neighbors_rep | Optional[str] | the self.obsm key to use to compute the PAGA graphs | 'X_factors' |
layers | Optional[list] | which layers to plot | None |
figsize | Optional[tuple] | figure size | (16, 4) |
reuse_pos | Optional[bool] | whether to initialize each PAGA graph with the graph from the layer above | True |
show | Optional[bool] | whether to show the plot | True |
**paga_kwargs | | keyword arguments to adjust the PAGA layouts | {} |
plot_obs_scores(obs_keys, hierarchy=None, mode='fracs', **kwargs)
Plot the association between a set of cell annotations and factors.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
obs_keys | Sequence[str] | the keys in self.adata.obs to use | required |
hierarchy | Optional[dict] | the polytree to restrict the associations to | None |
mode | Literal['f1', 'fracs', 'weights'] | whether to compute scores based on assignments or weights | 'fracs' |
**kwargs | | plotting keyword arguments | {} |
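The assignment-based modes can be illustrated with per-cell labels. The definitions below are assumptions, not taken from scdef: "fracs" is read as the fraction of an annotation's cells attached to a factor, and "f1" as the F1 score between "cell has annotation A" and "cell is attached to factor F". The "weights" mode is not modeled, and association_scores is a hypothetical helper.

```python
from collections import Counter
from typing import Dict, Hashable, Sequence, Tuple


def association_scores(
    annotations: Sequence[Hashable],
    assignments: Sequence[Hashable],
    mode: str = "fracs",
) -> Dict[Tuple[Hashable, Hashable], float]:
    """Score every (annotation, factor) pair from per-cell labels.

    "fracs": fraction of the annotation's cells assigned to the factor.
    "f1":    F1 score between annotation membership and factor membership.
    """
    if len(annotations) != len(assignments):
        raise ValueError("need one annotation and one factor assignment per cell")
    n_ann = Counter(annotations)
    n_fac = Counter(assignments)
    joint = Counter(zip(annotations, assignments))  # missing pairs count as 0
    scores = {}
    for a in n_ann:
        for f in n_fac:
            both = joint[(a, f)]
            if mode == "fracs":
                scores[(a, f)] = both / n_ann[a]
            elif mode == "f1":
                # F1 = 2TP / (2TP + FP + FN) = 2|A and F| / (|A| + |F|)
                scores[(a, f)] = 2 * both / (n_ann[a] + n_fac[f])
            else:
                raise ValueError(f"unsupported mode: {mode!r}")
    return scores


cell_types = ["T", "T", "B", "B"]
factors = ["0", "0", "0", "1"]  # factor each cell is attached to
print(association_scores(cell_types, factors))
print(association_scores(cell_types, factors, mode="f1"))
```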
plot_pathway_scores(pathways, top_genes=20, **kwargs)
Plot the association between a set of cell annotations and a set of gene signatures.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
obs_keys | | the keys in self.adata.obs to use | required |
pathways | DataFrame | a pandas DataFrame containing PROGENy pathways | required |
**kwargs | | plotting keyword arguments | {} |
plot_signatures_scores(obs_keys, markers, top_genes=10, hierarchy=None, **kwargs)
Plot the association between a set of cell annotations and a set of gene signatures.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
obs_keys |
Sequence[str]
|
the keys in self.adata.obs to use |
required |
markers |
Mapping[str, Sequence[str]]
|
a dictionary with keys corresponding to self.adata.obs[obs_keys] and values to gene lists |
required |
top_genes |
Optional[int]
|
number of genes to consider in the score computations |
10
|
hierarchy |
Optional[dict]
|
the polytree to restrict the associations to |
None
|
**kwargs |
plotting keyword arguments |
{}
|
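One simple way to relate factor signatures to marker lists, while honoring a top_genes cutoff, is the overlap fraction sketched below: for each marker set, the share of its genes found among a factor's top_genes highest-ranked genes. This is an assumed scoring scheme for illustration (overlap_scores is hypothetical), and the gene names are placeholders; scdef may compute its scores differently.

```python
from typing import Dict, Mapping, Sequence


def overlap_scores(
    signatures: Mapping[str, Sequence[str]],
    markers: Mapping[str, Sequence[str]],
    top_genes: int = 10,
) -> Dict[str, Dict[str, float]]:
    """For each marker set, the fraction of its genes among each factor's top genes."""
    scores: Dict[str, Dict[str, float]] = {}
    for label, marker_genes in markers.items():
        marker_set = set(marker_genes)
        scores[label] = {}
        for factor, ranked_genes in signatures.items():
            top = set(ranked_genes[:top_genes])  # keep only the highest-ranked genes
            hits = len(top & marker_set)
            scores[label][factor] = hits / len(marker_set) if marker_set else 0.0
    return scores


# Signatures are ranked gene lists per factor (placeholder values).
signatures = {"f0": ["CD3E", "CD3D", "IL7R", "ACTB"], "f1": ["MS4A1", "CD79A", "ACTB", "CD3E"]}
markers = {"T": ["CD3E", "CD3D"], "B": ["MS4A1", "CD79A", "CD19"]}
print(overlap_scores(signatures, markers, top_genes=2))
```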
scdef.iscDEF
Bases: scDEF
Informed scDEF model.
This model extends the basic scDEF by using gene sets to guide the factors. iscDEF can either set the given sets as top layer factors and learn higher-resolution structure, or use them as the lowest resolution and learn a hierarchy that relates them. All the methods from scDEF are available in iscDEF.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
adata | AnnData | AnnData object containing the gene expression data. scDEF learns a model from counts, so they must be present in either adata.X or in adata.layers. | required |
markers_dict | Mapping[str, Sequence[str]] | dictionary containing named gene lists. | required |
add_other | Optional[bool] | whether to add factors for cells that don't express any of the sets in markers_dict. | False |
markers_layer | Optional[int] | scDEF layer at which the gene sets are defined. If > 0, this defines the number of layers. | 0 |
n_factors_per_set | Optional[int] | number of lower-level factors per gene set. | 3 |
n_layers | Optional[int] | default number of scDEF layers, including a top layer of size 1 if markers_layer is 0. | 4 |
cn_small_scale | | scale for low connectivity | required |
cn_big_scale | | scale for large connectivity | required |
cn_small_strength | Optional[float] | strength for weak connectivity | 100.0 |
cn_big_strength | Optional[float] | strength for large connectivity | 1.0 |
gs_small_scale | Optional[float] | scale for genes not in the set | 0.1 |
gs_big_scale | Optional[float] | scale for genes in the set | 10.0 |
marker_strength | Optional[float] | strength for marker genes | 100.0 |
nonmarker_strength | Optional[float] | strength for non-marker genes | 1.0 |
other_strength | Optional[float] | strength for marker genes of other sets | 100.0 |
**kwargs | | keyword arguments for the base scDEF. | {} |
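The gs_big_scale and gs_small_scale parameters suggest a prior that favors genes inside a factor's gene set. The sketch below shows one way such a factor-by-gene scale table could be laid out, assuming each gene set is expanded into n_factors_per_set factors that share the set's gene prior. The function gene_set_prior_scales and its label naming are hypothetical, the strength parameters (marker_strength, nonmarker_strength, other_strength) are not modeled, and this is not how iscDEF is necessarily implemented.

```python
from typing import List, Mapping, Sequence, Tuple


def gene_set_prior_scales(
    markers_dict: Mapping[str, Sequence[str]],
    genes: Sequence[str],
    n_factors_per_set: int = 3,
    gs_big_scale: float = 10.0,
    gs_small_scale: float = 0.1,
) -> Tuple[List[str], List[List[float]]]:
    """Per-factor prior scales over genes: large inside the factor's set, small elsewhere.

    Hypothetical illustration: each gene set is expanded into
    n_factors_per_set factors that all share the set's gene prior.
    """
    labels: List[str] = []
    scales: List[List[float]] = []
    for set_name, set_genes in markers_dict.items():
        in_set = set(set_genes)
        row = [gs_big_scale if g in in_set else gs_small_scale for g in genes]
        for k in range(n_factors_per_set):
            labels.append(f"{set_name}_{k}")  # hypothetical naming scheme
            scales.append(list(row))  # independent copy per factor
    return labels, scales


genes = ["CD3E", "MS4A1", "ACTB"]
markers = {"T": ["CD3E"], "B": ["MS4A1"]}
labels, scales = gene_set_prior_scales(markers, genes, n_factors_per_set=2)
for label, row in zip(labels, scales):
    print(label, row)
```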
Benchmarking utilities
scdef.benchmark.run_multiple_resolutions(method, ad, resolution_sweep, layer_prefix='h', **kwargs)
Run a clustering and gene signature learning method at multiple resolutions.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
method | Callable | the function that runs the method. Must take a resolution parameter as argument and return a list containing at least an AnnData object, a matrix containing the latent space, and a list of genes per cluster. | required |
ad | AnnData | the data to run the method on. | required |
resolution_sweep | Sequence[float] | list of resolution parameters to use. | required |
Returns:
Name | Type | Description |
---|---|---|
outs | Mapping | dictionary containing all the outputs from the method across all resolutions. Keys: ["latents", "signatures", "assignments", "scores", "sizes", "simplified_hierarchy"] |
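The contract for method is easiest to see with a toy implementation. The sketch below is hypothetical throughout: toy_method returns the documented minimum of [data, latent space, genes per cluster] using placeholder computations, a SimpleNamespace stands in for an AnnData object, and collect_over_resolutions is a simplified stand-in for the sweep that gathers only latents and signatures. It assumes the resolution is passed as a keyword argument named resolution; run_multiple_resolutions itself may call the method differently and expect further outputs for its other keys.

```python
from types import SimpleNamespace
from typing import Any, Dict, List


def toy_method(ad: Any, resolution: float, **kwargs) -> List[Any]:
    """Hypothetical method returning [data, latent space, genes per cluster].

    Placeholder "clustering": cell i goes to cluster i % n_clusters, where
    n_clusters grows with the resolution.
    """
    n_clusters = max(1, round(resolution * 4))
    latent = [[float(i % n_clusters)] for i in range(ad.n_obs)]  # one latent dim per cell
    genes_per_cluster = [[ad.var_names[c % len(ad.var_names)]] for c in range(n_clusters)]
    return [ad, latent, genes_per_cluster]


def collect_over_resolutions(method, ad, resolution_sweep, **kwargs) -> Dict[str, Dict[float, Any]]:
    """Run method at each resolution and gather latents and signatures by resolution."""
    outs: Dict[str, Dict[float, Any]] = {"latents": {}, "signatures": {}}
    for res in resolution_sweep:
        _, latent, signatures = method(ad, resolution=res, **kwargs)[:3]
        outs["latents"][res] = latent
        outs["signatures"][res] = signatures
    return outs


ad = SimpleNamespace(n_obs=6, var_names=["CD3E", "MS4A1", "LYZ"])  # stand-in for AnnData
outs = collect_over_resolutions(toy_method, ad, [0.25, 0.5, 1.0])
print({res: len(sigs) for res, sigs in outs["signatures"].items()})
```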