
API Reference

Model classes

scdef.scDEF

Bases: object

Single-cell Deep Exponential Families (scDEF) model.

scDEF learns hierarchical, multi-level gene expression signatures from single-cell RNA-seq data provided in an AnnData object. This model can be used for a variety of analyses including dimensionality reduction, batch correction, clustering, and visualization of cell states and gene programs.

The model fits multiple layers of latent factors ("gene signatures") to describe cellular heterogeneity at different resolutions. It supports batch correction, prior specification, and generation of corrected gene expression matrices.

Model fitting, inference routines, and additional plotting utilities are implemented as methods of this class. The stored AnnData object is updated with model results during training.

Parameters:

Name Type Description Default
adata AnnData

AnnData object containing the single-cell gene expression count matrix. Counts should be present in either adata.X or in the specified adata.layers.

required
counts_layer Optional[str]

key for adata.layers specifying which layer to use as expression counts (if not adata.X).

None
batch_key Optional[str]

key in adata.obs containing batch annotations; if provided, batch correction is performed. If None or not found, no batch correction is used.

None
seed Optional[int]

random seed for model initialization and stochastic routines (uses JAX's pseudo-random number generator).

42
n_factors Optional[int]

number of latent factors at the lowest layer (can be overridden by layer_sizes).

100
decay_factor Optional[float]

multiplicative decay in the number of factors at each subsequent layer, used if layer_sizes is not provided.

2.0
max_n_layers Optional[float]

maximum number of hierarchical layers in the model.

5
layer_sizes Optional[list]

explicit list of the number of factors in each scDEF layer. If None, layer sizes are set automatically.

None
layer_names Optional[list]

list of custom names for the layers. If None, layer names are enumerated as ["L0", "L1", ...].

None
logginglevel Optional[int]

verbosity level for the logger.

INFO
layer_concentration Optional[float]

concentration parameter of the top-level Dirichlet prior over cell usage of factors.

1.0
shrinkage_shape Optional[float]

shape parameter for shrinkage prior controlling factor usage.

1.0
shrinkage_rate Optional[float]

rate parameter for shrinkage prior controlling factor usage.

1.0
top_alpha Optional[float]

concentration parameter for the top layer Dirichlet prior over factor proportions.

1.0
factor_shape Optional[float]

shape of the prior distribution for factor-gene loadings matrix W.

0.1
brd_strength Optional[float]

BRD (Batch Relevance Determination) prior concentration parameter for factor relevance estimation.

1.0
brd_mean Optional[float]

mean of the BRD prior for factor relevance estimation.

1.0
use_brd Optional[bool]

if True, use BRD prior for automatic selection of active factors.

True
cell_scale_shape Optional[float]

precision/concentration parameter for cell-specific scaling priors.

1.0
gene_scale_shape Optional[float]

precision/concentration parameter for gene-specific scaling priors.

1.0
batch_cpal Optional[str]

default matplotlib color palette name used for batches.

'Dark2'
layer_cpal Optional[str]

matplotlib color palette used to color the factors at each scDEF layer.

'tab10'
lightness_mult Optional[float]

lightness multiplier to define the base color for each new scDEF layer.

0.15
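When layer_sizes is None, the layer sizes follow from n_factors, decay_factor and max_n_layers. The sketch below shows one plausible reading of that rule. It assumes that each layer has decay_factor times fewer factors than the layer below (truncated to an integer) and that no further layers are added once a size would drop below one. default_layer_sizes is a hypothetical helper, not part of scdef, and the library's exact rounding may differ.

```python
def default_layer_sizes(n_factors=100, decay_factor=2.0, max_n_layers=5):
    """Hypothetical sketch: shrink the factor count by decay_factor at each layer."""
    sizes = []
    size = float(n_factors)
    for _ in range(int(max_n_layers)):
        k = int(size)  # truncate to an integer number of factors
        if k < 1:
            break  # stop once a layer would have no factors
        sizes.append(k)
        size /= decay_factor
    return sizes


print(default_layer_sizes())  # [100, 50, 25, 12, 6]
```

With the defaults listed above, this reading gives five layers of 100, 50, 25, 12 and 6 factors.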

attach_factors_to_obs(obs_key)

Attach factors to observation categories.

Parameters:

Name Type Description Default
obs_key str

key in model.adata.obs to use for attachment

required

Returns:

Type Description
List[List[str]]

list of attachment lists, one per layer

compute_factor_obs_assignment_fracs(layer_idx, factor_name, obs_key, obs_val)

Compute assignment fraction between a factor and an observation category.

Parameters:

Name Type Description Default
layer_idx int

layer index of the factor

required
factor_name str

name of the factor

required
obs_key str

key in model.adata.obs

required
obs_val str

value in obs_key to compute fraction with

required

Returns:

Type Description
float

assignment fraction value
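The following is a minimal sketch of an assignment fraction. It assumes that each cell is assigned to its highest-weight factor and that the fraction is taken over the cells carrying obs_val; the direction of the fraction is itself an assumption. assignment_fraction is a hypothetical helper that operates on a cells x factors weight matrix for a single layer.

```python
import numpy as np


def assignment_fraction(cell_weights, obs_values, factor_idx, obs_val):
    """Hypothetical sketch: fraction of cells with obs_val assigned to factor_idx.

    Each cell is assigned to its highest-weight factor (an assumption).
    """
    assignments = np.argmax(np.asarray(cell_weights), axis=1)
    in_group = np.asarray(obs_values) == obs_val
    if not in_group.any():
        return 0.0  # no cells carry this observation value
    return float(np.mean(assignments[in_group] == factor_idx))
```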

compute_factor_obs_association_score(layer_idx, factor_name, obs_key, obs_val)

Compute association score between a factor and an observation category.

Parameters:

Name Type Description Default
layer_idx int

layer index of the factor

required
factor_name str

name of the factor

required
obs_key str

key in model.adata.obs

required
obs_val str

value in obs_key to compute association with

required

Returns:

Type Description
float

association score value

compute_factor_obs_weight_score(layer_idx, factor_name, obs_key, obs_val)

Compute weight score between a factor and an observation category.

Parameters:

Name Type Description Default
layer_idx int

layer index of the factor

required
factor_name str

name of the factor

required
obs_key str

key in model.adata.obs

required
obs_val str

value in obs_key to compute weight with

required

Returns:

Type Description
float

weight score value

compute_weight(upper_factor_name, lower_factor_name)

Compute the weight between two factors across any number of layers.

Parameters:

Name Type Description Default
upper_factor_name str

name of the upper factor

required
lower_factor_name str

name of the lower factor

required

Returns:

Type Description
float

weight value between the two factors
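compute_weight combines weights across several layers. One natural way to do this, assumed here rather than taken from the scDEF source, is to multiply the inter-layer weight matrices so that the result sums over every path through the intermediate factors. The helper chained_weight and the row/column layout of each matrix (rows are upper factors, columns are lower factors) are hypothetical. The real method also takes factor names rather than indices.

```python
import numpy as np


def chained_weight(W_list, upper_layer, upper_idx, lower_layer, lower_idx):
    """Hypothetical sketch of a cross-layer factor weight.

    W_list[l] is assumed to hold the weights from layer l+1 (rows) to
    layer l (columns), i.e. shape (n_factors[l+1], n_factors[l]).
    """
    if upper_layer <= lower_layer:
        raise ValueError("upper_layer must be above lower_layer")
    M = W_list[upper_layer - 1]
    # Walk down the hierarchy, summing over all paths through intermediate layers
    for l in range(upper_layer - 2, lower_layer - 1, -1):
        M = M @ W_list[l]
    return float(M[upper_idx, lower_idx])
```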

filter_factors(brd_min=1.0, ard_min=0.001, clarity_min=0.5, min_cells_upper=0.001, min_cells_lower=0.0, filter_up=True, annotate=True, upper_only=False)

Filter out irrelevant factors based on the BRD posterior or the cell attachments.

Parameters:

Name Type Description Default
brd_min float

minimum factor BRD value

1.0
ard_min float

minimum ARD value, as a fraction of the total ARD

0.001
clarity_min float

clarity threshold for the effective parents calculation

0.5
min_cells_upper Optional[float]

minimum number of cells that each factor in upper layers must have attached to it to be kept. Values between 0 and 1 are interpreted as a fraction of cells; other values as an absolute count.

0.001
min_cells_lower Optional[float]

minimum number of cells that each factor in layer 0 must have attached to it to be kept. Values between 0 and 1 are interpreted as a fraction of cells; other values as an absolute count.

0.0
filter_up Optional[bool]

whether to remove factors in upper layers via inter-layer attachments

True
upper_only Optional[bool]

whether to only filter factors in upper layers

False
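min_cells_upper and min_cells_lower accept either a fraction or an absolute count. The following is a minimal sketch of that rule. It assumes that values in [0, 1) are fractions of the total number of cells and that the cutoff is inclusive. min_cells_threshold and factors_to_keep are hypothetical helpers, and the sketch does not show how scDEF counts the cells attached to each factor.

```python
def min_cells_threshold(value, n_cells):
    """Hypothetical helper: turn min_cells_upper/min_cells_lower into a cell count.

    Values in [0, 1) are read as a fraction of all cells; anything else
    is read as an absolute number of cells.
    """
    if 0 <= value < 1:
        return value * n_cells
    return float(value)


def factors_to_keep(cells_per_factor, value, n_cells):
    """Return indices of factors with at least the required number of attached cells."""
    thres = min_cells_threshold(value, n_cells)
    return [i for i, n in enumerate(cells_per_factor) if n >= thres]
```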

fit(nmf_init=True, max_cells_init=5000, n_rounds=1, **kwargs)

Fit scDEF, warm-starting from a previous fit when available.

On the first call, parameters are initialized from priors (or NMF if enabled). On subsequent calls, the model is re-initialized from the current posterior quantities and the current factor_lists, enabling a fit -> filter -> fit workflow. During refit, upper-layer sizes are clipped to respect decay_factor before rebuilding the hierarchy.

get_annotations(marker_reference, gene_rankings=None)

Get annotations for factors based on marker gene reference.

Parameters:

Name Type Description Default
marker_reference Mapping[str, Sequence[str]]

dictionary mapping annotation names to gene lists

required
gene_rankings Optional[List[List[str]]]

gene rankings for each factor, if None will be computed

None

Returns:

Type Description
List[List[str]]

list of annotation lists, one per factor

get_enrichments(libs=['KEGG_2019_Human'], gene_rankings=None)

Get gene set enrichments for factor signatures using gseapy.

Parameters:

Name Type Description Default
libs List[str]

list of gene set library names to use

['KEGG_2019_Human']
gene_rankings Optional[List[List[str]]]

gene rankings for each factor, if None will be computed

None

Returns:

Type Description
List[Any]

list of enrichment results, one per factor

get_layer_factor_orders()

Get the ordering of factors in each layer for plotting.

Returns:

Type Description
List[ndarray]

list of arrays, one per layer, containing factor indices in plotting order

get_nmf_init(max_cells=None)

Initialize the first layer with NMF on the data, then initialize the remaining layers recursively in the same way.

Parameters:

Name Type Description Default
max_cells

maximum number of cells to use for NMF initialization

None

Returns:

Type Description

tuple of (init_z, init_W) initialization values

get_rankings(layer_idx=0, top_genes=None, genes=True, return_scores=False, sorted_scores=True, drop_factors=None)

Get gene or factor rankings for each factor in a layer.

Parameters:

Name Type Description Default
layer_idx int

layer index to get rankings for

0
top_genes Optional[int]

number of top genes/factors to return

None
genes bool

whether to return gene rankings (True) or factor rankings (False)

True
return_scores bool

whether to return scores along with rankings

False
sorted_scores bool

whether to return scores sorted by ranking

True
drop_factors Optional[List[str]]

list of factors to drop from rankings

None

Returns:

Type Description
Union[List[List[str]], Tuple[List[List[str]], List[List[float]]]]

list of rankings per factor, or tuple of (rankings, scores) if return_scores is True
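The following is a minimal sketch of gene rankings like those get_rankings returns. It assumes that genes are ranked by their loading in a factors x genes matrix W, for example a posterior mean. rank_genes is a hypothetical helper. Which loadings scDEF actually ranks on, and any normalization it applies, are not covered here.

```python
import numpy as np


def rank_genes(W, gene_names, top_genes=None, return_scores=False):
    """Hypothetical sketch: rank genes per factor by their loading in W.

    W has shape (n_factors, n_genes); larger loadings rank higher.
    """
    W = np.asarray(W, dtype=float)
    gene_names = np.asarray(gene_names)
    order = np.argsort(-W, axis=1)  # descending order of loadings per factor
    if top_genes is not None:
        order = order[:, :top_genes]
    rankings = [gene_names[row].tolist() for row in order]
    if return_scores:
        scores = [W[k, row].tolist() for k, row in enumerate(order)]
        return rankings, scores
    return rankings
```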

get_relevances_dict()

Get dictionary of factor relevance scores.

Returns:

Type Description
Dict[str, float]

dictionary mapping factor names to relevance scores

get_signature_confidence(factor_idx, layer_idx, mc_samples=100, top_genes=10, pairwise=False)

Get confidence score for a factor signature using Monte Carlo sampling.

Parameters:

Name Type Description Default
factor_idx int

index of the factor

required
layer_idx int

layer index of the factor

required
mc_samples int

number of Monte Carlo samples to take

100
top_genes int

number of top genes to consider in each sample

10
pairwise bool

whether to compute pairwise Jaccard similarities

False

Returns:

Type Description
float

confidence score as Jaccard similarity
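get_signature_confidence measures how stable a factor's top genes are under posterior uncertainty. The sketch below makes several assumptions. It treats loadings as log-normal with per-gene parameters mu and sigma, and it uses NumPy instead of a JAX key. In the non-pairwise mode, it takes the top genes of exp(mu) as the reference signature. signature_confidence and jaccard are hypothetical helpers.

```python
import itertools

import numpy as np


def jaccard(a, b):
    """Jaccard similarity between two collections of gene indices."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b)


def signature_confidence(mu, sigma, top_genes=10, mc_samples=100, pairwise=False, seed=0):
    """Hypothetical sketch of a Monte Carlo signature-confidence score.

    Loadings for one factor are assumed log-normal with per-gene log-mean mu
    and log-std sigma (1-D arrays over genes).
    """
    rng = np.random.default_rng(seed)
    mu = np.asarray(mu, dtype=float)
    sigma = np.asarray(sigma, dtype=float)
    # Draw mc_samples loading vectors and keep the indices of the top genes of each
    draws = np.exp(mu + sigma * rng.standard_normal((mc_samples, mu.size)))
    tops = [np.argsort(-d)[:top_genes] for d in draws]
    if pairwise:
        # Average Jaccard similarity over all pairs of samples
        sims = [jaccard(a, b) for a, b in itertools.combinations(tops, 2)]
    else:
        # Compare each sample with the signature of the median loadings exp(mu)
        ref = np.argsort(-mu)[:top_genes]
        sims = [jaccard(t, ref) for t in tops]
    return float(np.mean(sims))
```

Pairwise mode averages over all pairs of samples, so it needs mc_samples >= 2. With sigma set to zero, every sample matches and the score is 1.0.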

get_signature_sample(rng, factor_idx, layer_idx, top_genes=10, return_scores=False)

Get a single signature sample from the posterior for a factor.

Parameters:

Name Type Description Default
rng Any

JAX random number generator key

required
factor_idx int

index of the factor

required
layer_idx int

layer index of the factor

required
top_genes int

number of top genes to return

10
return_scores bool

whether to return scores along with gene names

False

Returns:

Type Description
Union[List[str], Tuple[List[str], ndarray]]

list of gene names, or tuple of (gene_names, scores) if return_scores is True

get_signatures_dict(top_genes=None, scores=False, sorted_scores=False, layer_normalize=False, drop_factors=None)

Get dictionary of gene signatures for all factors across all layers.

Parameters:

Name Type Description Default
top_genes Optional[int]

number of top genes per signature

None
scores bool

whether to return scores along with signatures

False
sorted_scores bool

whether to return scores sorted by ranking

False
layer_normalize bool

whether to normalize scores within each layer

False
drop_factors Optional[List[str]]

list of factors to exclude

None

Returns:

Type Description
Union[Dict[str, List[str]], Tuple[Dict[str, List[str]], Dict[str, ndarray]]]

dictionary mapping factor names to gene lists, or tuple of (signatures, scores) if scores is True

get_sizes_dict()

Get dictionary of factor sizes (number of cells per factor).

Returns:

Type Description
Dict[str, float]

dictionary mapping factor names to cell counts

get_summary(top_genes=10, reindex=True)

Get a text summary of the model factors and their top genes.

Parameters:

Name Type Description Default
top_genes int

number of top genes to show per factor

10
reindex bool

whether to reindex factors

True

Returns:

Type Description
str

string summary of the model

identify_mixture_factors(max_n_genes=20, thres=0.5)

Identify factors that may be mixtures and might be better split apart.

Parameters:

Name Type Description Default
max_n_genes int

maximum number of genes per factor

20
thres float

threshold for identifying mixture factors

0.5

Returns:

Type Description
ndarray

array of factor indices that are mixture factors

make_corrected_data(layer_name='scdef_corrected')

Compute and store the low-rank reconstruction of the UMI count matrix.

The reconstructed matrix is saved to adata.layers[layer_name], providing a denoised, batch-corrected version of the expression data.

Parameters:

Name Type Description Default
layer_name str

name for the AnnData layer where the reconstructed matrix is stored

'scdef_corrected'
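make_corrected_data stores a low-rank reconstruction in adata.layers[layer_name]. The following is a minimal NumPy sketch of such a reconstruction. It assumes the reconstruction is the product of the layer-0 cell weights z and the factor loadings W, with an optional per-cell scale. low_rank_reconstruction is hypothetical. It does not reproduce how scDEF removes batch effects or applies gene scales.

```python
import numpy as np


def low_rank_reconstruction(z, W, cell_scale=None):
    """Hypothetical sketch: rebuild a denoised cells x genes matrix from layer-0 factors.

    z: (n_cells, n_factors) cell-factor weights; W: (n_factors, n_genes) loadings.
    cell_scale: optional per-cell multiplier (an assumption).
    """
    X_hat = np.asarray(z, dtype=float) @ np.asarray(W, dtype=float)
    if cell_scale is not None:
        X_hat = X_hat * np.asarray(cell_scale, dtype=float)[:, None]  # scale each row (cell)
    return X_hat
```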

update_model_size(max_n_factors, max_n_layers=None, layer_sizes=None)

Update latent hierarchy dimensions.

Parameters:

Name Type Description Default
max_n_factors

maximum number of factors when layer_sizes is not provided.

required
max_n_layers

maximum number of layers when layer_sizes is not provided.

None
layer_sizes

explicit per-layer sizes. If provided, sizes are sanitized to be non-increasing and consecutive duplicates are collapsed.

None
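update_model_size sanitizes explicit layer_sizes in two steps. The following is a minimal sketch of both. It assumes that the non-increasing order is enforced by capping each layer at the size of the layer below it, rather than by sorting. sanitize_layer_sizes is a hypothetical helper.

```python
def sanitize_layer_sizes(layer_sizes):
    """Hypothetical sketch of the sanitization described for update_model_size.

    Sizes are made non-increasing (each layer is capped at the size of the
    layer below it), and consecutive duplicates are then collapsed.
    """
    capped = []
    for size in layer_sizes:
        size = int(size)
        if capped:
            size = min(size, capped[-1])  # enforce a non-increasing sequence
        capped.append(size)
    sanitized = []
    for size in capped:
        if not sanitized or size != sanitized[-1]:
            sanitized.append(size)  # drop repeats of the previous size
    return sanitized
```

For example, [100, 120, 50, 50, 10] becomes [100, 50, 10] under this reading.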

scdef.iscDEF

Bases: scDEF

Informed Single-cell Deep Exponential Families (iscDEF) model.

iscDEF extends the scDEF framework by incorporating prior biological knowledge in the form of gene sets ("markers"). This model can guide the discovery of factors along known biology in two ways. It can use the gene sets as the top (coarsest) layer and learn finer substructure beneath them, or it can use them as the lowest (finest) layer and learn how they relate hierarchically.

All methods and functionality available in scDEF are inherited by iscDEF. Additional logic allows for flexible integration of marker sets at a chosen model layer, custom prior settings for marker versus non-marker genes, and automatic handling of cells/gene sets that do not fall into any marker category (via the add_other option).

Parameters:

Name Type Description Default
adata AnnData

AnnData object containing the gene expression count matrix. Counts must be present in either adata.X or a specified layer.

required
markers_dict Mapping[str, Sequence[str]]

dictionary mapping marker/factor names to gene lists (gene sets). These guide the formation of factors in the chosen layer.

required
add_other Optional[int]

if > 0, adds one or more "other" factors for cells/observations not matching any marker set. Only one "other" factor is supported for markers_layer > 0.

0
markers_layer Optional[int]

index of the layer at which gene sets are enforced as factors (0 = lowest/finest, higher = top layer). If > 0, the total number of layers is determined by this value.

0
cn_small_mean Optional[float]

mean prior connectivity for "small" (weakly-connected) genes between factors and gene sets.

0.01
cn_big_mean Optional[float]

mean prior connectivity for "big" (strongly-connected) genes between factors and gene sets.

1.0
cn_small_strength Optional[float]

concentration parameter for low connectivity (see scDEF prior specification).

1.0
cn_big_strength Optional[float]

concentration parameter for high connectivity.

0.1
gs_small_scale Optional[float]

scale parameter for genes not in the marker gene set.

1.0
gs_big_scale Optional[float]

scale parameter for genes in the marker gene set (encourages large factor loadings).

100.0
marker_strength Optional[float]

multiplier for the prior strength for marker genes.

10.0
nonmarker_strength Optional[float]

multiplier for non-marker gene prior strength.

0.1
other_strength Optional[float]

prior strength for marker genes belonging to "other" sets.

0.1
**kwargs

additional arguments passed to the scDEF base model.

{}
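The gs_small_scale and gs_big_scale parameters set different prior scales for genes inside and outside each marker set. The sketch below builds such a factors x genes matrix from markers_dict. It assumes that marker genes absent from the data are skipped and that any "other" factors (add_other > 0) use gs_small_scale for every gene. marker_scale_matrix and the other0, other1, ... names are hypothetical. The sketch does not model the connectivity and strength priors.

```python
import numpy as np


def marker_scale_matrix(markers_dict, gene_names, gs_small_scale=1.0,
                        gs_big_scale=100.0, add_other=0):
    """Hypothetical sketch: per-factor, per-gene prior scales from gene sets.

    Returns (factor_names, scales) where scales has shape (n_factors, n_genes).
    Genes in a factor's marker set get gs_big_scale; all others get gs_small_scale.
    """
    gene_index = {g: i for i, g in enumerate(gene_names)}
    factor_names = list(markers_dict) + [f"other{i}" for i in range(add_other)]
    scales = np.full((len(factor_names), len(gene_names)), gs_small_scale, dtype=float)
    for k, genes in enumerate(markers_dict.values()):
        # Skip marker genes that are not present in the data
        idx = [gene_index[g] for g in genes if g in gene_index]
        scales[k, idx] = gs_big_scale
    return factor_names, scales
```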

filter_factors(brd_min=1.0, ard_min=0.001, clarity_min=0.5, min_cells_upper=0.001, min_cells_lower=0.0, filter_up=True, annotate=True, upper_only=False)

Filter factors while preserving existing marker-based factor names.

This override keeps the base filtering behavior but restores names by subsetting the previous factor_names. This avoids marker-prefix relabeling across filter/refit workflows.

fit(nmf_init=False, max_cells_init=1024, z_init_concentration=100.0, **kwargs)

Fit iscDEF, warm-starting from previous fit when available.

On refit, all layers are initialized from the previous posterior means (z and W), while BRD/ARD are initialized from layer 0. Existing marker-aware names are preserved through the refit path.

Tools

scdef.tl

Tooling utilities for scDEF.

compute_hierarchy_scores(model, use_filtered=False, filter_upper_layers=True, factor_weight='uniform', eps=1e-12)

Compute per-factor and global hierarchy scores from learned W matrices.

Parameters:

Name Type Description Default
model scDEF

scDEF model instance

required
use_filtered bool

whether to use model.factor_lists / model.factor_names

False
filter_upper_layers bool

when use_filtered is False, whether to still use filtered factors for layers > 0 (both as parents and children)

True
factor_weight str

weighting scheme for factors, either "uniform" or "usage"

'uniform'
eps float

small epsilon value for numerical stability

1e-12

Returns:

Type Description
Dict[str, Any]

dict containing per_factor DataFrame, per_transition DataFrame, global_score, and global_ambiguity

factor_diagnostics(model, recompute=False)

Compute and store factor diagnostics in model.adata.uns['factor_obs'].

Parameters:

Name Type Description Default
model scDEF

scDEF model instance

required
recompute bool

if True, force recomputation of the cached fixed upper-layer factor subset used for clarity scores, even if the fit revision did not change.

False

get_hierarchy(model, simplified=True, drop_factors=None)

Get a dictionary containing the polytree contained in the scDEF graph.

Parameters:

Name Type Description Default
simplified Optional[bool]

whether to collapse single-child nodes

True
drop_factors Optional[Sequence[str]]

factors to drop from the hierarchy

None

Returns:

Name Type Description
hierarchy

the dictionary containing the hierarchy

make_biological_hierarchy(model)

Make the biological hierarchy of the model.

Parameters:

Name Type Description Default
model scDEF

scDEF model instance

required

Returns:

Name Type Description
biological_hierarchy Dict[str, Sequence[str]]

dictionary containing the biological hierarchy

make_hierarchies(model)

Store the biological and technical hierarchies of the model.

Parameters:

Name Type Description Default
model scDEF

scDEF model instance

required

make_technical_hierarchy(model)

Make the technical hierarchy of the model.

Parameters:

Name Type Description Default
model scDEF

scDEF model instance

required

Returns:

Name Type Description
technical_hierarchy Dict[str, Sequence[str]]

dictionary containing the technical hierarchy

set_technical_factors(model, factors=None)

Set the technical factors of the model.

Technical factors must be layer 0 factors.

Parameters:

Name Type Description Default
model scDEF

scDEF model instance

required
factors Optional[Sequence[str]]

list of factor names to mark as technical

None

umap(model, layers=None, use_log=False, metric='euclidean')

Compute UMAP embeddings for each scDEF layer.

The resulting embeddings are stored in model.adata.obsm[f"X_umap_{layer_name}"] for each layer.

Parameters:

Name Type Description Default
model scDEF

scDEF model instance

required
layers Optional[List[int]]

which layers to compute UMAPs for. If None, all layers with more than one factor are used (in descending order).

None
use_log bool

whether to use log-transformed cell-factor weights for the neighbor graph computation.

False
metric str

distance metric for neighbors computation.

'euclidean'

Plotting

scdef.pl

Plotting utilities for scDEF.

biological_hierarchy(model, **kwargs)

Plot the biological hierarchy of the model.

Parameters:

Name Type Description Default
model scDEF

scDEF model instance

required
**kwargs Any

keyword arguments passed to make_graph

{}

Returns:

Type Description
Graph

Graphviz Graph object

cell_entropies(model, thres=0.9, show=True)

Plot cell entropies and factor numbers across layers.

Parameters:

Name Type Description Default
model scDEF

scDEF model instance

required
thres float

Threshold for cumulative sum calculation

0.9
show bool

Whether to show the plot

True
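cell_entropies summarizes how each cell's weight is spread over the factors of a layer. This sketch assumes one reading of its thres argument: the number of factors needed for a cell's sorted weight proportions to reach thres. It computes that count together with the Shannon entropy of the proportions. cell_entropy_and_count is a hypothetical helper. Keep thres below 1, because the proportions are normalized with a small epsilon.

```python
import numpy as np


def cell_entropy_and_count(weights, thres=0.9, eps=1e-12):
    """Hypothetical sketch of two per-cell summaries of cell-factor weights.

    weights: (n_cells, n_factors) non-negative weights for one layer.
    Returns the Shannon entropy (in nats) of each cell's normalized weights and
    the number of factors needed for the sorted cumulative sum to reach thres.
    """
    w = np.asarray(weights, dtype=float)
    p = w / (w.sum(axis=1, keepdims=True) + eps)
    entropy = -np.sum(p * np.log(p + eps), axis=1)
    # Sort each cell's proportions in decreasing order and accumulate them
    cums = np.cumsum(-np.sort(-p, axis=1), axis=1)
    n_factors = np.argmax(cums >= thres, axis=1) + 1  # first position reaching thres
    return entropy, n_factors
```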

continuous_obs_scores(model, obs_keys, mode='correlations', vmax=None, vmin=None, **kwargs)

Plot the correlations between a set of cell annotations and factors.

Parameters:

Name Type Description Default
model scDEF

scDEF model instance

required
obs_keys Sequence[str]

the keys in model.adata.obs to use

required
mode Literal['correlations']

how to compute scores

'correlations'
**kwargs Any

plotting keyword arguments

{}

factor_diagnostics(model, brd_min=1.0, ard_min=0.001, clarity_min=0.5, figsize=(6, 4), ax=None, annotate_factors=False, annotation_fontsize=8, annotation_alpha=0.8, show=True)

Diagnostic scatter plot of factors: BRD vs. effective parents, colored by ARD.

Parameters:

Name Type Description Default
model scDEF

scDEF model instance

required
brd_min float

minimum BRD filter threshold

1.0
ard_min float

minimum ARD filter threshold (fraction of total ARD)

0.001
clarity_min float

clarity threshold for effective parents calculation

0.5
figsize tuple

Figure size (if ax is None)

(6, 4)
ax Optional[Axes]

matplotlib Axes to plot on

None
annotate_factors bool

whether to annotate each point with its factor label

False
annotation_fontsize int

fontsize for factor text annotations

8
annotation_alpha float

alpha value for factor text annotations

0.8
show bool

whether to show the plot

True

Returns:

Type Description
Optional[Axes]

Axes object if show is False, None otherwise.

factor_genes(model, thres=0.9, show=True)

Plot number of genes in factors across layers.

Parameters:

Name Type Description Default
model scDEF

scDEF model instance

required
thres float

threshold for cumulative sum calculation

0.9
show bool

whether to show the plot

True

Returns:

Type Description
Optional[Figure]

Figure object if show is False, None otherwise

factor_gini(model, idx, thres=0.9, show=True)

Plot Gini coefficient for a specific factor.

Parameters:

Name Type Description Default
model scDEF

scDEF model instance

required
idx int

Factor index to plot

required
thres float

Threshold for cumulative sum calculation

0.9
show bool

Whether to show the plot

True
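factor_gini plots a Gini coefficient for a single factor. For reference, the standard Gini coefficient of a non-negative vector, such as a factor's gene loadings, can be computed from its sorted values as shown below. Two points are assumptions that this sketch does not cover: that scDEF applies the coefficient to gene loadings, and how thres enters the plot.

```python
import numpy as np


def gini(x):
    """Gini coefficient of a non-negative vector (0 = perfectly even, near 1 = concentrated)."""
    x = np.sort(np.asarray(x, dtype=float))  # ascending order
    n = x.size
    if n == 0 or x.sum() == 0:
        return 0.0
    i = np.arange(1, n + 1)  # 1-based ranks of the sorted values
    return float(np.sum((2 * i - n - 1) * x) / (n * x.sum()))
```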

factors_bars(model, obs_keys, sort_layer_factors=True, orders=None, sharey=True, layers=None, vmax=None, vmin=None, fontsize=12, title_fontsize=12, legend_fontsize=8, figsize=(10, 4), total=False, show=True)

Plot factor scores as bar charts.

Parameters:

Name Type Description Default
model scDEF

scDEF model instance

required
obs_keys Union[str, List[str]]

observation keys to plot

required
sort_layer_factors bool

whether to sort factors by layer

True
orders Optional[List[ndarray]]

custom factor orders

None
sharey bool

whether to share y-axis across subplots

True
layers Optional[List[int]]

which layers to plot

None
vmax Optional[float]

maximum value for y-axis

None
vmin Optional[float]

minimum value for y-axis

None
fontsize int

font size for labels

12
title_fontsize int

title font size

12
legend_fontsize int

legend font size

8
figsize Tuple[float, float]

figure size

(10, 4)
total bool

whether to plot total scores

False
show bool

whether to show the plot

True

gini_brd(model, normalize=False, figsize=(4, 4), alpha=0.6, fontsize=12, legend_fontsize=10, show=True, ax=None)

Plot Gini coefficient vs BRD scores.

Parameters:

Name Type Description Default
model scDEF

scDEF model instance

required
normalize bool

whether to normalize BRD scores

False
figsize Tuple[float, float]

figure size

(4, 4)
alpha float

transparency level

0.6
fontsize int

font size for labels

12
legend_fontsize int

font size for legend

10
show bool

whether to show the plot

True
ax Optional[Axes]

matplotlib axes to plot on

None

Returns:

Type Description
Optional[Axes]

Axes object if show is False, None otherwise

layers_obs(model, obs_keys, obs_mats, obs_clusters, obs_vals_dict, sort_layer_factors=True, orders=None, layers=None, vmax=None, vmin=None, cb_title='', cb_title_fontsize=10, fontsize=12, title_fontsize=12, pad=0.1, shrink=0.7, figsize=(10, 4), xticks_rotation=90.0, cmap=None, show=True, rasterized=False, **kwargs)

Plot observation matrices across layers.

Parameters:

Name Type Description Default
model scDEF

scDEF model instance

required
obs_keys Union[str, List[str]]

observation keys to plot

required
obs_mats Dict[str, Dict[int, ndarray]]

observation matrices dictionary

required
sort_layer_factors bool

whether to sort factors by layer

True
orders Optional[List[ndarray]]

custom factor orders

None
layers Optional[List[int]]

which layers to plot

None
vmax Optional[float]

maximum value for colormap

None
vmin Optional[float]

minimum value for colormap

None
cb_title str

colorbar title

''
cb_title_fontsize int

colorbar title font size

10
fontsize int

font size for labels

12
title_fontsize int

title font size

12
pad float

padding for colorbar

0.1
shrink float

shrink factor for colorbar

0.7
figsize Tuple[float, float]

figure size

(10, 4)
xticks_rotation float

rotation angle for x-axis ticks

90.0
cmap Optional[str]

colormap name

None
show bool

whether to show the plot

True
rasterized bool

whether to rasterize the plot

False
**kwargs Any

additional plotting keyword arguments

{}

loss(model, figsize=(4, 4), fontsize=12, ax=None, show=True)

Plot training loss over epochs.

Parameters:

Name Type Description Default
model scDEF

scDEF model instance

required
figsize Tuple[float, float]

figure size

(4, 4)
fontsize int

font size for labels

12
ax Optional[Axes]

matplotlib axes to plot on

None
show bool

whether to show the plot

True

Returns:

Type Description
Optional[Axes]

Axes object if show is False, None otherwise

make_graph(model, hierarchy=None, show_all=False, factor_annotations=None, top_factor=None, show_signatures=True, drop_factors=None, root_signature=None, root_ranking=None, enrichments=None, top_genes=None, show_batch_counts=False, filled=None, wedged=None, assignments=True, color_edges=True, show_confidences=False, mc_samples=100, n_cells_label=False, n_cells=False, node_size_max=2.0, node_size_min=0.05, scale_level=False, show_label=True, gene_score=None, gene_cmap='viridis', shell=False, r=2.0, r_decay=0.8, **fontsize_kwargs)

Make Graphviz-formatted scDEF graph.

Parameters:

Name Type Description Default
model scDEF

scDEF model instance

required
hierarchy Optional[Dict[str, Sequence[str]]]

dictionary containing the polytree to draw instead of the whole graph

None
show_all Optional[bool]

whether to show all factors even post filtering

False
factor_annotations Optional[Dict[str, str]]

factor annotations to include in the node labels

None
top_factor Optional[str]

only include factors below this factor

None
show_signatures Optional[bool]

whether to show the ranked gene signatures in the node labels

True
drop_factors Optional[List[str]]

list of factors to drop from the graph

None
root_signature Optional[List[str]]

root signature to display

None
root_ranking Optional[List[str]]

root ranking to display

None
enrichments Optional[DataFrame]

enrichment results from gseapy to include in the node labels

None
top_genes Optional[Union[int, List[int]]]

number of genes from each signature to be shown in the node labels

None
show_batch_counts Optional[bool]

whether to show the number of cells from each batch that attach to each factor

False
filled Optional[Union[str, Dict[str, float]]]

key from model.adata.obs to use to fill the nodes with, or dictionary of factor scores

None
wedged Optional[str]

key from model.adata.obs to use to wedge the nodes with

None
assignments Optional[bool]

whether to use the assignments of cells to factors to wedge the nodes, rather than the scores

True
color_edges Optional[bool]

whether to color the graph edges according to the upper factors

True
show_confidences Optional[bool]

whether to show the confidence score for each signature

False
mc_samples Optional[int]

number of Monte Carlo samples to take from the posterior to compute signature confidences

100
n_cells_label Optional[bool]

whether to show the number of cells that attach to the factor

False
n_cells Optional[bool]

whether to scale the node sizes by the number of cells that attach to the factor

False
node_size_max Optional[float]

maximum node size when scaled by cell numbers

2.0
node_size_min Optional[float]

minimum node size when scaled by cell numbers

0.05
scale_level Optional[bool]

whether to scale node sizes per level instead of across all levels

False
show_label Optional[bool]

whether to show labels on nodes

True
gene_score Optional[str]

color the nodes by the score they attribute to a gene, normalized by layer. Overrides filled and wedged

None
gene_cmap Optional[str]

colormap to use for gene_score

'viridis'
shell Optional[bool]

whether to use shell layout

False
r Optional[float]

radius parameter for shell layout

2.0
r_decay Optional[float]

radius decay parameter for shell layout

0.8
**fontsize_kwargs Any

keyword arguments to adjust the fontsizes according to the gene scores

{}

Returns:

Type Description
Graph

Graphviz Graph object

multilevel_paga(model, neighbors_rep='X_L0', layers=None, figsize=(16, 4), reuse_pos=True, fontsize=12, show=True, **paga_kwargs)

Plot a PAGA graph from each scDEF layer.

Parameters:

Name Type Description Default
model scDEF

scDEF model instance

required
neighbors_rep Optional[str]

the model.adata.obsm key to use to compute the PAGA graphs

'X_L0'
layers Optional[List[int]]

which layers to plot

None
figsize Optional[Tuple[float, float]]

figure size

(16, 4)
reuse_pos Optional[bool]

whether to initialize each PAGA graph with the graph from the layer above

True
fontsize Optional[int]

font size for labels

12
show Optional[bool]

whether to show the plot

True
**paga_kwargs Any

keyword arguments to adjust the PAGA layouts

{}

obs_factor_dotplot(model, obs_key, layer_idx, cluster_rows=True, cluster_cols=True, figsize=(8, 2), s_min=100, s_max=500, titlesize=12, labelsize=12, legend_fontsize=12, legend_titlesize=12, cmap='viridis', logged=False, width_ratios=[5, 1, 1], show_ylabel=True, show=True)

Plot a dot plot of factor assignments for observation categories.

Parameters:

Name Type Description Default
model scDEF

scDEF model instance

required
obs_key str

key in model.adata.obs to use for grouping

required
layer_idx int

layer index to plot

required
cluster_rows bool

whether to cluster rows

True
cluster_cols bool

whether to cluster columns

True
figsize Tuple[float, float]

figure size

(8, 2)
s_min int

minimum circle size

100
s_max int

maximum circle size

500
titlesize int

title font size

12
labelsize int

label font size

12
legend_fontsize int

legend font size

12
legend_titlesize int

legend title font size

12
cmap str

colormap name

'viridis'
logged bool

whether to log transform colors

False
width_ratios List[float]

width ratios for subplots

[5, 1, 1]
show_ylabel bool

whether to show y-axis label

True
show bool

whether to show the plot

True

Returns:

Type Description
Optional[Figure]

Figure object if show is False, None otherwise

obs_scores(model, obs_keys, hierarchy=None, mode='fracs', vmax=None, vmin=None, **kwargs)

Plot the association between a set of cell annotations and factors.

Parameters:

Name Type Description Default
model scDEF

scDEF model instance

required
obs_keys Sequence[str]

the keys in model.adata.obs to use

required
hierarchy Optional[Dict[str, Sequence[str]]]

the polytree to restrict the associations to

None
mode Literal['f1', 'fracs', 'weights']

whether to compute scores based on assignments or weights

'fracs'
**kwargs Any

plotting keyword arguments

{}

pathway_scores(model, pathways, top_genes=20, **kwargs)

Plot the association between a set of cell annotations and a set of gene signatures.

Parameters:

Name Type Description Default
model scDEF

scDEF model instance

required
pathways DataFrame

a pandas DataFrame containing PROGENy pathways

required
top_genes Optional[int]

number of top genes to consider

20
**kwargs Any

plotting keyword arguments

{}
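One plausible way to relate a factor's gene signature to a pathway is to average the pathway's gene weights over the signature's top_genes highest-ranked genes. The sketch below does this with plain dictionaries standing in for the pathways DataFrame. The long (pathway, gene, weight) layout, the averaging rule and the helper names pathway_score and pathways_from_long are assumptions for illustration, not necessarily what pathway_scores computes.

```python
from typing import Dict, Sequence, Tuple


def pathway_score(signature: Sequence[str], pathway_weights: Dict[str, float], top_genes: int = 20) -> float:
    """Mean pathway weight over the first `top_genes` genes of a ranked signature.

    Genes that are absent from the pathway contribute a weight of 0.
    """
    top = list(signature)[:top_genes]
    if not top:
        return 0.0
    return sum(pathway_weights.get(g, 0.0) for g in top) / len(top)


def pathways_from_long(rows: Sequence[Tuple[str, str, float]]) -> Dict[str, Dict[str, float]]:
    """Group (pathway, gene, weight) rows into {pathway: {gene: weight}}."""
    out: Dict[str, Dict[str, float]] = {}
    for pathway, gene, weight in rows:
        out.setdefault(pathway, {})[gene] = weight
    return out


# Toy inputs: two pathways with made-up weights and one ranked signature.
pathways = pathways_from_long([
    ("Hypoxia", "VEGFA", 2.0),
    ("Hypoxia", "PGK1", 1.0),
    ("TNFa", "NFKBIA", 3.0),
])
signature = ["VEGFA", "PGK1", "ACTB", "NFKBIA"]
scores = {p: pathway_score(signature, w, top_genes=3) for p, w in pathways.items()}
print(scores)  # {'Hypoxia': 1.0, 'TNFa': 0.0}
```

Restricting to the top genes keeps the score focused on the genes that define the signature; NFKBIA ranks fourth here, so it does not contribute when top_genes=3.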

relevance(model, mode='brd', thres=None, iqr_mult=None, show_yticks=False, scale='linear', normalize=False, fontsize=14, legend_fontsize=12, xlabel='Factor', ylabel='Relevance', color=False, show=True, ax=None, **kwargs)

Plot relevance determination scores.

Parameters:

Name Type Description Default
model scDEF

scDEF model instance

required
mode Literal['brd', 'ard']

mode to plot, either "brd" or "ard"

'brd'
thres Optional[float]

threshold value for relevance cutoff

None
iqr_mult Optional[float]

multiplier of the interquartile range (IQR) used to set the threshold

None
show_yticks bool

whether to show y-axis ticks

False
scale Literal['linear', 'log']

scale for y-axis, either "linear" or "log"

'linear'
normalize bool

whether to normalize relevance scores

False
fontsize int

font size for labels

14
legend_fontsize int

font size for legend

12
xlabel str

label for x-axis

'Factor'
ylabel str

label for y-axis

'Relevance'
color bool

whether to color bars by factor type

False
show bool

whether to show the plot

True
ax Optional[Axes]

matplotlib axes to plot on

None
**kwargs Any

additional plotting keyword arguments

{}

Returns:

Type Description
Optional[Axes]

Axes object if show is False, None otherwise
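The iqr_mult option suggests a data-driven cutoff based on the interquartile range. The sketch below uses Tukey's upper fence, Q3 + iqr_mult * IQR, and flags factors whose relevance exceeds it, falling back to an explicit thres when one is given. Whether scDEF uses the upper fence, this quantile method, or a different convention (including which side of the cutoff counts as relevant) is an assumption here, as are the helper names.

```python
import statistics
from typing import List, Optional, Sequence


def iqr_threshold(scores: Sequence[float], iqr_mult: float = 1.5) -> float:
    """Tukey-style upper fence: Q3 + iqr_mult * (Q3 - Q1)."""
    # method="inclusive" treats the scores as the full population of factors.
    q1, _, q3 = statistics.quantiles(scores, n=4, method="inclusive")
    return q3 + iqr_mult * (q3 - q1)


def relevant_factors(
    scores: Sequence[float],
    thres: Optional[float] = None,
    iqr_mult: Optional[float] = None,
) -> List[int]:
    """Indices of factors whose relevance exceeds an explicit or IQR-based threshold."""
    if thres is None:
        if iqr_mult is None:
            raise ValueError("provide thres or iqr_mult")
        thres = iqr_threshold(scores, iqr_mult)
    return [i for i, s in enumerate(scores) if s > thres]


# Toy relevance scores, one per factor, in factor order.
relevance = [3.0, 100.0, 1.0, 5.0, 2.0, 4.0]
print(iqr_threshold(relevance, 1.5))              # 8.5
print(relevant_factors(relevance, iqr_mult=1.5))  # [1]
```

Here Q1 = 2.25 and Q3 = 4.75, so the fence is 4.75 + 1.5 * 2.5 = 8.5 and only the outlying factor at index 1 passes.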

scale(model, scale_type, figsize=(4, 4), alpha=0.6, fontsize=12, legend_fontsize=10, ax=None, show=True)

Plot learned scale factors against observed scales.

Parameters:

Name Type Description Default
model scDEF

scDEF model instance

required
scale_type Literal['cell', 'gene']

type of scale to plot, either "cell" or "gene"

required
figsize Tuple[float, float]

figure size

(4, 4)
alpha float

transparency (alpha) level

0.6
fontsize int

font size for labels

12
legend_fontsize int

font size for legend

10
ax Optional[Axes]

matplotlib axes to plot on

None
show bool

whether to show the plot

True

Returns:

Type Description
Optional[Axes]

Axes object if show is False, None otherwise
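A natural reference for the "observed" scales is total counts: per-cell library size for scale_type="cell" and per-gene total counts for scale_type="gene". The sketch below computes both from a small dense cells x genes count matrix and measures agreement with hypothetical learned scale factors via Pearson correlation. Treating totals as the observed scales is an assumption, and the sign of the expected relationship depends on how the model parameterizes its scales, so read this as a sanity-check sketch rather than what scale plots.

```python
import math
from typing import List, Sequence, Tuple


def observed_scales(counts: Sequence[Sequence[float]]) -> Tuple[List[float], List[float]]:
    """Return (cell totals, gene totals) for a cells x genes count matrix."""
    cell_totals = [float(sum(row)) for row in counts]
    gene_totals = [float(sum(col)) for col in zip(*counts)]
    return cell_totals, gene_totals


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


counts = [[1, 0, 3],
          [4, 2, 0],
          [0, 1, 1]]
cell_totals, gene_totals = observed_scales(counts)
print(cell_totals, gene_totals)  # [4.0, 6.0, 2.0] [5.0, 3.0, 4.0]

learned_cell_scales = [0.4, 0.6, 0.2]  # hypothetical fitted values
print(pearson(cell_totals, learned_cell_scales))
```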

scales(model, figsize=(8, 4), alpha=0.6, fontsize=12, legend_fontsize=10, show=True)

Plot both cell and gene scales.

Parameters:

Name Type Description Default
model scDEF

scDEF model instance

required
figsize Tuple[float, float]

figure size

(8, 4)
alpha float

transparency (alpha) level

0.6
fontsize int

font size for labels

12
legend_fontsize int

font size for legend

10
show bool

whether to show the plot

True

Returns:

Type Description
Optional[Figure]

Figure object if show is False, None otherwise

signatures_scores(model, obs_keys, markers, top_genes=10, hierarchy=None, **kwargs)

Plot the association between a set of cell annotations and a set of gene signatures.

Parameters:

Name Type Description Default
model scDEF

scDEF model instance

required
obs_keys Sequence[str]

the keys in model.adata.obs to use

required
markers Mapping[str, Sequence[str]]

a dictionary mapping annotation values in model.adata.obs[obs_keys] to lists of marker genes

required
top_genes Optional[int]

number of genes to consider in the score computations

10
hierarchy Optional[Dict[str, Sequence[str]]]

the polytree to restrict the associations to

None
**kwargs Any

plotting keyword arguments

{}
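One simple way to score a factor signature against a marker list is the fraction of the markers found among the signature's top_genes highest-ranked genes. The sketch below builds an annotation-by-factor score table this way. The overlap-fraction definition and the helper names marker_overlap and score_table are assumptions; signatures_scores may use a different statistic.

```python
from typing import Dict, Mapping, Sequence


def marker_overlap(signature: Sequence[str], markers: Sequence[str], top_genes: int = 10) -> float:
    """Fraction of marker genes that appear among the top `top_genes` signature genes."""
    if not markers:
        return 0.0
    top = set(list(signature)[:top_genes])
    return sum(g in top for g in markers) / len(markers)


def score_table(
    signatures: Mapping[str, Sequence[str]],
    markers: Mapping[str, Sequence[str]],
    top_genes: int = 10,
) -> Dict[str, Dict[str, float]]:
    """{annotation: {factor: overlap score}} for every annotation/factor pair."""
    return {
        label: {f: marker_overlap(genes, m, top_genes) for f, genes in signatures.items()}
        for label, m in markers.items()
    }


# Keys are annotation values (as in model.adata.obs[obs_keys]); values are marker genes.
markers = {"T cell": ["CD3E", "CD3D"], "B cell": ["MS4A1", "CD79A"]}
# Hypothetical ranked gene signatures, most important gene first.
signatures = {
    "0": ["CD3E", "IL7R", "CD3D"],
    "1": ["MS4A1", "CD74", "HLA-DRA"],
}
print(score_table(signatures, markers, top_genes=2))
# {'T cell': {'0': 0.5, '1': 0.0}, 'B cell': {'0': 0.0, '1': 0.5}}
```

Raising top_genes to 3 brings CD3D into factor "0"'s top genes, so its "T cell" score rises to 1.0, which shows how sensitive such scores are to this parameter.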

technical_hierarchy(model, show_signatures=True, **kwargs)

Plot the technical hierarchy of the model.

Parameters:

Name Type Description Default
model scDEF

scDEF model instance

required
show_signatures bool

whether to show gene signatures

True
**kwargs Any

keyword arguments passed to make_graph

{}

Returns:

Type Description
Graph

Graphviz Graph object

umap(model, color=[], layers=None, figsize=(16, 4), fontsize=12, legend_fontsize=10, rasterized=True, n_legend_cols=1, factor_subset=None, show=True)

Plot pre-computed UMAPs for different layers.

The UMAP embeddings must be computed beforehand with scdef.tl.umap.

Parameters:

Name Type Description Default
model scDEF

scDEF model instance

required
color Union[str, List[str]]

key or list of keys used to color the points

[]
layers Optional[List[int]]

indices of the layers to plot

None
figsize Tuple[float, float]

figure size

(16, 4)
fontsize int

font size for labels

12
legend_fontsize int

legend font size

10
rasterized bool

whether to rasterize the plot

True
n_legend_cols int

number of columns in legend

1
factor_subset Optional[List[str]]

subset of factors to plot

None
show bool

whether to show the plot

True