API Reference
Model classes
scdef.scDEF
Bases: object
Single-cell Deep Exponential Families (scDEF) model.
scDEF learns hierarchical, multi-level gene expression signatures from single-cell RNA-seq data provided in an AnnData object. This model can be used for a variety of analyses including dimensionality reduction, batch correction, clustering, and visualization of cell states and gene programs.
The model fits multiple layers of latent factors ("gene signatures") to describe cellular heterogeneity at different resolutions. It supports batch correction, prior specification, and generation of corrected gene expression matrices.
Model fitting, inference routines, and additional plotting utilities are implemented as methods of this class. The stored AnnData object is updated with model results during training.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `adata` | `AnnData` | AnnData object containing the single-cell gene expression count matrix. Counts should be present in either | required |
| `counts_layer` | `Optional[str]` | key for | `None` |
| `batch_key` | `Optional[str]` | key in | `None` |
| `seed` | `Optional[int]` | random seed for model initialization and stochastic routines (uses JAX's pseudo-random number generator). | `42` |
| `n_factors` | `Optional[int]` | number of latent factors at the lowest layer (can be overridden by | `100` |
| `decay_factor` | `Optional[float]` | size decay multiplier for the number of factors at each subsequent layer if | `2.0` |
| `max_n_layers` | `Optional[float]` | maximum number of hierarchical layers in the model. | `5` |
| `layer_sizes` | `Optional[list]` | explicit list of the number of factors in each scDEF layer. If None, layer sizes are set automatically. | `None` |
| `layer_names` | `Optional[list]` | list of custom names for the layers. If None, layer names are enumerated as ["L0", "L1", ...]. | `None` |
| `logginglevel` | `Optional[int]` | verbosity level for the logger. | `INFO` |
| `layer_concentration` | `Optional[float]` | concentration parameter of the top-level Dirichlet prior over cell usage of factors. | `1.0` |
| `shrinkage_shape` | `Optional[float]` | shape parameter for shrinkage prior controlling factor usage. | `1.0` |
| `shrinkage_rate` | `Optional[float]` | rate parameter for shrinkage prior controlling factor usage. | `1.0` |
| `top_alpha` | `Optional[float]` | concentration parameter for the top layer Dirichlet prior over factor proportions. | `1.0` |
| `factor_shape` | `Optional[float]` | shape of the prior distribution for factor-gene loadings matrix W. | `0.1` |
| `brd_strength` | `Optional[float]` | BRD (Batch Relevance Determination) prior concentration parameter for factor relevance estimation. | `1.0` |
| `brd_mean` | `Optional[float]` | mean of the BRD prior for factor relevance estimation. | `1.0` |
| `use_brd` | `Optional[bool]` | if True, use BRD prior for automatic selection of active factors. | `True` |
| `cell_scale_shape` | `Optional[float]` | precision/concentration parameter for cell-specific scaling priors. | `1.0` |
| `gene_scale_shape` | `Optional[float]` | precision/concentration parameter for gene-specific scaling priors. | `1.0` |
| `batch_cpal` | `Optional[str]` | default matplotlib color palette name used for batches. | `'Dark2'` |
| `layer_cpal` | `Optional[str]` | matplotlib color palette for factors/colors at each scDEF layer. | `'tab10'` |
| `lightness_mult` | `Optional[float]` | lightness multiplier to define the base color for each new scDEF layer. | `0.15` |
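To make the interplay of `n_factors`, `decay_factor`, and `max_n_layers` concrete, here is a minimal sketch of how automatic layer sizes could be derived. The helper `default_layer_sizes` and its rounding and stopping rules are assumptions made for illustration; they are not taken from the scDEF source. Only the default layer naming ("L0", "L1", ...) comes from the table above.

```python
from typing import List


def default_layer_sizes(n_factors: int = 100, decay_factor: float = 2.0,
                        max_n_layers: int = 5) -> List[int]:
    """Hypothetical helper: divide the layer size by decay_factor at each layer.

    Assumptions (not confirmed by the scDEF docs): sizes are rounded down,
    and the hierarchy stops at max_n_layers or once a layer has one factor.
    """
    sizes: List[int] = []
    size = float(n_factors)
    while len(sizes) < max_n_layers and int(size) >= 1:
        sizes.append(int(size))
        if int(size) == 1:
            break  # a single top factor ends the hierarchy
        size /= decay_factor
    return sizes


def default_layer_names(n_layers: int) -> List[str]:
    """Enumerate layer names as "L0", "L1", ... (the documented default)."""
    return [f"L{i}" for i in range(n_layers)]


sizes = default_layer_sizes()
print(sizes)
print(default_layer_names(len(sizes)))
```

Passing `layer_sizes` explicitly bypasses any automatic rule of this kind, which is the safer choice when a specific hierarchy shape is needed.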
attach_factors_to_obs(obs_key)
Attach factors to observation categories.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `obs_key` | `str` | key in model.adata.obs to use for attachment | required |
Returns:
| Type | Description |
|---|---|
| `List[List[str]]` | list of attachment lists, one per layer |
compute_factor_obs_assignment_fracs(layer_idx, factor_name, obs_key, obs_val)
Compute assignment fraction between a factor and an observation category.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `layer_idx` | `int` | layer index of the factor | required |
| `factor_name` | `str` | name of the factor | required |
| `obs_key` | `str` | key in model.adata.obs | required |
| `obs_val` | `str` | value in obs_key to compute fraction with | required |
Returns:
| Type | Description |
|---|---|
| `float` | assignment fraction value |
compute_factor_obs_association_score(layer_idx, factor_name, obs_key, obs_val)
Compute association score between a factor and an observation category.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `layer_idx` | `int` | layer index of the factor | required |
| `factor_name` | `str` | name of the factor | required |
| `obs_key` | `str` | key in model.adata.obs | required |
| `obs_val` | `str` | value in obs_key to compute association with | required |
Returns:
| Type | Description |
|---|---|
| `float` | association score value |
compute_factor_obs_weight_score(layer_idx, factor_name, obs_key, obs_val)
Compute weight score between a factor and an observation category.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `layer_idx` | `int` | layer index of the factor | required |
| `factor_name` | `str` | name of the factor | required |
| `obs_key` | `str` | key in model.adata.obs | required |
| `obs_val` | `str` | value in obs_key to compute weight with | required |
Returns:
| Type | Description |
|---|---|
| `float` | weight score value |
compute_weight(upper_factor_name, lower_factor_name)
Compute the weight between two factors across any number of layers.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `upper_factor_name` | `str` | name of the upper factor | required |
| `lower_factor_name` | `str` | name of the lower factor | required |
Returns:
| Type | Description |
|---|---|
| `float` | weight value between the two factors |
filter_factors(brd_min=1.0, ard_min=0.001, clarity_min=0.5, min_cells_upper=0.001, min_cells_lower=0.0, filter_up=True, annotate=True, upper_only=False)
Filter out irrelevant factors based on the BRD posterior or the cell attachments.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `thres` | | minimum factor BRD value | required |
| `iqr_mult` | | multiplier of the difference between the third quartile and the median BRD values to set the threshold | required |
| `min_cells_upper` | `Optional[float]` | minimum number of cells that each factor in upper layers must have attached to it for it to be kept. A value between 0 and 1 is read as a fraction of cells; any other value is an absolute count | `0.001` |
| `min_cells_lower` | `Optional[float]` | minimum number of cells that each factor in layer 0 must have attached to it for it to be kept. A value between 0 and 1 is read as a fraction of cells; any other value is an absolute count | `0.0` |
| `filter_up` | `Optional[bool]` | whether to remove factors in upper layers via inter-layer attachments | `True` |
| `upper_only` | `Optional[bool]` | whether to only filter factors in upper layers | `False` |
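The `min_cells_upper` and `min_cells_lower` thresholds accept either a fraction or an absolute count. The sketch below shows one way that rule can be resolved into a number of cells. The helper name `resolve_min_cells` is hypothetical, and treating the boundaries 0 and 1 as absolute values is an assumption, since the documentation only says "between 0 and 1".

```python
def resolve_min_cells(min_cells: float, n_cells: int) -> float:
    """Turn a min_cells threshold into an absolute number of cells.

    Assumption: values strictly between 0 and 1 are fractions of n_cells;
    everything else (including 0 and 1) is already an absolute count.
    """
    if 0.0 < min_cells < 1.0:
        return min_cells * n_cells
    return float(min_cells)


# With 10,000 cells, the default min_cells_upper=0.001 asks for about 10 cells.
print(resolve_min_cells(0.001, 10_000))
# A value of 50 is taken as 50 cells regardless of dataset size.
print(resolve_min_cells(50, 10_000))
```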
fit(nmf_init=True, max_cells_init=5000, n_rounds=1, **kwargs)
Fit scDEF, warm-starting from a previous fit when available.
On the first call, parameters are initialized from priors (or NMF if enabled).
On subsequent calls, the model is re-initialized from the current posterior
quantities and the current factor_lists, enabling a fit -> filter -> fit
workflow. During refit, upper-layer sizes are clipped to respect
decay_factor before rebuilding the hierarchy.
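The fit -> filter -> fit workflow described above can be sketched as a small driver function. It only relies on the `fit()` and `filter_factors()` methods documented in this reference; the function `fit_filter_refit` itself is a hypothetical convenience wrapper, not part of scDEF.

```python
def fit_filter_refit(model, n_refits: int = 1, **filter_kwargs):
    """Run fit -> filter -> fit on an object exposing fit() and filter_factors()."""
    model.fit()  # first call: initialized from priors (or NMF if enabled)
    for _ in range(n_refits):
        model.filter_factors(**filter_kwargs)  # drop irrelevant factors
        model.fit()  # later calls warm-start from the current posterior
    return model
```

With a constructed scDEF instance `model`, a call such as `fit_filter_refit(model, brd_min=1.0)` would run one filter and one refit after the initial fit.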
get_annotations(marker_reference, gene_rankings=None)
Get annotations for factors based on marker gene reference.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `marker_reference` | `Mapping[str, Sequence[str]]` | dictionary mapping annotation names to gene lists | required |
| `gene_rankings` | `Optional[List[List[str]]]` | gene rankings for each factor; computed if None | `None` |
Returns:
| Type | Description |
|---|---|
| `List[List[str]]` | list of annotation lists, one per factor |
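To illustrate the expected shapes of `marker_reference` and `gene_rankings`, here is a simple overlap-based annotation sketch. It is not the scoring used by `get_annotations`, which this reference does not describe; the function `annotate_by_overlap` and its `top_genes` cutoff are assumptions for illustration only.

```python
from typing import Dict, List, Sequence


def annotate_by_overlap(marker_reference: Dict[str, Sequence[str]],
                        gene_rankings: List[List[str]],
                        top_genes: int = 10) -> List[List[str]]:
    """Hypothetical annotator: rank annotations by marker overlap with top genes."""
    annotations: List[List[str]] = []
    for ranking in gene_rankings:
        top = set(ranking[:top_genes])
        overlaps = {name: len(top & set(genes))
                    for name, genes in marker_reference.items()}
        # Keep annotations with at least one shared gene, best overlap first.
        matched = sorted((n for n, k in overlaps.items() if k > 0),
                         key=lambda n: (-overlaps[n], n))
        annotations.append(matched)
    return annotations


reference = {"T cell": ["CD3D", "CD3E"], "B cell": ["MS4A1"]}
rankings = [["CD3D", "CD3E", "LYZ"], ["MS4A1", "CD3D"], ["LYZ"]]
print(annotate_by_overlap(reference, rankings))
```

The result has one list per factor, matching the documented return type `List[List[str]]`.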
get_enrichments(libs=['KEGG_2019_Human'], gene_rankings=None)
Get gene set enrichments for factor signatures using gseapy.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `libs` | `List[str]` | list of gene set library names to use | `['KEGG_2019_Human']` |
| `gene_rankings` | `Optional[List[List[str]]]` | gene rankings for each factor; computed if None | `None` |
Returns:
| Type | Description |
|---|---|
| `List[Any]` | list of enrichment results, one per factor |
get_layer_factor_orders()
Get the ordering of factors in each layer for plotting.
Returns:
| Type | Description |
|---|---|
| `List[ndarray]` | list of arrays, one per layer, containing factor indices in plotting order |
get_nmf_init(max_cells=None)
Run NMF on the data to initialize the first layer, then apply it recursively to initialize the remaining layers.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `max_cells` | | maximum number of cells to use for NMF initialization | `None` |
Returns:
| Type | Description |
|---|---|
| | tuple of (init_z, init_W) initialization values |
get_rankings(layer_idx=0, top_genes=None, genes=True, return_scores=False, sorted_scores=True, drop_factors=None)
Get gene or factor rankings for each factor in a layer.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `layer_idx` | `int` | layer index to get rankings for | `0` |
| `top_genes` | `Optional[int]` | number of top genes/factors to return | `None` |
| `genes` | `bool` | whether to return gene rankings (True) or factor rankings (False) | `True` |
| `return_scores` | `bool` | whether to return scores along with rankings | `False` |
| `sorted_scores` | `bool` | whether to return scores sorted by ranking | `True` |
| `drop_factors` | `Optional[List[str]]` | list of factors to drop from rankings | `None` |
Returns:
| Type | Description |
|---|---|
| `Union[List[List[str]], Tuple[List[List[str]], List[List[float]]]]` | list of rankings per factor, or tuple of (rankings, scores) if return_scores is True |
get_relevances_dict()
Get dictionary of factor relevance scores.
Returns:
| Type | Description |
|---|---|
| `Dict[str, float]` | dictionary mapping factor names to relevance scores |
get_signature_confidence(factor_idx, layer_idx, mc_samples=100, top_genes=10, pairwise=False)
Get confidence score for a factor signature using Monte Carlo sampling.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `factor_idx` | `int` | index of the factor | required |
| `layer_idx` | `int` | layer index of the factor | required |
| `mc_samples` | `int` | number of Monte Carlo samples to take | `100` |
| `top_genes` | `int` | number of top genes to consider in each sample | `10` |
| `pairwise` | `bool` | whether to compute pairwise Jaccard similarities | `False` |
Returns:
| Type | Description |
|---|---|
| `float` | confidence score as Jaccard similarity |
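The confidence score is described as a Jaccard similarity over Monte Carlo samples of a signature. The sketch below shows how such a score could be aggregated from already-sampled top-gene lists. How scDEF draws the samples and which reference signature it compares against are not stated here, so `signature_confidence`, its `reference` argument, and the averaging are assumptions.

```python
from itertools import combinations
from typing import List, Sequence


def jaccard(a: Sequence[str], b: Sequence[str]) -> float:
    """Jaccard similarity between two gene lists, treated as sets."""
    set_a, set_b = set(a), set(b)
    union = set_a | set_b
    return len(set_a & set_b) / len(union) if union else 1.0


def signature_confidence(reference: Sequence[str],
                         samples: List[Sequence[str]],
                         pairwise: bool = False) -> float:
    """Hypothetical aggregate: mean Jaccard similarity over sampled signatures."""
    if pairwise:
        # Compare every pair of sampled signatures with each other.
        scores = [jaccard(a, b) for a, b in combinations(samples, 2)]
    else:
        # Compare each sampled signature with the reference signature.
        scores = [jaccard(reference, s) for s in samples]
    if not scores:
        raise ValueError("not enough samples to compute a confidence score")
    return sum(scores) / len(scores)


reference = ["A", "B", "C"]
samples = [["A", "B", "C"], ["A", "B", "D"]]
print(signature_confidence(reference, samples))
print(signature_confidence(reference, samples, pairwise=True))
```

A score near 1 means the top genes barely change between posterior samples; a score near 0 means the signature is unstable.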
get_signature_sample(rng, factor_idx, layer_idx, top_genes=10, return_scores=False)
Get a single signature sample from the posterior for a factor.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `rng` | `Any` | JAX random number generator key | required |
| `factor_idx` | `int` | index of the factor | required |
| `layer_idx` | `int` | layer index of the factor | required |
| `top_genes` | `int` | number of top genes to return | `10` |
| `return_scores` | `bool` | whether to return scores along with gene names | `False` |
Returns:
| Type | Description |
|---|---|
| `Union[List[str], Tuple[List[str], ndarray]]` | list of gene names, or tuple of (gene_names, scores) if return_scores is True |
get_signatures_dict(top_genes=None, scores=False, sorted_scores=False, layer_normalize=False, drop_factors=None)
Get dictionary of gene signatures for all factors across all layers.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `top_genes` | `Optional[int]` | number of top genes per signature | `None` |
| `scores` | `bool` | whether to return scores along with signatures | `False` |
| `sorted_scores` | `bool` | whether to return scores sorted by ranking | `False` |
| `layer_normalize` | `bool` | whether to normalize scores within each layer | `False` |
| `drop_factors` | `Optional[List[str]]` | list of factors to exclude | `None` |
Returns:
| Type | Description |
|---|---|
| `Union[Dict[str, List[str]], Tuple[Dict[str, List[str]], Dict[str, ndarray]]]` | dictionary mapping factor names to gene lists, or tuple of (signatures, scores) if scores is True |
get_sizes_dict()
Get dictionary of factor sizes (number of cells per factor).
Returns:
| Type | Description |
|---|---|
| `Dict[str, float]` | dictionary mapping factor names to cell counts |
get_summary(top_genes=10, reindex=True)
Get a text summary of the model factors and their top genes.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `top_genes` | `int` | number of top genes to show per factor | `10` |
| `reindex` | `bool` | whether to reindex factors | `True` |
Returns:
| Type | Description |
|---|---|
| `str` | string summary of the model |
identify_mixture_factors(max_n_genes=20, thres=0.5)
Identify factors that may be mixtures and would be better split apart.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `max_n_genes` | `int` | maximum number of genes per factor | `20` |
| `thres` | `float` | threshold for identifying mixture factors | `0.5` |
Returns:
| Type | Description |
|---|---|
| `ndarray` | array of factor indices that are mixture factors |
make_corrected_data(layer_name='scdef_corrected')
Compute and store the low-rank reconstruction of the UMI count matrix.
The reconstructed matrix is saved to adata.layers[layer_name], providing a denoised, batch-corrected version of the expression data.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `layer_name` | `str` | name for the AnnData layer where the reconstructed matrix is stored | `'scdef_corrected'` |
update_model_size(max_n_factors, max_n_layers=None, layer_sizes=None)
Update latent hierarchy dimensions.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `max_n_factors` | | maximum number of factors when | required |
| `max_n_layers` | | maximum number of layers when | `None` |
| `layer_sizes` | | explicit per-layer sizes. If provided, sizes are sanitized to be non-increasing and consecutive duplicates are collapsed. | `None` |
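The sanitization applied to `layer_sizes` (non-increasing sizes, consecutive duplicates collapsed) can be sketched as follows. The helper `sanitize_layer_sizes` is hypothetical, and clipping each size to the one below it is one assumed way to enforce a non-increasing sequence; the actual implementation may differ.

```python
from typing import List, Sequence


def sanitize_layer_sizes(layer_sizes: Sequence[int]) -> List[int]:
    """Hypothetical helper: make sizes non-increasing and drop repeated sizes."""
    sanitized: List[int] = []
    for size in layer_sizes:
        size = int(size)
        if sanitized:
            size = min(size, sanitized[-1])  # enforce non-increasing sizes
            if size == sanitized[-1]:
                continue  # collapse consecutive duplicates
        sanitized.append(size)
    return sanitized


print(sanitize_layer_sizes([100, 50, 60, 50, 10, 10, 1]))
```

Under this assumed rule, a layer that would be larger than the one below it is clipped and then merged with it, so the result is strictly decreasing.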
scdef.iscDEF
Bases: scDEF
Informed Single-cell Deep Exponential Families (iscDEF) model.
iscDEF extends the scDEF framework by incorporating prior biological knowledge in the form of gene sets ("markers"). The gene sets guide factor discovery along known biology in one of two ways: placed at an upper layer, they act as coarse factors and the model learns finer substructure beneath them; placed at the lowest layer, they act as the finest factors and the model learns how they relate hierarchically.
All methods and functionality available in scDEF are inherited by iscDEF. Additional logic allows for flexible
integration of marker sets at a chosen model layer, custom prior settings for marker versus non-marker genes,
and automatic handling of cells/gene sets that do not fall into any marker category (via the add_other option).
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `adata` | `AnnData` | AnnData object containing the gene expression count matrix. Counts must be present in either | required |
| `markers_dict` | `Mapping[str, Sequence[str]]` | dictionary mapping marker/factor names to gene lists (gene sets). These guide the formation of factors in the chosen layer. | required |
| `add_other` | `Optional[int]` | if > 0, adds one or more "other" factors for cells/observations not matching any marker set. Only one "other" factor is supported for | `0` |
| `markers_layer` | `Optional[int]` | index of the layer at which gene sets are enforced as factors (0 = lowest/finest, higher = top layer). If > 0, total layers determined by this value. | `0` |
| `cn_small_mean` | `Optional[float]` | mean prior connectivity for "small" (weakly-connected) genes between factors and gene sets. | `0.01` |
| `cn_big_mean` | `Optional[float]` | mean prior connectivity for "big" (strongly-connected) genes between factors and gene sets. | `1.0` |
| `cn_small_strength` | `Optional[float]` | concentration parameter for low connectivity (see scDEF prior specification). | `1.0` |
| `cn_big_strength` | `Optional[float]` | concentration parameter for high connectivity. | `0.1` |
| `gs_small_scale` | `Optional[float]` | scale parameter for genes not in the marker gene set. | `1.0` |
| `gs_big_scale` | `Optional[float]` | scale parameter for genes in the marker gene set (encourages large factor loadings). | `100.0` |
| `marker_strength` | `Optional[float]` | multiplier for the prior strength for marker genes. | `10.0` |
| `nonmarker_strength` | `Optional[float]` | multiplier for non-marker gene prior strength. | `0.1` |
| `other_strength` | `Optional[float]` | prior strength for marker genes belonging to "other" sets. | `0.1` |
| `**kwargs` | | additional arguments passed to the scDEF base model. | `{}` |
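As an illustration of the `markers_dict` structure, the sketch below builds a small marker dictionary and restricts it to genes that are present in a dataset. The helper `restrict_markers` is a hypothetical preprocessing step, not part of iscDEF, and whether iscDEF itself tolerates missing genes is not stated in this reference.

```python
from typing import Dict, Iterable, List, Sequence


def restrict_markers(markers_dict: Dict[str, Sequence[str]],
                     var_names: Iterable[str]) -> Dict[str, List[str]]:
    """Hypothetical helper: keep only marker genes found in var_names."""
    available = set(var_names)
    restricted: Dict[str, List[str]] = {}
    for name, genes in markers_dict.items():
        kept = [g for g in genes if g in available]
        if kept:  # drop gene sets with no genes left in the data
            restricted[name] = kept
    return restricted


markers = {
    "T cell": ["CD3D", "CD3E"],
    "B cell": ["MS4A1", "CD79A"],
    "Unmatched": ["NOT_IN_DATA"],
}
genes_in_data = ["CD3D", "MS4A1", "CD79A", "LYZ"]
print(restrict_markers(markers, genes_in_data))
```

In practice `var_names` would typically come from `adata.var_names`, and the resulting dictionary would be passed as `markers_dict`.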
filter_factors(brd_min=1.0, ard_min=0.001, clarity_min=0.5, min_cells_upper=0.001, min_cells_lower=0.0, filter_up=True, annotate=True, upper_only=False)
Filter factors while preserving existing marker-based factor names.
This override keeps the base filtering behavior but restores names by
subsetting the previous factor_names. This avoids marker-prefix
relabeling across filter/refit workflows.
fit(nmf_init=False, max_cells_init=1024, z_init_concentration=100.0, **kwargs)
Fit iscDEF, warm-starting from previous fit when available.
On refit, all layers are initialized from the previous posterior means
(z and W), while BRD/ARD are initialized from layer 0. Existing
marker-aware names are preserved through the refit path.
Tools
scdef.tl
Tooling utilities for scDEF.
compute_hierarchy_scores(model, use_filtered=False, filter_upper_layers=True, factor_weight='uniform', eps=1e-12)
Compute per-factor and global hierarchy scores from learned W matrices.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `model` | `scDEF` | scDEF model instance | required |
| `use_filtered` | `bool` | whether to use model.factor_lists / model.factor_names | `False` |
| `filter_upper_layers` | `bool` | when use_filtered is False, whether to still use filtered factors for layers > 0 (both as parents and children) | `True` |
| `factor_weight` | `str` | weighting scheme for factors, either "uniform" or "usage" | `'uniform'` |
| `eps` | `float` | small epsilon value for numerical stability | `1e-12` |
Returns:
| Type | Description |
|---|---|
| `Dict[str, Any]` | dict containing per_factor DataFrame, per_transition DataFrame, global_score, and global_ambiguity |
factor_diagnostics(model, recompute=False)
Compute factor diagnostics and store them in model.adata.uns['factor_obs'].
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `model` | `scDEF` | scDEF model instance | required |
| `recompute` | `bool` | if True, force recomputation of the cached fixed upper-layer factor subset used for clarity scores, even if the fit revision did not change. | `False` |
get_hierarchy(model, simplified=True, drop_factors=None)
Get a dictionary containing the polytree contained in the scDEF graph.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `simplified` | `Optional[bool]` | whether to collapse single-child nodes | `True` |
| `drop_factors` | `Optional[Sequence[str]]` | factors to drop from the hierarchy | `None` |
Returns:
| Name | Type | Description |
|---|---|---|
| `hierarchy` | | the dictionary containing the hierarchy |
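The effect of `simplified=True` (collapsing single-child nodes) can be pictured with a plain dictionary that maps each factor to its children. The sketch below is an assumed reading of that option: the function `collapse_single_child_nodes`, the factor names, and the choice to keep the parent while skipping its only child are all illustrative, not taken from the scDEF source.

```python
from typing import Dict, List, Sequence


def collapse_single_child_nodes(
    hierarchy: Dict[str, Sequence[str]]
) -> Dict[str, List[str]]:
    """Hypothetical simplification: skip over nodes that are an only child."""

    def resolve(node: str) -> List[str]:
        children = list(hierarchy.get(node, []))
        # Skip an only child as long as that child has children of its own.
        while len(children) == 1 and hierarchy.get(children[0]):
            children = list(hierarchy[children[0]])
        return children

    all_children = {c for cs in hierarchy.values() for c in cs}
    stack = [n for n in hierarchy if n not in all_children]  # start at roots
    simplified: Dict[str, List[str]] = {}
    while stack:
        node = stack.pop()
        if node in simplified:
            continue
        simplified[node] = resolve(node)
        stack.extend(simplified[node])
    return simplified


# Hypothetical factor names: one top factor with a single intermediate child.
tree = {"L2_0": ["L1_0"], "L1_0": ["L0_0", "L0_1"], "L0_0": [], "L0_1": []}
print(collapse_single_child_nodes(tree))
```

In this example the intermediate factor `L1_0` adds no branching, so the simplified hierarchy attaches `L0_0` and `L0_1` directly to `L2_0`.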
make_biological_hierarchy(model)
Make the biological hierarchy of the model.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `model` | `scDEF` | scDEF model instance | required |
Returns:
| Name | Type | Description |
|---|---|---|
| `biological_hierarchy` | `Dict[str, Sequence[str]]` | dictionary containing the biological hierarchy |
make_hierarchies(model)
Store the biological and technical hierarchies of the model.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `model` | `scDEF` | scDEF model instance | required |
make_technical_hierarchy(model)
Make the technical hierarchy of the model.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `model` | `scDEF` | scDEF model instance | required |
Returns:
| Name | Type | Description |
|---|---|---|
| `technical_hierarchy` | `Dict[str, Sequence[str]]` | dictionary containing the technical hierarchy |
set_technical_factors(model, factors=None)
Set the technical factors of the model.
Technical factors must be layer 0 factors.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `model` | `scDEF` | scDEF model instance | required |
| `factors` | `Optional[Sequence[str]]` | list of factor names to mark as technical | `None` |
umap(model, layers=None, use_log=False, metric='euclidean')
Compute UMAP embeddings for each scDEF layer.
The resulting embeddings are stored in
model.adata.obsm[f"X_umap_{layer_name}"] for each layer.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `model` | `scDEF` | scDEF model instance | required |
| `layers` | `Optional[List[int]]` | which layers to compute UMAPs for. If None, all layers with more than one factor are used (in descending order). | `None` |
| `use_log` | `bool` | whether to use log-transformed cell-factor weights for the neighbor graph computation. | `False` |
| `metric` | `str` | distance metric for neighbors computation. | `'euclidean'` |
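The default layer selection and the naming of the stored embeddings can be sketched in a few lines. The key pattern `X_umap_{layer_name}` comes from the description above; the helpers `default_umap_layers` and `umap_obsm_key` are hypothetical, and reading "descending order" as "from the top layer down" is an assumption.

```python
from typing import List, Sequence


def default_umap_layers(layer_sizes: Sequence[int]) -> List[int]:
    """Indices of layers with more than one factor, from the top layer down."""
    return [i for i in reversed(range(len(layer_sizes))) if layer_sizes[i] > 1]


def umap_obsm_key(layer_name: str) -> str:
    """Key under which the embedding of a layer is stored in adata.obsm."""
    return f"X_umap_{layer_name}"


layer_sizes = [100, 50, 25, 1]          # hypothetical model with four layers
layer_names = ["L0", "L1", "L2", "L3"]  # default naming scheme
for idx in default_umap_layers(layer_sizes):
    print(idx, umap_obsm_key(layer_names[idx]))
```

The top layer is skipped here because it has a single factor, for which a two-dimensional embedding carries no information.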
Plotting
scdef.pl
Plotting utilities for scDEF.
biological_hierarchy(model, **kwargs)
Plot the biological hierarchy of the model.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `model` | `scDEF` | scDEF model instance | required |
| `**kwargs` | `Any` | keyword arguments passed to make_graph | `{}` |
Returns:
| Type | Description |
|---|---|
| `Graph` | Graphviz Graph object |
cell_entropies(model, thres=0.9, show=True)
Plot cell entropies and factor numbers across layers.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `model` | `scDEF` | scDEF model instance | required |
| `thres` | `float` | threshold for cumulative sum calculation | `0.9` |
| `show` | `bool` | whether to show the plot | `True` |
continuous_obs_scores(model, obs_keys, mode='correlations', vmax=None, vmin=None, **kwargs)
Plot the correlations between a set of cell annotations and factors.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `model` | `scDEF` | scDEF model instance | required |
| `obs_keys` | `Sequence[str]` | the keys in model.adata.obs to use | required |
| `mode` | `Literal['correlations']` | how to compute scores | `'correlations'` |
| `**kwargs` | `Any` | plotting keyword arguments | `{}` |
factor_diagnostics(model, brd_min=1.0, ard_min=0.001, clarity_min=0.5, figsize=(6, 4), ax=None, annotate_factors=False, annotation_fontsize=8, annotation_alpha=0.8, show=True)
Diagnostic scatter plot of factors: BRD vs Effective parents colored by ARD.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `model` | `scDEF` | scDEF model instance | required |
| `brd_min` | `float` | minimum BRD filter threshold | `1.0` |
| `ard_min` | `float` | minimum ARD filter threshold (fraction of total ARD) | `0.001` |
| `clarity` | | clarity threshold for effective parents calculation | required |
| `figsize` | `tuple` | figure size (if ax is None) | `(6, 4)` |
| `ax` | `Optional[Axes]` | matplotlib Axes to plot on | `None` |
| `annotate_factors` | `bool` | whether to annotate each point with its factor label | `False` |
| `annotation_fontsize` | `int` | fontsize for factor text annotations | `8` |
| `annotation_alpha` | `float` | alpha value for factor text annotations | `0.8` |
| `show` | `bool` | whether to show the plot | `True` |
Returns:
| Type | Description |
|---|---|
| `Optional[Axes]` | Axes object if show is False, None otherwise |
factor_genes(model, thres=0.9, show=True)
Plot number of genes in factors across layers.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `model` | `scDEF` | scDEF model instance | required |
| `thres` | `float` | threshold for cumulative sum calculation | `0.9` |
| `show` | `bool` | whether to show the plot | `True` |
Returns:
| Type | Description |
|---|---|
| `Optional[Figure]` | Figure object if show is False, None otherwise |
factor_gini(model, idx, thres=0.9, show=True)
Plot Gini coefficient for a specific factor.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `model` | `scDEF` | scDEF model instance | required |
| `idx` | `int` | factor index to plot | required |
| `thres` | `float` | threshold for cumulative sum calculation | `0.9` |
| `show` | `bool` | whether to show the plot | `True` |
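For readers unfamiliar with the statistic being plotted, the Gini coefficient of a non-negative weight vector can be computed as below. This is the standard textbook formula on sorted values, shown as a minimal sketch; which weights scDEF feeds into it (for example a factor's gene loadings) and how `thres` enters the plot are not specified here, so treat the function `gini` as illustrative rather than as the library's code.

```python
from typing import Iterable


def gini(values: Iterable[float]) -> float:
    """Gini coefficient of non-negative values (0 = even, near 1 = concentrated)."""
    xs = sorted(float(v) for v in values)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        return 0.0
    # Sorted-values formula: G = sum_i (2i - n - 1) * x_i / (n * sum(x)), i = 1..n
    weighted = sum((2 * i - n - 1) * x for i, x in enumerate(xs, start=1))
    return weighted / (n * total)


print(gini([1, 1, 1, 1]))  # evenly spread weights
print(gini([0, 0, 0, 1]))  # all weight on a single entry
```

A factor whose weight sits on a few genes yields a high Gini coefficient, while a diffuse factor yields a value close to zero.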
factors_bars(model, obs_keys, sort_layer_factors=True, orders=None, sharey=True, layers=None, vmax=None, vmin=None, fontsize=12, title_fontsize=12, legend_fontsize=8, figsize=(10, 4), total=False, show=True)
Plot factor scores as bar charts.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `model` | `scDEF` | scDEF model instance | required |
| `obs_keys` | `Union[str, List[str]]` | observation keys to plot | required |
| `sort_layer_factors` | `bool` | whether to sort factors by layer | `True` |
| `orders` | `Optional[List[ndarray]]` | custom factor orders | `None` |
| `sharey` | `bool` | whether to share y-axis across subplots | `True` |
| `layers` | `Optional[List[int]]` | which layers to plot | `None` |
| `vmax` | `Optional[float]` | maximum value for y-axis | `None` |
| `vmin` | `Optional[float]` | minimum value for y-axis | `None` |
| `fontsize` | `int` | font size for labels | `12` |
| `title_fontsize` | `int` | title font size | `12` |
| `legend_fontsize` | `int` | legend font size | `8` |
| `figsize` | `Tuple[float, float]` | figure size | `(10, 4)` |
| `total` | `bool` | whether to plot total scores | `False` |
| `show` | `bool` | whether to show the plot | `True` |
gini_brd(model, normalize=False, figsize=(4, 4), alpha=0.6, fontsize=12, legend_fontsize=10, show=True, ax=None)
Plot Gini coefficient vs BRD scores.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `model` | `scDEF` | scDEF model instance | required |
| `normalize` | `bool` | whether to normalize BRD scores | `False` |
| `figsize` | `Tuple[float, float]` | figure size | `(4, 4)` |
| `alpha` | `float` | transparency level | `0.6` |
| `fontsize` | `int` | font size for labels | `12` |
| `legend_fontsize` | `int` | font size for legend | `10` |
| `show` | `bool` | whether to show the plot | `True` |
| `ax` | `Optional[Axes]` | matplotlib axes to plot on | `None` |
Returns:
| Type | Description |
|---|---|
| `Optional[Axes]` | Axes object if show is False, None otherwise |
layers_obs(model, obs_keys, obs_mats, obs_clusters, obs_vals_dict, sort_layer_factors=True, orders=None, layers=None, vmax=None, vmin=None, cb_title='', cb_title_fontsize=10, fontsize=12, title_fontsize=12, pad=0.1, shrink=0.7, figsize=(10, 4), xticks_rotation=90.0, cmap=None, show=True, rasterized=False, **kwargs)
Plot observation matrices across layers.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `model` | `scDEF` | scDEF model instance | required |
| `obs_keys` | `Union[str, List[str]]` | observation keys to plot | required |
| `obs_mats` | `Dict[str, Dict[int, ndarray]]` | observation matrices dictionary | required |
| `sort_layer_factors` | `bool` | whether to sort factors by layer | `True` |
| `orders` | `Optional[List[ndarray]]` | custom factor orders | `None` |
| `layers` | `Optional[List[int]]` | which layers to plot | `None` |
| `vmax` | `Optional[float]` | maximum value for colormap | `None` |
| `vmin` | `Optional[float]` | minimum value for colormap | `None` |
| `cb_title` | `str` | colorbar title | `''` |
| `cb_title_fontsize` | `int` | colorbar title font size | `10` |
| `fontsize` | `int` | font size for labels | `12` |
| `title_fontsize` | `int` | title font size | `12` |
| `pad` | `float` | padding for colorbar | `0.1` |
| `shrink` | `float` | shrink factor for colorbar | `0.7` |
| `figsize` | `Tuple[float, float]` | figure size | `(10, 4)` |
| `xticks_rotation` | `float` | rotation angle for x-axis ticks | `90.0` |
| `cmap` | `Optional[str]` | colormap name | `None` |
| `show` | `bool` | whether to show the plot | `True` |
| `rasterized` | `bool` | whether to rasterize the plot | `False` |
| `**kwargs` | `Any` | additional plotting keyword arguments | `{}` |
loss(model, figsize=(4, 4), fontsize=12, ax=None, show=True)
Plot training loss over epochs.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `model` | `scDEF` | scDEF model instance | required |
| `figsize` | `Tuple[float, float]` | figure size | `(4, 4)` |
| `fontsize` | `int` | font size for labels | `12` |
| `ax` | `Optional[Axes]` | matplotlib axes to plot on | `None` |
| `show` | `bool` | whether to show the plot | `True` |
Returns:
| Type | Description |
|---|---|
| `Optional[Axes]` | Axes object if show is False, None otherwise |
make_graph(model, hierarchy=None, show_all=False, factor_annotations=None, top_factor=None, show_signatures=True, drop_factors=None, root_signature=None, root_ranking=None, enrichments=None, top_genes=None, show_batch_counts=False, filled=None, wedged=None, assignments=True, color_edges=True, show_confidences=False, mc_samples=100, n_cells_label=False, n_cells=False, node_size_max=2.0, node_size_min=0.05, scale_level=False, show_label=True, gene_score=None, gene_cmap='viridis', shell=False, r=2.0, r_decay=0.8, **fontsize_kwargs)
Make Graphviz-formatted scDEF graph.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `model` | `scDEF` | scDEF model instance | required |
| `hierarchy` | `Optional[Dict[str, Sequence[str]]]` | dictionary containing the polytree to draw instead of the whole graph | `None` |
| `show_all` | `Optional[bool]` | whether to show all factors even post filtering | `False` |
| `factor_annotations` | `Optional[Dict[str, str]]` | factor annotations to include in the node labels | `None` |
| `top_factor` | `Optional[str]` | only include factors below this factor | `None` |
| `show_signatures` | `Optional[bool]` | whether to show the ranked gene signatures in the node labels | `True` |
| `drop_factors` | `Optional[List[str]]` | list of factors to drop from the graph | `None` |
| `root_signature` | `Optional[List[str]]` | root signature to display | `None` |
| `root_ranking` | `Optional[List[str]]` | root ranking to display | `None` |
| `enrichments` | `Optional[DataFrame]` | enrichment results from gseapy to include in the node labels | `None` |
| `top_genes` | `Optional[Union[int, List[int]]]` | number of genes from each signature to be shown in the node labels | `None` |
| `show_batch_counts` | `Optional[bool]` | whether to show the number of cells from each batch that attach to each factor | `False` |
| `filled` | `Optional[Union[str, Dict[str, float]]]` | key from model.adata.obs to use to fill the nodes with, or dictionary of factor scores | `None` |
| `wedged` | `Optional[str]` | key from model.adata.obs to use to wedge the nodes with | `None` |
| `assignments` | `Optional[bool]` | whether to use the assignments of cells to factors to wedge the nodes, rather than the scores | `True` |
| `color_edges` | `Optional[bool]` | whether to color the graph edges according to the upper factors | `True` |
| `show_confidences` | `Optional[bool]` | whether to show the confidence score for each signature | `False` |
| `mc_samples` | `Optional[int]` | number of Monte Carlo samples to take from the posterior to compute signature confidences | `100` |
| `n_cells_label` | `Optional[bool]` | whether to show the number of cells that attach to the factor | `False` |
| `n_cells` | `Optional[bool]` | whether to scale the node sizes by the number of cells that attach to the factor | `False` |
| `node_size_max` | `Optional[float]` | maximum node size when scaled by cell numbers | `2.0` |
| `node_size_min` | `Optional[float]` | minimum node size when scaled by cell numbers | `0.05` |
| `scale_level` | `Optional[bool]` | whether to scale node sizes per level instead of across all levels | `False` |
| `show_label` | `Optional[bool]` | whether to show labels on nodes | `True` |
| `gene_score` | `Optional[str]` | color the nodes by the score they attribute to a gene, normalized by layer. Overrides filled and wedged | `None` |
| `gene_cmap` | `Optional[str]` | colormap to use for gene_score | `'viridis'` |
| `shell` | `Optional[bool]` | whether to use shell layout | `False` |
| `r` | `Optional[float]` | radius parameter for shell layout | `2.0` |
| `r_decay` | `Optional[float]` | radius decay parameter for shell layout | `0.8` |
| `**fontsize_kwargs` | `Any` | keyword arguments to adjust the fontsizes according to the gene scores | `{}` |
Returns:
| Type | Description |
|---|---|
| `Graph` | Graphviz Graph object |
multilevel_paga(model, neighbors_rep='X_L0', layers=None, figsize=(16, 4), reuse_pos=True, fontsize=12, show=True, **paga_kwargs)
Plot a PAGA graph from each scDEF layer.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `model` | `scDEF` | scDEF model instance | required |
| `neighbors_rep` | `Optional[str]` | the model.obsm key to use to compute the PAGA graphs | `'X_L0'` |
| `layers` | `Optional[List[int]]` | which layers to plot | `None` |
| `figsize` | `Optional[Tuple[float, float]]` | figure size | `(16, 4)` |
| `reuse_pos` | `Optional[bool]` | whether to initialize each PAGA graph with the graph from the layer above | `True` |
| `fontsize` | `Optional[int]` | font size for labels | `12` |
| `show` | `Optional[bool]` | whether to show the plot | `True` |
| `**paga_kwargs` | `Any` | keyword arguments to adjust the PAGA layouts | `{}` |
obs_factor_dotplot(model, obs_key, layer_idx, cluster_rows=True, cluster_cols=True, figsize=(8, 2), s_min=100, s_max=500, titlesize=12, labelsize=12, legend_fontsize=12, legend_titlesize=12, cmap='viridis', logged=False, width_ratios=[5, 1, 1], show_ylabel=True, show=True)
Plot dotplot showing factor assignments for observations.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| model | scDEF | scDEF model instance | required |
| obs_key | str | key in model.adata.obs to use for grouping | required |
| layer_idx | int | layer index to plot | required |
| cluster_rows | bool | whether to cluster rows | True |
| cluster_cols | bool | whether to cluster columns | True |
| figsize | Tuple[float, float] | figure size | (8, 2) |
| s_min | int | minimum circle size | 100 |
| s_max | int | maximum circle size | 500 |
| titlesize | int | title font size | 12 |
| labelsize | int | label font size | 12 |
| legend_fontsize | int | legend font size | 12 |
| legend_titlesize | int | legend title font size | 12 |
| cmap | str | colormap name | 'viridis' |
| logged | bool | whether to log transform colors | False |
| width_ratios | List[float] | width ratios for subplots | [5, 1, 1] |
| show_ylabel | bool | whether to show y-axis label | True |
| show | bool | whether to show the plot | True |

Returns:
| Type | Description |
|---|---|
| Optional[Figure] | Figure object if show is False, None otherwise |
obs_scores(model, obs_keys, hierarchy=None, mode='fracs', vmax=None, vmin=None, **kwargs)
Plot the association between a set of cell annotations and factors.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| model | scDEF | scDEF model instance | required |
| obs_keys | Sequence[str] | the keys in model.adata.obs to use | required |
| hierarchy | Optional[Dict[str, Sequence[str]]] | the polytree to restrict the associations to | None |
| mode | Literal['f1', 'fracs', 'weights'] | whether to compute scores based on assignments ('f1', 'fracs') or weights ('weights') | 'fracs' |
| **kwargs | Any | plotting keyword arguments | {} |
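To make the assignment-based modes more concrete, the sketch below computes two scores from per-cell annotation labels and per-cell factor assignments: the fraction of an annotation group's cells assigned to a factor, and an F1 score combining that fraction with the factor's purity. The function `assignment_scores`, these exact definitions, and the toy data are assumptions for illustration. The scores scDEF computes for 'fracs' and 'f1' may be defined differently.

```python
from collections import Counter
from typing import Dict, List, Tuple

Pair = Tuple[str, str]


def assignment_scores(
    annotations: List[str], factors: List[str]
) -> Tuple[Dict[Pair, float], Dict[Pair, float]]:
    """Return (fracs, f1) dictionaries keyed by (annotation, factor)."""
    pair_counts = Counter(zip(annotations, factors))
    annot_counts = Counter(annotations)
    factor_counts = Counter(factors)
    fracs: Dict[Pair, float] = {}
    f1: Dict[Pair, float] = {}
    for (annot, factor), n in pair_counts.items():
        # Fraction of the annotation group's cells assigned to this factor
        recall = n / annot_counts[annot]
        # Fraction of the factor's cells that carry this annotation
        precision = n / factor_counts[factor]
        fracs[(annot, factor)] = recall
        f1[(annot, factor)] = 2 * precision * recall / (precision + recall)
    return fracs, f1


# Hypothetical annotations and factor assignments for six cells
annotations = ["T", "T", "T", "B", "B", "B"]
factors = ["f0", "f0", "f1", "f1", "f1", "f1"]
fracs, f1 = assignment_scores(annotations, factors)
```

Under these definitions, a factor scores highly on F1 only when it captures most of a group and contains few cells from other groups.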
pathway_scores(model, pathways, top_genes=20, **kwargs)
Plot the association between a set of pathways and the gene signatures of the factors.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| model | scDEF | scDEF model instance | required |
| pathways | DataFrame | a pandas DataFrame containing PROGENy pathways | required |
| top_genes | Optional[int] | number of top genes to consider | 20 |
| **kwargs | Any | plotting keyword arguments | {} |
relevance(model, mode='brd', thres=None, iqr_mult=None, show_yticks=False, scale='linear', normalize=False, fontsize=14, legend_fontsize=12, xlabel='Factor', ylabel='Relevance', color=False, show=True, ax=None, **kwargs)
Plot relevance determination scores.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| model | scDEF | scDEF model instance | required |
| mode | Literal['brd', 'ard'] | mode to plot, either "brd" or "ard" | 'brd' |
| thres | Optional[float] | threshold value for relevance cutoff | None |
| iqr_mult | Optional[float] | multiplier for IQR-based threshold | None |
| show_yticks | bool | whether to show y-axis ticks | False |
| scale | Literal['linear', 'log'] | scale for y-axis, either "linear" or "log" | 'linear' |
| normalize | bool | whether to normalize relevance scores | False |
| fontsize | int | font size for labels | 14 |
| legend_fontsize | int | font size for legend | 12 |
| xlabel | str | label for x-axis | 'Factor' |
| ylabel | str | label for y-axis | 'Relevance' |
| color | bool | whether to color bars by factor type | False |
| show | bool | whether to show the plot | True |
| ax | Optional[Axes] | matplotlib axes to plot on | None |
| **kwargs | Any | additional plotting keyword arguments | {} |

Returns:
| Type | Description |
|---|---|
| Optional[Axes] | Axes object if show is False, None otherwise |
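The thres and iqr_mult parameters both describe a cutoff on the relevance scores. The sketch below shows one common convention for an IQR-based cutoff, the third quartile plus a multiple of the interquartile range. The helpers `relevance_cutoff` and `relevant_factors`, the precedence of thres over iqr_mult, and the example scores are assumptions for illustration. The rule scDEF applies may differ.

```python
import statistics
from typing import List, Optional


def relevance_cutoff(
    scores: List[float],
    thres: Optional[float] = None,
    iqr_mult: Optional[float] = None,
) -> Optional[float]:
    """Return a cutoff on relevance scores, or None if none is requested."""
    if thres is not None:
        # Assumption: an explicit threshold takes precedence
        return thres
    if iqr_mult is not None:
        q1, _, q3 = statistics.quantiles(scores, n=4, method="inclusive")
        # Assumed rule: third quartile plus a multiple of the IQR
        return q3 + iqr_mult * (q3 - q1)
    return None


def relevant_factors(
    scores: List[float],
    thres: Optional[float] = None,
    iqr_mult: Optional[float] = None,
) -> List[int]:
    """Indices of factors whose score reaches the cutoff (all if no cutoff)."""
    cutoff = relevance_cutoff(scores, thres=thres, iqr_mult=iqr_mult)
    if cutoff is None:
        return list(range(len(scores)))
    return [i for i, s in enumerate(scores) if s >= cutoff]


# Hypothetical relevance scores for five factors
scores = [0.1, 0.2, 0.3, 0.4, 5.0]
kept_by_iqr = relevant_factors(scores, iqr_mult=1.5)
kept_by_thres = relevant_factors(scores, thres=0.25)
```

A relative rule like this adapts to the spread of the scores, while an explicit thres is easier to reproduce across datasets.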
scale(model, scale_type, figsize=(4, 4), alpha=0.6, fontsize=12, legend_fontsize=10, ax=None, show=True)
Plot learned scale factors vs observed scales.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| model | scDEF | scDEF model instance | required |
| scale_type | Literal['cell', 'gene'] | type of scale to plot, either "cell" or "gene" | required |
| figsize | Tuple[float, float] | figure size | (4, 4) |
| alpha | float | transparency level | 0.6 |
| fontsize | int | font size for labels | 12 |
| legend_fontsize | int | font size for legend | 10 |
| ax | Optional[Axes] | matplotlib axes to plot on | None |
| show | bool | whether to show the plot | True |

Returns:
| Type | Description |
|---|---|
| Optional[Axes] | Axes object if show is False, None otherwise |
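As a rough illustration of what an "observed scale" could be for each scale_type, the sketch below sums a small count matrix per cell and per gene. Treating the observed scale as a total count is an assumption, as are the helper `observed_scales` and the toy matrix. The quantity scDEF plots may be normalized or defined differently.

```python
from typing import List


def observed_scales(counts: List[List[int]], scale_type: str) -> List[int]:
    """Total counts per cell (rows) or per gene (columns)."""
    if scale_type == "cell":
        # One total per row, i.e. a library size per cell
        return [sum(row) for row in counts]
    if scale_type == "gene":
        # One total per column, i.e. the counts of each gene over all cells
        return [sum(col) for col in zip(*counts)]
    raise ValueError("scale_type must be 'cell' or 'gene'")


# Hypothetical count matrix with 2 cells (rows) and 3 genes (columns)
counts = [[1, 0, 3], [2, 2, 0]]
cell_scales = observed_scales(counts, "cell")
gene_scales = observed_scales(counts, "gene")
```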
scales(model, figsize=(8, 4), alpha=0.6, fontsize=12, legend_fontsize=10, show=True)
Plot both cell and gene scales.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| model | scDEF | scDEF model instance | required |
| figsize | Tuple[float, float] | figure size | (8, 4) |
| alpha | float | transparency level | 0.6 |
| fontsize | int | font size for labels | 12 |
| legend_fontsize | int | font size for legend | 10 |
| show | bool | whether to show the plot | True |

Returns:
| Type | Description |
|---|---|
| Optional[Figure] | Figure object if show is False, None otherwise |
signatures_scores(model, obs_keys, markers, top_genes=10, hierarchy=None, **kwargs)
Plot the association between a set of cell annotations and a set of gene signatures.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| model | scDEF | scDEF model instance | required |
| obs_keys | Sequence[str] | the keys in model.adata.obs to use | required |
| markers | Mapping[str, Sequence[str]] | a dictionary with keys corresponding to model.adata.obs[obs_keys] and values to gene lists | required |
| top_genes | Optional[int] | number of genes to consider in the score computations | 10 |
| hierarchy | Optional[Dict[str, Sequence[str]]] | the polytree to restrict the associations to | None |
| **kwargs | Any | plotting keyword arguments | {} |
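The markers and top_genes parameters pair each annotation label with a gene list and limit how many of a signature's genes enter the score. The sketch below scores a ranked signature against such a dictionary by the fraction of each label's markers found among the top genes. The function `marker_overlap`, this overlap measure, and the example genes are assumptions for illustration. The score computed by scDEF may be defined differently.

```python
from typing import Dict, Mapping, Sequence


def marker_overlap(
    signature: Sequence[str],
    markers: Mapping[str, Sequence[str]],
    top_genes: int = 10,
) -> Dict[str, float]:
    """Fraction of each label's marker genes found in the top of a signature."""
    # Keep only the highest-ranked genes of the signature
    top = set(signature[:top_genes])
    scores: Dict[str, float] = {}
    for label, genes in markers.items():
        gene_set = set(genes)
        scores[label] = len(top & gene_set) / len(gene_set) if gene_set else 0.0
    return scores


# Hypothetical ranked signature of one factor and marker lists for two labels
signature = ["CD3E", "CD3D", "IL7R", "MS4A1", "CD79A"]
markers = {"T": ["CD3E", "CD3D", "CD2"], "B": ["MS4A1", "CD79A"]}
top3_scores = marker_overlap(signature, markers, top_genes=3)
top5_scores = marker_overlap(signature, markers, top_genes=5)
```

A smaller top_genes makes the score stricter, since only the strongest genes of a signature can match a marker list.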
technical_hierarchy(model, show_signatures=True, **kwargs)
Plot the technical hierarchy of the model.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| model | scDEF | scDEF model instance | required |
| show_signatures | bool | whether to show gene signatures | True |
| **kwargs | Any | keyword arguments passed to make_graph | {} |

Returns:
| Type | Description |
|---|---|
| Graph | Graphviz Graph object |
umap(model, color=[], layers=None, figsize=(16, 4), fontsize=12, legend_fontsize=10, rasterized=True, n_legend_cols=1, factor_subset=None, show=True)
Plot pre-computed UMAPs for different layers.
UMAP embeddings must have been computed first via scdef.tl.umap.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| model | scDEF | scDEF model instance | required |
| color | Union[str, List[str]] | color key(s) to use for coloring | [] |
| layers | Optional[List[int]] | which layers to plot | None |
| figsize | Tuple[float, float] | figure size | (16, 4) |
| fontsize | int | font size for labels | 12 |
| legend_fontsize | int | legend font size | 10 |
| rasterized | bool | whether to rasterize the plot | True |
| n_legend_cols | int | number of columns in legend | 1 |
| factor_subset | Optional[List[str]] | subset of factors to plot | None |
| show | bool | whether to show the plot | True |