scDEF on 3k PBMCs¶
scDEF is a statistical model that learns signatures of gene expression at multiple levels of resolution. The model enables dimensionality reduction, clustering, and de novo signature identification from scRNA-seq data. scDEF can be run in a fully unsupervised manner, or be guided by known gene sets, which we denote informed scDEF (iscDEF).
Here we apply scDEF in its basic version to the 3k PBMCs data set from 10x Genomics used in the scanpy tutorial, using annotations obtained from Seurat.
Create a ground truth¶
We will use known markers and the true hierarchical structure for the cell types in this data set to evaluate the performance of scDEF.
markers = {'Memory CD4 T': ['IL7R', 'CD3D', 'CD3E', 'IL32', 'CD2'], 'Naive CD4 T': ['IL7R', 'CD3D', 'CD3E', 'TCF7', 'CCR7', 'CD2'],
'CD8 T': ['CD8A', 'CD8B', 'CCL5', 'CD2', 'CD3D', 'CD3E'],
'NK': ['GNLY', 'NKG7', 'CD2'],
'B': ['MS4A1', 'CD79A', 'CD79B', 'CD74'],
'CD14+ Mono': ['CD14', 'LYZ'], 'FCGR3A+ Mono': ['CD14', 'FCGR3A', 'MS4A7', 'LYZ'], 'DC': ['CD14', 'FCER1A', 'CST3', 'CD74'],
'Platelet': ['PPBP', 'PF4']}
true_hierarchy = {'T': ['CD8 T', 'Memory CD4 T', 'Naive CD4 T'],
'Mono': ['FCGR3A+ Mono', 'CD14+ Mono', 'DC'],
'Platelet': [],
'B': [],
'CD8 T': [],
'Memory CD4 T': [],
'Naive CD4 T': [],
'NK': [],
'FCGR3A+ Mono': [],
'CD14+ Mono': [],
'DC': []}
Load packages and data¶
import scdef as scd
import matplotlib.pyplot as plt
import pandas as pd
import scanpy as sc
We apply the usual basic filtering for cells and genes with very low numbers of reads. We also apply liberal highly variable gene filtering to make sure we include all possible marker genes while reducing the runtime required to fit the model.
adata = sc.read_10x_mtx(
'../scdef_notebooks/pbmcs_data/pbmcs3k/data/filtered_gene_bc_matrices/hg19/', # the directory with the `.mtx` file
var_names='gene_symbols', # use gene symbols for the variable names (variables-axis index)
cache=True)
adata.obs.index = adata.obs.index.str.replace('-1', '')
sc.pp.filter_cells(adata, min_genes=200)
sc.pp.filter_genes(adata, min_cells=3)
adata.var['mt'] = adata.var_names.str.startswith('MT-')
adata.var['rb'] = adata.var_names.str.startswith('RPS') | adata.var_names.str.startswith('RPL')
sc.pp.calculate_qc_metrics(adata, qc_vars=['mt', 'rb'], percent_top=None, log1p=False, inplace=True)
adata = adata[adata.obs.n_genes_by_counts < 2500, :]
adata = adata[adata.obs.pct_counts_mt < 5, :]
adata.layers['counts'] = adata.X.toarray() # Keep the counts, for scDEF
adata.raw = adata
# Keep only HVGs
sc.pp.normalize_total(adata, target_sum=1e4)
sc.pp.log1p(adata)
sc.pp.highly_variable_genes(
adata, flavor="seurat_v3", layer="counts", n_top_genes=4000
) # Not required, but makes scDEF faster
# Make sure marker genes are present
markers_list = list(set([x for sub in [markers[n] for n in markers] for x in sub]))
for marker in markers_list:
adata.var.loc[marker, 'highly_variable'] = True
adata = adata[:, adata.var.highly_variable]
# Process and visualize the data
sc.pp.regress_out(adata, ['total_counts', 'pct_counts_mt'])
sc.pp.scale(adata, max_value=10)
sc.tl.pca(adata, svd_solver='arpack')
sc.pp.neighbors(adata, n_neighbors=10, n_pcs=40)
sc.tl.umap(adata)
sc.tl.leiden(adata)
sc.pl.umap(adata, color=['leiden'])
/scratch/slurm-job.8595410/ipykernel_118404/2206424982.py:15: ImplicitModificationWarning: Setting element `.layers['counts']` of view, initializing view as actual. adata.layers['counts'] = adata.X.toarray() # Keep the counts, for scDEF /cluster/work/tumorp/analysis/beerenwinkellab/pedrof/miniconda3/envs/scdefrev/lib/python3.10/site-packages/scanpy/preprocessing/_simple.py:729: UserWarning: Received a view of an AnnData. Making a copy. view_to_actual(adata) /scratch/slurm-job.8595410/ipykernel_118404/2206424982.py:36: FutureWarning: In the future, the default backend for leiden will be igraph instead of leidenalg. To achieve the future defaults please pass: flavor="igraph" and n_iterations=2. directed must also be False to work with igraph's implementation. sc.tl.leiden(adata)
Let's add the cell type annotations obtained from Seurat on this data set and add use it to create coarser cell type annotations. This will be useful to evaluate the scDEF hierarchy.
annotations = pd.read_csv('../scdef_notebooks/pbmcs_data/pbmcs3k/annotations.csv', index_col=0)
map_coarse = {}
for c in annotations['seurat_annotations'].astype("category").cat.categories:
if c.endswith(' T'):
map_coarse[c] = 'T'
elif c.endswith('Mono') or c == 'DC':
map_coarse[c] = 'Mono'
else:
map_coarse[c] = c
adata.obs['celltypes'] = annotations['seurat_annotations']
adata.obs['celltypes_coarse'] = (
adata.obs['celltypes']
.map(map_coarse)
.astype('category')
)
# Look at the data
sc.pl.umap(adata, color=['celltypes_coarse', 'celltypes'])
Learn scDEF¶
We start by creating an scDEF object from the AnnData object containing the gene expression counts. scDEF requires raw counts, so we tell it to get them in the counts layer that we created in the pre-processing step. The model is initialized with 5 layers of sizes 100, 50, 25, 12, and 1. Due to the sparsity priors in the factor weights, most of these will be removed in the learned model, which will yield a compact representation of the hierarchical structure of the data.
model = scd.scDEF(adata, counts_layer='counts')
print(model)
scDEF object with 5 layers
Layer names: L0, L1, L2, L3, L4
Layer sizes: 100, 50, 25, 12, 1
Layer concentration parameter: 1.0
Using BRD
Number of batches: 1
Contains AnnData object with n_obs × n_vars = 2638 × 4003
obs: 'n_genes', 'n_genes_by_counts', 'total_counts', 'total_counts_mt', 'pct_counts_mt', 'total_counts_rb', 'pct_counts_rb', 'leiden', 'celltypes', 'celltypes_coarse'
var: 'gene_ids', 'n_cells', 'mt', 'rb', 'n_cells_by_counts', 'mean_counts', 'pct_dropout_by_counts', 'total_counts', 'highly_variable', 'highly_variable_rank', 'means', 'variances', 'variances_norm', 'mean', 'std'
uns: 'log1p', 'hvg', 'pca', 'neighbors', 'umap', 'leiden', 'leiden_colors', 'celltypes_coarse_colors', 'celltypes_colors'
obsm: 'X_pca', 'X_umap'
varm: 'PCs'
layers: 'counts'
obsp: 'distances', 'connectivities'
Fit the model to data.
model.fit()
INFO:scDEF:Initializing the factor weights with NMF. /cluster/work/tumorp/analysis/beerenwinkellab/pedrof/miniconda3/envs/scdefrev/lib/python3.10/site-packages/sklearn/decomposition/_nmf.py:1728: ConvergenceWarning: Maximum number of iterations 200 reached. Increase it to improve convergence. warnings.warn( INFO:scDEF:Initializing optimizer with learning rate 0.01. INFO:scDEF:Each epoch contains 11 batches of size 256 0%| | 0/1000 [00:00<?, ?it/s]2026-03-17 16:41:11.681937: W external/xla/xla/service/gpu/gemm_fusion_autotuner.cc:839] Compiling 64 configs for 3 fusions on a single thread. 2026-03-17 16:41:28.280285: W external/xla/xla/service/gpu/gemm_fusion_autotuner.cc:839] Compiling 83 configs for 4 fusions on a single thread. 2026-03-17 16:42:41.741167: W external/xla/xla/service/gpu/gemm_fusion_autotuner.cc:839] Compiling 58 configs for 3 fusions on a single thread. 2026-03-17 16:43:47.428775: W external/xla/xla/service/gpu/gemm_fusion_autotuner.cc:839] Compiling 80 configs for 4 fusions on a single thread. 64%|██████▍ | 639/1000 [10:29<05:55, 1.02it/s, Loss=2.92e+6, Rel. improvement=-0.0026] INFO:scDEF:Relative improvement of -0.00212 < 1e-05 for 50 step(s) in a row, stopping early. INFO:scDEF:It seems like the loss increased significantly. Consider using a lower learning rate. INFO:scDEF:Updated adata.var: `gene_scale` INFO:scDEF:Updated adata.obs with layer 0: `L0` and `L0_score` for all factors in layer 0 INFO:scDEF:Updated adata.obsm with layer 0: `X_L0` INFO:scDEF:Updated adata.uns with layer 0 signatures: `L0_signatures`. INFO:scDEF:Updated adata.obs with layer 1: `L1` and `L1_score` for all factors in layer 1 INFO:scDEF:Updated adata.obsm with layer 1: `X_L1` INFO:scDEF:Updated adata.uns with layer 1 signatures: `L1_signatures`. INFO:scDEF:Updated adata.obs with layer 2: `L2` and `L2_score` for all factors in layer 2 INFO:scDEF:Updated adata.obsm with layer 2: `X_L2` INFO:scDEF:Updated adata.uns with layer 2 signatures: `L2_signatures`. INFO:scDEF:Updated adata.obs with layer 3: `L3` and `L3_score` for all factors in layer 3 INFO:scDEF:Updated adata.obsm with layer 3: `X_L3` INFO:scDEF:Updated adata.uns with layer 3 signatures: `L3_signatures`. INFO:scDEF:Updated adata.obs with layer 4: `L4` and `L4_score` for all factors in layer 4 INFO:scDEF:Updated adata.obsm with layer 4: `X_L4` INFO:scDEF:Updated adata.uns with layer 4 signatures: `L4_signatures`.
QC and factor diagnosis¶
By default, scDEF learns a large, 5-layer factorization with 100 high-resolution factors. We first check that the model converged and that the per-cell, per-gene and per-factor parameters seem reasonable.
Specifically, we use the scd.pl.qc() function to confirm that:
- The loss curve over optimization steps shows that the model fit converged;
- The learned per-factor BRD values correlate with factor sparsity as measured by Gini index;
- The learned cell and gene size factors correlate with the empirical values;
- The BRD and ARD distributions are peaky over the 100 factors.
scd.pl.qc(model)
Some factors may not be fit the hierarchical structure very well, which might indicate that they capture technical effects. A major benefit of scDEF is that we can use the model parameters to quantify our belief that each factor captures true biology and use that to filter out noisy factors. To evalaute this, we compute the number of effective parents for each factor, or inversely their hierarchical clarity, and plot it alongside the BRD and ARD.
scd.tl.factor_diagnostics(model)
scd.pl.factor_diagnostics(model, brd_min=1., ard_min=0.001, clarity_min=.5, annotate_factors=True)
Actually perform the filtering.
model.filter_factors()
INFO:scDEF:Updated adata.var: `gene_scale` INFO:scDEF:Updated adata.obs with layer 0: `L0` and `L0_score` for all factors in layer 0 INFO:scDEF:Updated adata.obsm with layer 0: `X_L0` INFO:scDEF:Updated adata.uns with layer 0 signatures: `L0_signatures`. INFO:scDEF:Updated adata.obs with layer 1: `L1` and `L1_score` for all factors in layer 1 INFO:scDEF:Updated adata.obsm with layer 1: `X_L1` INFO:scDEF:Updated adata.uns with layer 1 signatures: `L1_signatures`. INFO:scDEF:Updated adata.obs with layer 2: `L2` and `L2_score` for all factors in layer 2 INFO:scDEF:Updated adata.obsm with layer 2: `X_L2` INFO:scDEF:Updated adata.uns with layer 2 signatures: `L2_signatures`. INFO:scDEF:Updated adata.obs with layer 3: `L3` and `L3_score` for all factors in layer 3 INFO:scDEF:Updated adata.obsm with layer 3: `X_L3` INFO:scDEF:Updated adata.uns with layer 3 signatures: `L3_signatures`. INFO:scDEF:Updated adata.obs with layer 4: `L4` and `L4_score` for all factors in layer 4 INFO:scDEF:Updated adata.obsm with layer 4: `X_L4` INFO:scDEF:Updated adata.uns with layer 4 signatures: `L4_signatures`.
sc.pl.umap(model.adata, color=[f'{model.layer_names[i]}' for i in range(model.n_layers)]+ ['celltypes'], frameon=False)
The hierarchy on the large model was reduced to a much smaller number of L0 factors, but the hierarchy is poorly fit. We re-fit the model using the set of kept factors to obtain a high quality compact hierarchy.
Re-fit¶
Re-learn the hierarchy using the selected number of factors from the initial pass.
model.fit()
INFO:scDEF:Continuing scDEF from previous fit with layer sizes [15, 7, 3, 1]. INFO:scDEF:Initializing optimizer with learning rate 0.01. INFO:scDEF:Each epoch contains 11 batches of size 256 0%| | 0/1000 [00:00<?, ?it/s]2026-03-17 16:52:05.006070: W external/xla/xla/service/gpu/gemm_fusion_autotuner.cc:839] Compiling 32 configs for 2 fusions on a single thread. 2026-03-17 16:52:13.531072: W external/xla/xla/service/gpu/gemm_fusion_autotuner.cc:839] Compiling 31 configs for 2 fusions on a single thread. 2026-03-17 16:52:26.144499: W external/xla/xla/service/gpu/gemm_fusion_autotuner.cc:839] Compiling 30 configs for 2 fusions on a single thread. 2026-03-17 16:52:33.524903: W external/xla/xla/service/gpu/gemm_fusion_autotuner.cc:839] Compiling 29 configs for 2 fusions on a single thread. 64%|██████▍ | 639/1000 [06:57<03:55, 1.53it/s, Loss=2.88e+6, Rel. improvement=-0.00447] INFO:scDEF:Relative improvement of -0.003371 < 1e-05 for 50 step(s) in a row, stopping early. INFO:scDEF:It seems like the loss increased significantly. Consider using a lower learning rate. INFO:scDEF:Updated adata.var: `gene_scale` INFO:scDEF:Updated adata.obs with layer 0: `L0` and `L0_score` for all factors in layer 0 INFO:scDEF:Updated adata.obsm with layer 0: `X_L0` INFO:scDEF:Updated adata.uns with layer 0 signatures: `L0_signatures`. INFO:scDEF:Updated adata.obs with layer 1: `L1` and `L1_score` for all factors in layer 1 INFO:scDEF:Updated adata.obsm with layer 1: `X_L1` INFO:scDEF:Updated adata.uns with layer 1 signatures: `L1_signatures`. INFO:scDEF:Updated adata.obs with layer 2: `L2` and `L2_score` for all factors in layer 2 INFO:scDEF:Updated adata.obsm with layer 2: `X_L2` INFO:scDEF:Updated adata.uns with layer 2 signatures: `L2_signatures`. INFO:scDEF:Updated adata.obs with layer 3: `L3` and `L3_score` for all factors in layer 3 INFO:scDEF:Updated adata.obsm with layer 3: `X_L3` INFO:scDEF:Updated adata.uns with layer 3 signatures: `L3_signatures`.
And we QC the fit again to make sure nothing went wrong.
scd.pl.qc(model)
And assess factor quality again with the new hierarchy.
scd.tl.factor_diagnostics(model)
scd.pl.factor_diagnostics(model, brd_min=1., ard_min=0.001, clarity_min=.5, annotate_factors=True)
We see that a few factors have a high effective number of parents, indicating that they might capture technical effects.
Inspect factors and hierarchy¶
We can visualize the learned scDEF using scd.pl.make_graph, which uses Graphviz to plot the scDEF graph. We color each node by the proportion of cells from each cell type that are attached to that factor. We also show the top genes in each factor learned by the model.
scd.pl.make_graph(model, show_signatures=True, wedged='celltypes', color_edges=True, n_cells_label=True, assignments=True)
model.graph
/cluster/work/tumorp/analysis/beerenwinkellab/pedrof/scdef_2025_revisions/repo/src/scdef/plotting/graph.py:191: FutureWarning: Series.__getitem__ treating keys as positions is deprecated. In a future version, integer keys will always be treated as labels (consistent with DataFrame behavior). To access a value by position, use `ser.iloc[pos]` np.count_nonzero(model.adata.obs[wedged][cells] == b) INFO:scDEF:Updated scDEF graph
We see that the hierarchy identifies L0 factors which are strongly associated with one parent in L1, such as L0_1, L0_4, L0_5, and L0_11, and others which have unclear parent-child associations, such as L0_14, L0_0, and L0_9. The signatures of the factors with more effective parents also contain genes which are known to not be very specific to any cell type, such as MALAT1 in L0_0, which is likely a technical artifact, and mitochondrial and ribosomal protein genes in L0_9. The factor_diagnostics plot shows that these factors also have a low BRD.
Note that even though L0_6 has a relatively high number of effective parents, its BRD is also high, indicating that it is likely a biological factor that fits the hierarchy poorly. Indeed, from the graph it seems to capture platelet cells, which are difficult to assign to a hierarchy that has B cells, T cells, and monocytes as the main groups.
Separate technical from biological factors¶
scd.tl.set_technical_factors(model, ["L0_0", "L0_8", "L0_9", "L0_14"])
With scd.tl.make_hierarchies(), we obtain simplified hierarchies in model.adata.uns['biological_hierarchy'] and model.adata.uns['technical_hierarchy'] that we can visualize.
scd.tl.make_hierarchies(model) # saves bio and tech hierarchies in uns
scd.pl.biological_hierarchy(model, show_signatures=True, wedged='celltypes', show_label=True, n_cells=False, n_cells_label=True, color_edges=True) # requires model.adata.uns['biological_hierarchy']
/cluster/work/tumorp/analysis/beerenwinkellab/pedrof/scdef_2025_revisions/repo/src/scdef/plotting/graph.py:191: FutureWarning: Series.__getitem__ treating keys as positions is deprecated. In a future version, integer keys will always be treated as labels (consistent with DataFrame behavior). To access a value by position, use `ser.iloc[pos]` np.count_nonzero(model.adata.obs[wedged][cells] == b) INFO:scDEF:Updated scDEF graph
The biological hierarchy makes the relevant cell states and their hierarchical relationshios more apparent, with a clear monocyte group in L2_0, a T cell group in L2_2 that splits into Naive and Memory T cells in L1_2, and into cytotoxic lymphocytes in L1_4. Note that this hierarchy does not reflect lineage but rather transcriptional similarity.
scd.pl.technical_hierarchy(model, wedged='celltypes')
/cluster/work/tumorp/analysis/beerenwinkellab/pedrof/scdef_2025_revisions/repo/src/scdef/plotting/graph.py:191: FutureWarning: Series.__getitem__ treating keys as positions is deprecated. In a future version, integer keys will always be treated as labels (consistent with DataFrame behavior). To access a value by position, use `ser.iloc[pos]` np.count_nonzero(model.adata.obs[wedged][cells] == b) INFO:scDEF:Updated scDEF graph
The technical hierarchy plot produces a consensus signature of all the factors marked as technical. We can get the complete actual gene list for this consensus technical factor with scd.tl.get_technical_signature().
scd.tl.get_technical_signature(model, top_genes=10)
['MALAT1', 'PGM2L1', 'JUNB', 'GIMAP5', 'SAMD1', 'CTB-113I20.2', 'ABCC10', 'MT-CO2', 'CEP68', 'EIF2B1']
And any other factor signature is stored in a format similar to the result from a scanpy DE in model.adata.uns['L<level>_signatures'], which we can inspect using scanpy with sc.pl.rank_genes_groups():
sc.pl.rank_genes_groups(model.adata, key='L0_signatures', groups=['L0_0'])
And also available through model.get_signatures_dict():
signatures = model.get_signatures_dict()
print(signatures['L0_0'][:20])
['MALAT1', 'PGM2L1', 'GIMAP5', 'SAMD1', 'CTB-113I20.2', 'ABCC10', 'CEP68', 'RP11-141B14.1', 'CTNNB1', 'EIF2B1', 'SLC48A1', 'DHX34', 'PDE6B', 'FAAH', 'STK17A', 'IDUA', 'MUM1', 'RP11-378J18.3', 'NME6', 'CTB-152G17.6']
To proceed, we will again use the factor diagnostics to filter out the factors with high effective number of parents and low BRD.
scd.pl.factor_diagnostics(model, brd_min=1., ard_min=0.001, clarity_min=.6, annotate_factors=True)
# Actually filter
model.filter_factors(brd_min=1., ard_min=1e-3, clarity_min=.6)
INFO:scDEF:Updated adata.var: `gene_scale` INFO:scDEF:Updated adata.obs with layer 0: `L0` and `L0_score` for all factors in layer 0 INFO:scDEF:Updated adata.obsm with layer 0: `X_L0` INFO:scDEF:Updated adata.uns with layer 0 signatures: `L0_signatures`. INFO:scDEF:Updated adata.obs with layer 1: `L1` and `L1_score` for all factors in layer 1 INFO:scDEF:Updated adata.obsm with layer 1: `X_L1` INFO:scDEF:Updated adata.uns with layer 1 signatures: `L1_signatures`. INFO:scDEF:Updated adata.obs with layer 2: `L2` and `L2_score` for all factors in layer 2 INFO:scDEF:Updated adata.obsm with layer 2: `X_L2` INFO:scDEF:Updated adata.uns with layer 2 signatures: `L2_signatures`. INFO:scDEF:Updated adata.obs with layer 3: `L3` and `L3_score` for all factors in layer 3 INFO:scDEF:Updated adata.obsm with layer 3: `X_L3` INFO:scDEF:Updated adata.uns with layer 3 signatures: `L3_signatures`.
Visualize compact hierarchy¶
scd.tl.make_hierarchies(model) # because we filtered factors
scd.pl.biological_hierarchy(model, n_cells=False, # scale nodes by fraction of attached cells
wedged='celltypes', # color the nodes
show_label=True)
/cluster/work/tumorp/analysis/beerenwinkellab/pedrof/scdef_2025_revisions/repo/src/scdef/plotting/graph.py:191: FutureWarning: Series.__getitem__ treating keys as positions is deprecated. In a future version, integer keys will always be treated as labels (consistent with DataFrame behavior). To access a value by position, use `ser.iloc[pos]` np.count_nonzero(model.adata.obs[wedged][cells] == b) INFO:scDEF:Updated scDEF graph
Visualize factors in UMAP¶
The cell-factor assignments are stored in model.adata.obs, so we can look at them in the UMAP.
sc.pl.umap(model.adata, color=[f'{model.layer_names[i]}' for i in range(model.n_layers)][:-1]+ ['celltypes'], frameon=False)
We may also inspect the cell-factor weights at any layer with <factor_name>_prob.
layer_idx = 2
sc.pl.umap(model.adata, color=[f'{f}_prob' for f in model.factor_names[layer_idx]], frameon=False, ncols=3)
layer_idx = 1
sc.pl.umap(model.adata, color=[f'{f}_prob' for f in model.factor_names[layer_idx]], frameon=False, ncols=3)
layer_idx = 0
sc.pl.umap(model.adata, color=[f'{f}_prob' for f in model.factor_names[layer_idx]], frameon=False, ncols=3)
Use multi-layer factors as embeddings for visualization¶
We can use the factors at each layer of the learned scDEF as embeddings of the cells in lower-dimensional spaces, and visualize them in 2D using a UMAP.
scd.tl.umap(model)
scd.pl.umap(model, color='celltypes')
Association of factors with annotations¶¶
scDEF also has plotting utilities to show association of factors with continuous or discrete values in model.adata.obs. The factors are sorted as in scd.graph.
# With continuous obs, by default shows the Spearman correlation of cell-factor weights with each obs
scd.pl.continuous_obs_scores(model, ['total_counts', 'pct_counts_mt'], figsize=(16,2), show=True,
cmap='RdBu_r', vmax=1, vmin=-1)
# With discrete obs, show the average weight that cells in each obs group give to each factor
scd.pl.obs_scores(model, ["celltypes", 'celltypes_coarse'], mode='weights', hierarchy=true_hierarchy) # use true_hierarchy to collapse repeated cell types in both keys
Because we have a list of marker genes for each cell type in the data, we can check the association of the learned gene signatures with each cell type's markers using the F1 score.
scd.pl.signatures_scores(model, "celltypes", markers, top_genes=20)
Annotate hierarchy¶
We may take a step further and label the factors by the annotations that they most strongly associate with to further validate the hierarchy.
# Assign factors in simplified hierarchy to annotations
assignments, matches = scd.utils.factor_utils.assign_obs_to_factors(model, ['celltypes', 'celltypes_coarse'],
factor_names=scd.utils.hierarchy_utils.get_nodes_from_hierarchy(model.adata.uns['biological_hierarchy']))
# Show graph with new labels
g = scd.pl.make_graph(model, hierarchy=model.adata.uns['biological_hierarchy'], factor_annotations=matches)
g
INFO:scDEF:Updated scDEF graph
With these assignments, we can select a particular lineage to plot. Additionally, here we also show the gene signature confidences by setting show_confidences=True.
g = scd.pl.make_graph(model, hierarchy=model.adata.uns['biological_hierarchy'], top_factor=assignments['Mono'], factor_annotations=matches,
show_confidences=True, mc_samples=10)
g
INFO:scDEF:Updated scDEF graph
g = scd.pl.make_graph(model, hierarchy=model.adata.uns['biological_hierarchy'], top_factor=assignments['T'], factor_annotations=matches,
show_confidences=True, mc_samples=10)
g
INFO:scDEF:Updated scDEF graph