Basic Workflows
PYggdrasil implements several basic workflows for simulated mutation profile experiments.
We originally used these workflows as part of larger experiments to evaluate SCITE’s performance.
Here, we show the workflow to run a SCITE mutation profile simulation and inference experiment. We visualize the evolution of the chains via the log probability and two similarity measures.
In workflows/, we define several Snakemake workflows.
These workflows are defined in a modular way so that they can be easily combined to create more complex workflows.
- workflows/tree_inference.smk implements rules which run a mutation profile simulation and inference experiment.
- workflows/analyze.smk implements rules to analyze the results of a simulation and inference experiment.
- workflows/visualize.smk implements rules to visualize the results of a simulation and inference experiment.
All the markXX rules define more complex workflows using these basic functionalities. These experiments are defined in workflows/markXX.smk and are part of gordonkoehn’s thesis.
Here, we show how the basic workflows can work together to run a single MCMC chain and visualize the results. All workflow steps are designed to yield intermediate results saved to the disk. Each file is named uniquely to be easily identified and used in other workflows. A filename implies the complete history of its generation! (This results in long filenames but allows us to use pure string matching in snakemake – like magic.)
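To make the idea of a filename that carries its own history concrete, here is a minimal sketch. The naming scheme, the helper names `encode_name` and `decode_name`, and the parameter keys are all hypothetical and are not PYggdrasil's actual conventions. The sketch only shows that parameters written into a name can be recovered later by string matching alone, which is the same principle Snakemake wildcards rely on.

```python
import re

# Hypothetical naming scheme (an assumption for illustration only):
#   <prefix>-<key>_<value>-<key>_<value>....<ext>


def encode_name(prefix, params, ext):
    """Build a filename that records every parameter used to create the file."""
    parts = [f"{key}_{value}" for key, value in params.items()]
    return "-".join([prefix, *parts]) + f".{ext}"


def decode_name(filename):
    """Recover the prefix and parameters from a filename by string matching."""
    stem = filename.rsplit(".", 1)[0]  # drop the extension
    prefix, *parts = stem.split("-")
    # Each part must look like <key>_<value>; values come back as strings.
    params = dict(
        re.fullmatch(r"([A-Za-z]+)_(.+)", part).groups() for part in parts
    )
    return prefix, params


name = encode_name("mcmc", {"seed": 42, "nodes": 5, "iters": 1000}, "json")
print(name)
print(decode_name(name))
```

In this toy scheme the separator `-` must not occur inside a value, so negative numbers would need a different encoding. A real scheme has to settle such details up front, which is part of why the resulting filenames become long.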
Run a single MCMC chain
Here is how you would run the mark00 workflow.
# navigate to the workflow directory
cd workflows
# run the mark00 workflow with four cores
snakemake -c 4 mark00
Note: before you can run it, you need to install Snakemake, ideally in a conda environment. See workflows/README.md for more details.
Also, you need to adjust the DATADIR and REPODIR paths in workflows/mark00.smk and workflows/tree_inference.smk!
Once it is running, here is what happens. The diagram below shows the directed acyclic graph (DAG) of the mark00 workflow.
This graphic was generated by the following command:
The core rules here are
- gen_cell_simulation to generate a simulated mutation profile given a tree,
- mcmc running the inference and
- analyze_metrics to compute the similarity metrics.
For the rest of the rules, see the individual files:
- workflows/tree_inference.smk
- workflows/analyze.smk
- workflows/visualize.smk
The full workflow generated these three files:
Note that the ancestor-descendant (AD) similarity is a poor metric to visualize here, because we use a star tree as the ground truth. By definition, a star tree contains no ancestor-descendant relationships, so the AD is always 0, no matter what the inference does.
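The following sketch illustrates why a star tree leaves nothing for an ancestor-descendant score to measure. It is a simplified stand-in, not PYggdrasil's implementation. The assumptions are: trees are plain parent dictionaries, the root is excluded from the pairs, the score is the fraction of true pairs recovered, and an empty set of true pairs is reported as 0 to match the behaviour described above.

```python
def ancestor_descendant_pairs(parent, root):
    """Return all (ancestor, descendant) pairs among non-root nodes.

    `parent` maps every non-root node to its parent.
    """
    pairs = set()
    for node in parent:
        ancestor = parent[node]
        # Walk up towards the root, recording every non-root ancestor.
        while ancestor != root:
            pairs.add((ancestor, node))
            ancestor = parent[ancestor]
    return pairs


def ad_similarity(true_parent, inferred_parent, root):
    """Fraction of true ancestor-descendant pairs found in the inferred tree."""
    true_pairs = ancestor_descendant_pairs(true_parent, root)
    if not true_pairs:
        # Assumption: with no pairs to recover, the score is reported as 0.
        return 0.0
    inferred_pairs = ancestor_descendant_pairs(inferred_parent, root)
    return len(true_pairs & inferred_pairs) / len(true_pairs)


# Star tree: every mutation hangs directly off the root "r".
star = {0: "r", 1: "r", 2: "r"}
# Chain tree: r -> 0 -> 1 -> 2.
chain = {0: "r", 1: 0, 2: 1}

print(ancestor_descendant_pairs(star, "r"))
print(ad_similarity(star, chain, "r"))
print(ad_similarity(chain, chain, "r"))
```

With the star tree as ground truth, the set of true pairs is empty, so the score cannot distinguish a good inferred tree from a bad one.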