
Basic Workflows

PYggdrasil implements several basic workflows for simulated mutation profile experiments.

We originally used these workflows as part of larger experiments to evaluate SCITE’s performance.

Here, we show the workflow to run a SCITE mutation profile simulation and inference experiment. We visualize the evolution of the chains via the log probability and two similarity measures.

In workflows/, we define several Snakemake workflows. These workflows are defined in a modular way so that they can be easily combined to create more complex workflows.

  • workflows/tree_inference.smk implements rules which run a mutation profile simulation and inference experiment.
  • workflows/analyze.smk implements rules to analyze the results of a simulation and inference experiment.
  • workflows/visualize.smk implements rules to visualize the results of a simulation and inference experiment.

All the markXX rules define more complex workflows using these basic functionalities. These experiments are defined in workflows/markXX.smk and are part of gordonkoehn’s thesis.

Here, we show how the basic workflows work together to run a single MCMC chain and visualize the results. Every workflow step saves its intermediate results to disk. Each file gets a unique name so that it can be easily identified and reused in other workflows. A filename encodes the complete history of how the file was generated. (This results in long filenames, but it lets us rely on pure string matching in Snakemake.)
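The naming idea above can be sketched in a few lines of Python. This is only an illustration: the pattern, the field names (seed, n_nodes, n_cells) and the helper functions are hypothetical, and the real PYggdrasil filenames are longer and use different fields. The sketch shows how parameters written into a filename can be recovered by string matching alone, in the spirit of Snakemake wildcards.

```python
import re

# Hypothetical naming scheme, for illustration only.
PATTERN = "mcmc-seed_{seed}-nodes_{n_nodes}-cells_{n_cells}.json"


def build_filename(seed: int, n_nodes: int, n_cells: int) -> str:
    """Encode the generating parameters into the filename."""
    return PATTERN.format(seed=seed, n_nodes=n_nodes, n_cells=n_cells)


def pattern_to_regex(pattern: str) -> re.Pattern:
    """Turn each '{name}' placeholder into a named regex group."""
    # re.split with a capturing group alternates: literal, name, literal, ...
    parts = re.split(r"\{(\w+)\}", pattern)
    regex = ""
    for i, part in enumerate(parts):
        if i % 2 == 0:
            regex += re.escape(part)          # literal text
        else:
            regex += f"(?P<{part}>[^-/]+)"    # placeholder value
    return re.compile(f"^{regex}$")


def parse_filename(filename: str) -> dict:
    """Recover the parameters from a filename by pure string matching."""
    match = pattern_to_regex(PATTERN).match(filename)
    if match is None:
        raise ValueError(f"{filename!r} does not follow the naming scheme")
    return match.groupdict()


name = build_filename(seed=42, n_nodes=5, n_cells=200)
print(name)
print(parse_filename(name))
```

Because every parameter is part of the name, a downstream rule can request a file and the workflow can infer from the name alone which upstream steps are needed to produce it.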

Run a single MCMC chain

Here is how you would run the mark00 workflow.

 # navigate to the workflow directory
 cd workflows
 # run the mark00 workflow with four cores
 snakemake -c 4 mark00

Note: before you can run it, you need to install Snakemake, ideally in a conda environment. See workflows/README.md for more details.

You also need to adjust the DATADIR and REPODIR paths in workflows/mark00.smk and workflows/tree_inference.smk!

Once it is running, here is what happens. The diagram below shows the directed acyclic graph (DAG) of the mark00 workflow.

mark00 directed acyclic graph of workflow

This graphic was generated by the following command:

 snakemake --dag mark00 | dot -Tsvg > mark00.svg

The core rules here are

  • gen_cell_simulation to generate a simulated mutation profile given a tree,
  • mcmc running the inference and
  • analyze_metrics to compute the similarity metrics.

For the rest of the rules, see the individual files:

  • workflows/tree_inference.smk
  • workflows/analyze.smk
  • workflows/visualize.smk

The full workflow generated these three files:

mark00 log-prob evolution

mark00 MP3 evolution

Note that the ancestor-descendant (AD) similarity is a poor metric to visualize here because the ground truth is a star tree. By definition, a star tree contains no ancestor-descendant relationships, so the AD will always be 0 no matter what the inference does.

mark00 AD evolution
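The remark about the AD metric can be checked with a small sketch. It assumes that ancestor-descendant pairs are counted between mutation nodes only, with the root excluded. The child-to-parent dictionary and the function name are hypothetical and are not PYggdrasil's actual implementation.

```python
def ancestor_descendant_pairs(parent: dict, root: str = "root") -> set:
    """Return all (ancestor, descendant) pairs between non-root nodes.

    `parent` maps each non-root node to its parent (assumed representation).
    """
    pairs = set()
    for node in parent:
        ancestor = parent[node]
        # Walk up towards the root, recording every non-root ancestor.
        while ancestor != root:
            pairs.add((ancestor, node))
            ancestor = parent[ancestor]
    return pairs


# Star tree: every mutation hangs directly off the root.
star_tree = {"m1": "root", "m2": "root", "m3": "root"}
# Linear chain for comparison: root -> m1 -> m2 -> m3.
chain_tree = {"m1": "root", "m2": "m1", "m3": "m2"}

print(ancestor_descendant_pairs(star_tree))
print(len(ancestor_descendant_pairs(chain_tree)))
```

Under this assumption the star tree has an empty set of pairs, so a similarity based on shared ancestor-descendant pairs has nothing to compare against, whereas the chain yields three pairs.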