Tree Similarities
In this tutorial we generate several trees, compute their pairwise similarities, and visualize them.
The visualizations are built with NetworkX and Matplotlib. A fair amount of customization went into making them look nice.
Setting up the environment: the snippets below assume that PYggdrasil is imported as yg and that Path comes from pathlib.
Generate trees
Random Tree
tree_type = yg.tree_inference.TreeType.RANDOM
tree_seed = 487
nodes = 10
random_tree = yg.tree_inference.make_tree(nodes, tree_type, tree_seed)
random_tree.print_topo()
9
├── 6
│   ├── 0
│   └── 3
└── 8
    ├── 4
    │   └── 1
    ├── 5
    └── 7
        └── 2
Now let’s visualize this properly.
save_dir = Path("tree_sim_figs")
save_dir.mkdir(parents=True, exist_ok=True)
save_name = "random_tree"
yg.visualize.plot_tree_no_print(random_tree, save_name, save_dir)
Star Tree
tree_type = yg.tree_inference.TreeType.STAR
tree_seed = 487
nodes = 10
star_tree = yg.tree_inference.make_tree(nodes, tree_type, tree_seed)
Now let’s visualize this properly.
Deep Tree
tree_type = yg.tree_inference.TreeType.DEEP
tree_seed = 487
nodes = 10
deep_tree = yg.tree_inference.make_tree(nodes, tree_type, tree_seed)
Now let’s visualize this properly.
Note: PYggdrasil implements two more advanced tree generation methods.
- MCMC tree generation - takes a tree and evolves it by a fixed number of random moves, implemented with SCITE.
- HUNTRESS inference - takes a cell-mutation profile and infers a tree with HUNTRESS.
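To give a feel for what "evolving a tree by random moves" can mean, here is a minimal sketch of one common move type, prune-and-reattach, on a tree stored as a child-to-parent dictionary. The dictionary representation and the names descendants, prune_and_reattach and evolve are assumptions made for this illustration. This is not PYggdrasil's or SCITE's code, and SCITE uses further move types besides this one.

```python
import random


def descendants(parent, node):
    """Return `node` together with every node in its subtree."""
    subtree = {node}
    changed = True
    while changed:
        changed = False
        for child, par in parent.items():
            if par in subtree and child not in subtree:
                subtree.add(child)
                changed = True
    return subtree


def prune_and_reattach(parent, root, rng):
    """Move one randomly chosen subtree below a randomly chosen new parent."""
    moved = rng.choice(sorted(parent))      # any non-root node
    blocked = descendants(parent, moved)    # attaching inside the moved subtree would create a cycle
    candidates = sorted((set(parent) | {root}) - blocked)
    new_parent = dict(parent)               # leave the input tree untouched
    new_parent[moved] = rng.choice(candidates)  # may pick the old parent, i.e. a no-op move
    return new_parent


def evolve(parent, root, n_moves, seed):
    """Apply a fixed number of random prune-and-reattach moves."""
    rng = random.Random(seed)
    for _ in range(n_moves):
        parent = prune_and_reattach(parent, root, rng)
    return parent


# Hypothetical starting point: a star tree with root 9 and leaves 0..8.
star = {leaf: 9 for leaf in range(9)}
print(evolve(star, root=9, n_moves=5, seed=487))
```

Every move keeps the node set and the root fixed, so the result is always a valid tree on the same nodes.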
Compute Similarities
Which similarities should we care about? We can compute the following:
- Ancestor-Descendant (AD) Similarity
- Different-Lineage (DL) Similarity
# random tree to star tree
AD_star = yg.distances.AncestorDescendantSimilarity().calculate(random_tree, star_tree)
DL_star = yg.distances.DifferentLineageSimilarity().calculate(random_tree, star_tree)
print(f"AD Similarity: {AD_star}")
print(f"DL Similarity: {DL_star}")
AD Similarity: 0.0
DL Similarity: 1.0
- AD: 0.0 makes sense, since the star tree has no internal nodes, so no node is an ancestor of another node. (AD does not consider the root node.)
- DL: 1.0 makes sense, since the star tree has no internal nodes, so all nodes are in different lineages.
The same comparison, random tree to deep tree, gives:
AD Similarity: 0.11111111111111116
DL Similarity: 0.0
- AD: about 0.11, so a small part of the ancestor-descendant order is preserved, but most of it is not.
- DL: 0.0 makes sense, as all nodes of the deep tree are in the same lineage.
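The reasoning behind these values can be checked with a small self-contained sketch. It assumes that each similarity is the fraction of node pairs of the first tree that stand in the same relation in the second tree, and that the root is ignored. This matches the explanations above but is not taken from PYggdrasil's source. The parent dictionaries, the function names, the root label of the star tree and the node order of the chain are all assumptions for this illustration. Only the random tree is copied from the printout above.

```python
from itertools import combinations

ROOT = 9


def ancestors(parent, node):
    """Return all proper ancestors of `node` by following parent links."""
    result = set()
    while node in parent:
        node = parent[node]
        result.add(node)
    return result


def ad_pairs(parent, root):
    """Ordered (ancestor, descendant) pairs, ignoring the root."""
    return {(a, d) for d in parent for a in ancestors(parent, d) if a != root}


def dl_pairs(parent, root):
    """Unordered pairs of non-root nodes where neither is an ancestor of the other."""
    ad = ad_pairs(parent, root)
    return {
        (u, v)
        for u, v in combinations(sorted(parent), 2)
        if (u, v) not in ad and (v, u) not in ad
    }


def similarity(pairs_first, pairs_second):
    """Fraction of the first tree's pairs that also occur in the second tree."""
    if not pairs_first:
        raise ValueError("first tree has no pairs of this kind")
    return len(pairs_first & pairs_second) / len(pairs_first)


# Random tree from the printout above, stored as child -> parent.
random_parent = {6: 9, 8: 9, 0: 6, 3: 6, 4: 8, 1: 4, 5: 8, 7: 8, 2: 7}
# Star tree: every node hangs directly below the root (root label assumed).
star_parent = {node: ROOT for node in range(9)}
# Deep tree as a single chain 9 -> 8 -> ... -> 0 (node order assumed).
chain_parent = {node: node + 1 for node in range(9)}

print(similarity(ad_pairs(random_parent, ROOT), ad_pairs(star_parent, ROOT)))   # 0.0
print(similarity(dl_pairs(random_parent, ROOT), dl_pairs(star_parent, ROOT)))   # 1.0
print(similarity(dl_pairs(random_parent, ROOT), dl_pairs(chain_parent, ROOT)))  # 0.0
```

The AD value against a chain depends on the order of the nodes along the chain. That order is not shown in this tutorial, so the sketch does not try to reproduce the AD value reported above.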
Let’s generate another random tree for fun and compute the same two similarities:
AD Similarity: 0.0
DL Similarity: 0.8148148148148148
Here the DL similarity lies between the two extremes seen above. The AD similarity is 0 again, which can happen by chance with trees this small.