Distances
pyggdrasil.distances.TreeDistance
Bases: TreeSimilarityMeasure
Interface for distance functions between the trees.
The hyperparameters of the metric should be set at class initialization, as with models in scikit-learn.
Note
The distances between trees should be treated as tree dissimilarity measures, rather than mathematical metrics. For example, the triangle inequality does not need to hold.
triangle_inequality()
Returns True if the triangle inequality
$$d(t_1, t_3) \le d(t_1, t_2) + d(t_2, t_3)$$
is known to hold for this distance.
Note
If it is not known whether the triangle inequality holds for a metric, False should be returned.
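The following is a minimal, self-contained sketch of what a class following this interface might look like. It does not import pyggdrasil: `ToyNode`, `SizeDifferenceDistance`, and `violates_triangle_inequality` are hypothetical names introduced only for illustration, and the toy distance (a scaled difference in node counts) is not one of the library's measures. It shows a hyperparameter fixed in the constructor and a brute-force check of the triangle inequality on a small sample of trees.

```python
from dataclasses import dataclass, field
from itertools import product


@dataclass
class ToyNode:
    """Hypothetical stand-in for an integer-labeled tree node."""

    name: int
    children: list = field(default_factory=list)


def count_nodes(root: ToyNode) -> int:
    """Number of nodes in the tree rooted at `root`."""
    return 1 + sum(count_nodes(child) for child in root.children)


class SizeDifferenceDistance:
    """Toy distance: scaled absolute difference in node counts."""

    def __init__(self, scale: float = 1.0) -> None:
        # Hyperparameters are fixed at construction time.
        self.scale = scale

    def calculate(self, tree1: ToyNode, tree2: ToyNode) -> float:
        return self.scale * abs(count_nodes(tree1) - count_nodes(tree2))

    def is_symmetric(self) -> bool:
        return True

    def triangle_inequality(self) -> bool:
        # |a - c| <= |a - b| + |b - c| holds for all real numbers.
        return True


def violates_triangle_inequality(distance, trees, tol: float = 1e-12) -> bool:
    """Brute-force check over all ordered triples of the given trees."""
    for t1, t2, t3 in product(trees, repeat=3):
        direct = distance.calculate(t1, t3)
        detour = distance.calculate(t1, t2) + distance.calculate(t2, t3)
        if direct > detour + tol:
            return True
    return False


trees = [
    ToyNode(0),
    ToyNode(0, [ToyNode(1)]),
    ToyNode(0, [ToyNode(1), ToyNode(2, [ToyNode(3)])]),
]
distance = SizeDifferenceDistance(scale=0.5)
print(distance.calculate(trees[0], trees[2]))
print(violates_triangle_inequality(distance, trees))
```

A check like this can only find counterexamples on the sampled trees; it cannot prove that the inequality holds in general, which is why the method should return False when the property is not known.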
pyggdrasil.distances.TreeSimilarity
Bases: TreeSimilarityMeasure
Interface for similarity functions between the trees.
The hyperparameters should be set at class initialization, as with models in scikit-learn.
pyggdrasil.distances.TreeSimilarityMeasure
Bases: Protocol
Interface for similarity or distance functions between the trees.
The hyperparameters should be set at class initialization, as with models in scikit-learn.
calculate(tree1, tree2)
Calculates similarity between tree1 and tree2.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
tree1 | _IntegerTreeRoot | root of the first tree. The nodes should be labeled with integers. | required |
tree2 | _IntegerTreeRoot | root of the second tree. The nodes should be labeled with integers. | required |
Returns:
Type | Description |
---|---|
float | similarity from |
is_symmetric()
Returns True if the similarity function is symmetric, i.e., $s(t_1, t_2) = s(t_2, t_1)$ for all pairs of trees.
Note
If it is not known whether the similarity function is symmetric, False should be returned.
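Because the base class is a Protocol, conformance is structural: a class only needs the right methods, not an explicit base class. The sketch below illustrates this with the standard library alone. `SimilarityLike`, `LabelJaccardSimilarity`, and the parent-map tree encoding are hypothetical stand-ins, not part of pyggdrasil.

```python
from typing import Optional, Protocol, runtime_checkable

# Hypothetical tree encoding for this sketch: {node_label: parent_label},
# with the root mapped to None.
ParentMap = dict[int, Optional[int]]


@runtime_checkable
class SimilarityLike(Protocol):
    """Hypothetical mirror of the two methods documented above."""

    def calculate(self, tree1: ParentMap, tree2: ParentMap) -> float: ...

    def is_symmetric(self) -> bool: ...


class LabelJaccardSimilarity:
    """Toy similarity: Jaccard index of the two trees' label sets."""

    def calculate(self, tree1: ParentMap, tree2: ParentMap) -> float:
        labels1, labels2 = set(tree1), set(tree2)
        return len(labels1 & labels2) / len(labels1 | labels2)

    def is_symmetric(self) -> bool:
        # Set intersection and union do not depend on argument order.
        return True


tree_a = {0: None, 1: 0, 2: 0}
tree_b = {0: None, 1: 0, 3: 1}
measure = LabelJaccardSimilarity()

# The class never inherits from SimilarityLike, yet it satisfies it.
print(isinstance(measure, SimilarityLike))
print(measure.calculate(tree_a, tree_b))
```

Note that `runtime_checkable` only verifies that the methods exist, not that their signatures or return values are correct.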
pyggdrasil.distances.calculate_distance_matrix(trees1, trees2, /, *, distance)
Calculates a cross-distance matrix with entries d[i, j] = distance(trees1[i], trees2[j]).
Parameters:
Name | Type | Description | Default |
---|---|---|---|
trees1 | Sequence[_IntegerTreeRoot] | sequence of trees in one set, length m | required |
trees2 | Sequence[_IntegerTreeRoot] | sequence of trees in the second set, length n | required |
distance | TreeSimilarityMeasure | distance or similarity function | required |
Returns:
Type | Description |
---|---|
ndarray | distance matrix, shape (m, n) |
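The indexing convention above can be sketched in pure Python. This is an illustration under simplifying assumptions, not the library's implementation: `cross_matrix` is a hypothetical name, it takes a plain callable rather than a TreeSimilarityMeasure, it returns nested lists rather than an ndarray, and integers stand in for trees.

```python
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def cross_matrix(
    items1: Sequence[T],
    items2: Sequence[T],
    /,
    *,
    distance: Callable[[T, T], float],
) -> list[list[float]]:
    """Return d with d[i][j] = distance(items1[i], items2[j])."""
    return [[distance(a, b) for b in items2] for a in items1]


# Integers stand in for trees; the "distance" is their absolute difference.
rows = [1, 4]        # m = 2
cols = [2, 3, 7]     # n = 3
matrix = cross_matrix(rows, cols, distance=lambda a, b: float(abs(a - b)))

for row in matrix:
    print(row)
```

The positional-only `/` and keyword-only `*` markers mirror the documented signature: the two sequences are passed positionally and the measure by keyword. Rows follow the first sequence and columns the second, so the result is not square unless m equals n, and it is symmetric only when both sequences are the same and the measure itself is symmetric.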
pyggdrasil.distances.AncestorDescendantSimilarity
Bases: TreeSimilarity
Ancestor-descendant accuracy.
- Considers only ancestor-descendant relationships between mutations, i.e. excludes the root node. For an implementation that includes the root, see AncestorDescendantSimilarityInclRoot instead.
Raises:
Type | Description |
---|---|
DivisionByZeroError | If the first tree is a star tree. With the root excluded, no pairs of ancestor-descendant nodes can be formed. The fork of scPhylo has not been updated yet. |
calculate(tree1, tree2)
Calculates similarity between tree1 and tree2 using scphylo.tl.ad.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
tree1 | Node | root of the first tree. The nodes should be labeled with integers. | required |
tree2 | Node | root of the second tree. The nodes should be labeled with integers. | required |
Returns:
Type | Description |
---|---|
float | similarity from |
is_symmetric()
Returns True if the similarity function is symmetric, i.e., $s(t_1, t_2) = s(t_2, t_1)$ for all pairs of trees.
Note
If it is not known whether the similarity function is symmetric, False should be returned.
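To make the root exclusion and the star-tree failure described for AncestorDescendantSimilarity concrete, here is a simplified sketch of an ancestor-descendant accuracy. It is not scphylo's implementation and may differ from it in detail. The parent-map encoding, the function names, and the `include_root` flag (mimicking the variant that counts the root) are assumptions made for illustration.

```python
from typing import Optional

# Hypothetical tree encoding: {node_label: parent_label}, root mapped to None.
ParentMap = dict[int, Optional[int]]


def ancestor_descendant_pairs(
    tree: ParentMap, include_root: bool = False
) -> set[tuple[int, int]]:
    """Collect (ancestor, descendant) pairs, optionally counting the root."""
    pairs = set()
    for node in tree:
        ancestor = tree[node]
        while ancestor is not None:
            is_root = tree[ancestor] is None
            if include_root or not is_root:
                pairs.add((ancestor, node))
            ancestor = tree[ancestor]
    return pairs


def ad_accuracy(
    truth: ParentMap, inferred: ParentMap, include_root: bool = False
) -> float:
    """Fraction of the first tree's pairs that also appear in the second."""
    truth_pairs = ancestor_descendant_pairs(truth, include_root)
    inferred_pairs = ancestor_descendant_pairs(inferred, include_root)
    # Raises ZeroDivisionError when the first tree has no such pairs.
    return len(truth_pairs & inferred_pairs) / len(truth_pairs)


chain = {0: None, 1: 0, 2: 1, 3: 2}     # 0 -> 1 -> 2 -> 3
branched = {0: None, 1: 0, 2: 1, 3: 1}  # 0 -> 1 -> {2, 3}
star = {0: None, 1: 0, 2: 0, 3: 0}      # 0 -> {1, 2, 3}

print(ad_accuracy(chain, branched))
print(ad_accuracy(chain, branched, include_root=True))

try:
    ad_accuracy(star, chain)
except ZeroDivisionError:
    print("star tree as first argument: no non-root pairs to compare")
```

In this sketch the star tree contributes no pairs once the root is excluded, so the denominator is zero. With the root counted, every mutation pairs with the root, the denominator is never zero, and the score for the chain and branched trees rises.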
pyggdrasil.distances.MP3Similarity
Bases: TreeSimilarity
MP3 similarity.
calculate(tree1, tree2)
Calculates similarity between tree1 and tree2 using scphylo.tl.mp3.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
tree1 | Node | root of the first tree. The nodes should be labeled with integers. | required |
tree2 | Node | root of the second tree. The nodes should be labeled with integers. | required |
Returns:
Type | Description |
---|---|
float | similarity from |
is_symmetric()
Returns True if the similarity function is symmetric, i.e., $s(t_1, t_2) = s(t_2, t_1)$ for all pairs of trees.
Note
If it is not known whether the similarity function is symmetric, False should be returned.
pyggdrasil.distances.AncestorDescendantSimilarityInclRoot
Bases: TreeSimilarity
Ancestor-descendant similarity, adapted from @laurabquintas / Laura Quintas.
Counts the root as a mutation, i.e. also considers ancestor-descendant pairs between the root and the other nodes, effectively checking whether mutations exist in both trees. May lead to a higher similarity score than AncestorDescendantSimilarity.
calculate(tree1, tree2)
Calculates similarity between tree1 and tree2 using scphylo.tl.ad.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
tree1 | Node | root of the first tree. The nodes should be labeled with integers. | required |
tree2 | Node | root of the second tree. The nodes should be labeled with integers. | required |
Returns:
Type | Description |
---|---|
float | similarity from |
is_symmetric()
Returns True if the similarity function is symmetric, i.e., $s(t_1, t_2) = s(t_2, t_1)$ for all pairs of trees.
Note
If it is not known whether the similarity function is symmetric, False should be returned.
pyggdrasil.distances.DifferentLineageSimilarity
Bases: TreeSimilarity
Different-Lineage similarity.
For each pair of mutations in the ground-truth tree that lie on different lineages, checks whether the same relationship is preserved in the inferred tree.
The similarity is a score out of one.
calculate(tree1, tree2)
Calculates similarity between tree1 and tree2 using scphylo.tl.dl.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
tree1 | Node | root of the first tree. The nodes should be labeled with integers. Considered the ground truth tree. | required |
tree2 | Node | root of the second tree. The nodes should be labeled with integers. Considered the inferred tree to be compared to the ground truth. | required |
Returns:
Type | Description |
---|---|
float | similarity from |
is_symmetric()
Returns True if the similarity function is symmetric, i.e., $s(t_1, t_2) = s(t_2, t_1)$ for all pairs of trees.
Note
If it is not known whether the similarity function is symmetric, False should be returned.
Known to be asymmetric.
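The asymmetry follows from the definition: only pairs from the first (ground-truth) tree are counted, so swapping the arguments changes the denominator. The sketch below illustrates this with a simplified different-lineage accuracy. It is not scphylo's implementation, and the parent-map encoding and function names are hypothetical.

```python
from itertools import combinations
from typing import Optional

# Hypothetical tree encoding: {node_label: parent_label}, root mapped to None.
ParentMap = dict[int, Optional[int]]


def ancestors(tree: ParentMap, node: int) -> set[int]:
    """All proper ancestors of `node`, including the root."""
    found = set()
    parent = tree[node]
    while parent is not None:
        found.add(parent)
        parent = tree[parent]
    return found


def different_lineage_pairs(tree: ParentMap) -> set[frozenset]:
    """Unordered pairs in which neither node is an ancestor of the other."""
    pairs = set()
    for a, b in combinations(tree, 2):
        if a not in ancestors(tree, b) and b not in ancestors(tree, a):
            pairs.add(frozenset((a, b)))
    return pairs


def dl_accuracy(truth: ParentMap, inferred: ParentMap) -> float:
    """Fraction of the first tree's different-lineage pairs kept in the second."""
    truth_pairs = different_lineage_pairs(truth)
    return len(truth_pairs & different_lineage_pairs(inferred)) / len(truth_pairs)


star = {0: None, 1: 0, 2: 0, 3: 0}      # 0 -> {1, 2, 3}
branched = {0: None, 1: 0, 2: 1, 3: 1}  # 0 -> 1 -> {2, 3}

print(dl_accuracy(star, branched))   # star as ground truth
print(dl_accuracy(branched, star))   # branched as ground truth
```

In this example the star tree has three different-lineage pairs, of which the branched tree keeps one. The branched tree has a single such pair, which the star tree keeps. The two argument orders therefore give different scores.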
pyggdrasil.distances.MLTDSimilarity
Bases: TreeSimilarity
Multi-labeled tree dissimilarity measure (MLTD), normalized to [0,1].
The similarity is a score out of one.
Raises: Occasional segmentation faults of unknown cause; this is an issue in scphylo.
calculate(tree1, tree2)
Calculates similarity between tree1 and tree2 using scphylo.tl.mltd.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
tree1 | Node | root of the first tree. The nodes should be labeled with integers. | required |
tree2 | Node | root of the second tree. The nodes should be labeled with integers. | required |
Returns:
Type | Description |
---|---|
float | similarity from |
is_symmetric()
Returns True if the similarity function is symmetric, i.e., $s(t_1, t_2) = s(t_2, t_1)$ for all pairs of trees.
Note
If it is not known whether the similarity function is symmetric, False should be returned.
Unknown, but probably not symmetric.