Generally speaking
The global analysis is performed by running shorah.py on the input sorted bam file. This performs a shotgun local analysis, followed by a global haplotype reconstruction and a frequency estimation. The output is a file with extension .global_haps.fasta. It is a fasta file with all the reconstructed haplotype sequences, with the header indicating the frequency after the underscore. So, for example
>HAP0_0.264857
CCTCAGATCACTCTTTGGCAACGACCCCTCGTCACAATAAAGATAGGGG
means that the haplotype was estimated to have a frequency of 26.5%.
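The header convention described above can be read programmatically. The following is a minimal sketch, not part of ShoRAH itself: the function name parse_global_haps is hypothetical, and it assumes every header has the form >NAME_FREQUENCY, with the frequency being the text after the last underscore.

```python
def parse_global_haps(lines):
    """Return a list of (name, frequency, sequence) tuples.

    Assumes FASTA headers of the form '>NAME_FREQUENCY', as in the
    example above; this helper is illustrative and not part of ShoRAH.
    """
    records = []
    name, freq, seq_parts = None, None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # Store the previous record before starting a new one
            if name is not None:
                records.append((name, freq, "".join(seq_parts)))
            # Split on the last underscore: name before, frequency after
            name, _, freq_text = line[1:].rpartition("_")
            freq = float(freq_text)
            seq_parts = []
        else:
            seq_parts.append(line)
    # Store the final record
    if name is not None:
        records.append((name, freq, "".join(seq_parts)))
    return records


example = [
    ">HAP0_0.264857",
    "CCTCAGATCACTCTTTGGCAACGACCCCTCGTCACAATAAAGATAGGGG",
]
for name, freq, seq in parse_global_haps(example):
    print(f"{name}: frequency {freq:.1%}, {len(seq)} bases")
```

To read a real output file, the lines of an opened file handle can be passed in place of the example list.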
This file contains only the most frequent of the reconstructed haplotypes. The complete set of reconstructed haplotypes is in the file with extension .popl.
A word of caution
Inferring haplotypes over a region longer than the reads is hard. Many false positives can be introduced if the reads are shorter than the region one would need to observe to capture enough diversity. See the references.
The .popl file will typically contain many haplotypes, most of them at very low frequencies. You are advised not to place high confidence in haplotypes whose frequency is below a certain threshold; the appropriate threshold varies from case to case.
Other software you could use for global reconstruction