ShoRAH

Short Reads Assembly into Haplotypes

Prepare the input

The input for any analysis, local or global, is a sorted BAM file. The basic steps to produce such an alignment are described below.

First of all, align the reads

Since the mass adoption of next-generation sequencing, the bioinformatics community has produced an enormous number of read mappers (aligners). A list is here.

Whichever aligner you choose, it will most likely have an option to write its output in SAM format. From the alignment in SAM format, you need to create a sorted BAM alignment.

Use samtools to convert and sort the alignment

The most common set of tools for manipulating SAM alignments is samtools. It is shipped with ShoRAH, so you can navigate to the samtools directory and install it from there, or download and install a more recent release if one is available.

In the following, we assume that a SAM alignment file my_reads.sam has been created by aligning reads to a reference file reference.fasta. The following command converts it to a sorted BAM file:

[user@host]$ samtools view -b -T reference.fasta my_reads.sam | samtools sort - my_reads_sorted

The output is a sorted bam file called my_reads_sorted.bam.

The command above is a shortcut that pipes the conversion directly into the sorting step. Run separately, the two commands would read:

[user@host]$ samtools view -b -T reference.fasta -o my_reads.bam my_reads.sam
[user@host]$ samtools sort my_reads.bam my_reads_sorted
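The sort commands above use the output-prefix form of the samtools release shipped with ShoRAH (the 0.1.x series). If you install a more recent samtools, be aware that, to the best of our knowledge, releases from 1.3 onwards no longer accept the prefix form and expect the output file to be named explicitly with -o. The sketch below prints the sort command matching a given version string. The helper name sort_command is hypothetical (it is not part of samtools or ShoRAH), and releases 1.0 to 1.2 are deliberately left unhandled because their behaviour is transitional: check the usage message printed by `samtools sort` for those.

```shell
#!/bin/sh
# Sketch only: sort_command is a hypothetical helper, not part of samtools or ShoRAH.
# It prints (does not run) the sort command suited to a samtools version.
sort_command() {
    version="$1"   # version string, e.g. "0.1.19" or "1.17"
    input="$2"     # unsorted bam file
    prefix="$3"    # output name without the .bam extension
    case "$version" in
        0.*)
            # Legacy form used on this page: second argument is an output prefix.
            echo "samtools sort $input $prefix"
            ;;
        1.0*|1.1|1.1.*|1.2|1.2.*)
            # Transitional releases: not covered by this sketch.
            echo "check the usage printed by 'samtools sort' for version $version" >&2
            return 1
            ;;
        *)
            # Assumed for 1.3 and later: output file given explicitly with -o.
            echo "samtools sort -o $prefix.bam $input"
            ;;
    esac
}

sort_command 0.1.19 my_reads.bam my_reads_sorted
sort_command 1.17 my_reads.bam my_reads_sorted
```

With a recent samtools, the piped shortcut would correspondingly end in `samtools sort -o my_reads_sorted.bam -`. In both cases the result is the same my_reads_sorted.bam file.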
