Pipeline

Dependencies

  • Conda Shell installer

    Conda is an open-source package and environment management system. V-pipe uses it to automatically obtain reproducible environments and to simplify installation of the individual pipeline components through the Bioconda channel, a distribution of bioinformatics software.

    See the conda documentation for installation instructions.

  • Snakemake Bioconda package

    Snakemake is the central workflow and dependency manager of V-pipe. It determines the order in which individual tools are invoked and checks that programs do not exit unexpectedly.
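
    To illustrate what "determining the order" of tools means, here is a minimal sketch that orders steps from a dependency graph using a topological sort. It is not how Snakemake is implemented, and the step names are hypothetical and do not mirror V-pipe's actual rules.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical step names (assumption, for illustration only).
# Each step maps to the set of steps that must finish before it.
dependencies = {
    "trimming": set(),
    "alignment": {"trimming"},
    "consensus": {"alignment"},
    "snv_calling": {"alignment"},
}


def execution_order(deps):
    """Return one valid order in which every step follows its prerequisites."""
    return list(TopologicalSorter(deps).static_order())


order = execution_order(dependencies)
print(order)
```

    Any order returned this way runs trimming before alignment, and alignment before both downstream steps; the relative order of independent steps is not fixed.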

    Once conda is installed, you can use it to obtain Snakemake (this is the recommended way to install it). Snakemake will subsequently obtain all the components that V-pipe needs.

  • FastQC

    FastQC gives an overview of the raw sequencing data. Flowcells that have been overloaded or that otherwise failed during sequencing can easily be identified with FastQC.

  • PRINSEQ Bioconda package

    Trimming and clipping of reads is performed by PRINSEQ. It is currently the most versatile raw read processor with many customization options.

  • Vicuna Bioconda package

    Vicuna is a de novo assembler designed for generating rough reference contigs from viral NGS data. It copes with the heterogeneity inherent in such data, including high single-base heterogeneity and structural variants.

  • InDelFixer Bioconda package

    InDelFixer is a sensitive aligner that performs a full Smith-Waterman alignment against a reference. It is used to polish the consensus sequence.
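
    For readers unfamiliar with Smith-Waterman, the following is a minimal textbook sketch of the local alignment score. The scoring values (match, mismatch, linear gap penalty) are assumptions chosen for illustration; InDelFixer's actual scoring scheme and implementation are not described here.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score of a and b with a linear gap penalty."""
    rows, cols = len(a) + 1, len(b) + 1
    # score[i][j] is the best score of a local alignment ending at a[i-1], b[j-1].
    score = [[0] * cols for _ in range(rows)]
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            substitution = match if a[i - 1] == b[j - 1] else mismatch
            diag = score[i - 1][j - 1] + substitution
            up = score[i - 1][j] + gap      # gap in b
            left = score[i][j - 1] + gap    # gap in a
            # Clamping at 0 lets an alignment restart anywhere (local alignment).
            score[i][j] = max(0, diag, up, left)
            best = max(best, score[i][j])
    return best


print(smith_waterman_score("TTACGTT", "GGACGGG"))
```

    The full dynamic-programming table makes the method sensitive but quadratic in time and memory, which is why it suits short reads aligned against small viral genomes.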

  • ConsensusFixer Bioconda package

    ConsensusFixer is also used to polish the consensus sequence. It computes a consensus sequence with wobbles, ambiguous bases, and in-frame insertions from an NGS read alignment.
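
    As a rough illustration of a consensus with wobbles, the sketch below calls one base per alignment column and merges bases above a frequency threshold into an IUPAC ambiguity code. The threshold `min_fraction` and the function name are assumptions, and in-frame insertions are not handled; this is not ConsensusFixer's actual algorithm.

```python
from collections import Counter

# IUPAC ambiguity ("wobble") codes for sets of nucleotides.
IUPAC = {
    frozenset("A"): "A", frozenset("C"): "C",
    frozenset("G"): "G", frozenset("T"): "T",
    frozenset("AG"): "R", frozenset("CT"): "Y",
    frozenset("CG"): "S", frozenset("AT"): "W",
    frozenset("GT"): "K", frozenset("AC"): "M",
    frozenset("CGT"): "B", frozenset("AGT"): "D",
    frozenset("ACT"): "H", frozenset("ACG"): "V",
    frozenset("ACGT"): "N",
}


def consensus(aligned_reads, min_fraction=0.25):
    """Call one symbol per column of equally long, gap-padded reads."""
    result = []
    for column in zip(*aligned_reads):
        # Ignore gaps and any non-ACGT symbols when counting.
        counts = Counter(base for base in column if base in "ACGT")
        total = sum(counts.values())
        if total == 0:
            result.append("N")  # no coverage in this column
            continue
        kept = frozenset(b for b, n in counts.items() if n / total >= min_fraction)
        if not kept:
            # Threshold too strict for this column: fall back to the majority base.
            kept = frozenset(counts.most_common(1)[0][0])
        result.append(IUPAC[kept])
    return "".join(result)


reads = ["ACGT", "ACGT", "ACGT", "ATGT"]
print(consensus(reads))        # second column mixes C and T
print(consensus(reads, 0.5))   # stricter threshold keeps only the majority base
```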

  • ngshmmalign Bioconda package

    We perform the alignment of the curated NGS data using our custom ngshmmalign that takes structural variants into account. It produces multiple consensus sequences that include either majority bases or ambiguous bases.

  • bwa Bioconda package

    To detect specific cross-contaminations with other probes, the Burrows-Wheeler aligner (bwa) is used. It quickly yields estimates of foreign genomic material in an experiment. Additionally, it can be used as an alternative aligner to ngshmmalign.

  • MAFFT Bioconda package

    To standardise multiple samples to the same reference genome (say HXB2 for HIV-1), the multiple sequence aligner MAFFT is employed. The multiple sequence alignment helps in determining regions of low conservation and thus makes standardisation of alignments more robust.

  • ShoRAH Bioconda package

    The Short Reads Assembly into Haplotypes (ShoRAH) program for inferring viral haplotypes from NGS data is used to perform local haplotype reconstruction for heterogeneous viral populations by using a Gibbs sampler.

  • LoFreq Bioconda package

    LoFreq (version 2) is an SNV and indel caller for next-generation sequencing data, and can be used as an alternative engine for SNV calling.

  • SAVAGE Bioconda package

    SAVAGE is a tool for viral haplotype reconstruction. It can be executed in two modes: (1) using a reference sequence, or (2) assembling viral haplotypes de novo. We employ the latter.

  • Haploclique Bioconda package

    Haploclique performs viral quasispecies assembly via maximal clique finding. It is available as another selectable engine for global haplotype reconstruction of heterogeneous viral populations.
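
    To give an idea of what maximal clique finding involves, here is a minimal sketch of the textbook Bron-Kerbosch recursion on a small graph in which nodes stand for reads and edges connect compatible, overlapping reads. The read names and the graph are made up for illustration, and nothing here reflects how Haploclique builds its graph or enumerates cliques.

```python
def maximal_cliques(graph):
    """Enumerate maximal cliques with the basic Bron-Kerbosch recursion.

    graph maps each node to the set of its neighbours (undirected).
    """
    cliques = []

    def expand(clique, candidates, excluded):
        if not candidates and not excluded:
            cliques.append(clique)  # cannot be extended, so it is maximal
            return
        for node in list(candidates):
            neighbours = graph[node]
            expand(clique | {node}, candidates & neighbours, excluded & neighbours)
            # The node is done: later branches must not report its cliques again.
            candidates = candidates - {node}
            excluded = excluded | {node}

    expand(frozenset(), frozenset(graph), frozenset())
    return cliques


# Hypothetical overlap graph: r1, r2, r3 are mutually compatible; r4 only fits r3.
overlaps = {
    "r1": {"r2", "r3"},
    "r2": {"r1", "r3"},
    "r3": {"r1", "r2", "r4"},
    "r4": {"r3"},
}
print(maximal_cliques(overlaps))
```

    Each maximal clique groups reads that are all pairwise compatible, which is the intuition behind merging them into a longer haplotype candidate.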

  • QuasiRecomb

    QuasiRecomb performs local and global haplotype reconstruction for heterogeneous viral populations by using a hidden Markov model.

  • SmallGenomeUtilities Bioconda package

    We perform genomic liftovers to standardised reference genomes using SmallGenomeUtilities, our in-house Python library of utilities for rewriting alignments.
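
    The idea behind a liftover can be sketched as a coordinate mapping derived from a pairwise alignment between a sample sequence and the standard reference. The function below is a hypothetical illustration and is not part of the SmallGenomeUtilities API.

```python
def liftover_map(aligned_query, aligned_reference, gap="-"):
    """Map 0-based query positions to 0-based reference positions.

    Both strings are rows of one pairwise alignment and have equal length.
    Query bases aligned to a gap in the reference (insertions) map to None.
    """
    if len(aligned_query) != len(aligned_reference):
        raise ValueError("aligned sequences must have the same length")
    mapping = {}
    q_pos = r_pos = 0
    for q_char, r_char in zip(aligned_query, aligned_reference):
        if q_char != gap:
            mapping[q_pos] = r_pos if r_char != gap else None
            q_pos += 1
        if r_char != gap:
            r_pos += 1
    return mapping


# The query lacks one reference base and carries one extra base.
print(liftover_map("AC-GTT", "ACTG-T"))
```

    Positions after a deletion or insertion shift relative to the reference, which is why per-sample coordinates need this kind of mapping before samples can be compared.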

  • Samtools Bioconda package

    The Swiss Army knife of alignment postprocessing and diagnostics.

  • picard Bioconda package

    Java tools for working with NGS data in the BAM format.