Skip to content

sr2silo

Convert BAM nucleotide alignments to cleartext alignments for LAPIS-SILO

sr2silo processes short-read nucleotide alignments from .bam files, translates and aligns reads in amino acids, and outputs JSON compatible with LAPIS-SILO v0.8.0+.

Features

  • BAM to JSON conversion: Transform CIGAR alignments to cleartext format
  • Amino acid translation: Translate and align reads using Diamond/BlastX
  • Insertion/deletion handling: Gracefully extract and represent indels
  • Multi-organism support: Process COVID-19, RSV-A, and other pathogens
  • Loculus integration: Direct submission to Loculus backend

Quick Start

# Install from Bioconda
conda install -c bioconda sr2silo

# Process BAM data
sr2silo process-from-vpipe \
    --input-file input.bam \
    --sample-id SAMPLE_001 \
    --timeline-file timeline.tsv \
    --organism covid \
    --output-fp output.ndjson.zst

# Submit to Loculus
sr2silo submit-to-loculus \
    --processed-file output.ndjson.zst

Input/Output

Input: BAM/SAM file with CIGAR alignments:

294 163 NC_045512.2 79  60  31S220M =   197 400 CTCTTGTAGAT FGGGHHHHLMM ...

Output: Newline-delimited JSON (.ndjson.zst) compatible with LAPIS-SILO v0.8.0+:

{
  "readId": "AV233803:AV044:2411515907:1:10805:5199:3294",
  "sampleId": "A1_05_2024_10_08",
  "batchId": "20241024_2411515907",
  "samplingDate": "2024-10-08",
  "locationName": "Lugano (TI)",
  "locationCode": "5",
  "sr2siloVersion": "1.3.0",
  "main": {
    "sequence": "CGGTTTCGTCCGTGTTGCAGCCG...GTGTCAACATCTTAAAGATGGCACTTGTG",
    "insertions": ["10:ACTG", "456:TACG"],
    "offset": 4545
  },
  "S": {
    "sequence": "MESLVPGFNEKTHVQLSLPVLQVRVRGFGDSVEEVLSEARQHLKDGTCGLVEVEKGV",
    "insertions": ["23:A", "145:KLM"],
    "offset": 78
  },
  "ORF1a": {
    "sequence": "XXXMESLVPGFNEKTHVQLSLPVLQVRVRGFGDSVEEVLSEARQHLKDGTCGLV",
    "insertions": ["2323:TG", "2389:CA"],
    "offset": 678
  },
  "E": null,
  "M": null,
  "N": null
}

See Data Schemas for details on the output format and how to customize it.

Documentation