Multi-Organism Support
sr2silo supports processing samples from multiple organisms with organism-specific reference sequences.
Supported Organisms
| Organism | Identifier | Description |
|---|---|---|
| COVID-19 | `covid` | SARS-CoV-2 / Severe acute respiratory syndrome coronavirus 2 |
| RSV-A | `rsva` | Respiratory Syncytial Virus A |
Reference Resolution
sr2silo resolves reference sequences in the following priority order:
- Local references (fastest): `resources/references/{organism}/`
- LAPIS instance: fetched from the instance specified with `--lapis-url`, if available
- Fallback: local references, used as the final fallback if the LAPIS fetch fails
This allows for both static local references and dynamic references from a LAPIS instance.
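The resolution order above can be sketched in Python as a chain of checks. This is a minimal illustration under stated assumptions, not sr2silo's implementation: `resolve_references`, `has_local_references`, and the injected `fetch_from_lapis` callable are hypothetical names. This page does not describe how references are requested from LAPIS, so the fetch is passed in as a function rather than written against a real endpoint. Only the directory layout, the two file names, and the order (local, then LAPIS, then local fallback) come from this page.

```python
from pathlib import Path
from typing import Callable, Optional, Tuple, Union

# File names this page lists for resources/references/{organism}/.
REQUIRED_FILES = ("nuc_ref.fasta", "aa_ref.fasta")

# Hypothetical signature: (lapis_url, organism) -> fetched references.
LapisFetcher = Callable[[str, str], object]


def has_local_references(org_dir: Path) -> bool:
    """True if every required reference file exists in org_dir."""
    return all((org_dir / name).is_file() for name in REQUIRED_FILES)


def resolve_references(
    organism: str,
    base_dir: Union[str, Path],
    lapis_url: Optional[str] = None,
    fetch_from_lapis: Optional[LapisFetcher] = None,
) -> Tuple[str, object]:
    """Return (source, references) following the documented priority order."""
    org_dir = Path(base_dir) / organism

    # 1. Local references (fastest): use them when both files are present.
    if has_local_references(org_dir):
        return "local", org_dir

    # 2. LAPIS instance: only tried when a URL and a fetcher are supplied.
    if lapis_url and fetch_from_lapis is not None:
        try:
            return "lapis", fetch_from_lapis(lapis_url, organism)
        except OSError:
            # Network failures (e.g. urllib.error.URLError) subclass OSError.
            pass

    # 3. Fallback: re-check the local directory, mirroring the documented order.
    if has_local_references(org_dir):
        return "local", org_dir
    raise FileNotFoundError(
        f"no references for {organism!r} in {org_dir} and none fetched from LAPIS"
    )
```

Injecting the fetcher keeps the sketch free of network code and easy to test offline.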
Usage
For detailed usage, see:
The `--organism` parameter specifies which organism to process. It can also be set via the `ORGANISM` environment variable; if both are given, the CLI argument takes precedence.
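The precedence rule (an explicit `--organism` overrides the `ORGANISM` environment variable) can be illustrated with the standard library's `argparse` by using the environment variable as the option's default. This is a hypothetical sketch: sr2silo's own CLI may be built with a different framework, and `build_parser`, `resolve_organism`, and the `prog` name are illustrative only.

```python
import argparse
import os
from typing import List, Optional


def build_parser() -> argparse.ArgumentParser:
    """Parser whose --organism default is read from the ORGANISM variable."""
    parser = argparse.ArgumentParser(prog="organism-demo")
    parser.add_argument(
        "--organism",
        # Read at parser-construction time; an explicit flag overrides it.
        default=os.environ.get("ORGANISM"),
        help="organism identifier, e.g. covid or rsva",
    )
    return parser


def resolve_organism(argv: Optional[List[str]] = None) -> str:
    """Return the organism from the CLI, else ORGANISM; fail if neither is set."""
    args = build_parser().parse_args(argv)
    if not args.organism:
        raise SystemExit("no organism given: pass --organism or set ORGANISM")
    return args.organism
```

With `ORGANISM=rsva` exported, calling without the flag selects `rsva`, while `--organism covid` selects `covid`.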
Adding New Organisms
To add support for a new organism:
- Add reference files to `resources/references/{organism_id}/`:
  - `nuc_ref.fasta`: nucleotide reference sequence(s)
  - `aa_ref.fasta`: amino acid reference sequences for gene annotations
- If starting from GenBank format, generate these files with the GenBank parser (`scripts/extract_gbk_references.py`; run it with `--help` for options).
- Update configuration (optional):
  - Update the workflow `config.yaml` with your new organism ID
  - Update the documentation with organism details
- Test (optional):
  - Add test fixtures in `tests/conftest.py`
  - Add parameterized tests in `tests/test_main.py`
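Before wiring a new organism into the workflow, a quick check of the expected layout can catch missing or malformed files. The sketch below is hypothetical (`check_reference_dir` is not part of sr2silo) and verifies what this page states: that both `nuc_ref.fasta` and `aa_ref.fasta` exist under `resources/references/{organism_id}/`. The additional check that each file begins with a `>` header line is an assumption based on the FASTA format, not a documented sr2silo requirement.

```python
from pathlib import Path
from typing import List, Union

# File names this page lists for a new organism directory.
REQUIRED_FILES = ("nuc_ref.fasta", "aa_ref.fasta")


def check_reference_dir(root: Union[str, Path], organism_id: str) -> List[str]:
    """Return a list of problems; an empty list means the layout looks complete."""
    org_dir = Path(root) / organism_id
    if not org_dir.is_dir():
        return [f"missing directory: {org_dir}"]

    problems = []
    for name in REQUIRED_FILES:
        path = org_dir / name
        if not path.is_file():
            problems.append(f"missing file: {path}")
            continue
        # Assumption: a FASTA file starts with a '>' header line.
        with path.open() as handle:
            first_line = handle.readline()
        if not first_line.startswith(">"):
            problems.append(f"not FASTA (no leading '>'): {path}")
    return problems
```

Calling `check_reference_dir("resources/references", "my_new_organism")` (a placeholder identifier) returns an empty list once both files are in place.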
Workflow Integration
Specify the organism in `workflow/config.yaml`:
Troubleshooting
Reference files not found:
- Verify the files exist: `ls resources/references/{organism}/`
- Check the spelling of the organism identifier
- Use `--lapis-url` to fetch references from LAPIS
- Generate them from GenBank: `python scripts/extract_gbk_references.py --help`
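For the identifier-spelling check, a small helper can list the organism directories that actually exist and suggest close matches using `difflib.get_close_matches` from the Python standard library. This is an illustrative sketch with hypothetical function names; it assumes each subdirectory of `resources/references/` is named after an organism identifier, matching the `resources/references/{organism}/` layout this page describes.

```python
import difflib
from pathlib import Path
from typing import List, Union


def available_organisms(root: Union[str, Path]) -> List[str]:
    """Names of the organism subdirectories under root, sorted."""
    root = Path(root)
    if not root.is_dir():
        return []
    return sorted(p.name for p in root.iterdir() if p.is_dir())


def diagnose_organism(root: Union[str, Path], organism: str) -> str:
    """Explain whether organism has a reference directory and suggest fixes."""
    known = available_organisms(root)
    if organism in known:
        return f"{organism!r} found in {root}"
    # get_close_matches returns up to n candidates with similarity >= cutoff.
    suggestions = difflib.get_close_matches(organism, known, n=3, cutoff=0.6)
    hint = f" Did you mean: {', '.join(suggestions)}?" if suggestions else ""
    listing = ", ".join(known) or "none"
    return f"{organism!r} not found in {root} (available: {listing}).{hint}"
```

For example, with `covid` and `rsva` directories present, `diagnose_organism("resources/references", "rsv-a")` reports that `rsv-a` is missing and suggests `rsva`.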