Recovery Procedures

srSILO API Recovery

Restart API with Latest Index

# Replace <virus> with covid, rsva, etc.
cd /opt/srsilo/<virus>/config
docker compose down
docker compose up -d

Restart with Specific Index

To use an older index, repoint the symlink that the compose file mounts as the active output, then restart the API. Start by listing what is available:

cd /opt/srsilo/<virus>
# List available indexes
ls -la output/
# The compose file mounts 'output/', which should be a symlink to the active index
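
The repointing step itself could look roughly like the sketch below. It assumes that the mounted path really is a symlink and that you know the directory of the index you want to serve; the function name `switch_active_index` and both arguments are hypothetical, not part of the deployment.

```shell
# Sketch only: both paths are supplied by the operator and are assumptions.
switch_active_index() {
    link_path=$1    # e.g. /opt/srsilo/<virus>/output, assumed to be a symlink
    index_dir=$2    # directory holding the index you want to serve

    # Refuse to point the link at something that is not a directory.
    if [ ! -d "$index_dir" ]; then
        echo "not a directory: $index_dir" >&2
        return 1
    fi
    # Refuse to replace a real directory; only an existing symlink
    # (or a missing path) is overwritten.
    if [ -e "$link_path" ] && [ ! -L "$link_path" ]; then
        echo "refusing to replace non-symlink: $link_path" >&2
        return 1
    fi
    # -n stops ln from following the old link into its target directory.
    ln -sfn "$index_dir" "$link_path"
}
```

After the link is switched, restart the API with `docker compose down` and `docker compose up -d` from the virus's config directory so the container picks up the new target.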

List Available Indexes

# Each virus has its own output directory
ls -la /opt/srsilo/covid/output/
ls -la /opt/srsilo/rsva/output/

Verify API Health

curl http://localhost:8083/sample/info  # COVID
curl http://localhost:8084/sample/info  # RSV-A
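
Right after a restart the API may need some time before it answers. A minimal polling sketch, assuming that a successful HTTP status from `/sample/info` is a good enough health signal; the function name `wait_for_api` and its default retry values are illustrative:

```shell
# Sketch: poll an endpoint until it returns a successful HTTP status.
wait_for_api() {
    url=$1
    attempts=${2:-30}   # illustrative default
    delay=${3:-2}       # seconds between attempts, illustrative default

    i=1
    while [ "$i" -le "$attempts" ]; do
        # -f makes curl exit non-zero on HTTP 4xx/5xx; -s hides progress output.
        if curl -fs -o /dev/null "$url"; then
            echo "healthy: $url"
            return 0
        fi
        sleep "$delay"
        i=$((i + 1))
    done
    echo "no healthy response from $url after $attempts attempts" >&2
    return 1
}
```

For example, `wait_for_api http://localhost:8083/sample/info` waits for the COVID endpoint and returns non-zero if it never becomes reachable.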

Clean Failed Run Artifacts

If a pipeline run failed midway, remove its leftover files (replace <virus> with covid, rsva, etc.):

# Remove temporary processing files for a specific virus
sudo rm -rf /opt/srsilo/<virus>/sorted_chunks/*
sudo rm -rf /opt/srsilo/<virus>/tmp/*

# Remove the in-progress marker for that virus (-f: no error if it is already gone)
sudo rm -f /opt/srsilo/<virus>/output/.preprocessing_in_progress
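
Because these commands delete recursively, a guarded wrapper can reduce the risk of a mistyped or empty virus name. The following is a sketch under assumptions: the function name `clean_failed_run` is hypothetical, the directory layout is taken from the commands above, and it must run with enough privileges to delete the files (for example from a root shell).

```shell
# Sketch: clean one virus's leftovers, with basic input validation.
clean_failed_run() {
    virus=$1
    base=${2:-/opt/srsilo}   # base path as used in the commands above

    # Accept only simple names so an empty or odd value cannot widen the delete.
    case $virus in
        ''|*[!a-z0-9_-]*) echo "invalid virus name: '$virus'" >&2; return 1 ;;
    esac

    for dir in "$base/$virus/sorted_chunks" "$base/$virus/tmp"; do
        # Delete the directory's contents but keep the directory itself.
        if [ -d "$dir" ]; then
            find "$dir" -mindepth 1 -delete
        fi
    done
    # -f: no error if the marker is already gone.
    rm -f "$base/$virus/output/.preprocessing_in_progress"
}
```

Unlike the `*` glob, `find` also removes hidden files inside the two directories, which is usually what you want after an aborted run.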

Manual Pipeline Run

After cleanup, re-run the pipeline:

ansible-playbook playbooks/srsilo/update-pipeline.yml -i inventory.ini --become --ask-become-pass
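
Before re-running, it can help to confirm that no in-progress marker is left behind. This sketch assumes that a remaining `.preprocessing_in_progress` file means a run is still active or was not cleaned up; the function name `ready_to_rerun` is hypothetical.

```shell
# Sketch: report any leftover in-progress markers for the given viruses.
ready_to_rerun() {
    base=$1; shift          # e.g. /opt/srsilo
    status=0
    for virus in "$@"; do
        marker="$base/$virus/output/.preprocessing_in_progress"
        if [ -e "$marker" ]; then
            echo "marker still present: $marker" >&2
            status=1
        fi
    done
    return "$status"
}
```

A typical use is `ready_to_rerun /opt/srsilo covid rsva` followed by the `ansible-playbook` command only if it returns successfully.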

Loculus Recovery

Check Pod Status

kubectl get pods -A | grep loculus

Restart Pods

kubectl rollout restart deployment/<deployment-name> -n <namespace>
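
`kubectl rollout restart` returns as soon as the restart is requested, not when the new pods are ready. A small sketch that also waits for completion using `kubectl rollout status`; the function name `restart_and_wait` and the default timeout are illustrative assumptions:

```shell
# Sketch: restart a deployment and block until the rollout finishes.
restart_and_wait() {
    deployment=$1
    namespace=$2
    timeout=${3:-120s}   # illustrative default

    kubectl rollout restart "deployment/$deployment" -n "$namespace" || return 1
    # Returns non-zero if the rollout has not completed within the timeout.
    kubectl rollout status "deployment/$deployment" -n "$namespace" --timeout="$timeout"
}
```

Call it with the same deployment name and namespace you would pass to the command above, for example `restart_and_wait <deployment-name> <namespace>`.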

View Pod Logs

kubectl logs <pod-name> -n <namespace> -f

Monitoring Recovery

Restart Services

sudo systemctl restart prometheus
sudo systemctl restart grafana-server

Check Service Status

systemctl status prometheus
systemctl status grafana-server
journalctl -u prometheus -n 50
journalctl -u grafana-server -n 50
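
For a scriptable pass/fail answer instead of the full status output, `systemctl is-active --quiet` can be used. A minimal sketch; the function name `check_services` is hypothetical:

```shell
# Sketch: print one line per unit and fail if any unit is not active.
check_services() {
    failed=0
    for unit in "$@"; do
        # is-active exits 0 only when the unit is currently active.
        if systemctl is-active --quiet "$unit"; then
            echo "ok      $unit"
        else
            echo "FAILED  $unit"
            failed=$((failed + 1))
        fi
    done
    [ "$failed" -eq 0 ]
}
```

For the services above, the call would be `check_services prometheus grafana-server`; follow up with `journalctl -u <unit> -n 50` for any unit reported as failed.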

Quick Reference

All Manual Restart Commands

# SILO/LAPIS (per-virus; run from the appropriate config directory)
cd /opt/srsilo/covid/config && docker compose up -d   # COVID
cd /opt/srsilo/rsva/config && docker compose up -d    # RSV-A

# V-Pipe Scout (if deployed)
cd /opt/v-pipe-scout && docker compose up -d

# Check status
kubectl get pods -A  # Loculus
docker ps -a         # All containers

Full System Check

# Docker containers
docker ps -a

# Kubernetes
kubectl get pods -A

# Systemd services
systemctl status srsilo-update.timer
systemctl status prometheus
systemctl status grafana-server

# API endpoints
curl -s http://localhost:8083/sample/info
curl -s http://localhost:8084/sample/info
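
The checks above can also be collected into one pass that prints a summary and returns a single exit code. This is a sketch, not part of the deployment: the function names `run_check` and `full_system_check` are hypothetical, and the Docker and Kubernetes checks only confirm that the daemon and API server respond, not that every container or pod is healthy.

```shell
# Sketch: run each check quietly and count the failures.
failures=0
run_check() {
    label=$1; shift
    if "$@" >/dev/null 2>&1; then
        echo "PASS  $label"
    else
        echo "FAIL  $label"
        failures=$((failures + 1))
    fi
}

full_system_check() {
    failures=0
    run_check "docker daemon"   docker ps
    run_check "kubernetes API"  kubectl get pods -A
    run_check "srsilo timer"    systemctl is-active --quiet srsilo-update.timer
    run_check "prometheus"      systemctl is-active --quiet prometheus
    run_check "grafana"         systemctl is-active --quiet grafana-server
    # -f makes curl fail on HTTP error statuses.
    run_check "COVID API"       curl -fs http://localhost:8083/sample/info
    run_check "RSV-A API"       curl -fs http://localhost:8084/sample/info
    [ "$failures" -eq 0 ]
}
```

Running `full_system_check` prints one PASS or FAIL line per component; rerun the individual command for anything that fails to see its full output.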
