Node Graceful Shutdown and Recovery
Migrated from Original Docs/Node/Node-ShutdownRecovery.md
1. Graceful Shutdown Sequence
When a shutdown is requested via POST /shutdown or through systemd (systemctl stop node):
- FastAPI publishes {event: "shutdown", grace_period_seconds: 5} to the control bus
- Telemetry router immediately stops writing to output port 9002
- DSP pipeline completes current frame processing and seals data streams
- Antenna control parks beam (el=0, az=0) and de-energizes motors if motorized
- After the 5-second grace period, all daemons exit cleanly
- Systemd logs "node-001 shutdown complete"
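The sequence above can be sketched in Python roughly as follows. This is a minimal illustration, not the node's actual implementation: ControlBus, graceful_shutdown, and the step names are hypothetical stand-ins, and the sketch assumes the subsystem steps run one after another within the grace period.

```python
import asyncio
import json

GRACE_PERIOD_SECONDS = 5  # grace period stated in the sequence above


class ControlBus:
    """Hypothetical in-process stand-in for the real control bus."""

    def __init__(self):
        self.messages = []

    async def publish(self, message: dict) -> None:
        # A real bus would send this over the network; here we just record it.
        self.messages.append(json.dumps(message))


async def graceful_shutdown(bus, steps, grace_period=GRACE_PERIOD_SECONDS):
    """Publish the shutdown event, then run each subsystem step in order.

    `steps` is a list of (name, coroutine function) pairs. Any step still
    pending when the grace period expires is cancelled. Returns the names
    of the steps that completed.
    """
    await bus.publish({"event": "shutdown", "grace_period_seconds": grace_period})
    completed = []

    async def run_all():
        for name, step in steps:
            await step()
            completed.append(name)

    try:
        await asyncio.wait_for(run_all(), timeout=grace_period)
    except asyncio.TimeoutError:
        pass  # grace period expired; remaining steps are skipped
    return completed
```

A caller would pass one coroutine per subsystem, for example (again with hypothetical names) [("telemetry", stop_telemetry), ("dsp", seal_streams), ("antenna", park_beam)], and log whichever steps did not finish before the grace period ran out.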
Systemd unit template:
[Unit]
Description=Antenna Node SDR Service
After=network.target
[Service]
Type=simple
User=sdr
ExecStart=/usr/bin/python3 /opt/node/main.py
ExecStop=/bin/bash -c 'curl -fsS -X POST http://127.0.0.1:8080/shutdown'
TimeoutStopSec=10
Restart=on-failure
RestartSec=5
[Install]
WantedBy=multi-user.target
2. Crash Recovery
If a subsystem crashes:
- Supervisor (systemd or custom watchdog) detects missing process
- Alert logged to /var/log/node.log
- Automatic restart via Restart=on-failure
- Status = subsystem not_responding until recovery
- Central should alert the operator if a critical subsystem is missing for more than 30 seconds
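The status tracking and the 30-second alert rule above could be modeled along these lines. This is only a sketch under stated assumptions: SubsystemMonitor, its method names, and the "ok" status value are hypothetical (the source names only not_responding), and how Central actually receives liveness reports is not specified here.

```python
import time

ALERT_THRESHOLD_SECONDS = 30  # threshold stated in the list above


class SubsystemMonitor:
    """Hypothetical tracker for subsystem liveness and operator alerts."""

    def __init__(self, critical, clock=time.monotonic):
        self.critical = set(critical)  # names of critical subsystems
        self.clock = clock             # injectable clock, useful for testing
        self.down_since = {}           # name -> time first seen missing

    def report(self, name, alive):
        """Record the latest liveness observation for a subsystem."""
        if alive:
            self.down_since.pop(name, None)  # recovered: clear the outage
        else:
            # Keep the first time the subsystem was seen missing.
            self.down_since.setdefault(name, self.clock())

    def status(self, name):
        # "ok" is an assumed label; the source only defines not_responding.
        return "not_responding" if name in self.down_since else "ok"

    def needs_operator_alert(self):
        """Return critical subsystems missing for more than the threshold."""
        now = self.clock()
        return sorted(
            name
            for name, since in self.down_since.items()
            if name in self.critical and now - since > ALERT_THRESHOLD_SECONDS
        )
```

Using a monotonic clock keeps the outage duration correct even if the wall clock is adjusted while a subsystem is down.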
Related: See Node-Testing-Validation for validation procedures.