Rocket Telemetry Project Docs

Node Graceful Shutdown and Recovery

Migrated from Original Docs/Node/Node-ShutdownRecovery.md

1. Graceful Shutdown Sequence

When a shutdown is requested, either via POST /shutdown or via systemd (systemctl stop node), the node proceeds as follows:

  1. The FastAPI service publishes {event: "shutdown", grace_period_seconds: 5} to the control bus.
  2. The telemetry router immediately stops writing to output port 9002.
  3. The DSP pipeline finishes processing the current frame and seals its data streams.
  4. Antenna control parks the beam (el=0, az=0) and, on motorized mounts, de-energizes the motors.
  5. Once the 5-second grace period has elapsed, all daemons exit cleanly.
  6. systemd logs that the node-001 shutdown is complete.
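
The ordering above can be sketched in Python with asyncio. This is a minimal illustration, not the node's actual code: `graceful_shutdown`, the `publish` callable, and the per-subsystem stop callables are hypothetical names, and treating the grace period as a shared deadline for all subsystems is an assumption about how the 5 seconds are enforced.

```python
import asyncio
import json
from typing import Awaitable, Callable, Dict, List


async def graceful_shutdown(
    publish: Callable[[str], Awaitable[None]],
    subsystems: Dict[str, Callable[[], Awaitable[None]]],
    grace_period_seconds: float = 5.0,
) -> List[str]:
    """Publish the shutdown event, then stop subsystems in order.

    Returns the names of the subsystems that stopped before the grace
    period ran out. `publish` and `subsystems` are hypothetical hooks.
    """
    # Step 1: announce the shutdown on the control bus.
    await publish(
        json.dumps({"event": "shutdown", "grace_period_seconds": grace_period_seconds})
    )

    loop = asyncio.get_running_loop()
    deadline = loop.time() + grace_period_seconds
    stopped: List[str] = []

    # Steps 2-4: stop each subsystem in insertion order, sharing one deadline.
    for name, stop in subsystems.items():
        remaining = max(0.0, deadline - loop.time())
        try:
            await asyncio.wait_for(stop(), timeout=remaining)
            stopped.append(name)
        except asyncio.TimeoutError:
            # Overran the grace period; leave it to the supervisor to terminate.
            pass
    return stopped
```

A caller would pass the subsystems in the order listed above (telemetry router, DSP pipeline, antenna control) so that output stops before the beam is parked.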

Systemd unit template:

[Unit]
Description=Antenna Node SDR Service
After=network.target

[Service]
Type=simple
User=sdr
ExecStart=/usr/bin/python3 /opt/node/main.py
ExecStop=/bin/bash -c 'curl -fsS -X POST http://127.0.0.1:8080/shutdown'
TimeoutStopSec=10
Restart=on-failure
RestartSec=5

[Install]
WantedBy=multi-user.target
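
Once the ExecStop command returns, systemd by default sends SIGTERM to any processes still running in the service, and escalates to SIGKILL after TimeoutStopSec. The following sketch shows one way main.py could wait for that signal before exiting. The function name `wait_for_stop_signal` is hypothetical, and the source does not say whether main.py handles signals this way. It relies on `loop.add_signal_handler`, which is available on Unix only and must run in the main thread.

```python
import asyncio
import os
import signal


async def wait_for_stop_signal() -> str:
    """Block until SIGTERM or SIGINT arrives and return the signal's name."""
    loop = asyncio.get_running_loop()
    received: asyncio.Future = loop.create_future()

    def _on_signal(signum: int) -> None:
        # Record only the first signal; later ones are ignored.
        if not received.done():
            received.set_result(signal.Signals(signum).name)

    handled = (signal.SIGTERM, signal.SIGINT)
    for signum in handled:
        loop.add_signal_handler(signum, _on_signal, signum)
    try:
        return await received
    finally:
        # Restore default handling once the caller has been woken up.
        for signum in handled:
            loop.remove_signal_handler(signum)
```

With TimeoutStopSec=10, a handler like this has roughly ten seconds from the start of the stop request to finish the 5-second grace period before systemd kills the process.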

2. Crash Recovery

If a subsystem crashes:

  1. The supervisor (systemd or a custom watchdog) detects the missing process.
  2. An alert is logged to /var/log/node.log.
  3. The process is restarted automatically via Restart=on-failure.
  4. The subsystem's status is reported as not_responding until it recovers.
  5. Central should alert the operator if a critical subsystem is missing for more than 30 seconds.
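
The 30-second rule in the last step can be sketched as a small tracker on the Central side. This is an illustrative sketch only: the class name `SubsystemMonitor`, its methods, and the `ok` status string are assumptions. Only `not_responding` and the 30-second threshold come from the steps above.

```python
import time
from typing import Callable, Dict, Iterable, List


class SubsystemMonitor:
    """Track how long each subsystem has been missing (hypothetical helper)."""

    def __init__(
        self,
        critical: Iterable[str],
        alert_after_seconds: float = 30.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._critical = set(critical)
        self._alert_after = alert_after_seconds
        self._clock = clock
        self._missing_since: Dict[str, float] = {}

    def report(self, name: str, alive: bool) -> str:
        """Record one health check and return the subsystem's status."""
        if alive:
            self._missing_since.pop(name, None)
            return "ok"  # assumed name for the healthy state
        # Keep the first time the subsystem was seen missing.
        self._missing_since.setdefault(name, self._clock())
        return "not_responding"

    def overdue(self) -> List[str]:
        """Return critical subsystems missing for longer than the threshold."""
        now = self._clock()
        return sorted(
            name
            for name, since in self._missing_since.items()
            if name in self._critical and now - since > self._alert_after
        )
```

Central could call `report` on every health poll and raise an operator alert for each name returned by `overdue`. The injectable `clock` makes the threshold testable without waiting 30 seconds.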

Related: See Node-Testing-Validation for validation procedures.
