Control Plane and Data Plane
This document defines the transport split between control and telemetry paths, plus policy rules for TM-over-IP in the rocket telemetry system.
1. Purpose and Scope
The system carries two fundamentally different traffic classes:
- control traffic: low-rate, stateful, auditable commands that change system behavior
- data traffic: continuous, time-ordered telemetry and health streams
Separating these paths improves safety, observability, and failure isolation during launch operations.
This policy applies to:
- antenna node to central aggregator links
- central aggregator to frontend links
- archival and replay ingest paths
Node telemetry may be raw IQ samples, binary frame bytes, structured JSON telemetry, or health/status events. Source selection is a central-server routing concern and does not change the formats a node emits.
2. Plane Definitions
2.1 Control plane
Control plane includes commands and configuration changes such as:
- radio configuration (frequency, gain, bandwidth)
- pointing and tracking mode control
- frame and payload definition updates
- stream lifecycle control (start, stop, route)
- mission and node mode changes (standby, arm, run, recover)
Control traffic is:
- low bandwidth
- request-response oriented
- idempotency-aware
- fully auditable per operator/session
2.2 Data plane
Data plane includes runtime outputs such as:
- raw IQ samples
- raw binary telemetry frames
- optional decoded telemetry objects or JSON events
- lock and signal quality metrics
- node status and health events
- optional reduced IQ snapshots/captures
Data traffic is:
- continuous and time-series oriented
- fan-out capable (multiple consumers)
- tolerant of payload rates that differ from target to target
3. Transport Split Policy
3.1 Required transport roles
| Flow | Primary transport | Notes |
|---|---|---|
| Frontend/central control APIs | HTTPS REST or gRPC | Commands, config, and status queries |
| Central to node command channel | HTTPS REST or gRPC | Strong authN/authZ, explicit command IDs |
| Node to central live telemetry | WebSocket or message bus bridge | MVP default for node telemetry streams |
| Node to central high-rate binary TM | UDP or TCP stream profile | Chosen per mission rate and loss tolerance |
| Central to frontend live updates | WebSocket | Subscriptions by mission, node, and target |
| Archive ingest | Object storage + metadata API | Large capture files and replay assets |
3.2 Hard separation requirements
- Control endpoints must not depend on data-plane throughput to remain responsive.
- Data-plane congestion must not block emergency control commands.
- Separate queues/processors should be used for control and data handling.
- Rate limiting for data publishers must not apply to critical control routes.
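The separation requirements above can be sketched in a single process as two independent queues, with rate limiting applied only on the data path. This is a minimal illustration under stated assumptions, not the platform's implementation: `PlaneDispatcher`, `TokenBucket`, and every parameter name are hypothetical, and a real deployment would drain each queue with its own worker thread or process.

```python
import queue


class TokenBucket:
    """Token bucket used to rate-limit data publishers only (hypothetical helper)."""

    def __init__(self, rate_per_s: float, burst: int, now: float = 0.0):
        self.rate = rate_per_s
        self.capacity = burst
        self.tokens = float(burst)
        self.last = now

    def allow(self, now: float) -> bool:
        # Refill in proportion to elapsed time, capped at the burst size.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


class PlaneDispatcher:
    """Keeps control and data in separate queues so neither can block the other."""

    def __init__(self, data_queue_size: int, data_rate_per_s: float, burst: int):
        self.control_q = queue.Queue()  # unbounded: control traffic is low-rate
        self.data_q = queue.Queue(maxsize=data_queue_size)
        self.bucket = TokenBucket(data_rate_per_s, burst)
        self.data_rejected = 0  # counted, so rejections are never silent

    def submit_control(self, command: dict) -> None:
        # Control never touches the token bucket or the data queue.
        self.control_q.put_nowait(command)

    def submit_data(self, record: dict, now: float) -> bool:
        if not self.bucket.allow(now):
            self.data_rejected += 1
            return False
        try:
            self.data_q.put_nowait(record)
            return True
        except queue.Full:
            self.data_rejected += 1
            return False
```

Because `submit_control` never consults the token bucket or the data queue, a saturated data path cannot delay a control command in this sketch.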
3.3 Failure isolation
- If data plane degrades, control plane remains available for fallback actions.
- If control plane is unavailable, data plane may continue in last-known-safe mode.
- Plane health must be exposed separately in operator UI and logs.
4. TM-over-IP Policy
Telemetry over IP is the baseline production strategy. Serial links may exist inside a node, but node egress to the platform is IP-native.
4.1 Allowed TM-over-IP profiles
Profile A: Decoded telemetry stream (MVP)
- Transport: WebSocket from node to central
- Payload: structured JSON (or equivalent schema-based object)
- Use: operator displays, mission timeline, alerting, and storage of decoded values
Profile B: Encoded/binary telemetry frames
- Transport: UDP for low-latency best-effort delivery, or TCP when reliable, in-order delivery is required
- Payload: framed binary records plus minimal metadata envelope
- Use: high-rate feeds, offline decode, and replay-quality capture
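As an illustration of "framed binary records plus minimal metadata envelope", the sketch below prefixes each raw frame with a small length-delimited header. The wire layout (two big-endian length fields followed by a JSON header and the payload) is an assumption made for this example only; the document does not fix a Profile B wire format, and `encode_frame` and `decode_frame` are hypothetical names.

```python
import json
import struct

# Assumed prefix: 2-byte header length, 4-byte payload length, network byte order.
_PREFIX = "!HI"
_PREFIX_SIZE = struct.calcsize(_PREFIX)


def encode_frame(meta: dict, payload: bytes) -> bytes:
    """Wrap one binary telemetry frame with its metadata envelope."""
    header = json.dumps(meta, separators=(",", ":"), sort_keys=True).encode("utf-8")
    return struct.pack(_PREFIX, len(header), len(payload)) + header + payload


def decode_frame(datagram: bytes):
    """Split a received record into (meta, payload); reject malformed lengths."""
    header_len, payload_len = struct.unpack_from(_PREFIX, datagram, 0)
    if len(datagram) != _PREFIX_SIZE + header_len + payload_len:
        raise ValueError("frame length does not match its prefix")
    header_end = _PREFIX_SIZE + header_len
    meta = json.loads(datagram[_PREFIX_SIZE:header_end].decode("utf-8"))
    return meta, datagram[header_end:]
```

The same record can be sent as one UDP datagram or written to a TCP stream; over TCP the length prefix is what lets the receiver find frame boundaries.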
4.2 Metadata requirements for every telemetry record
Each telemetry message/frame must include:
- mission_id
- node_id
- source_id or target_id
- stream_type (decoded, encoded, metrics, status)
- monotonic sequence number per stream
- timestamp from node clock
- optional central receive timestamp (added at ingress)
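The required fields can be captured in a small validated record type. The sketch below is one possible shape, not a locked schema: the field names, the nanosecond integer timestamps, and the `stamp_at_ingress` helper are assumptions made for illustration.

```python
from dataclasses import dataclass, replace
from typing import Optional

STREAM_TYPES = {"decoded", "encoded", "metrics", "status"}


@dataclass(frozen=True)
class TelemetryEnvelope:
    mission_id: str
    node_id: str
    stream_type: str
    sequence: int                      # monotonic per stream
    node_timestamp_ns: int             # from the node clock (unit is an assumption)
    source_id: Optional[str] = None
    target_id: Optional[str] = None
    central_rx_timestamp_ns: Optional[int] = None  # added at ingress

    def __post_init__(self):
        if not self.mission_id or not self.node_id:
            raise ValueError("mission_id and node_id are required")
        if not (self.source_id or self.target_id):
            raise ValueError("one of source_id or target_id is required")
        if self.stream_type not in STREAM_TYPES:
            raise ValueError(f"unknown stream_type: {self.stream_type!r}")
        if self.sequence < 0:
            raise ValueError("sequence must be non-negative")


def stamp_at_ingress(env: TelemetryEnvelope, now_ns: int) -> TelemetryEnvelope:
    """Return a copy carrying the central receive time; the node time is untouched."""
    return replace(env, central_rx_timestamp_ns=now_ns)
```

Making the record immutable means ingress stamping produces a new object, so the node-side values can never be overwritten by accident.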
4.3 Ordering and loss policy
- Consumers must treat sequence numbers as ordering authority within a stream.
- Out-of-order packets may be reordered by the consumer within a bounded time window.
- Gaps must be detected and reported as loss events.
- The system must not drop data silently in any production mode.
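The ordering and loss rules above can be sketched as a per-stream reorder buffer that holds early packets for a bounded time, then reports the gap instead of waiting forever. This is a minimal sketch under stated assumptions: `ReorderBuffer`, its event tuples, and the explicit `now` argument (used to keep the example deterministic) are hypothetical, and the window length is left to mission configuration.

```python
from dataclasses import dataclass, field


@dataclass
class ReorderBuffer:
    window_s: float
    next_seq: int = 0
    pending: dict = field(default_factory=dict)    # seq -> (arrival_time, payload)
    loss_events: list = field(default_factory=list)

    def push(self, seq: int, payload, now: float) -> list:
        """Accept one packet; return (seq, payload) pairs deliverable in order."""
        if seq < self.next_seq:
            # Arrived after its slot was delivered or declared lost: report it.
            self.loss_events.append(("late", seq))
            return []
        if seq in self.pending:
            self.loss_events.append(("duplicate", seq))
            return []
        self.pending[seq] = (now, payload)
        return self.poll(now)

    def poll(self, now: float) -> list:
        """Deliver what is ready; also call this from a timer to expire gaps."""
        out = []
        while self.pending:
            if self.next_seq in self.pending:
                out.append((self.next_seq, self.pending.pop(self.next_seq)[1]))
                self.next_seq += 1
                continue
            earliest_arrival = min(t for t, _ in self.pending.values())
            if now - earliest_arrival < self.window_s:
                break  # still inside the window: wait for the gap to fill
            # Window expired: record the gap as a loss event and skip past it.
            lowest = min(self.pending)
            self.loss_events.append(("gap", self.next_seq, lowest - 1))
            self.next_seq = lowest
        return out
```

Every discarded or missing sequence number ends up in `loss_events`, which is the hook for surfacing loss to operators rather than dropping silently.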
4.4 Clock and timestamp policy
- Node timestamps remain authoritative for signal-domain timing.
- Central timestamps are authoritative for platform processing audit.
- Both timestamps should be retained where possible.
- Clock authority and drift handling follow the active ADR decision.
5. Reliability and QoS Classes
Define QoS by message class, not by a single transport choice.
| Class | Examples | Delivery expectation | Typical transport |
|---|---|---|---|
| C0 Critical control | emergency stop, safe-mode, tracking halt | at-least-once + idempotent command handling | HTTPS/gRPC |
| C1 Control config | radio/frame updates, mode switches | acknowledged request-response | HTTPS/gRPC |
| D0 Live decoded TM | mission telemetry values | low-latency stream, occasional loss acceptable if flagged | WebSocket |
| D1 Binary TM frames | encoded frame feed | mission-configurable (loss-tolerant or loss-intolerant) | UDP or TCP |
| D2 Metrics/health/events | lock metrics, node status | near-real-time, durable logging preferred | WebSocket/message bus |
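For class C0, at-least-once delivery means a node may receive the same command more than once, so the handler has to make retries harmless. One common way to do that is to deduplicate on the explicit command ID, as sketched below. `IdempotentCommandHandler`, its audit tuple layout, and the in-memory result cache are assumptions for illustration; a real node would persist both.

```python
class IdempotentCommandHandler:
    """Applies each command ID at most once and replays the stored result on retry."""

    def __init__(self, apply_fn):
        self._apply = apply_fn   # callable(action, params) -> result
        self._results = {}       # command_id -> result of the first application
        self.audit_log = []      # (command_id, operator, action, outcome)

    def handle(self, command_id: str, operator: str, action: str, params: dict):
        if command_id in self._results:
            # Retry of a command that already ran: do not apply it again.
            self.audit_log.append((command_id, operator, action, "replayed"))
            return self._results[command_id]
        result = self._apply(action, params)
        self._results[command_id] = result
        self.audit_log.append((command_id, operator, action, "applied"))
        return result
```

The sender can then retry a critical command until it sees an acknowledgement, without risking a double application on the node.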
6. Security and Access Rules
- All external plane traffic should run over TLS in deployment (DTLS for UDP-based profiles, since TLS requires a reliable stream transport).
- Mutual authentication and certificate lifecycle follow the ADR security decision.
- Control endpoints require role-based authorization and immutable audit logs.
- Data subscriptions must be scoped by mission, role, and least privilege.
- Stream tokens/credentials must be short-lived and revocable.
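To make "short-lived and revocable" concrete, the sketch below issues an HMAC-signed stream token that carries its own expiry and mission scope, and checks it against a revocation set. It is an illustration only: the token layout, `issue_token`, and `check_token` are hypothetical, and the actual credential mechanism is whatever the ADR security decision selects.

```python
import hashlib
import hmac


def _sign(secret: bytes, body: str) -> str:
    return hmac.new(secret, body.encode("utf-8"), hashlib.sha256).hexdigest()


def issue_token(secret: bytes, token_id: str, mission_id: str,
                role: str, expires_at: int) -> str:
    """Build a token of the assumed form id|mission|role|expiry|signature."""
    body = f"{token_id}|{mission_id}|{role}|{expires_at}"
    return f"{body}|{_sign(secret, body)}"


def check_token(secret: bytes, token: str, mission_id: str,
                now: int, revoked: set):
    """Return the role if the token is valid for this mission, else None."""
    parts = token.split("|")
    if len(parts) != 5:
        return None
    token_id, token_mission, role, expires_at, signature = parts
    body = f"{token_id}|{token_mission}|{role}|{expires_at}"
    expected = _sign(secret, body)
    # Constant-time comparison; verify the signature before trusting any field.
    if not hmac.compare_digest(signature.encode("utf-8"), expected.encode("utf-8")):
        return None
    if token_id in revoked:
        return None
    if now >= int(expires_at):
        return None
    if token_mission != mission_id:
        return None
    return role
```

The returned role is what a subscription handler would then check against the requested streams to enforce least privilege.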
7. Operational Guidance
7.1 Degraded network behavior
When bandwidth drops or jitter rises:
- preserve control plane first
- reduce optional high-rate data streams
- keep minimal health/status telemetry active
- surface degradation status to operators within one update interval
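The priority order above can be expressed as a small planning function that decides which streams to keep for a given bandwidth estimate. This is a sketch under stated assumptions: the `Stream` type, the three `kind` values, and the example stream names and rates are hypothetical, and control and health streams are always kept even if that exceeds the estimate.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stream:
    name: str
    kind: str         # "control", "health", or "optional"
    rate_kbps: float


# Lower number = preserved first.
KEEP_ORDER = {"control": 0, "health": 1, "optional": 2}


def plan_shedding(streams, available_kbps: float):
    """Return (kept, shed) stream names for the given bandwidth estimate."""
    kept, shed = [], []
    budget = available_kbps
    # Within a kind, consider cheaper streams first so more of them survive.
    for s in sorted(streams, key=lambda s: (KEEP_ORDER[s.kind], s.rate_kbps)):
        if s.kind != "optional" or s.rate_kbps <= budget:
            kept.append(s.name)
            budget -= s.rate_kbps
        else:
            shed.append(s.name)
    return kept, shed


streams = [
    Stream("raw_iq", "optional", 5000.0),
    Stream("control", "control", 10.0),
    Stream("decoded_tm", "optional", 200.0),
    Stream("node_health", "health", 20.0),
]
kept, shed = plan_shedding(streams, available_kbps=300.0)
# kept is ["control", "node_health", "decoded_tm"]; shed is ["raw_iq"]
```

The `shed` list is also the natural input for the operator-facing degradation status.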
7.2 Replay and traceability
For incident analysis and validation:
- persist sequence, timestamps, and source identifiers
- store command timeline and resulting telemetry timeline
- enable correlation from operator action to node effect
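Correlating an operator action with its effect amounts to a lookup across the two stored timelines. The sketch below finds the first telemetry record from the commanded node within a bounded delay after the command. It assumes both timelines carry central timestamps in the same unit; `first_effect_after` and the tuple layout are hypothetical.

```python
from bisect import bisect_left


def first_effect_after(command_ts_ns: int, node_id: str,
                       telemetry: list, max_delay_ns: int):
    """Find the first record from node_id at or after the command time.

    telemetry is a list of (central_rx_ts_ns, node_id, record) tuples
    sorted by timestamp. Returns (timestamp, record) or None.
    """
    times = [ts for ts, _, _ in telemetry]
    start = bisect_left(times, command_ts_ns)
    for ts, nid, record in telemetry[start:]:
        if ts - command_ts_ns > max_delay_ns:
            break  # outside the correlation window
        if nid == node_id:
            return ts, record
    return None
```

A None result is itself useful in incident analysis: it indicates that no telemetry from the node was seen inside the expected response window.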
7.3 Observability requirements
At minimum, expose per-plane:
- throughput and backlog
- end-to-end latency percentiles
- drop/retry rates
- connection/session churn
- last command success/failure and reason
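A per-plane metrics holder might look like the sketch below, which keeps latency samples and drop counts keyed by plane and computes percentiles with the nearest-rank method. `PlaneMetrics` and its method names are assumptions; a production system would more likely use bounded histograms from its monitoring stack than raw sample lists.

```python
import math
from collections import defaultdict


class PlaneMetrics:
    """Collects latency samples and drop counts separately for each plane."""

    def __init__(self):
        self.latencies_ms = defaultdict(list)
        self.drops = defaultdict(int)

    def record_latency(self, plane: str, latency_ms: float) -> None:
        self.latencies_ms[plane].append(latency_ms)

    def record_drop(self, plane: str) -> None:
        self.drops[plane] += 1

    def percentile(self, plane: str, p: float):
        """Nearest-rank percentile (0 < p <= 100); None if the plane has no samples."""
        samples = sorted(self.latencies_ms.get(plane, []))
        if not samples:
            return None
        rank = max(1, math.ceil(p / 100 * len(samples)))
        return samples[rank - 1]
```

Keying everything by plane keeps control-plane latency visible even when data-plane samples outnumber it by orders of magnitude.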
8. Conformance Checklist
An implementation is compliant with this document when it satisfies all items below:
- control and data handlers are logically separated
- emergency and critical control remain available under data load
- all telemetry carries required identity and timing metadata
- loss/out-of-order detection is explicit and observable
- audit trail links command IDs to resulting state changes
- mission operator can see independent health for both planes
Related:
- APIs and Data Contracts: does not lock the exact schema, but defines the contract families the implementation must support. Detailed endpoint examples and payload structures are provided to guide implementation.
- ADR Master Index: the canonical registry for all unresolved and resolved architecture decisions across the rocket telemetry system.