Control Plane and Data Plane

This document defines the transport split between control and telemetry paths, and the policy rules for telemetry over IP (TM-over-IP) in the rocket telemetry system.

1. Purpose and Scope

The system carries two fundamentally different traffic classes:

control traffic: low-rate, stateful, auditable commands that change system behavior
data traffic: continuous, time-ordered telemetry and health streams

Separating these paths improves safety, observability, and failure isolation during launch operations.

This policy applies to:

antenna node to central aggregator links
central aggregator to frontend links
archival and replay ingest paths

Node telemetry may be raw IQ samples, binary frame bytes, structured JSON telemetry, or health/status events. Source selection is a central-server routing concern and does not change the formats a node emits.

2. Plane Definitions

2.1 Control plane

The control plane includes commands and configuration changes such as:

radio configuration (frequency, gain, bandwidth)
pointing and tracking mode control
frame and payload definition updates
stream lifecycle control (start, stop, route)
mission and node mode changes (standby, arm, run, recover)

Control traffic is:

low bandwidth
request-response oriented
idempotency-aware
fully auditable per operator/session
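
The idempotency and audit properties above can be illustrated with a minimal sketch. It assumes every command carries a caller-supplied command ID, as the command channel in section 3.1 requires. The class name CommandLog, its handle method, and the in-memory storage are hypothetical and only stand in for whatever the real control service uses.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class CommandLog:
    """Hypothetical in-memory command handler with de-duplication and audit."""

    # command_id -> result of the first (and only) application
    results: Dict[str, str] = field(default_factory=dict)
    # (command_id, operator, action, outcome), appended for every request
    audit: List[Tuple[str, str, str, str]] = field(default_factory=list)

    def handle(self, command_id: str, operator: str, action: str,
               apply: Callable[[], str]) -> str:
        # A retried command (same ID) returns the stored result
        # and is not applied a second time.
        if command_id in self.results:
            self.audit.append((command_id, operator, action, "duplicate"))
            return self.results[command_id]
        result = apply()
        self.results[command_id] = result
        self.audit.append((command_id, operator, action, result))
        return result
```

With this shape, a client that times out can resend the same command ID safely: the effect happens once, while both attempts appear in the audit trail.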

2.2 Data plane

The data plane includes runtime outputs such as:

raw IQ samples
raw binary telemetry frames
optional decoded telemetry objects or JSON events
lock and signal quality metrics
node status and health events
optional reduced IQ snapshots/captures

Data traffic is:

continuous and time-series oriented
fan-out capable (multiple consumers)
tolerant of payload rates that vary by target

3. Transport Split Policy

3.1 Required transport roles

Flow	Primary transport	Notes
Frontend/central control APIs	HTTPS REST or gRPC	Commands, config, and status queries
Central to node command channel	HTTPS REST or gRPC	Strong authN/authZ, explicit command IDs
Node to central live telemetry	WebSocket or message bus bridge	MVP default for node telemetry streams
Node to central high-rate binary TM	UDP or TCP stream profile	Chosen per mission rate and loss tolerance
Central to frontend live updates	WebSocket	Subscriptions by mission, node, and target
Archive ingest	Object storage + metadata API	Large capture files and replay assets

3.2 Hard separation requirements

Control endpoints must not depend on data-plane throughput to remain responsive.
Data-plane congestion must not block emergency control commands.
Separate queues/processors should be used for control and data handling.
Rate limiting for data publishers must not apply to critical control routes.
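
One way to picture these requirements is a dispatcher that keeps the two planes in separate queues. The sketch below is a single-worker simplification with hypothetical names (PlaneDispatcher, submit_control, submit_data, next_item); a real deployment would more likely run separate processors per plane. Only the data queue is bounded, and overflow is counted rather than hidden.

```python
from collections import deque
from typing import Any, Optional, Tuple


class PlaneDispatcher:
    """Hypothetical dispatcher: control and data never share a queue."""

    def __init__(self, data_capacity: int) -> None:
        self.control: deque = deque()   # not bounded or limited by data load
        self.data: deque = deque()
        self.data_capacity = data_capacity
        self.data_dropped = 0           # counted so overflow is never silent

    def submit_control(self, command: Any) -> None:
        self.control.append(command)

    def submit_data(self, record: Any) -> bool:
        # Backpressure applies to data publishers only.
        if len(self.data) >= self.data_capacity:
            self.data_dropped += 1
            return False
        self.data.append(record)
        return True

    def next_item(self) -> Optional[Tuple[str, Any]]:
        # Control is always served first, regardless of data backlog.
        if self.control:
            return ("control", self.control.popleft())
        if self.data:
            return ("data", self.data.popleft())
        return None
```

Even with a full data queue, a command submitted through submit_control is accepted and is the next item served.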

3.3 Failure isolation

If the data plane degrades, the control plane remains available for fallback actions.
If the control plane is unavailable, the data plane may continue in last-known-safe mode.
Plane health must be exposed separately in operator UI and logs.

4. TM-over-IP Policy

Telemetry over IP is the baseline production strategy. Serial links may exist inside a node, but node egress to the platform is IP-native.

4.1 Allowed TM-over-IP profiles

Profile A: Decoded telemetry stream (MVP)

Transport: WebSocket from node to central
Payload: structured JSON (or equivalent schema-based object)
Use: operator displays, mission timeline, alerting, and storage of decoded values

Profile B: Encoded/binary telemetry frames

Transport: UDP for low-latency best-effort delivery, or TCP when reliable, in-order delivery is required
Payload: framed binary records plus minimal metadata envelope
Use: high-rate feeds, offline decode, and replay-quality capture

4.2 Metadata requirements for every telemetry record

Each telemetry message/frame must include:

mission_id
node_id
source_id or target_id
stream_type (decoded, encoded, metrics, status)
monotonically increasing sequence number per stream
timestamp from node clock
optional central receive timestamp (added at ingress)
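
As an illustration, the required fields could be carried in an envelope like the one below. The names mission_id, node_id, source_id, target_id, and stream_type come from the list above; seq, node_ts_ns, central_rx_ts_ns, the nanosecond unit, and the helper stamp_at_ingress are assumptions made for this sketch, not part of the policy.

```python
from dataclasses import dataclass, replace
from typing import Optional

STREAM_TYPES = {"decoded", "encoded", "metrics", "status"}


@dataclass(frozen=True)
class TelemetryEnvelope:
    mission_id: str
    node_id: str
    stream_type: str
    seq: int                                  # per-stream sequence number
    node_ts_ns: int                           # timestamp from the node clock
    source_id: Optional[str] = None
    target_id: Optional[str] = None
    central_rx_ts_ns: Optional[int] = None    # filled in at central ingress

    def __post_init__(self) -> None:
        if self.stream_type not in STREAM_TYPES:
            raise ValueError(f"unknown stream_type: {self.stream_type}")
        if self.source_id is None and self.target_id is None:
            raise ValueError("source_id or target_id is required")
        if self.seq < 0:
            raise ValueError("seq must be non-negative")


def stamp_at_ingress(env: TelemetryEnvelope, rx_ts_ns: int) -> TelemetryEnvelope:
    # Returns a copy with the receive time added; the node timestamp is kept.
    return replace(env, central_rx_ts_ns=rx_ts_ns)
```

Keeping the envelope immutable means the central receive timestamp is added as a new copy at ingress, so the record as emitted by the node is never altered in place.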

4.3 Ordering and loss policy

Consumers must treat sequence numbers as ordering authority within a stream.
Out-of-order packets may be reordered within a bounded time window.
Gaps must be detected and reported as loss events.
The system must not drop telemetry silently in any production mode.
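
The rules above can be sketched as a small reorder buffer for one stream. It assumes sequence numbers increase by one per record, and it measures the wait from the arrival time of the earliest buffered record. The class name ReorderBuffer, the window_s parameter, and the loss_events and late fields are hypothetical; the caller supplies the current time so the behavior is easy to test.

```python
import heapq
from typing import List, Tuple


class ReorderBuffer:
    """Hypothetical per-stream buffer: in-order release, explicit loss events."""

    def __init__(self, window_s: float, first_seq: int = 0) -> None:
        self.window_s = window_s
        self.next_seq = first_seq
        self.heap: List[Tuple[int, float, bytes]] = []   # (seq, arrival, payload)
        self.loss_events: List[Tuple[int, int]] = []     # inclusive missing ranges
        self.late = 0                                    # arrivals after release

    def push(self, seq: int, payload: bytes, now: float) -> List[Tuple[int, bytes]]:
        if seq < self.next_seq:
            self.late += 1          # counted, not silently discarded
        else:
            heapq.heappush(self.heap, (seq, now, payload))
        return self.poll(now)

    def poll(self, now: float) -> List[Tuple[int, bytes]]:
        released = []
        while self.heap:
            seq, arrived, payload = self.heap[0]
            if seq < self.next_seq:             # duplicate of a released record
                heapq.heappop(self.heap)
                self.late += 1
                continue
            if seq > self.next_seq:
                if now - arrived < self.window_s:
                    break                       # still waiting for the gap to fill
                self.loss_events.append((self.next_seq, seq - 1))
            heapq.heappop(self.heap)
            released.append((seq, payload))
            self.next_seq = seq + 1
        return released
```

A record that arrives within the window is slotted back into order. Once the window expires, the missing range is recorded as a loss event and delivery resumes with the next available record.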

4.4 Clock and timestamp policy

Node timestamps remain authoritative for signal-domain timing.
Central timestamps are authoritative for platform processing audit.
Both timestamps should be retained where possible.
Clock authority and drift handling follow the active ADR decision.

5. Reliability and QoS Classes

Define QoS by message class, not by a single transport choice.

Class	Examples	Delivery expectation	Typical transport
C0 Critical control	emergency stop, safe-mode, tracking halt	at-least-once + idempotent command handling	HTTPS/gRPC
C1 Control config	radio/frame updates, mode switches	acknowledged request-response	HTTPS/gRPC
D0 Live decoded TM	mission telemetry values	low-latency stream, occasional loss acceptable if flagged	WebSocket
D1 Binary TM frames	encoded frame feed	mission-configurable (loss-tolerant or loss-intolerant)	UDP or TCP
D2 Metrics/health/events	lock metrics, node status	near-real-time, durable logging preferred	WebSocket/message bus
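
The table above can be expressed as a lookup keyed by message class, so code paths ask for a class rather than hard-coding a transport. This is only a sketch: the QosPolicy type, the QOS_TABLE name, the lowercase transport identifiers, and the check_transport helper are assumptions introduced here.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class QosPolicy:
    delivery: str                   # delivery expectation, as stated in the table
    transports: Tuple[str, ...]     # typical transports for the class


QOS_TABLE: Dict[str, QosPolicy] = {
    "C0": QosPolicy("at-least-once + idempotent command handling", ("https", "grpc")),
    "C1": QosPolicy("acknowledged request-response", ("https", "grpc")),
    "D0": QosPolicy("low-latency stream, flagged loss acceptable", ("websocket",)),
    "D1": QosPolicy("mission-configurable loss tolerance", ("udp", "tcp")),
    "D2": QosPolicy("near-real-time, durable logging preferred",
                    ("websocket", "message-bus")),
}


def check_transport(qos_class: str, transport: str) -> bool:
    # True when the transport is one of the typical choices for the class.
    return transport.lower() in QOS_TABLE[qos_class].transports
```

For example, check_transport("C0", "udp") is False, which would flag an attempt to send critical control over a best-effort transport.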

6. Security and Access Rules

All external plane traffic should run over TLS in deployment.
Mutual authentication and certificate lifecycle follow the ADR security decision.
Control endpoints require role-based authorization and immutable audit logs.
Data subscriptions must be scoped by mission, role, and least privilege.
Stream tokens/credentials must be short-lived and revocable.

7. Operational Guidance

7.1 Degraded network behavior

When bandwidth drops or jitter rises:

preserve control plane first
reduce optional high-rate data streams
keep minimal health/status telemetry active
surface degradation status to operators within one update interval
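
The priorities above can be sketched as a planning function that decides which streams stay active under a bandwidth budget. The stream kinds, their shed order, and the names Stream and plan_streams are assumptions for illustration; control and health streams are always kept, and the rest are kept only while the remaining budget allows.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Lower rank = shed last. This ranking is an assumption based on the list above.
SHED_ORDER = {"control": 0, "health": 1, "decoded": 2, "binary": 3, "iq": 4}
ALWAYS_KEEP = {"control", "health"}


@dataclass(frozen=True)
class Stream:
    name: str
    kind: str           # one of the SHED_ORDER keys
    rate_kbps: float


def plan_streams(streams: List[Stream],
                 budget_kbps: float) -> Tuple[List[str], List[str]]:
    """Return (kept, shed) stream names for the given bandwidth budget."""
    kept: List[str] = []
    shed: List[str] = []
    remaining = budget_kbps
    for s in sorted(streams, key=lambda s: SHED_ORDER[s.kind]):
        if s.kind in ALWAYS_KEEP or s.rate_kbps <= remaining:
            kept.append(s.name)
            remaining -= s.rate_kbps
        else:
            shed.append(s.name)
    return kept, shed
```

The shed list is what would be surfaced to operators as degradation status, so reduced streams are visible rather than silently missing.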

7.2 Replay and traceability

For incident analysis and validation:

persist sequence, timestamps, and source identifiers
store command timeline and resulting telemetry timeline
enable correlation from operator action to node effect

7.3 Observability requirements

At minimum, expose per-plane:

throughput and backlog
end-to-end latency percentiles
drop/retry rates
connection/session churn
last command success/failure and reason
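
A minimal sketch of per-plane counters follows, with one PlaneMetrics instance per plane so that health is reported independently. The class and field names are hypothetical, and the percentile uses the simple nearest-rank method; a production system would more likely rely on its metrics library's histograms.

```python
import math
from dataclasses import dataclass, field
from typing import Dict, List, Optional


def percentile(samples: List[float], p: float) -> float:
    """Nearest-rank percentile of a non-empty sample list (0 < p <= 100)."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(p / 100 * len(ordered)))
    return ordered[rank - 1]


@dataclass
class PlaneMetrics:
    latencies_ms: List[float] = field(default_factory=list)
    delivered: int = 0
    dropped: int = 0

    def record(self, latency_ms: float) -> None:
        self.latencies_ms.append(latency_ms)
        self.delivered += 1

    def record_drop(self) -> None:
        self.dropped += 1

    def snapshot(self) -> Dict[str, Optional[float]]:
        total = self.delivered + self.dropped
        have = bool(self.latencies_ms)
        return {
            "p50_ms": percentile(self.latencies_ms, 50) if have else None,
            "p95_ms": percentile(self.latencies_ms, 95) if have else None,
            "drop_rate": self.dropped / total if total else 0.0,
        }


# One independent instance per plane.
metrics = {"control": PlaneMetrics(), "data": PlaneMetrics()}
```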

8. Conformance Checklist

An implementation is compliant with this document when it satisfies all items below:

control and data handlers are logically separated
emergency and critical control remain available under data load
all telemetry carries required identity and timing metadata
loss/out-of-order detection is explicit and observable
audit trail links command IDs to resulting state changes
the mission operator can see the health of each plane independently
