Rocket Telemetry Project Docs

Control Plane and Data Plane

This document defines the transport split between the control and telemetry (data) paths, plus policy rules for telemetry over IP (TM-over-IP) in the rocket telemetry system.

1. Purpose and Scope

The system carries two fundamentally different traffic classes:

  • control traffic: low-rate, stateful, auditable commands that change system behavior
  • data traffic: continuous, time-ordered telemetry and health streams

Separating these paths improves safety, observability, and failure isolation during launch operations.

This policy applies to:

  • antenna node to central aggregator links
  • central aggregator to frontend links
  • archival and replay ingest paths

Node telemetry may be raw IQ samples, binary frame bytes, structured JSON telemetry, or health/status events. Source selection is a central-server routing concern and does not change the formats a node emits.


2. Plane Definitions

2.1 Control plane

Control plane includes commands and configuration changes such as:

  • radio configuration (frequency, gain, bandwidth)
  • pointing and tracking mode control
  • frame and payload definition updates
  • stream lifecycle control (start, stop, route)
  • mission and node mode changes (standby, arm, run, recover)

Control traffic is:

  • low bandwidth
  • request-response oriented
  • idempotency-aware
  • fully auditable per operator/session
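
The request-response, idempotency, and audit properties above can be sketched in a few lines. This is a minimal in-memory illustration, assuming each command carries a caller-generated `command_id`. `Command`, `CommandHandler`, `AuditEntry`, and the `set_gain` action are hypothetical names, not a defined project API, and a production handler would persist outcomes and audit entries durably instead of keeping them in memory.

```python
import time
from dataclasses import dataclass, field
from typing import Callable


@dataclass(frozen=True)
class Command:
    command_id: str          # explicit, caller-generated command ID
    operator: str
    session: str
    action: str              # hypothetical action name, e.g. "set_gain"
    params: dict = field(default_factory=dict)


@dataclass(frozen=True)
class AuditEntry:
    ts: float
    command_id: str
    operator: str
    session: str
    action: str
    outcome: str


class CommandHandler:
    """Applies each command_id at most once and audits every request.

    A repeated command_id (for example, a client retry) returns the stored
    outcome instead of re-applying the change.
    """

    def __init__(self, apply: Callable[[Command], str]):
        self._apply = apply
        self._outcomes: dict[str, str] = {}
        self.audit: list[AuditEntry] = []   # in-memory stand-in for an audit store

    def handle(self, cmd: Command) -> str:
        if cmd.command_id in self._outcomes:
            outcome = self._outcomes[cmd.command_id]
            self._record(cmd, f"duplicate:{outcome}")
            return outcome
        outcome = self._apply(cmd)
        self._outcomes[cmd.command_id] = outcome
        self._record(cmd, outcome)
        return outcome

    def _record(self, cmd: Command, outcome: str) -> None:
        self.audit.append(AuditEntry(time.time(), cmd.command_id, cmd.operator,
                                     cmd.session, cmd.action, outcome))
```

A retried command with the same `command_id` appears in the audit trail but is not re-applied, which is what makes at-least-once delivery of control messages safe.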

2.2 Data plane

Data plane includes runtime outputs such as:

  • raw IQ samples
  • raw binary telemetry frames
  • optional decoded telemetry objects or JSON events
  • lock and signal quality metrics
  • node status and health events
  • optional reduced IQ snapshots/captures

Data traffic is:

  • continuous and time-series oriented
  • fan-out capable (multiple consumers)
  • tolerant of payload rates that differ by target
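
Fan-out to multiple consumers can be illustrated with bounded per-subscriber buffers, so one slow consumer cannot stall the others. The sketch below is an in-memory stand-in for a message bus; `Subscription` and `FanOut` are hypothetical names, and the evict-oldest overflow policy is an assumption. Overflow is counted per subscriber so it can be reported rather than dropped silently.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Subscription:
    mission_id: str
    target_id: Optional[str] = None      # None means every target in the mission
    maxlen: int = 1000                   # bounded buffer per consumer
    queue: deque = field(init=False, repr=False)
    dropped: int = field(default=0, init=False)   # overflow is counted, not hidden

    def __post_init__(self) -> None:
        self.queue = deque(maxlen=self.maxlen)

    def matches(self, rec: dict) -> bool:
        return rec["mission_id"] == self.mission_id and (
            self.target_id is None or rec["target_id"] == self.target_id)

    def offer(self, rec: dict) -> None:
        if len(self.queue) == self.maxlen:
            self.dropped += 1            # deque evicts the oldest record on append
        self.queue.append(rec)


class FanOut:
    """Delivers each published record to every matching subscription."""

    def __init__(self) -> None:
        self.subs: list[Subscription] = []

    def subscribe(self, sub: Subscription) -> Subscription:
        self.subs.append(sub)
        return sub

    def publish(self, rec: dict) -> int:
        matched = [s for s in self.subs if s.matches(rec)]
        for sub in matched:
            sub.offer(rec)
        return len(matched)              # number of consumers that received it
```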

3. Transport Split Policy

3.1 Required transport roles

Flow | Primary transport | Notes
--- | --- | ---
Frontend/central control APIs | HTTPS REST or gRPC | Commands, config, and status queries
Central to node command channel | HTTPS REST or gRPC | Strong authN/authZ, explicit command IDs
Node to central live telemetry | WebSocket or message bus bridge | MVP default for node telemetry streams
Node to central high-rate binary TM | UDP or TCP stream profile | Chosen per mission rate and loss tolerance
Central to frontend live updates | WebSocket | Subscriptions by mission, node, and target
Archive ingest | Object storage + metadata API | Large capture files and replay assets

3.2 Hard separation requirements

  • Control endpoints must not depend on data-plane throughput to remain responsive.
  • Data-plane congestion must not block emergency control commands.
  • Separate queues/processors should be used for control and data handling.
  • Rate limiting for data publishers must not apply to critical control routes.
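
A minimal sketch of these rules inside one process, assuming an in-process dispatcher (the class names are hypothetical): control and data use separate queues, the dispatcher always drains control first, and a standard token-bucket limiter is attached to the data path only. Running a separate worker per plane enforces the same rules with stronger isolation.

```python
import queue
import time


class TokenBucket:
    """Standard token bucket; attached to data publishers only."""

    def __init__(self, rate_per_s: float, burst: int):
        self.rate = rate_per_s
        self.capacity = burst
        self.tokens = float(burst)
        self.last = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


class PlaneDispatcher:
    def __init__(self, data_rate_per_s: float, data_burst: int, data_maxsize: int = 10_000):
        self.control_q: queue.Queue = queue.Queue()                 # never rate-limited
        self.data_q: queue.Queue = queue.Queue(maxsize=data_maxsize)
        self.data_limiter = TokenBucket(data_rate_per_s, data_burst)
        self.data_rejected = 0                                      # visible, not silent

    def submit_control(self, msg) -> None:
        self.control_q.put(msg)          # no limiter on control routes

    def submit_data(self, msg) -> bool:
        if not self.data_limiter.allow():
            self.data_rejected += 1
            return False
        try:
            self.data_q.put_nowait(msg)
            return True
        except queue.Full:
            self.data_rejected += 1
            return False

    def next_message(self):
        """Drain control first so a data backlog can never delay a command."""
        try:
            return ("control", self.control_q.get_nowait())
        except queue.Empty:
            pass
        try:
            return ("data", self.data_q.get_nowait())
        except queue.Empty:
            return None
```

Because the control queue is checked first on every call, a full data queue or an exhausted data rate limit has no effect on when a safe-mode command is dispatched.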

3.3 Failure isolation

  • If data plane degrades, control plane remains available for fallback actions.
  • If control plane is unavailable, data plane may continue in last-known-safe mode.
  • Plane health must be exposed separately in operator UI and logs.

4. TM-over-IP Policy

Telemetry over IP is the baseline production strategy. Serial links may exist inside a node, but node egress to the platform is IP-native.

4.1 Allowed TM-over-IP profiles

Profile A: Decoded telemetry stream (MVP)

  • Transport: WebSocket from node to central
  • Payload: structured JSON (or equivalent schema-based object)
  • Use: operator displays, mission timeline, alerting, and storage of decoded values

Profile B: Encoded/binary telemetry frames

  • Transport: UDP for low-latency best-effort, or TCP when guaranteed delivery is required
  • Payload: framed binary records plus minimal metadata envelope
  • Use: high-rate feeds, offline decode, and replay-quality capture

4.2 Metadata requirements for every telemetry record

Each telemetry message/frame must include:

  • mission_id
  • node_id
  • source_id or target_id
  • stream_type (decoded, encoded, metrics, status)
  • monotonic sequence number per stream
  • timestamp from node clock
  • optional central receive timestamp (added at ingress)
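
The required fields map directly onto a record type. The sketch below assumes nanosecond integer timestamps and uses hypothetical field names (`seq`, `node_ts_ns`, `central_rx_ns`); the validation checks are illustrative. It rejects records missing identity metadata and exposes the per-stream key within which sequence numbers are meaningful.

```python
from dataclasses import dataclass
from typing import Optional

STREAM_TYPES = ("decoded", "encoded", "metrics", "status")


@dataclass(frozen=True)
class TelemetryRecord:
    mission_id: str
    node_id: str
    source_id: str                        # source_id or target_id
    stream_type: str                      # one of STREAM_TYPES
    seq: int                              # monotonic per stream
    node_ts_ns: int                       # node clock, set by the node
    payload: object = None
    central_rx_ns: Optional[int] = None   # optional, added at central ingress

    def __post_init__(self) -> None:
        for name in ("mission_id", "node_id", "source_id"):
            if not getattr(self, name):
                raise ValueError(f"missing required field: {name}")
        if self.stream_type not in STREAM_TYPES:
            raise ValueError(f"unknown stream_type: {self.stream_type!r}")
        if self.seq < 0 or self.node_ts_ns < 0:
            raise ValueError("seq and node_ts_ns must be non-negative")

    @property
    def stream_key(self) -> tuple:
        """Sequence numbers are only comparable within one stream_key."""
        return (self.mission_id, self.node_id, self.source_id, self.stream_type)
```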

4.3 Ordering and loss policy

  • Consumers must treat sequence numbers as ordering authority within a stream.
  • Out-of-order packets may be reordered within a bounded time window.
  • Gaps must be detected and reported as loss events.
  • The system must never drop data silently in any production mode.
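
A per-stream reorder buffer that follows these rules is sketched below, assuming a monotonic receive clock and ignoring sequence-number wraparound. Records are released in sequence order; if a gap is not filled before the window expires, it is recorded as an explicit `LossEvent` and delivery resumes. `ReorderBuffer`, `LossEvent`, and the injectable `clock` are illustrative names.

```python
import time
from dataclasses import dataclass
from typing import Any, Callable


@dataclass(frozen=True)
class LossEvent:
    first_missing: int
    last_missing: int


class ReorderBuffer:
    """Per-stream reordering with a bounded wait; unrecovered gaps become LossEvents."""

    def __init__(self, window_s: float, first_seq: int = 0,
                 clock: Callable[[], float] = time.monotonic):
        self.window_s = window_s
        self.next_seq = first_seq
        self.clock = clock
        self.pending: dict[int, tuple[float, Any]] = {}   # seq -> (arrival time, item)
        self.loss_events: list[LossEvent] = []
        self.late = 0          # arrivals behind next_seq (already delivered or declared lost)
        self.duplicates = 0    # arrivals for a seq already held in the buffer

    def push(self, seq: int, item: Any) -> list:
        """Accept one record; return the records that are now deliverable in order."""
        if seq < self.next_seq:
            self.late += 1
            return []
        if seq in self.pending:
            self.duplicates += 1
            return []
        self.pending[seq] = (self.clock(), item)
        return self.poll()

    def poll(self) -> list:
        """Deliver in-order records; declare a gap lost once the window has expired."""
        out = []
        while self.pending:
            if self.next_seq in self.pending:
                out.append(self.pending.pop(self.next_seq)[1])
                self.next_seq += 1
                continue
            # A gap exists: measure how long the earliest-arrived held record has waited.
            waited = self.clock() - min(t for t, _ in self.pending.values())
            if waited < self.window_s:
                break
            resume = min(self.pending)
            self.loss_events.append(LossEvent(self.next_seq, resume - 1))
            self.next_seq = resume
        return out
```

The `late` and `duplicates` counters make every discarded arrival visible, which keeps the buffer consistent with the no-silent-drop rule.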

4.4 Clock and timestamp policy

  • Node timestamps remain authoritative for signal-domain timing.
  • Central timestamps are authoritative for platform processing audit.
  • Both timestamps should be retained where possible.
  • Clock authority and drift handling follow the active ADR decision.
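
At ingress, the central server can add its own timestamp without touching the node's. The sketch assumes dict-shaped records with nanosecond fields named `node_ts_ns` and `central_rx_ns` (hypothetical names). The difference between the two timestamps mixes transit latency with any node/central clock offset, so it reflects true latency only once clocks are disciplined as the active ADR decision specifies.

```python
import time
from typing import Optional


def stamp_ingress(record: dict, now_ns: Optional[int] = None) -> dict:
    """Return a copy of record with central_rx_ns added at ingress.

    The node timestamp is never modified, and an existing central stamp is kept,
    so re-processing a record does not overwrite its original ingress time.
    """
    stamped = dict(record)
    if stamped.get("central_rx_ns") is None:
        stamped["central_rx_ns"] = time.time_ns() if now_ns is None else now_ns
    return stamped


def apparent_delay_ms(record: dict) -> float:
    """central_rx_ns - node_ts_ns in milliseconds.

    This is transit latency plus the node/central clock offset; it equals true
    latency only when both clocks are disciplined to a common reference.
    """
    return (record["central_rx_ns"] - record["node_ts_ns"]) / 1e6
```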

5. Reliability and QoS Classes

QoS is defined per message class, not by a single transport choice.

Class | Examples | Delivery expectation | Typical transport
--- | --- | --- | ---
C0 Critical control | emergency stop, safe-mode, tracking halt | at-least-once + idempotent command handling | HTTPS/gRPC
C1 Control config | radio/frame updates, mode switches | acknowledged request-response | HTTPS/gRPC
D0 Live decoded TM | mission telemetry values | low-latency stream, occasional loss acceptable if flagged | WebSocket
D1 Binary TM frames | encoded frame feed | mission-configurable (loss-tolerant or loss-intolerant) | UDP or TCP
D2 Metrics/health/events | lock metrics, node status | near-real-time, durable logging preferred | WebSocket/message bus
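
For C0, at-least-once delivery pairs with idempotent handling on the receiver: the sender retries until acknowledged and reuses the same command ID on every attempt, so duplicates can be recognized and not re-applied. The sketch assumes a caller-supplied `send(command_id, payload) -> bool` transport function and an exponential backoff schedule; both, along with the names `send_at_least_once` and `CommandTimeout`, are hypothetical.

```python
import time
import uuid
from typing import Callable


class CommandTimeout(Exception):
    """Raised when a command is never acknowledged."""


def send_at_least_once(send: Callable[[str, dict], bool], payload: dict,
                       max_attempts: int = 5, base_delay_s: float = 0.05,
                       sleep: Callable[[float], None] = time.sleep) -> str:
    """Retry until acknowledged, reusing one command_id so the receiver can deduplicate."""
    command_id = str(uuid.uuid4())
    for attempt in range(max_attempts):
        try:
            if send(command_id, payload):
                return command_id                 # acknowledged
        except ConnectionError:
            pass                                  # treat a transport error as "not acknowledged"
        if attempt < max_attempts - 1:
            sleep(base_delay_s * 2 ** attempt)    # exponential backoff between attempts
    raise CommandTimeout(f"{command_id} not acknowledged after {max_attempts} attempts")
```

The retry count and delays are placeholders to be set per mission.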

6. Security and Access Rules

  • All external traffic on both planes should run over TLS in deployment.
  • Mutual authentication and certificate lifecycle follow the ADR security decision.
  • Control endpoints require role-based authorization and immutable audit logs.
  • Data subscriptions must be scoped by mission, role, and least privilege.
  • Stream tokens/credentials must be short-lived and revocable.
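
Short-lived, revocable, mission-scoped stream tokens can be sketched with the standard library's `hmac` module. This is a hand-rolled illustration assuming a shared server-side secret; `StreamTokenIssuer` and the claim names are hypothetical, and an established token format and key-management scheme, as chosen by the ADR security decision, would normally take its place. Role checks are left to the caller, which receives the verified claims.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Callable


class StreamTokenIssuer:
    """Issues HMAC-signed stream tokens that expire and can be revoked by ID."""

    def __init__(self, secret: bytes, ttl_s: int = 300,
                 clock: Callable[[], float] = time.time):
        self._secret = secret
        self.ttl_s = ttl_s
        self.clock = clock
        self.revoked: set[str] = set()

    def _sign(self, body: bytes) -> str:
        return hmac.new(self._secret, body, hashlib.sha256).hexdigest()

    def issue(self, token_id: str, mission_id: str, role: str) -> str:
        claims = {"jti": token_id, "mission_id": mission_id, "role": role,
                  "exp": int(self.clock()) + self.ttl_s}
        body = base64.urlsafe_b64encode(json.dumps(claims, sort_keys=True).encode())
        return body.decode("ascii") + "." + self._sign(body)

    def revoke(self, token_id: str) -> None:
        self.revoked.add(token_id)

    def verify(self, token: str, mission_id: str) -> dict:
        """Return the claims, or raise PermissionError if the token must be refused."""
        body, _, sig = token.rpartition(".")
        if not hmac.compare_digest(sig, self._sign(body.encode("ascii"))):
            raise PermissionError("invalid signature")
        claims = json.loads(base64.urlsafe_b64decode(body))
        if claims["exp"] <= self.clock():
            raise PermissionError("token expired")
        if claims["jti"] in self.revoked:
            raise PermissionError("token revoked")
        if claims["mission_id"] != mission_id:
            raise PermissionError("token not scoped to this mission")
        return claims
```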

7. Operational Guidance

7.1 Degraded network behavior

When bandwidth drops or jitter rises:

  • preserve control plane first
  • reduce optional high-rate data streams
  • keep minimal health/status telemetry active
  • surface degradation status to operators within one update interval
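
One possible degradation policy is sketched below. The class names follow the QoS table in section 5, but the ordering and decimation factors are assumptions: optional high-rate binary frames (D1) are paused first, live decoded values (D0) are progressively decimated, and control (C0, C1) and health (D2) traffic are never reduced. The status-event fields are hypothetical.

```python
from enum import Enum


class QoS(Enum):
    C0 = "critical control"
    C1 = "control config"
    D0 = "live decoded TM"
    D1 = "binary TM frames"
    D2 = "metrics/health/events"


# Assumed policy: these classes are never reduced under degradation.
NEVER_REDUCED = {QoS.C0, QoS.C1, QoS.D2}


def send_fraction(qos: QoS, level: int) -> float:
    """Fraction of messages of this class to send at a degradation level (0 = healthy)."""
    if level <= 0 or qos in NEVER_REDUCED:
        return 1.0
    if qos is QoS.D1:
        return 0.0                      # optional high-rate frames are paused first
    return 1.0 / 2 ** (level - 1)       # D0: full rate at level 1, then halved per level


def degradation_event(level: int) -> dict:
    """Hypothetical status event so operators see the degradation and what was reduced."""
    return {
        "stream_type": "status",
        "event": "network_degraded" if level > 0 else "network_nominal",
        "level": level,
        "paused": [q.name for q in QoS if send_fraction(q, level) == 0.0],
        "reduced": [q.name for q in QoS if 0.0 < send_fraction(q, level) < 1.0],
    }
```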

7.2 Replay and traceability

For incident analysis and validation:

  • persist sequence, timestamps, and source identifiers
  • store command timeline and resulting telemetry timeline
  • enable correlation from operator action to node effect
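
Correlating an operator action with its node effect can start from a time-window join on central timestamps, which section 4.4 makes authoritative for platform processing audit. The sketch assumes telemetry already sorted by central receive time; `CommandEvent`, `TelemetryEvent`, and `correlate` are hypothetical names. A time window is a heuristic: where a node's status events reference the originating command ID, an exact join on that ID is more reliable.

```python
from bisect import bisect_left
from dataclasses import dataclass, field


@dataclass(frozen=True)
class CommandEvent:
    command_id: str
    node_id: str
    central_ts_ns: int       # when central accepted the command


@dataclass(frozen=True)
class TelemetryEvent:
    node_id: str
    seq: int
    central_rx_ns: int       # central receive timestamp
    payload: dict = field(default_factory=dict, compare=False)


def correlate(cmd: CommandEvent, telemetry: list, window_ns: int) -> list:
    """Telemetry from the commanded node received within window_ns after the command.

    `telemetry` must already be sorted by central_rx_ns.
    """
    start = bisect_left([t.central_rx_ns for t in telemetry], cmd.central_ts_ns)
    end_ns = cmd.central_ts_ns + window_ns
    matches = []
    for t in telemetry[start:]:
        if t.central_rx_ns > end_ns:
            break                    # sorted input: nothing later can match
        if t.node_id == cmd.node_id:
            matches.append(t)
    return matches
```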

7.3 Observability requirements

At minimum, expose per-plane:

  • throughput and backlog
  • end-to-end latency percentiles
  • drop/retry rates
  • connection/session churn
  • last command success/failure and reason
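
A per-plane metrics holder covering most of these items is sketched below; it is an in-memory illustration with hypothetical names, not a replacement for a metrics library. Percentiles use `statistics.quantiles` from the standard library. Throughput is assumed to be the `messages` count divided by the reporting interval, and last-command status is assumed to come from the control audit trail.

```python
import statistics
from collections import defaultdict
from typing import Optional


class PlaneMetrics:
    """Per-plane counters and latency percentiles (in-memory sketch)."""

    def __init__(self) -> None:
        self.latencies_ms: dict[str, list[float]] = defaultdict(list)
        self.counts: dict[str, dict[str, int]] = defaultdict(
            lambda: {"messages": 0, "dropped": 0, "retried": 0, "reconnects": 0})
        self.backlog: dict[str, int] = defaultdict(int)   # set by the queue owner

    def record(self, plane: str, latency_ms: Optional[float] = None,
               dropped: bool = False, retried: bool = False) -> None:
        c = self.counts[plane]
        c["messages"] += 1
        c["dropped"] += int(dropped)
        c["retried"] += int(retried)
        if latency_ms is not None:
            self.latencies_ms[plane].append(latency_ms)

    def record_reconnect(self, plane: str) -> None:
        self.counts[plane]["reconnects"] += 1              # connection/session churn

    def snapshot(self, plane: str) -> dict:
        c = self.counts[plane]
        lat = self.latencies_ms[plane]
        n = c["messages"] or 1                             # avoid division by zero
        snap = {"messages": c["messages"], "backlog": self.backlog[plane],
                "reconnects": c["reconnects"],
                "drop_rate": c["dropped"] / n, "retry_rate": c["retried"] / n,
                "p50_ms": None, "p95_ms": None, "p99_ms": None}
        if len(lat) >= 2:                                  # quantiles needs 2+ points
            cuts = statistics.quantiles(lat, n=100, method="inclusive")
            snap.update(p50_ms=cuts[49], p95_ms=cuts[94], p99_ms=cuts[98])
        return snap
```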

8. Conformance Checklist

An implementation is compliant with this document when it satisfies all items below:

  • control and data handlers are logically separated
  • emergency and critical control remain available under data load
  • all telemetry carries required identity and timing metadata
  • loss/out-of-order detection is explicit and observable
  • audit trail links command IDs to resulting state changes
  • mission operator can see independent health for both planes
