Live video punishes every shortcut. A delay you could ignore in a request/response API shows up here as a stutter that thousands of people feel at once. The work is mostly about protecting a latency budget you decided on up front.
The latency budget
Pick a glass-to-glass number and treat it as a contract. Ours was under two seconds. Every component — ingest, transcode, delivery, player — gets a slice of that budget, and any change that overruns its slice is a regression, full stop.
If you cannot say where the milliseconds go, you cannot say why the stream is slow.
Independent scaling
Ingest, transcode, and delivery have wildly different load profiles. Coupling them means scaling all three for the worst case of any one. Splitting them let transcode ride an autoscaling pool while delivery leaned on a CDN edge that got cheaper per viewer as the audience grew.
What broke first
It is never the part you fortified. For us it was connection churn during traffic spikes — thundering reconnects that the control plane could not absorb. The fix was boring: backoff, jitter, and capacity headroom you only appreciate at 2am.
