One Decode Path, Multiple Consumers

A live video stream rarely has just one audience inside an application. The obvious consumer is the preview pane the user looks at. Less obvious consumers are often more important: a motion detector, a recorder that keeps event clips, and an in-process plugin that wants to inspect the same pixels without paying for another decode.

That gives you one source and several consumers, all asking for "the same frame" for different reasons.

The naive design is to give every consumer its own decoder. It sounds tidy. Each subsystem stays independent. If one consumer falls behind, the others keep going.

In practice, that design fails in ways that are more serious than wasted CPU. The real problem is that multiple decoders do not produce one shared reality. They produce several slightly different opinions about the same stream.

This article argues for a different center of gravity: one runtime owns decode, assigns frame identity once, and delivers the result to multiple consumers in different shapes.

Per-consumer decode

source --> decoder A --> preview timeline
       --> decoder B --> analytics timeline
       --> decoder C --> recorder timeline
       --> decoder D --> plugin timeline

One decode path

source --> decode runtime --> frame 18,452
                            |--> preview
                            |--> analytics
                            |--> recorder
                            \--> plugin

Why per-consumer decoding is weak

The first objection is the obvious one: decoding the same stream four times is wasteful. A 1080p30 source that feeds a preview, an analytics worker, a recorder, and a plugin does not become four different videos just because four parts of the process care about it.

But raw cost is not the strongest argument. Modern systems can often absorb the redundant decoding work, so waste alone rarely forces a redesign.

The stronger argument is disagreement.

If the preview decoder reconnects after a network hiccup before the detection decoder does, the operator may be looking at frame 18,452 while the worker is still processing frame 18,438. If the worker raises an event, which frame should the UI highlight? Which image should the recorder attach to the event? If one decoder notices a format change a frame earlier than another, are those consumers still talking about the same stream state?

Those are not edge cases. They are what independent decoding naturally creates: slightly different local timelines that now need to be reconciled after the fact.
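To make the divergence concrete, here is a toy model, not a real decoder. Two independent decoders count the frames they actually decode from the same source, and each misses a different window while reconnecting. The function name and the outage windows are invented for illustration. The only point is that the same source frame ends up with a different local number in each consumer.

```python
def local_frame_numbers(source_frames, outage):
    """Toy model of one independent decoder (hypothetical).

    source_frames: true source positions, in order.
    outage: half-open (start, end) range of positions missed while reconnecting.
    Returns a map from true source position -> this decoder's local frame count.
    """
    start, end = outage
    mapping = {}
    local_count = 0
    for position in source_frames:
        if start <= position < end:
            continue  # arrived while this decoder was reconnecting
        mapping[position] = local_count
        local_count += 1
    return mapping


source = range(100)
preview = local_frame_numbers(source, outage=(40, 43))    # reconnects quickly
analytics = local_frame_numbers(source, outage=(40, 57))  # reconnects later

# Same source frame, two different local identities.
print(preview[80], analytics[80])  # 77 63
```

Reconciling those two numbers after the fact is exactly the work a shared decode path removes.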

A shared decode path avoids that entire class of problem. The runtime decides once what the next frame is, and every consumer receives that same decision.

Design	What gets decided independently	Result
Per-consumer decoding	Reconnect timing, frame numbering, transition interpretation	Several local timelines that must be reconciled later
One decode path	Delivery shape only	One shared frame identity with different outputs

One owner, multiple delivery shapes

In the single-decode design, one runtime owns one decoder for one stream. Every decoded frame gets one monotonic sequence number. That sequence number is the frame's identity inside the application.

This sounds mechanical, but it matters. "Show the detection box for frame 18,452" is a coherent request. "Attach the JPEG generated from frame 18,452 to the event emitted for frame 18,452" is coherent too. Without a shared frame identity, those operations become guesswork.
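A minimal sketch of that ownership, assuming a synchronous in-process fan-out. `DecodeRuntime`, `on_decoded`, and the dictionary-based recorder are hypothetical names, and the pixel bytes are placeholders for real decoder output. The point is only that the sequence number is assigned once, before delivery, so an event and an attachment that both name frame 18,453 refer to the same image.

```python
import itertools
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Frame:
    seq: int       # the frame's identity inside the application
    pixels: bytes  # stand-in for the decoded image


class DecodeRuntime:
    """Hypothetical owner of the single decoder for one stream."""

    def __init__(self, start_seq: int = 0) -> None:
        self._seq = itertools.count(start_seq)  # monotonic, never reused
        self._subscribers: List[Callable[[Frame], None]] = []

    def subscribe(self, deliver: Callable[[Frame], None]) -> None:
        self._subscribers.append(deliver)

    def on_decoded(self, pixels: bytes) -> Frame:
        # Identity is assigned once, before fan-out, so every consumer
        # receives the same seq for the same decoded image.
        frame = Frame(seq=next(self._seq), pixels=pixels)
        for deliver in self._subscribers:
            deliver(frame)
        return frame


runtime = DecodeRuntime(start_seq=18452)
preview_seqs: List[int] = []
recorder: Dict[int, Frame] = {}  # seq -> frame, a stand-in for a clip buffer

runtime.subscribe(lambda f: preview_seqs.append(f.seq))
runtime.subscribe(lambda f: recorder.__setitem__(f.seq, f))

for _ in range(3):
    runtime.on_decoded(b"\x00" * 16)

# An analytics event naming frame 18,453 resolves to exactly one stored image.
event = {"kind": "motion", "seq": 18453}
attachment = recorder[event["seq"]]
```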

What changes between consumers is not frame identity but delivery shape.

The preview path may want the image in the renderer's native format so it can upload or display it immediately. An in-process analytics module may want a low-overhead CPU-visible buffer. An out-of-process subscriber may want a shared-memory slot plus a small notification message that says "frame 18,452 is ready in slot 7."

Those shapes are different, but they all describe the same decoded frame. That is the key separation:

decode and frame identity are centralized
delivery mechanics are consumer-specific

decode runtime
    |
    +--> frame 18,452 as renderer-native image --> preview
    +--> frame 18,452 as CPU-visible buffer     --> in-process analytics
    \--> frame 18,452 in slot 7 + notification  --> out-of-process subscriber

Once that boundary stays clean, performance tradeoffs stay local to each delivery shape. Some consumers still cost more to serve than others, but they no longer invent their own version of the stream while doing it.
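Two of those shapes can be sketched for a single frame, assuming the decoded bytes are already in CPU memory. `as_cpu_buffer` and `SlotTransport` are hypothetical; the slot list is a plain `bytearray` stand-in for real shared memory, and it does no reader tracking. The renderer-native shape is left out because it depends on the UI toolkit. Both outputs carry the same `seq`.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Frame:
    seq: int
    width: int
    height: int
    pixels: bytes


def as_cpu_buffer(frame: Frame) -> memoryview:
    # In-process analytics: a read-only, zero-copy view of the decoded bytes.
    return memoryview(frame.pixels)


class SlotTransport:
    """Stand-in for a shared-memory ring of fixed-size slots (hypothetical)."""

    def __init__(self, slot_count: int, slot_bytes: int) -> None:
        self.slot_bytes = slot_bytes
        self.slots: List[bytearray] = [bytearray(slot_bytes) for _ in range(slot_count)]
        self.notifications: List[Dict[str, int]] = []

    def publish(self, frame: Frame) -> Dict[str, int]:
        n = len(frame.pixels)
        if n > self.slot_bytes:
            raise ValueError(f"frame {frame.seq} needs {n} bytes, slot holds {self.slot_bytes}")
        slot = frame.seq % len(self.slots)
        self.slots[slot][:n] = frame.pixels  # same-length copy into the slot
        note = {"seq": frame.seq, "slot": slot, "nbytes": n}
        self.notifications.append(note)  # the small "frame N is in slot K" message
        return note


frame = Frame(seq=18452, width=4, height=2, pixels=bytes(range(12)))
view = as_cpu_buffer(frame)
transport = SlotTransport(slot_count=8, slot_bytes=64)
note = transport.publish(frame)
```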

Discontinuities, format changes, and EOF

A stream runtime has to publish more than frames. It also has to publish meaningful transitions: discontinuities, format changes, and end-of-stream.

With per-consumer decoding, each consumer forms its own opinion about those transitions. One side may treat a transport gap as a clean break. Another may stitch across it. One side may rebuild buffers immediately after a resolution change. Another may not notice until the next frame.

With one decode path, the runtime decides once that a discontinuity happened, once that the frame size changed, and once that the stream has ended. Consumers still hear that news in different forms, because a UI widget and a background worker do not need the same callback surface. But they react to one shared event, not several competing interpretations.

That is a recurring theme in this design: the runtime centralizes meaning, then lets each consumer receive that meaning in the form it can use.

runtime event stream

frame 18,452
   ->
DISCONTINUITY
   ->
FORMAT CHANGE: 1280x720 -> 3840x2160
   ->
frame 18,453
   ->
EOF
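One way to model that stream is as an ordered sequence of tagged events that every consumer reads in the same order. The event classes and the `Worker` policy below are hypothetical. In particular, clearing history on a discontinuity is one plausible choice for a motion detector, not a rule from this design. A UI consumer would map the same events onto its own callbacks.

```python
from dataclasses import dataclass
from typing import List, Tuple, Union


@dataclass(frozen=True)
class FrameEvent:
    seq: int

@dataclass(frozen=True)
class Discontinuity:
    pass

@dataclass(frozen=True)
class FormatChange:
    old: Tuple[int, int]
    new: Tuple[int, int]

@dataclass(frozen=True)
class EndOfStream:
    pass

Event = Union[FrameEvent, Discontinuity, FormatChange, EndOfStream]

# One ordered stream, decided once by the runtime (mirrors the diagram above).
events: List[Event] = [
    FrameEvent(18452),
    Discontinuity(),
    FormatChange((1280, 720), (3840, 2160)),
    FrameEvent(18453),
    EndOfStream(),
]


class Worker:
    """Hypothetical background consumer reacting to the shared events."""

    def __init__(self) -> None:
        self.size = (1280, 720)
        self.history: List[int] = []  # e.g. reference frames for motion detection
        self.done = False

    def handle(self, event: Event) -> None:
        if isinstance(event, FrameEvent):
            self.history.append(event.seq)
        elif isinstance(event, Discontinuity):
            self.history.clear()  # do not stitch across the gap
        elif isinstance(event, FormatChange):
            self.size = event.new  # rebuild buffers at the announced point
            self.history.clear()
        elif isinstance(event, EndOfStream):
            self.done = True


worker = Worker()
for e in events:
    worker.handle(e)
```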

Oversized frames

This architecture has one important edge case: delivery shapes do not all have the same capacity.

The out-of-process path is often backed by shared memory or another fixed-size transport buffer. That path has a ceiling. If a decoded frame is larger than the transport slot was sized for, the runtime cannot publish it in that shape.

This can happen for mundane reasons. A stream may switch from 1280x720 to 3840x2160. Metadata may have understated the real frame size. An upstream producer may suddenly emit an implausibly large frame that still technically decodes.
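The resolution switch alone is enough to overflow a slot. The arithmetic below assumes 8-bit 4:2:0 frames in an NV12-style layout (1.5 bytes per pixel, no row padding) and a transport sized for the initial 720p format. Both are assumptions for illustration; real pixel formats and padding will change the exact numbers, not the ratio's order of magnitude.

```python
def nv12_frame_bytes(width: int, height: int) -> int:
    # Assumes 8-bit 4:2:0 (NV12): a full-size luma plane plus a half-size
    # interleaved chroma plane, i.e. 1.5 bytes per pixel, no row padding.
    return width * height * 3 // 2


def fits(frame_bytes: int, slot_bytes: int) -> bool:
    return frame_bytes <= slot_bytes


slot_bytes = nv12_frame_bytes(1280, 720)  # transport sized for the initial format
uhd = nv12_frame_bytes(3840, 2160)

print(slot_bytes, uhd, fits(uhd, slot_bytes))  # 1382400 12441600 False
```

A 3840x2160 frame is nine times the size of a 1280x720 one, so no modest safety margin on the slot size would have absorbed it.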

At that point, the system has to choose between two goals that can no longer both hold:

keep all consumers perfectly aligned on every frame
keep the local pipeline moving for consumers that can still accept the frame

Those goals conflict. If the transport consumer cannot accept frame 18,452 but the preview and in-process worker can, there is no way to preserve both global lockstep and forward progress.

The design choice here should be explicit: prefer liveness over perfect cross-consumer alignment at this boundary.

That means the runtime may still deliver frame 18,452 to the UI and the in-process worker while dropping it on the out-of-process transport path. The result is a temporary alignment break across consumers. That break is not a bug in the identity model; it is the cost of choosing continued service over a global stall.

Because this is a deliberate tradeoff, it must be visible. Oversized transport drops need a counter, a warning, and a sequence-number trail that makes the gap diagnosable. Otherwise the system appears healthy while one class of consumers is silently missing data.

The alternative is to stall or fail the entire pipeline whenever one delivery shape cannot carry a frame. That preserves lockstep, but it turns one transport limit into a system-wide outage. For a live preview and local analysis path, that is usually the worse failure.

Choice at the oversized-frame boundary	What you preserve	What you give up
Keep local consumers moving	Preview and local analytics stay live	Temporary alignment with the constrained transport consumer
Stall everything for lockstep	Perfect cross-consumer alignment for that frame	Liveness for consumers that could have continued

frame 18,452 arrives

preview path            -> accept
in-process analytics    -> accept
shared-memory transport -> drop and record gap
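The decision above can be sketched as a publish step that always serves the local consumers and lets the constrained transport refuse a frame, leaving behind a counter, a warning, and the dropped sequence numbers. `TransportPath`, `publish`, and the tiny slot size are hypothetical, and the sketch is single-threaded with no locking.

```python
import logging
from dataclasses import dataclass, field
from typing import Callable, List

log = logging.getLogger("decode_runtime")


@dataclass
class Frame:
    seq: int
    pixels: bytes


@dataclass
class TransportPath:
    """Hypothetical fixed-slot transport that may refuse a frame."""
    slot_bytes: int
    delivered: List[int] = field(default_factory=list)
    oversized_drops: int = 0                                # the counter
    dropped_seqs: List[int] = field(default_factory=list)   # the seq trail

    def offer(self, frame: Frame) -> bool:
        if len(frame.pixels) > self.slot_bytes:
            self.oversized_drops += 1
            self.dropped_seqs.append(frame.seq)
            log.warning("transport drop: frame %d is %d bytes, slot is %d",
                        frame.seq, len(frame.pixels), self.slot_bytes)
            return False
        self.delivered.append(frame.seq)
        return True


def publish(frame: Frame, local: List[Callable[[Frame], None]],
            transport: TransportPath) -> None:
    # Liveness first: local consumers always receive the frame...
    for deliver in local:
        deliver(frame)
    # ...and the constrained path drops visibly instead of stalling everyone.
    transport.offer(frame)


preview: List[int] = []
transport = TransportPath(slot_bytes=16)
publish(Frame(18451, bytes(8)), [lambda f: preview.append(f.seq)], transport)
publish(Frame(18452, bytes(32)), [lambda f: preview.append(f.seq)], transport)
```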

Rebuilding the transport's backing storage with larger slots is the related recovery path, but it is not free. Rebuilds force synchronization across subscribers, so the rule is simple: rebuild when required, not preemptively.
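One reading of "when required" is: rebuild only when the new frame size exceeds the current slot capacity, and never shrink. The `SlotPool` class and its `ensure_capacity` method are hypothetical, and calling it on the runtime's format-change event is an assumption. The `rebuilds` counter stands in for the costly reallocation and subscriber synchronization.

```python
from dataclasses import dataclass


@dataclass
class SlotPool:
    """Hypothetical transport storage. Each rebuild stands in for
    reallocating slots and re-synchronizing every subscriber."""
    slot_bytes: int
    rebuilds: int = 0

    def ensure_capacity(self, required_bytes: int) -> bool:
        """Rebuild only if the new size cannot fit; return True if rebuilt."""
        if required_bytes <= self.slot_bytes:
            return False  # existing slots still work, no subscriber sync needed
        self.slot_bytes = required_bytes
        self.rebuilds += 1
        return True


pool = SlotPool(slot_bytes=1_382_400)          # sized for 1280x720 NV12
grew = pool.ensure_capacity(12_441_600)        # format change to 3840x2160
shrank = pool.ensure_capacity(1_382_400)       # back to 1280x720: no rebuild
```

Keeping the larger slots after the stream drops back to 720p trades some memory for avoiding a second synchronization round.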

Tradeoffs

Consumers are still coupled in time. A shared publish path has a real serialization point. If a consumer does expensive work inside that path, for example directly in its delivery callback, it can delay other consumers for the same frame.
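A common way to keep the serialization point cheap is for each consumer's delivery step to do nothing but hand the frame off, with the expensive work happening on the consumer's own thread. The `Mailbox` class below is a hypothetical sketch of that handoff. Its drop-oldest policy when full is an assumption, one choice among several, and it counts overwrites so the loss stays visible.

```python
import threading
from collections import deque
from typing import Deque, List


class Mailbox:
    """Hypothetical per-consumer handoff between the shared publish path and
    the consumer's own worker. put() is O(1); heavy work happens after drain()."""

    def __init__(self, capacity: int) -> None:
        # With maxlen set, append() discards the oldest entry when full, so a
        # slow consumer loses its own backlog instead of delaying everyone else.
        self._items: Deque[int] = deque(maxlen=capacity)
        self._lock = threading.Lock()
        self.overwritten = 0  # keep drops visible

    def put(self, seq: int) -> None:
        """Called inside the shared publish path: must stay cheap."""
        with self._lock:
            if len(self._items) == self._items.maxlen:
                self.overwritten += 1
            self._items.append(seq)

    def drain(self) -> List[int]:
        """Called by the consumer on its own thread, outside the publish path."""
        with self._lock:
            items = list(self._items)
            self._items.clear()
        return items


box = Mailbox(capacity=3)
for seq in range(18450, 18455):
    box.put(seq)
backlog = box.drain()
```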

Late subscribers join live. This model is about coherent live delivery, not history replay. A subscriber that attaches now starts at the next frame, not at an earlier one.
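Live-only joining falls out naturally when the fan-out keeps no history. The `LiveFanout` class is a hypothetical sketch. It assumes the runtime publishes contiguous sequence numbers, so `subscribe` can report the first sequence number the new subscriber will see.

```python
from typing import Callable, List


class LiveFanout:
    """Hypothetical live-only fan-out: no history buffer, no replay."""

    def __init__(self) -> None:
        self._subscribers: List[Callable[[int], None]] = []
        self.last_published = -1

    def subscribe(self, deliver: Callable[[int], None]) -> int:
        self._subscribers.append(deliver)
        # Assumes contiguous seqs: the next published frame is the first seen.
        return self.last_published + 1

    def publish(self, seq: int) -> None:
        self.last_published = seq
        for deliver in list(self._subscribers):
            deliver(seq)


fan = LiveFanout()
early: List[int] = []
fan.subscribe(early.append)
for seq in (0, 1, 2):
    fan.publish(seq)

late: List[int] = []
first_seen = fan.subscribe(late.append)
for seq in (3, 4):
    fan.publish(seq)
```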

One runtime becomes a critical authority. That is the point of the design, but it means the runtime must stay disciplined. If decode or publish logic becomes bloated, every consumer pays for it.

Where "one decode" stops being the right answer

The design works best when consumers want broadly the same content: the same pixels, from the same source, at roughly the same time.

It becomes less attractive at three boundaries.

Different output formats. If one consumer needs GPU-native textures, another needs planar YUV, and a third needs a CPU-side RGB buffer, the runtime either emits several representations for every frame or performs consumer-specific conversion work. At some point that stops being one decode path with multiple shapes and starts becoming several pipelines attached to one decoder.

Different rates. A thumbnail generator that wants one frame every five seconds and a tracker that wants every frame of a high-rate stream do not have similar appetites. One shared live path can serve both, but the mismatch may be wasteful enough that the design loses its elegance.
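The mismatch is easy to quantify with a sampling wrapper on the shared path. `IntervalSampler` is hypothetical, and it assumes each frame carries a presentation timestamp in seconds. At 30 fps over 12 seconds, the thumbnail consumer keeps 3 of the 360 frames the runtime decoded, while a tracker on the same path would keep all 360.

```python
from typing import List, Optional


class IntervalSampler:
    """Hypothetical wrapper that keeps one frame per `interval_s` seconds of
    stream time, e.g. for a thumbnail generator on a shared live path."""

    def __init__(self, interval_s: float) -> None:
        self.interval_s = interval_s
        self._last_kept: Optional[float] = None
        self.kept: List[int] = []

    def offer(self, seq: int, pts_s: float) -> bool:
        if self._last_kept is not None and pts_s - self._last_kept < self.interval_s:
            return False  # decoded anyway; this consumer just ignores it
        self._last_kept = pts_s
        self.kept.append(seq)
        return True


thumbnails = IntervalSampler(interval_s=5.0)
for seq in range(360):            # 12 seconds of a 30 fps stream
    thumbnails.offer(seq, seq / 30)
```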

Different logical sources. One runtime per physical source is straightforward. What does not belong inside this layer is a higher-level concept like "the current best camera for this scene" or "whichever feed is active right now." That is orchestration above the decode runtime.

None of those cases automatically justify per-consumer decoding. They do mark the boundary of the idea. A single decode path is the right tool when consumers mainly disagree about how they want a frame delivered, not about what content should exist in the first place.

The real value in one decode path is not just fan-out efficiency. It is shared truth.

One place decides what frame 18,452 is. One place decides whether frame 18,453 represents a discontinuity, a format change, or end-of-stream. Consumers subscribe to different delivery shapes, but they no longer maintain private interpretations of the same video.

That is the architectural win: decode once, define frame identity once, and let delivery vary only where it actually needs to.