Write Once, Render at Many Zoom Levels

Live time-series plots usually start life with a comforting illusion: just keep appending samples to a ring, and let the renderer decide how much detail to draw.

That works for a while. Then the feed runs for hours, the user zooms out, and every pan or wheel gesture suddenly means "scan a huge chunk of history, decide which points matter at this zoom, build a reduced view, and do it again on the next mouse move." The data did not change. Only the camera changed. But the application keeps paying as if the entire signal needed to be rediscovered on every frame.

This article is about a different arrangement: do the aggregation when the data is written, store a small pyramid of progressively coarser rings, and let the renderer pick the level that already matches the current zoom.

Why full resolution at the frame rate does not work

When a plot becomes sluggish, it is tempting to blame the GPU. Most of the time that is not the real problem.

Drawing a few thousand or even a few hundred thousand segments is often manageable. The more expensive part is the CPU work that happens before the draw call: walk the visible slice of the ring, inspect each sample, decide whether it survives the current zoom level, and build the vertex data to send onward.

That cost repeats every time the view changes. If the user drags the plot left by twenty pixels, the renderer does not just move an existing picture; it often rescans the same history and recomputes a new reduction. On a feed with millions of samples, that turns a simple interaction into a full data pass at interactive rates.

The core mismatch is easy to miss. Rendering is frame-rate work. Aggregation is data-rate work. If you force aggregation into the renderer, you are paying data-rate cost at frame-rate frequency.

Concern	Natural cadence	Bad placement
Rendering	every frame or view change	waiting on fresh aggregation work
Aggregation	when new data arrives	repeating on every pan and zoom

Why read-time aggregation is worse than it looks

The standard rescue attempt is renderer-side decimation. Look at the visible window, bucket samples by screen pixel or time slice, keep one or two representatives per bucket, and send only that reduced set to the GPU.

On static data, that can be perfectly reasonable. On a live feed, it gets expensive in exactly the moments when the user is interacting.

The problem is that the bucket layout belongs to the current view, not to the data itself. A stretch of trace that occupies 1200 screen columns at one zoom occupies 600 after a 2x zoom-out and 2400 after a 2x zoom-in. Pan by half a screen and most of those buckets change again. There is no stable, reusable structure behind them.

Consider a concrete case. Suppose one hour of data contains 3.6 million samples. The plot is 1600 pixels wide. A renderer-side decimator may reduce those 3.6 million samples to roughly 1600 visual buckets, which sounds cheap, but it still has to inspect the original window to know what belongs in each bucket. If the user pans five times in a second, the application can end up rescanning millions of samples five times in a second just to present five slightly different views of mostly the same history.

That is why renderer-side decimation hurts. It is not that the reduced result is wrong. It is that the reduction work lives on the hottest possible path: the path that should be free to repaint immediately.
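To make the cost concrete, here is a minimal sketch of a renderer-side min/max decimator. The `Sample` and `Column` types and the function name are hypothetical, chosen for illustration. In the one-hour example, each of the 1600 columns summarises about 2250 samples, and five pans in a second means roughly 18 million sample inspections in that second.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical sample and bucket types, for illustration only.
struct Sample { double t; float v; };
struct Column { float lo; float hi; bool used; };

// Reduce the samples with t in [t0, t1) to one min/max pair per pixel column.
// Every call walks the whole visible window, however little the view moved.
std::vector<Column> decimate_for_view(const std::vector<Sample>& window,
                                      double t0, double t1, std::size_t width_px) {
    std::vector<Column> cols(width_px, Column{0.0f, 0.0f, false});
    if (width_px == 0 || t1 <= t0) return cols;
    const double px_per_sec = static_cast<double>(width_px) / (t1 - t0);
    for (const Sample& s : window) {            // O(samples in view), per view change
        if (s.t < t0 || s.t >= t1) continue;
        std::size_t x = static_cast<std::size_t>((s.t - t0) * px_per_sec);
        x = std::min(x, width_px - 1);          // guard against rounding at the edge
        Column& c = cols[x];
        if (!c.used) { c = Column{s.v, s.v, true}; }
        else { c.lo = std::min(c.lo, s.v); c.hi = std::max(c.hi, s.v); }
    }
    return cols;
}
```

Nothing computed here survives the next pan: the column boundaries depend on `t0`, `t1`, and `width_px`, so the loop has to run again.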

Why write-time is the right moment

The writer already sees the data at the one moment when incremental aggregation is natural: when a new sample arrives.

At that point, you do not need to rediscover structure in old data. You only need to decide how the new sample affects the currently open aggregate bucket. If that bucket is now complete, you emit one coarser sample to the next level. If that in turn completes a bucket at the next level, the process repeats.

That is the idea behind an LOD ring: not one ring, but a small stack of rings at different resolutions, all built on the write path. In our case the concrete implementation drives the live strategy-telemetry plots in Lumis.

Level 0 stores raw samples.
Level 1 stores one combined sample for every fixed group from level 0.
Level 2 stores one combined sample for every fixed group from level 1.
Higher levels continue the pattern until the coarsest level covers long spans cheaply.

If the subdivision factor is two, four levels give you 1x, 1/2x, 1/4x, and 1/8x density. If it is four, the same four levels reach 1/64x. The exact geometry is configurable. The useful part is the ownership: aggregation happens once, when the writer already has the sample in hand.

The write and read surfaces can stay small:

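lod_writer writer(config, combiner);

writer.write(samples);      // append raw samples; open buckets update as they arrive
writer.publish();           // make completed entries visible to readers
writer.flush_partial();     // close and publish partially filled buckets, e.g. when the stream pauses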

The combiner is where domain knowledge lives. For a waveform it might keep min/max or first/last values for a bucket. For another feed it might average, sum, or merge interval state.
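As an illustration, a min/max combiner for a waveform could look like the sketch below. How a combiner plugs into `lod_writer` is not shown in this article, so the `MinMax` type and the `lift` and `combine` names are assumptions, not the real interface.

```cpp
#include <algorithm>

// Hypothetical entry type: the envelope of everything that fed into one bucket.
struct MinMax {
    float lo;
    float hi;
};

// A raw sample enters the pyramid as a degenerate envelope.
inline MinMax lift(float v) { return MinMax{v, v}; }

// Merging two envelopes gives the envelope of their union, so the same
// function works for raw-to-coarse promotion and at every level above it.
inline MinMax combine(const MinMax& a, const MinMax& b) {
    return MinMax{std::min(a.lo, b.lo), std::max(a.hi, b.hi)};
}
```

Because the merge is associative, it does not matter how many levels a value has passed through before it is merged again.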

On the read side:

lod_reader reader(writer.descriptor());

auto snapshot = reader.try_snapshot(level);
if (snapshot) {
    for (const auto* p = snapshot.begin(); p != snapshot.end(); ++p) {
        // consume p
    }
}

The renderer does not ask "how do I decimate this hour of data?" It asks "which level already matches the density I need?"

The shape is easier to reason about when you draw it as a small pyramid:

Level 3  [ 8 samples per entry ]   o----o----o
Level 2  [ 4 samples per entry ]   o--o--o--o--o--o
Level 1  [ 2 samples per entry ]   o-o-o-o-o-o-o-o-o-o-o-o
Level 0  [ 1 sample  per entry ]   oooooooooooooooooooooooo
          write once  ---------->  promote only when a bucket closes

The renderer then picks a level based on visual density instead of rediscovering that density from scratch.
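One plausible selection rule, sketched under the assumption that level L holds one entry per factor^L raw samples: take the coarsest level that still yields at least one entry per pixel. The function name and parameters are hypothetical.

```cpp
#include <cstddef>

// Pick the coarsest level whose entries still land at least one per pixel.
// Assumes level L holds one entry per factor^L raw samples.
std::size_t pick_level(double raw_samples_per_pixel, std::size_t factor,
                       std::size_t level_count) {
    if (level_count == 0 || factor < 2) return 0;
    std::size_t level = 0;
    double per_entry = 1.0;  // raw samples represented by one entry at `level`
    while (level + 1 < level_count &&
           per_entry * static_cast<double>(factor) <= raw_samples_per_pixel) {
        per_entry *= static_cast<double>(factor);
        ++level;
    }
    return level;
}
```

For 3.6 million samples across 1600 pixels, which is 2250 raw samples per pixel, a factor of four and at least six levels select level 5, where each entry stands for 1024 raw samples. A real renderer might prefer two or more entries per pixel; that would be a small change to the comparison.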

What the writer actually does

The write path sounds more expensive than it usually is.

Writing to level 0 is still just an append into a ring. Each upper level keeps a small pending group. As raw samples arrive, the writer updates that group's aggregate. Once the group reaches its configured size, the writer emits one combined sample into the next level and starts a new group.

Take a simple example with subdivision by four:

Samples s0 through s3 arrive at level 0.
The writer combines them into one level-1 sample, say A0.
The next four raw samples become A1.
When four level-1 samples exist, they combine into one level-2 sample.

The cost cascades upward, but only occasionally. Most writes touch the lowest level plus a small amount of pending state. Higher levels wake up only when enough lower-level data has accumulated. Over a geometric pyramid, that gives an amortized constant write cost per sample. That is exactly the trade you want: do bounded work once when data arrives, not repeated work every time the viewport wiggles.
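The amortized bound follows from a geometric series. The derivation below is a sketch under two assumptions: the subdivision factor k is at least 2, and writing one entry at any level costs a constant c (one append plus one fold into the open bucket above). N is the number of raw samples written and L is the index of the coarsest level.

```latex
\[
  \text{entries written at level } i \;\le\; \frac{N}{k^{i}}
\]
\[
  \text{total work} \;\le\; c \sum_{i=0}^{L} \frac{N}{k^{i}}
  \;<\; c\,N \sum_{i=0}^{\infty} k^{-i}
  \;=\; c\,N\,\frac{k}{k-1}
\]
```

Per sample that is at most 2c for k = 2 and 4c/3 for k = 4, regardless of how many levels the pyramid has.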

For the subdivision-by-four example, the flow looks like this:

raw input:   s0  s1  s2  s3  s4  s5  s6  s7  ...
level 0:     [s0][s1][s2][s3][s4][s5][s6][s7]
level 1:         [   A0   ]    [   A1   ]
level 2:                 [        B0        ]
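The cascade can be modelled in a few lines. This is a simplified sketch, not the implementation described above: growable vectors stand in for rings, the combiner is a hard-coded sum, and the `PyramidSketch` class is hypothetical.

```cpp
#include <cstddef>
#include <vector>

// Simplified model: growable vectors stand in for rings, and the combiner
// is a hard-coded sum. Requires level_count >= 1 and factor >= 1.
class PyramidSketch {
public:
    PyramidSketch(std::size_t level_count, std::size_t factor)
        : factor_(factor), levels_(level_count), pending_(level_count),
          pending_count_(level_count, 0) {}

    void write(double v) { emit(0, v); }

    const std::vector<double>& level(std::size_t i) const { return levels_[i]; }

private:
    // Append one entry at `lvl`, then fold it into the open bucket above.
    void emit(std::size_t lvl, double v) {
        levels_[lvl].push_back(v);
        const std::size_t up = lvl + 1;
        if (up >= levels_.size()) return;          // coarsest level: nothing above
        pending_[up] = (pending_count_[up] == 0) ? v : pending_[up] + v;
        if (++pending_count_[up] == factor_) {     // bucket closed: promote once
            pending_count_[up] = 0;
            emit(up, pending_[up]);
        }
    }

    std::size_t factor_;
    std::vector<std::vector<double>> levels_;
    std::vector<double> pending_;                  // open aggregate per level
    std::vector<std::size_t> pending_count_;       // entries folded in so far
};
```

With three levels and a factor of four, sixteen writes leave sixteen entries at level 0, four at level 1, and one at level 2. Most calls to `write` return after a single append and a single fold.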

That gives a clean split of responsibilities:

Part	Responsibility
Writer	append raw samples, maintain open buckets, publish completed aggregates
Reader	acquire a stable snapshot of one level, then iterate what already exists
Renderer	choose the level that matches the current zoom and turn it into draw data

Two details matter here.

Upper levels usually need less capacity than lower ones. If level 1 stores one sample for every four samples at level 0, it can cover the same time span with one quarter of the entries. The memory overhead is real, but it is not a naive duplicate of the raw feed.
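A quick sizing helper makes the overhead easy to check. It is a sketch with a hypothetical name, and it counts entries only: a combined entry, such as a min/max pair, may be larger than a raw sample.

```cpp
#include <cstddef>
#include <vector>

// Entries each level needs to cover the same span as `raw_capacity` raw
// samples, rounding up so a coarse level never covers less than level 0.
std::vector<std::size_t> level_capacities(std::size_t raw_capacity,
                                          std::size_t factor,
                                          std::size_t level_count) {
    std::vector<std::size_t> caps;
    std::size_t cap = raw_capacity;
    for (std::size_t i = 0; i < level_count; ++i) {
        caps.push_back(cap);
        if (factor > 1) cap = (cap + factor - 1) / factor;  // ceil(cap / factor)
    }
    return caps;
}
```

For 3.6 million raw samples, a factor of four, and four levels, the capacities come out as 3,600,000, 900,000, 225,000, and 56,250 entries. The three upper levels together add 1,181,250 entries, about 33 percent on top of the raw ring.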

And combining is not the same thing as splitting. Imagine a telemetry feed that pauses for ten minutes and then resumes. You may not want a single aggregate bucket to quietly span that gap. A splitter can force a boundary before the combiner runs. For example: "start a new bucket whenever the timestamp gap exceeds 500 milliseconds." That keeps the combiner focused on reduction logic and keeps gap policy explicit.
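A gap rule like that one can be expressed as a small stateful predicate. The `GapSplitter` class below is a hypothetical sketch; it assumes nanosecond timestamps delivered in arrival order and says nothing about how the real writer accepts a splitter.

```cpp
#include <cstdint>

// Hypothetical gap splitter: reports when a new sample should start a fresh
// bucket instead of joining the one that is currently open.
class GapSplitter {
public:
    explicit GapSplitter(std::int64_t max_gap_ns) : max_gap_ns_(max_gap_ns) {}

    // Call once per sample, in arrival order, before handing it to the combiner.
    bool starts_new_bucket(std::int64_t timestamp_ns) {
        const bool split = has_prev_ && (timestamp_ns - prev_ns_ > max_gap_ns_);
        prev_ns_ = timestamp_ns;
        has_prev_ = true;
        return split;
    }

private:
    std::int64_t max_gap_ns_;
    std::int64_t prev_ns_ = 0;
    bool has_prev_ = false;
};
```

When the predicate returns true, the writer would close the open bucket, which is effectively a partial flush, and begin a new one with the incoming sample.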

A boundary worth naming

LOD rings are excellent for one class of query: "show me this history window at an appropriate visual density."

They are not automatically excellent for every other query a plot may need.

Vertical auto-range is the cleanest example. Suppose the user is looking at five minutes of data and the UI wants the exact minimum and maximum over that visible span. A coarser LOD level may be perfect for drawing the line quickly, but if its combiner collapsed several raw extremes into a single representative sample, it may no longer answer "what was the true min and max?" exactly.

That does not mean the LOD design failed. It means auto-ranging is a different query with different information requirements. It may need a separate summary structure, or a reader-side block index over raw data, or a hybrid strategy that scans only the edges of a moving window. The important point is conceptual clarity: LOD rings solve the render-density problem. They should not be stretched into every other aggregation problem by force.
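One of those options, a reader-side block index over raw data that scans only the ragged edges, might look like the sketch below. The `BlockMinMax` class is hypothetical, and it indexes a plain vector rather than a live ring to keep the idea visible.

```cpp
#include <algorithm>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical reader-side index: exact min/max per fixed-size block of raw
// samples. Requires block >= 1; `raw` must outlive the index.
class BlockMinMax {
public:
    BlockMinMax(const std::vector<float>& raw, std::size_t block)
        : raw_(raw), block_(block) {
        for (std::size_t i = 0; i + block_ <= raw_.size(); i += block_) {
            auto mm = std::minmax_element(raw_.begin() + i, raw_.begin() + i + block_);
            blocks_.emplace_back(*mm.first, *mm.second);
        }
    }

    // Exact {min, max} over raw[first, last). Requires first < last <= raw.size().
    std::pair<float, float> query(std::size_t first, std::size_t last) const {
        float lo = raw_[first], hi = raw_[first];
        std::size_t i = first;
        while (i < last) {
            const std::size_t b = i / block_;
            if (i % block_ == 0 && i + block_ <= last && b < blocks_.size()) {
                lo = std::min(lo, blocks_[b].first);   // whole block: use the index
                hi = std::max(hi, blocks_[b].second);
                i += block_;
            } else {
                lo = std::min(lo, raw_[i]);            // ragged edge: scan raw
                hi = std::max(hi, raw_[i]);
                ++i;
            }
        }
        return {lo, hi};
    }

private:
    const std::vector<float>& raw_;
    std::size_t block_;
    std::vector<std::pair<float, float>> blocks_;
};
```

The query touches at most two partial blocks of raw data plus one index entry per whole block, and the answer is exact for any window, which a render-oriented LOD level does not promise.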

Another way to frame the boundary:

Query	Best source
"What density should I draw at this zoom?"	an LOD level
"What is the exact min and max in this visible span?"	a structure designed for exact range queries
"Where should a long data gap break continuity?"	explicit split policy on the write path

Snapshot semantics matter more than the pyramid

The pyramid is the visible idea. The snapshot is the part that makes it safe in a live system.

When a reader asks for data, it needs a stable view of one chosen level at one moment in time. That view must stay valid for as long as the renderer holds it. If the reader starts iterating a range and the writer overwrites or moves that memory halfway through, the crash will show up inside rendering code even though the real bug is in data lifetime.

That is why snapshot semantics matter so much. A snapshot is not just "some pointers into the ring." It is a promise: this range refers to a specific published sequence, and the underlying storage will remain safe to read until the snapshot is released. In a cross-process setup, that usually means some form of reader lease rather than raw pointers alone.

Imagine a renderer on one thread and a live writer on another. The renderer asks for level 2 because the user is zoomed far out. It gets a snapshot covering samples 5100 through 5411 in that level and spends the next few milliseconds converting them into draw data. While that happens, new incoming samples keep arriving at level 0 and may trigger new aggregates at upper levels. That is fine, as long as the writer knows it may not reclaim or invalidate the region pinned by the active snapshot.

The snapshot also has to be cheap. If taking a snapshot means copying large chunks of the ring into a temporary buffer, the system has merely moved the cost around. The whole point is to let readers hold a stable view without cloning large histories.

The live interaction is roughly:

writer thread                 reader / renderer thread
-------------                 ------------------------
publish level 2  --------->   acquire snapshot(level 2)
keep writing level 0       |  build draw data from pinned range
maybe promote new buckets  |  present frame
reuse old storage only
after snapshots release

The failure mode worth naming is eviction. If a reader falls so far behind that the writer has to reuse the space a snapshot would depend on, acquisition can fail and the reader has to recover. That should be treated as a sizing or scheduling problem, not as routine behavior.
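The bookkeeping behind pinning and eviction can be shown with sequence numbers alone. The sketch below is a single-threaded model with at most one reader; the `LevelRing` class and its methods are hypothetical, and a real implementation would need atomics or a cross-process lease on top of this.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Single-threaded model of one level. Requires capacity >= 1.
class LevelRing {
public:
    struct Snapshot { std::uint64_t first; std::uint64_t last; };  // [first, last)

    explicit LevelRing(std::size_t capacity) : slots_(capacity) {}

    // Oldest sequence number still present in the ring.
    std::uint64_t oldest() const {
        return head_ > slots_.size() ? head_ - slots_.size() : 0;
    }

    // Returns false instead of overwriting a slot that the snapshot pins.
    bool try_write(float v) {
        if (pinned_ && head_ >= slots_.size()) {
            const std::uint64_t victim = head_ - slots_.size();  // about to be reused
            if (victim >= pin_.first && victim < pin_.last) return false;
        }
        slots_[head_ % slots_.size()] = v;
        ++head_;
        return true;
    }

    // Pin [from, head), or fail if `from` was already evicted.
    bool try_snapshot(std::uint64_t from, Snapshot& out) {
        if (pinned_ || from < oldest() || from > head_) return false;
        pin_ = Snapshot{from, head_};
        pinned_ = true;
        out = pin_;
        return true;
    }

    void release() { pinned_ = false; }

    float at(std::uint64_t seq) const { return slots_[seq % slots_.size()]; }

private:
    std::vector<float> slots_;
    std::uint64_t head_ = 0;          // total entries ever written
    Snapshot pin_{0, 0};
    bool pinned_ = false;             // at most one reader in this model
};
```

In this model the writer stalls rather than clobbering pinned storage, and a reader that asks for a range older than `oldest()` gets a failed acquisition. Whether a real writer stalls, drops data, or signals the reader is a policy choice this sketch does not settle.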

Limits

An LOD pyramid is not a universal summary format. It answers one kind of question very well: "which version of this signal should I draw at this zoom?" It does not answer arbitrary statistical queries unless the chosen combiner was designed for them.

The combiner itself has to compose cleanly. Some reductions survive repeated aggregation. Others become misleading when aggregated again at higher levels. If the combiner is a bad match for the signal, the LOD structure will faithfully preserve that bad choice.
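Averaging is a common case where this bites. The sketch below contrasts a mean that carries its sample count with a naive mean of means; the `MeanEntry` type and function names are hypothetical.

```cpp
#include <cstddef>

// Hypothetical entry for an averaging combiner. Carrying the count is what
// lets the mean survive another round of aggregation.
struct MeanEntry {
    double sum;
    std::size_t count;
    double mean() const { return count ? sum / static_cast<double>(count) : 0.0; }
};

inline MeanEntry lift_mean(double v) { return MeanEntry{v, 1}; }

// Composes exactly: the result is the mean of all raw samples underneath,
// even when the two inputs summarise different numbers of samples.
inline MeanEntry combine_mean(const MeanEntry& a, const MeanEntry& b) {
    return MeanEntry{a.sum + b.sum, a.count + b.count};
}

// Does not compose: averaging two averages ignores how many samples each
// one stood for, which matters after a partial flush or a forced split.
inline double naive_mean_of_means(double mean_a, double mean_b) {
    return (mean_a + mean_b) / 2.0;
}
```

Take a full bucket of four samples at 10 and a partially flushed bucket holding a single 0. The counted version reports the true mean of 8, while the naive version reports 5.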

Partial flushes are also real semantics, not a bookkeeping detail. If the writer publishes a partially filled bucket because the stream is paused or the caller requests a flush, that bucket is now part of the observable history. Later samples do not get to retroactively pretend it was still open.

And ring memory still costs money. The pyramid can be sized intelligently, and upper levels are cheaper than raw storage, but a wide multichannel feed can still add up quickly. Not every feed needs multiple levels.

None of that weakens the central point. If a live plot depends on renderer-side decimation to remain interactive, it has put the most expensive reduction work in the worst possible place. An LOD ring moves that work to write time, where it happens once, incrementally, and can then be reused at every zoom level the UI needs.