Streaming responses without losing your UX

Streaming looks simple from the outside: tokens arrive, you append them, the user reads. The first time a partial JSON chunk arrives mid-render, or a code block opens but the closing fence shows up four seconds later, you understand why teams treat streaming as a UX problem rather than a transport problem.

The places streaming gets ugly

Markdown rendering during streaming is the single most common foot-gun. Halfway-rendered headers shift layout, half-open code fences spill formatting into surrounding text, and bold/italic markers that haven’t closed yet render literally. The fix is not to stream raw — it’s to buffer until safe boundaries (line break, closing fence, end of structural element) and flush in semantic units. Users notice this even when they cannot articulate why one stream feels smooth and another doesn’t.
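A minimal sketch of that buffering strategy (the class and its names are mine, not from any library): hold streamed text until the last line break that falls outside an open code fence, and flush only that safe prefix.

```typescript
// Sketch of a safe-boundary buffer for streamed markdown. It flushes
// only up to the last line break outside an open ``` code fence, so a
// half-open fence never reaches the renderer.
class MarkdownStreamBuffer {
  private pending = "";
  private insideFence = false;

  // Feed one raw chunk; returns the prefix that is safe to render now.
  push(chunk: string): string {
    this.pending += chunk;
    const lines = this.pending.split("\n");
    let fence = this.insideFence;
    let offset = 0;
    let flushUpTo = -1;
    // Every element except the last is a complete line.
    for (let i = 0; i < lines.length - 1; i++) {
      if (lines[i].trimStart().startsWith("```")) fence = !fence;
      offset += lines[i].length + 1; // +1 for the newline
      if (!fence) flushUpTo = offset; // line break outside any fence
    }
    if (flushUpTo < 0) return ""; // nothing safe yet; keep buffering
    const safe = this.pending.slice(0, flushUpTo);
    this.pending = this.pending.slice(flushUpTo);
    this.insideFence = false; // a flush point is always outside a fence
    return safe;
  }

  // Call once the stream ends to emit whatever is still held.
  flush(): string {
    const rest = this.pending;
    this.pending = "";
    this.insideFence = false;
    return rest;
  }
}
```

The same boundary idea extends to unclosed bold/italic markers, and held text can still be shown dimmed as plain text so the stream never looks stalled.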

Backpressure and abandonment

Backpressure mostly takes care of itself: if you consume the response body with a reader and only read as fast as you render, flow control slows the producer down. Abandonment does not. When the user navigates away mid-stream, the model keeps generating without an explicit abort, and your costs keep climbing. Wire AbortController through every layer — fetch, the proxy, the model API — and verify abort actually propagates by reading the cost dashboard, not the code.
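A sketch of the client side of that wiring, assuming a `/api/stream` endpoint (the path, function name, and parameters here are illustrative): one AbortController's signal is handed to fetch, and aborting it tears down the request, which the proxy must then translate into cancelling the upstream model call.

```typescript
// Sketch of client-side abort wiring; /api/stream is an assumed
// endpoint, not a real API. The same signal that cancels fetch must
// also be honored by the proxy and forwarded to the model provider.
async function streamCompletion(
  prompt: string,
  onChunk: (text: string) => void,
  signal: AbortSignal,
): Promise<void> {
  const res = await fetch("/api/stream", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ prompt }),
    signal, // aborting cancels the HTTP request itself
  });
  if (!res.body) return;
  const reader = res.body.getReader();
  const decoder = new TextDecoder();
  // Reading only as fast as we render also gives natural backpressure.
  while (true) {
    const { done, value } = await reader.read();
    if (done) break;
    onChunk(decoder.decode(value, { stream: true }));
  }
}

const controller = new AbortController();
// In the browser, abort when the user leaves mid-stream
// (guarded so this sketch also runs outside a browser).
if (typeof window !== "undefined") {
  window.addEventListener("pagehide", () => controller.abort());
}
```

The cancellation only pays off if every hop honors it: a proxy that swallows the client disconnect leaves the model generating into the void, which is exactly what the cost dashboard will reveal.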

Streaming is the rare feature where the engineering and the design have to be solved together. Splitting them produces a stream that ships and a UX that doesn’t.
