Perceived Latency Is a Path-Design Problem, Not a Model-Speed Problem

2026-06-10 · 5 min read

AI · Latency · System design

The setup

IdeaSense AI runs early-stage founders through a staged DVF (desirability, viability, feasibility) assessment: structured conversation, an explicit confirmation between stages, scoring, and a generated report. At each stage transition, the system confirms what it has captured before moving on.

The symptom

The market-to-tech confirmation (POST /api/v1/assessments/market/confirm) took around 96 seconds before the user saw any response. That is long enough for the product to read as broken rather than as thinking.

The wrong instinct

The reflex is to conclude that the model is slow and pick a faster one. The system already routes across multiple providers (DeepSeek and Qwen as primaries, OpenAI as a fallback), so a swap would have been trivial. That is exactly the trap: it treats a structural problem as a procurement one, and the next slow step brings the latency right back.

What was actually on the critical path

Before returning, the confirmation was carrying post-confirm enrichment the user did not need to wait on: DVF scoring, stage-level verification, per-question QA digests, and project-description enrichment, plus a next-question rewrite. The user only needed the confirmation itself. Everything else was being computed on the critical path for no reason.
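To make the cost concrete, here is a minimal Python sketch of that blocking shape. It is not the IdeaSense code: the step names mirror the list above, model_call is a hypothetical stand-in for a provider request, and asyncio.sleep simulates its latency with one uniform value because the post does not break the 96 seconds down per call. The point is structural: each await completes before the next begins, and all of them complete before the response is returned, so their latencies add.

```python
import asyncio
import time

# The post-confirm work listed above; the identifiers are illustrative.
ENRICHMENT_STEPS = ("dvf_scoring", "stage_verification", "qa_digests",
                    "description_enrichment", "next_question_rewrite")


async def model_call(step: str, latency_s: float) -> str:
    # Hypothetical stand-in for one provider round-trip; sleep simulates latency.
    await asyncio.sleep(latency_s)
    return f"{step}: ok"


async def confirm_market_blocking(latency_s: float) -> dict:
    response = {"status": "confirmed", "stage": "tech"}  # all the user needed
    enrichment = {}
    for step in ENRICHMENT_STEPS:
        # Each call must finish before the next starts, and all of them must
        # finish before the handler returns, so their latencies add up.
        enrichment[step] = await model_call(step, latency_s)
    return {**response, "enrichment": enrichment}


async def timed(latency_s: float) -> tuple[dict, float]:
    start = time.perf_counter()
    response = await confirm_market_blocking(latency_s)
    return response, time.perf_counter() - start
```

Calling asyncio.run(timed(0.02)) returns the response along with an elapsed time of roughly five simulated calls, even though only the first line of the handler produced anything the user was waiting for.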

The fix

The change was to move that enrichment off the synchronous turn. The confirmation now does only the thin visible path: validate the input, persist the confirmed assessment, advance the stage, insert the next stage's first message, enqueue a finalize job, and commit. It then returns as soon as the user-facing decision is ready. The target for the visible path is single-digit seconds.
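One way to implement that thin path, sketched in Python with the standard-library sqlite3 module. The schema, table names, and FIRST_MESSAGE lookup are hypothetical, and so is the assumption that the finalize job is a row written in the same transaction as the stage change (an outbox-style queue); the post only says the job is enqueued before the commit. The job kind reuses the stage_finalize_v0 name the post gives the background task.

```python
import json
import sqlite3

# Hypothetical schema; the post does not describe IdeaSense's tables.
SCHEMA = """
CREATE TABLE assessments (id INTEGER PRIMARY KEY, stage TEXT NOT NULL);
CREATE TABLE confirmations (assessment_id INTEGER, stage TEXT, summary TEXT);
CREATE TABLE messages (assessment_id INTEGER, stage TEXT, body TEXT);
CREATE TABLE jobs (id INTEGER PRIMARY KEY AUTOINCREMENT, kind TEXT,
                   payload TEXT, status TEXT NOT NULL DEFAULT 'queued');
"""

NEXT_STAGE = {"market": "tech"}
# Hypothetical opener, assumed pre-written so no model call sits on this path.
FIRST_MESSAGE = {"tech": "Let's look at how you plan to build this."}


def confirm_stage(conn: sqlite3.Connection, assessment_id: int,
                  stage: str, summary: str) -> dict:
    # 1. Validate the input and the current stage before writing anything.
    if not summary.strip():
        raise ValueError("confirmation summary is empty")
    row = conn.execute("SELECT stage FROM assessments WHERE id = ?",
                       (assessment_id,)).fetchone()
    if row is None or row[0] != stage:
        raise ValueError(f"assessment {assessment_id} is not in stage {stage!r}")
    next_stage = NEXT_STAGE[stage]

    # 2-6. One transaction: commits when the block exits, rolls back on error.
    with conn:
        conn.execute("INSERT INTO confirmations VALUES (?, ?, ?)",
                     (assessment_id, stage, summary))              # persist
        conn.execute("UPDATE assessments SET stage = ? WHERE id = ?",
                     (next_stage, assessment_id))                  # advance
        conn.execute("INSERT INTO messages VALUES (?, ?, ?)",
                     (assessment_id, next_stage, FIRST_MESSAGE[next_stage]))
        conn.execute("INSERT INTO jobs (kind, payload) VALUES (?, ?)",
                     ("stage_finalize_v0",
                      json.dumps({"assessment_id": assessment_id, "stage": stage})))
    # Return at once; scoring and enrichment wait for a worker to claim the job.
    return {"status": "confirmed", "stage": next_stage}
```

Writing the job inside the same transaction means the job exists exactly when the confirmation does: a failure anywhere in the with block rolls back the stage change and the job together, so there is never a confirmed stage with no finalize job, or a job for a confirmation that did not commit.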

Scoring, verification, QA digests, and description enrichment now run in a background task (stage_finalize_v0). The next-question rewrite simply no longer sits on the confirm path.
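The hand-off itself can be sketched with asyncio.Queue standing in for the real job queue, which the post does not name. stage_finalize_v0 is the task name from the post; its body here, the Assessment dataclass, and run_enrichment_step are hypothetical stubs, and the steps run one after another because the post does not say whether they could run in parallel. What the sketch shows is the ordering: confirm_market returns before any enrichment has run, and the worker fills it in afterwards.

```python
import asyncio
from dataclasses import dataclass, field

FINALIZE_STEPS = ("dvf_scoring", "stage_verification", "qa_digests",
                  "description_enrichment")


@dataclass
class Assessment:
    id: int
    stage: str = "market"
    enrichment: dict = field(default_factory=dict)


async def run_enrichment_step(name: str, assessment: Assessment) -> None:
    # Hypothetical stand-in for a model-backed step; sleep simulates latency.
    await asyncio.sleep(0.01)
    assessment.enrichment[name] = "done"


async def stage_finalize_v0(assessment: Assessment) -> None:
    # Runs off the request path, so its duration never reaches the user.
    for step in FINALIZE_STEPS:
        await run_enrichment_step(step, assessment)


async def worker(queue: asyncio.Queue) -> None:
    while True:
        assessment = await queue.get()
        try:
            await stage_finalize_v0(assessment)
        except Exception as exc:  # keep the worker alive; a real one would retry
            assessment.enrichment["error"] = repr(exc)
        finally:
            queue.task_done()


async def confirm_market(assessment: Assessment, queue: asyncio.Queue) -> dict:
    assessment.stage = "tech"      # thin visible path (abbreviated)
    queue.put_nowait(assessment)   # enqueue the finalize work; do not await it
    return {"status": "confirmed", "stage": assessment.stage}


async def main() -> tuple[dict, dict, dict]:
    queue: asyncio.Queue = asyncio.Queue()
    worker_task = asyncio.create_task(worker(queue))
    a = Assessment(id=1)
    response = await confirm_market(a, queue)
    at_response = dict(a.enrichment)  # what existed when the user got the reply
    await queue.join()                # demo only: wait for the background work
    worker_task.cancel()
    try:
        await worker_task
    except asyncio.CancelledError:
        pass
    return response, at_response, a.enrichment
```

Because the response is built before the worker runs, making those stubs faster or slower changes how soon the report data is ready, not how long the user waits for the confirmation.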

Why this is not “model speed does not matter”

The honest version, because an engineer will catch a lazy one: most of those 96 seconds were model time, spent in a chain of enrichment calls, so a faster model genuinely would have cut a lot of it. That is precisely why it is a trap: the swap would have helped enough to hide the structural problem.

The point is not that model speed is irrelevant. It is that this enrichment should never have been on the user's critical path. Even if every call were instant, chaining that much work into one confirmation is the wrong path design. Move it to the background and the confirmation is fast regardless of how fast or slow the model is. Model choice optimizes each segment; it does not remove the structural problem of making the user wait on results they do not need.
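The argument reduces to a simple additive model. In this Python sketch, an illustration rather than a measurement, the visible latency of a blocking confirmation is the thin path plus every enrichment call, and a faster model divides each call by some speedup factor. Moving enrichment to the background removes those terms from the user's wait entirely. The parameter names, and the assumption that the thin path makes no model calls, are mine.

```python
def visible_latency(thin_s: float, enrichment_s: list[float],
                    speedup: float = 1.0, background: bool = False) -> float:
    """User-visible seconds for one confirmation under an additive model.

    thin_s: validate, persist, advance, insert, enqueue, commit
            (assumed to involve no model calls).
    enrichment_s: latency of each chained enrichment call at current speed.
    speedup: how many times faster a swapped-in model is (1.0 = no change).
    background: True if enrichment runs off the request path.
    """
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    if background:
        # Enrichment still costs time, but the user no longer waits on it.
        return thin_s
    # Blocking design: every call adds to the wait; a faster model only
    # shrinks each term, it does not remove any.
    return thin_s + sum(t / speedup for t in enrichment_s)
```

With hypothetical inputs, visible_latency(2.0, [10.0, 10.0, 10.0], speedup=2.0) is 17.0: better than 32.0, but chaining on a fourth call pushes it to 22.0. visible_latency(2.0, [10.0, 10.0, 10.0], background=True) is 2.0 whatever the speedup.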

The principle

When an AI product feels slow, ask what the user is actually waiting on before asking which model is fastest. Most perceived latency in these systems is a path-design decision — what runs synchronously versus what can settle in the background — not a model-selection one.

It is the same bias I bring to accuracy: the system does not claim to know what is true; it keeps judgments as traceable evidence and stays honest about uncertainty. Design the path; do not oversell the model.
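The question the principle starts from, what the user is actually waiting on, can be made mechanical if each step in a handler declares which other steps it reads from. This Python sketch (the step names and dependency graph are hypothetical; the post does not describe one) walks the dependencies backwards from what the response must contain and marks everything reachable as synchronous. Whatever is left over is a candidate for the background.

```python
from collections.abc import Iterable, Mapping


def partition_steps(deps: Mapping[str, Iterable[str]],
                    response_needs: Iterable[str]) -> tuple[list[str], list[str]]:
    """Split handler steps into (synchronous, background).

    deps maps each step to the steps whose output it reads. A step is
    synchronous only if the response depends on it, directly or transitively.
    """
    required: set[str] = set()
    stack = list(response_needs)
    while stack:  # depth-first walk back along dependency edges
        step = stack.pop()
        if step in required:
            continue
        if step not in deps:
            raise KeyError(f"unknown step: {step}")
        required.add(step)
        stack.extend(deps[step])
    sync = [s for s in deps if s in required]  # keep declaration order
    background = [s for s in deps if s not in required]
    return sync, background


# Hypothetical dependency graph for a stage confirmation.
CONFIRM_STEPS = {
    "validate_input": [],
    "persist_confirmation": ["validate_input"],
    "advance_stage": ["persist_confirmation"],
    "insert_next_message": ["advance_stage"],
    "dvf_scoring": ["persist_confirmation"],
    "stage_verification": ["persist_confirmation"],
    "qa_digests": ["persist_confirmation"],
    "description_enrichment": ["persist_confirmation"],
}
```

Assuming the response only has to carry the next stage's first message, partition_steps(CONFIRM_STEPS, ["insert_next_message"]) returns the four thin-path steps as synchronous and the four enrichment steps as background. If a product decision later put the DVF score into the confirmation response, adding "dvf_scoring" to response_needs would move it onto the critical path, which makes that cost an explicit choice rather than an accident.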

Contact

Based in Auckland and open to early-career full-stack, IT systems, health-tech, AI workflow, and data/analytics roles. I am most effective in roles where delivery requires both implementation detail and operational context.
