Polaris
Inference at the edge.
Defined what 'fast enough' meant for a real-time inspection rig — then made it stay that way.
✺ — The problem
Polaris runs inspection cameras on manufacturing lines. Its cloud-hosted model could not keep pace with the conveyor, and sending every frame to a GPU farm in another region cost more than the savings the model was supposed to capture.
Sector
Industrial IoT
Year
2024
Duration
18 weeks
Team
1 Principal · 2 Engineers · 1 SRE
✺ — Approach
The same arc as every engagement — tuned to this problem.
Define · The latency budget
We sat next to a working line for two days. The frame budget came out at 84ms per inspection, end to end: network round trip, model inference, and response. Anything slower meant rejected parts piling up. That number became a hard design constraint, not a target.
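A minimal sketch of checking inspections against that budget, assuming each inspection is timed in three stages that mirror the breakdown above. `InspectionTiming`, its field names and `over_budget` are hypothetical, not Polaris's code; only the 84ms figure comes from the engagement.

```python
from dataclasses import dataclass
from typing import List, Sequence

FRAME_BUDGET_MS = 84.0  # end-to-end budget measured on the line


@dataclass
class InspectionTiming:
    """Hypothetical per-stage timing for one inspection, in milliseconds."""
    round_trip_ms: float  # frame out to the inference host and back
    model_ms: float       # model forward pass
    response_ms: float    # verdict delivered to the line controller

    @property
    def total_ms(self) -> float:
        return self.round_trip_ms + self.model_ms + self.response_ms


def over_budget(timings: Sequence[InspectionTiming],
                budget_ms: float = FRAME_BUDGET_MS) -> List[InspectionTiming]:
    """Return the inspections whose end-to-end time exceeds the budget."""
    return [t for t in timings if t.total_ms > budget_ms]


ok = InspectionTiming(round_trip_ms=10.0, model_ms=40.0, response_ms=5.0)
slow = InspectionTiming(round_trip_ms=30.0, model_ms=50.0, response_ms=10.0)
print([t.total_ms for t in over_budget([ok, slow])])  # → [90.0]
```

Treating the budget as a constraint rather than a target means watching individual inspections, not just an average, which is why the check flags each frame that goes over.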
Build · Quantized, on-prem, falls back to cloud
INT8-quantized model running on a ruggedized edge box at every line, with a graceful fall-through to cloud inference when the edge model is uncertain. The line never stops, and cloud cost only kicks in for the hard cases.
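The fall-through can be sketched as a confidence threshold on the edge verdict. This is a minimal illustration, not the shipped implementation: `inspect`, the 0.9 `threshold`, and the choice to keep the edge verdict (marked `edge-unconfirmed`) when the cloud call times out or cannot connect are all assumptions, included to show one way the line could keep moving even when the cloud is unreachable.

```python
from typing import Any, Callable, Tuple

# A model returns (label, confidence), with confidence in [0, 1].
Model = Callable[[Any], Tuple[str, float]]


def inspect(frame: Any, edge_model: Model, cloud_model: Model,
            threshold: float = 0.9) -> Tuple[str, float, str]:
    """Run the edge model first and escalate to the cloud only when unsure.

    Returns (label, confidence, source); source records which path decided.
    """
    label, confidence = edge_model(frame)
    if confidence >= threshold:
        return label, confidence, "edge"  # common case: no cloud cost

    try:
        label, confidence = cloud_model(frame)
        return label, confidence, "cloud"
    except (TimeoutError, ConnectionError):
        # Assumption: if the cloud is unreachable, keep the line moving with
        # the edge verdict and mark it for later review.
        return label, confidence, "edge-unconfirmed"
```

Raising the threshold sends more frames to the cloud (higher cost, more second opinions); lowering it keeps more decisions on the edge box.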
Operate · Observability everywhere
Every edge node exports latency, confidence histograms, and drift metrics. A misbehaving line surfaces in dashboards before a foreman notices the reject rate. New model versions roll out behind a per-line flag.
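One way to turn confidence histograms into a drift metric is the population stability index (PSI), which compares a baseline histogram against a recent window binned the same way. The source does not say which drift metric Polaris uses; PSI, the function names and the 10-bin layout here are illustrative assumptions.

```python
import math
from typing import List, Sequence


def confidence_histogram(confidences: Sequence[float], bins: int = 10) -> List[int]:
    """Count confidence scores in [0, 1] into equal-width bins."""
    counts = [0] * bins
    for c in confidences:
        # Clamp so a score of exactly 1.0 lands in the last bin.
        idx = min(int(c * bins), bins - 1)
        counts[idx] += 1
    return counts


def psi(baseline: Sequence[int], current: Sequence[int], eps: float = 1e-6) -> float:
    """Population stability index between two non-empty histograms with the same bins."""
    b_total, c_total = sum(baseline), sum(current)
    score = 0.0
    for b, c in zip(baseline, current):
        # eps keeps empty bins from producing log(0) or division by zero.
        p = max(b / b_total, eps)
        q = max(c / c_total, eps)
        score += (q - p) * math.log(q / p)
    return score
```

A commonly cited rule of thumb reads PSI below 0.1 as stable and above 0.25 as a significant shift, though any alert threshold would need tuning per line.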
✺ — Outcome
Three numbers we’d defend in public.
62ms
median end-to-end inspection latency
−71%
monthly inference cost vs. cloud-only
$0
downtime cost on rollout: six lines, one weekend
“Most studios would have sold us a bigger GPU bill. They asked what our latency budget actually was, then shipped something a foreman could rack and forget.”
Director of Engineering, Polaris