
Cross-chip benchmark report: Q3 2028

Three transformers caught up to YOLO. Hailo-10 dethroned Coral on the low-power tier. Thor justified its watts on VLA. The full numbers, and the surprises.

Fo’c’sle · Apr 18, 2028 · 12 min read

Every quarter we re-run the entire matrix — 35 reference models against 17 production edge silicon platforms, in our HIL lab in Tel Aviv and our partner labs in Munich and Pittsburgh. Q3 was the one where the cross-vendor leaderboards stopped looking like the Q2 leaderboards. Three transformer-class detectors finally crossed the latency threshold that puts them inside the YOLO band on transformer-friendly NPUs. The Hailo-10H, in its first full quarter shipping, broke the long Coral monopoly on the sub-3-watt tier. And on the heavy end, Jetson Thor justified the jump in TDP for the first time on a VLA workload that wasn't a marketing demo.

The biggest movement on the leaderboard this quarter was in the transformer-class detector tier. RT-DETR-Edge, until two quarters ago a curiosity that mostly existed because someone, somewhere, wanted to write 'transformers in DETR detectors' on a slide, finally put up numbers that compete with YOLO27-Edge on the chips that suit it. On the Jetson Orin Nano, Hailo-10H, and Apple Neural Engine M4, the latency gap between RT-DETR-Edge and YOLO27-Edge is now under 8%, and RT-DETR-Edge holds a measurable advantage on accuracy retention at INT8.

On legacy NPUs the picture is unchanged: YOLO27-Edge wins on every Coral, Movidius, and 4-TOPS-class part by margins that aren't going to close. The architecture choice is wrong for those targets, and no amount of compiler tuning fixes that. Practitioners shipping into the long tail of edge hardware should keep YOLO as their default and consider RT-DETR-Edge only once they're confident their target is in the transformer-friendly tier.

On the sub-3-watt tier, the Hailo-10H ended the long Coral Edge TPU monopoly in its first full quarter shipping. For four years running, the answer to 'what runs object detection inside two watts of power envelope' was 'a Coral, and you'll like it.' That answer changed this quarter. The Hailo-10H delivers 27 FPS sustained on YOLO26-Nano at 2.8 W package power: head-to-head with the Coral on power, but with 4× the model headroom and a meaningfully faster path to non-INT8 quantization.
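
For readers who want to compare low-power parts on efficiency rather than raw throughput, here is a minimal sketch that turns the two Hailo-10H figures quoted above (27 FPS, 2.8 W) into frames per second per watt and energy per frame. The function names are ours, chosen for illustration, and are not part of any benchmark tooling. The calculation also assumes package power is roughly constant over the sustained run.

```python
def frames_per_watt(fps: float, power_w: float) -> float:
    """Throughput efficiency: frames per second delivered per watt."""
    return fps / power_w


def energy_per_frame_mj(fps: float, power_w: float) -> float:
    """Energy per inference in millijoules (watts / FPS = joules per frame)."""
    return power_w / fps * 1000.0


# Figures quoted in this report for the Hailo-10H on YOLO26-Nano.
HAILO_10H_FPS = 27.0
HAILO_10H_POWER_W = 2.8

efficiency = frames_per_watt(HAILO_10H_FPS, HAILO_10H_POWER_W)
energy = energy_per_frame_mj(HAILO_10H_FPS, HAILO_10H_POWER_W)

print(f"{efficiency:.2f} FPS per watt")
print(f"{energy:.1f} mJ per frame")
```

That works out to roughly 9.6 FPS per watt, or about 104 mJ per frame. Plugging in the FPS and package power from any other chip's matrix entry gives a like-for-like comparison.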

On the heavy end, Jetson Thor justified its TDP for the first time on a workload that wasn't a marketing demo. π0-Distilled, the OSS distillation of π0 that Physical Intelligence released this quarter, runs at 22.6 ms p50 on Thor. That is well inside the 33.3 ms per-step budget that a 30 Hz target implies for high-frequency manipulation policies, and it leaves enough activation memory headroom to stack a second perception model without paging. The same workload clocks in at 38.2 ms on the AGX Orin, which misses that budget, and it is not supported on the QCS8550. For VLA-heavy stacks, the Thor jump is now the right call.
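
The budget arithmetic behind that comparison is simple enough to sketch. The snippet below converts a control rate into a per-step latency budget and reports each chip's slack against it. The helper names are hypothetical, and we assume here that the AGX Orin figure is a p50 like the Thor one. Keep in mind that p50 is a median: a policy that meets the budget at p50 can still miss it in the tail.

```python
def step_budget_ms(rate_hz: float) -> float:
    """Per-step latency budget implied by a control rate, in milliseconds."""
    return 1000.0 / rate_hz


def slack_ms(latency_ms: float, rate_hz: float) -> float:
    """Positive when the latency fits inside the budget, negative when it misses."""
    return step_budget_ms(rate_hz) - latency_ms


TARGET_HZ = 30.0

# Latencies quoted in this report for π0-Distilled (assumed p50 for both chips).
LATENCY_MS = {
    "Jetson Thor": 22.6,
    "AGX Orin": 38.2,
}

for chip, latency in LATENCY_MS.items():
    slack = slack_ms(latency, TARGET_HZ)
    verdict = "fits with" if slack >= 0 else "misses by"
    print(
        f"{chip}: {latency:.1f} ms vs {step_budget_ms(TARGET_HZ):.1f} ms budget, "
        f"{verdict} {abs(slack):.1f} ms"
    )
```

At 30 Hz the budget is about 33.3 ms, so Thor has roughly 10.7 ms of slack while the AGX Orin overshoots by roughly 4.9 ms.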

We'll publish the full per-chip retrospectives over the next two weeks: Hailo-10H first, then Thor, then a roundup on the long-tail tier. Subscribe to the changelog if you want them in your inbox. The full data is, as always, in the matrix on each model's page.

Written by Fo’c’sle — published on the Focsle changelog.