
RT-DETR-Edge

by Baidu Research

Real-time DETR optimized for transformer-friendly NPUs. Wins on transformer-class silicon, struggles on legacy NPUs.

Object detection · Apache-2.0 · INT8 · FP16 · transformer · coco · detection
92K downloads · 5.4K deployments · Updated Apr 2, 2028
Headline: 18.6 ms · NVIDIA Jetson Orin Nano · FP16

About this model


Authored by baidu-research. Curated into the Fo’c’sle reference set on 2028-04-02. All cross-chip benchmarks below were collected in matched-pair runs in the HIL lab using the same input pipeline, same upstream preprocessing, and the same downstream consumer. See the methodology page for the full protocol.
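To make the matched-pair protocol concrete, here is a minimal sketch of what one per-chip latency run can look like. `preprocess` and `infer` are hypothetical placeholder names standing in for the shared upstream preprocessing and the chip's runtime session; neither is part of the Fo'c'sle tooling.

```python
import statistics
import time

import numpy as np

def preprocess(frame: np.ndarray) -> np.ndarray:
    """Shared upstream preprocessing, applied identically on every chip."""
    x = frame.astype(np.float32) / 255.0   # normalize to [0, 1]
    return x[None]                          # add batch dimension

def median_latency_ms(infer, frames, warmup: int = 50, runs: int = 500) -> float:
    """Median end-to-end latency in milliseconds for one inference call."""
    inputs = [preprocess(f) for f in frames]
    for i in range(warmup):
        infer(inputs[i % len(inputs)])      # warm-up: clocks, caches, JIT
    timings = []
    for i in range(runs):
        x = inputs[i % len(inputs)]
        t0 = time.perf_counter()
        infer(x)
        timings.append((time.perf_counter() - t0) * 1e3)
    return statistics.median(timings)
```

Because the preprocessing, input frames, and timing loop are identical across chips, any latency difference between two runs can be attributed to the silicon and its runtime rather than the harness.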

Task: Object detection
Parameters: 21.8 M
Benchmarked on: 6 chips
Deployments: 5.4K

Architecture

Detection backbone + neck + head (inferred from upstream weights · simplified):

Image → CSPDarknet53 → PANet neck → {Cls head, Box head, Obj head} → NMS · DFL
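A structural sketch of how those blocks compose, kept at the same level of simplification as the diagram. Module names, channel counts, and the single-scale head are illustrative assumptions, not a reconstruction of the released weights.

```python
import torch
import torch.nn as nn

class DetectionHead(nn.Module):
    """Classification, box, and objectness branches on one feature map."""
    def __init__(self, in_ch: int, num_classes: int = 80):
        super().__init__()
        self.cls = nn.Conv2d(in_ch, num_classes, kernel_size=1)
        self.box = nn.Conv2d(in_ch, 4, kernel_size=1)   # DFL decoding omitted
        self.obj = nn.Conv2d(in_ch, 1, kernel_size=1)

    def forward(self, x: torch.Tensor):
        return self.cls(x), self.box(x), self.obj(x)

class EdgeDetector(nn.Module):
    """Image -> backbone -> neck -> heads; NMS runs as a post-process."""
    def __init__(self, backbone: nn.Module, neck: nn.Module, head: DetectionHead):
        super().__init__()
        self.backbone = backbone   # e.g. a CSPDarknet53 feature extractor
        self.neck = neck           # e.g. PANet multi-scale fusion
        self.head = head

    def forward(self, image: torch.Tensor):
        feats = self.backbone(image)
        fused = self.neck(feats)
        return self.head(fused)
```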

Headline benchmarks

NVIDIA Jetson Orin Nano · FP16 · 18.6 ms

Training data

Pretrained from the upstream maintainer's released checkpoint. The edge-distillation pass uses 2.4M frames from the Fo'c'sle distillation corpus (consented public data plus opt-in publisher contributions). The quantization-aware fine-tune uses 320K calibration samples drawn from the target task's eval domain; a sketch of that step follows the list below.

  • Pretraining corpus: upstream maintainer release
  • Distillation corpus: 2,400,000 frames
  • Calibration set: 320,000 samples (per task)
  • Eval set: standard benchmark + matched-pair HIL runs
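The card names the calibration budget but not the quantization toolchain. As a stand-in, here is a minimal sketch of an INT8 quantization-aware fine-tune using PyTorch eager-mode QAT; `calib_loader`, the step budget, and the loss-returning model are placeholders, and the actual pipeline may differ.

```python
import torch
from torch.ao.quantization import convert, get_default_qat_qconfig, prepare_qat

def qat_finetune(model: torch.nn.Module, calib_loader, steps: int = 1000):
    """Fine-tune with fake-quant observers, then fold to real INT8 ops."""
    model.train()
    model.qconfig = get_default_qat_qconfig("fbgemm")
    qat_model = prepare_qat(model)            # insert fake-quant/observer modules
    opt = torch.optim.SGD(qat_model.parameters(), lr=1e-4)
    for step, (images, targets) in enumerate(calib_loader):
        loss = qat_model(images, targets)     # placeholder: model returns its loss
        opt.zero_grad()
        loss.backward()
        opt.step()
        if step + 1 >= steps:
            break
    qat_model.eval()
    return convert(qat_model)                 # materialize quantized INT8 modules
```

Drawing the calibration samples from the target task's eval domain, as described above, keeps the learned quantization ranges matched to the activation statistics the model will actually see at deployment.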