Whisper-Edge-Tiny

Edge fork of Whisper-Tiny with attention rewritten to be NPU-friendly. The INT4 variant runs on the Raspberry Pi alone.

Embedded ASR · MIT · INT8 · INT4 · asr · english · tiny

412K downloads · 25K deployments · Updated Mar 18, 2028

Headline: 220 ms · Raspberry Pi 5 + Hailo HAT · INT8

About this model

Authored by openai-edge. Curated into the Fo’c’sle reference set on 2028-03-18. All cross-chip benchmarks below were collected in matched-pair runs in the hardware-in-the-loop (HIL) lab, using the same input pipeline, the same upstream preprocessing, and the same downstream consumer for every chip. See the methodology page for the full protocol.

Task: Embedded ASR
Parameters: 39 M
Benchmarked on: 8 chips
Deployments: 25K

Architecture

Streaming encoder + decoder

Inferred from upstream weights · simplified

Headline benchmarks

Apple Neural Engine (M4) · INT8: 203.5 ms p50 · 31 tok/s · 97.7% acc · 6.8 W
Snapdragon 8 Gen 3 NPU · INT8: 216.0 ms p50 · 30 tok/s · 97.6% acc · 6.8 W
Qualcomm QCS8550 · INT8: 221.7 ms p50 · 29 tok/s · 97.3% acc · 13.3 W
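
One way to compare the three chips on efficiency is energy per token, which is power divided by throughput. The sketch below uses only the figures listed above. It assumes the reported wattage is the average draw sustained at the reported tok/s, which this card does not state, so the results are rough. The names `HEADLINE`, `joules_per_token`, and `rank_by_energy` are illustrative, not part of any released tooling.

```python
# Figures copied from the headline benchmarks above (all INT8).
HEADLINE = {
    "Apple Neural Engine (M4)": {"p50_ms": 203.5, "tok_s": 31, "watts": 6.8},
    "Snapdragon 8 Gen 3 NPU": {"p50_ms": 216.0, "tok_s": 30, "watts": 6.8},
    "Qualcomm QCS8550": {"p50_ms": 221.7, "tok_s": 29, "watts": 13.3},
}


def joules_per_token(watts: float, tok_s: float) -> float:
    """Energy per token in joules.

    Assumption: `watts` is the average power drawn while sustaining `tok_s`.
    """
    return watts / tok_s


def rank_by_energy(results: dict) -> list:
    """Return (chip, joules per token) pairs, most efficient first."""
    pairs = [
        (name, joules_per_token(r["watts"], r["tok_s"]))
        for name, r in results.items()
    ]
    return sorted(pairs, key=lambda pair: pair[1])


for chip, joules in rank_by_energy(HEADLINE):
    print(f"{chip}: {joules:.3f} J/token")
```

Under that assumption, the QCS8550 spends roughly twice the energy per token of the other two chips, because its throughput is similar while its power draw is about double.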

Training data

Initialized from the upstream maintainer’s released checkpoint. The edge-distillation pass uses 2.4M frames from the Fo’c’sle distillation corpus (consented public data plus opt-in publisher contributions). The quantization-aware fine-tune uses 320K calibration samples per task, drawn from the target task’s eval domain.

Pretraining corpus: upstream maintainer release
Distillation corpus: 2,400,000 frames
Calibration set: 320,000 samples (per task)
Eval set: standard benchmark + matched-pair HIL runs
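
To illustrate what a calibration set is for, the minimal sketch below derives a symmetric quantization scale from the largest absolute value seen in a set of calibration samples, then uses it to quantize and dequantize values. This is a generic max-abs scheme and not necessarily the procedure used for this model: the card does not describe its quantizer, and every function name here is hypothetical.

```python
def calibrate_scale(samples, num_bits=8):
    """Max-abs symmetric scale: the largest |x| seen maps to the top integer level."""
    qmax = 2 ** (num_bits - 1) - 1  # 127 for INT8, 7 for INT4
    max_abs = max(abs(x) for sample in samples for x in sample)
    if max_abs == 0:
        return 1.0  # all-zero calibration data; any scale works
    return max_abs / qmax


def quantize(values, scale, num_bits=8):
    """Round to the nearest integer level and clip to the symmetric range."""
    qmax = 2 ** (num_bits - 1) - 1
    return [max(-qmax, min(qmax, round(v / scale))) for v in values]


def dequantize(levels, scale):
    """Map integer levels back to approximate real values."""
    return [q * scale for q in levels]


# Toy calibration set (made-up numbers, for illustration only).
calibration = [[0.5, -1.27], [0.1, 1.0]]
scale = calibrate_scale(calibration)
levels = quantize([1.27, -0.5, 5.0], scale)  # 5.0 lies outside the calibrated range
print(levels, dequantize(levels, scale))
```

Values outside the calibrated range are clipped, which is one reason calibration samples are usually drawn from the same domain the model will be evaluated on.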