
SAM-3-Distilled

by Fo’c’sle

Distilled SAM-3 for VLA stacks that need on-device segmentation prompts. 4× lighter than the reference.

Segmentation · Focsle-Research · INT8 · FP16 · sam · distilled · vla-companion
84K downloads · 2.4K deployments · Updated Apr 19, 2028
Headline: 14.2 ms · NVIDIA Jetson Thor · MIXED

About this model

Authored by focsle. Curated into the Fo’c’sle reference set on 2028-04-19. All cross-chip benchmarks below were collected in matched-pair runs in the HIL lab, using the same input pipeline, the same upstream preprocessing, and the same downstream consumer. See the methodology page for the full protocol.
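
For orientation, here is a minimal prompt-segmentation sketch. The `focsle_hub` loader, the model id, and the point-prompt call signature below are illustrative assumptions, not a published API.

```python
import torch

from focsle_hub import load_model  # hypothetical loader, for illustration only

# Hypothetical model id and precision flag; the real identifiers may differ.
model = load_model("focsle/sam-3-distilled", precision="mixed")
model.eval()

image = torch.rand(1, 3, 1024, 1024)             # one RGB frame from the VLA stack
points = torch.tensor([[[512.0, 384.0]]])        # (batch, n_points, xy) click prompt

with torch.inference_mode():
    masks, scores = model(image, points=points)  # assumed call signature

best_mask = masks[0, scores[0].argmax()]         # keep the highest-scoring mask
print(best_mask.shape, scores[0].max().item())
```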

Task: Segmentation
Parameters: 64.8 M
Benchmarked on: 5 chips
Deployments: 2.4K

Architecture

Mask transformer architecture
Inferred from upstream weights · simplified
Image → Backbone → Pixel decoder → Mask transformer → Class queries → Mask + class
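
Read as a forward pass, that flow might look like the following structural sketch. Every size here (channel width, query count, layer count) is a placeholder chosen for illustration, not taken from the released weights.

```python
import torch
import torch.nn as nn

class MaskTransformerSketch(nn.Module):
    """Structural sketch of Image -> Backbone -> Pixel decoder ->
    Mask transformer -> Class queries -> Mask + class. All sizes are placeholders."""

    def __init__(self, dim=256, num_queries=100, num_classes=80):
        super().__init__()
        self.backbone = nn.Conv2d(3, dim, kernel_size=16, stride=16)  # stand-in patch embed
        self.pixel_decoder = nn.Conv2d(dim, dim, kernel_size=1)       # stand-in pixel decoder
        layer = nn.TransformerDecoderLayer(dim, nhead=8, batch_first=True)
        self.mask_transformer = nn.TransformerDecoder(layer, num_layers=6)
        self.class_queries = nn.Embedding(num_queries, dim)
        self.class_head = nn.Linear(dim, num_classes + 1)             # +1 for "no object"

    def forward(self, image):                      # image: (B, 3, H, W)
        feats = self.backbone(image)               # (B, C, H/16, W/16)
        pixel = self.pixel_decoder(feats)          # per-pixel embeddings
        memory = pixel.flatten(2).transpose(1, 2)  # (B, HW, C) tokens for the decoder
        queries = self.class_queries.weight.unsqueeze(0).expand(image.size(0), -1, -1)
        q = self.mask_transformer(queries, memory)        # (B, Q, C)
        class_logits = self.class_head(q)                 # (B, Q, num_classes + 1)
        masks = torch.einsum("bqc,bchw->bqhw", q, pixel)  # dot-product mask logits
        return masks, class_logits
```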

Headline benchmarks

Headline figure: 14.2 ms on NVIDIA Jetson Thor at MIXED precision, collected in the matched-pair HIL runs described above. Benchmarks cover 5 chips in total; see the methodology page for the full protocol.
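
For a rough sense of how a single-device latency figure like this is usually taken (warmup, synchronize, report a robust statistic), here is a generic harness. It is not the Fo’c’sle matched-pair protocol; the function and its parameters are illustrative.

```python
import time
import torch

def median_latency_ms(model, example, warmup=50, iters=200):
    """Generic latency harness: warm up, synchronize around each timed
    call, and report the median over many runs."""
    model.eval()
    with torch.inference_mode():
        for _ in range(warmup):                   # discard cold-start runs
            model(example)
        if torch.cuda.is_available():
            torch.cuda.synchronize()
        times = []
        for _ in range(iters):
            start = time.perf_counter()
            model(example)
            if torch.cuda.is_available():         # wait for queued GPU work
                torch.cuda.synchronize()
            times.append((time.perf_counter() - start) * 1e3)
    times.sort()
    return times[len(times) // 2]                 # median, robust to outliers
```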

Training data

Starts from the upstream maintainer’s released checkpoint. The edge-distillation pass uses 2.4M frames from the Fo’c’sle distillation corpus (consented public data plus opt-in publisher contributions). The quantization-aware fine-tune uses 320K calibration samples drawn from the target task’s eval domain; a calibration sketch follows the list below.

  • Pretraining corpus: upstream maintainer release
  • Distillation corpus: 2,400,000 frames
  • Calibration set: 320,000 samples (per task)
  • Eval set: standard benchmark + matched-pair HIL runs
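
As a rough illustration of what a quantization-aware fine-tune over a fixed calibration set involves, here is a minimal sketch using PyTorch’s eager-mode `torch.ao.quantization` QAT workflow. The tiny student model, the random batches, and all hyperparameters are placeholders, not the Fo’c’sle training recipe.

```python
import torch
import torch.nn as nn
from torch.ao import quantization as tq

# Placeholder student: stands in for the distilled segmentation model.
student = nn.Sequential(
    tq.QuantStub(),
    nn.Conv2d(3, 16, 3, padding=1),
    nn.ReLU(),
    nn.Conv2d(16, 1, 1),
    tq.DeQuantStub(),
)

# Eager-mode QAT: attach fake-quant observers, fine-tune briefly on the
# calibration split, then convert to a true INT8 model.
student.train()
student.qconfig = tq.get_default_qat_qconfig("fbgemm")
tq.prepare_qat(student, inplace=True)

opt = torch.optim.SGD(student.parameters(), lr=1e-4)
for step in range(100):                    # stand-in for the 320K-sample pass
    frames = torch.rand(8, 3, 64, 64)      # placeholder calibration batch
    target = torch.rand(8, 1, 64, 64)      # placeholder mask targets
    loss = nn.functional.mse_loss(student(frames), target)
    opt.zero_grad()
    loss.backward()
    opt.step()

student.eval()
int8_model = tq.convert(student)           # fold observers into INT8 kernels
```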