V-JEPA-AC-Edge


by Meta FAIR-Edge

Action-conditioned V-JEPA for world-model planning at the edge.

Vision-Language-Action · MIT · INT8 · FP16 · world-model · planning

23K downloads · 480 deployments · Updated Mar 11, 2028

Headline: 18.4 ms · NVIDIA Jetson Thor · MIXED


About this model


Authored by meta-fair-edge. Curated into the Fo’c’sle reference set on 2028-03-11. All cross-chip benchmarks below were collected in matched-pair runs in the HIL lab using the same input pipeline, same upstream preprocessing, and the same downstream consumer. See the methodology page for the full protocol.

Task: Vision-Language-Action
Parameters: 1.8 B
Benchmarked on: 3 chips
Deployments: 480

Architecture

Vision-Language-Action policy

Inferred from upstream weights · simplified

Headline benchmarks

NVIDIA Jetson Thor · FP16 · 38.5 ms p50 · 467 tok/s · 99.7% acc · 94.2 W

NVIDIA Jetson AGX Orin · FP16 · 65.6 ms p50 · 275 tok/s · 99.6% acc · 49.2 W

Qualcomm QCS8550 · INT8 · 176.4 ms p50 · 102 tok/s · 97.7% acc · 13.9 W
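The headline figures above report throughput and power separately, so it can help to combine them into an energy-efficiency number when comparing chips. The sketch below does that arithmetic in Python. It assumes the reported wattage is the average power draw during the same run that produced the tok/s figure, which the card does not state explicitly; the dictionary layout and function names are illustrative only.

```python
# Headline figures copied from the benchmark cards above.
BENCHMARKS = {
    "NVIDIA Jetson Thor (FP16)": {"p50_ms": 38.5, "tok_per_s": 467, "watts": 94.2},
    "NVIDIA Jetson AGX Orin (FP16)": {"p50_ms": 65.6, "tok_per_s": 275, "watts": 49.2},
    "Qualcomm QCS8550 (INT8)": {"p50_ms": 176.4, "tok_per_s": 102, "watts": 13.9},
}


def tokens_per_joule(tok_per_s, watts):
    """Throughput divided by power: (tok/s) / (J/s) = tok/J."""
    return tok_per_s / watts


def rank_by_efficiency(benchmarks):
    """Return chip names sorted from most to least tokens per joule."""
    return sorted(
        benchmarks,
        key=lambda name: tokens_per_joule(
            benchmarks[name]["tok_per_s"], benchmarks[name]["watts"]
        ),
        reverse=True,
    )


if __name__ == "__main__":
    for name in rank_by_efficiency(BENCHMARKS):
        row = BENCHMARKS[name]
        tpj = tokens_per_joule(row["tok_per_s"], row["watts"])
        print(f"{name}: {tpj:.2f} tok/J at {row['p50_ms']} ms p50")
```

Under that assumption the ordering reverses relative to latency: the QCS8550 is the slowest of the three but delivers the most tokens per joule, while the Jetson Thor is the fastest and the least energy-efficient. The precisions differ (INT8 versus FP16), so this is not a like-for-like comparison of the silicon.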

Training data

Initialized from the upstream maintainer’s released checkpoint. The edge-distillation pass uses 2.4M frames from the Fo’c’sle distillation corpus (consented public data plus opt-in publisher contributions). The quantization-aware fine-tune uses 320K calibration samples per task, drawn from the target task’s eval domain.

Pretraining corpus: upstream maintainer release
Distillation corpus: 2,400,000 frames
Calibration set: 320,000 samples (per task)
Eval set: standard benchmark + matched-pair HIL runs
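To illustrate what a calibration set is used for in INT8 quantization, here is a minimal sketch in plain Python. The card does not say which quantization scheme was applied, so this assumes the simplest common choice, symmetric per-tensor abs-max scaling, and shows only the calibration step, not the quantization-aware fine-tune itself. All function names and the toy values are hypothetical.

```python
def calibrate_scale(calibration_batches, num_bits=8):
    """Derive a symmetric scale from the largest magnitude seen in calibration data."""
    qmax = 2 ** (num_bits - 1) - 1  # 127 for INT8
    max_abs = max(abs(x) for batch in calibration_batches for x in batch)
    # Fall back to 1.0 so an all-zero calibration set does not divide by zero.
    return max_abs / qmax if max_abs > 0 else 1.0


def quantize(values, scale, num_bits=8):
    """Round to the nearest integer step and clamp to the signed range."""
    qmax = 2 ** (num_bits - 1) - 1
    return [max(-qmax, min(qmax, round(v / scale))) for v in values]


def dequantize(quantized, scale):
    """Map integer codes back to approximate real values."""
    return [q * scale for q in quantized]


if __name__ == "__main__":
    # Toy stand-in for activations collected over calibration samples.
    batches = [[0.5, -1.27], [0.1, 1.0]]
    scale = calibrate_scale(batches)
    codes = quantize([1.27, -1.27, 0.0, 5.0], scale)
    print(scale, codes, dequantize(codes, scale))
```

Values outside the range observed during calibration (5.0 in the toy input) saturate at the largest code, which is one reason to draw calibration samples from the same domain the model will be evaluated on.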