# Technical Deep Dives
2026-05-06 — Model sourcing, backend semantics, browser compatibility
## CPU (pure JS) vs WASM — genuinely different in TF.js
TensorFlow.js has two CPU-targeting backends that are not aliases:
| Backend | Execution | Performance |
|---|---|---|
| CPU | Pure JavaScript TF ops (interpreted) | Slowest — no compilation, no threading |
| WASM | Compiled C++ TF ops via WebAssembly + SharedArrayBuffer threads | Significantly faster — multi-threaded C++ runtime |
This is one of the most interesting benchmark dimensions: the perf delta between interpreted JS and compiled WASM with threading. ONNX Runtime Web does not offer a pure-JS CPU path — its cpu option is an alias for wasm.
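Selecting between the two backends can be sketched as follows. The helper below is hypothetical (not the project's actual adapter code); the TF.js calls in the comments are the real API, shown but not executed.

```javascript
// Hypothetical helper: choose between TF.js's two CPU-targeting backends.
// "wasm" requires the @tensorflow/tfjs-backend-wasm package to be registered;
// "cpu" is the pure-JS interpreter that is always available as a fallback.
function pickCpuBackend({ wasmAvailable }) {
  return wasmAvailable ? "wasm" : "cpu";
}

// Usage with TF.js would look roughly like this (not executed here):
//   import * as tf from "@tensorflow/tfjs";
//   import "@tensorflow/tfjs-backend-wasm";   // registers the "wasm" backend
//   await tf.setBackend(pickCpuBackend({ wasmAvailable: true }));
//   await tf.ready();
```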
## WebNN — emerging NPU backend (ONNX only)
Discovered during review of the ONNX Runtime Web official docs: ORT supports webnn as an execution provider, with the target device (GPU or CPU) selected via the deviceType option.
| Browser | Platform | WebNN Backend |
|---|---|---|
| Chrome 113+ / Edge | Windows | DirectML (GPU via MLDeviceType.gpu) |
| Chrome / Edge | macOS | CoreML / ANE (Neural Engine via MLDeviceType.cpu) |
| Chrome / Edge | Linux | TBD — limited support |
| Safari / Firefox | All | Not supported |
WebNN routes inference through the OS-level ML stack — DirectML on Windows, CoreML/ANE on macOS. This is conceptually closer to an NPU backend than a traditional GPU backend, since it uses dedicated ML accelerators where available. Browser support is limited but growing. We include it in the backend matrix as ONNX-only with a navigator.ml feature check gate.
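A minimal sketch of the feature-check gate. The detection function is an assumption based on the WebNN spec (the API is exposed as navigator.ml); the ORT session options in the comment follow ONNX Runtime Web's execution-provider configuration, with the deviceType value illustrative.

```javascript
// Hypothetical gate for the backend matrix: WebNN is detected via the
// presence of navigator.ml. Takes a navigator-like object so it is testable.
function supportsWebNN(nav) {
  return typeof nav === "object" && nav !== null && "ml" in nav;
}

// In the ORT adapter this would gate the webnn execution provider (sketch):
//   if (supportsWebNN(navigator)) {
//     session = await ort.InferenceSession.create(modelUrl, {
//       executionProviders: [{ name: "webnn", deviceType: "gpu" }],
//     });
//   }
```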
## Why include Transformers.js when it wraps ONNX Runtime Web?
Transformers.js v4 uses ONNX Runtime Web as its inference engine. Benchmarking both could be seen as comparing ORT against itself. However, Transformers.js adds meaningful differences:
- API level: pipeline abstraction (pipeline("image-classification", model)) vs raw session + tensor management in bare ORT
- Ecosystem: HuggingFace model hub integration, auto-download, auto-tokenization
- Model variants: uses onnx-community quantized models, which may differ from the raw ONNX models we use with bare ORT
- Developer experience: represents the "ease of use" end of the spectrum — relevant to the thesis Integration criterion
The pipeline overhead for a single image classification forward pass is negligible, so performance should be nearly identical to bare ORT. Any difference would come from model quantization differences, not framework overhead.
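The API-level difference can be sketched side by side. The model id, input names, and tensor shape below are placeholders, not the project's actual configuration; neither function is invoked here.

```javascript
// High level: the Transformers.js pipeline hides session creation,
// preprocessing, and tensor management behind one call.
async function classifyWithTransformersJs(imageUrl) {
  const { pipeline } = await import("@huggingface/transformers");
  const classify = await pipeline("image-classification", "<hf-model-id>");
  return classify(imageUrl);
}

// Low level: bare ONNX Runtime Web requires explicit session and tensor work.
// Assumes a 224x224 RGB model with an input named "input" (illustrative).
async function classifyWithBareOrt(modelUrl, inputData) {
  const ort = await import("onnxruntime-web");
  const session = await ort.InferenceSession.create(modelUrl);
  const input = new ort.Tensor("float32", inputData, [1, 3, 224, 224]);
  return session.run({ input });
}
```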
## ONNX model sourcing: GitHub LFS → HuggingFace CDN
The ONNX Runtime Web adapter was completely broken — ONNX_MODEL_URL = null with the comment "GitHub LFS doesn't work via CDN." GitHub serves LFS-tracked binary files as pointer files (~130 bytes containing an OID hash) rather than the actual model weights when accessed via jsDelivr or raw URLs.
Solution: host the model on HuggingFace Hub. HF serves the actual binary files (not LFS pointers) via its /resolve/main/ CDN path with proper CORS headers. The quantized model weighs in at ~3.4 MB, and this is the same CDN infrastructure Transformers.js already uses internally. The ONNX adapter will read model URLs from the future ModelsProvider registry rather than hardcoding them.
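Building such a download URL is mechanical; a hypothetical helper (the repo id and file name in the usage comment are placeholders):

```javascript
// The /resolve/<revision>/ path on huggingface.co follows LFS pointers and
// returns the actual binary, with CORS headers suitable for in-browser fetch.
function hfResolveUrl(repoId, filePath, revision = "main") {
  return `https://huggingface.co/${repoId}/resolve/${revision}/${filePath}`;
}

// Example (placeholder repo id):
//   hfResolveUrl("some-org/some-model", "model_quantized.onnx")
```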
## ModelsProvider — centralizing model distribution
Currently each adapter hardcodes model URLs from different sources (jsDelivr, HuggingFace, Google Storage). We plan to create a single ModelsProvider registry that centralizes:
- Model metadata: name, description, input size, supported tasks
- Per-runtime format URLs: one entry per (model, runtime) pair pointing to the appropriate CDN
- Fallback logic: quantized variants, alternative CDN mirrors
Two CDN sources will cover all current needs: TF Hub (via jsDelivr for TF.js models) and HuggingFace Hub (for ONNX and Transformers.js models). MediaPipe remains the exception — EfficientNet-Lite0 is hosted on Google's dedicated storage bucket.
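The planned registry could take roughly this shape. Everything here is a sketch: model names, metadata, and URLs are placeholders, not the project's real sources.

```javascript
// Sketch of the planned ModelsProvider registry (placeholder entries).
const MODEL_REGISTRY = {
  mobilenet: {
    meta: { name: "MobileNet", inputSize: 224, tasks: ["image-classification"] },
    // One URL per (model, runtime) pair, pointing at the appropriate CDN.
    urls: {
      tfjs: "https://cdn.jsdelivr.net/...",                       // TF Hub mirror
      onnx: "https://huggingface.co/.../resolve/main/model.onnx", // HF Hub
      transformersjs: "onnx-community/...",                       // HF repo id
    },
  },
};

// Looks up the URL for a (model, runtime) pair; throws on unknown combinations
// so adapters fail loudly instead of silently loading nothing.
function resolveModelUrl(modelId, runtime) {
  const url = MODEL_REGISTRY[modelId]?.urls?.[runtime];
  if (!url) throw new Error(`No model URL registered for ${modelId}/${runtime}`);
  return url;
}
```

Fallback logic (quantized variants, mirror CDNs) would layer on top of this lookup rather than living in each adapter.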
## Browser compatibility: SharedArrayBuffer & COOP/COEP
ONNX Runtime Web's WASM backend and MediaPipe Tasks require SharedArrayBuffer, which is only exposed when the page is cross-origin isolated, i.e. served with the Cross-Origin-Opener-Policy: same-origin and Cross-Origin-Embedder-Policy: require-corp response headers.
Without these headers, SharedArrayBuffer is undefined and WASM backends cannot use multi-threading. In local development (opening index.html directly), these headers are absent. The prototype needs either a simple dev server or a runtime detection check with a clear error message. This is a deployment concern, not a code bug — it affects localhost and production hosting equally.
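The runtime detection check could be a minimal sketch like this; the function takes the global object as a parameter (rather than reading globalThis directly) so it stays testable, and the warning message is illustrative.

```javascript
// True only when the page is cross-origin isolated: SharedArrayBuffer is
// defined and crossOriginIsolated is set, i.e. COOP/COEP headers were sent.
function canUseWasmThreads(globalObj) {
  return typeof globalObj.SharedArrayBuffer === "function" &&
         globalObj.crossOriginIsolated === true;
}

// In the prototype this would gate backend selection with a clear message:
//   if (!canUseWasmThreads(globalThis)) {
//     console.warn("SharedArrayBuffer unavailable (missing COOP/COEP headers); " +
//                  "falling back to single-threaded WASM.");
//   }
```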