Technical Deep Dives

2026-05-06 — Model sourcing, backend semantics, browser compatibility


CPU (pure JS) vs WASM — genuinely different in TF.js

TensorFlow.js has two CPU-targeting backends that are not aliases:

Backend   Execution                                                         Performance
CPU       Pure JavaScript TF ops (interpreted)                              Slowest — no compilation, no threading
WASM      Compiled C++ TF ops via WebAssembly + SharedArrayBuffer threads   Significantly faster — multi-threaded C++ runtime

This is one of the most interesting benchmark dimensions: the performance delta between interpreted JS and compiled WASM with threading. ONNX Runtime Web offers no pure-JS CPU path; its cpu option is an alias for wasm.
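
For the benchmark harness, switching between the two looks roughly like this (a sketch, assuming the standard @tensorflow/tfjs and @tensorflow/tfjs-backend-wasm packages):

import * as tf from '@tensorflow/tfjs';
import '@tensorflow/tfjs-backend-wasm'; // registers the 'wasm' backend as a side effect

// Pure-JS interpreter: every op executes as plain JavaScript.
await tf.setBackend('cpu');

// Compiled C++ kernels; uses SharedArrayBuffer threads when the page
// is cross-origin isolated (see the COOP/COEP section below).
await tf.setBackend('wasm');
await tf.ready();
console.log('active backend:', tf.getBackend());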


WebNN — emerging NPU backend (ONNX only)

Discovered while reviewing the official ONNX Runtime Web docs: ORT supports webnn as an execution provider, usable for both GPU and CPU processing via the deviceType option.

Browser              Platform   WebNN Backend
Chrome 113+ / Edge   Windows    DirectML (GPU via MLDeviceType.gpu)
Chrome / Edge        macOS      CoreML / ANE (Neural Engine via MLDeviceType.cpu)
Chrome / Edge        Linux      TBD — limited support
Safari / Firefox     All        Not supported

WebNN routes inference through the OS-level ML stack — DirectML on Windows, CoreML/ANE on macOS. This is conceptually closer to an NPU backend than a traditional GPU backend, since it uses dedicated ML accelerators where available. Browser support is limited but growing. We include it in the backend matrix as ONNX-only with a navigator.ml feature check gate.
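
A sketch of that feature gate plus WebNN session creation (the deviceType value, placeholder model path, and wasm fallback chain are illustrative, not final adapter code):

import * as ort from 'onnxruntime-web';

const MODEL_URL = '/models/model_quantized.onnx'; // placeholder path

// Offer WebNN only where the API exists; otherwise stay on wasm.
const hasWebNN = typeof navigator !== 'undefined' && 'ml' in navigator;

const session = await ort.InferenceSession.create(MODEL_URL, {
  executionProviders: hasWebNN
    ? [{ name: 'webnn', deviceType: 'gpu' }, 'wasm']
    : ['wasm'],
});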


Why include Transformers.js when it wraps ONNX Runtime Web?

Transformers.js v4 uses ONNX Runtime Web as its inference engine. Benchmarking both could be seen as comparing ORT against itself. However, Transformers.js adds meaningful differences:

  • API level: Pipeline abstraction (pipeline("image-classification", model)) vs raw session+tensor management in bare ORT
  • Ecosystem: HuggingFace model hub integration, auto-download, auto-tokenization
  • Model variants: Uses onnx-community quantized models which may differ from the raw ONNX models we use with bare ORT
  • Developer experience: Represents the "ease of use" end of the spectrum — relevant to the thesis's Integration criterion

The pipeline overhead for a single image-classification forward pass is negligible, so performance should be nearly identical to bare ORT. Any gap would come from model quantization differences, not framework overhead.
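
For comparison, the entire pipeline path fits in a few lines (a sketch, assuming the @huggingface/transformers package and the Xenova/mobilenet-v2 checkpoint):

import { pipeline } from '@huggingface/transformers';

// One call covers model download, preprocessing, session setup, and decoding.
const classify = await pipeline('image-classification', 'Xenova/mobilenet-v2');
const results = await classify('cat.jpg');
// e.g. [{ label: 'tabby, tabby cat', score: 0.87 }, ...] (scores illustrative)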


ONNX model sourcing: GitHub LFS → HuggingFace CDN

The ONNX Runtime Web adapter was completely broken: ONNX_MODEL_URL was set to null, with the comment "GitHub LFS doesn't work via CDN." When accessed via jsDelivr or raw URLs, GitHub serves LFS-tracked binaries as pointer files (~130 bytes containing an OID hash) rather than the actual model weights.

Solution: Host the model on HuggingFace Hub. HF serves the actual binary files (not LFS pointers) via their /resolve/main/ CDN path with proper CORS headers:

https://huggingface.co/Xenova/mobilenet-v2/resolve/main/onnx/model_quantized.onnx

The quantized model is ~3.4 MB, served over the same CDN infrastructure Transformers.js already uses internally. The ONNX adapter will read model URLs from the future ModelsProvider registry rather than hardcoding them.
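
Until that registry lands, the adapter fix is essentially one constant plus session creation (a sketch; wasm execution provider assumed):

import * as ort from 'onnxruntime-web';

const ONNX_MODEL_URL =
  'https://huggingface.co/Xenova/mobilenet-v2/resolve/main/onnx/model_quantized.onnx';

// HF's /resolve/ path follows the LFS redirect server-side, so the
// client receives the actual ~3.4 MB binary, never a pointer file.
const session = await ort.InferenceSession.create(ONNX_MODEL_URL, {
  executionProviders: ['wasm'],
});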


ModelsProvider — centralizing model distribution

Currently each adapter hardcodes model URLs from different sources (jsDelivr, HuggingFace, Google Storage). We plan to create a single ModelsProvider registry that centralizes:

  • Model metadata: name, description, input size, supported tasks
  • Per-runtime format URLs: one entry per (model, runtime) pair pointing to the appropriate CDN
  • Fallback logic: quantized variants, alternative CDN mirrors

Two CDN sources will cover all current needs: TF Hub (via jsDelivr for TF.js models) and HuggingFace Hub (for ONNX and Transformers.js models). MediaPipe remains the exception — EfficientNet-Lite0 is hosted on Google's dedicated storage bucket.
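
A rough shape for the registry (all names and fields here are hypothetical; nothing is implemented yet):

type RuntimeId = 'tfjs' | 'onnx' | 'transformersjs' | 'mediapipe';

interface ModelEntry {
  name: string;
  description: string;
  inputSize: [number, number];                       // e.g. [224, 224]
  tasks: string[];
  urls: Partial<Record<RuntimeId, string>>;          // one URL per (model, runtime) pair
  fallbacks?: Partial<Record<RuntimeId, string[]>>;  // quantized variants, CDN mirrors
}

const MODELS: Record<string, ModelEntry> = {
  'mobilenet-v2': {
    name: 'MobileNet v2',
    description: 'Lightweight image classifier',
    inputSize: [224, 224],
    tasks: ['image-classification'],
    urls: {
      onnx: 'https://huggingface.co/Xenova/mobilenet-v2/resolve/main/onnx/model_quantized.onnx',
      // tfjs: jsDelivr-hosted TF Hub URL; mediapipe: Google storage bucket; ...
    },
  },
};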


Browser compatibility: SharedArrayBuffer & COOP/COEP

ONNX Runtime Web's WASM backend and MediaPipe Tasks rely on SharedArrayBuffer for multi-threading, which in turn requires specific HTTP response headers:

Cross-Origin-Opener-Policy: same-origin
Cross-Origin-Embedder-Policy: require-corp

Without these headers, SharedArrayBuffer is undefined and WASM backends cannot use multi-threading. In local development (opening index.html directly), the headers are absent. The prototype needs either a simple dev server that sets them or a runtime detection check with a clear error message. This is a deployment concern, not a code bug: it affects localhost and production hosting equally.
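
The detection side is small; a sketch of the guard we would run before initializing threaded WASM backends (crossOriginIsolated is the standard signal that COOP/COEP are in effect):

// crossOriginIsolated is true only when the page was served with the headers above.
if (typeof SharedArrayBuffer === 'undefined' || !globalThis.crossOriginIsolated) {
  console.warn(
    'SharedArrayBuffer unavailable: serve with COOP: same-origin and ' +
    'COEP: require-corp, or WASM backends fall back to a single thread.'
  );
}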