Technical Deep Dives

2026-05-06 — Model sourcing, backend semantics, browser compatibility


CPU (pure JS) vs WASM — genuinely different in TF.js

TensorFlow.js has two CPU-targeting backends that are not aliases:

Backend   Execution                                                         Performance
CPU       Pure JavaScript TF ops (interpreted)                              Slowest — no compilation, no threading
WASM      Compiled C++ TF ops via WebAssembly + SharedArrayBuffer threads   Significantly faster — multi-threaded C++ runtime

This is one of the most interesting benchmark dimensions: the performance delta between interpreted JS and compiled WASM with threading. ONNX Runtime Web offers no pure-JS CPU path; its cpu option is an alias for wasm.
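
For the benchmark harness, switching between the two looks roughly like this (a sketch, assuming the standard @tensorflow/tfjs and @tensorflow/tfjs-backend-wasm packages):

import * as tf from '@tensorflow/tfjs';
import '@tensorflow/tfjs-backend-wasm'; // registers the 'wasm' backend as a side effect

// Pure-JS interpreter: every op executes as plain JavaScript.
await tf.setBackend('cpu');

// Compiled C++ kernels; uses SharedArrayBuffer threads when the page
// is cross-origin isolated (see the COOP/COEP section below).
await tf.setBackend('wasm');
await tf.ready();
console.log('active backend:', tf.getBackend());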


WebNN — emerging NPU backend (ONNX only)

Discovered while reviewing the official ONNX Runtime Web docs: ORT supports webnn as an execution provider, usable for both GPU and CPU processing via the deviceType option.

Browser              Platform   WebNN Backend
Chrome 113+ / Edge   Windows    DirectML (GPU via MLDeviceType.gpu)
Chrome / Edge        macOS      CoreML / ANE (Neural Engine via MLDeviceType.cpu)
Chrome / Edge        Linux      TBD — limited support
Safari / Firefox     All        Not supported

WebNN routes inference through the OS-level ML stack — DirectML on Windows, CoreML/ANE on macOS. This is conceptually closer to an NPU backend than a traditional GPU backend, since it uses dedicated ML accelerators where available. Browser support is limited but growing. We include it in the backend matrix as ONNX-only with a navigator.ml feature check gate.
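
A sketch of that feature gate plus WebNN session creation (the deviceType value, placeholder model path, and wasm fallback chain are illustrative, not final adapter code):

import * as ort from 'onnxruntime-web';

const MODEL_URL = '/models/model_quantized.onnx'; // placeholder path

// Offer WebNN only where the API exists; otherwise stay on wasm.
const hasWebNN = typeof navigator !== 'undefined' && 'ml' in navigator;

const session = await ort.InferenceSession.create(MODEL_URL, {
  executionProviders: hasWebNN
    ? [{ name: 'webnn', deviceType: 'gpu' }, 'wasm']
    : ['wasm'],
});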


Why include Transformers.js when it wraps ONNX Runtime Web?

Transformers.js v4 uses ONNX Runtime Web as its inference engine. Benchmarking both could be seen as comparing ORT against itself. However, Transformers.js adds meaningful differences:

  • API level: Pipeline abstraction (pipeline("image-classification", model)) vs raw session+tensor management in bare ORT
  • Ecosystem: HuggingFace model hub integration, auto-download, auto-tokenization
  • Model variants: Uses onnx-community quantized models which may differ from the raw ONNX models we use with bare ORT
  • Developer experience: Represents the "ease of use" end of the spectrum — relevant to the thesis's Integration criterion

The pipeline overhead for a single image-classification forward pass is negligible, so performance should be nearly identical to bare ORT. Any gap would come from model quantization differences, not framework overhead.
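
For comparison, the entire pipeline path fits in a few lines (a sketch, assuming the @huggingface/transformers package and the Xenova/mobilenet-v2 checkpoint):

import { pipeline } from '@huggingface/transformers';

// One call covers model download, preprocessing, session setup, and decoding.
const classify = await pipeline('image-classification', 'Xenova/mobilenet-v2');
const results = await classify('cat.jpg');
// e.g. [{ label: 'tabby, tabby cat', score: 0.87 }, ...] (scores illustrative)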


ONNX model sourcing: GitHub LFS → HuggingFace CDN

The ONNX Runtime Web adapter was completely broken: ONNX_MODEL_URL was set to null, with the comment "GitHub LFS doesn't work via CDN." When accessed via jsDelivr or raw URLs, GitHub serves LFS-tracked binaries as pointer files (~130 bytes containing an OID hash) rather than the actual model weights.

Solution: Host the model on HuggingFace Hub. HF serves the actual binary files (not LFS pointers) via their /resolve/main/ CDN path with proper CORS headers:

https://huggingface.co/Xenova/mobilenet-v2/resolve/main/onnx/model_quantized.onnx

The quantized model is ~3.4 MB, served over the same CDN infrastructure Transformers.js already uses internally. The ONNX adapter will read model URLs from the future ModelsProvider registry rather than hardcoding them.
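
Until that registry lands, the adapter fix is essentially one constant plus session creation (a sketch; wasm execution provider assumed):

import * as ort from 'onnxruntime-web';

const ONNX_MODEL_URL =
  'https://huggingface.co/Xenova/mobilenet-v2/resolve/main/onnx/model_quantized.onnx';

// HF's /resolve/ path follows the LFS redirect server-side, so the
// client receives the actual ~3.4 MB binary, never a pointer file.
const session = await ort.InferenceSession.create(ONNX_MODEL_URL, {
  executionProviders: ['wasm'],
});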


ModelsProvider — centralizing model distribution

Currently each adapter hardcodes model URLs from different sources (jsDelivr, HuggingFace, Google Storage). We plan to create a single ModelsProvider registry that centralizes:

  • Model metadata: name, description, input size, supported tasks
  • Per-runtime format URLs: one entry per (model, runtime) pair pointing to the appropriate CDN
  • Fallback logic: quantized variants, alternative CDN mirrors

Two CDN sources will cover all current needs: TF Hub (via jsDelivr for TF.js models) and HuggingFace Hub (for ONNX and Transformers.js models). MediaPipe remains the exception — EfficientNet-Lite0 is hosted on Google's dedicated storage bucket.
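
A rough shape for the registry (all names and fields here are hypothetical; nothing is implemented yet):

type RuntimeId = 'tfjs' | 'onnx' | 'transformersjs' | 'mediapipe';

interface ModelEntry {
  name: string;
  description: string;
  inputSize: [number, number];                       // e.g. [224, 224]
  tasks: string[];
  urls: Partial<Record<RuntimeId, string>>;          // one URL per (model, runtime) pair
  fallbacks?: Partial<Record<RuntimeId, string[]>>;  // quantized variants, CDN mirrors
}

const MODELS: Record<string, ModelEntry> = {
  'mobilenet-v2': {
    name: 'MobileNet v2',
    description: 'Lightweight image classifier',
    inputSize: [224, 224],
    tasks: ['image-classification'],
    urls: {
      onnx: 'https://huggingface.co/Xenova/mobilenet-v2/resolve/main/onnx/model_quantized.onnx',
      // tfjs: jsDelivr-hosted TF Hub URL; mediapipe: Google storage bucket; ...
    },
  },
};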


Browser compatibility: SharedArrayBuffer & COOP/COEP

ONNX Runtime Web's WASM backend and MediaPipe Tasks rely on SharedArrayBuffer for multi-threading, which in turn requires specific HTTP response headers:

Cross-Origin-Opener-Policy: same-origin
Cross-Origin-Embedder-Policy: require-corp

Without these headers, SharedArrayBuffer is undefined and WASM backends cannot use multi-threading. In local development (opening index.html directly), the headers are absent. The prototype needs either a simple dev server that sets them or a runtime detection check with a clear error message. This is a deployment concern, not a code bug: it affects localhost and production hosting equally.
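
The detection side is small; a sketch of the guard we would run before initializing threaded WASM backends (crossOriginIsolated is the standard signal that COOP/COEP are in effect):

// crossOriginIsolated is true only when the page was served with the headers above.
if (typeof SharedArrayBuffer === 'undefined' || !globalThis.crossOriginIsolated) {
  console.warn(
    'SharedArrayBuffer unavailable: serve with COOP: same-origin and ' +
    'COEP: require-corp, or WASM backends fall back to a single thread.'
  );
}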