Experimenting with the proposed Cross-Origin Storage API in Transformers.js

Source: Hugging Face Blog

Published June 23, 2026

(This is a guest post by Developer Relations Engineer Thomas Steiner from the Chrome team at Google.)

Transformers.js provides Web developers with a simple way to use the power of transformers in their Web apps through task-specific pipelines. To run inference in the browser, developers create an instance of pipeline() and specify a task they want to use the pipeline for. As a concrete example, the following snippet shows how to set up an automatic speech recognition (ASR) pipeline.

import { pipeline } from 'https://cdn.jsdelivr.net/npm/@huggingface/transformers@4.2.0';

const asr = await pipeline(
  'automatic-speech-recognition',
  'Xenova/whisper-tiny.en',
  { device: 'webgpu' },
);
const result = await asr('jfk.wav');
console.log(result);

A minimalistic example of the automatic speech recognition pipeline.

The cache challenge

You will notice in the source code that I specified Xenova/whisper-tiny.en as the model, which is a very decent choice for common English automatic speech recognition tasks. In fact, it's even the default model according to the Transformers.js default model resolution, as per the linked excerpt.

Model resources

When you run this example in the browser, Transformers.js automatically takes care of downloading and caching the relevant model resources and Wasm files. The following screenshot shows the Chrome DevTools Cache storage section after visiting the app. When you reload the page, the resources are served from the Cache API, and the model returns results almost instantly.

The Chrome DevTools Cache storage section showing Whisper AI model resources and Wasm runtime files after visiting the app.

However, because Xenova/whisper-tiny.en is a popular model (and, as mentioned before, even the ASR default model in Transformers.js), more than one app that you visit is likely to use it. To simulate this situation, here's the same example app from before, but served from a different origin. When you visit this different-origin app, rather than being usable almost instantly, the browser has to download and cache all the model resources again, even though they're byte-for-byte the same as before. Even in this toy example, this amounts to 177 MB of duplicate download and storage, as you can examine in the Storage section of the Chrome DevTools Application panel. Across many apps and models, this quickly adds up.

The Chrome DevTools Storage overview showing 177 MB of used storage.

Wasm runtime resources

But it gets worse. Let's add a second pipeline to the toy example: sentiment analysis. Sentiment analysis by default uses the Xenova/distilbert-base-uncased-finetuned-sst-2-english model. By not specifying the model, Transformers.js' default model resolution automatically picks it for you.

const classifier = await pipeline('sentiment-analysis');
const sentiment = await classifier(result.text);
pre.append('\n\n' + JSON.stringify(sentiment, null, 2));

These are two entirely different AI models, but they depend on the same 4,733 kB ort-wasm-simd-threaded.asyncify.wasm WebAssembly (Wasm) runtime file from the underlying ONNX Runtime library that Transformers.js is built on top of. Open the extended demo on a different origin, and you will notice in the Network tab that the Wasm runtime also gets downloaded and cached again.

Chrome DevTools Network panel showing the download of the Wasm runtime resource.

So even if you run apps that don't share the same AI models, your browser still makes redundant requests for shared Wasm resources you already have, and on top of that also caches them again, which consumes space on your hard disk.

Cache isolation

AI model resources serving

By default, AI model resources come from the Hugging Face Hub, and ultimately the Hugging Face CDN. The browser makes a request for a resource like https://huggingface.co/Xenova/distilbert-base-uncased-finetuned-sst-2-english/resolve/main/config.json which then gets redirected to the final CDN URL like https://huggingface.co/api/resolve-cache/models/Xenova/distilbert-base-uncased-finetuned-sst-2-english/0b6928efcb76139cae2c6881d49cda67fe119f42/config.json?%2FXenova%2Fdistilbert-base-uncased-finetuned-sst-2-english%2Fresolve%2Fmain%2Fconfig.json=&etag=%223c36342ef1f74de2797d667c68c6b7b988d0b87c%22 in this case.

Wasm runtime resources serving

The Wasm runtime resources are served from the jsDelivr CDN by default. For example, ort-wasm-simd-threaded.asyncify.wasm comes from https://cdn.jsdelivr.net/npm/onnxruntime-web@1.26.0-dev.20260416-b7804b056c/dist/ort-wasm-simd-threaded.asyncify.wasm at the time of this writing.

Now you may say that if different apps, even though they run on different origins, ultimately serve their resources from the same CDN URLs, caching shouldn't be a problem as long as the final URLs are the same. Unfortunately, this has not been how caching works in browsers for a long time. The article Gaining security and privacy by partitioning the cache goes into all the details, but essentially, caches are isolated by origin to prevent timing attacks: the time a website takes to respond to HTTP requests can reveal that the browser has accessed the same resource in the past, which makes the browser vulnerable to security and privacy leaks.

Chrome's implementation

The concrete implementation may vary by browser, but in Chrome, cached resources are keyed using a Network Isolation Key in addition to the resource URL. The Network Isolation Key is composed of the top-level site and the current-frame site. Take the previous toy examples hosted on the origins https://googlechrome.github.io and https://rawcdn.rawgit.net. If they both use the Wasm runtime from https://cdn.jsdelivr.net/npm/onnxruntime-web@1.26.0-dev.20260416-b7804b056c/dist/ort-wasm-simd-threaded.asyncify.wasm, their cache keys will look like those in the following table.

| Top-level site (Network Isolation Key) | Current-frame site (Network Isolation Key) | Resource URL |
| --- | --- | --- |
| https://googlechrome.github.io | https://googlechrome.github.io | https://cdn.jsdelivr.net/npm/onnxruntime-web@1.26.0-dev.20260416-b7804b056c/dist/ort-wasm-simd-threaded.asyncify.wasm |
| https://rawcdn.rawgit.net | https://rawcdn.rawgit.net | https://cdn.jsdelivr.net/npm/onnxruntime-web@1.26.0-dev.20260416-b7804b056c/dist/ort-wasm-simd-threaded.asyncify.wasm |

So even if the resource URLs are exactly the same, since the Network Isolation Keys don't match, there's no cache hit, which means duplicate download and duplicate storage. This is the challenge that the Cross-Origin Storage proposal aims to tackle.
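The cache miss described above can be illustrated with a toy model. The following is a minimal sketch, not Chrome's implementation: the `PartitionedCache` class and its methods are hypothetical, and they only mimic the idea that a lookup succeeds when both the Network Isolation Key and the resource URL match.

```javascript
// Toy model of a partitioned HTTP cache (hypothetical; not Chrome's actual code).
class PartitionedCache {
  #entries = new Map();

  // The cache key combines the Network Isolation Key with the resource URL.
  static key(topLevelSite, frameSite, url) {
    return JSON.stringify([topLevelSite, frameSite, url]);
  }

  put(topLevelSite, frameSite, url, body) {
    this.#entries.set(PartitionedCache.key(topLevelSite, frameSite, url), body);
  }

  has(topLevelSite, frameSite, url) {
    return this.#entries.has(PartitionedCache.key(topLevelSite, frameSite, url));
  }
}

const WASM_URL =
  'https://cdn.jsdelivr.net/npm/onnxruntime-web@1.26.0-dev.20260416-b7804b056c/dist/ort-wasm-simd-threaded.asyncify.wasm';
const SITE_A = 'https://googlechrome.github.io';
const SITE_B = 'https://rawcdn.rawgit.net';

const cache = new PartitionedCache();
// The first app downloads and caches the Wasm runtime.
cache.put(SITE_A, SITE_A, WASM_URL, 'wasm bytes (placeholder)');

// Same URL, but the second lookup uses a different Network Isolation Key.
const hitOnSameSite = cache.has(SITE_A, SITE_A, WASM_URL);
const hitOnOtherSite = cache.has(SITE_B, SITE_B, WASM_URL);
console.log({ hitOnSameSite, hitOnOtherSite });
```

In this model the second site misses even though the URL is identical, which is the duplication that a hash-keyed store is meant to avoid.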

Enter the Cross-Origin Storage API

💡 Note: The Cross-Origin Storage API is an early-stage proposal that isn't final. While the proposed API is not yet natively implemented in any browser, you don't have to wait to experiment with it. Install the Cross-Origin Storage extension to inject the navigator.crossOriginStorage polyfill on all pages and test the complete flow.

The proposed Cross-Origin Storage (COS) API introduces a dedicated navigator.crossOriginStorage interface through which web apps can store and retrieve large files across origin boundaries, identified not by a URL, but by a cryptographic hash.

The Cross-Origin Storage API logo: a stylized walking person, as typically encountered on crosswalk signs.

That last point about cryptographic hashes is key. Because COS identifies files by their hash rather than by their URL or origin, the same ort-wasm-simd-threaded.asyncify.wasm Wasm runtime you downloaded while visiting https://googlechrome.github.io is recognized as identical to the one https://rawcdn.rawgit.net is about to request, no matter where either of the two origins fetched it from. See the following code snippet that illustrates the basic flow.

const hash = {
  algorithm: 'SHA-256',
  value: '8f434346648f6b96df89dda901c5176b10a6d83961dd3c1ac88b59b2dc327aa4',
};

try {
  // Cache hit: the file is already in COS, so read it directly.
  const handle = await navigator.crossOriginStorage.requestFileHandle(hash);
  const fileBlob = await handle.getFile();
} catch (err) {
  // Cache miss (or the browser withholds confirmation): fall back to the
  // network, then store the file in COS for the next app that needs it.
  const fileBlob = await fetch('https://cdn.jsdelivr.net/.../ort-wasm-simd-threaded.asyncify.wasm')
    .then(r => r.blob());
  const handle = await navigator.crossOriginStorage.requestFileHandle(
    hash,
    { create: true, origins: '*' },
  );
  const writableStream = await handle.createWritable();
  await writableStream.write(fileBlob);
  await writableStream.close();
}

If the resource is in COS, you get back a FileSystemFileHandle from which you can read the blob directly via getFile() (the resulting File inherits from Blob). If the resource is not in COS, you fall back to the network, and write the resource into COS for the next app that needs it, which could be your app, or another unrelated app, potentially on a completely different origin.

The API is deliberately shaped after the File System Standard's FileSystemDirectoryHandle.getFileHandle(), which you are likely familiar with from the Origin Private File System (OPFS) API. The hash parameter plays the same role as the name parameter in OPFS: uniquely identifying a resource. The options.create flag works the same way: absent or false for read-only access, true when you intend to write.

Control who can read what

Not every resource should be globally shared. COS gives developers precise control over visibility through the origins option when storing a file.

  • Setting origins: '*' makes a file globally available. Any origin can find it by hash. This is the right choice for AI model resources or the Wasm runtime in the Transformers.js example: the whole point is that every app on the Web benefits from a single cached copy.
  • Passing a specific list of origins, like origins: ['https://write.example.com', 'https://calculate.example.com'], restricts access to those sites. This works well for proprietary resources shared across a company's own properties that shouldn't be discoverable by anyone else, like a proprietary proofreading AI model used in a commercial office suite.
  • Omitting origins entirely makes the file available only to same-site origins. This is a sensible default for resources shared across all of an organization's subdomains, but not intended to cross organizational boundaries.

One important rule: visibility can be upgraded but never downgraded. If a file is already globally available, a later attempt to store it with a restricted origins list is silently ignored. This prevents a malicious actor from re-storing a public resource and narrowing its availability. The reverse is possible: a file initially stored with a restricted origins list can later be made more permissive. Any site, not just the original storer, can call requestFileHandle() for the same hash (hashes are not a secret) with create: true and a broader origins value. Provided the browser verifies that the written data matches the hash, the resource becomes available to the wider audience from that point on. Note that the upgrading site must still write the full file through the returned handle. This requirement exists to prevent sites from exploiting the upgrade path as a side-channel to detect whether a particular file was already stored in COS.
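The "upgrade, never downgrade" rule can be modeled in a few lines. This is a sketch for reasoning about the rule, not the browser's logic: `mergeVisibility` is a hypothetical helper, and taking the union of two different origin lists is an assumption, because the text above only covers the move between a restricted list and '*'.

```javascript
// Toy model of the visibility rule (hypothetical helper; not a browser API).
// Values mirror the `origins` option:
//   undefined -> same-site only, string[] -> listed origins, '*' -> any origin.
function mergeVisibility(current, requested) {
  // Global visibility always wins, so a public file can never be narrowed.
  if (current === '*' || requested === '*') return '*';

  if (Array.isArray(current) && Array.isArray(requested)) {
    // Assumption: two restricted lists combine into their union.
    return [...new Set([...current, ...requested])];
  }

  // A list is broader than the same-site default, so keep whichever list exists.
  return Array.isArray(current) ? current : requested;
}

const suite = ['https://write.example.com', 'https://calculate.example.com'];

const narrowed = mergeVisibility('*', suite); // downgrade attempt is ignored
const widened = mergeVisibility(suite, '*'); // upgrade is applied
console.log({ narrowed, widened });
```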

Integrity by design

A subtle but important property of COS is that the browser verifies the hash when you write a file. If the data you write doesn't match the declared hash, the write fails with an error. This makes integrity checking automatic: an app reading a file from COS can be confident that it's getting exactly the bytes it expected, which is the same guarantee it would have had if it had computed the hash itself after a network download.

This turns out to be doubly useful in the Transformers.js scenario. Today, after downloading model weights, most apps have no practical way to verify that the CDN served the right bytes. With COS, every file in the store is implicitly verified on write, no matter where it came from, the official Hugging Face CDN or a random site's self-hosted mirror.

Privacy without sacrificing utility

Of course a cross-origin shared cache raises the same question as the partitioned HTTP cache in reverse: if any site can probe for the presence of a file by hash, couldn't an attacker learn something about the user's browsing history by checking whether, say, a game engine Wasm module is cached?

COS addresses this through two complementary mechanisms:

  • First, the origins field: proprietary resources that shouldn't be globally probeable simply shouldn't be stored with origins: '*'. Developer education should encourage developers to choose a narrower value whenever it makes sense.
  • Second, availability gating: even for globally declared files, the browser may suppress confirmation of a file's presence if it hasn't been encountered across a sufficient number of distinct origins. A file that only appears on one or two sites could still serve as a cross-site identifier, so the browser may return an error as if the file weren't there at all, regardless of what's physically on disk. On the Chrome team, we are conscious of the privacy leaks that uncommon resources could cause, and we generally plan to mitigate them by restricting which exact resources can be cached. The concrete mitigations are still being fleshed out.

Crucially, this means an error is not a definitive answer. It might mean "not stored", or it might mean "stored, but the browser isn't telling you". Apps should always handle it the same way: fall back to the network.
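One way to follow this advice is to wrap the lookup in a helper that treats every COS failure as a cache miss. The sketch below assumes the proposed API shape shown earlier in this post; `loadShared` is a hypothetical name, and `storage` and `fetchBlob` are injected parameters that stand in for navigator.crossOriginStorage and a network download.

```javascript
// Hypothetical helper: any COS error is handled exactly like "not stored".
// `storage` stands in for navigator.crossOriginStorage (may be undefined when
// the API is unavailable); `fetchBlob` downloads the resource from the network.
async function loadShared(hash, url, { storage, fetchBlob }) {
  if (storage) {
    try {
      const handle = await storage.requestFileHandle(hash);
      return { blob: await handle.getFile(), source: 'cos' };
    } catch {
      // "Not stored" and "stored, but withheld" look identical: fall through.
    }
  }

  const blob = await fetchBlob(url);

  if (storage) {
    try {
      const handle = await storage.requestFileHandle(hash, { create: true, origins: '*' });
      const writable = await handle.createWritable();
      await writable.write(blob);
      await writable.close();
    } catch {
      // Storing is best effort; the app already has the bytes it needs.
    }
  }

  return { blob, source: 'network' };
}
```

In a page, this could be called as `loadShared(hash, url, { storage: navigator.crossOriginStorage, fetchBlob: (u) => fetch(u).then((r) => r.blob()) })`. Passing the dependencies in explicitly also keeps the helper usable where the API is absent.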

What this means for the Transformers.js example

Going back to the toy examples from before: the ort-wasm-simd-threaded.asyncify.wasm runtime weighs in at 4,733 kB and is shared by every Transformers.js-powered app regardless of which AI model it uses. With COS, the first app to load it downloads it once and stores it under its SHA-256 hash with origins: '*'. Every subsequent app, whether on https://googlechrome.github.io, on https://rawcdn.rawgit.net, or any other origin, finds it in COS immediately. The 177 MB of duplicate Whisper model weights? Same story: Xenova/whisper-tiny.en gets downloaded once, recognized by hash the second time around, and served from COS in milliseconds. And of course, the same also happens for Xenova/distilbert-base-uncased-finetuned-sst-2-english.

Transformers.js itself is already piloting the COS API at the library level. Pull request #1549 introduced an experimental COS cache backend behind an opt-in flag. Enabling it takes a single line before you set up your pipeline:

import { env, pipeline } from 'https://cdn.jsdelivr.net/npm/@huggingface/transformers@4.2.0';

// Opt in to the experimental Cross-Origin Storage cache backend.
env.experimental_useCrossOriginStorage = true;

const asr = await pipeline('automatic-speech-recognition', 'Xenova/whisper-tiny.en', { device: 'webgpu' });
const result = await asr('jfk.wav');
console.log(result);

With that flag set, Transformers.js resolves the SHA-256 hash for each Xet-tracked model file (the large ONNX weight files) by fetching the raw Xet pointer (example raw pointer file) and extracting its oid sha256: field. It then uses that hash as the key for navigator.crossOriginStorage. If the model is already in COS (because another site stored it there first), it's served instantly without a network round-trip. If not, it falls back to a regular download and stores the result in COS for the next caller. With the toy example, the advantage in practice is that Xenova/whisper-tiny.en and Xenova/distilbert-base-uncased-finetuned-sst-2-english (and of course ort-wasm-simd-threaded.asyncify.wasm) only ever need to cross the ether once, regardless of how many different origins ask for them.
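The hash extraction step can be sketched as a small parser. This is not the Transformers.js implementation: `extractSha256` is a hypothetical helper, the pointer is assumed to use a Git LFS-style layout with one "key value" field per line, and the sample pointer (including its digest and size) is a placeholder rather than real model data.

```javascript
// Extract the SHA-256 digest from a pointer file's `oid sha256:` line.
// Assumes a Git LFS-style layout with one field per line.
function extractSha256(pointerText) {
  const match = /^oid sha256:([0-9a-f]{64})\s*$/m.exec(pointerText);
  // Return the same { algorithm, value } shape used for COS lookups above.
  return match ? { algorithm: 'SHA-256', value: match[1] } : null;
}

// Placeholder pointer content for illustration only.
const placeholderDigest = '0123456789abcdef'.repeat(4);
const examplePointer = [
  'version https://git-lfs.github.com/spec/v1',
  `oid sha256:${placeholderDigest}`,
  'size 123',
].join('\n');

const parsedHash = extractSha256(examplePointer);
console.log(parsedHash);
```

Returning null for anything that doesn't contain a well-formed digest lets a caller skip COS and download the file as usual.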

Note the experimental_ prefix on the flag. It's intentional and signals that the underlying browser API has not yet been standardized and may change without a major version bump.

Try it today

The COS API is not yet natively implemented in any browser, but you don't have to wait to experiment with it. Install the Cross-Origin Storage ex