[HN Gopher] Transformers.js - Run Transformers directly in the ...
___________________________________________________________________
Transformers.js - Run Transformers directly in the browser
Author : victormustar
Score : 213 points
Date : 2024-04-11 11:57 UTC (11 hours ago)
(HTM) web link (github.com)
(TXT) w3m dump (github.com)
| jfoster wrote:
| This is super cool, but unfortunately it also seems super
| impractical. Models tend to be quite large, so even if a browser
| can run them, getting them to the browser involves either:
|
| 1. Large downloads on every visit to a website.
|
| 2. Large downloads and high storage consumption for each website
| using large models. (150 websites x 800 MB models => 120 GB of
| storage used)
|
| Both of those options seem terrible.
|
| I think it might make sense for browsers to ship with some
| models built in and expose them via standardized web APIs in
| the future, but I haven't heard of any efforts to make that
| happen yet.
| jsheard wrote:
| Basically the same problem that's plagued games on the web ever
| since the first Unreal/Unity asmjs demos a decade ago, and
| pretty much no progress has been made towards a solution in
| that time. You just can't practically make a web app which
| needs gigs of data on the client because there's no reliable
| way to make sure it stays cached for as long as the user wants
| it to, and as you say, even if you could reliably cache it,
| the download and storage would still be duplicated per site
| using the same model due to browsers' cache-partitioning
| policies.
| jampekka wrote:
| There are ways to do it. E.g.
| https://developer.mozilla.org/en-
| US/docs/Web/API/File_System...
| fauigerzigerk wrote:
| The File System Access API seems promising.
| jsheard wrote:
| I'm not sure more APIs are the solution. LocalStorage could
| already theoretically fill the role of a persistent large
| data store if browsers didn't cap the storage at 5-10 MB for
| UX reasons. Removing that cap would require user-facing
| changes that let users manage the storage used by sites and
| clean it up manually when it inevitably gets bloated. Any
| new API which lets sites save stuff on the client is going
| to have the same issue.
| fauigerzigerk wrote:
| _> Any new API which lets sites save stuff on the client
| is going to have the same issue._
|
| I don't think it would have the same issues, because the
| files could be stored in a user specified location
| outside the browser's own storage area.
|
| Browser vendors can't just delete stuff that may be used
| by other software on a user's system. And they cannot put
| a cap on it either, because users can store whatever they
| like in those directories, bypassing the browser
| entirely.
|
| But I have never used this API, so maybe I misunderstand
| how it's supposed to work.
| jsheard wrote:
| If that's how it works then it would avoid the problem I
| mentioned, but the UX around using that to cache data
| internal to the site implementation sounds pretty
| terrible. You click on "Listen to this article" on a
| webpage and it opens a file chooser expecting you to open
| t2s-hq-fast-en+pl.model if you already have it? Users
| won't be able to make any sense of that.
| fauigerzigerk wrote:
| The API (or at least the Chrome implementation) appears
| to be unfinished, but the plan seems to be to eventually
| support persistent directory permissions.
|
| So the web app could ask the user to pick a directory for
| model storage and henceforth store and load models from
| there without further interaction.
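|
| Roughly what I have in mind (untested sketch; persisting the
| directory handle in IndexedDB and re-checking permissions on
| later visits is left out):
|
|   // Ask the user once for a "models" directory.
|   const dir = await window.showDirectoryPicker();
|
|   async function loadOrDownloadModel(name, url) {
|     try {
|       // Already on disk? Read it with no network I/O.
|       const file = await dir.getFileHandle(name);
|       return await (await file.getFile()).arrayBuffer();
|     } catch {
|       // First use: download once, write into the directory.
|       const bytes = await (await fetch(url)).arrayBuffer();
|       const file = await dir.getFileHandle(name, { create: true });
|       const writable = await file.createWritable();
|       await writable.write(bytes);
|       await writable.close();
|       return bytes;
|     }
|   }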
| binarymax wrote:
| Been beating this drum for years. Wrote this 9 years ago!
| https://max.io/articles/the-state-of-state-in-the-browser/
| pjmlp wrote:
| Actually it goes back further, to the first Java applets,
| Flash, and PNaCl, before asm.js came to be; Unreal initially
| targeted Flash on the Web.
|
| The Unreal 3 demo running on Flash is still on YouTube.
|
| And this is why most game studios are playing wait-and-see
| and betting on streaming instead: proper native 3D APIs,
| easier-to-debug tooling (the Web still has nothing better
| than SpectorJS), and big assets.
| wesbos wrote:
| Some of the models are quite small and worth running
| on-device rather than sending all your data to a server to
| process. The other huge benefit here is that Transformers.js
| runs in Node.js, and getting things going is way easier than
| trying to get some odd combination of Python and its
| dependencies to work.
| CapsAdmin wrote:
| If they are single files or directories they could be
| drag-and-dropped on use. Not very convenient though.
|
| Maybe some sort of API that gives the website fine-grained
| access to the filesystem would be enough. You'd specify a
| directory or single file the website can read from at any
| time.
|
| However, at some point you will have to download large
| files, and when that happens implicitly it's a bad user
| experience.
|
| On top of that, the developer should implement a robust
| download system that can resume downloads, check validity,
| etc. Developers rarely bother with this, so the user
| experience is that it sucks.
| jampekka wrote:
| There is such an API.
|
| https://developer.mozilla.org/en-
| US/docs/Web/API/File_System...
| elpocko wrote:
| Still requires drag&drop on most browsers because the
| File/DirectoryPicker API isn't universally supported.
| jampekka wrote:
| The origin private file system is supported in all modern
| browsers. That does make sharing models between origins
| difficult at best, but it works fine within a single origin.
|
| And in any case it's easier to direct users to install
| Chrome (or preferably Chromium), or to instruct them to
| drag-and-drop, than to do the brittle, error-prone and
| bitrot-prone virtualenv-pip-docker-git song and dance.
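|
| For the single-origin case it's only a few lines (sketch;
| modelUrl is just a placeholder):
|
|   // Origin private file system: no picker, no permission
|   // prompt, but the storage is private to this origin.
|   const root = await navigator.storage.getDirectory();
|   const handle = await root.getFileHandle('model.onnx',
|     { create: true });
|
|   // Write once after downloading the weights.
|   const writable = await handle.createWritable();
|   const resp = await fetch(modelUrl);
|   await writable.write(await resp.arrayBuffer());
|   await writable.close();
|
|   // Read back on later visits, no re-download needed.
|   const cached = await (await handle.getFile()).arrayBuffer();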
| refulgentis wrote:
| The sizes here are pure free-association: these models are
| below 80 MB; the rest are LLMs and aren't in scope. Whisper
| is 40 MB, embeddings are 23 MB. (n.b. the parts of the
| original comment that actively disclaim understanding:
| "_seems_ super impractical. Models _tend_ to be quite
| large...150 websites x _800 MB_ models")
| jampekka wrote:
| Browsers can store downloaded data, e.g. using the File
| System API, and these files can be accessed from multiple
| websites. Browser applications can also run offline with
| service workers.
|
| JS/browser-based solutions very often seem to get knee-jerk
| dismissed based on a decade-old understanding of browser
| capabilities.
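|
| E.g. with the Cache API the pattern is just (sketch):
|
|   // Download the weights once, serve from the cache after.
|   async function fetchModel(url) {
|     const cache = await caches.open('model-cache');
|     let response = await cache.match(url);
|     if (!response) {
|       response = await fetch(url);
|       await cache.put(url, response.clone());
|     }
|     return response.arrayBuffer();
|   }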
| fauigerzigerk wrote:
| It's an inherent problem with on-device AI processing, not just
| in the browser. I think this will only get better when
| operating systems start to preinstall models and provide an API
| that browser vendors can use as well.
|
| Even then I think cloud hosted models will probably always be
| far better for most tasks.
| jfoster wrote:
| > I think cloud hosted models will probably always be far
| better for most tasks
|
| It might depend on just how good you need it to be. There are
| lots of use-cases where an LLM like GPT 3.5 might be "good
| enough" such that a better model won't be so noticeable.
|
| Cloud models will likely have the advantage of being more
| cutting-edge, but running "good enough" models locally will
| probably be more economical.
| fauigerzigerk wrote:
| I agree. The economic advantages of a hybrid approach could
| be very significant.
| Vetch wrote:
| This specific problem certainly isn't shared by all
| on-device AI processing. As someone else mentioned, there
| are unique UX and browser constraints that come from serving
| large, compute-intensive binary blobs through the browser
| (constraints that games share almost identically).
|
| Separately, having to rely on preinstallation very likely
| means stagnating on overly sanitized, poorly done official
| instruction-tunes. With the exception of Mixtral 8x7B, the
| trend has been that the community over time arrives at
| finetunes which far eclipse the official ones.
| echelon wrote:
| Apple's future is predicated on local machine learning instead
| of cloud machine learning. They're betting big on it, and you
| can see the chess pieces being moved into place. They
| desperately do not want to become a thin client for magical
| cloud AI.
|
| I'd expect to see Apple do some stuff here.
| breck wrote:
| This is why I was hoping the startup MightyApp would succeed.
| Then it would be practical to build web apps that operated on
| GB/TB of data in a single tab. Most of the time people would
| use their normal browser, but for big data jobs you would use
| your Mighty browser with persistence and unlimited RAM streamed
| from the cloud. A path to get the best of web apps with the
| power of native apps. Glad they gave it a shot. Definitely was
| an idea worth trying.
| xenova wrote:
| We've put out a ton of demos that use much smaller models
| (10-60 MB), including:
|
| - (44MB) In-browser background removal:
| https://huggingface.co/spaces/Xenova/remove-background-web. (We
| also put out a WebGPU version:
| https://huggingface.co/spaces/Xenova/remove-background-
| webgp...).
|
| - (51MB) Whisper Web for automatic speech recognition:
| https://huggingface.co/spaces/Xenova/whisper-web (just select
| the quantized version in settings).
|
| - (28MB) Depth Anything Web for monocular depth estimation:
| https://huggingface.co/spaces/Xenova/depth-anything-web
|
| - (14MB) Segment Anything Web for image segmentation:
| https://huggingface.co/spaces/Xenova/segment-anything-web
|
| - (20MB) Doodle Dash, an ML-powered sketch detection game:
| https://huggingface.co/spaces/Xenova/doodle-dash
|
| ... and many many more! Check out the Transformers.js demos
| collection for some others:
| https://huggingface.co/collections/Xenova/transformersjs-
| dem....
|
| Models are cached on a per-domain basis (using the Web Cache
| API), meaning you don't need to re-download the model on every
| page load. If you would like to persist the model across
| domains, you can create browser extensions with the library! :)
|
| As for your last point, there are efforts underway, but nothing
| I can speak about yet!
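|
| For anyone who hasn't tried the library, a minimal example
| looks roughly like this (model name just an example, similar
| to the one behind the Whisper Web demo; the first call
| downloads and caches the weights, later page loads reuse
| them):
|
|   import { pipeline } from '@xenova/transformers';
|
|   // Small quantized speech-recognition model.
|   const transcriber = await pipeline(
|     'automatic-speech-recognition',
|     'Xenova/whisper-tiny.en'
|   );
|
|   // Accepts a URL to an audio file.
|   const { text } = await transcriber('/audio/sample.wav');
|   console.log(text);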
| jfoster wrote:
| Thank you for the reply. Seems like all of the links are down
| at the moment, but it does sound a bit more feasible for some
| applications than I had assumed.
|
| Really glad to hear the last part. Some of the new
| capabilities seem fundamental enough that they ought to be in
| browsers, in my opinion.
| xenova wrote:
| Odd, the links seem to work for me. What error do you see?
| Can you try on a different network (e.g., mobile)?
| jfoster wrote:
| Error is "xenova-segment-anything-web.static.hf.space
| unexpectedly closed the connection."
|
| Works on mobile network, though, so might just be my
| internet connection.
| jph00 wrote:
| Why is only one of them on WebGPU? Is it because there are
| additional tricky steps required to make a model work on
| WebGPU, or is there a limitation on what ops are supported
| there?
|
| I'm keen to do more stuff with WebGPU, so very interested to
| learn about challenges and limitations here.
| xenova wrote:
| We have some other WebGPU demos, including:
|
| - WebGPU embedding benchmark:
| https://huggingface.co/spaces/Xenova/webgpu-embedding-
| benchm...
|
| - Real-time object detection:
| https://huggingface.co/spaces/Xenova/webgpu-video-object-
| det...
|
| - Real-time background removal:
| https://huggingface.co/spaces/Xenova/webgpu-video-
| background...
|
| - WebGPU depth estimation:
| https://huggingface.co/spaces/Xenova/webgpu-depth-anything
|
| - Image background removal:
| https://huggingface.co/spaces/Xenova/remove-background-
| webgp...
|
| You can follow the progress for full WebGPU support in the
| v3 development branch
| (https://github.com/xenova/transformers.js/pull/545).
|
| To answer your question, while there are certain ops
| missing, the main limitation at the moment is for models
| with decoders... which are not very fast (yet) due to
| inefficient buffer reuse and many redundant copies between
| CPU and GPU. We're working closely with the ORT team to fix
| these issues though!
| ranulo wrote:
| Might make more sense for web apps and Electron applications.
| yeldarb wrote:
| Not using transformers, but we do object detection in the
| browser with small quantized YOLO models that are about 7 MB
| and run at 30+ fps on modern laptops via tensorflow.js and
| onnxruntime-web.
|
| Lots of cool demos and real-world applications you can build
| with it. E.g. we powered an AR card ID feature for Magic:
| The Gathering, built a scavenger hunt for SXSW, a test
| proctoring assistant (to warn you if you're likely to get
| DQ'd for e.g. wearing headphones), and a pill counter for
| pharmacists. It's really powerful for distribution not to
| make users install an app or need anything other than their
| smartphone.
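|
| The onnxruntime-web side of it is roughly this (sketch:
| pre/post-processing omitted, and the tensor shape and
| input/output names depend on how the model was exported):
|
|   import * as ort from 'onnxruntime-web';
|
|   // Load the small quantized detector once...
|   const session = await ort.InferenceSession.create(
|     '/models/detector.onnx', { executionProviders: ['wasm'] });
|
|   // ...then run it per frame. `pixels` would be the frame
|   // resized/normalized into a Float32Array.
|   const input = new ort.Tensor('float32', pixels, [1, 3, 416, 416]);
|   const outputs = await session.run({ images: input });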
| bilekas wrote:
| This is probably a really stupid question, but can the
| models be streamed as they're being run, so that the browser
| wouldn't need to wait for the entire download first? Or is
| there even a concept of ordering a model so it can run on a
| partial download?
|
| As I ask it, it seems wrong to me, but just to confirm?
| janalsncm wrote:
| Usually the inference time is small compared with download
| time so even if this were technically feasible you wouldn't
| save much time.
|
| For reference, I have a 31 MB vision transformer I run in my
| browser. Building the inputs, running inference, and parsing
| the response takes less than half a second.
| bilekas wrote:
| > Usually the inference time is small compared with
| download time so even if this were technically feasible you
| wouldn't save much time.
|
| I can understand that, but where time is not a factor and
| it's solely a question of data, can a model be streamed?
| baggachipz wrote:
| Ok so now we can make a browser plugin which will pick out all
| bicycles or bridges in a Google captcha, right?
| 8ig8 wrote:
| The Syntax podcast recently did an episode on Transformers.js and
| the developer...
|
| https://syntax.fm/show/740/local-ai-models-in-javascript-mac...
| simonw wrote:
| This library is so cool. It makes spinning up a quick demo
| incredibly easy - I've used it in Observable notebooks a few
| times:
|
| - CLIP in a browser: https://observablehq.com/@simonw/openai-
| clip-in-a-browser
|
| - Image object detection with detr-resnet-50:
| https://observablehq.com/@simonw/detect-objects-in-images
|
| The size of the models feels limiting at first, but for quite a
| few applications telling a user with a good laptop and connection
| that they have to wait 30s for it to load isn't unthinkable.
|
| The latest release adds binary embedding quantization support
| which I'm really looking forward to trying out:
| https://github.com/xenova/transformers.js/releases/tag/2.17....
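|
| The quantization idea itself is simple to sketch in plain JS
| (just the concept, not the library's API): each dimension
| becomes one bit, and candidates are compared by Hamming
| distance, usually followed by a re-rank with the
| full-precision vectors.
|
|   // 1 bit per dimension: set the bit if the value is positive.
|   function toBinary(embedding) {
|     const bits = new Uint8Array(Math.ceil(embedding.length / 8));
|     embedding.forEach((v, i) => {
|       if (v > 0) bits[i >> 3] |= 1 << (i & 7);
|     });
|     return bits;
|   }
|
|   // Hamming distance between two packed binary embeddings.
|   function hamming(a, b) {
|     let d = 0;
|     for (let i = 0; i < a.length; i++) {
|       let x = a[i] ^ b[i];
|       while (x) { d += x & 1; x >>= 1; }
|     }
|     return d;
|   }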
| sroussey wrote:
| Binary embeddings will require an additional re-ranking
| step, but will be fun to test.
|
| I've made an npm package of transformers.js v3, which I
| should update (not sure if I include this yet).
|
| Mostly, I've had to maintain a fork so it runs on Bun. V3,
| when released, will support Bun just fine, although WebGPU
| won't work there; that's optional, though.
|
| [edit: DM me if you want to use it, I don't want to promote
| a fork]
| grey8 wrote:
| transformers.js is such a cool library.
|
| I made a small web app that uses it to remove backgrounds
| from images (with BRIA AI's RMBG-1.4 model) at
| https://aether.nco.dev
|
| The fact that you don't need to send your data to an API and
| that this runs even on smartphones is really cool. I foresee
| lots of projects using this in the future, be it small
| vision, language or other utility models (depth estimation,
| background removal, etc.). Looks like a bright future for
| the web!
|
| I'm already working on my next project, and it'll definitely use
| transformers.js again!
| sroussey wrote:
| I'm using it for a simple project:
| https://github.com/sroussey/ellmers
|
| My plan is to test embedding and retrieval strategies for
| different RAG setups, to be used by a server or Electron
| app.
| Solvency wrote:
| Can someone explain what this means I can do with it if I
| know vanilla JavaScript?
|
| For example, I use the image upscaling playground on Hugging
| Face all the time, but I do it manually here:
| https://huggingface.co/spaces/bookbot/Image-Upscaling-Playgr...
|
| Would transformers.js allow me to somehow execute that in my
| own local or online app programmatically?
| karaterobot wrote:
| I think running locally is exactly what it's meant to do. I see
| an example using image upscaling (https://huggingface.co/docs/t
| ransformers.js/main/en/api/pipe...) but I have not used it.
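|
| If I'm reading the docs right, it would look something like
| this (model name taken from the docs example, so double-check
| it before relying on it):
|
|   import { pipeline } from '@xenova/transformers';
|
|   // 2x super-resolution, runs locally in the browser.
|   const upscaler = await pipeline('image-to-image',
|     'Xenova/swin2SR-classical-sr-x2-64');
|
|   // Accepts an image URL; returns the upscaled image.
|   const upscaled = await upscaler('photo-to-upscale.jpg');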
| macrolime wrote:
| Does it support Apple Silicon (accelerated)?
| ctrlaltdylan wrote:
| Also, this opens the possibility of running these models in
| Node.js serverless functions, no?
|
| That certainly has to open up possibilities for on-demand
| predictions.
| dajas wrote:
| I'm using this library to generate embeddings with gte-small
| (~70 MB) and using Upstash Vector for storage.
|
| It's only 384 dimensions, but it works surprisingly well
| with a paragraph of text! It also ranks better than
| text-embedding-ada-002 on the MTEB leaderboard:
|
| https://huggingface.co/spaces/mteb/leaderboard
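|
| For reference, getting the vector out is just (minimal
| sketch):
|
|   import { pipeline } from '@xenova/transformers';
|
|   const extractor = await pipeline('feature-extraction',
|     'Xenova/gte-small');
|
|   // Mean-pool and normalize to get one 384-dim vector.
|   const output = await extractor('A paragraph of text to embed.',
|     { pooling: 'mean', normalize: true });
|
|   console.log(output.data.length); // 384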
| Tenoke wrote:
| Is training not possible? I did some stuff years ago where I
| created and trained small NNs in the browser, and I'm
| curious if that type of thing would work better today with a
| small custom transformer.
| spmurrayzzz wrote:
| In theory it's definitely possible, but I suspect
| performance concerns are the reason it's not implemented
| (yet). They have a WebGPU embeddings benchmark in an HF
| space to give you a sense of the forward pass dynamics:
| https://huggingface.co/spaces/Xenova/webgpu-embedding-benchm...
|
| It's impressive for what it is, but training would be
| painful at those latencies (fp16, batch 32, sequence length
| 512 gives a ~500 ms forward pass with a 22M param model).
| dheera wrote:
| There might be applications for much smaller transformers in
| UI design.
|
| Like for example
|
| - did the user tap the wrong location on the screen because
| their device was physically jolted, and can you correct for
| that, considering you have access to the accelerometer in
| HTML5
|
| - does the user keep repeating an action (checking every box
| in a list of e-mails) and can you extrapolate the rest of
| what the user wants to do
|
| - did the user bounce because you popped up a stupid intercom
| box or newsletter popup, and did you learn anything about
| what you need to do if you want to retain this particular
| user in the future
|
| these kinds of things could be done with hundreds or
| thousands of parameters or fewer
| spmurrayzzz wrote:
| Yea definitely. But in that case, you could train _much_
| faster in PyTorch, then convert to ONNX and load it in the
| browser for inference (as the transformers.js docs
| recommend).
|
| EDIT: (I responded before your full edit with the bullet
| list). This next comment is orthogonal to the slow training
| performance topic I think, but the use cases you reference
| there don't seem to be well-suited at all for an
| autoregressive decoder-only model architecture.
___________________________________________________________________
(page generated 2024-04-11 23:01 UTC)