[HN Gopher] Transformers.js -  Run Transformers directly in the ...
       ___________________________________________________________________
        
       Transformers.js - Run Transformers directly in the browser
        
       Author : victormustar
       Score  : 213 points
       Date   : 2024-04-11 11:57 UTC (11 hours ago)
        
 (HTM) web link (github.com)
 (TXT) w3m dump (github.com)
        
       | jfoster wrote:
       | This is super cool, but unfortunately it also seems super
       | impractical. Models tend to be quite large, so even if a browser
       | can run them, getting them to the browser involves either:
       | 
       | 1. Large downloads on every visit to a website.
       | 
       | 2. Large downloads and high storage consumption for each website
       | using large models. (150 websites x 800 MB models => 120 GB of
       | storage used)
       | 
       | Both of those options seem terrible.
       | 
        | I think it might make sense for browsers to ship with some
        | models built in and expose them via standardized web APIs in
        | the future, but I haven't heard of any efforts to make that
        | happen yet.
        
         | jsheard wrote:
         | Basically the same problem that's plagued games on the web ever
         | since the first Unreal/Unity asmjs demos a decade ago, and
         | pretty much no progress has been made towards a solution in
         | that time. You just can't practically make a web app which
         | needs gigs of data on the client because there's no reliable
         | way to make sure it stays cached for as long as the user wants
         | it to, and as you say, even if you could reliably cache it the
         | download and storage would still be duplicated per site using
          | the same model due to browsers' cache partitioning policies.
        
           | jampekka wrote:
           | There are ways to do it. E.g.
            | https://developer.mozilla.org/en-US/docs/Web/API/File_System...
        
           | fauigerzigerk wrote:
           | The File System Access API seems promising.
        
             | jsheard wrote:
             | I'm not sure more APIs are the solution, LocalStorage could
             | already theoretically fill the role of a persistent large
             | data store if browsers didn't cap the storage at 5-10MB for
              | UX reasons. Removing that cap would require user-facing
              | changes to let users manage the storage used by sites
              | and clean it up manually when it inevitably gets
             | bloated. Any new API which lets sites save stuff on the
             | client is going to have the same issue.
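              | 
              | (For what it's worth, the StorageManager API already
              | exposes quota inspection and a persistence hint; a
              | minimal sketch:)
              | 
              |   // How much origin storage is used vs. allowed
              |   const est = await navigator.storage.estimate();
              |   console.log(est.usage, "of", est.quota, "bytes");
              | 
              |   // Ask the browser not to evict this origin's data
              |   // under storage pressure (it may prompt, or refuse)
              |   const ok = await navigator.storage.persist();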
        
               | fauigerzigerk wrote:
               | _> Any new API which lets sites save stuff on the client
               | is going to have the same issue._
               | 
               | I don't think it would have the same issues, because the
               | files could be stored in a user specified location
               | outside the browser's own storage area.
               | 
               | Browser vendors can't just delete stuff that may be used
               | by other software on a user's system. And they cannot put
               | a cap on it either, because users can store whatever they
               | like in those directories, bypassing the browser
               | entirely.
               | 
               | But I have never used this API, so maybe I misunderstand
               | how it's supposed to work.
        
               | jsheard wrote:
               | If that's how it works then it would avoid the problem I
               | mentioned, but the UX around using that to cache data
               | internal to the site implementation sounds pretty
               | terrible. You click on "Listen to this article" on a
               | webpage and it opens a file chooser expecting you to open
               | t2s-hq-fast-en+pl.model if you already have it? Users
               | won't be able to make any sense of that.
        
               | fauigerzigerk wrote:
               | The API (or at least the Chrome implementation) appears
               | to be unfinished, but the plan seems to be to eventually
               | support persistent directory permissions.
               | 
               | So the web app could ask the user to pick a directory for
               | model storage and henceforth store and load models from
               | there without further interaction.
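                | 
                | (Roughly like this with the Chrome-only API today;
                | queryPermission/requestPermission exist, though
                | persistent grants across sessions are the unfinished
                | part:)
                | 
                |   // Must be called from a user gesture
                |   const dir = await window.showDirectoryPicker();
                | 
                |   // Handles can be stashed in IndexedDB and
                |   // re-checked on a later visit
                |   const opts = { mode: "read" };
                |   if ((await dir.queryPermission(opts))
                |       !== "granted") {
                |     await dir.requestPermission(opts);
                |   }
                | 
                |   // Read a model file (example name) from it
                |   const fh = await dir.getFileHandle("model.onnx");
                |   const bytes =
                |     await (await fh.getFile()).arrayBuffer();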
        
           | binarymax wrote:
           | Been beating this drum for years. Wrote this 9 years ago!
           | https://max.io/articles/the-state-of-state-in-the-browser/
        
           | pjmlp wrote:
            | Actually it goes back further, to the first Java applets
            | and Flash (Unreal initially targeted Flash on the Web,
            | and there was PNaCl too), before asm.js came to be.
            | 
            | The Unreal 3 demo using Flash is still on YouTube.
            | 
            | And this is why most game studios are playing wait-and-
            | see, betting on streaming instead: it gives them proper
            | native 3D APIs, easier-to-debug tooling (the Web still
            | has nothing better than SpectorJS), and big asset sizes.
        
         | wesbos wrote:
          | Some of the models are quite small and worth running on-
          | device rather than sending all the data to a server to
          | process. The other huge benefit here is that Transformers.js
          | runs in Node.js, and getting things running is way easier
          | than trying to get some odd combination of Python and its
          | dependencies to work.
        
         | CapsAdmin wrote:
          | If they are single files or directories they could be drag-
          | and-dropped on use. Not very convenient though.
          | 
          | Maybe some sort of API to give the website fine-grained
          | access to the filesystem might be enough. You'd specify a
          | directory or single file the website can read from at any
          | time.
          | 
          | However, at some point you will have to download large
          | files. I feel that when done implicitly it's a bad user
          | experience.
          | 
          | On top of that, the developer should implement a robust
          | downloading system that can resume downloads, check for
          | validity, etc. Developers rarely bother with this, so the
          | user experience is that it sucks.
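          | 
          | (A resumable download isn't much code in principle; a
          | minimal sketch using HTTP Range requests, assuming the
          | server supports them and `stored` holds the bytes saved so
          | far:)
          | 
          |   async function resume(url, stored) {
          |     const res = await fetch(url, {
          |       headers: { Range: `bytes=${stored.length}-` },
          |     });
          |     if (res.status !== 206) {
          |       throw new Error("Range not supported");
          |     }
          |     const rest =
          |       new Uint8Array(await res.arrayBuffer());
          |     const out =
          |       new Uint8Array(stored.length + rest.length);
          |     out.set(stored);
          |     out.set(rest, stored.length);
          |     return out; // verify a checksum before use
          |   }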
        
           | jampekka wrote:
            | There is such an API:
            | 
            | https://developer.mozilla.org/en-US/docs/Web/API/File_System...
        
             | elpocko wrote:
             | Still requires drag&drop on most browsers because the
             | File/DirectoryPicker API isn't universally supported.
        
               | jampekka wrote:
                | The origin private file system is supported in all
                | modern browsers. That does make sharing models
                | between origins difficult at best, but it works fine
                | for a single origin.
                | 
                | And in any case, it's easier to direct users to
                | install Chrome (or preferably Chromium) or instruct
                | them to drag & drop than to do the brittle, error-
                | prone, bitrot-prone virtualenv-pip-docker-git song
                | and dance.
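                | 
                | (A minimal sketch of caching a model in the OPFS;
                | the filename and `url` are just examples, and
                | createWritable isn't available everywhere yet, e.g.
                | Safari only has it in workers:)
                | 
                |   const root =
                |     await navigator.storage.getDirectory();
                |   const fh = await root.getFileHandle(
                |     "model.onnx", { create: true });
                | 
                |   const w = await fh.createWritable();
                |   await w.write(await (await fetch(url)).blob());
                |   await w.close();
                | 
                |   // Later visits: read back, no re-download
                |   const file = await (await root
                |     .getFileHandle("model.onnx")).getFile();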
        
         | refulgentis wrote:
          | This is pure free-association: the models here are below 80
          | MB; the rest are LLMs, which aren't in scope. Whisper is 40
          | MB, embeddings are 23 MB. (n.b. the parts of the original
          | comment that actively disclaim understanding: " _seems_
          | super impractical. Models _tend_ to be quite
          | large...150 websites x _800 MB_ models ")
        
         | jampekka wrote:
          | Browsers can store downloaded files, e.g. using the File
          | System API, and these files can be accessed from multiple
          | websites. Browser applications can run offline with service
          | workers.
          | 
          | JS/browser-based solutions seem to be very often knee-jerk
          | dismissed based on a decade-old understanding of browser
          | capabilities.
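          | 
          | (A minimal service-worker sketch of the cache-first
          | pattern for model files:)
          | 
          |   // sw.js: serve models cache-first so repeat
          |   // visits work offline
          |   self.addEventListener("fetch", (event) => {
          |     const url = new URL(event.request.url);
          |     if (!url.pathname.endsWith(".onnx")) return;
          |     event.respondWith(
          |       caches.open("models").then(async (cache) => {
          |         const hit = await cache.match(event.request);
          |         if (hit) return hit;
          |         const res = await fetch(event.request);
          |         await cache.put(event.request, res.clone());
          |         return res;
          |       })
          |     );
          |   });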
        
         | fauigerzigerk wrote:
         | It's an inherent problem with on-device AI processing, not just
         | in the browser. I think this will only get better when
         | operating systems start to preinstall models and provide an API
         | that browser vendors can use as well.
         | 
         | Even then I think cloud hosted models will probably always be
         | far better for most tasks.
        
           | jfoster wrote:
           | > I think cloud hosted models will probably always be far
           | better for most tasks
           | 
           | It might depend on just how good you need it to be. There are
           | lots of use-cases where an LLM like GPT 3.5 might be "good
           | enough" such that a better model won't be so noticeable.
           | 
           | Cloud models will likely have the advantage of being more
           | cutting-edge, but running "good enough" models locally will
           | probably be more economical.
        
             | fauigerzigerk wrote:
             | I agree. The economic advantages of a hybrid approach could
             | be very significant.
        
           | Vetch wrote:
            | This specific problem is certainly not one for all on-
            | device AI processing. As someone else mentioned, there
            | are unique UX and browser constraints that come from
            | serving large compute-intensive binary blobs through the
            | browser (constraints shared almost identically by games).
            | 
            | Separately, having to rely on preinstallation very likely
            | means stagnating on overly sanitized, poorly done
            | official instruction-tunes. With the exception of Mixtral
            | 8x7B, the trend has been that the community over time
            | arrives at finetunes which far eclipse official ones.
        
         | echelon wrote:
         | Apple's future is predicated on local machine learning instead
         | of cloud machine learning. They're betting big on it, and you
         | can see the chess pieces being moved into place. They
         | desperately do not want to become a thin client for magical
         | cloud AI.
         | 
         | I'd look to see Apple doing some stuff here.
        
         | breck wrote:
         | This is why I was hoping the startup MightyApp would succeed.
         | Then it would be practical to build web apps that operated on
         | GB/TB of data in a single tab. Most of the time people would
         | use their normal browser, but for big data jobs you would use
         | your Mighty browser with persistence and unlimited RAM streamed
         | from the cloud. A path to get the best of web apps with the
         | power of native apps. Glad they gave it a shot. Definitely was
         | an idea worth trying.
        
         | xenova wrote:
         | We've put out a ton of demos that use much smaller models
         | (10-60 MB), including:
         | 
          | - (44MB) In-browser background removal:
          | https://huggingface.co/spaces/Xenova/remove-background-web.
          | (We also put out a WebGPU version:
          | https://huggingface.co/spaces/Xenova/remove-background-webgp...).
         | 
         | - (51MB) Whisper Web for automatic speech recognition:
         | https://huggingface.co/spaces/Xenova/whisper-web (just select
         | the quantized version in settings).
         | 
         | - (28MB) Depth Anything Web for monocular depth estimation:
         | https://huggingface.co/spaces/Xenova/depth-anything-web
         | 
         | - (14MB) Segment Anything Web for image segmentation:
         | https://huggingface.co/spaces/Xenova/segment-anything-web
         | 
         | - (20MB) Doodle Dash, an ML-powered sketch detection game:
         | https://huggingface.co/spaces/Xenova/doodle-dash
         | 
         | ... and many many more! Check out the Transformers.js demos
         | collection for some others:
          | https://huggingface.co/collections/Xenova/transformersjs-dem....
         | 
         | Models are cached on a per-domain basis (using the Web Cache
         | API), meaning you don't need to re-download the model on every
         | page load. If you would like to persist the model across
         | domains, you can create browser extensions with the library! :)
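          | 
          | For anyone who hasn't tried it, basic usage is just a few
          | lines (this mirrors the README example):
          | 
          |   import { pipeline } from '@xenova/transformers';
          | 
          |   // Allocate a pipeline; the model downloads once,
          |   // then is served from the cache
          |   const classify =
          |     await pipeline('sentiment-analysis');
          | 
          |   const out = await classify('I love transformers!');
          |   // e.g. [{ label: 'POSITIVE', score: 0.9998 }]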
         | 
         | As for your last point, there are efforts underway, but nothing
         | I can speak about yet!
        
           | jfoster wrote:
           | Thank you for the reply. Seems like all of the links are down
           | at the moment, but it does sound a bit more feasible for some
           | applications than I had assumed.
           | 
           | Really glad to hear the last part. Some of the new
           | capabilities seem fundamental enough that they ought to be in
           | browsers, in my opinion.
        
             | xenova wrote:
             | Odd, the links seem to work for me. What error do you see?
             | Can you try on a different network (e.g., mobile)?
        
               | jfoster wrote:
               | Error is "xenova-segment-anything-web.static.hf.space
               | unexpectedly closed the connection."
               | 
               | Works on mobile network, though, so might just be my
               | internet connection.
        
           | jph00 wrote:
            | Why is only one of them on WebGPU? Is it because there
            | are additional tricky steps required to make a model work
            | on WebGPU, or is there a limitation on what ops are
            | supported there?
           | 
           | I'm keen to do more stuff with WebGPU, so very interested to
           | learn about challenges and limitations here.
        
             | xenova wrote:
             | We have some other WebGPU demos, including:
             | 
              | - WebGPU embedding benchmark:
              | https://huggingface.co/spaces/Xenova/webgpu-embedding-benchm...
              | 
              | - Real-time object detection:
              | https://huggingface.co/spaces/Xenova/webgpu-video-object-det...
              | 
              | - Real-time background removal:
              | https://huggingface.co/spaces/Xenova/webgpu-video-background...
              | 
              | - WebGPU depth estimation:
              | https://huggingface.co/spaces/Xenova/webgpu-depth-anything
              | 
              | - Image background removal:
              | https://huggingface.co/spaces/Xenova/remove-background-webgp...
             | 
             | You can follow the progress for full WebGPU support in the
             | v3 development branch
             | (https://github.com/xenova/transformers.js/pull/545).
             | 
             | To answer your question, while there are certain ops
             | missing, the main limitation at the moment is for models
             | with decoders... which are not very fast (yet) due to
             | inefficient buffer reuse and many redundant copies between
             | CPU and GPU. We're working closely with the ORT team to fix
             | these issues though!
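              | 
              | (For reference, opting into WebGPU in v3 looks roughly
              | like this; the `device` option name is taken from the
              | v3 work and may change before release:)
              | 
              |   import { pipeline } from '@xenova/transformers';
              | 
              |   const embed = await pipeline(
              |     'feature-extraction',
              |     'Xenova/all-MiniLM-L6-v2',
              |     { device: 'webgpu' }
              |   );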
        
         | ranulo wrote:
         | Might make more sense for web apps and electron applications.
        
         | yeldarb wrote:
          | Not using transformers, but we do object detection in the
          | browser with small quantized YOLO models that are about 7
          | MB and run at 30+ fps on modern laptops via tensorflow.js
          | and onnxruntime-web.
          | 
          | Lots of cool demos and real-world applications you can
          | build with it. E.g. we powered an AR card ID feature for
          | Magic: The Gathering, built a scavenger hunt for SXSW, a
          | test proctoring assistant (to warn you if you're likely to
          | get DQ'd for e.g. wearing headphones), and a pill counter
          | for pharmacists. Really powerful for distribution that
          | users don't have to install an app or need anything other
          | than their smartphone.
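          | 
          | (The onnxruntime-web side is roughly this; a minimal
          | sketch in which the model path and the 'images' input name
          | are assumptions about a typical YOLO export:)
          | 
          |   import * as ort from 'onnxruntime-web';
          | 
          |   const session = await
          |     ort.InferenceSession.create('yolo-int8.onnx');
          | 
          |   // Preprocessed pixels as NCHW float32
          |   const input = new ort.Tensor(
          |     'float32', pixels, [1, 3, 640, 640]);
          | 
          |   // Check actual names via session.inputNames
          |   const outputs =
          |     await session.run({ images: input });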
        
         | bilekas wrote:
          | This is probably a really stupid question, but can the
          | models be streamed as they're being run, so that the
          | browser wouldn't need to wait for the entire download
          | first? Or is there even a concept of ordered model
          | transformers?
          | 
          | As I ask, it seems wrong to me, but just to confirm?
        
           | janalsncm wrote:
            | Usually the inference time is small compared with the
            | download time, so even if this were technically feasible
            | you wouldn't save much time.
            | 
            | For reference, I have a 31 MB vision transformer I run in
            | my browser. Building the inputs, running inference, and
            | parsing the response takes less than half a second.
        
             | bilekas wrote:
              | > Usually the inference time is small compared with the
              | download time, so even if this were technically
              | feasible you wouldn't save much time.
              | 
              | I can understand that, but where time is not a factor
              | and it's solely a question of data, can a model be
              | streamed?
        
       | baggachipz wrote:
       | Ok so now we can make a browser plugin which will pick out all
       | bicycles or bridges in a Google captcha, right?
        
       | 8ig8 wrote:
       | The Syntax podcast recently did an episode on Transformers.js and
       | the developer...
       | 
       | https://syntax.fm/show/740/local-ai-models-in-javascript-mac...
        
       | simonw wrote:
       | This library is so cool. It makes spinning up a quick demo
       | incredibly easy - I've used it in Observable notebooks a few
       | times:
       | 
        | - CLIP in a browser:
        | https://observablehq.com/@simonw/openai-clip-in-a-browser
        | 
        | - Image object detection with detr-resnet-50:
        | https://observablehq.com/@simonw/detect-objects-in-images
       | 
       | The size of the models feels limiting at first, but for quite a
       | few applications telling a user with a good laptop and connection
       | that they have to wait 30s for it to load isn't unthinkable.
       | 
        | The latest release adds binary embedding quantization support,
       | which I'm really looking forward to trying out:
       | https://github.com/xenova/transformers.js/releases/tag/2.17....
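        | 
        | From the release notes, usage looks something like this
        | (untested sketch; the helper name and model ID come from the
        | 2.17 release notes):
        | 
        |   import {
        |     pipeline, quantize_embeddings
        |   } from '@xenova/transformers';
        | 
        |   const extractor = await pipeline(
        |     'feature-extraction',
        |     'mixedbread-ai/mxbai-embed-large-v1'
        |   );
        |   const emb = await extractor('Some text to embed', {
        |     pooling: 'cls',
        |   });
        | 
        |   // 1 bit per dimension: ~32x smaller than float32
        |   const binary = quantize_embeddings(emb, 'binary');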
        
         | sroussey wrote:
          | Binary embeddings will require an additional re-ranking
          | step, but they will be fun to test.
          | 
          | I've made an npm package of transformers.js v3, which I
          | should update (not sure if I include this yet).
          | 
          | Mostly, I've had to keep a fork so it runs on Bun. V3, when
          | released, will support Bun just fine, although WebGPU won't
          | work; but that's optional.
          | 
          | [edit: DM me if you want to use it, I don't want to promote
          | a fork]
        
       | grey8 wrote:
       | transformers.js is such a cool library.
       | 
       | I made a small web app with it that uses it to remove backgrounds
       | from images (with BRIA AI's RMBG1.4 model) at
       | https://aether.nco.dev
       | 
        | The fact that you don't need to send your data to an API and
        | that this runs even on smartphones is really cool. I foresee
        | lots of projects using this in the future, be it small
        | vision, language or other utility models (depth estimation,
        | background removal, etc). Looks like a bright future for the
        | web!
       | 
       | I'm already working on my next project, and it'll definitely use
       | transformers.js again!
        
         | sroussey wrote:
         | I'm using it for a simple project:
         | https://github.com/sroussey/ellmers
         | 
          | My plan is to test embedding and retrieval approaches for
          | different RAG strategies, to be used by a server or
          | Electron app.
        
       | Solvency wrote:
        | Can someone explain what this means I can do with it if I
        | know vanilla JavaScript?
        | 
        | For example, I use the image upscaling playground on Hugging
        | Face all the time, but I do it manually here:
        | https://huggingface.co/spaces/bookbot/Image-Upscaling-Playgr...
        | 
        | Would transformers.js allow me to somehow execute that in my
        | own local or online app programmatically?
        
         | karaterobot wrote:
          | I think running locally is exactly what it's meant to do. I
          | see an example using image upscaling
          | (https://huggingface.co/docs/transformers.js/main/en/api/pipe...)
          | but I have not used it.
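          | 
          | (Going by the pipeline docs, something like this; the
          | Swin2SR model ID is the one their examples use:)
          | 
          |   import { pipeline } from '@xenova/transformers';
          | 
          |   // 2x super-resolution, image-to-image task
          |   const upscaler = await pipeline(
          |     'image-to-image',
          |     'Xenova/swin2SR-classical-sr-x2-64'
          |   );
          | 
          |   // Accepts a URL or path; returns the upscaled image
          |   const out = await upscaler('low-res.jpg');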
        
       | macrolime wrote:
       | Does it support Apple Silicon (accelerated)?
        
       | ctrlaltdylan wrote:
       | Also this opens the possibility of running these models on
       | Node.js serverless functions no?
       | 
       | That certainly also has to open up possibilities for on-demand
       | predictions?
        
       | dajas wrote:
        | I'm using this library to generate embeddings with gte-small
        | (~70 MB) and using Upstash Vector for storage.
        | 
        | It's only 384 dimensions, but it works surprisingly well with
        | a paragraph of text! It also ranks better than text-
        | embedding-ada-002 on the leaderboard:
       | 
       | https://huggingface.co/spaces/mteb/leaderboard
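        | 
        | For reference, generating those embeddings looks like this
        | (a minimal sketch assuming the Xenova/gte-small ONNX port):
        | 
        |   import { pipeline } from '@xenova/transformers';
        | 
        |   const embed = await pipeline(
        |     'feature-extraction', 'Xenova/gte-small');
        | 
        |   const out = await embed('A paragraph of text.', {
        |     pooling: 'mean',
        |     normalize: true,
        |   });
        |   const vec = Array.from(out.data); // length 384
        | 
        |   // Normalized vectors: cosine is just a dot product
        |   const dot = (a, b) =>
        |     a.reduce((s, x, i) => s + x * b[i], 0);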
        
       | Tenoke wrote:
        | Is training not possible? I did some stuff years ago where I
        | created and trained small NNs in the browser, and I'm curious
        | whether that type of thing would work better today with a
        | small custom transformer.
        
         | spmurrayzzz wrote:
          | In theory it's definitely possible, but I suspect
          | performance concerns are the reason it's not implemented
          | (yet). They have a WebGPU embeddings benchmark in an HF
          | space to give you a sense of the forward-pass dynamics:
          | https://huggingface.co/spaces/Xenova/webgpu-embedding-benchm...
          | 
          | It's impressive for what it is, but training would be
          | painful at those latencies (fp16, batch 32, sequence length
          | 512 generates a ~500ms forward pass with a 22M param
          | model).
        
           | dheera wrote:
           | There might be applications for much smaller transformers in
           | UI design.
           | 
           | Like for example
           | 
            | - did the user tap the wrong location on the screen
            | because their device was physically jolted, and can you
            | correct for that, considering you have access to the
            | accelerometer in HTML5
           | - does the user keep repeating an action (checking every box
           | in a list of e-mails) and can you extrapolate the rest of
           | what the user wants to do
           | 
           | - did the user bounce because you popped up a stupid intercom
           | box or newsletter popup, and did you learn anything about
           | what you need to do if you want to retain this particular
           | user in the future
           | 
            | these kinds of things could be done with hundreds or
            | thousands of parameters or fewer
        
             | spmurrayzzz wrote:
             | Yea definitely. But in that case, you could train _much_
             | faster in pytorch, then convert to ONNX, and load in the
             | browser for inference (as the transformers.js docs
             | recommend)
             | 
             | EDIT: (I responded before your full edit with the bullet
             | list). This next comment is orthogonal to the slow training
             | performance topic I think, but the use cases you reference
             | there don't seem to be well-suited at all for an
             | autoregressive decoder-only model architecture.
        
       ___________________________________________________________________
       (page generated 2024-04-11 23:01 UTC)