[HN Gopher] Show HN: WebGPU enables local LLM in the browser - demo site with AI chat
       ___________________________________________________________________
        
       Show HN: WebGPU enables local LLM in the browser - demo site with
       AI chat
        
       A browser LLM demo built with JavaScript and WebGPU. WebGPU is
       already supported in Chrome, Safari, Firefox, iOS (v26) and
       Android.
        
       Demo (similar to ChatGPT): https://andreinwald.github.io/browser-llm/
       Code: https://github.com/andreinwald/browser-llm
        
       - No need to use your OPENAI_API_KEY - it's a local model that
         runs on your device
       - No network requests to any API
       - No need to install any program
       - No need to download files to your device (the model is cached
         in the browser)
       - The site will ask before downloading large files (the LLM
         model) to the browser cache (see the sketch below)
       - Hosted on GitHub Pages from this repo - secure, because you can
         see what you are running
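        
       A minimal sketch of what that "ask before downloading" flow could
       look like (assumed helper names and a crude cache check - not the
       repo's actual code):
        
         // Ask once before the large download; later visits find the
         // weights already in the browser cache and skip the prompt.
         async function maybeLoadModel(loadModel: () => Promise<void>) {
           const alreadyCached = (await caches.keys()).length > 0;
           const ok =
             alreadyCached ||
             window.confirm("Download the LLM model to the browser cache?");
           if (ok) await loadModel();
         }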
        
       Author : andreinwald
       Score  : 105 points
       Date   : 2025-08-02 14:09 UTC (8 hours ago)
        
 (HTM) web link (andreinwald.github.io)
 (TXT) w3m dump (andreinwald.github.io)
        
       | andsoitis wrote:
       | Very cool. An improvement would be keeping the input text box
       | always on screen, rather than having to manually scroll down as
       | the screen fills.
        
       | maxmcd wrote:
       | Looks like this is a wrapper around:
       | https://github.com/mlc-ai/web-llm
       | 
       | Which has a full web demo: https://chat.webllm.ai/
        
         | refulgentis wrote:
         | I am to see it regardless - projects been very low activity for
         | months. Just last night I was thinking about ripping it out
         | before launch. No observable future.
         | 
         | EDIT: Ah, wrapper ~= using it as a dependency, no more
        
         | swores wrote:
         | Is this correct?
         | 
         | It doesn't seem so to me, either from the way it works or from
         | what little of the code I've looked at...
         | 
         | But I don't have time to do more than the quick glance I just
         | did at a few of the files of each, and I need to run - so
         | hopefully someone cleverer than me, who won't need as much
         | time, can confirm while I'm afk.
        
           | refulgentis wrote:
           | Entirely correct, c.f. LLM.ts. The bit to Show HN here, is
           | about ~40 lines of code. Simple Typescript MVP of calling the
           | library.
           | 
           | (source: maintains an LLM client that works across
           | MLC/llama.cpp/3P providers; author of sibling comment that
           | misunderstood initially)
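           | 
           | For context, roughly what such a thin wrapper looks like (a
           | sketch based on web-llm's OpenAI-style API, not the actual
           | LLM.ts):
           | 
           |   import { CreateMLCEngine } from "@mlc-ai/web-llm";
           | 
           |   // Downloads (or loads from the browser cache) the model
           |   // weights, then compiles the WebGPU kernels.
           |   const engine = await CreateMLCEngine(
           |     "Llama-3.2-1B-Instruct-q4f16_1-MLC",
           |     { initProgressCallback: (p) => console.log(p.text) },
           |   );
           | 
           |   // OpenAI-style chat completion, streamed token by token.
           |   const chunks = await engine.chat.completions.create({
           |     messages: [{ role: "user", content: "Why is the sky blue?" }],
           |     stream: true,
           |   });
           |   for await (const chunk of chunks) {
           |     console.log(chunk.choices[0]?.delta?.content ?? "");
           |   }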
        
       | pjmlp wrote:
       | Beware of opening this on mobile Internet.
        
         | andreinwald wrote:
         | The demo site asks before downloading.
        
         | lukan wrote:
         | Well, I am on a mobile right now - can someone maybe share
         | anything about the performance?
        
           | pjmlp wrote:
           | Not everyone enjoys unlimited data plans, and 500 MB is
           | already a lot.
        
           | andreinwald wrote:
           | On my Android device it works pretty fast.
           | 
           | But keep in mind that it's the small Llama-3.2-1B model,
           | chosen specifically for less powerful GPUs.
        
       | asim wrote:
       | What's the performance of a model like this vs an OpenAI API?
       | What's the comparable here? Edit: I see it's the same models,
       | run locally, that you'd run using Ollama or something else. So
       | basically it's just constrained by the size of the model, the
       | GPU, and the performance of the machine.
        
         | andreinwald wrote:
         | Yes, it's very similar to the Ollama app, and the Llama-3.2-1B
         | model is used.
        
       | petermcneeley wrote:
       | This demo only works if you have the WebGPU feature "f16". You
       | can find out whether you have it by checking the feature list at
       | https://webgpureport.org/ . The page itself could of course
       | check for this, but since f16 support is common they probably
       | just didn't bother.
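       | 
       | A check like the following would do it (a sketch using the
       | standard WebGPU API; "shader-f16" is the feature name behind the
       | "f16" support mentioned above):
       | 
       |   // Returns true if the browser exposes WebGPU with f16 shader
       |   // support, which the demo's quantized model needs.
       |   async function hasWebGpuF16(): Promise<boolean> {
       |     if (!("gpu" in navigator)) return false;
       |     const adapter = await navigator.gpu.requestAdapter();
       |     return adapter?.features.has("shader-f16") ?? false;
       |   }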
        
         | andreinwald wrote:
         | Also see https://caniuse.com/webgpu
        
       | cgdl wrote:
       | Which model does the demo use?
        
         | andreinwald wrote:
         | Llama 3.2 1B
        
       | scottfr wrote:
       | There is a Prompt API in development that's available in both
       | Chrome and Edge to give access to a local LLM. Chrome extensions
       | have access to it and I believe websites can request access as
       | part of an origin trial.
       | 
       | The model is fully managed by the browser. It's currently the
       | Gemini Nano model on Chrome, and they are testing a version of
       | the Gemma 3n model in beta channels. Edge uses phi-4-mini.
       | 
       | More information is available here:
       | https://github.com/webmachinelearning/prompt-api
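       | 
       | A sketch of the shape described in that explainer (the exact
       | entry point has changed between Chrome versions, so treat these
       | names as an assumption and check the repo above):
       | 
       |   // Uses the browser-managed model (e.g. Gemini Nano); the page
       |   // itself downloads nothing.
       |   if ((await LanguageModel.availability()) !== "unavailable") {
       |     const session = await LanguageModel.create();
       |     console.log(await session.prompt("Why is the sky blue?"));
       |   }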
        
       | om8 wrote:
       | To have GPU inference, you need a GPU. I have a demo that runs
       | an 8B Llama on any computer with 4 gigs of RAM
       | 
       | https://galqiwi.github.io/aqlm-rs/about.html
        
         | adastra22 wrote:
         | Any computer with a display has a GPU.
        
           | om8 wrote:
           | Sure, but integrated graphics usually lacks the VRAM for LLM
           | inference.
        
             | adastra22 wrote:
             | Which means that inference would be approximately the same
             | speed (but compute offloaded) as the suggested CPU
             | inference engine.
        
       | echoangle wrote:
       | Fun demo but the model that's used seems to be pretty stupid:
       | 
       | > What's the best way to get to space?
       | 
       | >> Unfortunately, it's not currently possible for humans to
       | travel to space in the same way that astronauts do. While there
       | have been several manned missions to space, such as those to the
       | International Space Station, the technology and resources
       | required to make interstellar travel feasible are still in the
       | early stages of development.
       | 
       | >> Currently, the fastest spacecraft is the Apollo missions,
       | which took around 77 hours to reach the Moon. From there, they
       | traveled to other planets, but the distances between them are so
       | great that even at high speeds, it would take many years to reach
       | them.
       | 
       | >> As for traveling to space itself, there are no commercial or
       | private companies that offer space travel services, and the cost
       | of such a journey is estimated to be in the tens of millions of
       | dollars per person.
       | 
       | >> However, there are some concepts and technologies being
       | explored that might one day make space travel more accessible and
       | affordable. For example, private companies like SpaceX and Blue
       | Origin are working on developing reusable rockets that could
       | potentially take people to space more quickly and reliably.
       | 
       | >> But for now, let's just say that space travel is still in the
       | future, and we'll have to wait a bit longer for humans to venture
       | into the cosmos!
        
         | andreinwald wrote:
         | It runs the small Llama-3.2-1B model, chosen specifically for
         | devices with less powerful GPUs.
        
           | wongarsu wrote:
           | The answer is still terrible for the model size. Maybe it's
           | the 4-bit quantization; smaller models tend to react worse
           | to that.
           | 
           | For reference, [1] is what stock Qwen3-0.6B would answer.
           | Not a perfect answer, but much better at nearly half the
           | number of parameters.
           | 
           | 1: https://markdownpastebin.com/?id=7ad4ad9f325d4354a858480ab
           | dc...
        
             | KTibow wrote:
             | It's likely the quantization on
             | "Llama-3.2-1B-Instruct-q4f16_1-MLC". inference.net
             | generated this more coherent answer:
             | https://hst.sh/ovilewofox.md
        
         | tomrod wrote:
         | It shows an interface of what could be.
        
         | fooker wrote:
         | It's broadly correct, if you interpret 'humans' as ordinary
         | humans who are not astronauts.
        
           | dragonwriter wrote:
           | That generous redefinition of "humans" deals with only the
           | error in the first sentence of the response; the rest is
           | still not "broadly correct" even with it in place.
        
           | wongarsu wrote:
           | Blue Origin is built on taking ordinary humans to space. So
           | is/was Virgin Galactic, though they are in a bit of a
           | transitional phase right now. SpaceX is also willing; they
           | might even take you on a flyby of the Moon if you bring
           | money and patience (iirc the last customer lost patience).
           | Basically just ring up your favorite multi-billionaire's
           | space program. And while the estimated price would be
           | correct for SpaceX, rumored prices for Blue Origin's New
           | Shepard are only in the hundreds of thousands per seat.
           | 
           | Edit: also the "but if you do that you are an astronaut, so
           | it's still true that only astronauts can do that" loophole
           | was closed when the FAA redefined the word astronaut in 2021.
           | At least if you follow their definition of the word
        
       | andreinwald wrote:
       | Model used: Llama 3.2 1B (small). Quality should be similar to
       | running the Ollama app with the same small model.
        
       | apitman wrote:
       | Does anyone know why all of these WebGPU LLM demos have you
       | download the models to browser storage rather than letting you
       | open a gguf already on your local drive? I have several models
       | downloaded already that I would be interested in trying.
        
         | fooker wrote:
         | Browsers are sandboxed away from user storage.
         | 
         | You can change this with settings, command-line arguments,
         | build flags, etc. But you can't really expect people to do
         | that just to use your website.
        
           | apitman wrote:
           | You can open a file for performant access in all major
           | browsers. It's the same API used for uploading files (<input
           | type="file" />), but you can also just load the file into
           | memory and work with it locally.
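           | 
           | For example, something like this gets a local .gguf into
           | memory without any upload (a sketch; the element id is an
           | assumption):
           | 
           |   // Read a model file the user already has on disk into an
           |   // ArrayBuffer, entirely client-side.
           |   const input =
           |     document.querySelector<HTMLInputElement>("#model-file");
           |   input?.addEventListener("change", async () => {
           |     const file = input.files?.[0];
           |     if (!file) return;
           |     const bytes = new Uint8Array(await file.arrayBuffer());
           |     console.log(file.name, bytes.byteLength);
           |   });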
        
       | cat-whisperer wrote:
       | I've been following the development of WebGPU and its potential
       | applications, and this demo is a great example of what's
       | possible.
        
       | AndrewDucker wrote:
       | I asked "Why is the sky blue?" and got back a response of
       | 
       | "coppia RR TalentDevExpressincer+'.//////////////////////////////
       | ////////////////////////////////////////// cha wl.AutoSizesaving
       | prolengkhicate Like"/>
       | 
       | infos eti za cornerback economical (%]\
       | enumligne.execRELEASEPropagation_station Bucks proHEME seas
       | GASPOST[Unit(suffix Gloves"
       | 
       | (and so on, for a few more paragraphs).
       | 
       | Am I missing something?
        
       | dataviz1000 wrote:
       | Very cool!
       | 
       | For some awesome examples of running different types of models
       | in WebGPU, check out the examples folder here:
       | https://github.com/huggingface/transformers.js
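       | 
       | A minimal sketch of what those examples boil down to (v3-style
       | transformers.js API; the model id here is an assumption - pick
       | one from the examples folder):
       | 
       |   import { pipeline } from "@huggingface/transformers";
       | 
       |   // Build a text-generation pipeline that runs on WebGPU.
       |   const generate = await pipeline(
       |     "text-generation",
       |     "onnx-community/Llama-3.2-1B-Instruct",
       |     { device: "webgpu" },
       |   );
       | 
       |   const out = await generate("Why is the sky blue?", {
       |     max_new_tokens: 64,
       |   });
       |   console.log(out);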
        
       | gulan28 wrote:
       | I did this with MLC @ https://wiz.chat some time ago.
       | 
       | Warning: it uses a Llama 3.1 7B model and is around 4 GB. It
       | needs either a GPU or a Mac, and works only in Chrome.
        
       ___________________________________________________________________
       (page generated 2025-08-02 23:00 UTC)