[HN Gopher] Show HN: WebGPU enables local LLM in the browser - d...
___________________________________________________________________
Show HN: WebGPU enables local LLM in the browser - demo site with
AI chat
Browser LLM demo built with JavaScript and WebGPU. WebGPU is
already supported in Chrome, Safari, Firefox, iOS (v26) and
Android. Demo, similar to ChatGPT:
https://andreinwald.github.io/browser-llm/
Code: https://github.com/andreinwald/browser-llm
- No need to use your OPENAI_API_KEY - it's a local model that runs
  on your device
- No network requests to any API
- No need to install any program
- No need to manually download files to your device (the model is
  cached in the browser)
- The site will ask before downloading large files (the LLM model)
  to the browser cache
- Hosted on GitHub Pages from this repo - secure, because you can
  see what you are running
Author : andreinwald
Score : 105 points
Date : 2025-08-02 14:09 UTC (8 hours ago)
(HTM) web link (andreinwald.github.io)
(TXT) w3m dump (andreinwald.github.io)
| andsoitis wrote:
| Very cool. An improvement would be keeping the input text box
| always on screen, rather than having to manually scroll down as
| the screen fills.
| maxmcd wrote:
| Looks like this is a wrapper around:
| https://github.com/mlc-ai/web-llm
|
| Which has a full web demo: https://chat.webllm.ai/
| refulgentis wrote:
| I am glad to see it regardless - the project's been very low
| activity for months. Just last night I was thinking about ripping
| it out before launch. No observable future.
|
| EDIT: Ah, wrapper ~= using it as a dependency, no more
| swores wrote:
| Is this correct?
|
| It doesn't seem so to me, either from the way it works or from
| what little of the code I've looked at...
|
| But I don't have time to do more than the quick glance I just
| took at a few files of each and need to run, so hopefully someone
| cleverer than me, who won't need as much time to answer the
| question, can confirm while I'm afk.
| refulgentis wrote:
| Entirely correct, cf. LLM.ts. The bit to Show HN here is about
| ~40 lines of code: a simple TypeScript MVP of calling the
| library.
|
| (source: maintains an LLM client that works across
| MLC/llama.cpp/3P providers; author of sibling comment that
| misunderstood initially)
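For readers wondering what that ~40-line MVP looks like, here is a
minimal sketch of calling @mlc-ai/web-llm from TypeScript (model id
taken from elsewhere in this thread; the actual LLM.ts may differ):

    import { CreateMLCEngine } from "@mlc-ai/web-llm";

    async function ask(question: string): Promise<string> {
      // Downloads the model on first use (later runs reuse the
      // browser cache), then compiles the WebGPU kernels.
      const engine = await CreateMLCEngine(
        "Llama-3.2-1B-Instruct-q4f16_1-MLC",
        { initProgressCallback: (report) => console.log(report.text) },
      );

      // OpenAI-style chat completion, executed locally on the GPU.
      const reply = await engine.chat.completions.create({
        messages: [{ role: "user", content: question }],
      });
      return reply.choices[0].message.content ?? "";
    }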
| pjmlp wrote:
| Beware of opening this on mobile Internet.
| andreinwald wrote:
| The demo site asks before downloading.
| lukan wrote:
| Well, I am on a mobile right now, can someone maybe share
| anything about the performance?
| pjmlp wrote:
| Not everyone enjoys unlimited data plans, and 500 MB is already
| a lot.
| andreinwald wrote:
| It works pretty fast on my Android device.
|
| But keep in mind that it's the small Llama-3.2-1B model, chosen
| specifically for less powerful GPUs.
| asim wrote:
| What's the performance of a model like this vs an OpenAI API?
| What's the comparable here? Edit: I see it's the same models
| locally that you'd run using Ollama or something else. So it's
| basically just constrained by the size of the model, the GPU and
| the perf of the machine.
| andreinwald wrote:
| Yes, it's very similar to the Ollama app, and the Llama-3.2-1B
| model is used.
| petermcneeley wrote:
| This demo only works if you have the WebGPU "shader-f16" feature.
| You can find out whether you have it by checking the feature list
| at https://webgpureport.org/ . The page itself could of course
| check for this, but since f16 support is common they probably
| just didn't bother.
| andreinwald wrote:
| Also here https://caniuse.com/webgpu
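A page can run this check itself with a few lines of feature
detection; a sketch (whether the demo actually does this is not
confirmed here):

    async function hasShaderF16(): Promise<boolean> {
      if (!("gpu" in navigator)) return false;   // no WebGPU at all
      const adapter = await navigator.gpu.requestAdapter();
      if (!adapter) return false;                // no usable adapter
      return adapter.features.has("shader-f16"); // feature the demo needs
    }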
| cgdl wrote:
| Which model does the demo use?
| andreinwald wrote:
| Llama 3.2 1B
| scottfr wrote:
| There is a Prompt API in development that's available in both
| Chrome and Edge to give access to a local LLM. Chrome extensions
| have access to it and I believe websites can request access as
| part of an origin trial.
|
| The model is fully managed by the browser. It's currently the
| Gemini Nano model on Chrome, and they are testing a version of
| the Gemma 3n model in beta channels. Edge uses phi-4-mini.
|
| More information is available here:
| https://github.com/webmachinelearning/prompt-api
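A rough sketch of using that Prompt API, following the explainer in
the linked repo; the exposed global and availability states have
shifted between Chrome/Edge releases, so treat this as an
approximation:

    // Assumes the "LanguageModel" global from the explainer exists.
    declare const LanguageModel: any;

    async function promptBuiltIn(question: string): Promise<string> {
      const availability = await LanguageModel.availability();
      if (availability === "unavailable") {
        throw new Error("No built-in model in this browser");
      }
      const session = await LanguageModel.create(); // may download the model
      return session.prompt(question);              // Gemini Nano / phi-4-mini, locally
    }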
| om8 wrote:
| For GPU inference you need a GPU. I have a demo that runs an 8B
| Llama on any computer with 4 GB of RAM.
|
| https://galqiwi.github.io/aqlm-rs/about.html
| adastra22 wrote:
| Any computer with a display has a GPU.
| om8 wrote:
| Sure, but integrated graphics usually lacks the VRAM for LLM
| inference.
| adastra22 wrote:
| Which means that inference would be approximately the same
| speed (but compute offloaded) as the suggested CPU
| inference engine.
| echoangle wrote:
| Fun demo but the model that's used seems to be pretty stupid:
|
| > What's the best way to get to space?
|
| >> Unfortunately, it's not currently possible for humans to
| travel to space in the same way that astronauts do. While there
| have been several manned missions to space, such as those to the
| International Space Station, the technology and resources
| required to make interstellar travel feasible are still in the
| early stages of development.
|
| >> Currently, the fastest spacecraft is the Apollo missions,
| which took around 77 hours to reach the Moon. From there, they
| traveled to other planets, but the distances between them are so
| great that even at high speeds, it would take many years to reach
| them.
|
| >> As for traveling to space itself, there are no commercial or
| private companies that offer space travel services, and the cost
| of such a journey is estimated to be in the tens of millions of
| dollars per person.
|
| >> However, there are some concepts and technologies being
| explored that might one day make space travel more accessible and
| affordable. For example, private companies like SpaceX and Blue
| Origin are working on developing reusable rockets that could
| potentially take people to space more quickly and reliably.
|
| >> But for now, let's just say that space travel is still in the
| future, and we'll have to wait a bit longer for humans to venture
| into the cosmos!
| andreinwald wrote:
| It runs the small Llama-3.2-1B model, chosen specifically for
| less powerful GPU devices.
| wongarsu wrote:
| The answer is still terrible for the model size. Maybe it's the
| 4-bit quantization; smaller models tend to react worse to that.
|
| For reference, [1] is what stock Qwen3-0.6B would answer. Not a
| perfect answer, but much better at nearly half the number of
| parameters.
|
| 1: https://markdownpastebin.com/?id=7ad4ad9f325d4354a858480ab
| dc...
| KTibow wrote:
| It's likely the quantization on
| "Llama-3.2-1B-Instruct-q4f16_1-MLC". inference.net
| generated this more coherent answer:
| https://hst.sh/ovilewofox.md
| tomrod wrote:
| It shows an interface of what could be.
| fooker wrote:
| It's broadly correct, if you interpret 'humans' as ordinary
| humans who are not astronauts.
| dragonwriter wrote:
| That generous redefinition of "humans" deals with only the error
| in the first sentence of the response; the rest is still not
| "broadly correct" even with it in place.
| wongarsu wrote:
| Blue Origin is built on taking ordinary humans to space. So
| is/was Virgin Galactic, though they are in a bit of a
| transitionary phase right now. SpaceX is also willing, they
| might even take you on a flyby of the moon if you bring money
| and patience (iirc the last customer lost patience).
| Basically just ring up your favorite multi-billionaire's
| space program. And while the estimated price would be correct
| for SpaceX, rumored prices for Blue Origin's New Shepard are
| only in the hundreds of thousands per seat
|
| Edit: also the "but if you do that you are an astronaut, so
| it's still true that only astronauts can do that" loophole
| was closed when the FAA redefined the word astronaut in 2021.
| At least if you follow their definition of the word
| andreinwald wrote:
| Model used: Llama 3.2 1B (small). Quality should be similar to
| running the Ollama app with the same small model.
| apitman wrote:
| Does anyone know why all of these WebGPU LLM demos have you
| download the models to browser storage rather than letting you
| open a gguf already on your local drive? I have several models
| downloaded already that I would be interested in trying.
| fooker wrote:
| Browsers are sandboxed away from user storage.
|
| You can change this by changing settings, command line
| arguments, build flags, etc. But you can't really expect people
| to do this just to use your website.
| apitman wrote:
| You can open a file for performant access in all major
| browsers. It's the same API used for uploading files (<input
| type="file" />), but you can also just load them into memory
| and do stuff.
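A sketch of that approach: let the user pick an already-downloaded
.gguf and read it into memory with no network transfer. (Whether a
given WebGPU runtime can consume raw gguf bytes is a separate
question; web-llm, for instance, ships weights in its own MLC
format.)

    const input = document.createElement("input");
    input.type = "file";
    input.accept = ".gguf";
    input.onchange = async () => {
      const file = input.files?.[0];
      if (!file) return;
      // Read lazily from disk; nothing leaves the machine.
      const bytes = new Uint8Array(await file.arrayBuffer());
      console.log(`Loaded ${file.name}: ${bytes.byteLength} bytes`);
    };
    document.body.append(input);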
| cat-whisperer wrote:
| I've been following the development of WebGPU and its potential
| applications, and this demo is a great example of what's
| possible.
| AndrewDucker wrote:
| I asked "Why is the sky blue?" and got back a response of
|
| "coppia RR TalentDevExpressincer+'.//////////////////////////////
| ////////////////////////////////////////// cha wl.AutoSizesaving
| prolengkhicate Like"/>
|
| infos eti za cornerback economical (%]\
| enumligne.execRELEASEPropagation_station Bucks proHEME seas
| GASPOST[Unit(suffix Gloves"
|
| (and so on, for a few more paragraphs).
|
| Am I missing something?
| dataviz1000 wrote:
| Very cool!
|
| For some awesome examples of running different types of models on
| WebGPU, check out the examples folder here:
| https://github.com/huggingface/transformers.js
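For example, a text-generation pipeline on WebGPU with
transformers.js looks roughly like this (the model id below is
illustrative, not taken from the linked examples):

    import { pipeline } from "@huggingface/transformers";

    // device: "webgpu" asks transformers.js to run the ONNX model
    // on the GPU instead of WASM on the CPU.
    const generator = await pipeline(
      "text-generation",
      "onnx-community/Llama-3.2-1B-Instruct-q4f16", // assumed id
      { device: "webgpu" },
    );

    const out = await generator("Why is the sky blue?", {
      max_new_tokens: 128,
    });
    console.log(out);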
| gulan28 wrote:
| I did this with MLC at https://wiz.chat some time ago.
|
| Warning: it has a Llama 3.1 7B model and is around 4 GB. It needs
| either a GPU or a Mac, and works only on Chrome.
___________________________________________________________________
(page generated 2025-08-02 23:00 UTC)