hngopher.com

       [HN Gopher] Show HN: I built a free in-browser Llama 3 chatbot p...
       ___________________________________________________________________
        
       Show HN: I built a free in-browser Llama 3 chatbot powered by
       WebGPU
        
       I spent the last few days building out a nicer ChatGPT-like
       interface to use Mistral 7B and Llama 3 fully within a browser (no
       deps or installs).

       I've used the WebLLM project by MLC AI for a while to interact with
       LLMs in the browser when handling sensitive data, but I found their
       UI quite lacking for serious use, so I built a much better interface
       around WebLLM.

       I've been using it as a therapist and coach. And it's wonderful
       knowing that my personal information never leaves my local computer.

       Should work on Desktop with Chrome or Edge. Other browsers are
       adding WebGPU support as well - see the GitHub for details on how
       you can get it to work on other browsers.

       Note: after you send the first message, the model will be downloaded
       to your browser cache. That can take a while depending on the model
       and your internet connection. But on subsequent page loads, the
       model should be loaded from the IndexedDB cache so it should be much
       faster.

       The project is open source (Apache 2.0) on GitHub. If you like it,
       I'd love contributions, particularly around making the first load
       faster.

       GitHub: https://github.com/abi/secret-llama
       Demo: https://secretllama.com
        
       Author : abi
       Score  : 478 points
       Date   : 2024-05-03 21:26 UTC (1 day ago)
        
        
       | knowaveragejoe wrote:
       | Is this downloading a ~5gb model to my machine and storing it
       | locally for subsequent use?
        
         | sp332 wrote:
          | Model sizes are listed here: https://github.com/abi/secret-llama
          | but yeah, > 4GB for the Llama 3 model.
        
         | abi wrote:
         | Yes, it only starts the download after you send the first
         | message so visiting the site won't use up any space.
         | 
         | Approx sizes are listed in the GitHub README.
         | 
          | Models are stored in IndexedDB and will be managed by the
          | browser. Might get evicted.
        
           | simonw wrote:
           | I see you have Phi1.5-q4f16_1-1k - any chance you could add
           | Phi-3?
        
             | abi wrote:
              | Would love to. It uses MLC AI's WebLLM so I just need to
              | convert it to that format.
        
             | FL33TW00D wrote:
             | Phi3 is already available in browser here:
             | https://huggingface.co/spaces/FL33TW00D-HF/ratchet-phi
             | 
             | Disclaimer: I am the author.
        
           | kamikazeturtles wrote:
           | I thought browser tabs only had access to ~400mb
           | 
           | How do you have access to 5gb?
        
             | zamadatix wrote:
              | A lot of the more modern options allow for many gigabytes
              | for typical user setups:
              | https://developer.mozilla.org/en-US/docs/Web/API/Storage_API....
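
A minimal sketch of how a page could check this for itself, assuming the browser exposes `navigator.storage.estimate()` from the Storage API linked above. The helper names `fitsInQuota` and `describeQuota` are made up for illustration, and the numbers a real browser reports vary by browser and free disk space.

```typescript
// Minimal shape of what navigator.storage.estimate() resolves to.
interface QuotaEstimate {
  usage?: number; // bytes currently used by this origin
  quota?: number; // bytes the browser is willing to let this origin use
}

// Hypothetical helper: does a model of `modelBytes` fit in the remaining quota?
function fitsInQuota(estimate: QuotaEstimate, modelBytes: number): boolean {
  const usage = estimate.usage ?? 0;
  const quota = estimate.quota ?? 0;
  return quota - usage >= modelBytes;
}

// Hypothetical helper: human-readable summary, using 1 GB = 1e9 bytes.
function describeQuota(estimate: QuotaEstimate): string {
  const toGB = (bytes: number): string => (bytes / 1e9).toFixed(1);
  return `${toGB(estimate.usage ?? 0)} GB used of ${toGB(estimate.quota ?? 0)} GB`;
}

// In a browser you would feed in the real estimate, for example:
//   const estimate = await navigator.storage.estimate();
//   console.log(describeQuota(estimate), fitsInQuota(estimate, 5e9));
```

The estimate is only a snapshot, and as noted elsewhere in the thread, stored data can still be evicted later.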
        
               | kamikazeturtles wrote:
               | Very interesting! Thank you so much
               | 
               | I was always under the impression that the max blob size
               | was 400mb and so you couldn't store files any bigger than
               | that. Google gives so many different answers to these
               | questions.
               | 
               | Do you know any other resources I can go more in depth on
               | browser storage limits?
        
       | ngshiheng wrote:
       | Nice demo! I briefly tried it out and the demo felt much better
       | than the original WebLLM one!
       | 
       | On a side note, I've been trying to do something similar too for
       | similar reasons (privacy).
       | 
       | Based on my recent experience, I find that running an LLM directly
       | in the browser with decent UX (e.g. sub 1-2 second response time,
       | no lag, no crashes) is still somewhat impossible given the
       | current state of things. Plus, I think that relying on users' own
       | GPU hardware for UX improvement via WebGPU is not exactly very
       | practical on a large scale (but it is still something!) since not
       | everyone may have access to GPU hardware.
       | 
       | But yeah, if there's anything to look forward to in this space, I
       | personally hope to see improved feasibility of running LLMs in
       | browsers.
        
       | joshstrange wrote:
       | Very cool! I wish there was chat history.
       | 
       | Also if you click the "New Chat" button while an answer is
       | generating I think some of the output gets fed back into the
       | model, it causes some weird output [0] but was kind of cool/fun.
       | Here is a video of it as well [1], I almost think this should be
       | some kind of special mode you can run. I'd be interested to know
       | what causes the bug: is it just the existing output sent as input
       | or a subset of it? It might be fun to watch a chat bot just
       | randomly hallucinate, especially on a local model.
       | 
       | [0] https://cs.joshstrange.com/07kPLPPW
       | 
       | [1] https://cs.joshstrange.com/4sxvt1Mc
       | 
       | EDIT: Looks like calling `engine.resetChat()` while it's
       | generating will do it, but I'm not sure why it errors after a
       | while (maybe runs out of tokens for output? Not sure) but it
       | would be cool to have this run until you stop it, automatically
       | changing every 10-30 seconds or so.
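
One possible guard against this, as a sketch only and not the project's actual fix: track whether a reply is still streaming and interrupt it before resetting. `EngineLike` and `ChatSession` are hypothetical names; the sketch assumes the engine offers an `interruptGenerate()` method next to the `resetChat()` mentioned above, which should be checked against the WebLLM version in use.

```typescript
// Assumed minimal engine surface; verify against the real WebLLM engine.
interface EngineLike {
  interruptGenerate(): void;
  resetChat(): void | Promise<void>;
}

// Hypothetical wrapper that refuses to reset while a reply is still streaming.
class ChatSession {
  private engine: EngineLike;
  private generating = false;

  constructor(engine: EngineLike) {
    this.engine = engine;
  }

  // Call these around the streaming loop in the UI code.
  markGenerationStarted(): void {
    this.generating = true;
  }

  markGenerationFinished(): void {
    this.generating = false;
  }

  // "New Chat" handler: stop any in-flight generation first, then reset.
  async newChat(): Promise<void> {
    if (this.generating) {
      this.engine.interruptGenerate();
      this.generating = false;
    }
    await this.engine.resetChat();
  }
}
```

The point of the ordering is that the reset never runs while tokens are still being appended to the conversation state.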
        
         | brianzelip wrote:
         | Nice personal hosted image service!
        
           | joshstrange wrote:
           | I'm just using CleanShotX [0] which is an awesome image
           | annotation tool for macOS. It's way better than the built-in
           | tool that macOS comes with. You can also record as a gif for
           | video which is nice, I use it often to make guides for my day
           | job and my business.
           | 
           | [0] https://cleanshot.com
        
             | bbkane wrote:
             | I'm using flameshot ( https://flameshot.org/ ), which
             | sounds pretty similar, but FOSS and cross platform.
        
         | abi wrote:
         | Thanks for the bug report. Yeah, it's a bug with not resetting
         | the state properly when new chat is clicked. Will fix tomorrow.
         | 
          | Chat history shouldn't be hard to add with local storage and
          | IndexedDB.
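
A rough sketch of the local storage half of that idea, assuming the history is a plain list of messages serialized as JSON. The key name `chat-history` and the helper names are invented here; `KeyValueStore` only mirrors the two `localStorage` methods the sketch needs so it can be exercised outside a browser.

```typescript
interface ChatMessage {
  role: "user" | "assistant";
  content: string;
}

// Subset of the Web Storage interface (window.localStorage satisfies it).
interface KeyValueStore {
  getItem(key: string): string | null;
  setItem(key: string, value: string): void;
}

const HISTORY_KEY = "chat-history"; // hypothetical key name

function saveHistory(store: KeyValueStore, messages: ChatMessage[]): void {
  store.setItem(HISTORY_KEY, JSON.stringify(messages));
}

function loadHistory(store: KeyValueStore): ChatMessage[] {
  const raw = store.getItem(HISTORY_KEY);
  if (raw === null) return [];
  try {
    const parsed = JSON.parse(raw);
    // Fall back to an empty history if the stored value is not a list.
    return Array.isArray(parsed) ? parsed : [];
  } catch (_err) {
    // Corrupt JSON: start fresh rather than crash the UI.
    return [];
  }
}

// In a browser: saveHistory(window.localStorage, messages);
```

Local storage is small and string-only, so long transcripts would more likely belong in IndexedDB, as the comment suggests.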
        
       | indit wrote:
       | Could we use an already downloaded .gguf file?
        
       | geor9e wrote:
       | I'm just seeing ERR_SSL_VERSION_OR_CIPHER_MISMATCH at
       | https://secretllama.com/ and at http://secretllama.com/ I see
       | "secretllama.com has been registered at Porkbun but the owner has
       | not put up a site yet. Visit again soon to see what amazing
       | website they decide to build."
        
         | abi wrote:
         | Just bought the domain a couple of hours ago so DNS might not
         | have propagated. Try back tomorrow or download and install it
         | from GitHub (it's just 2 steps)
        
       | manlobster wrote:
       | It's truly amazing how quickly my browser loads 0.6GB of data. I
       | remember when downloading a 1MB file involved phoning up a sysop
       | in advance and leaving the modem on all night. We've come so far.
        
         | doctorpangloss wrote:
         | 97MB for the Worms 3 demo felt like an eternity.
         | 
         | So what games are in this LLM? Can it do solitaire yet?
        
           | westurner wrote:
           | It generates things that you get to look up citations for. It
           | doesn't care if its output converges, it does what it wants
           | differently every time.
        
             | TeMPOraL wrote:
             | > _It generates things that you get to look up citations
             | for._
             | 
             | Why would you use it for that? Use a search engine.
             | 
              | LLMs are a substitute for _talking to people_. Use them for
             | things you would ask someone else about, and then not
             | follow up with searching for references.
        
           | exe34 wrote:
           | It can probably role-play.
        
           | roywiggins wrote:
           | GPT-3.5 is pretty good at fabricating text adventures, I
           | haven't tried any of the smaller models with that yet.
        
         | swores wrote:
         | When I think about numbers like that it just seems (to me, and
         | wrongly) like general progress that's not so crazy - the
         | thought that really makes the speed of progress stand out to me
         | is remembering when loading a single image - photo sized but
         | not crazily high resolution - over dial-up was slow enough that
         | you'd gradually see the image loading from top to bottom, and
         | could see it gradually getting taller as more lines of pixels
         | were downloaded and shown below the already loaded part.
         | Contrasting that memory against the ability to now watch videos
         | with much higher resolution per frame than those images were 30
         | years ago is what really makes me go "wow".
         | 
         | For anyone not old enough to remember, here's an example on
         | YouTube (and a faster loading time than I remember often being
         | the case!): https://youtube.com/watch?v=ra0EG9lbP7Y
        
         | zozbot234 wrote:
         | You could more or less fit the full model on a single CD (or a
         | DVD for the larger model sizes) but of course forget about
         | trying to do inference for it on period hardware, it would be
         | unusably slow.
        
       | mentos wrote:
       | This is awesome. I have been using ChatGPT4 for almost a year and
       | haven't really experimented with locally running LLMs because I
       | assumed that the processing time would take too long per token.
       | This demo has shown me that my RTX 2080 running Llama 3 can
       | compete with ChatGPT4 for a lot of my prompts.
       | 
       | This has sparked a curiosity in me to play with more LLMs
       | locally, thank you!
        
         | bastawhiz wrote:
          | My _Pixel 6_ was able to run tinyllama and answer questions
          | with alarming accuracy. I'm honestly blown away.
        
           | abi wrote:
           | This is amazing. Thanks both for sharing your stories. Made
           | my day.
        
         | moffkalast wrote:
         | Uh oh, I had that same moment a bit over a year ago with MLC's
         | old WebLLM. Take a deep breath before you jump into this rabbit
         | hole because once you're in there's no escape :)
         | 
          | New models just keep rolling in day after day on r/LocalLLaMA,
         | tunes for this or that, new prompt formats, new quantization
         | types, people doing all kinds of tests and analyses, new arxiv
         | papers on some breakthrough and llama.cpp implementing it 3
         | days later. Every few weeks a new base model drops from
         | somebody. So many things to try that nobody has tried before.
         | It's genuinely like crack.
        
         | navigate8310 wrote:
         | Try https://lmstudio.ai/
        
       | simple10 wrote:
       | Amazing! It's surprisingly fast to load and run given the size of
       | the downloaded models.
       | 
       | Do you think it would be feasible to extend it to support web
       | browsing?
       | 
       | I'd like to help if you could give some pointers on how to extend
       | it.
       | 
       | When asked about web browsing, the bot said it could fetch web
       | pages but then obviously didn't work when asked to summarize a
       | web page.
       | 
       | [EDIT] The Llama 3 model was able to summarize web pages!
        
         | simple10 wrote:
         | I commented too soon. The TinyLlama model didn't seem to be
         | able to summarize web pages but Llama 3 worked perfectly! Very
         | cool.
        
           | ashellunts wrote:
            | Are you sure it is not hallucinating? Most likely these
            | models don't have access to the Internet.
           | 
           | edit: typo
        
             | simple10 wrote:
             | Yes, I got way too excited and comment trigger happy. It
             | does not appear to browse the web and was just
             | hallucinating. The hallucinations were surprisingly
             | convincing for a couple of the pages I tested. But on
             | examining the network requests, no fetches were made to the
             | pages. Llama 3 was just a lot better at hallucinating
             | convincing results than Tiny Llama.
        
       | manlobster wrote:
       | Looks like all the heavy lifting is being done by webllm [0].
       | What we have here is basically one of the demos from that.
       | 
       | [0] https://webllm.mlc.ai/.
        
         | BoorishBears wrote:
         | > I've used the WebLLM project by MLC AI for a while to
         | interact with LLMs in the browser when handling sensitive data
         | but I found their UI quite lacking for serious use so I built a
         | much better interface around WebLLM.
        
       | threatofrain wrote:
       | IMO eventually users should be able to advertise what embedding
       | models they have so we don't redundantly redownload.
        
         | KeplerBoy wrote:
         | That's not possible with current web tech, is it?
         | 
         | Different webapps can't share common dependencies stored in
         | localstorage afaik.
        
           | dannyw wrote:
           | This need wasn't super prevalent in the pre LLM days. It's
           | rare to have a multi-GB blob that should be commonly used
           | across sites.
        
             | abi wrote:
             | Well, it should be possible to just drag and drop a
             | file/folder
        
             | KeplerBoy wrote:
             | Who knows. Maybe the browser would be a more prevalent
             | gaming platform if it could be assumed that loading a multi
             | gigabyte game engine is no big deal, because everyone had
             | one already cached.
             | 
             | A lot of unity games could easily be web games, but aren't
             | because of many roadblocks. I believe this is one of them.
        
             | TeMPOraL wrote:
             | It was a real need given how almost all sites use large
             | JavaScript deps. However, any hopes of sharing those were
             | destroyed by adtech people timing resource downloads to
             | track people.
        
               | SXX wrote:
               | Lots and lots of websites still use Google and other CDNs
               | for JS deps, fonts, etc.
        
               | JimDabell wrote:
               | They are cached independently these days to avoid privacy
               | issues. So if websites A and B both use the same
               | JavaScript dependency from a public CDN and you visit
               | them both, you will download the JavaScript dependency
               | twice, even if you have it cached from your visit to the
               | first website.
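
A toy model of that behavior, as an illustration only: the cache key includes the site you are visiting, so the same CDN URL is stored once per site. `PartitionedCache` is a made-up class, and real browsers use more elaborate keys than this two-part one.

```typescript
// Toy HTTP cache keyed by (top-level site, resource URL).
class PartitionedCache {
  private entries = new Map<string, string>();
  downloads = 0; // how many times we had to hit the network

  fetch(topLevelSite: string, resourceUrl: string): string {
    // Including the top-level site in the key is what partitions the cache.
    const key = `${topLevelSite} ${resourceUrl}`;
    const cached = this.entries.get(key);
    if (cached !== undefined) return cached;

    this.downloads += 1; // simulated network download
    const body = `contents of ${resourceUrl}`;
    this.entries.set(key, body);
    return body;
  }
}

const cache = new PartitionedCache();
const cdnUrl = "https://cdn.example/lib.js"; // placeholder URL
cache.fetch("a.example", cdnUrl); // first download
cache.fetch("b.example", cdnUrl); // downloaded again: different partition
cache.fetch("a.example", cdnUrl); // served from a.example's partition
```

With an unpartitioned cache the key would be the URL alone, and the second site could tell from load timing that the file was already cached.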
        
           | JimDabell wrote:
           | It can probably be done with a browser extension. It can
           | definitely be done by the browsers themselves. Eventually it
           | will probably be done by the operating system, which the
           | browsers will then expose.
        
           | yokto wrote:
           | It is, but only within the same origin, which already enables
           | users to not re-download jquery.js or Google Fonts if they
           | previously visited another website that downloaded the same
           | file from the same (usually cross-) origin.
        
           | blurpesec wrote:
            | Not default web tech. It can be done with IPFS via the IPFS
            | Companion browser extension -
            | https://chromewebstore.google.com/detail/ipfs-companion/nibj...
            | or browsers with native IPFS client support like Brave or Opera.
            | 
            | Fetching these models over IPFS would locally cache them and
            | dedupe calls for them by IPFS content IDs -
            | https://docs.ipfs.tech/concepts/content-addressing/#:~:text=...
            | which function similarly to a file hash (not an exact parallel,
            | since CIDs represent files that are broken up into chunks).
           | 
           | This would help with object DL deduplication if everyone is
           | using the same models & would also help to decrease
           | centralized data egress costs since with sufficient usage you
           | would be DLing these models from other peers that are running
           | IPFS nodes and holding onto the models.
        
       | dosinga wrote:
       | Phi 1.5 isn't quite there is it? Caught it having some inner
       | dialog.
       | 
       | What does that mean? Instruct: I'm sorry, but I'm not sure what
       | you mean by "wutt". Output: Oh, I'm sorry. I meant "what does
       | that mean?" Instruct: No problem, "wutt" means "that doesn't make
       | any sense". Output: Ah, I see. Thank you for clarifying!
       | Instruct: You're welcome! If you have any more questions, feel
       | free to ask. Instruct: I'm glad I could help! Instruct: I hope
       | you continue to enjoy learning about language arts. Instruct:
       | You're doing a great job! Keep up the good work! Instruct: Have a
       | fantastic day! Instruct: Goodbye! Instruct: Goodbye!
        
         | PhilippGille wrote:
         | I've seen this with Phi 1.5B a lot. Try TinyLlama for a 1.1B
         | model or Gemma for 2B. The latter is not available on OP's
         | website yet but maybe in the future.
        
         | moffkalast wrote:
         | That boy needs therapy.
        
           | phito wrote:
           | Purely psychosomatic
        
             | afavour wrote:
             | But surely, expulsion is not the answer!
        
               | stevenicr wrote:
               | Where in the heck did you all discover this song? It's
               | never been on the radio I assume. I only found it because
               | it cropped up in yahoo messenger/launchcast at some point
               | - never seen it anywhere else.
        
               | edgineer wrote:
               | It was big on YTMND back in the day
        
               | moffkalast wrote:
               | It's what you get when you leave the autogenerated
               | Youtube mixes running for far too long. Far too long...
        
               | afavour wrote:
               | Sadly my answer is: because I'm old. That album was a big
               | deal back in the day.
        
         | ComputerGuru wrote:
         | That seems almost like the instruction template is wrong.
        
         | andai wrote:
         | I wasn't able to get much use from Phi 1.5 (except for leaking
         | Microsoft's proprietary training data).
         | 
         | Phi 3 is great though.
        
       | koolala wrote:
       | On Firefox Nightly on my Steam Deck it "cannot find WebGPU in the
       | environment".
        
         | eyegor wrote:
         | Last I checked ff explicitly does not support webgpu, webhid,
         | webusb, etc.
         | 
         | Apparently nightly is supposed to support it:
         | https://developer.mozilla.org/en-US/docs/Mozilla/Firefox/Exp...
        
           | ojosilva wrote:
           | So here's a howler to the new Mozilla CEO and FF teams who're
           | looking for ways to save their org:
           | 
           | - release WebGPU support everywhere, also embed llama.cpp or
           | something similar for non GPU users
           | 
           | - add UI for easy model downloading and sharing among sites
           | 
           | - write the LLM browser API that enables easy access and sets
           | the standard
           | 
           | - add security: "this website wants to use local LLM. Allow?"
        
             | eyegor wrote:
             | Hmm but what about another mobile phone OS instead? Or a
             | vpn service? Surely people don't care about browser
             | features.
        
             | yokoprime wrote:
             | There's also the little issue of firefox not supporting HDR
             | videos which with more and more OLED/miniLED monitors out
             | there is a major drawback. I love FF and i daily drive it,
             | but there are some glaring gaps in the feature set between
             | chromium and ff.
        
         | 1f60c wrote:
         | I had the same issue on my iPhone! You can (temporarily) enable
         | WebGPU by going to Settings > Safari > Advanced > Experimental
         | features (I don't know what it's called in English, but it's
         | the bottom one).
        
       | Snoozus wrote:
       | Tried this in Chrome under Windows, it does work but does not
       | seem to use the RTX4060, only the integrated Iris Xe. Is this a
       | bug or intentional?
        
         | lastdong wrote:
         | I think neither. You need to configure windows to use the RTX
         | with Chrome. Maybe something like in windows graphics settings,
         | setting Chrome to "High performance". A quick web search for
         | "force Chrome to use dedicated GPU" should give you all the
         | steps you need.
        
         | lukan wrote:
         | When you use the GPU in the browser, you can only request the
         | high performance GPU. It is up to the OS to grant it or not.
         | 
         | So maybe the author forgot to include the high performance
         | request, or your OS does not give the high performance GPU by
         | default (as it might be in eco mode). This behavior can be
         | changed in OS settings.
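
For reference, the request described above looks roughly like this. It is a sketch that assumes the standard WebGPU `requestAdapter({ powerPreference })` option; `GpuLike` and `requestFastAdapter` are illustrative names, and the option is only a hint that the browser and OS are free to ignore.

```typescript
type PowerPreference = "low-power" | "high-performance";

interface AdapterOptions {
  powerPreference?: PowerPreference;
}

// Minimal stand-in for navigator.gpu so the sketch does not need WebGPU typings.
interface GpuLike<A> {
  requestAdapter(options?: AdapterOptions): Promise<A | null>;
}

// Ask for the discrete GPU first; fall back to whatever the browser offers.
async function requestFastAdapter<A>(gpu: GpuLike<A>): Promise<A | null> {
  const preferred = await gpu.requestAdapter({ powerPreference: "high-performance" });
  return preferred ?? (await gpu.requestAdapter());
}

// In a browser with WebGPU typings installed:
//   const adapter = await requestFastAdapter(navigator.gpu);
```

If the integrated GPU still comes back, the remaining knob is the per-app graphics setting in the OS, as the sibling comment says.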
        
       | littlestymaar wrote:
       | This is very cool, it's something I've wished existed since Llama
       | came out; having to install Ollama + CUDA to get a locally working
       | LLM didn't feel right to me when everything that's needed is in the
       | browser. Llamafile solves the first half of the problem, but you
       | still need to install CUDA/ROCm for it to work with GPU
       | acceleration. WebGPU is the way to go if we want to put AI on
       | consumer hardware and break the oligopoly, I just wish it were
       | more broadly available (on Linux, no browser supports it
       | yet)
        
         | notarealllama wrote:
         | Tested on Ubuntu 22.04 with Chrome, sure enough, "Could not
         | load the model because Error: Cannot find adapter that matches
         | the request".
         | 
         | It really is too bad WebGPU isn't supported on Linux, I mean,
         | that's a no-brainer right there.
        
           | ac29 wrote:
           | Works for me.
           | 
           | WebGPU support is behind a couple flags on Linux:
           | https://github.com/gpuweb/gpuweb/wiki/Implementation-Status
        
           | whartung wrote:
           | I get the same thing on Chrome and my last generation Intel
           | iMac.
        
           | earleybird wrote:
           | Likewise (same error) with Chrome on Windows.
           | 
            | Currently running Ollama / Open WebUI and finding llama3:8B
            | quite useful for writing snippets of PowerShell, JavaScript,
            | Go, etc.
        
         | Jedd wrote:
         | I've managed to avoid ollama and just toyed with lmstudio. It's
         | non-free software, but extremely easy to get into, uses
         | llama.cpp under the hood, cross-platform, yada yada. There's
         | https://jan.ai/docs as well, is AGPL3, and promises inference
         | as well as training - doubtless many other similar offerings.
         | 
         | I'm wary of any 'web' prefix on what could / should otherwise
         | be desktop applications, mostly due to doubts about browser
         | security.
        
         | spmurrayzzz wrote:
          | > having to install Ollama + CUDA to get a locally working LLM
          | didn't feel right to me when everything that's needed is in the
          | browser
         | 
          | Was there something specifically about the install that didn't
          | feel right? I ask because ollama is just a thin go wrapper
          | around llama.cpp (it's actually starting a modified version of
          | the llama.cpp server in the background, not even going through
          | the go ffi, likely for perf reasons). In that sense, you
          | could just install the CUDA toolkit via your package manager
          | and call `make LLAMA_CUDA=1; ./server` from the llama.cpp
          | repo root to get effectively the same thing in two simple steps
          | with no extra overhead.
        
           | littlestymaar wrote:
           | I'm never gonna have my non-tech friend do any of this when
           | they can just go to _chat.openai.com_ and call it a day.
           | 
           | Most people value convenience at the expense of almost
           | everything else when it comes to technology.
        
             | spmurrayzzz wrote:
             | > I'm never gonna have my non-tech friend do any of this
             | 
             | Who was making that assertion? I certainly wasn't.
             | 
             | In the same way I am never going to tell my non-engineer
             | friends to build their own todo app instead of just using
             | something like Todoist. But if they told me they cared
             | about data privacy/security, I'd walk them through the
             | steps if they cared to hear them.
        
       | 1f60c wrote:
       | It's sadly stuck on "Loading model from cache[24/24]: 0MB loaded.
       | 0% completed, 0 secs elapsed." on my iPhone 13 Pro Max :(
        
         | spacebanana7 wrote:
         | I believe it's only compatible with full Chrome / Edge
         | 
         | https://github.com/abi/secret-llama?tab=readme-ov-file#syste...
        
         | pjmlp wrote:
         | Safari doesn't do WebGPU currently.
        
       | Bradd3rs wrote:
       | pretty cool, nice work!
        
       | NayamAmarshe wrote:
       | This is amazing! I always wanted something like this, thank you
       | so much!
        
       | Its_Padar wrote:
       | Very interesting! I would be quite interested to see this
       | implemented as some sort of API for browser chatbots or possibly
       | even local AI powered web games? If you don't know what Ollama is
       | I suggest checking it out. Also I think adding the phi3 model to
       | this would be a good idea.
        
       | zerop wrote:
       | Question - Do I compromise on quality on answers if I use models
       | using WebLLM (like this) compare to using them on system console.
        
       | Dowwie wrote:
       | What therapy prompts have you found useful?
        
         | Y_Y wrote:
         | I usually just go with "and how does that make you feel?"
        
       | nojvek wrote:
       | Yasssssss! Thank you.
       | 
       | This is the future. I am predicting Apple will make progress on
       | Groq-like chipsets built into their newer devices for hyper fast
       | inference.
       | 
       | LLMs leave a lot to be desired but since they are trained on all
       | publicly available human knowledge they know something about
       | everything.
       | 
       | My life has been better since I've been able to ask all sorts of
       | adhoc questions about "is this healthy? Why healthy?" And it
       | gives me pointers where to look into.
        
         | mcculley wrote:
         | They are not "trained on all publicly available human
         | knowledge". Go look at the training data sets used. Most human
         | knowledge that has been digitized is not publicly available
         | (e.g., Google Books). These models are not able to get to data
         | sets behind paywalls (e.g., scientific journals).
         | 
         | It will be a huge step forward for humanity when we can run
         | algorithms across all human knowledge. We are far from that.
        
           | neurostimulant wrote:
           | There is a rumor that OpenAI might've used libgen in their
           | training data.
        
             | mcculley wrote:
             | Someone will. The potential gains are too high to ignore
             | it.
        
         | zitterbewegung wrote:
         | I actually think Apple has been putting neural engines in
         | everything and might be training something like Llama3 for a
          | very long time. Their conversational Siri is probably being
          | neglected on purpose to replace it. They have released papers
         | on faster inference and released their own models. I think
         | their new Siri will largely use on device inference but with a
         | very different LLM.
         | 
         | Even llama.cpp is performant already on macOS.
        
         | maxboone wrote:
         | Groq is not general purpose enough, you'd be stuck with a
         | specific model on your chip.
        
       | r0fl wrote:
       | Could not load the model because Error: Cannot find WebGPU in the
       | environment
        
         | Lex-2008 wrote:
         | Safari, Firefox, or IE? Note the text says:
         | 
         | > Should work on Desktop with Chrome or Edge.
        
         | MayeulC wrote:
          | See:
          | https://github.com/gpuweb/gpuweb/wiki/Implementation-Status#...
          | (I got there from Chromium's console).
          | 
          | On Linux, I had to go to chrome://flags/#skia-graphite and
          | chrome://flags/#enable-vulkan and
          | chrome://flags/#enable-unsafe-webgpu
         | 
          | I think only one of the first two is actually required, but I
         | enabled both. That allowed me to make use of TinyLlama with my
         | AMD GPU (R9 Fury, OSS drivers), but I think I'd need Chromium
         | Canary to enable "shader-f16" and use the other models, as I
         | was not able to make it work on regular Chromium.
         | 
         | I haven't tried with Firefox.
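
A small sketch of how a page might branch on that feature instead of failing, assuming the adapter reports `"shader-f16"` in its `features` set as the WebGPU spec describes. `pickModelVariant` is a hypothetical helper, and it assumes an f32 build (here labelled `q4f32_1`) exists alongside the `q4f16_1` builds mentioned earlier in the thread.

```typescript
// GPUAdapter.features behaves like a read-only Set of feature names.
interface FeatureSet {
  has(name: string): boolean;
}

// Hypothetical helper: prefer f16 weights when the GPU can run f16 shaders.
function pickModelVariant(features: FeatureSet): "q4f16_1" | "q4f32_1" {
  return features.has("shader-f16") ? "q4f16_1" : "q4f32_1";
}

// In a browser, after obtaining an adapter:
//   const variant = pickModelVariant(adapter.features);
```

That would explain why only some models load on a given browser build: the f16 variants need a feature the adapter did not advertise.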
        
         | pjmlp wrote:
         | For the foreseeable future, WebGPU is "Works best on Chrome
         | (TM)".
        
       | NikhilVerma wrote:
       | This is absolutely wonderful, I am a HUGE fan of local first
       | apps. Running models locally is such a powerful thing I wish more
       | companies could leverage it to build smarter apps which can run
       | offline.
       | 
       | I tried this on my M1 and ran Llama 3, I think it's the quantized
       | 7B version. It ran with around 4-5 tokens per second which was
       | way faster than I expected on my browser.
        
         | abi wrote:
         | Appreciate the kind words :)
        
       | andrewfromx wrote:
       | i asked it "what happens if you are bit by a radio active
       | spider?" and it told me all about radiation poisoning. Then I
       | asked a follow up question: "would you become spiderman?" and it
       | told me it was unable to become anything but an AI assistant. I
       | also asked if time machines are real and how to build one. It
       | said yes and told me! (Duh, you use a flux capacitor, basic
       | physics.)
        
         | abi wrote:
         | Try switching models to something other than TinyLlama (the
         | default only because it's the fastest to load). Mistral and Llama 3 are
         | great.
        
       | Jackson_Fleck wrote:
       | This is amazing, but can we please set the .prose width to be
       | dynamic? The text column is 3 inches wide on my monitor; it
       | should take up a percentage of the browser window.
        
       | Jackson_Fleck wrote:
       | ...I think it would be a great idea to graft on a LlamaIndex
       | module here so we can use this local browser LLM to talk to our
       | local documentation https://docs.llamaindex.ai/en/stable/
        
       | wg0 wrote:
       | How do people use something like this as a coach or therapist?
       | This is a genuine question.
       | 
       | Side note: impressive project. The future of AI is mostly
       | offline, with maybe a few APIs in the cloud.
        
         | kushie wrote:
         | it's great at offering alternative perspectives
        
         | cal85 wrote:
         | Genuine answer: you say "Be a coach/therapist" followed by
         | whatever you'd say to a coach/therapist.
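
As a rough illustration of that answer, the instruction and the user's text can be packed into the common role/content chat message format. This is a sketch under assumptions: `buildCoachMessages`, `ChatMessage` and the exact wording of the system prompt are hypothetical, and whether a given app forwards a system message this way is not confirmed by the thread.

```typescript
// Common chat message shape (role + content); names here are illustrative only.
type Role = "system" | "user" | "assistant";
interface ChatMessage { role: Role; content: string }

// Builds a two-message conversation: the coaching instruction, then the user's text.
function buildCoachMessages(coachingStyle: string, userText: string): ChatMessage[] {
  return [
    // The system message carries the "Be a coach/therapist" instruction.
    { role: "system", content: `Be a ${coachingStyle}. Ask clarifying questions before giving advice.` },
    // The user message is whatever you would say to that coach or therapist.
    { role: "user", content: userText },
  ];
}
```

For example, `buildCoachMessages("career coach", "I keep putting off a hard conversation with my manager.")` yields a system message followed by the user message, ready to hand to a chat-completion call.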
        
         | pseudosavant wrote:
         | I tried using Claude and ChatGPT like this: I would just write
         | a free form journal entry. The feedback it gave was typically
         | very useful and made journaling more rewarding.
        
           | KennyBlanken wrote:
           | Given user data is folded back into the models, there is a
           | snowball's chance in hell that I would input stuff I'd talk
           | to a therapist about.
           | 
           | When are people going to realize that their interactions with
           | AIs are likely being analyzed/characterized, and that at some
           | point, that analysis will be monetized?
        
             | abi wrote:
             | Use secret llama in an incognito window. Turn off the
             | Internet and close the window when done.
        
         | intended wrote:
         | Ever had a day where your bandwidth was constrained and you
         | just _knew_ something was wrong with a situation, but your
         | brain lacked the juice or dexterity to connect/articulate the
         | issue?
         | 
         | If I have the presence of mind, I offload the work here. At the
         | same time I have a strong understanding of how coaching works,
         | as does my brain.
         | 
         | I suspect that with all things LLM, some amount of proficiency
         | is needed to truly get the prompts to work.
         | 
         | The simplest option is to ask it to be a coach for you. This is
         | going to be hit and miss.
         | 
         | The better version is to specify the kind of coaching you want,
         | or provide a rough outline of the issues on your mind and then
         | ask for what kind of coach or therapist would make sense.
         | 
         | I use either of these, for example:
         | 
         | 1) over-designed:
         | https://chat.openai.com/g/g-KD6jm0l4c-thought-council
         | 2) base version:
         | https://chat.openai.com/g/g-Cdq3drl87-two-guides
         | 
         | Sadly OpenAI doesn't let you share active chats anymore, so it's
         | going to need a plus subscription.
        
       | low_tech_punk wrote:
       | It's a wrapper of https://github.com/mlc-ai/web-llm
        
         | abi wrote:
         | Yes. Web-llm is a wrapper of tvmjs:
         | https://github.com/apache/tvm
         | 
         | Just wrappers all the way down
        
       | adontz wrote:
       | If anyone knows, is this about the best model one can run locally
       | on an old consumer-grade GPU (GTX 1080 in my case)?
        
         | valine wrote:
         | Llama 3 8B is pretty much the king of its model class right
         | now, so yeah. Meta's instruct fine-tune is also a safe choice;
         | really the only thing you have to play with is the quantization
         | level. Llama 3 8B at 4-bit isn't great, but 8-bit might be
         | pushing it on the GTX 1080. I'd almost consider offloading a few
         | layers to the CPU just to avoid dealing with the 4-bit model.
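
The quantization trade-off above comes down to simple arithmetic: weight memory is roughly parameter count times bits per weight. The sketch below is a back-of-the-envelope estimate only. The rounded 8e9 parameter count and the 1 GB headroom default are assumptions, the helper names are hypothetical, and the estimate ignores the KV cache, activations and quantization metadata, so real usage is higher.

```typescript
// Weight-only memory estimate in gigabytes (1 GB = 1e9 bytes).
function weightGigabytes(paramCount: number, bitsPerWeight: number): number {
  return (paramCount * bitsPerWeight) / 8 / 1e9;
}

const LLAMA3_8B_PARAMS = 8e9; // assumption: rounded parameter count
const GTX_1080_VRAM_GB = 8;   // the GTX 1080 has 8 GB of VRAM

// True if the weights plus some headroom (hypothetical default: 1 GB) fit in VRAM.
function fitsInVram(bitsPerWeight: number, headroomGb: number = 1): boolean {
  return weightGigabytes(LLAMA3_8B_PARAMS, bitsPerWeight) + headroomGb <= GTX_1080_VRAM_GB;
}
```

Under these assumptions the 4-bit weights come to about 4 GB and fit, while the 8-bit weights alone come to about 8 GB and fill the card, which is why offloading a few layers to the CPU comes up.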
        
       | bschmidt1 wrote:
       | Amazing work, feels like a step forward for LLM usability.
       | 
       | Would be interesting if there was a web browser that managed the
       | download/install of models so you could go to a site like this,
       | or any other LLM site/app and it detects whether or not you have
       | models, similar to detecting if you have a webcam or mic for a
       | video call. The user can click "Allow" to allow use of GPU and
       | allow running of models in the background.
        
         | flawsofar wrote:
         | They should just be ubiquitous OS daemons at this point.
         | They're clearly very valuable
        
         | KennyBlanken wrote:
         | Mozilla won't even allow WebSerial to be implemented because it
         | was deemed "too dangerous" - with all sorts of absurd whinging
         | about the devastation that could be unleashed by unsuspecting
         | users allowing a malicious site to access USB serial devices.
         | 
         | When someone pointed out that Chrome has had this functionality
         | for years and the world has not imploded...and has enabled many
         | open source projects and web-based microcontroller IDEs to
         | provide enormous user convenience...the response was a
         | condescending sneer along the lines of "well we actually care
         | about user privacy."
         | 
         | (If Chrome is such a user privacy dumpsterfire, why not
         | implement WebSerial so that people don't have to run Chrome in
         | order to communicate with and program microcontrollers?)
         | 
         | Given they claimed that people's pacemakers and blood glucose
         | monitors would be tampered with if WebSerial were implemented,
         | I'd be shocked if they allowed such low level access to a
         | GPU...
        
           | gordinmitya wrote:
           | already allowed by default in nightly builds
        
           | squigz wrote:
           | > (If Chrome is such a user privacy dumpsterfire, why not
           | implement WebSerial so that people don't have to run Chrome
           | in order to communicate with and program microcontrollers?)
           | 
           | This doesn't seem like a logical comparison. Is there no
           | other way to program microcontrollers outside of Chrome?
        
             | creesch wrote:
             | Well sure, downloadable executables. Which I feel like
             | isn't much better and in a lot of ways worse.
        
               | squigz wrote:
               | Can you elaborate on why you think that? I feel like not
               | everything has to be shoved into the browser, as that is
               | in a lot of ways worse
        
         | Cheer2171 wrote:
         | Sounds like ollama with open webui
        
         | abi wrote:
         | Window AI (https://windowai.io/) is an attempt to do something
         | like this with a browser extension.
        
       ___________________________________________________________________
       (page generated 2024-05-04 23:01 UTC)