Post #AUiRSbfoATDFitAGOG by corbin@defcon.social
(DIR) Post #AUiIKjrdGtjJBR9EMi by simon@fedi.simonwillison.net
2023-04-16T15:16:44Z
0 likes, 0 repeats
Web LLM runs the vicuna-7b Large Language Model entirely in your browser, and it’s very impressive https://simonwillison.net/2023/Apr/16/web-llm/
(DIR) Post #AUiIh0qenyW2EKJS4m by simon@fedi.simonwillison.net
2023-04-16T15:20:44Z
0 likes, 0 repeats
This is a full GPT language model that runs /entirely in the browser/ (using WebGPU, which is so new I had to use Chrome Canary on my M2 MacBook) - and it's pretty capable! It can handle summarization and invent puns, and it even generated me a passable rap battle between an otter and a pelican
(DIR) Post #AUiIt2J7CqwgEKMCMi by simon@fedi.simonwillison.net
2023-04-16T15:21:49Z
0 likes, 0 repeats
This is the latest in my series of posts on running Large Language Models on personal devices https://simonwillison.net/series/llms-on-personal-devices/
(DIR) Post #AUiJnXvsoMNSWyR7dA by simon@fedi.simonwillison.net
2023-04-16T15:33:03Z
0 likes, 0 repeats
Admittedly some answers were better than others!
(DIR) Post #AUiLCPmuKXSsSpf0PA by jeffgreco@indieweb.social
2023-04-16T15:48:17Z
0 likes, 0 repeats
@simon doesn’t seem “wrong” so much as “mocking you”
(DIR) Post #AUiMKxtLB4CuXo1zGq by corbin@defcon.social
2023-04-16T16:01:19Z
0 likes, 0 repeats
@simon I don't understand why this is desirable. As you yourself point out, the amount of data that has to be streamed and cached by the Web browser is unreasonable.
(DIR) Post #AUiMVAFs47kjmhQu8G by simon@fedi.simonwillison.net
2023-04-16T16:03:38Z
0 likes, 0 repeats
@corbin I wrote about that towards the end of my post - the browser has security features that are especially useful when working with LLMs
(DIR) Post #AUiMgmHX1UtOuLkOgK by simon@fedi.simonwillison.net
2023-04-16T16:04:06Z
0 likes, 0 repeats
@corbin you could wrap this whole thing in an Electron app or similar to avoid having to download the model over a network
(DIR) Post #AUiMqy6NvlV2SnOQ3E by corbin@defcon.social
2023-04-16T16:06:42Z
0 likes, 0 repeats
@simon I can trivially prove that my local LLaMA harness won't make any network calls. Doing the same for a browser is a massive headache. Sandboxes are an anti-pattern; they are what we use for untamed software. However, LLMs are brand-new and trivial to tame, so no sandboxes are required.
(DIR) Post #AUiR8H7AtWq6AbGoTY by simon@fedi.simonwillison.net
2023-04-16T16:55:29Z
0 likes, 0 repeats
@corbin the moment you start getting it to generate code for it to execute automatically (an increasingly popular pattern) you're going to want it to have access to a very robust sandbox
(DIR) Post #AUiRSbfoATDFitAGOG by corbin@defcon.social
2023-04-16T16:58:49Z
0 likes, 0 repeats
@simon Generate code in a language which denotes pure total functions. Then automatic execution can't do anything worse than waste a few moments of CPU time or a few GiB/min of RAM, and automatic analysis of code is relatively straightforward. People are mostly generating Python and ECMAScript. ECMAScript technically can be tamed, but Python can't. If you want to generate untrusted code and inspect it, then you need to avoid Turing-completeness. We've known this for like a century.
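
A minimal sketch of the restriction corbin describes: a tiny non-Turing-complete expression language whose programs always terminate, so generated code can be analyzed mechanically before it runs. The whitelist, function names, and numeric-only fragment below are illustrative assumptions, not anything specified in the thread.

    import ast

    # A tiny total expression language: arithmetic over numeric literals only.
    # No names, calls, loops, or recursion, so evaluation always terminates.
    ALLOWED = (ast.Expression, ast.BinOp, ast.UnaryOp, ast.Constant,
               ast.Add, ast.Sub, ast.Mult, ast.Div, ast.USub, ast.UAdd)

    def check(tree: ast.AST) -> None:
        """Reject any syntax outside the total fragment."""
        for node in ast.walk(tree):
            if not isinstance(node, ALLOWED):
                raise ValueError(f"disallowed construct: {type(node).__name__}")
            if isinstance(node, ast.Constant) and not isinstance(node.value, (int, float)):
                raise ValueError("only numeric literals are allowed")

    def evaluate(src: str) -> float:
        tree = ast.parse(src, mode="eval")
        check(tree)
        # Safe to run: the accepted fragment has no way to reach names,
        # attributes, imports, or calls.
        return eval(compile(tree, "<expr>", "eval"), {"__builtins__": {}})

    print(evaluate("2 * (3 + 4)"))       # 14
    # evaluate("__import__('os')")       # ValueError: disallowed construct

Since the fragment excludes even exponentiation, an accepted program can do nothing worse than the "waste a few moments of CPU time" bound corbin mentions.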
(DIR) Post #AUiTrPe2E7ipZ5I7W4 by simon@fedi.simonwillison.net
2023-04-16T17:25:51Z
0 likes, 0 repeats
@corbin Python can be tamed if you run it in a WebAssembly sandbox: https://til.simonwillison.net/webassembly/python-in-a-wasm-sandbox
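
The linked TIL describes running a WebAssembly build of CPython under wasmtime. A condensed sketch of that approach, assuming the wasmtime Python package and a local python-3.11.1.wasm build (for example the one published by VMware's Wasm Labs); the filename and output-capture details are assumptions, not verified against the post.

    from wasmtime import Engine, ExitTrap, Linker, Module, Store, WasiConfig

    def run_sandboxed(code: str) -> str:
        engine = Engine()
        linker = Linker(engine)
        linker.define_wasi()  # expose only WASI; no host network access
        module = Module.from_file(engine, "python-3.11.1.wasm")

        config = WasiConfig()
        config.argv = ["python", "-c", code]
        config.stdout_file = "out.txt"  # capture stdout via a scratch file

        store = Store(engine)
        store.set_wasi(config)
        instance = linker.instantiate(store, module)
        try:
            instance.exports(store)["_start"](store)
        except ExitTrap as trap:  # wasi-libc exits via proc_exit, which traps
            if trap.code != 0:
                raise
        return open("out.txt").read()

    print(run_sandboxed("print(1 + 1)"))  # 2

The interpreter sees only what the WasiConfig grants it - here a scratch file for stdout and nothing else - so code running inside it has no route to the network or the host filesystem.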
(DIR) Post #AUiWw1XxY8uY8XL8nQ by markus@hachyderm.io
2023-04-16T18:00:04Z
0 likes, 0 repeats
@simon I love how a passable rap battle between two animals has become a measure of software quality.
(DIR) Post #AUiePibtDfCvlxHime by corbin@defcon.social
2023-04-16T19:24:08Z
0 likes, 0 repeats
@simon I guess I ought to write a blog post explaining what taming is. There's an old E document, at least: http://www.erights.org/elib/legacy/taming.html Yes, WebAssembly is tamed. Yes, emulators written in tamed languages are freely tamed. No, Python's native type theory is not tamed simply by running in a managed runtime; for example, CPython is not tamed, although PyPy has object spaces which are somewhat tame.
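
For readers who don't follow the E link: taming roughly means code receives only the authorities explicitly handed to it, with no ambient ones like open() or socket(). A toy Python illustration of the discipline (all names hypothetical), which also shows why corbin says CPython itself is not tamed:

    def make_append_logger(path: str):
        """A capability granting append-only access to one file, nothing else."""
        def append(line: str) -> None:
            with open(path, "a") as f:
                f.write(line + "\n")
        return append

    def untrusted_plugin(log) -> None:
        # The plugin can call log(), but holds no handle on the filesystem,
        # the network, or even the path being written to.
        log("plugin ran")

    untrusted_plugin(make_append_logger("audit.log"))

Nothing in CPython stops the plugin from importing os and bypassing the discipline; in E, or a genuinely tamed runtime, the language itself closes that door.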
(DIR) Post #AUikF6j1oqAf6Uwndw by simon@fedi.simonwillison.net
2023-04-16T20:29:31Z
0 likes, 0 repeats
@corbin that's why I'm very happy to outsource that entire problem to WebAssembly, rather than worrying about it at the level of individual programming languages
(DIR) Post #AUk6GsCjAeqWnS2ulk by StuartGray@mastodonapp.uk
2023-04-17T12:10:50Z
0 likes, 0 repeats
@simon This is some very impressive work, along with their stable diffusion web app. I can't wait until they extend support to more models, preferably with better (non-academic) licences. The specs say you need a GPU with at least 6.4GB VRAM, presumably to fit the entire model. However, whilst I wouldn't recommend it, I did get this to work on an old 2008 Core 2 Quad with 8GB RAM and an AMD card with only 4GB VRAM - it generates about 2 tokens/sec of output.