Post ASMEhB9PHMCGT4zBTM by simon@fedi.simonwillison.net
(DIR) Post #ASMEWNMLXyRf9Bt3ce by simon@fedi.simonwillison.net
2023-02-05T04:04:19Z
0 likes, 0 repeats
Large language models are way too big to run on my own hardware... but I realize I don't actually want one with vast amounts of information baked in (able to answer questions about any topic). I effectively want a calculator for words: something that can summarize text, extract terms, and maybe refactor code a bit, and that I can feed extra reference information to when it makes sense to do so
(DIR) Post #ASMEhB9PHMCGT4zBTM by simon@fedi.simonwillison.net
2023-02-05T04:07:28Z
0 likes, 0 repeats
One of the most interesting lessons of large language models so far is that there's this magical size boundary after which they suddenly start growing all kinds of weird and amazing capabilities - so size really, really does matter. Even tasks like text summarization appear to require a vast catalog of "common sense" factual information in order to work effectively. But does it really need THAT much encyclopedic information to achieve those effects? What's the smallest training size that would work?
(DIR) Post #ASMEtGGO61p7nd0pqC by simon@fedi.simonwillison.net
2023-02-05T04:09:11Z
0 likes, 0 repeats
So my question is this: what are the chances that a large language model could be trained which is large enough to work as a calculator-for-words, but small enough to run on, say, an M2 Max MacBook Pro with 64GB of RAM? Is that already known to be impossible, or is there research that hints this could be achieved given the right optimizations and a really well-chosen training set?
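[Editor's note: whether a model fits in 64GB of RAM is, to a first approximation, arithmetic over parameter count and bytes per weight. A rough sketch, ignoring activation and KV-cache overhead:]

```python
def model_ram_gb(params_billions: float, bytes_per_weight: float) -> float:
    """Approximate RAM needed just to hold the weights, in GB."""
    return params_billions * 1e9 * bytes_per_weight / 1e9

# GPT-3 scale (175B params) at fp16 (2 bytes/weight): ~350GB - far beyond 64GB.
print(model_ram_gb(175, 2))   # 350.0
# A 7B-parameter model at fp16 is ~14GB; 4-bit quantized (~0.5 bytes) is ~3.5GB.
print(model_ram_gb(7, 2))     # 14.0
print(model_ram_gb(7, 0.5))   # 3.5
```

So the question is really whether calculator-for-words capability can survive at a parameter count whose weights fit in tens of gigabytes, not hundreds.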
(DIR) Post #ASMFOFHBoetDiPA6IC by simon@fedi.simonwillison.net
2023-02-05T04:14:46Z
0 likes, 0 repeats
If I was fully up to speed on a hundred or so language model papers I'd probably know the answer to this question already... but I'm not very good at reading papers, so I'm asking here instead!
(DIR) Post #ASMFZa7EfxdbX6uZsW by aaron@social.huslage.com
2023-02-05T04:17:39Z
0 likes, 0 repeats
@simon some of the Flan-T5 models are around 3GB, I think, and are quite capable.
(DIR) Post #ASMG3K1J5EZGIiJseG by groby@hachyderm.io
2023-02-05T04:22:52Z
0 likes, 1 repeats
@simon What if you asked an LLM to efficiently summarize its entire input corpus and then trained on that? ;) (I'm not sure if I'm kidding, actually.) But if we set that aside, IIRC the smallest model supporting chain-of-thought is currently Flan-T5, and that's 700GB. So about an order of magnitude away. There's some research showing that might happen: https://arxiv.org/abs/2110.08207 (OK, it's an order of magnitude down from ChatGPT-3, but...)
(DIR) Post #ASMGKoEvypYuwt24O0 by sqncs@mstdn.social
2023-02-05T04:26:19Z
0 likes, 0 repeats
@simon this isn't a machine learning question directly. It's primarily a linguistics and information theory question wrt determining the number of parameters you need to succinctly produce statistically meaningful inferences given a problem defined by inputs that are solely language. Most NLP papers don't touch this sort of material in my experience, which is limited.
(DIR) Post #ASMGlkmKxOZA25VKIy by TedUnderwood@sigmoid.social
2023-02-05T04:33:07Z
0 likes, 0 repeats
@simon So, I'm no expert here. But I found this thread interesting, and for what it's worth you seem to be thinking in a very similar direction. Getting down to, say, 70B parameters seems to be the goal; that could fit in 1 GPU (I assume at the high end of the market; idk about a MacBook). And to achieve that, yep, you might offload the factual knowledge. https://twitter.com/spolu/status/1620381471583399936
(DIR) Post #ASMJb4YLOc8Y2SYdd2 by simon@fedi.simonwillison.net
2023-02-05T05:02:59Z
0 likes, 0 repeats
@aaron yeah I've run a T5 embedding model on my machine already, it worked great: https://til.simonwillison.net/python/gtr-t5-large I'm not sure if it's possible to run a GPT-3-style text generation T5 model yet, but I'd be very keen to try one!
(DIR) Post #ASMPIAPPzyqAQKkd5U by drwho@hackers.town
2023-02-05T06:06:50Z
0 likes, 0 repeats
@simon From a similar attempt sixteen days ago, minimal.
(DIR) Post #ASMPSqJH4DH5Gyy4Q4 by cydonian@social.vivaldi.net
2023-02-05T06:07:08Z
0 likes, 0 repeats
@simon You can already run Stable Diffusion on an iPhone 13 Pro and an iPad Pro. I believe we are only a few months away from LLMs running natively on an M1 or M2 chip. But more importantly: the calculator analogy is inappropriate for generative models. The output isn't deterministic. A better analogy (per Ars Technica) is a slot machine: you get an output that is _likely_ to be in the direction of what you want.
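[Editor's note: the slot-machine point comes down to the fact that generation samples each next token from a probability distribution, usually sharpened or flattened by a temperature parameter. A toy sketch with made-up logits:]

```python
import math
import random

def sample_next_token(logits: dict[str, float], temperature: float, rng: random.Random) -> str:
    # Softmax with temperature: low T is near-deterministic (always the top
    # token), high T flattens the distribution toward a random draw.
    scaled = {tok: v / temperature for tok, v in logits.items()}
    m = max(scaled.values())  # subtract the max for numerical stability
    exps = {tok: math.exp(v - m) for tok, v in scaled.items()}
    total = sum(exps.values())
    tokens = list(exps)
    weights = [exps[tok] / total for tok in tokens]
    return rng.choices(tokens, weights=weights)[0]

rng = random.Random(0)
logits = {"cat": 2.0, "dog": 1.0, "pelican": 0.5}
# Same "prompt", repeated draws: usually "cat", but not always - the slot machine.
draws = [sample_next_token(logits, temperature=1.0, rng=rng) for _ in range(20)]
```

At a very low temperature the same setup picks the top token essentially every time, which is why "deterministic-feeling" and "creative" modes can come from one model.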
(DIR) Post #ASMPeOPOd3beDJL4Mq by simon@fedi.simonwillison.net
2023-02-05T06:11:02Z
0 likes, 0 repeats
@cydonian by calculator I mean "general purpose tool" - I know they're not deterministic, but what I'm interested in is a tool I can use to solve natural language processing problems on my own machine. I'm using GPT-3 as a general purpose language manipulation tool at the moment - it's much better at that than it is at looking up facts, where it frequently hallucinates
(DIR) Post #ASMX4Dq3FZTcdi4PSK by DavidObando@hachyderm.io
2023-02-05T07:33:44Z
0 likes, 0 repeats
@simon not just research but actual products are out. I was part of the team that produced Visual Studio IntelliCode line completions - think of it as Gmail's Smart Compose for code (C#, Python, JS/TS, C++) - and it runs locally in the IDE. We used a lot of compression techniques to make it product-friendly, as the size determines not only the memory it consumes but also the latency in providing responses. Here's a paper by our data science team on GPT-C: https://arxiv.org/pdf/2005.08025.pdf
(DIR) Post #ASN1IEJTHPt0n3QVFY by DavidObando@hachyderm.io
2023-02-05T07:46:05Z
0 likes, 0 repeats
@simon There's a little demo of IntelliCode completions in the video here: https://devblogs.microsoft.com/visualstudio/type-less-code-more-with-intellicode-completions/ This all runs on the local machine, consumes 1GB of memory tops, and produces predictions within a reasonable threshold (varies per computer, but on modern machines it's under 60ms). The underlying tech is similar to what you're discussing: an LLM focused on a specialized body of knowledge and compressed to fit on a local computer.
(DIR) Post #ASN1IErVEt6mUbLgIK by simon@fedi.simonwillison.net
2023-02-05T13:12:27Z
0 likes, 0 repeats
@DavidObando Wow! I had no idea Visual Studio had a Copilot-like autocomplete feature running entirely on the local machine back in 2021
(DIR) Post #ASN6XzLxATZknSvAfI by zubakskees@mastodon.social
2023-02-05T14:11:08Z
0 likes, 0 repeats
@simon There's been some research on pruning large language models to smaller sizes or making them more efficient. I think it's entirely possible you could get reasonable performance from one small enough to run on a laptop, but I'm not sure about whether it'd feel as smart as ChatGPT.
(DIR) Post #ASNKcTdLvTCZE95sum by simon@fedi.simonwillison.net
2023-02-05T16:49:07Z
0 likes, 0 repeats
@zubakskees that's the thing I really want to understand: how much of the size requirement of GPT-3 is because it knows vast numbers of obscure facts about celebrities and history and geography and suchlike, and how much of that size is necessary to be a really good calculator-for-words?
(DIR) Post #ASNKorD06ISFoofgSu by zubakskees@mastodon.social
2023-02-05T16:51:15Z
0 likes, 0 repeats
@simon You can absolutely boil these things down to be good and more efficient at one task.
(DIR) Post #ASOUkD58Xhk4uPNMFk by rf@mas.to
2023-02-06T06:14:24Z
0 likes, 0 repeats
@simon A fun idea is having a relatively small model search a bigger-than-RAM corpus for content related to the query and effectively 'read up' on the topic before it answers: https://www.deepmind.com/publications/improving-language-models-by-retrieving-from-trillions-of-tokens It seems believable that an LM with tools will look smarter than one without them; I don't know if you still get the weird capabilities
(DIR) Post #ASOV5CNCNADmnApeWu by simon@fedi.simonwillison.net
2023-02-06T06:21:18Z
0 likes, 0 repeats
@rf I've been experimenting with that pattern myself, it's surprisingly effective: https://simonwillison.net/2023/Jan/13/semantic-search-answers/
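[Editor's note: the pattern both posts describe - embed a corpus, find the passages nearest the question, paste them into the prompt - can be sketched with a toy bag-of-words "embedding" standing in for a real model. The function names here are illustrative, not from any library:]

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    # Stand-in for a real embedding model (e.g. a T5-based encoder):
    # a bag-of-words vector keyed by lowercase token.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(question: str, corpus: list[str], k: int = 2) -> list[str]:
    # Return the k passages most similar to the question.
    q = embed(question)
    return sorted(corpus, key=lambda p: cosine(q, embed(p)), reverse=True)[:k]

corpus = [
    "RETRO retrieves from trillions of tokens at inference time.",
    "The Pile is a standard language modeling benchmark.",
    "Stable Diffusion generates images from text prompts.",
]
context = retrieve("what does RETRO retrieve from", corpus)
# The retrieved passages are pasted into the prompt ahead of the question,
# so the model answers from them rather than from baked-in knowledge.
prompt = "Context:\n" + "\n".join(context) + "\n\nQuestion: what does RETRO retrieve from?"
```

With a real embedding model in place of `embed`, this is the semantic-search-then-answer pattern; RETRO's twist is doing the retrieval inside the model at every layer of generation rather than once in the prompt.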
(DIR) Post #ASOW6h4Xorb2FPNqXw by simon@fedi.simonwillison.net
2023-02-06T06:32:44Z
0 likes, 0 repeats
... oh wait, that's not the same as the paper you're referencing there at all (though it does have a faint echo). That RETRO paper is absolutely the kind of advance I'm hoping for - the kind that might let me run an LLM on my own hardware some day!
(DIR) Post #ASOWU9RsU0CaupU0mW by simon@fedi.simonwillison.net
2023-02-06T06:36:51Z
0 likes, 1 repeats
Wow, this paper is absolutely relevant to my question about running LLMs on my own hardware some day: https://www.deepmind.com/blog/improving-language-models-by-retrieving-from-trillions-of-tokens "In our experiments on the Pile, a standard language modeling benchmark, a 7.5 billion parameter RETRO model outperforms the 175 billion parameter Jurassic-1 on 10 out of 16 datasets and outperforms the 280B Gopher on 9 out of 16 datasets." Via https://mas.to/@rf/109816320108103155
(DIR) Post #ASPA07Hdxr9JgCvfYe by kellogh@hachyderm.io
2023-02-06T13:59:21Z
0 likes, 0 repeats
@simon @rf is there a full paper? that blog was too enticing to just stop reading
(DIR) Post #ASPIuXEhSZXs1Ixw3s by timrburnham@mastodon.social
2023-02-06T15:37:54Z
0 likes, 0 repeats
@simon oh wow. Could this explicit reference back to the training set make it easier to identify the inputs used to generate a specific reply? Basically a Works Cited list along with its response.
(DIR) Post #ASPJ6QrUz7Udj3mpMG by simon@fedi.simonwillison.net
2023-02-06T15:41:25Z
0 likes, 0 repeats
@timrburnham there's a trick for doing that today which a few of the "AI search engines" are using, best illustrated by this leaked prompt: https://simonwillison.net/2023/Jan/22/perplexityai/ I don't think it's infallible (nothing in AI land ever is) but it's an interesting direction
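[Editor's note: the trick boils down to numbering the retrieved sources in the prompt and instructing the model to cite them inline. A rough reconstruction of the prompt shape - the wording here is illustrative, not the actual leaked prompt:]

```python
def build_cited_prompt(question: str, sources: list[str]) -> str:
    # Number each retrieved extract so the model can cite it as [1], [2], ...
    numbered = "\n".join(f"[{i}] {s}" for i, s in enumerate(sources, start=1))
    return (
        "Answer the question using only the numbered extracts below, "
        "and cite each claim with its extract number in square brackets.\n\n"
        f"{numbered}\n\nQuestion: {question}\nAnswer:"
    )

prompt = build_cited_prompt(
    "When was RETRO announced?",
    [
        "DeepMind announced RETRO in December 2021.",
        "RETRO retrieves from a trillions-of-tokens database.",
    ],
)
```

The citations point at the retrieved extracts, not at the training data - which is exactly the limitation the next post in the thread raises.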
(DIR) Post #ASPKMsh3NLF2dCJAMy by timrburnham@mastodon.social
2023-02-06T15:52:36Z
0 likes, 0 repeats
@simon Very cool, but kind of the inverse of what I want. I don't want to force an answer using only specified input data. I want to ask an open-ended question, then read for myself the original sources the LLM used to answer. Less like googling with "site:stackoverflow.com", more like reading a Wikipedia summary then diving into the References.
(DIR) Post #ASPKk5E4wg5qA2oqUS by simon@fedi.simonwillison.net
2023-02-06T15:55:17Z
0 likes, 0 repeats
@timrburnham my hunch is that with current LLMs the accurate list of citations for even a one-sentence answer would be thousands of links, each to a three or four word snippet - ditto for image generators. Maybe the RETRO paper architecture would deliver much better results for that kind of thing, though?
(DIR) Post #ASPdnZndd1asyspZqK by dahukanna@mastodon.social
2023-02-06T19:33:20Z
0 likes, 0 repeats
@simon @timrburnham isn’t that a “get out of jail” move? Use an external normalised database to generate a reference list. Why not build “provenance” into the LLM functionality? Would be a step in the transparency and explainability direction.
(DIR) Post #ASPh4ObMoach2Knq40 by simon@fedi.simonwillison.net
2023-02-06T20:04:13Z
0 likes, 0 repeats
@dahukanna @timrburnham I've got the impression that building provenance into LLMs is something people would very much like to be able to do, but it's incredibly difficult to achieve given the way LLMs work at the moment. I'm hoping some brilliant new piece of research will figure it out - or that someone will come up with a new architecture to use in place of LLMs that can solve that problem
(DIR) Post #ASPufndrye1Qoh8hH6 by notsoloud@expressional.social
2023-02-06T22:42:24Z
0 likes, 0 repeats
@simon As a simple guide to scale, perhaps try out Meta's Galactica? It's available in several sizes, from 125M to 120B parameters. The smallest is definitely useless, and the largest is definitely too large for your Mac, but at least it shows what is gained with size.
(DIR) Post #ASPx40veQdBksp89FA by Albot@sigmoid.social
2023-02-06T23:08:44Z
0 likes, 0 repeats
@simon @dahukanna @timrburnham my guess would be QNRs: graph + embedding. https://www.fhi.ox.ac.uk/qnrs/