Post AqKybocEFYTO9X3g00 by skategyrl@gamerstavern.online
(DIR) More posts by skategyrl@gamerstavern.online
(DIR) Post #AqKwfqW0vdyLIKwgeu by futurebird@sauropods.win
2025-01-22T10:03:35Z
0 likes, 0 repeats
I try not to bad-mouth tech without trying it sincerely. At last I've found two ways to use LLMs:

1. This one is a little risky, since it means giving an LLM your writing. But if you ask an LLM to scan your short story and summarize it, this can be a good first stage of feedback to check if the plot you *thought* you wrote is really there.

2. Google's NotebookLM can be fed a stack of science papers in your "read later" folder, and the summary it makes is useful for deciding what to read next.
(DIR) Post #AqKwrOOfIZWpYt0vbs by lavaeolus@fedihum.org
2025-01-22T10:05:39Z
0 likes, 0 repeats
@futurebird Are you also interested in easy ways to run LLMs locally on mid-range hardware? (Without needing any coding knowledge.)
(DIR) Post #AqKwsMEooJIr9iruam by futurebird@sauropods.win
2025-01-22T10:05:44Z
0 likes, 0 repeats
My next step is to find out how to do the first one locally. I do not want to give some cloud service all of my short stories (even if they pinky promise not to use them). I also hope the paper summary thing could just run on my own machine. I do think both of these services MUST use the wider database to write the responses. I'm just not giving it enough data to write this variety of sentences. Hmmm.
(DIR) Post #AqKx95WODQWeEW45S4 by futurebird@sauropods.win
2025-01-22T10:08:54Z
0 likes, 0 repeats
@lavaeolus Yes!

And I've also been looking for a way to do a CS lesson where we build such a system for some very small, limited application, to help students understand exactly what they are doing. But so far the code has been too dense, and I don't want to use a bunch of opaque libraries.

The lessons I've found have ranged from "how to configure an LLM" to one that was just about writing prompts? Ugh. These are related but not identical to what I'm looking for.
(DIR) Post #AqKxSDfGAEJyzTKuw4 by futurebird@sauropods.win
2025-01-22T10:12:20Z
0 likes, 0 repeats
The summary also identified some common threads in the papers on ants I've been collecting that I had not noticed, as I have not read all of them closely yet. And this gave me search terms to find even more papers I'm interested in, as well as better articulate what the heck it is that I'm interested in. I liked how it had little footnote numbers that linked to the papers I gave it, so I could follow up on the bullet points.
(DIR) Post #AqKxkUFvDUGJezCBF2 by lavaeolus@fedihum.org
2025-01-22T10:15:39Z
0 likes, 0 repeats
@futurebird 🫡

I haven't found any good write-ups, but have tried it myself: https://lmstudio.ai/

(It has three modes: User, Power User, Developer; the UI ranges from easy to advanced.)

A very well-performing model (on a laptop with an AMD Ryzen 5 5600U at 2.30 GHz and 8 GB RAM, of which 5.86 GB is usable): https://huggingface.co/lmstudio-community/Phi-3.1-mini-128k-instruct-GGUF
(DIR) Post #AqKy7iROdv40JZdg36 by futurebird@sauropods.win
2025-01-22T10:18:52Z
0 likes, 0 repeats
@u0421793 How is a local LLM able to write sentences for the summary that are in the style of an accessible outline without having examples of accessible outlines?I would assume that it could only produce more text that sounds like the research papers full of dense language, and with an abstract at the start and all of the features of science papers.
(DIR) Post #AqKyApdSLmO5UrlX8q by crypticcelery@chaos.social
2025-01-22T10:20:23Z
0 likes, 0 repeats
@futurebird First, a small technical question: why would option one need access to the wider database (if it is just about a short story)? Am I misconstruing what the wider database means?

Otherwise, the summary could prove more difficult locally, since it uses a lot of context, which might need a lot of time and memory.
(DIR) Post #AqKyUBWtAm9YtzvEVU by crypticcelery@chaos.social
2025-01-22T10:23:55Z
0 likes, 0 repeats
@futurebird And, on another note, I do appreciate people being open to technology.

One thing I would like to note (which was not mentioned, but you might already be aware of) is the consideration of ethics here. I am somewhat hesitant even using the open-weights models, since one does not know how the data was gathered, especially from a labour perspective.

Though I have not done any digging into more ethical foundation models...
(DIR) Post #AqKybocEFYTO9X3g00 by skategyrl@gamerstavern.online
2025-01-22T10:25:14Z
0 likes, 0 repeats
@futurebird perhaps a #raspberrypi solution could work for you? Like llama or ollama?
(DIR) Post #AqKyrmRU687S2te4hM by crypticcelery@chaos.social
2025-01-22T10:28:10Z
0 likes, 1 repeats
@futurebird @u0421793 Maybe this helps here:

The LLM has very likely "seen" examples of an accessible outline as part of its training data.

The LLM is just a very fancy autocomplete, or a compressed statistical representation of all the text it was trained on.

That body of text probably contained many "summarize this text" tasks (with solutions) like you would get in an English class, so it is conditioned to produce a shorter summary after "seeing" a dense paper at inference time (= when you use it).
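The "fancy autocomplete" idea can be sketched in a few lines. This is a toy bigram model over a made-up corpus, not how a real LLM works (real models use neural networks over far more data), but it shows the core statistical move: predict the word that most often followed the current one in training.

```python
from collections import Counter, defaultdict

# Hypothetical toy corpus standing in for "all the text it was trained on"
corpus = "the king met the prince by the lake and the king smiled".split()

# Count, for each word, which word tends to follow it
following = defaultdict(Counter)
for current, nxt in zip(corpus, corpus[1:]):
    following[current][nxt] += 1

def autocomplete(word):
    """Predict the continuation seen most often in training."""
    return following[word].most_common(1)[0][0]

print(autocomplete("the"))  # "king": it follows "the" most often above
```

Everything an LLM "knows" is, in spirit, a vastly richer version of that `following` table, compressed into network weights instead of explicit counts.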
(DIR) Post #AqKzBfQNtorm4Xc6fQ by futurebird@sauropods.win
2025-01-22T10:31:48Z
0 likes, 0 repeats
@crypticcelery @u0421793 So when I have a local LLM, I have this data also locally, in the compressed form? They must be rather large programs.
(DIR) Post #AqKzUMak8yoHAINFDs by crypticcelery@chaos.social
2025-01-22T10:35:08Z
0 likes, 0 repeats
@futurebird @u0421793 Kind of.

These large language models are just very large neural networks (actually multiple networks), so the model is just a bunch of matrices in a math sense.

They start with random values, and are then told to make predictions for the training data. Based on the prediction errors and their derivatives, the matrices are altered. This is the training process; the model is the matrices.

At inference, the input is converted to numbers, and the math is done once (for each new token).
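That training loop ("predict, measure the error, use the derivative to adjust") can be shown with a single parameter instead of billions. A deliberately tiny sketch, the same idea as LLM training reduced to one number:

```python
# A one-parameter "model": predict y = w * x. The training data
# follows y = 2x, so training should push w toward 2.
data = [(1, 2), (2, 4), (3, 6)]  # (input, target) pairs

w = 0.5    # start from an arbitrary value (real models start random)
lr = 0.05  # learning rate: how far to move against the derivative

for epoch in range(200):
    for x, y in data:
        pred = w * x
        error = pred - y
        grad = 2 * error * x  # derivative of the squared error w.r.t. w
        w -= lr * grad        # nudge the parameter to reduce the error

print(round(w, 3))  # converges to 2.0
```

An LLM does exactly this, but with billions of parameters arranged in matrices and with "predict the next token" as the task; the trained matrices are the model file you download.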
(DIR) Post #AqL0ukenOMAhHmPrM0 by futurebird@sauropods.win
2025-01-22T10:51:08Z
0 likes, 0 repeats
@bri_seven @crypticcelery @u0421793 can you say some more about this "attention" thing? Is this the current "cue" possibly a set of weighting parameters so that as you ask additional questions it is able to adapt its responses to your most recent cumulative inputs?
(DIR) Post #AqL1HTmIJWNBb3lFA0 by crypticcelery@chaos.social
2025-01-22T10:55:13Z
0 likes, 0 repeats
@futurebird @bri_seven It is more local than that, actually, but essentially yes.

I like to compare it (a simplification) to how you or I might read a text. When you read "The king met the prince by the calm lake, he ...", you need to figure out what "he" refers to, likely the prince or the king.

And this is kind of vaguely what attention does (we don't actually have a mapping to concepts we understand): for each token in context, figure out what it refers to and "correlate" it. (1/2)
(DIR) Post #AqL49Pjvk9VGcPV86K by futurebird@sauropods.win
2025-01-22T11:26:24Z
0 likes, 0 repeats
@u0421793 @bri_seven @crypticcelery I think this also explains why socializing is so exhausting generally. Keeping so many threads open in real time just in case. Pour some cool water on me and maybe I’ll run more efficiently.
(DIR) Post #AqL4XK5oOHaK3R8Wxc by futurebird@sauropods.win
2025-01-22T11:30:42Z
0 likes, 0 repeats
@u0421793 Hm. So this training data prep-work is like when you make indices to speed up a search. Or like a restaurant where they cook all night, making frozen and nearly-ready food for a vast menu, so they can deliver the customer's order in only a few minutes.
(DIR) Post #AqL5DTDGs79OVFisrY by futurebird@sauropods.win
2025-01-22T11:39:20Z
0 likes, 0 repeats
@IngaLovinde Summarizing and producing a summary aren't the same thing; that is a subtle but important distinction. I think I drove the "AI expert" they had come talk to us for professional development a little crazy with such quibbles and distinctions, even though I was trying my best to be "not difficult."

He did say "using AI like a search engine isn't a good application for this technology ..." but then said "it's more for synthesis." And I was like oh no no no
(DIR) Post #AqL7EJ9fx7FNVTZIq8 by futurebird@sauropods.win
2025-01-22T12:01:54Z
0 likes, 0 repeats
@IngaLovinde "... when will shortening the text be good enough for a reliable summary? Probably only when ... volume is a good predictor of importance."

This is just what I needed for the ant papers: a single-page summary that grouped them by the things they repeat, so, as I do more close reading, I can read related papers together.

Still, the results look more sophisticated than they are.
(DIR) Post #AqLAizl4Q3PIBiNrEW by janbogar@mastodonczech.cz
2025-01-22T12:41:00Z
0 likes, 0 repeats
@futurebird @bri_seven @crypticcelery @u0421793 I don't think understanding attention gives useful insights. It's a technical detail of how modern LLMs work.

Previous state-of-the-art language models were recurrent neural networks (RNNs). Text is a long sequence of words, and the meaning of each word depends on all the previous words. You fed the words to an RNN one by one, and after each word it updated its internal state, which represented "what the previous text was about". 1/3
(DIR) Post #AqLCg3i5TAxFU4iSg4 by janbogar@mastodonczech.cz
2025-01-22T12:44:12Z
0 likes, 0 repeats
@futurebird @bri_seven @crypticcelery @u0421793 2/3

The advantage of RNNs was that, in this way, they could in theory process text of any length. The problem was that it didn't work very well: they tended to forget what the text was about, because teaching them to update their internal state properly turned out to be really hard.

Then came the paper "Attention Is All You Need", which introduced a completely different way of doing it, based on attention. They called their neural network the Transformer.
(DIR) Post #AqLCg54oOAj5iqM7jk by janbogar@mastodonczech.cz
2025-01-22T12:50:23Z
0 likes, 0 repeats
@futurebird @bri_seven @crypticcelery @u0421793 3/3

The Transformer has layers. Each layer looks at all the words at the same time, not one by one like RNNs.

For each word, it looks at all the other words and computes their attention scores, which reflect how important they are for determining its meaning. Then it enriches the vector representation of the word with corrections from all the other words, proportional to their attention scores. This enriched representation is sent to the next layer.
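The "score every other word, then mix in corrections proportional to the scores" step can be sketched in plain Python. This is a stripped-down version: real transformers first multiply the vectors by learned query/key/value matrices, which are omitted here; the toy 2-d word vectors are made up.

```python
import math

def softmax(xs):
    """Turn raw scores into positive weights that sum to 1."""
    exps = [math.exp(x - max(xs)) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attention(vectors):
    """Simplified self-attention: replace each word's vector with a
    weighted mix of all the vectors, weights = attention scores."""
    d = len(vectors[0])
    out = []
    for q in vectors:  # for each word...
        # ...score it against every word (scaled dot product)
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
                  for k in vectors]
        weights = softmax(scores)
        # enrich: blend in every word's vector, proportional to its score
        out.append([sum(w * v[i] for w, v in zip(weights, vectors))
                    for i in range(d)])
    return out

# Three toy 2-dimensional "word vectors"
enriched = attention([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
```

Each output vector is a blend of all three inputs, which is the "enriched representation" that gets passed to the next layer.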
(DIR) Post #AqLCg6daaM8gYzdPaC by janbogar@mastodonczech.cz
2025-01-22T12:57:05Z
0 likes, 1 repeats
@futurebird @bri_seven @crypticcelery @u0421793 4/3 (I misjudged)

Transformers proved easier to train and better able to capture dependencies between distant words.

The problem is that computing attention for every pair of words in e.g. a book is impossible, so they can only take into account a text of limited length. That's why ChatGPT has a context window: it looks at only some number of words at the end of the text to predict the next word. For long texts, it doesn't see their beginnings.
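The blow-up described above is quadratic: attention compares every token with every other token, so doubling the text quadruples the work. A quick back-of-envelope sketch (illustrative numbers only):

```python
# Attention cost grows with the square of the text length
for n_tokens in (1_000, 8_000, 100_000):
    pairs = n_tokens ** 2
    print(f"{n_tokens:>7} tokens -> {pairs:>15,} attention pairs")

# A context window simply keeps only the last N tokens
def clip_context(tokens, window=8):
    return tokens[-window:]
```

So a 100,000-token book means ten billion pairs per layer, which is why models cap the window and drop the beginning of long inputs.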
(DIR) Post #AqLD1nwukTlyyQ1KwC by futurebird@sauropods.win
2025-01-22T13:05:53Z
0 likes, 0 repeats
@u0421793 @crypticcelery How big is it?
(DIR) Post #AqLDUKeIlUzeUxux1c by crypticcelery@chaos.social
2025-01-22T13:12:00Z
0 likes, 0 repeats
@futurebird @u0421793 That varies from model to model and primarily depends on two factors: the number of parameters (the numbers used for the model's math, see the other threads) and the precision of the parameters (i.e., how many "digits" of each parameter you store).

For example, LLaMA 3.2 on-device comes in 1B (1.21 billion parameters) and 3B (3.21 billion parameters) variants. A typical representation is so-called FP16, which is 2 bytes per number, so that would be about 2.42 GB or 6.42 GB respectively. (1/2)
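The size estimate above is just parameter count times bytes per parameter; a one-liner makes the arithmetic explicit:

```python
def model_size_gb(n_params, bytes_per_param=2):
    """Weights-only storage estimate: count x bytes each.
    FP16 stores each parameter in 2 bytes."""
    return n_params * bytes_per_param / 1e9

print(model_size_gb(1.21e9))  # ~2.42 GB for the 1B variant in FP16
print(model_size_gb(3.21e9))  # ~6.42 GB for the 3B variant
```

This covers the weights only; running the model needs additional memory for activations and the context, so actual requirements are somewhat higher.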
(DIR) Post #AqLDr3T0oWALryTKXw by NearerAndFarther@techhub.social
2025-01-22T13:16:06Z
0 likes, 0 repeats
@futurebird @u0421793 @crypticcelery You can see some of the details of different models, including size, at the Ollama site: https://ollama.com/search

They really range, but most of the very capable but not huge ones currently run between 3 and 5 GB.

Ollama is also one of the key tools that can help you run an LLM locally, which I've found a good way to learn more about them.
(DIR) Post #AqLDsxS29RUrn6vDP6 by faassen@fosstodon.org
2025-01-22T13:16:29Z
0 likes, 0 repeats
@futurebird @u0421793 @crypticcelery Local LLMs need enormous amounts of memory. There are "tiny" models that can be run locally fairly well and need a few gigabytes, but they aren't very good. Models that get closer to hosted models like ChatGPT need more memory than even a big consumer video card has (40+ gigabytes), even in quantized (slightly degraded but smaller) form.
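To see why even quantization doesn't get the big models onto a consumer card, the same count-times-precision arithmetic works; the 70B figure is a round illustrative number for a large Llama-class model:

```python
def weight_gb(n_params, bits):
    """Weights-only memory: parameters x bits each, in gigabytes."""
    return n_params * bits / 8 / 1e9

n = 70e9  # a 70-billion-parameter model, roughly Llama-70B-class
print(f"FP16 : {weight_gb(n, 16):.0f} GB")  # ~140 GB
print(f"4-bit: {weight_gb(n, 4):.0f} GB")   # ~35 GB
```

Even at 4 bits per parameter, ~35 GB still exceeds the 24 GB on top-end consumer GPUs, hence the main-memory and unified-memory workarounds discussed below.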
(DIR) Post #AqLEkCewL2DWJ3LdJY by faassen@fosstodon.org
2025-01-22T13:19:32Z
0 likes, 0 repeats
@futurebird @u0421793 @crypticcelery Running them locally involves using much slower main memory, or a modern Apple machine, which has fast unified memory shared with its video hardware, or multiple video cards. You can also run stuff online on a rented temporary server with non-consumer GPUs that have big memory (different from the normal cloud in that you control its software).
(DIR) Post #AqLEkDecdhYbOMCtSS by futurebird@sauropods.win
2025-01-22T13:26:06Z
0 likes, 0 repeats
@faassen @u0421793 @crypticcelery My job keeps me in nice MacBooks. That shouldn't be the issue.
(DIR) Post #AqLEkHDTOaDWRJXGhE by faassen@fosstodon.org
2025-01-22T13:24:15Z
0 likes, 0 repeats
@futurebird @u0421793 @crypticcelery There are various developments in the pipeline that bring running huge models locally more within reach. Non-Apple PC hardware with fast unified memory is going to be more available in the relatively near future. Nvidia also recently announced an AI workstation (though they want to avoid eating into their own server market).

More speculatively, there are people working on new hardware architectures, but those are further down the line.
(DIR) Post #AqLGHWZzOdsLb8U86y by faassen@fosstodon.org
2025-01-22T13:43:18Z
0 likes, 0 repeats
@futurebird @u0421793 @crypticcelery I don't know much about those, but I take it the more memory the better. This article should contain helpful information: https://simonwillison.net/2024/Dec/9/llama-33-70b/

@simon is one of the best sources about LLMs out there: https://simonwillison.net/