Post B0qBbsMAGUbmVIdKTY by jannem@fosstodon.org
 (DIR) Post #B0q9oLKIBLL3v1Zh0y by mntmn@mastodon.social
       2025-12-02T13:48:23Z
       
       0 likes, 0 repeats
       
       in terms of "finding things in large texts", for example "find a page in this pdf that mentions both shutdown mode and reg18", are there interesting alternatives to all that llm stuff beyond regex search? are there natural language processing systems that are precise/reliable and understandable? i imagine something like a fuzzy parser with stemming and some sort of ontologies, synonyms and logical inference
       
 (DIR) Post #B0q9oMh16L6u9nDM4e by mntmn@mastodon.social
       2025-12-02T13:55:03Z
       
       1 likes, 0 repeats
       
       i don't like llms because they consume a lot of power and are connected to all the ai greed hype, they have to be strangely trained, their representations are not introspectable, they make tons of errors/are not reliable at all etc. i'd rather like a sharp, more machinistic tool that just clearly says "error" when it can't do the job. grep is such a tool--would be nice to have a grep that can clean up and normalize messy human language a bit
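
       A minimal sketch of the "grep that normalizes language" idea, using only the Python standard library. The synonym table, the crude suffix stripper and the names normalize and find_pages are assumptions made up for illustration, not an existing tool. It lowercases text, splits letters from digits so "reg18" and "register 18" line up, and raises an error rather than guessing when a term is empty.

```python
import re

# Hypothetical synonym table; a real one would be domain-specific.
SYNONYMS = {"register": "reg", "poweroff": "shutdown"}


def stem(token):
    # Crude suffix stripping, an illustrative stand-in for a real stemmer.
    for suffix in ("ing", "ed", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def normalize(text):
    """Lowercase, split into letter runs and digit runs, stem, map synonyms."""
    tokens = re.findall(r"[a-z]+|\d+", text.lower())
    stems = [stem(t) for t in tokens]
    return [SYNONYMS.get(s, s) for s in stems]


def contains_phrase(tokens, phrase):
    """True if the token list contains the phrase as a contiguous run."""
    n = len(phrase)
    return any(tokens[i:i + n] == phrase for i in range(len(tokens) - n + 1))


def find_pages(pages, terms):
    """Return 1-based numbers of the pages that contain every term."""
    phrases = [normalize(t) for t in terms]
    if any(not p for p in phrases):
        raise ValueError("error: empty search term")
    hits = []
    for number, text in enumerate(pages, start=1):
        tokens = normalize(text)
        if all(contains_phrase(tokens, p) for p in phrases):
            hits.append(number)
    return hits


pages = [
    "Writing REG18 selects the charge current.",
    "The device enters shutdown modes when register 18 is cleared.",
    "Shutdown mode is described in the next chapter.",
]
print(find_pages(pages, ["shutdown mode", "reg18"]))
```

       Every step is a plain table lookup or string operation, so a miss can be traced to a missing synonym or an over-eager stem instead of an opaque score.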
       
 (DIR) Post #B0qAAbeu8fnYwZpdjc by fmn@mastodon.social
       2025-12-02T14:00:41Z
       
       1 likes, 0 repeats
       
       @mntmn "are there natural language processing systems that are precise/reliable and understandable?" - let me wear my noam chomsky hat for a second: there are no such systems and never will be. natural language is ever changing and ambiguous, and parties involved often don't have - sufficiently - common context required for precise communication. this is why people talking or even reading have so many back-and-forths.
       
 (DIR) Post #B0qBbqDEEkGvr0RAGm by jannem@fosstodon.org
       2025-12-02T13:50:06Z
       
       0 likes, 0 repeats
       
       @mntmn No, not really. And that's a reason why small LLMs as language processors (not chatbots) are exciting.
       
 (DIR) Post #B0qBbrPJnHXLYn6Kki by mntmn@mastodon.social
       2025-12-02T13:50:43Z
       
       0 likes, 0 repeats
       
       @jannem i somehow find that hard to believe
       
 (DIR) Post #B0qBbsMAGUbmVIdKTY by jannem@fosstodon.org
       2025-12-02T14:10:33Z
       
       0 likes, 0 repeats
       
       @mntmn I mean, there's been many attempts. Especially for constrained applications such as a corporate document store and things like that. As far as I know, none of those systems were ever a success.
       
 (DIR) Post #B0qBbt81OYtCtj1Y4O by wolf480pl@mstdn.io
       2025-12-02T14:28:03Z
       
       1 likes, 0 repeats
       
       @jannem @mntmn AFAIK the way these LLM tools work is they have an embedding of words into a vector space, they index text by converting every word in every document to a vector, and storing it in a database together with the ID of the document it came from, and then when you search, they turn each of the query words into vectors, and search for K nearest neighbors in the vector space for each of them. Then they feed the documents they found to an LLM. What if you skipped the last step?
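
       The pipeline described here, minus the LLM step, can be sketched roughly as follows. The 3-d vectors in EMBED are invented toy values standing in for a trained embedding model, and build_index and search are hypothetical names. The nearest-neighbor step is a brute-force cosine scan, not a real vector database.

```python
import math
from collections import Counter

# Toy word vectors, invented for illustration; a real system would
# load them from a trained embedding model.
EMBED = {
    "shutdown": (1.0, 0.0, 0.1),
    "poweroff": (0.9, 0.1, 0.0),
    "sleep": (0.7, 0.3, 0.0),
    "register": (0.0, 1.0, 0.0),
    "reg": (0.1, 0.9, 0.0),
    "battery": (0.0, 0.1, 1.0),
    "charge": (0.1, 0.0, 0.9),
}


def cosine(u, v):
    """Cosine similarity between two vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def build_index(docs):
    """Store one (vector, doc_id) entry per known word occurrence."""
    index = []
    for doc_id, text in docs.items():
        for word in text.lower().split():
            if word in EMBED:  # words without a vector are skipped
                index.append((EMBED[word], doc_id))
    return index


def search(index, query, k=1):
    """Rank documents by how many query-word neighbors they contain."""
    votes = Counter()
    for word in query.lower().split():
        if word not in EMBED:
            raise ValueError(f"no embedding for {word!r}")
        qvec = EMBED[word]
        # Brute-force K nearest neighbors by cosine similarity.
        nearest = sorted(index, key=lambda e: cosine(qvec, e[0]), reverse=True)[:k]
        for _, doc_id in nearest:
            votes[doc_id] += 1
    return [doc_id for doc_id, _ in votes.most_common()]


docs = {"a": "poweroff register", "b": "battery charge", "c": "sleep battery"}
index = build_index(docs)
print(search(index, "shutdown reg", k=1))
```

       Neither query word occurs literally in document "a"; it is found only because the toy vectors place "shutdown" near "poweroff" and "reg" near "register".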
       
 (DIR) Post #B0qJocDaNTRzqjCrEO by wolf480pl@mstdn.io
       2025-12-02T15:59:57Z
       
       0 likes, 0 repeats
       
       @pixx No, because that uses exact match. The thing with the vector embedding is it tries to be semantic, i.e. place words with similar meaning near each other. @jannem @mntmn
       
 (DIR) Post #B0qcwqu3YrAYq8iGAq by ignaloidas@not.acu.lt
       2025-12-02T19:34:25.693Z
       
       0 likes, 0 repeats
       
       @wolf480pl@mstdn.io @jannem@fosstodon.org @mntmn@mastodon.social you can, it's kinda what LLMs grew out from, sometimes it's called semantic search (tho that term is very ambiguous these days) https://en.wikipedia.org/wiki/BERT_(language_model)
       
 (DIR) Post #B0qi8jnMJWuc2a4Cxc by mikoto@akko.wtf
       2025-12-02T19:35:41.388804Z
       
       0 likes, 0 repeats
       
       @ignaloidas @wolf480pl @mntmn @jannem BERT is pretty nice, used it a bunch in uni
       
 (DIR) Post #B0qi8ktmD9djSm4qbQ by wolf480pl@mstdn.io
       2025-12-02T20:32:33Z
       
       0 likes, 0 repeats
       
       @mikoto can you run it on a 16GB GPU? @jannem @mntmn @ignaloidas