Post AcFQiuFsmRqqgpgpdY by djf@hachyderm.io
 (DIR) Post #AcFQiuFsmRqqgpgpdY by djf@hachyderm.io
       2023-11-27T22:41:47Z
       
       0 likes, 0 repeats
       
       O’Reilly Media says they’re going to train an LLM on authors’ works but will still be able to pay royalties based on which works are used in the AI’s output. I’m not sure that is possible, and I’d love to hear what @simon and other LLM-savvy writers think about the plan: https://www.oreilly.com/about/oreilly-approach-to-generative-ai.html
       
 (DIR) Post #AcFQivJopIatzKXUPY by simon@fedi.simonwillison.net
       2023-11-28T00:00:06Z
       
       0 likes, 0 repeats
       
       @djf They don't explicitly say they're training a model, which to me indicates that they're using Retrieval Augmented Generation - that trick where the LLM runs a search for relevant text against their corpus of information and pastes snippets of it back into the LLM to help answer questions.
       
       That's how I'd expect them to build something like this, and it does mean they can account for which authors' content is used in answering questions and give credit and remuneration accordingly.
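
A minimal sketch of why retrieval makes attribution tractable, assuming a toy keyword-overlap search in place of a real search index or embedding model. Nothing here reflects O'Reilly's actual system: the `Snippet` type, the `CORPUS` contents, the author names, and every function name are hypothetical, and the call to an LLM is left out entirely. The point is only that the retrieval step knows exactly which snippets went into the prompt, so per-author credit can be tallied from that list.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Snippet:
    author: str  # hypothetical author label, used only for attribution
    text: str


# Hypothetical corpus; a real system would index full books.
CORPUS = [
    Snippet("Author A", "Python decorators wrap a function to extend its behavior."),
    Snippet("Author B", "Kubernetes pods group one or more containers."),
    Snippet("Author A", "Decorators in Python use the @ syntax."),
    Snippet("Author C", "A Python generator yields values lazily."),
]


def tokenize(text):
    """Lowercase the words and strip basic punctuation."""
    return {w.strip(".,?!").lower() for w in text.split()} - {""}


def retrieve(question, corpus, k=2):
    """Return up to k snippets ranked by shared-word count (toy stand-in for search)."""
    q = tokenize(question)
    scored = [(len(q & tokenize(s.text)), s) for s in corpus]
    scored = [pair for pair in scored if pair[0] > 0]
    # sort() is stable, so ties keep their corpus order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [s for _, s in scored[:k]]


def build_prompt(question, snippets):
    """Paste the retrieved snippets into the prompt that would be sent to the LLM."""
    context = "\n".join(f"[{i + 1}] {s.text}" for i, s in enumerate(snippets))
    return f"Answer using only these sources:\n{context}\n\nQuestion: {question}"


def credit_authors(snippets):
    """Count how many retrieved snippets came from each author."""
    return Counter(s.author for s in snippets)
```

Calling `retrieve("How do Python decorators work?", CORPUS, k=3)` and passing the result to `credit_authors` gives a per-author count that a royalty calculation could start from. A model trained or fine-tuned directly on the books has no equivalent list of sources to count.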
       
 (DIR) Post #AcFQvx6dUa9HIQaN2e by simon@fedi.simonwillison.net
       2023-11-28T00:02:06Z
       
       0 likes, 0 repeats
       
       @djf Huh, on https://www.oreilly.com/about/oreilly-approach-to-generative-ai.html they do say:
       
       > we’re training an LLM whose answers can be depended on because it’s being trained solely on trusted content
       
       My hunch is that this is just clumsy wording on their part though - I would expect them to be using RAG, which I believe gives much better results for this kind of citation-based knowledge augmentation than fine-tuning a full model
       
 (DIR) Post #AcFRB82b7R2nYOvvtI by simon@fedi.simonwillison.net
       2023-11-28T00:05:09Z
       
       0 likes, 0 repeats
       
       @djf If I'm wrong and they're training a new model as opposed to doing RAG, then yeah, I don't know how they would provide citations or account for which authors' content is being used to answer questions
       
 (DIR) Post #AcFSSdUhWtFXyuzTVo by djf@hachyderm.io
       2023-11-28T00:20:07Z
       
       0 likes, 0 repeats
       
       @simon thanks, Simon!