[HN Gopher] Are Large Language Models a Threat to Digital Public...
       ___________________________________________________________________
        
       Are Large Language Models a Threat to Digital Public Goods?
        
       Author : oss_fan
       Score  : 25 points
       Date   : 2023-07-17 20:37 UTC (2 hours ago)
        
 (HTM) web link (arxiv.org)
 (TXT) w3m dump (arxiv.org)
        
       | jonny_eh wrote:
       | Does this mean we can't adopt new languages since ChatGPT is
       | frozen to 2022 coding knowledge?
        
         | ggurface wrote:
         | Yes, exactly. It's all over.
        
       | hklparc wrote:
       | Of course they are. Experts will stop posting for free when their
       | output is stolen and displayed without attribution.
        
       | martingalex2 wrote:
        | This is a very intriguing study: since LLMs need public data to
        | train on, their incentive in the marketplace of ideas is to
        | reduce the very thing that gives them their power. They contain
        | the seeds of their own destruction, eroding the web and the
        | open-data ethos, and offer yet another data point pointing at
        | another AI winter.
        
         | flangola7 wrote:
          | AI winter? Labs haven't even begun to scratch the surface of
          | the data available to train models on. We may be running out
          | of quality human-written text, but we have yet to dive into:
         | 
          | Video, audio, images, heat, motion/acceleration, lidar, RF,
          | sonar, radar, network traffic, atmospheric pressure, wind
          | vectors, magnetic fields, system/application logs, electrical
          | current, UV, X-ray, microwave, ionizing particle emissions.
        
           | RandomLensman wrote:
           | Interesting, so we chuck in exabytes and more of data
           | generated each day and then what?
        
         | detourdog wrote:
         | For reasons I can't articulate I see LLMs as a vehicle for
          | removing the creators from their ideas. This is very different
          | from search engines: if a search engine generates traffic for
          | documented ideas, it creates a community. An LLM-based internet
          | seems to remove the creator and shim itself in between for the
          | sake of business.
        
           | RandomLensman wrote:
            | If a handful of firms are indeed allowed to harvest the
            | collective knowledge for rent seeking, that would be a
            | shame.
        
           | TeMPOraL wrote:
           | It's tricky, because in many ways, this is achieving exactly
           | what I, as user, want computers to do for me: give me
           | information I requested, and only that. I very much do _not_
            | care about who discovered/created/published it, except
            | insofar as it helps me quickly ascertain the trustworthiness
            | of said
           | information. I do _not_ want to be forced or prodded to
           | establish relationships with creators or communities. I do
           | _not_ want their ads and upsells.
           | 
           | It's the same issue as with search engines providing
           | "information boxes": huge win for me, but a mortal enemy for
           | those who want to monetize anything resembling intellectual
           | property.
           | 
           | > _An LLM based internet seems to remove the creator and shim
           | itself in between for the sake of business._
           | 
           | This sounds bad, and in some cases it is, but in others it is
           | not. Content farms and recipe sites have creators behind them
           | too.
        
       | dannyobrien wrote:
       | I think one thing that would tip this in a positive way is easy
       | re-sharing of learning. I have plenty of "source code" now for
       | conversations where I've taken something that was hard to find
       | out (or code that was previously hard to write) through a non-LLM
        | route. Publishing those conversations isn't as easy, or as
        | re-adoptable into the commons, as it could be.
       | 
        | (I'll note that proprietary model vendors have a first-order
        | incentive to discourage this, since such conversations can be
        | used to better train other models, but of course they benefit
        | more generally from maintaining a commons of knowledge to draw
        | from.)
        
       ___________________________________________________________________
       (page generated 2023-07-17 23:00 UTC)