[HN Gopher] Are Large Language Models a Threat to Digital Public...
___________________________________________________________________
Are Large Language Models a Threat to Digital Public Goods?
Author : oss_fan
Score : 25 points
Date : 2023-07-17 20:37 UTC (2 hours ago)
(HTM) web link (arxiv.org)
(TXT) w3m dump (arxiv.org)
| jonny_eh wrote:
| Does this mean we can't adopt new languages since ChatGPT is
| frozen to 2022 coding knowledge?
| ggurface wrote:
| Yes, exactly. It's all over.
| hklparc wrote:
| Of course they are. Experts will stop posting for free when their
| output is stolen and displayed without attribution.
| martingalex2 wrote:
  | This is a very intriguing study. Since LLMs need public data to
  | train on, their incentive in the marketplace of ideas is to
  | reduce the very thing that gives them their power. They contain
  | the seeds of their own destruction, eroding the web's open data
  | ethos, and offer yet another data point pointing at another AI
  | winter.
| flangola7 wrote:
  | AI winter? Labs haven't even begun to scratch the surface of
  | the data available to train models on. We may be running out of
  | quality human-written text, but we have yet to dive into:
|
  | video, audio, images, heat, motion/acceleration, lidar, RF,
  | sonar, radar, network traffic, atmospheric pressure, wind
  | vectors, magnetic fields, system/application logs, electrical
  | current, UV, X-ray, microwave, and ionizing particle emissions.
| RandomLensman wrote:
| Interesting, so we chuck in exabytes and more of data
| generated each day and then what?
| detourdog wrote:
| For reasons I can't articulate I see LLMs as a vehicle for
| removing the creators from their ideas. This is very different
| than search engines. If a search engine generates traffic for
| documented ideas it creates a community. An LLM based internet
| seems to remove the creator and shim itself in between for the
| sake of business.
| RandomLensman wrote:
  | If a handful of firms are indeed allowed to harvest the
  | collective knowledge for rent seeking, that would indeed be a
  | shame.
| TeMPOraL wrote:
| It's tricky, because in many ways, this is achieving exactly
| what I, as user, want computers to do for me: give me
| information I requested, and only that. I very much do _not_
  | care about who discovered/created/published it, except only
| if it helps me quickly ascertain the trustworthiness of said
| information. I do _not_ want to be forced or prodded to
| establish relationships with creators or communities. I do
| _not_ want their ads and upsells.
|
| It's the same issue as with search engines providing
| "information boxes": huge win for me, but a mortal enemy for
| those who want to monetize anything resembling intellectual
| property.
|
| > _An LLM based internet seems to remove the creator and shim
| itself in between for the sake of business._
|
| This sounds bad, and in some cases it is, but in others it is
| not. Content farms and recipe sites have creators behind them
| too.
| dannyobrien wrote:
| I think one thing that would tip this in a positive way is easy
| re-sharing of learning. I have plenty of "source code" now for
| conversations where I've taken something that was hard to find
| out (or code that was previously hard to write) through a non-LLM
| route. Publishing those conversations isn't as easy or re-
| adoptable into the commons as it could be.
|
  | (I'll note that proprietary model vendors have a first-order
  | incentive to discourage this, because such conversations can be
  | used to train competing models, but they benefit more generally
  | from maintaining a commons of knowledge to draw from.)
___________________________________________________________________
(page generated 2023-07-17 23:00 UTC)