[HN Gopher] Understanding and managing the impact of machine lea...
___________________________________________________________________
Understanding and managing the impact of machine learning models on
the web
Author : kaycebasques
Score : 57 points
Date : 2024-04-04 19:11 UTC (3 hours ago)
(HTM) web link (www.w3.org)
(TXT) w3m dump (www.w3.org)
| kaycebasques wrote:
| This would have been a better link:
| https://www.w3.org/reports/ai-web-impact/
| dang wrote:
| Ok, we've changed to that from https://github.com/w3c/ai-web-
| impact above. Thanks!
| MacsHeadroom wrote:
| > the copyright system creates a (relatively) shared
| understanding between creators and consumers that, by default,
| content cannot be redistributed, remixed, adapted or built upon
| without creators' consent. This shared understanding made it
| possible for a lot of content to be openly distributed on the
| Web.
|
| That is not remotely a shared understanding, is wrong, and has
| nothing to do with making it possible for a lot of content to be
| openly distributed on the web. Content is distributed quite
| widely without concern for copyright.
|
| > A number of AI systems combine (1) automated large-scale
| consumption of Web content, and (2) production at scale of
| content, in ways that do not recognize or otherwise compensate
| content it was trained from.
|
| > While some of these tensions are not new (as discussed below),
| systems based on Machine Learning are poised to upend the
| existing balance. Unless a new sustainable equilibrium is found,
| this exposes the Web to the following undesirable outcomes:
|
| > Significantly less open distributed content (which would likely
| have a disproportionate impact on the less wealthy part of the
| population)
|
| That's even more ridiculous. The wealthy stand the most to gain
| from restricting the flow of information to channels which
| collect rent on behalf of their capital. It's the "less wealthy"
| who routinely find ways to distribute content outside of rent-
| seeking channels. It's the "less wealthy" who benefit the most
| from the commoditization of creative content via generative
| algorithms.
|
| Quite frankly, I expected better from W3C.
| munificent wrote:
| _> > the copyright system creates a (relatively) shared
| understanding between creators and consumers that, by default,
| content cannot be redistributed, remixed, adapted or built upon
| without creators' consent. This shared understanding made it
| possible for a lot of content to be openly distributed on the
| Web._
|
| _> That is not remotely a shared understanding, is wrong, and
| has nothing to do with making it possible for a lot of content
| to be openly distributed on the web. Content is distributed
| quite widely without concern for copyright._
|
| I'm not sure if the switch from active voice in the original
| quote to passive in yours was deliberate or not, but
| "understanding between creators and consumers" is very
| different from your "content is distributed".
|
| It is the case, yes, that people widely distribute content on
| the web with no regard for copyright law. But those people
| aren't generally _creators_ of that content.
|
| The article is talking about the incentives that the web places
| on content creators. If the result of AIs harvesting every bit
| of content on the web is that it gets regurgitated without
| sending consumers over to the creator's website, then creators
| will stop putting stuff online.
|
| People cloning and resharing content without regard to
| copyright has not so far had systemic negative effects on the
| web. Search engines seem to be pretty good at pointing users
| to the upstream original sources of copyrighted content, so
| plagiarism is common but apparently not common enough to cause
| content authors to stop putting it online.
|
| AI risks tipping that balance such that content creators really
| might stop posting stuff online. Why waste a meaningful chunk
| of your life creating a thing and putting it on the web if the
| only thing that will ever see it and know that it came from you
| is an AI slurping it up?
|
| _> It 's the "less wealthy" who routinely find ways to
| distribute content outside of rent-seeking channels._
|
| Again, I think you're presuming a world where content magically
| exists a priori and the network is simply a mechanism for
| deploying it. The article is about what happens when the
| system discourages people from creating anything _at all._
|
| Poor people can find ways to pirate just about every book on
| Earth... except for those books that never ended up getting
| written because the incentives placed on the author didn't work
| out.
| skissane wrote:
| > Why waste a meaningful chunk of your life creating a thing
| and putting it on the web if the only thing that will ever
| see it and know that it came from you is an AI slurping it
| up?
|
| People have many different motivations for creating content.
| Some content is designed to advocate for a viewpoint, and if
| an AI is going to pick up that viewpoint and regurgitate it,
| the author may consider that "mission accomplished". Other
| content is created for personal reasons - e.g. here is this
| poem/novel/software I wrote as an exercise in personal self-
| expression; if someone finds it and likes it, that's great; if
| it gets ignored, so what; and if some AI slurps it up and uses
| it to help regurgitate something similar but different, why
| should I care?
|
| If AI causes a decline in the commercial viability of web
| content, many people who miss the early days of the web, when
| personal reasons and personal interests were in the driving
| seat, not commerce, might view that decline as a good thing.
| pmayrgundter wrote:
| I agree with the general idea of tagging content to help
| classify.
|
| I'd given this some thought via MIME and ended up with a kind
| of BioNFT, so named because it uses NFTs piecewise, but
| tracking the creation events and agent types (bio, ai, etc.)
| as part of the content lifecycle.
|
| https://twitter.com/PMayrgundter/status/1638016474483683328
|
| Highlight..
|
| What if:
|     - devices sign source creations with a biosignature
|     - editing tools sign input
|     - media types include that, effectively saying:
|           ai_edited(human_created(photo))
|
| and do this under the experimental namespace in MIME:
|     image/x.bio(pablo@example.com/photo123).html
|     image/x.adobe.photoai(http://x.bio(pablo@example.com/photo123)).html
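The nested-signature idea above can be sketched in a few lines of Python. This is a toy illustration, not any real standard's API: the keys, event names, and HMAC-based signing are all assumptions standing in for a real scheme, which would use asymmetric key pairs and hardware-backed device keys.

```python
import hashlib
import hmac
import json

# Hypothetical symmetric keys for a capture device and an editing tool.
DEVICE_KEY = b"camera-secret"
EDITOR_KEY = b"editor-secret"

def sign(key: bytes, payload: dict) -> dict:
    """Wrap a payload with an HMAC over its canonical JSON form."""
    blob = json.dumps(payload, sort_keys=True).encode()
    return {"payload": payload,
            "sig": hmac.new(key, blob, hashlib.sha256).hexdigest()}

# The device signs the original capture: human_created(photo)
photo_hash = hashlib.sha256(b"raw pixel data").hexdigest()
human_created = sign(DEVICE_KEY, {"event": "human_created",
                                  "content": photo_hash})

# The editing tool signs its input, nesting the prior record:
# ai_edited(human_created(photo))
ai_edited = sign(EDITOR_KEY, {"event": "ai_edited",
                              "input": human_created})

# Walking the chain recovers the inner creation event.
print(ai_edited["payload"]["input"]["payload"]["event"])  # human_created
```

Each step wraps the previous record, so the provenance chain mirrors the nested `ai_edited(human_created(photo))` notation directly.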
| Retr0id wrote:
| What you're describing here is basically what C2PA is.
| zerojames wrote:
| Reference, for context: https://c2pa.org/
|
| And: the BBC just started using C2PA across some content. The
| BBC's R&D team talking about it:
| https://www.bbc.co.uk/rd/blog/2024-03-c2pa-verification-
| news...
| pmayrgundter wrote:
| Thanks for the ref! Checking it out
| yieldcrv wrote:
| C2PA would be better onchain so maybe you could do a proof
| of concept implementation of that
| WJW wrote:
| Perhaps I am being too cynical, but how do you protect this
| scheme against hostile actors?
|
| Bits don't have color after all (see
| https://ansuz.sooke.bc.ca/entry/23 if you don't get this
| reference) and it would be fairly trivial to manually alter
| such media types to anything you want. For example, if someone
| posts an `ai_edited(human_created(photo))` online, it would be
| straightforward to take the pixels of the photo and re-publish
| them as a "new" image with only the `human_created(photo)`
| tags. You could also randomly start adding `ai_edited` tags to
| things you want discredited, etc.
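The attack described above is easy to demonstrate. In this toy Python sketch (the data and field names are hypothetical), provenance tags that travel as ordinary metadata can be rewritten at will while the content itself stays byte-identical, which is why unsigned or strippable tags carry no trust on their own.

```python
# A published image: pixel data plus provenance tags carried as metadata.
published = {
    "pixels": "89504e47...",  # hypothetical encoded pixel data
    "provenance": ["human_created", "ai_edited"],
}

# A hostile actor re-publishes the same pixels with tags of their
# choosing -- nothing in the bits themselves resists this.
laundered = {
    "pixels": published["pixels"],
    "provenance": ["human_created"],  # ai_edited tag silently dropped
}

print(laundered["pixels"] == published["pixels"])  # True
print(laundered["provenance"])                     # ['human_created']
```

Countering this requires binding the tags cryptographically to the content and to a trusted signer, which is the problem C2PA's signed manifests attempt to address.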
___________________________________________________________________
(page generated 2024-04-04 23:00 UTC)