[HN Gopher] Understanding and managing the impact of machine lea...
       ___________________________________________________________________
        
       Understanding and managing the impact of machine learning models on
       the web
        
       Author : kaycebasques
       Score  : 57 points
       Date   : 2024-04-04 19:11 UTC (3 hours ago)
        
 (HTM) web link (www.w3.org)
 (TXT) w3m dump (www.w3.org)
        
       | kaycebasques wrote:
       | This would have been a better link:
       | https://www.w3.org/reports/ai-web-impact/
        
         | dang wrote:
          | Ok, we've changed to that from
          | https://github.com/w3c/ai-web-impact above. Thanks!
        
       | MacsHeadroom wrote:
       | > the copyright system creates a (relatively) shared
       | understanding between creators and consumers that, by default,
       | content cannot be redistributed, remixed, adapted or built upon
       | without creators' consent. This shared understanding made it
       | possible for a lot of content to be openly distributed on the
       | Web.
       | 
       | That is not remotely a shared understanding, is wrong, and has
       | nothing to do with making it possible for a lot of content to be
       | openly distributed on the web. Content is distributed quite
       | widely without concern for copyright.
       | 
       | > A number of AI systems combine (1) automated large-scale
       | consumption of Web content, and (2) production at scale of
       | content, in ways that do not recognize or otherwise compensate
       | content it was trained from.
       | 
       | > While some of these tensions are not new (as discussed below),
       | systems based on Machine Learning are poised to upend the
       | existing balance. Unless a new sustainable equilibrium is found,
       | this exposes the Web to the following undesirable outcomes:
       | 
       | > Significantly less open distributed content (which would likely
       | have a disproportionate impact on the less wealthy part of the
       | population)
       | 
        | That's even more ridiculous. The wealthy stand to gain the most
        | from restricting the flow of information to channels which
        | collect rent on behalf of their capital. It's the "less
       | who routinely find ways to distribute content outside of rent-
       | seeking channels. It's the "less wealthy" who benefit the most
       | from the commoditization of creative content via generative
       | algorithms.
       | 
       | Quite frankly, I expected better from W3C.
        
         | munificent wrote:
         | _> > the copyright system creates a (relatively) shared
         | understanding between creators and consumers that, by default,
         | content cannot be redistributed, remixed, adapted or built upon
         | without creators' consent. This shared understanding made it
         | possible for a lot of content to be openly distributed on the
         | Web._
         | 
         |  _> That is not remotely a shared understanding, is wrong, and
         | has nothing to do with making it possible for a lot of content
         | to be openly distributed on the web. Content is distributed
         | quite widely without concern for copyright._
         | 
         | I'm not sure if the switch from active voice in the original
         | quote to passive in yours was deliberate or not, but
         | "understanding between creators and consumers" is very
         | different from your "content is distributed".
         | 
         | It is the case, yes, that people widely distribute content on
         | the web with no regard for copyright law. But those people
         | aren't generally _creators_ of that content.
         | 
         | The article is talking about the incentives that the web places
          | on content creators. If AIs harvest every bit of content on
          | the web and regurgitate it without ever sending consumers
          | over to the creator's website, then creators will stop
          | putting stuff online.
         | 
          | People cloning and resharing content without regard to
          | copyright has not, so far, seemed to have systemic negative
          | effects on the web. Search engines seem to be pretty good at
          | pointing users to the upstream original sources of
          | copyrighted content, so plagiarism is common but apparently
          | not common enough to cause content authors to stop putting
          | it online.
         | 
         | AI risks tipping that balance such that content creators really
         | might stop posting stuff online. Why waste a meaningful chunk
         | of your life creating a thing and putting it on the web if the
         | only thing that will ever see it and know that it came from you
         | is an AI slurping it up?
         | 
          |  _> It's the "less wealthy" who routinely find ways to
          | distribute content outside of rent-seeking channels._
         | 
         | Again, I think you're presuming a world where content magically
         | exists a priori and the network is simply a mechanism for
          | deploying it. The article is about what happens when the
          | system discourages people from making content _at all._
         | 
         | Poor people can find ways to pirate just about every book on
         | Earth... except for those books that never ended up getting
         | written because the incentives placed on the author didn't work
         | out.
        
           | skissane wrote:
           | > Why waste a meaningful chunk of your life creating a thing
           | and putting it on the web if the only thing that will ever
           | see it and know that it came from you is an AI slurping it
           | up?
           | 
           | People have many different motivations for creating content.
           | Some content is designed to advocate for a viewpoint, and if
           | an AI is going to pick up that viewpoint and regurgitate it,
           | the author may consider that "mission accomplished". Other
           | content is created for personal reasons - e.g. here is this
           | poem/novel/software I wrote in an exercise in personal self-
           | expression, if someone finds it and likes it that's great, if
           | it gets ignored, so what; if some AI slurps it up and uses
           | that to help it regurgitate something similar but different,
           | why should I care?
           | 
           | If AI causes a decline in the commercial viability of web
           | content, many people who miss the early days of the web, when
           | personal reasons and personal interests were in the driving
            | seat, not commerce, might view that decline as a good
            | thing.
        
       | pmayrgundter wrote:
       | I agree with the general idea of tagging content to help
       | classify.
       | 
        | I'd given this some thought via MIME and ended up with a kind
        | of BioNFT, so named because it uses NFTs piecewise, tracking
        | the creation events and agent types (bio, AI, etc.) as part of
        | the content lifecycle.
       | 
       | https://twitter.com/PMayrgundter/status/1638016474483683328
       | 
        | Highlight:
        | 
        | What if:
        | 
        |  - devices sign source creations with a biosignature
        |  - editing tools sign input
        |  - media types include that, effectively saying:
        |    ai_edited(human_created(photo))
        | 
        | and do this under the experimental namespace in MIME:
        | 
        |   image/x.bio(pablo@example.com/photo123).html
        |   image/x.adobe.photoai(http://x.bio(pablo@example.com/photo123)).html
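A minimal sketch (Python; the function names are hypothetical, not part of the proposal) of how such nested provenance tags could be composed and unwrapped back into their ordered list of events:

```python
import re

def wrap(agent: str, inner: str) -> str:
    # Record a new creation/edit event around the existing provenance.
    return f"{agent}({inner})"

def chain(tag: str) -> list[str]:
    # Unwrap a nested tag into its agents, most recent event first,
    # ending with the innermost subject (e.g. "photo").
    events = []
    while True:
        m = re.fullmatch(r"([\w.]+)\((.*)\)", tag)
        if m is None:
            events.append(tag)  # innermost subject reached
            return events
        events.append(m.group(1))
        tag = m.group(2)

tag = wrap("ai_edited", wrap("human_created", "photo"))
print(tag)         # ai_edited(human_created(photo))
print(chain(tag))  # ['ai_edited', 'human_created', 'photo']
```

The dotted-name pattern also accepts vendor-style agents like `x.adobe.photoai`, matching the experimental MIME namespace idea above.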
        
         | Retr0id wrote:
         | What you're describing here is basically what C2PA is
        
           | zerojames wrote:
           | Reference, for context: https://c2pa.org/
           | 
           | And: the BBC just started using C2PA across some content. The
           | BBC's R&D team talking about it:
            | https://www.bbc.co.uk/rd/blog/2024-03-c2pa-verification-news...
        
           | pmayrgundter wrote:
           | Thanks for the ref! Checking it out
        
             | yieldcrv wrote:
              | C2PA would be better on-chain, so maybe you could do a
              | proof-of-concept implementation of that.
        
         | WJW wrote:
         | Perhaps I am being too cynical, but how do you protect this
         | scheme against hostile actors?
         | 
         | Bits don't have color after all (see
         | https://ansuz.sooke.bc.ca/entry/23 if you don't get this
         | reference) and it would be fairly trivial to manually alter
          | such media types to anything you want. For example, if
          | someone posts an `ai_edited(human_created(photo))` online, it
          | would be straightforward to take the pixels of the photo and
          | re-publish them as a "new" image bearing only the
          | `human_created(photo)` tag. You could also randomly start
          | adding `ai_edited` tags to things you want discredited, etc.
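This relabeling attack is the gap a signature scheme like C2PA's tries to close: the provenance label is cryptographically bound to the content bytes, so swapping either one invalidates the signature. A minimal sketch (Python; a symmetric HMAC key stands in here for the certificate-based public-key signatures real systems use):

```python
import hashlib
import hmac

# Hypothetical signing key held by the capture device or editing tool.
# Real systems (e.g. C2PA) use per-device certificates, not a shared key.
DEVICE_KEY = b"secret-device-key"

def sign(content: bytes, label: str) -> str:
    # Bind the provenance label to the content bytes: changing
    # either one produces a different MAC.
    return hmac.new(DEVICE_KEY, content + label.encode(),
                    hashlib.sha256).hexdigest()

def verify(content: bytes, label: str, sig: str) -> bool:
    return hmac.compare_digest(sign(content, label), sig)

pixels = b"...raw image bytes..."
sig = sign(pixels, "ai_edited(human_created(photo))")

print(verify(pixels, "ai_edited(human_created(photo))", sig))  # True
print(verify(pixels, "human_created(photo)", sig))             # False
```

Of course this only shifts the problem: nothing stops an attacker from re-encoding the pixels and signing them fresh as "human_created" under their own key, so trust ultimately rests on who is allowed to hold signing keys.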
        
       ___________________________________________________________________
       (page generated 2024-04-04 23:00 UTC)