[HN Gopher] GitHub Copilot, with "public code" blocked, emits my...
___________________________________________________________________
GitHub Copilot, with "public code" blocked, emits my copyrighted
code
Author : davidgerard
Score : 262 points
Date : 2022-10-16 19:33 UTC (3 hours ago)
(HTM) web link (twitter.com)
(TXT) w3m dump (twitter.com)
| [deleted]
| naillo wrote:
| It will be interesting if the consequence of this is people
| open-sourcing things way less. That would give another layer of
| irony to OpenAI's name.
| D13Fd wrote:
| If the code is public, my guess is that someone else stole it and
| added it to an open source repo without authorization. Microsoft
| may have then picked it up from there.
| SrslyJosh wrote:
| This just means that if you use Copilot for work, you're
| exposing yourself and/or your employer to unknown legal
| liability. =)
| teaearlgraycold wrote:
| Not sure how you'd get caught if your code is kept private.
| jen20 wrote:
| Not entirely sure how this could happen! "naikrovek" assured me
| not three days ago on this very site that I was "detached from
| reality" [1] for thinking that this would happen again.
|
| To be fair I thought it might be at least a week or two.
|
| [1]: https://news.ycombinator.com/item?id=33194643
| an1sotropy wrote:
| This is a huge and looming legal problem. I wonder if what should
| be a big uproar about it is muted by the widespread
| acceptance/approval of github and related products, in which case
| it's a nice example of how monopolies damage communities.
| jeroenhd wrote:
| I think it won't become a legal problem until Copilot steals
| code from a leaked repository (e.g. the Windows XP source code)
| and that code gets reused in public.
|
| Only then will we see an answer to the question "is making an
| AI write your stolen code a viable excuse".
|
| I very much approve of the idea of Copilot as long as the
| copied code is annotated with the right license. I understand
| this is a difficult challenge but just because this is
| difficult doesn't mean such a requirement should become
| optional; rather, it should encourage companies to fix their
| questionable IP problems before releasing these products into
| the wild, especially if they do so in exchange for payment.
| jijji wrote:
| if you're posting your code publicly on the web, it's hard to get
| upset that people are seeing/using it
| betaby wrote:
| It's under a very specific license, though. By your logic it's
| OK to train AI on leaked Windows code, then. It is/was
| publicly on the web.
| galleywest200 wrote:
| While I agree with the first portion of your rebuttal, the
| second portion makes no sense as leaked code is not "you"
| putting it on the internet. It would be a nefarious actor
| doing so.
| lerpgame wrote:
| code licenses will be irrelevant in a few years if you are
| able to refactor anything you want using AI.
| betaby wrote:
| Unless lawyers from the music industry step in.
| themoonisachees wrote:
| Yes? It's production code that is supposed to work (keyword:
| supposed). I'd like the code-suggestor to also be trained on
| AAA games' source leaks.
| qu4z-2 wrote:
| Can I introduce you to a concept called copyright?
|
| Is it fine if an author publishes a short story publicly on the
| web for someone else to submit it to a contest as their own
| work?
| Hamuko wrote:
| https://nedroidcomics.tumblr.com/post/41879001445/the-intern...
| deepspace wrote:
| This shows how copyright is all screwed up. Let's say the code in
| question is based on a published algorithm, maybe Yuster and
| Zwick, (I did not check).
|
| What exactly gives Davis a better claim to the copyright than the
| inventors of the algorithm? Yes, I know software is copyrightable
| while algorithms are not, but it is not at all clear to me why
| that should be the case. The effort of translating an algorithm
| into code is trivial compared to designing the algorithm in the
| first place, no?
| clnq wrote:
| To be honest, it would probably benefit all of humanity if we
| stopped rewriting the same code to then fix the same bugs in
| it, and instead just used each other's algorithms to do
| meaningful work.
|
| I work for a large tech company whose lawyers definitely care
| that my code doesn't train an AI model somewhere much more than
| I do. For my part, I would really like to open source all
| of my work - it would make it more impactful and would
| demonstrate my skills. It makes me a bit sad that my life's
| work is going to be behind lock and key, visible to relatively
| few people. Not to mention that the hundreds of thousands of
| work hours, energy, and effort that will be spent replicating
| it all over my industry in all other lock-and-key companies
| make the industry as a whole tremendously inefficient.
|
| I hope that AI models like Copilot will finally show the
| very litigious tech companies that their intellectual property
| has been all over the public domain from the start. And we can
| get over a lot of the petty algorithm IP suits that probably
| hold back all tech in aggregate. We should all be working
| together, not racing against each other in the pursuit of
| shareholder value.
|
| Historically, in the Middle Ages, mathematicians used to keep
| their solutions secret in the interest of employment. So there
| used to be mathematicians who could, for example, solve certain
| quadratic equations, but it took centuries before all of
| humanity could benefit from this knowledge. I believe this is
| what is happening with algorithms now. And it is very
| counter-progress, in my opinion.
| heavyset_go wrote:
| You can patent an algorithm if you want to protect it.
| jeroenhd wrote:
| (in some countries)
| jonnycomputer wrote:
| Microsoft should just train it on all their proprietary code
| instead. See how sanguine they are about it then.
| jacooper wrote:
| They avoided answering this question at all costs.
|
| Because it exposes their direct hypocrisy in this: it's fair use
| for OSS but not for us.
|
| The questions here are very important, and it's no surprise GitHub
| avoided answering anything about Copilot's legality:
|
| https://sfconservancy.org/GiveUpGitHub/
| naikrovek wrote:
| who said they haven't?
|
| for something to show up verbatim in the output of a textual AI
| model, it needs to appear in the input many times.
|
| I wonder if the problem is not copilot, but many people using
| this person's code without license or credit, and copilot being
| trained on those pieces of code as well. copilot may just be
| exposing a problem rather than creating one.
|
| I don't know much about AI, and I don't use copilot.
| make3 wrote:
| there's exactly no way they have
| akudha wrote:
| With the amount of resources that Microsoft has, how hard can
| it be for them to exclude proprietary code that other people
| have stolen? I'd bet it is easy for them, but they won't do
| it. Because they don't care, and because who is gonna take
| them on?
|
| Will they "accidentally" include proprietary code from say,
| Oracle? Nope. They'll make sure of it. But Joe Random? Sure
| belorn wrote:
| Microsoft has a public statement that they don't use
| proprietary code, only public code with public licenses. They
| have a lot of companies as customers who use GitHub, and
| they also use a lot of third-party code in their own products.
| stefan_ wrote:
| Even BSD et al. have attribution requirements - that must
| be a vanishingly small amount of code to be used. Methinks
| the people who run GitHub (who have apparently decided to
| abandon the core business for the latest fun project)
| aren't being entirely upfront.
| eyelidlessness wrote:
| As a thought experiment: what do we all suppose would be the
| impact to Microsoft if they deliberately made public the
| proprietary source code for all of their publicly available
| commercial products and efforts (including licensed software,
| services; excluding private contracts, research), but the rest
| of their intellectual property and trade secrets remained
| private?
|
| Since I'm posing the question, here's my guess:
|
| - Their stock would take at least a short term hit because it's
| an unconventional and uncharacteristic move
|
| - The code would reveal more about their strategic interests to
| competitors than they'd like, but probably nothing revelatory
|
| - It might confirm or reinforce some negative perceptions of
| their business practices
|
| - It might dispel some too
|
| - It may reduce some competitive advantage amongst enormous
| businesses, and may elevate some very large firms to potential
| competitors
|
| - It would provide little to no new advantage to smaller
| players who aren't already in reach of competing with them
| and/or don't have the resources to capitalize on access to the
| code
|
| - It would probably significantly improve public perception of
| the company and its future intentions, at least among
| developers and the broader tech community
|
| In other words, a wash. Overall business impact would be
| roughly neutral. The code has more strategic than technical
| value, and there are few who could leverage the technical value
| into any kind of revenue center with growth potential. Any
| disadvantage would be negated by the public image goodwill it
| generated.
|
| Maybe my take is naive though! Maybe it would really hurt
| Microsoft long term if suddenly everyone can fork Windows 11,
| or steal ideas for their idiosyncratic office suite, or get
| really clever about how to get funded to go head to head with
| Azure armed with code everyone else can access too.
| 8note wrote:
| If they released all the source, I'd be able to run the nice
| drawing app from windows inkspaces again, unkilling the app
| they want dead
| mccorrinall wrote:
| If they'd open source their software I wouldn't have to wait
| two months till they finally release the pdbs for the kernel
| after every 2XH1 / 2XH2 update.
|
| It's so annoying that they are sooooo slow at this and we
| have to keep our users from upgrading after every release.
| thorum wrote:
| What might be going on here is that Copilot pulls code it thinks
| may be relevant from other files in your VS Code project. You
| have the original code open in another tab, so Copilot uses it as
| a reference example. For evidence, see the generated comment
| under the Copilot completion: "Compare this snippet from
| Untitled-1.cpp" - that's the AI accidentally repeating the prompt
| it was given by Copilot.
| ianbutler wrote:
| I just tested it myself, and I most certainly do not have his
| source open, and it reproduced his code verbatim from just the
| function header in a random test C file I created in the middle
| of a Rust project I'm working on.
| thorum wrote:
| Ah ok.
| naikrovek wrote:
| stefan_ wrote:
| Seems other people tried it?
| https://twitter.com/larrygritz/status/1581713252144517120
| zaps wrote:
| Drunk conspiracy theory: Nat knew Copilot would be a complete
| nightmare and bailed.
| [deleted]
| colesantiago wrote:
| Github Copilot is not AI at all, it is just a dumb code
| regurgitator that just sells you code you wrote on GitHub and
| takes all the credit for it shamelessly.
| davidgerard wrote:
| it's totally AI, in the "legal responsibility laundering"
| sense. This is the main present-day use case for saying "AI".
| Jevon23 wrote:
| Hopefully you understand how artists feel about DALL-E and
| Midjourney now.
| pessimizer wrote:
| I like that if you prompt these with specific artists names,
| they try their best to rip those particular artists off.
| lolinder wrote:
| I use copilot in my work every day, but only in places where I
| know the code cannot be regurgitated because what I'm doing has
| never been done before.
|
| I can write an HTML form, then prompt copilot to generate a
| serializable class that can be used to deserialize that form on
| the server. I can write a test for one of our internal APIs,
| and for every subsequent test I can just write the name of what
| I expect it to check and it generates a test that _correctly_
| uses our internal APIs and verifies the expected behavior.
|
| You can have problems with the ethics of how GitHub and OpenAI
| produced what they did, but to describe it the way that you did
| requires never having really attempted to use it seriously.
| ianbutler wrote:
| I just tested it myself on a random C file I created in the
| middle of a Rust project I'm working on, and it reproduced his
| full code verbatim from just the function header. So clearly it
| does regurgitate proprietary code, contrary to what some people
| have said, and I do not have his source, so Copilot isn't just
| using existing context.
|
| I've been finding Copilot really useful, but I'll be pausing it
| for now, and I'm glad I have only been using it on personal
| projects and not anything for work. This crosses the line in my
| head from legal ambiguity to legal "yeah, that's gonna have to
| stop".
| naikrovek wrote:
| what proprietary code? the guy on Twitter is seeing his own GPL
| code being reproduced. nothing proprietary there.
|
| do you have the "don't reproduce code verbatim" preference set?
| webstrand wrote:
| He owns the copyright to the code, and the code is not in the
| public domain, therefore it is proprietary code.
| yjftsjthsd-h wrote:
| That's not how anybody uses the word proprietary when
| dealing with software licensing. It's a term of art that
| stands in contrast to open source licenses.
| ianbutler wrote:
| For the record, I don't typically think in terms of the
| open source community.
|
| I grant that if most people here are using it one way, I
| was likely wrong about the way it is typically used by the
| open source community. I followed up with a reply
| saying it would likely be more correct for me to have
| said "improperly licensed" code was included in the training
| set.
|
| Still it being private means it probably shouldn't be in
| the training set anyway regardless of license, because in
| the future, truly proprietary code could be included, or
| code without any license, which reserves all rights to the
| creator.
| ianbutler wrote:
| Sorry it would likely be more correct to say "improperly
| licensed" code and not proprietary. Still, for someone like
| me, the possibility of having LGPL or any GPL-licensed code
| generated in my project is a solid "no thanks". I know
| others may think differently but those are toxic licenses to
| me.
|
| Not to mention this code wasn't public so it's kind of moot,
| having someone's private code be generated into my project is
| very bad.
|
| As to the option, I do not have it set; I wasn't even aware of
| the option. But it's pretty silly to me that it's not on by
| default, or even really an option. That should probably be
| enabled with no way to toggle it without editing the extension.
| shadowgovt wrote:
| Searching for the function names in his libraries, I'm seeing
| some 32,000 hits.
|
| I suspect he has a different problem which (thanks to
| Microsoft) is now a problem he has to care about: his code
| probably shows up in one or more repos copy-pasted with
| improper LGPL attribution. There'd be no way for Copilot to
| know that had happened, and it would have mixed in the code.
|
| (As a side note: understanding _why_ an ML engine outputs a
| particular result is still an open area of research AFAIK.)
| chiefalchemist wrote:
| I thought the same thing. But then shouldn't Copilot look at
| things it's not supposed to use and see if that's happened?
| How is that any different from you committing your API key to
| Platform X and shortly thereafter Platform X reaching out to
| you... because GH let them know?
| ianbutler wrote:
| Yeah that's a mess, but that's way too much legal baggage for
| me, an otherwise innocent end user, to want to take on.
| Especially when I personally tend to try and monetize a lot
| of my work.
|
| I understand there's no way for the model to know, but it's
| really on Microsoft, then, to ensure no private, poorly
| licensed, or proprietary code is included in the training set.
| That sounds like a very tall order, but I think they're going
| to have to otherwise they're eventually going to run into
| legal problems with someone who has enough money to make it
| hurt for them.
| shadowgovt wrote:
| Agreed. Silver lining: MS is now heavily incentivized to
| invest in solutions for an open research problem.
| [deleted]
| enragedcacti wrote:
| Expanding on that, even if Microsoft sees the error of their
| ways and retrains copilot against permissively licensed
| source or with explicit opt-in, it may get trained on
| proprietary code the old version of copilot inserted into a
| permissively licensed project.
|
| You would have to just hope that you can take down every
| instance of your code and keep it down, all while copilot
| keeps making more instances for the next version to train on
| and plagiarize.
| [deleted]
| mdaniel wrote:
| I didn't feel like weighing into that Twitter thread, but in the
| screenshot one will notice that the code generated by Copilot has
| secretly(?) swapped the order of the interior parameters to
| "cs_done". Maybe that's fine, but maybe it's not, how in the
| world would a Copilot consumer know to watch out for that? Double
| extra good if a separate prompt for "cs_done" comingles multiple
| implementations where some care and some don't. Partying ensues!
|
| Not to detract from the well founded licensing discussion, but
| who is it that finds this madlibs approach useful in coding?
| bmitc wrote:
| What does
|
| > with "public code" blocked
|
| mean? Are you able to set a setting in GitHub to tell GitHub that
| you don't want your code used for Copilot training data? Is this
| an abuse of the license you sign with GitHub, or did they update
| it at some point to allow your code to be automatically used in
| Copilot? I'm not crazy about the idea of paying GitHub for them
| to make money off of my code/data.
| galleywest200 wrote:
| The option to omit "public code" means it should, in theory,
| omit code that is licensed under such banners as the GPL. It
| does not mean "omit private repositories".
| [deleted]
| _the_inflator wrote:
| Well, this poses a serious risk to companies and their cloud
| strategy based on GitHub.
|
| Can these enterprises really make sure that their code won't be
| used to train Copilot? I am skeptical.
| deworms wrote:
| It prints this code because you have it open in another editor
| tab. Wish people who don't know at all how it works stopped
| acting all outraged when they're laughably wrong.
| yjftsjthsd-h wrote:
| > It prints this code because you have it open in another
| editor tab.
|
| People upthread have reproduced and demonstrated that that's
| not the issue here.
|
| EDIT: Actually, OP says "The variant it produces is not on my
| machine." -
| https://twitter.com/DocSparse/status/1581560976398114822
|
| > Wish people who don't know at all how it works stopped acting
| all outraged when they're laughably wrong.
|
| Physician, heal thyself.
| lupire wrote:
| Can you link to more info about this? If this is accurate, many
| people aren't aware.
| Traubenfuchs wrote:
| What keeps him from suing if he is so sure?
|
| Those pretty little licenses are a waste of storage if no one
| enforces them.
| SamoyedFurFluff wrote:
| Money. Suing is often survival of the richest.
| psychphysic wrote:
| Hot take, AI will steal all our jobs. Get over it.
| kweingar wrote:
| I've noticed that people tend to disapprove of AI trained on
| their profession's data, but are usually indifferent or positive
| about other applications of AI.
|
| For example, I know artists who are vehemently against DALL-E,
| Stable Diffusion, etc. and regard it as stealing, but they view
| Copilot and GPT-3 as merely useful tools. I also know software
| devs who are extremely excited about AI art and GPT-3 but are
| outraged by Copilot.
|
| For myself, I am skeptical of intellectual property in the first
| place. I say go for it.
| bcrosby95 wrote:
| I look at IP differently.
|
| For copyright, the act of me creating something doesn't deprive
| you of anything, except the ability to consume or use the thing
| I created. If I were influenced by something, you can still be
| influenced by that same thing - I do not exhaust any resources
| I used.
|
| This is wholly different from physical objects. If I create a
| knife, I deprive you of the ability to make something else from
| those natural resources. Natural resources that I didn't create
| - I merely exploited them.
|
| Because of this, I'm fine with copyright (patents are another
| story). But I have some issues with physical property.
| joecot wrote:
| > For myself, I am skeptical of intellectual property in the
| first place. I say go for it.
|
| If we didn't live in a Capitalist society, that would be fair.
| But we currently do. That Capitalist society cares little about
| the well being of artists unless it can find a way to make
| their art profitable. Projects like DALL-E and Midjourney
| pillage centuries of human art and sell it back to us for a
| profit, while taking away work from artists who struggle to
| make ends meet as it is. Software developers are generally less
| concerned about Copilot because they're still making six figures
| a year, but they'll start to get concerned if the technology
| gets smart enough that society needs fewer developers.
|
| An automated future _should_ be a good thing. It should mean
| that computers can take care of most tasks and humans can have
| more leisure time to relax and pursue their passions. The
| reason that artists and developers panic over things like this
| is that they are watching themselves be automated out of
| existence, and have seen how society treats people who aren't
| useful anymore.
| yjftsjthsd-h wrote:
| I can think of two explanations for that off the top of my
| head.
|
| The first is that people only recognize the problems with the
| things that they're familiar with, which you would kind of
| expect.
|
| The other option is that there's a difference in the thing that
| people object to. My _impression_ is that artists seem to be
| reacting to the idea that they could be automated out of a job,
| where programmers are mostly objecting to blatant copyright
| violation. (Not universally in either case, but often.) If that
| is the case, then those are genuinely different arguments made
| by different people.
| lucideer wrote:
| I don't know specifically what DALL-E was trained on, but if
| it's art for which the artists have not consented to it being
| used to train AI then that's problematic. I haven't seen any
| objections to DALL-E _on that basis specifically_ though,
| whereas all the discussion of Copilot is around the fact that
| code authorship & Github accounts are not intrinsically tied
| together, making it impossible to have code authors consent to
| their code being used, regardless of what ToS someone's agreed
| to.
|
| > _For myself, I am skeptical of intellectual property in the
| first place. I say go for it._
|
| I'm in a similar boat but this is precisely the reason I object
| so strongly to Copilot. IP has been invented &
| perpetuated/extended to protect large corporate interests,
| under the guise of protecting & sustaining innovators &
| creative individuals. Copilot is a perfect example of large
| corporate interest ignoring IP _when it suits them_ to exploit
| individuals.
|
| In other words: the reason I'm skeptical of IP is the same
| reason I'm skeptical of Copilot.
| __alexs wrote:
| Stable Diffusion and DallE were both trained on copyrighted
| content scraped from the internet with no consent from the
| publishers.
|
| It's quite a common complaint because some of the most
| popular prompts involve just appending an artist's name to
| something to get it to copy their style.
| dawnerd wrote:
| In theory AI should never return an exact copy of a copyrighted
| work or even anything close enough you could argue is the
| original "just changed". If the styles are the same I think
| that's fine, no different than someone else cloning it. But
| there are definitely outputs from Stable Diffusion that look
| like the original with some weird artifacts.
|
| We need regulation around it.
| XorNot wrote:
| > there's definitely outputs from stable diffusion that looks
| like the original with some weird artifacts.
|
| Do you have examples? Because SD will generate photoreal
| outputs and then get subtle details (hands, faces) wrong, but
| unless you have the source image in hand then you've no way
| of knowing whether it's a "source image" or not.
| rtkwe wrote:
| Code is much easier to do that with, because the avenues for
| expression are significantly limited compared to just
| creating an image. For it to be useful, Copilot has to produce
| compiling, reasonably terse, and understandable code. The
| compiler in particular is a big bottleneck on the range of
| the output.
| ghoward wrote:
| I am a programmer who has written extensively on my blog and HN
| against Copilot.
|
| I am also not a hypocrite; I do not like DALL-E or Stable
| Diffusion either.
|
| As a sibling comment implies, these AI tools give more power to
| people who control data, i.e., big companies or wealthy people,
| while at the same time, they take power away from individuals.
|
| Copilot is bad for society. DALL-E and Stable Diffusion are bad
| for society.
|
| I don't know what the answer is, but I sure wish I had the
| resources to sue these powerful entities.
| vghfgk1000 wrote:
| akudha wrote:
| _but I sure wish I had the resources to sue these powerful
| entities._
|
| I wonder if there is a crowdfunding platform like gofundme,
| for lawsuits. Or can gofundme itself be used for this
| purpose? It would be fantastic to sue the mega polluters,
| lying media like Fox etc.
|
| That said, even with a lot of money, are these cases
| winnable? Especially given the current state of Supreme Court
| and other federal courts?
| williamcotton wrote:
| I'm a programmer and a songwriter and I am not worried about
| these tools and I don't think they are bad for society.
|
| What did the photograph do to the portrait artist? What did
| the recording do to the live musician?
|
| Here's some highfalutin art theory on the matter, from almost
| a hundred years ago:
| https://en.wikipedia.org/wiki/The_Work_of_Art_in_the_Age_of_...
| snarfy wrote:
| > What did the recording do to the live musician?
|
| The recording destroyed the occupation of being a live
| musician. People still do it for what amounts to tip money,
| but it used to be a real job that people could make a
| living off of. If you had a business and wanted to
| differentiate it by having music, you had to pay people to
| play it live. It was the only way.
| __alexs wrote:
| > What did the photograph do to the portrait artist?
|
| It completely destroyed the jobs of photorealistic
| portrait artists. You only have stylised portrait painting
| now, and now that is going to be ripped off too.
| SamoyedFurFluff wrote:
| But this isn't like photography and portrait artistry. This
| is more like a wealthy person stealing your entire art
| catalog, laundering it in some fancy way, and then claiming
| they are the original creator. Stable Diffusion has
| literally been used to create new art by screenshotting
| someone's live-streamed art creation process as the seed.
| While creating derivative work has always been considered
| art (such as deletion poetry and collage), it's extremely
| uncommon and blasé to never attribute the original(s).
| insanitybit wrote:
| > This is more like a wealthy person stealing your entire
| art catalog, laundering it in some fancy way, and then
| claiming they are the original creator.
|
| If I take a song, cut it up, and sing over it, my release
| is valid. If I parody your work, that's my work. If you
| paint a picture of a building and I go to that spot and
| take a photograph of that building it is my work.
|
| I can derive all sorts of things, things that I own, from
| things that others have made.
|
| Fair use is a thing: https://www.copyright.gov/fair-use/
|
| As for talking about the originals, would an artist
| credit every piece of inspiration they have ever
| encountered over a lifetime? Publishing a seed seems fine
| as a _nice_ thing to do, but pointing at the billion
| pictures that went into the drawing seems silly.
| tremon wrote:
| Fair use is an affirmative defense. Others can still sue
| you for copying, and you will have to hope a judge agrees
| with your defense. How do you think Google v. Oracle
| would have turned out if Google's defense was "no your
| honor, we didn't copy the Java sources. We just used
| those sources as input to our creative algorithms, and
| this is what they independently produced"?
| ghoward wrote:
| Do you know what's different about the photograph or the
| recording?
|
| _They are still their own separate works!_
|
| If a painter paints a person for commission, and then that
| person also commissions a photographer to take a picture of
| them, is the photographer infringing on the copyright of
| the painter? Absolutely not; the works are separate.
|
| If a recording artist records a public domain song that
| another artist performs live, is the recording artist
| infringing on the live artist? Heavens, no; the works are
| separate.
|
| On the other hand, these "AI's" are taking existing works
| and reusing them.
|
| Say I write a song, and in that song, I use one stanza from
| the chorus of one of your songs. Verbatim. Would you have a
| copyright claim against me for that? Of course, you would!
|
| That's what these AI's do; they copy portions and mix them.
| Sometimes they are not substantial portions. Sometimes,
| they are, with verbatim comments (code), identical
| structure (also code), watermarks (images), composition
| (also images), lyrics (songs), or motifs (also songs).
|
| In the reverse of your painter and photographer example, we
| saw US courts hand down judgment against an artist who
| blatantly copied a photograph. [1]
|
| Anyway, that's the difference between the tools of
| photography (creates a new thing) and sound recording
| (creates a new thing) versus AI (mixes existing things).
|
| And yes, sound mixing can easily stray into copyright
| infringement. So can other copying of various copyrightable
| things. I'm not saying humans don't infringe; I'm saying
| that AI does _by construction_.
|
| [1]: https://www.reuters.com/world/us/us-supreme-court-hears-argu...
| williamcotton wrote:
| I'm not so sure that originality is that different
| between a human and a neural network. That is to say that
| what a human artist is doing has always involved a lot of
| mixing of existing creations. Art needs to have a certain
| level of familiarity in order to be understood by an
| audience. I didn't invent 4/4 time or a I-IV-V
| progression and I certainly wasn't the first person to
| tackle the rhyme schemes or subject matter of my songs. I
| wouldn't be surprised if there were fragments from other
| songs in my lyrics or melodies, either from something I
| heard a long time ago or perhaps just out of coincidence.
| There's only so much you can do with a folk song to begin
| with!
|
| BTW, what happened after the photograph is that there
| were fewer portrait artists. And after the recording there
| were fewer live musicians. There are certainly no fewer
| artists or musicians, though!
| ghoward wrote:
| > I'm not so sure that originality is that different
| between a human and a neural network. That is to say that
| what a human artist is doing has always involved a lot of
| mixing of existing creations.
|
| I disagree, but this is a debate worth having.
|
| This is why I disagree: humans don't copy _just_
| copyrighted material.
|
| I am in the middle of developing and writing a romance
| short story. Why? Because my writing has a glaring
| weakness: characters, and romance stands or falls on
| characters. It's a good exercise to strengthen that
| weakness.
|
| Anyway, both of the two people in the (eventual) couple
| developed from _my real life_, and not from any
| copyrighted material. For instance, the man will
| basically be a less autistic and less selfish version of
| myself. The woman will basically be the kind of person
| that annoys me the most in real life: bright, bubbly,
| always touching people, etc.
|
| There is no copyrighted material I am getting these
| characters from.
|
| In addition, their situation is not typical of such
| stories, but it _does_ have connections to my life. They
| will (eventually) end up in a ballroom dance competition.
| Why that? Because the male character hates it. I hate
| ballroom dancing; during a three-week ballroom dancing
| course in 6th grade, the girls made me hate it. I won't
| say how, but they did.
|
| That's the difference between humans and machines:
| machines can only copy and mix other copyrightable
| material; humans can copy _real life_. In other words,
| machines can only copy a representation; humans can copy
| the real thing.
|
| Oh, and the other difference is emotion. I've heard that
| people with damage to the emotional centers of their brains
| can take _six hours_ to choose between blue and black pens.
| There is something about emotions that drives decision-
| making, and it's decision-making that drives art.
|
| When you consider that brain chemistry, which is a
| function of genetics and previous choices, is a big part
| of emotions, then it's obvious that those two things,
| genetics and previous choices, are _also_ inputs to the
| creative process. Machines don't have those inputs.
|
| Those are the non-religious reasons why I think humans
| have more originality than machines, including neural
| networks.
| c7b wrote:
| > these AI tools give more power to people who control data,
| i.e., big companies or wealthy people, while at the same
| time, they take power away from individuals.
|
| Not sure I agree, but I can at least see the point for
| Copilot and DALL-E - but Stable Diffusion? It's open source,
| it runs on (some) home-use laptops. How is that taking away
| power from indies?
|
| Just look at the sheer number of apps building on or
| extending SD that were published on HN, and that's probably
| just the tip of the iceberg. Quite a few of them at least
| looked like side projects by solo devs.
| ghoward wrote:
| SD is better than the other two, but it will still
| centralize control.
|
| I imagine that Disney would take issue with SD if material
| that Disney owned the copyright to was used in SD. They
| would sue. SD would have to be taken off the market.
|
| Thus, Disney has the power to ensure that their copyrighted
| material remains protected from outside interests, and they
| can still create unique things that bring in audiences.
|
| Any small-time artist that produces something unique will
| find their material eaten up by SD in time, and then,
| because of the sheer _number_ of people using SD, that
| original material will soon have companions that are like
| it _because they are based on it in some form_. Then, the
| original won't be as unique.
|
| Anyone using SD will not, by definition, be creating
| anything unique.
|
| And when it comes to art, music, photography, and movies,
| _uniqueness_ is the best selling point; once something is
| not unique, it becomes worth less because something like it
| could be gotten somewhere else.
|
| SD still has the power to devalue original work; it just
| gives normal people that power on top of giving it to the
| big companies, while the original works of big companies
| remain safe because of their armies of lawyers.
| c7b wrote:
| > I imagine that Disney would take issue with SD if
| material that Disney owned the copyright to was used in
| SD. They would sue. SD would have to be taken off the
| market.
|
| Are you sure?
|
| I'm not familiar with the exact data set they used for SD
| and whether or not Disney art was included, but my
| understanding is that their claim to legality comes from
| arguing that the use of images as training data is 'fair
| use'.
|
| Anyone can use Disney art for their projects as long as
| it's fair use, so even if they happened to not include
| Disney art in SD, it doesn't fully validate your point,
| because they could have done so if they wanted. As long
| as training constitutes fair use, which I think it should
| - it's pretty much the AI equivalent of 'looking at
| others' works', which is part of a human artist's
| training as well.
| ghoward wrote:
| > Are you sure?
|
| Yes, I'm sure.
|
| > I'm not familiar with the exact data set they used for
| SD and whether or not Disney art was included, but my
| understanding is that their claim to legality comes from
| arguing that the use of images as training data is 'fair
| use'.
|
| They could argue that. But since the American court
| system is currently (almost) de facto "richest wins,"
| their argument will probably not mean much.
|
| The way to tell if something was in the dataset would be
| to use the name of a famous Disney character and see what
| it pulls up. If it's there, then once the Disney beast
| finds out, I'm sure they'll take issue with it.
|
| And by the way, I don't buy all of the arguments for
| machine learning as fair use. Sure, for the training
| itself, yes, but once the model is used by others, you
| now have a distribution problem.
|
| More in my whitepaper against Copilot at [1].
|
| [1]: https://gavinhoward.com/uploads/copilot.pdf
| cmdialog wrote:
| Obviously this is a matter of philosophy. I am using Copilot
| as an assistant, and for that it works out very nicely. It's
| fancy code completion. I don't know who is trying to use this
| to write non-trivial code but that's as bad an idea as trying
| to pass off writing AI "prompts" as a type of engineering.
|
| These things are tools to make more involved things. You're
| not going to be remembered for all the AI art you prompted
| into existence, no matter how many "good ones" you manage to
| generate. No one is going to put you into the Guggenheim for
| it.
|
| Likewise, programmers aren't going to become more depraved or
| something by using Copilot. I think that kind of prescriptive
| purism needs to Go Away Forever, personally.
| bayindirh wrote:
| I, with my software developer hat on, am not excited by AI. Not
| a bit, honestly. Especially about these big models trained on
| huge amounts of data, without any consent.
|
| Let me be perfectly clear. I'm all for the tech. The
| capabilities are nice. The thing I'm _strongly against_ is
| training these models on any data without any consent.
|
| GPT-3 is OK, training it with public stuff regardless of its
| license is not.
|
| Copilot is OK, training it with GPL/LGPL-licensed code without
| consent is not.
|
| DALL-E/Midjourney/Stable Diffusion are OK. Training them with
| images that are not public domain or CC0 is not.
|
| "We're doing something amazing, hence we need no permission"
| is ugly, to put it very lightly.
|
| I've left GitHub because of Copilot. I will leave any photo
| hosting platform if they hint at anything similar with my
| photography, period.
| psychphysic wrote:
| I disagree.
|
| Those are effectively cases of cryptomnesia[0]. Part and
| parcel of learning.
|
| If you don't want broad access to your work, don't upload it
| to a public repository. It's very simple. Good on you for
| recognising that you don't agree with how GitHub treats
| data in public repos, but it's not their problem.
|
| [0] https://en.m.wikipedia.org/wiki/Cryptomnesia
| bayindirh wrote:
| > Those are effectively cases of cryptomnesia.
|
| Disagree; outputting training data as-is is not
| cryptomnesia. This is not Copilot's first case. It also
| reproduced id Software's fast inverse square root function
| as-is, including its comments, but without its license.
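For context on the claim above: the routine referred to is the widely circulated Quake III "fast inverse square root". The following is a sketch from memory, not the verbatim id Software code (the original is GPL-licensed and carries its own comments and license header); memcpy replaces the original's pointer-cast type punning so the sketch is well-defined C on modern compilers.

```c
#include <stdint.h>
#include <string.h>

/* Sketch of the famous Quake III fast inverse square root.
 * Computes an approximation of 1/sqrt(number) without division. */
float Q_rsqrt(float number)
{
    const float threehalfs = 1.5F;
    float x2 = number * 0.5F;
    float y = number;
    int32_t i;

    memcpy(&i, &y, sizeof i);            /* float bits -> integer */
    i = 0x5f3759df - (i >> 1);           /* magic-constant initial guess */
    memcpy(&y, &i, sizeof y);            /* integer bits -> float */
    y = y * (threehalfs - (x2 * y * y)); /* one Newton-Raphson refinement */
    return y;
}
```

For example, Q_rsqrt(4.0f) returns roughly 0.499, within about 0.2% of the true value 0.5.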
|
| > If you don't want broad access your work, don't upload it
| to a public repository. It's very simple.
|
| This is actually both funny and absurd. This is exactly why
| we have licenses. If all the licenses are moot, then
| this opens a very big can of worms...
|
| My terms are simple. If you derive, share the derivative
| (xGPL). Copilot is deriving from my code. If you use my code
| as a starting point, honor the license and mark the
| derivative with the GPL license. Does this void your business
| case? I don't care. These are my terms.
|
| If any public item can be used without any limitations,
| Getty Images (or any other stock photo business) is
| illegal. CC licensing shouldn't exist. GPL is moot. Even
| the most litigious software companies' cases (Oracle, SCO,
| Microsoft, Adobe, etc.) are moot. Just don't put it on
| public servers, eh?
|
| Similarly, music and other fine arts are generally publicly
| accessible. So by your logic, copyright on any and every
| production is also invalid, because it's publicly available.
|
| Why not put your case to the attorneys of Disney, WB,
| Netflix and others? I'm sure they'll provide all their
| archives for training your video/image AI. Similarly,
| Microsoft, Adobe, MathWorks, et al. will be thrilled to
| support your Copilot competitor with their code, because a)
| any similar code will be just cryptomnesia, and b) the
| software produced from that code is publicly accessible anyway.
|
| At this point, I haven't even touched on the fact that
| humans are trained very differently from neural networks.
| matheusmoreira wrote:
| > For myself, I am skeptical of intellectual property in the
| first place. I say go for it.
|
| Me too. I think copyright and these silly restrictions should
| be abolished.
|
| At the same time, I can't get over the fact these self-serving
| corporations are all about "all rights reserved" when it
| benefits them while at the same time undermining other people's
| rights. Microsoft absolutely knows that what they're doing is
| wrong. Recently it was pointed out to me that Microsoft
| employees can't even look at GPL source code, lest they
| subconsciously reproduce it. Yet they think their software can
| look at other people's code and reproduce it?
| wzdd wrote:
| This talking point seems to come up often, but since it's
| basically saying that people are hypocrites I think it is a bad
| faith thing to say without reasonable proof that it's not a
| fringe opinion (or completely invented).
|
| For what it's worth, the people I know who are opposed to this
| sort of "useful tool" don't discriminate by profession.
| teddyh wrote:
| An accusation of hypocrisy _is not an argument_; at least not
| a relevant one.
| pclmulqdq wrote:
| I think the distinction is that only one of those classes tends
| to produce exact copies of work. Programmers get very upset at
| DALL-E and Stable Diffusion producing exact (and near-exact)
| copies of artwork too. In contrast to exact copying, production
| of imitations (not exact copies, but "X in the style of Y") is
| something that artists have been doing for centuries, and is
| widely thought of as part of arts education.
|
| For some reason, code seems to lend itself to exact copying by
| AIs (and also some humans) rather than comprehension and
| imitation.
| XorNot wrote:
| I'm mildly suspicious that this example is an implementation
| of a generic matrix functionality though: you couldn't patent
| this sort of work, because it's not patentable - it's a
| mathematics. It's fundamentally a basic operation, that would
| have to be implemented with a similar structure regardless of
| how you do it.
| pclmulqdq wrote:
| Patents and copyrights are totally different, and should be
| treated as such. The issue isn't about whether someone
| copies the algorithm, it's whether they copy the written
| code. Nothing in an algorithms textbook is patentable
| either, but if you copy the words describing an algorithm
| from it, you are stealing their description.
| heavyset_go wrote:
| Mathematics is not patentable, but you can patent the steps
| a computer takes to compute the results of that particular
| algorithm.
| ChildOfChaos wrote:
| I think sadly it's just people being protective. The technology
| is interesting, so if it doesn't hit their line of work, it's
| fantastic; if it does, then it's terrible.
|
| There is no arguing against it though; you can't stop it. All
| this stuff is eventually coming to all of these areas, so you
| might as well try to find ways to use the opportunities while
| some of this is still new.
| naillo wrote:
| I mean we definitely _can_ stop it. Laws are a pretty strong
| deterrent.
| ghaff wrote:
| "We" maybe can't stop it. But if there were the political
| will to kneecap many uses of machine learning, it's not
| obvious there's any reason it _couldn't_ be done even if
| not 100% effective. Whether that would be a good thing is a
| different question.
| faeriechangling wrote:
| You can slow this; you can't stop it whatsoever. It's
| ultimately about as futile an effort as trying to stop piracy.
| People are ALREADY running Salesforce CodeGen and Stable
| Diffusion at home, you can't put the genie back in the
| bottle, what we'll have 20 years from now is going to make
| critics of these tools have nightmares.
|
| If you try to outlaw it, the day before the laws come into
| effect, I'm going to download the very best models out
| there and run it on my home computer. I'll start organising
| with other scofflaws and building our own AI projects in
| the fashion of leelachesszero with donated compute time.
|
| You can shut down the commercial versions of these tools.
| You can scare large corporations into banning the use of
| these tools internally. You can pull an uno reverse
| card and use modified versions of the tools to CHECK for
| copyright infringement and sue people under existing laws
| AND you'll probably even be able to statistically prove
| somebody is an AI user. But STOPPING the use of these
| tools? Go ahead and try, won't happen.
| tablespoon wrote:
| > You can slow this, you can't stop it whatsoever. It's
| about as ultimately futile as an effort as trying to stop
| piracy. ... But STOPPING the use of these tools? Go ahead
| and try, won't happen.
|
| So? No one needs to _stop it totally_. The world isn't
| black and white, pushing it to the fringes is almost
| certainly a sufficient success.
|
| Outlawing murder hasn't stopped murder, but no one's
| given up on enforcing those laws because of the futility
| of perfect success.
|
| > If you try to outlaw it, the day before the laws come
| into effect, I'm going to download the very best models
| out there and run it on my home computer. I'll start
| organising with other scofflaws and building our own AI
| projects in the fashion of leelachesszero with donated
| compute time.
|
| That sounds like a cyberpunk fantasy.
| faeriechangling wrote:
| Cyberpunk sure, but fantasy? Not at all.
| throwaway675309 wrote:
| You'll never be able to push it to the fringes because
| there will never be universal legal agreement from
| country to country on where to draw the line.
|
| And as computers get more powerful and the models get
| more efficient it'll become easier and easier to self
| host and run them on your own dime. There are already
| one-click installers for generative models such as Stable
| Diffusion that run on modest hardware from a few years
| back.
| tpm wrote:
| What would the law do? Forbid automatic data collection
| and/or indexing and further use without explicit copyright
| holder agreement? That would essentially ban the whole
| internet as we know it. Not saying that would be bad, but
| this is never going to happen; there is too much accumulated
| momentum in the opposite direction.
| chiefalchemist wrote:
| To your point, the law can do a lot of things. The issue
| here is the clarity and ability to enforce the law.
| [deleted]
| machinekob wrote:
| I'm pretty sure DALL-E was trained only on non-copyrighted
| material (they say so :| ).
|
| But to be honest, if your code is open source, I'm pretty
| sure Microsoft doesn't care about the license; they'll just
| use it because "reasons". Same with Stable Diffusion: they
| don't give a fuck about the data. If it's on the internet,
| they'll use it, so it's a topic that will probably be
| regulated in a few years.
|
| Until then, let's hope they both (Microsoft and NovelAI) get
| milked for illegal content usage, and I seriously hope at
| least a few lawyers will try milking them ASAP, especially
| NovelAI, which illegally used a lot of copyrighted art in
| its training data.
| msbarnett wrote:
| > I'm pretty sure DALL-E was trained only on not copyright
| material
|
| Nope. DALL-E generates images with the Getty watermark, so
| clearly there are copyrighted materials in its training set:
| https://www.reddit.com/r/dalle2/comments/xdjinf/its_pretty_o...
| pclmulqdq wrote:
| Lots of people ironically put the Getty watermark on
| pictures and memes that they make to satirically imply that
| they are pulling stock photos off the internet with the
| printscreen function instead of paying for them.
| msbarnett wrote:
| Memes generally would not fall under the category of non-
| copyrighted material; they're most of the time extremely
| copyrighted material just being used without permission.
| And even a wholly original work on which an artist
| sarcastically puts a Getty watermark, then licenses under
| Creative Commons or something, would fall into murky territory
| - the Getty watermark itself is the intellectual property
| of Getty. The original image author might plead fair use
| as satire, but satirical intentions aren't really a
| defence available to DALL-E.
|
| So even if we assume these were wholly original works
| that the authors placed under something like a Creative
| Commons license, the fact that they incorporated an image
| element they had no rights to would at the very least
| create a fairly tangled copyright situation; any really
| rigorous evaluation of the copyright status of every image
| in the training set would tend to reject them as not worth
| the risk of litigation.
|
| But the more likely scenario here is that they did
| minimal at best filtering of the training set for
| copyrights.
| pclmulqdq wrote:
| You could argue that mocking the Getty logo like that is
| some form of fair use, which would be a backdoor through
| which it can end up as a legitimate element of a public
| domain work, in which case it would be fair game.
|
| I agree with you that it is also possible that people
| posted Getty thumbnails to some sites as though they are
| public domain, and that is how the AIs learned the
| watermark.
| nottorp wrote:
| Dunno about Getty, but I've been shown the cover of the
| Beatles' Yellow Submarine done in different colors as some
| great AI advancement.
| machinekob wrote:
| Thanks for pointing this out, I never saw that before. If
| they used copyrighted images, they should also get punished.
| In the original paper they say no copyrighted content was
| used, but that could just be lies, who knows; the data
| speaks for itself, and if it can be proven in court, they
| should get punished (so again, Microsoft getting rekt for
| that will be good to see :] ).
| tpxl wrote:
| When Joe Rando plays a song from 1640 on a violin he gets a
| copyright claim on Youtube. When Jane Rando uses devtools to
| check a website source code she gets sued.
|
| When Microsoft steals all code on their platform and sells it,
| they get lauded. When "Open" AI steals thousands of copyrighted
| images and sells them, they get lauded.
|
| I am skeptical of imaginary property myself, but fuck this one
| set of rules for the poor, another set of rules for the masses.
| lo_zamoyski wrote:
| The poor are the masses, or at least part of the masses.
| gw99 wrote:
| If this is the new status quo then I suggest we find out how
| to fuck up the corpus as best as possible.
| a4isms wrote:
| > one set of rules for the poor, another set of rules for the
| masses.
|
| _Conservatism consists of exactly one proposition, to wit:_
|
| _There must be in-groups whom the law protects but does not
| bind, alongside out-groups whom the law binds but does not
| protect._
|
| --Composer Frank Wilhoit[1]
|
| [1]: https://crookedtimber.org/2018/03/21/liberals-against-
| progre...
| thrown_22 wrote:
| sbuttgereit wrote:
| Thanks for posting the link to the quote. Having said that,
| I don't think it's possible to quote that bit and get an
| understanding of the idea being conveyed without its
| opening context. Indeed, it's likely to cause a false idea
| of what's being conveyed. From earlier in the same post:
|
| _" There is no such thing as liberalism -- or
| progressivism, etc.
|
| There is only conservatism. No other political philosophy
| actually exists; by the political analogue of Gresham's
| Law, conservatism has driven every other idea out of
| circulation."_
| a4isms wrote:
| I agree that adds considerable depth to the value of the
| quote, and connects it to the conversation he appeared to
| be having, which is about the first line you've quoted:
|
| There is no such thing as being a Liberal or Progressive,
| there is only being a Conservative or anti-Conservative,
| and while there is much nuance and policy to debate about
| that, it boils down to deciding whether you actually
| support or abhor the idea of "the law" (which is a much
| broader concept than just the legal system) existing to
| enforce or erase the distinction between in-groups and
| out-groups.
|
| But that's just my read on it. Getting back to
| intellectual property, it has become a bitter joke on
| artists and creatives, who are held up as the
| beneficiaries of intellectual property laws in theory,
| but in practice are just as much of an out-group as
| everyone else.
|
| We are bound by the law--see patent trolls, for example--
| but not protected by it unless we have pockets deep
| enough to sue Disney for not paying us.
| stickfigure wrote:
| Yeah, inequality sucks. So how about we focus on making the
| world better for everyone instead of making the world equally
| shitty for everyone?
| imwillofficial wrote:
| This makes no sense.
|
| Absolutely nobody is arguing to make the world shittier.
| zopa wrote:
| Because we're not the ones with the power. People with
| limited power pick the fights they might win, not the
| fights that maximize total welfare for everyone including
| large copyright holders. There's no moral obligation to be
| a philosopher king unless you're actually on a throne.
| foobarbecue wrote:
| > one set of rules for the poor, another set of rules for the
| masses
|
| Presumably by "the masses" you meant "the large
| corporations"?
| rtkwe wrote:
| I think copilot is a clearer copyright violation than any of
| the stable diffusion projects though because code has a much
| narrower band of expression than images. It's really easy to
| look at the output of CoPilot and match it back to the
| original source and say these are the same. With stable
| diffusion it's much closer to someone remixing and aping the
| images than it is reproducing originals.
|
| I haven't been following super closely but I don't know of
| any claims or examples where input images were recreated to a
| significant degree by Stable Diffusion.
| e40 wrote:
| Preach. So incredibly annoyed when I tried to send a video of
| my son playing Beethoven to his grandparents and it was taken
| down due to a copyright violation.
| c7b wrote:
| > When Joe Rando plays a song from 1640 on a violin he gets a
| copyright claim on Youtube. When Jane Rando uses devtools to
| check a website source code she gets sued.
|
| Do you have any evidence for those claims, or anything
| resembling those examples?
|
| Music copyright has long expired for classical music, and big
| shots are definitely not exempt from where it applies. Just
| look at how much heat Ed Sheeran, one of the biggest
| contemporary pop stars, got for "stealing" a phrase that was
| literally just chanting "Oh-I" a few times (just to be clear,
| I am familiar with the case and find it infuriating that this
| petty rent-seeking attempt went to trial at all, even if
| Sheeran ended up being completely cleared, but to great
| personal distress as he said afterwards).
|
| And who ever got sued for using dev tools? Is there even a
| way to find that out?
| banana_giraffe wrote:
| https://twitter.com/mpoessel/status/1545178842385489923
|
| Among many others. Classical music may have fallen into the
| public domain, but modern performances of it are
| copyrightable, and some of the big companies use copyright
| matching systems, including YouTube's, that often flag new
| performances as copies of existing recordings.
| codefreakxff wrote:
| There have been a number of stories about musicians being
| hit with copyright claims. Here is the first result on Google:
|
| https://www.radioclash.com/archives/2021/05/02/youtuber-
| gets...
|
| As for being sued for looking at source code, here is the
| first result on Google:
|
| https://www.wired.com/story/missouri-threatens-sue-
| reporter-...
| frob wrote:
| Just to be clear, because it's in the title, the reporter
| was threatened with a lawsuit for looking at source code.
| I cannot find anyone actually sued for it. BTW, here's an
| article saying said reporter wasn't sued:
| https://www.theregister.com/AMP/2022/02/15/missouri_html_hac...
|
| Anyone with a mouth can run it and threaten a lawsuit. In
| fact, I threaten to sue you for misinformation right now
| unless you correct your post. Fat lot of good my threat
| will do because no judge in their right mind would
| entertain said lawsuit because it's baseless.
| c7b wrote:
| Ok - it is a true shame that the YouTube copyright claim
| system is so broken as to enable those shady practices,
| and that politicians still haven't upped their knowledge
| of the internet beyond a 'series of tubes'.
|
| But surely the answer should be to fix the broken YT
| system and to educate politicians to abstain from
| baseless threats, not to make AI researchers pay for it?
| insanitybit wrote:
| > Joe Rando plays a song from 1640 on a violin he gets a
| copyright claim on Youtube
|
| That can't possibly be a valid claim, right? AFAIK copyright
| is "gone" after the original author dies + ~70 years. Before
| fairly recently it was even shorter. Something from 1640
| surely can't be claimed under copyright protection. There are
| much more recent changes where that might not be the case,
| but 1640?
|
| > When Jane Rando uses devtools to check a website source
| code she gets sued.
|
| Again, that doesn't sound like a valid suit. Surely she would
| win? In the few cases I've heard of where suits like this are
| brought against someone they've easily won them.
| cipherboy wrote:
| The poster isn't claiming that this is a valid DMCA suit.
| Nearly everyone who is at a mildly decent level and has
| posted their own recordings of classical music to YouTube
| has received these claims _in their Copyright section_.
| YouTube itself prefixes this with some lengthy disclaimer
| about how this isn't the DMCA process but that they reserve
| the right to kick you off their site based on fraudulent
| matches made by their algorithms.
|
| They are absolutely completely and utterly bullshit. Nobody
| with half an ear for music will mistake my playing of
| Bach's G Minor Sonata for Arthur Grumiaux's (too many out of
| tune notes :-D). Yet YouTube still managed to match his
| recording to my playing, probably because it had never
| heard mine before (I recorded it mere minutes earlier).
|
| So no, it isn't a valid claim, but this algorithm, trained
| on certain examples of work, manages to make bad
| classifications with potentially devastating ramifications
| for the creator (I'm not a monetized YouTube artist, but if
| this triggered a complete lockout of my Google account(s),
| it would likely end Very Badly).
|
| I think it's a very relevant comparison to the GP's
| examples.
| alxlaz wrote:
| > That can't possibly be a valid claim, right?
|
| It's not, but good luck talking to a human at Youtube when
| the video gets taken down.
|
| > Again, that doesn't sound like a valid suit. Surely she
| would win?
|
| Assuming she could afford the lawyer, and that she lives
| through the stress and occasional mistreatment by the
| authorities, yes, probably. Both are big ifs, though.
| lbotos wrote:
| > That can't possibly be a valid claim, right?
|
| I'm not a lawyer, but my understanding is that while the
| "1640's violin composition" _itself_ may be out of
| copyright, if I record myself playing it, _my recording of
| that piece is my copyright_. So if you took my file
| (somehow) and used it without my permission, and I could
| prove it, I could claim copyright infringement.
|
| That's my understanding, and I've personally operated that
| way to avoid any issues since it errs on the side of
| safety. (Want to use old music, make sure the license of
| the recording explicitly says public domain or has license
| info)
| vghfgk1000 wrote:
| insanitybit wrote:
| Yes, that sounds right to me. But that's not relevant to
| "Joe Whoever played it and got sued".
| lupire wrote:
| The problem is that YouTube AI thinks your recording is
| the same as every other recording, because it doesn't
| understand the difference between composition and
| recording.
| Rimintil wrote:
| > That can't possibly be a valid claim, right?
|
| It has indeed happened.
|
| https://boingboing.net/2018/09/05/mozart-bach-sorta-
| mach.htm...
|
| Sony later withdrew their copyright claim.
|
| There are two pieces to copyright when it comes to public
| domain:
|
| * The work (song) itself -- can't copyright that
|
| * The recording -- you are the copyright owner. No one,
| without your permission, can re-post your recording
|
| And of course, there is derivative work. You own any
| portion that is derivative of the original work.
| insanitybit wrote:
| > Sony later withdrew their copyright claim.
|
| Right, that's my point... I can sue anyone for anything,
| doesn't mean I'll win.
| imwillofficial wrote:
| It worked out justly in this case.
|
| In the VAST majority of cases, it does not.
| sumedh wrote:
| > I can sue anyone for anything, doesn't mean I'll win.
|
| You can't sue if you don't have money; a big corp can sue
| even if they know they are wrong.
| pessimizer wrote:
| > Again, that doesn't sound like a valid suit. Surely she
| would win? In the few cases I've heard of where suits like
| this are brought against someone they've easily won them.
|
| That's freedom of speech for everyone who can afford a
| lawyer to bring suit against a music rights-management
| company.
| insanitybit wrote:
| Yes, this is a problem with the legal system in general.
| kevin_thibedeau wrote:
| The songwriter copyright is expired but there is still a
| freshly minted copyright on the video and the audio
| performance.
|
| This becomes particularly onerous when trolls claim
| copyright on published recordings of environmental sounds
| that happen to be similar but not identical to someone
| else's, even though their only legitimate claim is on the
| original recording.
| Rodeoclash wrote:
| This isn't a legal copyright claim, it's a "YouTube"
| copyright claim which is entirely owned and enforced by
| YouTube.
| insanitybit wrote:
| OK but then we're just talking about content moderation,
| which seems like a separate issue. I think using "YouTube
| copyright claim" as a proxy for "legal copyright claim"
| is more to the parent's point, especially since that's
| how YouTube purports the claim to work. Otherwise it
| feels irrelevant.
| cipherboy wrote:
| Copyright claims are a form of content moderation, by
| preventing reuse of content that others own.
|
| But it can still be weaponized to prevent legitimate
| resubmissions of parallel works, and can potentially
| deplatform legitimate users, depending on the reviewer
| and the clarity of the rebuttal.
| lupire wrote:
| YouTube does this moderation in order to avoid legal
| pressure from copyright holders, as in
|
| https://en.m.wikipedia.org/wiki/Viacom_International_Inc.
| _v.....
| cyanydeez wrote:
| Basically, copyright is for people with copyright lawyers
| kodah wrote:
| That's not even a joke. One of the premises of a copyright
| is that you defend your intellectual property or lose it.
| If the system were more equitable then it would defend your
| copyright.
| heavyset_go wrote:
| You're thinking of trademarks.
| eropple wrote:
| This is an inaccurate description of copyright, at least
| in the United States.
|
| Trademarks require active defense to avoid
| genericization. Copyright may be asserted at the holder's
| discretion.
| heavyset_go wrote:
| Your post is a good example of the _tu quoque_ fallacy[1].
|
| [1] https://en.wikipedia.org/wiki/Tu_quoque
| tablespoon wrote:
| > I've noticed that people tend to disapprove of AI trained on
| their profession's data, but are usually indifferent or
| positive about other applications of AI.
|
| In other words: the banal observation that people care far more
| when their stuff is stolen than when some stranger has their
| stuff stolen.
| lerpgame wrote:
| deworms wrote:
| As an aside, this code is an unreadable mess; for a guy
| brandishing his credentials even in his github username you'd
| think he'd know a thing or two about clean code.
| stonogo wrote:
| Feel free to send patches.
| deworms wrote:
| Why would I waste time doing this?
| kortilla wrote:
| Because you were already willing to waste time panning his
| code on a public forum. Maybe do something constructive
| instead of destructive if your time is so precious.
| [deleted]
| ahmedbaracat wrote:
| " AI-focused products/startups lack a business model aligning the
| incentives of both the company and the domain experts (Data
| Dignity)"
|
| https://blog.barac.at/a-business-experiment-in-data-dignity
|
| Yes, I am quoting myself.
| faeriechangling wrote:
| Not your repo, not your code.
|
| I celebrate Microsoft's shameless plundering of GitHub to create
| new products that increase productivity. The incredible thing is
| that people trusted Microsoft to use their code on their terms to
| begin with. This is a company who has been finding ways to make
| open source code into a proprietary product since the 90s.
|
| Nobody can stop people from replicating what Microsoft did in the
| long run anyways. Eventually any consumer with enough access to
| source code will be able to make their own copilot. Even if
| copilot is criminalised Microsoft can just sell access to the
| entire GitHub dataset and let other people commit the "crime".
| Then you're right back where we started with having to sue the
| end users of copilot for infringement instead of Microsoft.
|
| Use private repos or face the inevitability that copilot-like
| products will scrape your code.
| ilrwbwrkhv wrote:
| Of course it does. What are you going to do? Sue them?
| ralph84 wrote:
| Ok. So instead of whining about it on Twitter sue GitHub. No
| matter what you think of Copilot, establishing some case law on
| AI-generated code will be beneficial to everyone.
| mjr00 wrote:
| Whining about it on Twitter = free and easy
|
| Suing GitHub = signing up for a ~decade-long, incredibly
| expensive and time-consuming legal battle against one of the
| richest companies in the world
|
| There may be a slight difference in effort between these two
| options.
| anonydsfsfs wrote:
| Not to mention Microsoft could countersue using their
| enormous patent war chest, which they have a history of
| doing[0]
|
| [0] https://techcrunch.com/2012/03/22/microsoft-and-tivo-
| drop-th...
| ghaff wrote:
| It goes beyond code. Also photos, art, text, etc. Be careful
| what you wish for. Whether you like it or not, with a stroke of
| a pen Congress or the Supreme Court in the US could probably
| wipe out the legal use of a huge amount of the training data
| used for ML.
| adastra22 wrote:
| Good.
| Jevon23 wrote:
| Good! Large corporations shouldn't be able to profit off of
| other people's data without consent or compensation.
| drstewart wrote:
| Great! I assume you believe all search engines should be
| illegal then?
| belorn wrote:
| Accessing a computer system without permission is
| illegal. Search engines operate under the assumption that
| they have permission to access any publicly available
| server unless explicitly forbidden.
|
| If a company or person assumes they have copyright
| permission to any publicly accessible work, they will
| quickly find out that such an assumption is wrong and
| that they require explicit permission.
| ghaff wrote:
| >Search engines operate under the assumption that they
| have permission to access any public available server
| unless explicitly forbidden.
|
| And why should opt-out be a reasonable norm? To be clear,
| the internet (among many other things) breaks down if
| every exchange of information is opt-in. Sharing of
| photographs taken in public places is another example.
| But the internet basically functions because people share
| information on an opt-out basis (that may or may not even
| be respected).
| ghoward wrote:
| Search engines don't sell the information of others; they
| sell certain _metadata_ of that information, namely, the
| _location_ of that information.
| ghaff wrote:
| And excerpts of that information in many cases.
| res0nat0r wrote:
| The repo he linked to on twitter is a public repo though. Am I
| missing something?
|
| https://twitter.com/DocSparse/status/1581462433335762944
| tpxl wrote:
| Public != copyright free.
| taspeotis wrote:
| > The repo he linked to on twitter is a public repo though. Am
| I missing something?
|
| I dunno, the title says it used public code when it was meant
| to block public code.
| kurtoid wrote:
| I think they're more concerned about it repeating code w/o
| ownership/copyright labels
| Waterluvian wrote:
| I think people may be drastically over-valuing their code. If it
| was emitting an entire meaningful product, that would be
| something else. But it's emitting nuts and bolts.
|
| If the issue is more specifically copyright infringement, then
| leverage the legal apparatus in place for that. Their lawyers
| might listen better.
|
| This is not a strongly held opinion and if you disagree I would
| love to hear your constructive thoughts!
| jacooper wrote:
| I mean it starts like this, but if Copilot gets a pass,
| companies might just use AI as a way to launder code and avoid
| complying with Free licenses.
| chiefalchemist wrote:
| To some extent I agree with your opening. That is, in plenty
| of cases CP is showing how mundane most code is. It's one
| commodity stitched to another stitched to another.
|
| That's not considering any legal / license issues, just a
| simple statement about the data used to train CP.
| mjr00 wrote:
| Same issue with Stable Diffusion/NovelAI and certain people's
| artwork (eg Greg Rutkowski) being obviously used as part of the
| training set. More noticeable in Copilot since the output needs
| to be a lot more precise.
|
| Lawmakers need to jump on this stuff ASAP. Some say that it's no
| different from a person looking at existing code or art and
| recreating it from memory or using it as inspiration. But the law
| changes when technology gets involved already, anyway. There's no
| law against you and I having a conversation, but I may not be
| able to record it depending on the jurisdiction. Similarly,
| there's no law against you looking at artwork that I post online,
| but it's not out of the question that a law could exist preventing
| you from using it as part of an ML training dataset.
| SrslyJosh wrote:
| > Some say that it's no different from a person looking at
| existing code or art and recreating it from memory or using it
| as inspiration.
|
| Hah, no, the model encodes the code that it was trained on.
| This is not "recreating from memory", this is "making a copy of
| the code in a different format." (Modulo some variable
| renaming, which it's probably programmed to do in order to
| obscure the source of the code.)
| CapsAdmin wrote:
| I would imagine the root problem here is people taking
| copyrighted code, pasting it into their project and disregarding
| the license. To me this seems common, especially when it comes to
| toy, test and hobby projects.
|
| I don't see how copilot or similar tools can solve this problem
| without vetting each project.
| yjftsjthsd-h wrote:
| That's an entirely plausible explanation, but it doesn't mean
| that Microsoft has any less of a legal nightmare on their
| hands.
| CapsAdmin wrote:
| I'm not really sure what I think about this. How responsible
| should Microsoft be for someone's badly licensed code on
| their platform? If they somehow had the ability to ban
| projects using stolen snippets of code, I don't think I'd
| dare to host my hobby projects there.
|
| If you can't trust that the code in a project is compatible
| with the license of the project then the only option I see is
| that copilot cannot exist.
|
| I love free software and whatnot, but I have a feeling this
| situation would've been quite different if copilot was made
| by the free software community and accidentally trained on
| some non-free code...
| yjftsjthsd-h wrote:
| > I love free software and whatnot, but I have a feeling
| this situation would've been quite different if copilot was
| made by the free software community and accidentally
| trained on some non-free code...
|
| _Precisely._ Would it be okay for me to publish some code
| as GPL because my buddy gave it to me and promised that it
| was totally legit and I could use it and it definitely
| wasn't copy-pasted from one of the Windows source leaks?
|
| > If you can't trust that the code in a project is
| compatible with the license of the project then the only
| option I see is that copilot cannot exist.
|
| It might be possible to feed it only manually-vetted
| inputs, but yes; as it currently is, Copilot appears to be
| little but a massive copyright-infringement engine.
| CapsAdmin wrote:
| > Precisely. Would it be okay for me to publish some code
| as GPL because my buddy gave it to me and promised that
| it was totally legit and I could use it and it definitely
| wasn't copy-pasted from one of the Windows source leaks?
|
| But where do you draw the line? What if you accidentally
| came up with the same or similar solution to something in
| windows? The code might not be from your friend either,
| it could be from N steps of copy paste, rework,
| reformatting, refactoring, etc.
| yjftsjthsd-h wrote:
| > But where do you draw the line? What if you
| accidentally came up with the same or similar solution to
| something in windows?
|
| Yes, I agree that it's unclear how to deal with that in
| the general case at scale. Although cases like OP make me
| think that we could maybe worry about the grey area after
| we've dealt with the blatant copies.
|
| > The code might not be from your friend either, it could
| be from N steps of copy paste, rework, reformatting,
| refactoring, etc.
|
| Well, my personal tendency would be to apply the same
| standard to Microsoft that they would apply to us. How
| many steps of removal is needed to copy MS proprietary
| code and it be okay?
| [deleted]
| williamcotton wrote:
| Is the code in question even covered by copyright in the first
| place? It seems utilitarian in nature.
|
| Oh, the comments! Those are covered by copyright for sure.
| williamcotton wrote:
| You know, I make it a habit of not trying to get upset by
| downvotes but this is really absurd. What am I saying that is
| incorrect? Am I being rude? What exactly do you disagree with?
| williamcotton wrote:
| Like, should I just stop interacting with people on this
| website? Is that the intent? To make me just go away?
___________________________________________________________________
(page generated 2022-10-16 23:00 UTC)