[HN Gopher] Tool to convert copyrighted music into fair use
___________________________________________________________________
Tool to convert copyrighted music into fair use
Author : alphabet9000
Score : 180 points
Date : 2021-07-10 20:15 UTC (2 hours ago)
(HTM) web link (fairuseify.ml)
| lyxell wrote:
| I remember that the guys behind The Pirate Bay actually made a
| service like this back in the days. You would submit a song and
| get a mashup back of cuts from other songs where each cut would
| be short enough to fall under fair use. I can't find any
| references to it online anymore though. Maybe someone else
| remembers what the service was called.
| rikkipitt wrote:
| I don't remember the site, but it reminds me of the Girl Talk
| album called "All Day" -
| https://en.wikipedia.org/wiki/All_Day_(Girl_Talk_album). It was
| originally released as a free digital download.
|
| > Greg Gillis composed the album using overlapping samples of
| 372 songs by other artists.
|
| This article goes into it a bit more: "Girl Talk, Fair Use, and
| Three Hundred Twenty-Two Reasons for Copyright Reform" -
| https://jipel.law.nyu.edu/ledger-vol-1-no-1-4-pearl/
| [deleted]
| Black101 wrote:
| He should not have faked the machine learning, but I like the
| idea.
| cjohansson wrote:
| Hilarious stuff
| not2b wrote:
| I was kinda hoping it would change the song enough to get it past
| Youtube's copyright filters, but apparently not.
| konstruction wrote:
| Hilarious :-)
| er4hn wrote:
| Finally, Copilot for Music!
| laurent92 wrote:
| Strangely, a lookalike of a music hit is nothing like the
| original, and it's worth analyzing!
|
| - Music is a vehicle for a common experience. Everyone knows
| the next notes of some Lady Gaga song. We feel like learning
| the lyrics will make us able to sing together if we were in a
| club, and share something with other clubbers. Any AI who would
| reproduce the voice and instruments would still not make you
| feel like you are sharing a common moment with the rest of the
| audience.
|
| - Hits are hits because we hear them a thousand times. It's
| been proven that people don't necessarily like it the first
| time. It's the familiarity with the song which makes us like it
| (or hate it when we've heard it too much).
|
| - Even worse: We like some songs even more because we love the
| author. Be it because they are politically involved, have a
| cute face, have a nice life story, or seem to hide answers to
| life in the lyrics of their work - But an AI producing the same
| exact notes wouldn't trigger similar affection from us. It's
| like hearing our kid singing: Very cute, but we wouldn't like
| the same song by another kid. Audiences have a genuine
| emotional attachment to the authors. It's especially visible
| since the MCM revolution: Before MCM, music mattered; Now the
| image matters way more, bands have a face, a graphic style, a
| story to tell - and music could be as crap as possible, if we
| like the band it can still have success. MCM changed music
| forever, proving that AI can't replace that feeling.
|
| Can it?
| imwillofficial wrote:
| I hear a lot of stated assumptions on how certain things
| trigger emotional investment and others don't.
|
| If you knew how manufactured the music industry was, and how
| nothing of what you see of celebrities is true, it might as
| well be AI plucking our heart strings, because it isn't
| "real" in the sense that I think you mean, authentic human
| connection over shared experience.
| [deleted]
| anderber wrote:
| What are the criteria that this tool uses to determine something
| to be fair use?
| sumnole wrote:
| It's a joke ragging on Github Copilot, which suggests code from
| github to its users regardless of its copyright. The claim is
| that any code written with Copilot does not infringe since it's
| 'machine-generated code'. Github Copilot takes github code,
| learns it and then feeds it to users based on prompts but you
| can end up essentially copy and pasting an entire copyrighted
| snippet. This satire site takes your uploaded mp3, 'learns it'
| and hands you back the same mp3.
| anderber wrote:
| Ah, thank you for the explanation!
| amelius wrote:
| Nice try, but ... Co-pilot can be sued for copyright infringement
| in specific cases. Therefore it doesn't mean you can get away
| with copyright infringement if you copy Co-pilot's model.
| habibur wrote:
| I remember Web Font Player before dynamic fonts became available.
| You could upload a copyrighted font. Say Microsoft or Apple's
| font. It would trace that font and generate your "Web Font" and
| then you could use it without any copyright issue as that's not
| the original font but rather a machine learnt one.
|
| Guess fonts are still like this.
| MeinBlutIstBlau wrote:
| So having tried it out the song sounds...exactly the same. So
| does this just make it that when it's played these detection
| systems can't pick it up since it's somewhat different? Or if I
| make a commercial product, include this version of the song, I
| can somehow afford lawyers to defend myself when the music
| industry sues me for using what sounds like the same song, just
| with the 1's and 0's ordered a little differently?
|
| Edit: was out of the loop on the joke...
| detaro wrote:
| It's a joke about GitHub Copilot.
| [deleted]
| abetusk wrote:
| I think that's the joke. It literally takes the exact same song
| unaltered but it says it's "using machine learning", "fair use"
| etc. to give the pretense of it being legitimate.
|
| This is most likely a commentary on GitHub co-pilot and how the
| authors of this joke think that GitHub co-pilot is violating
| copyright and does not fall under fair-use.
|
| I just confirmed the "processed" file has the same SHA256 sum
| as the original.
|
| EDIT: I incorrectly labelled it as Google co-pilot instead of
| GitHub co-pilot. Fixed.
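
The SHA256 check described above can be reproduced with a short script. This is only a sketch of one way to do it, not the commenter's actual procedure; the function names `sha256_of` and `same_content` are made up for illustration.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        # Read fixed-size chunks so large audio files need not fit in memory.
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_content(original: Path, processed: Path) -> bool:
    """True when both files hash to the same SHA-256 digest."""
    return sha256_of(original) == sha256_of(processed)
```

Calling `same_content(Path("song.mp3"), Path("song_fairuse.mp3"))` (both file names hypothetical) returning `True` would indicate the download is byte-for-byte the upload.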
| barbecue_sauce wrote:
| Github Copilot.
| [deleted]
| laurent92 wrote:
| The source code is setTimeout(..., random()). I'd say, even if
| it takes long to build the neural network, it is very CPU
| efficient.
| sycren wrote:
| By uploading the licensed music in the first place, are we not
| breaching copyright law?
| pornel wrote:
| It's not uploading, it's making available for scraping.
| Hamuko wrote:
| Obviously machine learning is fair use, so it supersedes
| copyright.
| laurent92 wrote:
| I sense a slight bitterness in the programming community
| after Copilot ;) It's the rumbling sound of the imagination
| of a thousand people throwing in the towel, saying "What now".
| Hamuko wrote:
| Yeah, why would anyone be bitter about a giant corporation
| creating a commercial code laundering machine that digests
| a massive amount of copyrighted code and spits out "clean"
| code free of all the burdens of its inputs?
| wizzwizz4 wrote:
| Wow. And it's entirely client-side, too! Impressive.
| steelbrain wrote:
| In case you don't get it, view the source :)
|
| It's just a bunch of sleep(random()) calls and visual changes on
| the viewport, and you download the exact file you uploaded
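
Going by the descriptions in this thread, the whole "conversion" amounts to a random delay followed by handing the input back. A minimal Python sketch of that behaviour follows; it is not the site's actual code (which commenters describe as client-side JavaScript), and `fairuseify` and `max_delay_s` are hypothetical names.

```python
import random
import time


def fairuseify(audio: bytes, max_delay_s: float = 0.05) -> bytes:
    """Pretend to 'process' audio: wait a random moment, return it unchanged."""
    # Stand-in for the page's random delay; no model is trained or applied.
    time.sleep(random.uniform(0.0, max_delay_s))
    return audio
```

Because the bytes are returned untouched, any hash of the output matches the hash of the input.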
| londons_explore wrote:
| I tried to play the original and downloaded music a bunch of
| times to try to figure out any differences...
| cblconfederate wrote:
| The page code was written by co-pilot
| gavinray wrote:
| Damn, I read this as "Tool, the band, is converting all of their
| copyrighted music into fair-use music." and got excited.
|
| But this is funny too I guess
| IshKebab wrote:
| How do you go to this much effort to make a point without even
| reading about how copyright and fair use works? There have been
| multiple comments on HN and Reddit explaining how it doesn't work
| like this.
| [deleted]
| ChristianGeek wrote:
| Great way for the owner of the site to build up a library of free
| music!
| IceHegel wrote:
| audiophiles gotta try this! it makes the music soo much better
| speedgoose wrote:
| Have you tried Github co-pilot? It's not going to copy paste the
| Linux source code, like Dungeon AI is not going to copy paste a
| Tolkien book.
| andersource wrote:
| Most of the time, but it _can_, and I think that's the issue a
| lot of people have with it.
|
| https://twitter.com/mitsuhiko/status/1410886329924194309
|
| https://news.ycombinator.com/item?id=27710287
| zxzax wrote:
| I don't understand why anyone has an issue with that. You
| know what else can copy and paste code all the time? Humans.
| But we have various ways of stopping employees who copy and
| paste code from stackoverflow and github without checking the
| license, so it's the same thing if you use one of these
| tools. There's nothing new I can see here to be upset about.
|
| This would be a lot more interesting if it showed the various
| GPT-3 experiments at generating music and used that as a
| point of comparison.
| paulgb wrote:
| > But we have various ways of stopping employees who copy
| and paste code from stackoverflow and github without
| checking the license
|
| What would those be? I've worked at a number of
| organizations that were (rightfully) paranoid about
| accidentally incorporating GPL code, but even there I
| wasn't aware of automated tooling to prevent it, it was
| only enforced through developer vigilance.
| zxzax wrote:
| If you actually want a paid service, there are plagiarism
| detectors like Fossa and Codequiry. Although in my
| opinion, code review should be enough to catch any
| "accidental" incidents of plagiarism, the differences in
| writing style should make it very obvious when the
| employee has copied something. That of course probably
| won't apply if you suspect the employee is intentionally
| changing it around to obfuscate the origin of the code,
| but it seems that wouldn't be the case if they were just
| committing the output straight from a neural net. But
| automated scanners probably won't be able to catch those
| well either -- the way to catch that would be to make
| them do pair programming a lot.
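
As a rough illustration of what an automated check could look like, here is a naive token-shingle comparison. It is an assumption-laden sketch and does not represent how Fossa or Codequiry work; `shingles` and `jaccard_similarity` are hypothetical helpers.

```python
import re


def shingles(source: str, k: int = 5) -> set:
    """Return the set of k-token windows in a piece of source code."""
    # Crude tokenizer: identifiers, integers, then any other single symbol.
    tokens = re.findall(r"[A-Za-z_]\w*|\d+|\S", source)
    return {tuple(tokens[i:i + k]) for i in range(len(tokens) - k + 1)}


def jaccard_similarity(a: str, b: str, k: int = 5) -> float:
    """Jaccard similarity of the two shingle sets, from 0.0 to 1.0."""
    sa, sb = shingles(a, k), shingles(b, k)
    if not sa and not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)
```

A check like this only flags near-verbatim copies: renaming identifiers changes the tokens and lowers the score, which is the weakness raised in the replies.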
| reader_mode wrote:
| >Although in my opinion, code review should be enough to
| catch any "accidental" incidents of plagiarism, the
| differences in writing style should make it very obvious
| when the employee has copied something.
|
| You must do some CSI level code reviews. Best I'm able to
| do is figure out if code will work and if something can
| be done obviously better. Stylistic calls (beyond lint
| enforceable) are up to authors as far as I'm concerned.
|
| And even then it's trivial to fix up naming schemes and
| such to match the codebase - doubt that gets you out of
| copyright issues.
| UncleMeat wrote:
| This truly is the engineer's disease. Hundreds of incredibly
| strong opinions about the legal system derived almost
| entirely from a few tweets and zero experience outside of
| software engineering.
|
| Copilot is neat. If you are concerned about it, talk to a
| lawyer and get their opinion.
| speedgoose wrote:
| This fast inverse square root function is very well known,
| with even a Wikipedia page, and it is more than 20 years old.
| My country doesn't have software patents but it seems that
| the standard duration of a software patent is 20 years, so
| even if this function was patented, the patent would have
| expired by now.
| lilyball wrote:
| Copyright and patent are different. Also, while you can't
| copyright an algorithm, your specific source code that
| implements it is copyrighted (assuming it's sufficiently
| original).
|
| In this case it's not implementing the algorithm, it's
| copying a particular famous implementation, down to the
| comments.
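
To illustrate the distinction between an algorithm and one particular implementation of it, here is the fast inverse square root idea written independently in Python. This is a sketch, not the original C source: the magic constant 0x5F3759DF is the widely documented one, while the function name and structure are this example's own.

```python
import struct


def fast_inverse_sqrt(x: float) -> float:
    """Approximate 1/sqrt(x) for positive x using the bit-level trick."""
    # Reinterpret the 32-bit float's bits as an unsigned integer.
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    # Shift and subtract from the magic constant for a first guess.
    bits = 0x5F3759DF - (bits >> 1)
    y = struct.unpack("<f", struct.pack("<I", bits))[0]
    # One Newton-Raphson step refines the guess.
    return y * (1.5 - 0.5 * x * y * y)
```

The result is only an approximation; for example, `fast_inverse_sqrt(4.0)` lands close to, but not exactly on, 0.5.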
| NautilusWave wrote:
| Copyright is different from patent. Copyright is
| (basically) forever.
| dublin wrote:
| There is no real reason for copyright terms to exceed
| patent terms.
|
| (And FWIW, patent terms should be inversely proportional to
| the number of patents issued in that category the previous
| year. This would automatically reduce terms in categories
| where innovation is rapid, promoting competition and drive
| to get to market, but preserve maximum protection for
| inventions in mature categories with a slower pace of
| innovation.)
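
The proposal above can be written as a one-line formula: term = min(max_term, scale / patents_last_year). The sketch below assumes a 20-year cap (the figure mentioned earlier in the thread) and an arbitrary `scale` of 1000; neither number comes from the comment itself.

```python
def patent_term_years(patents_last_year: int,
                      max_term: float = 20.0,
                      scale: float = 1000.0) -> float:
    """Term inversely proportional to last year's patent count, capped."""
    if patents_last_year <= 0:
        # No patents issued in the category: grant the maximum term.
        return max_term
    return min(max_term, scale / patents_last_year)
```

With these assumed values, a category with 10 patents last year keeps the full 20 years, while one with 200 patents gets 5 years.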
| nonbirithm wrote:
| So when are neural nets trained on images or text going to be
| confronted with the same copyright concerns? At the point that
| GitHub has forced the issue into the spotlight with Copilot I
| feel that it's only a matter of time before this reaches the
| courts. Nobody seemed to care about copyright at the time people
| were having fun creating AI dream collages or nonexistent anime
| girls from a model trained on the Danbooru imageset. In the
| latter case it's not clear that 100% of the original Pixiv and
| Twitter creators gave their consent to have their work rehosted
| on a different site in the first place, much less be involved in
| ML experiments. That data was from 2018.
|
| I'm almost tempted to believe that the people at GitHub knew this
| was going to blow up as much as it did as some kind of a
| challenge to the status quo of copyright and licensing, if only
| so that everyone would start talking about the issue. Why did the
| GitHub representative plainly state that Copilot was trained on
| all of GitHub's codebase without seeming to care about the
| pushback on Twitter and HN that was bound to happen as a result?
| jiminymcmoogley wrote:
| by the time the dinosaurs that dictate our laws begin to care
| about it, copyright will no longer exist
| tgv wrote:
| Congratulations. That's got to be a 100% accurate algorithm.
| hedora wrote:
| Ooh. They have a DARPA grant! Applying now.
| jjcon wrote:
| When did the HN crowd become so defensive of copyright? I
| understand the concerns on copilot but it's kinda weirding me
| out.
| throw0101a wrote:
| > _When did the HN crowd become so defensive of copyright?_
|
| Copyright is good in limited quantities. The current multi-
| decade time horizon is probably what a lot of people are
| against, and not the concept in general.
|
| And limited time period seems to be consistent through history.
| From the paper "Copyrights and Creativity: Evidence from
| Italian Opera in the Napoleonic Age":
|
| > _Comparing changes in the creation of new operas across
| Italian states with and without copyrights, we show that the
| adoption of basic copyrights encouraged the creation of new
| work. Moreover, we find that copyrights changed the quality of
| creative output by encouraging composers to produce more
| popular and durable works. These results generalize to a
| broader set of musical compositions and to librettos, as the
| literary component to the score of operas. Based on these
| findings, we conclude that the adoption of basic levels of
| copyright protection - not exceeding the lifetime of the
| composer - can help to raise both the quantity and the quality
| of new creative works._
|
| > _Importantly, we find that extensions in the length of
| copyright beyond the composer's life did not encourage
| creativity. Performance data reveal that few operas were played
| after the first 20 years, which suggests that only the most
| durable creative goods stand to gain from copyright
| extensions._ [...]
|
| * https://papers.ssrn.com/sol3/papers.cfm?abstract_id=2505776
| ReactiveJelly wrote:
| Both the permissive and copyleft licenses are only enforceable
| through copyright law.
|
| I don't mind that copyright exists, I just wish it was better.
|
| Also there's a power difference between individuals violating
| the rights of a big company, and a big company violating the
| rights of many individuals.
|
| If Copilot isn't reined in, it feels like yet another case of
| "The laws only apply to poor people".
| carom wrote:
| I guarantee this is not Microsoft's announcement that they are
| forfeiting their copyrights. This is just them abusing the
| spirit of ours.
| hjek wrote:
| Say your AGPL code is Copiloted into someone's new program and
| they decide to release that under a non-free license; that's
| the issue. We're defensive of _copyleft_.
| aurelian15 wrote:
| As weird as it may seem, you should not forget that free
| software licenses are built upon the fabric of copyright.
| Without copyright, free software could not exist in its current
| form. For GPL-like "copyleft" licenses, there would be no way
| to enforce that binary distributions of derived works are
| accompanied by their source code. Similarly, in the context of
| permissive BSD/MIT-style licenses, there would be no way to
| enforce attribution.
|
| So, given that FOSS---which a large portion of the HN crowd
| depends on---cannot work without copyright (at least not in its
| current form), the recent discussions may be less of a
| surprise.
| cortesoft wrote:
| Maybe... although I personally think that the GPL and other
| 'copy left' licenses aren't the reason open source has
| prospered, nor do I think enforcing attribution really helps
| the FOSS world that much.
|
| People write and share code because it is useful to do that,
| not because licenses require them to.
|
| I think FOSS would do fine with no copyright, and in fact
| more software might end up open source if we had ZERO
| copyright... why not make your code open source and get back
| contributions when your code would end up being shared
| anyway?
| dublin wrote:
| There were other open source licenses at the party before
| the GPL dropped its controversial "viral" turd in the
| punchbowl - and many of them still exist nearly unmodified.
| (e.g. BSD with attribution removed, etc.)
| cortesoft wrote:
| I know, but I am just questioning whether any license is
| needed for open source to prosper.
|
| I am positing that if licenses didn't exist, and anyone
| could do anything they want with any bit of code they
| see, open source would still prosper.
| breck wrote:
| "The heathen are sunk down in the pit that they made: in the
| net which they hid is their own foot taken"
|
| Copyright is a horrible system. Microsoft has been one of the
| biggest proponents of that system. But now they've clearly
| violated it. They should either join in abolishing it, or face
| its consequences.
| michaelmrose wrote:
| Consider people's reaction to people selling bootleg DVDs vs
| torrenting a movie. Although people may consider both morally
| incorrect, the corrupting profit motive results in the former
| being seen far more negatively. In the current situation there
| is also the matter that Microsoft is still perceived, rightly
| I think, very negatively and open source authors very
| positively. Also, in a David v Goliath situation nobody wants
| to be seen rooting for the giant.
|
| Personally I would be concerned about insert corp here
| accidentally stealing code from an open source project then
| years later going after the open source project for copyright
| infringement regarding the code they in fact stole from the
| open source project.
| PaulKeeble wrote:
| Because it's my (and many of ours) code they have "learnt" from,
| stripped the license and are intending to sell on. When we
| listed code under MIT or GPL we meant those licenses, they
| weren't random and Microsoft just seems to be completely
| ignoring the reality of reproducing those works which are
| covered by those licenses, they are making code private and
| paid for that is open source. Not OK.
| bobthebuilders wrote:
| Using ddos-guard, does this sell my info to Russia?
| imwillofficial wrote:
| Isn't that service run out of a bunker in Norway or something?
| I remember they were in the news for something recently.
| sellyme wrote:
| I've seen a lot of people ragging on Copilot for "copy+pasting"
| code - does anyone have links to cases where it has done this
| without the user intentionally trying to generate a specific
| (extremely famous) code snippet?
|
| I've seen tons of comments here and on Reddit that talk about
| multiple instances of entire functions being copied verbatim, but
| the only thing even remotely close to that I've seen is the fast
| inverse square root, so I must have missed a few tweets or
| something.
| meibo wrote:
| It only seems to if you give it no or very little "source"
| input, like an empty file with a comment that says "// X
| algorithm".
|
| There's been a lot of bikeshedding on this, but GitHub
| decidedly hasn't given enough information on how it works and
| what the training dataset is, and the fair use question
| definitely needs to be answered, maybe even in court - it's
| just a matter of time.
| Hamuko wrote:
| > _what the training dataset is_
|
| All non-private repositories on GitHub.
| brutal_chaos_ wrote:
| Is it though? I'd assume that too, but we don't really
| know, do we? I mean to ask, what have Microsoft stated that
| leads you to believe this? (Maybe there's a press release I
| missed?)
| teraflop wrote:
| https://twitter.com/NoraDotCodes/status/1412741339771461635
| IshKebab wrote:
| Github did an analysis and found that it does do it, though
| very rarely, and usually when it has little context (e.g. at
| the start of a file). They're working on detecting those cases
| though so it doesn't happen accidentally, so it is unlikely to
| be a realistic problem.
| dogecoinbase wrote:
| It's happily spitting out licenses and copyright notices with
| other people's names on them, it's pretty clearly half-baked.
___________________________________________________________________
(page generated 2021-07-10 23:00 UTC)