[HN Gopher] Stable-Audio-Demo
___________________________________________________________________
Stable-Audio-Demo
Author : beefman
Score : 476 points
Date : 2024-02-13 03:50 UTC (19 hours ago)
(HTM) web link (stability-ai.github.io)
(TXT) w3m dump (stability-ai.github.io)
| lbourdages wrote:
| This is right into the "uncanny valley" of music.
|
| It definitely sounded "like music", but none of it is what a
| human would produce. There's just something off.
| bane wrote:
| The overall audio quality sounds pretty good and it seems to do
| a good job of sustaining a consistent rhythm and musical
| concept. But I agree there's something "off" about some of the
| clips.
|
| - The rave music sounds great. But that's because EDM can be
| quite out there in terms of musical construction.
|
| - The guitar sounds weird because it sounds like chords a human
| hand can't make, on a tuning nobody tunes their guitar to,
| with a strange mix of open and closed strings that don't make
| sense. I think the restrictions of what a guitar can do aren't
| well understood by the model.
|
| - The disco chord progression is bizarre. It doesn't sound bad,
| but it's unlikely to be something somebody working in the genre
| would choose.
|
| - meditation music - I mean, most of that genre may as well
| just be some randomized process
|
| - drum solo - there's some weird issues in some of the drum
| sounds, things like cymbals, rides and hats changing tone in
| the middle of a note, some of the toms sound weird, it sounds
| like a mix of stick and brush and stick and stick and brush all
| at the same time...it's sort of the same problem the solo
| guitar has where it's just not produced within the constraints
| of what a drum player can actually do on an instrument made of
| actual drums
|
| - sound effects, all are pretty good, a little chunky and low
| bit-rate or low sample-rate sounding, there's probably
| something going on in the network that's reducing the rate
| before it gets built back up. There's a constant sort of reverb
| in all of the examples
|
| I honestly can't say I prefer their model over some of the
| musicgen output even if their model is doing a better job at
| following the prompts in some cases.
|
| All of the models have very low-bitrate encoding problems and
| other weird anomalous things. Some of it reminds me of the
| output from older mp3 encoders, where hihats and such would get
| very "swishy" sounding. You can hear some of it in the
| autoencoder reconstructions, especially the trumpet and the
| last example.
|
| However, in any case, I'm actually glad in some ways to see the
| progress being made in this area. It's really impressive. This
| was complete science fiction only a very few years ago.
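|
| The guess above about the network "reducing the rate before it
| gets built back up" can be illustrated with a toy round trip. This
| is only a sketch of the general effect, not Stable Audio's actual
| architecture: the `downsample` and `upsample` helpers, the factor
| of 2, and the test signal are all assumptions made up for the
| example. It shows that a crude rate reduction keeps the slow part
| of a signal and throws away the fastest detail.

```python
import math

def downsample(samples, factor):
    """Average each group of `factor` samples (a crude low-pass plus decimation)."""
    return [sum(samples[i:i + factor]) / factor
            for i in range(0, len(samples) - factor + 1, factor)]

def upsample(samples, factor):
    """Repeat each sample `factor` times (zero-order hold)."""
    return [s for s in samples for _ in range(factor)]

N = 64
# Slow component: one sine cycle across the whole buffer.
low = [math.sin(2 * math.pi * n / N) for n in range(N)]
# Fast component: alternates sign every sample (the highest representable frequency).
high = [0.5 * (-1) ** n for n in range(N)]
original = [l + h for l, h in zip(low, high)]

# Halve the rate, then build it back up to the original length.
round_trip = upsample(downsample(original, 2), 2)

# How far the round trip is from the slow component alone, and from the original.
error_vs_low = max(abs(r - l) for r, l in zip(round_trip, low))
error_vs_original = max(abs(r - o) for r, o in zip(round_trip, original))
```

| Here `error_vs_low` stays small while `error_vs_original` is about
| the size of the fast component, which is the "low sample-rate"
| character described above: the detail is gone after the round trip.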
| darkwater wrote:
| > - drum solo - there's some weird issues in some of the drum
| sounds, things like cymbals, rides and hats changing tone in
| the middle of a note, some of the toms sound weird, it sounds
| like a mix of stick and brush and stick and stick and brush
| all at the same time...it's sort of the same problem the solo
| guitar has where it's just not produced within the
| constraints of what a drum player can actually do on an
| instrument made of actual drums
|
| And I would say that there is also background noise from time
| to time, at some point I heard some noise akin to voices.
| Maybe it is some artifact caused by the training data (many
| drum solos are performed exclusively live).
| otabdeveloper4 wrote:
| AI pictures are the same. We are more tolerant of six-fingered
| pictures with missing limbs, for some reason.
| lbourdages wrote:
| We're used to drawings, 3D renders, etc.
|
| There's no such thing as "artificial music" - at the very
| least, not since electronic music has become mainstream.
| RobinL wrote:
| Here is a silly song I generated using suno.ai, which I have
| found to be incredibly impressive (at least, a small percentage
| of its outputs are very good, most are bad). I think it's good
| enough that most humans wouldn't realise it's AI generated.
| https://app.suno.ai/song/8a64868d-9dd3-46db-91af-f962d4bec8b...
| Agraillo wrote:
| Very good for my taste, but I should clarify, I'm obsessed
| with catchy tunes, as a listener and as a hobby musician,
| growing my own brainworms from time to time. And I must say
| that suno.ai is very impressive; in my case semi-ready
| brainworms turn up in roughly 30%-50% of cases. And what's
| more important, it's really an inspiration tool for all kinds
| of tasks, like lyrics polishing or playing-along after track
| separation. Maybe catchy melodies are not for all, but who
| can argue with charts when The Beatles, ABBA and Queen were
| almost always producers of them.
| comex wrote:
| Wow. I'm guessing it's generating MIDI or something rather
| than synthesizing audio from scratch? Even so, the quality of
| the score is leaps and bounds better than any of the long-
| form audio on the Stable Audio demo page (either Stable Audio
| itself or the other models). The audio model outputs seem to
| take a sequence of 1 to 3 chords, add a barebones melody on
| top, and basically loop this over and over. When they deviate
| from the pattern, it feels unplanned and chaotic and they
| often just snap back to the pattern without resolving the
| idea added by the deviation. (Either that or they completely
| change course and forget what they were doing before.) Yes,
| EDM in particular often has repetitive chord structures and
| basic melodies, but it's not _that_ repetitive. In
| comparison, from listening to a few suno.ai outputs, they
| reliably have complex melodies and reasonable chord
| progressions. They do tend to be repetitive and formulaic,
| but the repetition comes on a longer time scale and isn't as
| boring. And they do sometimes get confused and randomly set
| off in a new direction, but not as often. Most of the time,
| the outputs sound like real songs. Which is not something I
| knew AI could do in 2024.
| RobinL wrote:
| I don't have any special insight into how it works, but I
| suspect it is largely synthesizing audio from scratch. The
| more I've thought about it, the task of generating music
| feels very similar to the task of text-to-speech with
| realistic intonation. So it feels like the same techniques
| would be applicable.
|
| Suno do have an open source repo here that presumably uses
| similar tech: https://github.com/suno-ai/bark
|
| > Bark was developed for research purposes. It is not a
| conventional text-to-speech model but instead a fully
| generative text-to-audio model, which can deviate in
| unexpected ways from provided prompts. Suno does not take
| responsibility for any output generated. Use at your own
| risk, and please act responsibly.
|
| I've generated probably >200 songs now with Suno, of which
| perhaps 10 have been any good, and I can't detect any
| pattern in terms of the outputs.
|
| Here's another one which is pretty good. I accidentally
| copied and pasted the prompt and lyrics, and it's amazing
| to me how 'musically' it renders the prompt:
|
| https://app.suno.ai/song/d7bad82b-3018-4936-a06d-8477b400aae...
|
| Here are a couple more which are pretty good (I use it
| primarily for making fun songs for my kids):
|
| https://app.suno.ai/song/a308ca8a-9971-47a3-8bb3-a95126ff1a8...
|
| https://app.suno.ai/song/3b78a631-b52a-4608-a885-94f2edc190b...
|
| And this one's kind of interesting in that it can render
| 'Gregorian chant' (I mean, it's not very good):
| https://app.suno.ai/song/0da7502b-73cf-4106-88e8-26f4f465a5f...
|
| But this is one reason it feels like these models are very
| similar to text-to-speech but with a different training set
| Agraillo wrote:
| My understanding is that they use a side effect of the Bark
| model. The comment
| https://news.ycombinator.com/item?id=35647569 from
| JonathanFly probably explains it well. If you train your
| model on a massive amount of audio mixes of lyrics+music
| then prompting lyrics alone pulls the music with it, just as
| the comment suggested that prompting context-correlated
| texts might pull in the background noises usual for such a
| context. Already while writing this I imagine training with
| a huge set of publicly performed poetry pieces that would
| allow generating novel performances of artificial poets
| with novel prompts. This is different from the riffusion.com
| approach, which relies on the genius idea of more or less
| feeding spectrograms as images to Stable Diffusion.
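|
| To make the "spectrograms as images" idea concrete, here is a
| minimal sketch of turning audio into a 2D grid of magnitudes
| (time on one axis, frequency on the other). It is not Riffusion's
| or Suno's pipeline: the function name `magnitude_spectrogram`,
| the frame size, the hop, and the test tone are all assumptions
| chosen for illustration, and a naive DFT is used so the example
| needs only the standard library.

```python
import cmath
import math

def magnitude_spectrogram(samples, frame_size=64, hop=32):
    """Return one row per frame, each holding DFT magnitudes for bins 0..frame_size//2."""
    rows = []
    for start in range(0, len(samples) - frame_size + 1, hop):
        frame = samples[start:start + frame_size]
        # Hann window to reduce leakage between frequency bins.
        windowed = [s * 0.5 * (1 - math.cos(2 * math.pi * n / (frame_size - 1)))
                    for n, s in enumerate(frame)]
        row = []
        for k in range(frame_size // 2 + 1):
            # Naive DFT of the windowed frame at bin k.
            acc = sum(x * cmath.exp(-2j * math.pi * k * n / frame_size)
                      for n, x in enumerate(windowed))
            row.append(abs(acc))
        rows.append(row)
    return rows

SAMPLE_RATE = 8000   # Hz, illustrative
TONE_HZ = 1000       # Hz, illustrative
FRAME_SIZE = 64

samples = [math.sin(2 * math.pi * TONE_HZ * n / SAMPLE_RATE) for n in range(512)]
spec = magnitude_spectrogram(samples, frame_size=FRAME_SIZE, hop=32)

# The brightest "pixel" in the first row sits at the bin matching the tone.
peak_bin = max(range(len(spec[0])), key=lambda k: spec[0][k])
peak_hz = peak_bin * SAMPLE_RATE / FRAME_SIZE
```

| Each row of `spec` is one column of the would-be image; an image
| model would see the whole grid as pixels. Going back from such an
| image to audio needs phase reconstruction, which this sketch does
| not attempt.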
| urbandw311er wrote:
| That's impressive. Why do the printed lyrics for the second
| chorus differ from the audio? (Which repeats those from the
| first chorus)
| RobinL wrote:
| I generated the lyrics using ChatGPT 4 and the suno model
| attempts to follow them.
|
| It generally does a good job, but I have noticed it's
| fairly common in a second chorus for it to ignore the
| direction and instead use the same lyrics as the first
| chorus
| npteljes wrote:
| That is fantastic. It has a bit of weirdness in the
| background, but nothing that would stop me from enjoying it.
| dcre wrote:
| One thing I noticed is that when it's playing chords, it seems
| a lot more likely than human players to put both major and
| minor thirds in. This isn't unheard of -- the famous Hendrix
| chord in "Purple Haze" consists of root, major third, 7th,
| minor third. But it sounds pretty weird when you do it in every
| chord.
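|
| The chord described above can be spelled out by counting
| semitones from the root. This is a small sketch, assuming the
| "7th" is a minor (dominant) seventh and the root is E, as in the
| usual E7#9 reading of the chord; the helper names are made up for
| the example. The "minor third" on top is strictly a sharpened
| ninth, which lands on the same pitch class.

```python
# Semitone offsets from the root for the intervals listed in the comment.
INTERVALS = {
    "root": 0,
    "minor third": 3,
    "major third": 4,
    "minor seventh": 10,
}
NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

def chord_pitch_classes(root, interval_names):
    """Spell a chord as note names, given a root and a list of interval names."""
    root_index = NOTE_NAMES.index(root)
    return [NOTE_NAMES[(root_index + INTERVALS[name]) % 12]
            for name in interval_names]

def has_both_thirds(semitone_offsets):
    """True if the chord contains both a minor and a major third above the root."""
    classes = {offset % 12 for offset in semitone_offsets}
    return 3 in classes and 4 in classes

# Root, major third, 7th, minor third, in the order given above.
hendrix = chord_pitch_classes(
    "E", ["root", "major third", "minor seventh", "minor third"])

# The same chord as semitone offsets, with the top note voiced an octave up.
hendrix_offsets = [0, 4, 10, 15]
plain_major_triad = [0, 4, 7]
```

| A check like `has_both_thirds` is one way to count how often a
| transcribed clip stacks both thirds, which is the habit described
| above.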
| ShamelessC wrote:
| So there aren't public weights, is that right? Having trouble
| finding anything that says one way or the other.
|
| edit: Oh okay, didn't realize this was somehow a controversial
| comment to make. It would have been great if you had answered the
| question before downvoting but that's fine I suppose.
| grey8 wrote:
| Nope. They did release code for training, inference and fine
| tuning, but no datasets or weights.
|
| See https://github.com/Stability-AI/stable-audio-tools
| ShamelessC wrote:
| Thanks!
| turnsout wrote:
| Wonder if it's an IP issue. They don't want every record
| label coming after them.
| ShamelessC wrote:
| Yeah that tracks.
| 8n4vidtmkvmk wrote:
| The music is pretty meh but the sound effects are exciting for
| indie game dev!
| nullandvoid wrote:
| <deleted>
| AuryGlenz wrote:
| Too bad according to their page you need an enterprise license
| for even indie games.
| romanzubenko wrote:
| As with Stable Diffusion, text prompting will be the least
| controllable way to get useful output with this model. I can
| easily imagine MIDI being used as an input with ControlNet to
| essentially get a neural synthesizer.
| zone411 wrote:
| Yes. Since working on my AI melodies project
| (https://www.melodies.ai/) two years ago, I've been saying that
| producing a high-quality, finalized song from text won't be
| feasible or even desirable for a while, and it's better to
| focus on using AI in various aspects of music making that
| support the artist's process.
| l33tman wrote:
| Emad hinted here on HN the last time this was discussed that
| they were experimenting with exactly that. It will come, by
| them or by someone else quickly.
|
| Text-prompting is just a very coarse tool to quickly get some
| base to stand on, ControlNet is where the human creativity
| again enters.
| emadm wrote:
| Yeah, we build ComfyUI so you can imagine what is coming
| soon around that.
|
| Need to add more stuff to my Soundcloud
| https://on.soundcloud.com/XrqNb
| 3cats-in-a-coat wrote:
| Text will be an important input channel for texture, sound
| type, voice type and so on. You can't just use input audio,
| that defeats the point of generating something new. You also
| can't use only MIDI; it still needs to know what sits behind
| those notes, what performance, what instrument. So we need
| multiple channels.
| numpad0 wrote:
| It's crazy that nobody cares. It seems to me that ML hype
| trends focus on denying skills and disproving creativity by
| denoising randoms into what are indistinguishable from human
| generation, and to me this whole chain of negatives doesn't seem
| to have proven its worth.
| JAlexoid wrote:
| LLMs allow people without certain skills to be creative in
| forms of art that are inaccessible to them.
|
| With DALL-E - I can get an image of something I have in my
| head, without investing into watching hundreds of hours of
| Bob Ross (which I do anyway).
|
| With audio generators - I can produce music that is in my
| head, without learning how to play an instrument or paying
| someone to do it. I have to arrange it correctly, but I can
| put out a techno track without spending years in learning the
| intricacies.
| raincole wrote:
| For music perhaps. For sound effects I think text prompting is
| a rather good UI.
| bemmu wrote:
| Controlnet/img2img style where you can mimic a sound with
| your mouth and it then makes it realistic could also be
| usable.
| b0ner_t0ner wrote:
| But works great when you don't need much control, prompt
| example: "Free-jazz solo by tenor saxophonist, no time
| signature."
| gcanko wrote:
| I think it would be ideal if it could take the audio recording
| of humming or singing a melody together with a text prompt and
| spit out a track that resembles it
| reissbaker wrote:
| This is incredibly good compared to SOTA music models (MusicGen,
| MusicLM). It looks like there's also a product page where you can
| subscribe to use it, similar to Midjourney:
| https://www.stableaudio.com/
|
| Sadly it's not open-weight and it doesn't look like there's an
| API (again like Midjourney): you subscribe monthly to generate
| audio in their UI, rather than having something developers can
| integrate or wrap.
| ex3ndr wrote:
| Thankfully you can train it at home; the bigger question is the
| data.
| nullandvoid wrote:
| I was hoping to use it to generate some sound effects to use in
| a game I'm working on - but looks like I need an "enterprise
| license" (https://www.stableaudio.com/pricing)
|
| Why does this have a different clause I wonder, and doesn't
| just fall under "In commercial products below 100,000 MAU"?
| emadm wrote:
| Different deal with the underlying data holders with revenue
| share etc
| emadm wrote:
| There is a CC-licensed version coming soon, plus an API.
|
| Models are advancing very fast, will be quite the year for
| music.
| TillE wrote:
| I was briefly excited about the idea of generating sound effects,
| but those "footsteps" are incredibly bad.
| laborcontract wrote:
| I tried generating music on stableaudio.com and, yes, it's bad.
| However, given the blistering pace of development in these
| models, I would not be surprised if these sound incredible in a
| year or two.
| berkes wrote:
| Everyone every time seems to assume a linear (or exponential)
| curve upwards.
|
| But what is the proof for that?
|
| I consider it far more likely that we had a breakthrough and
| are now rushing towards the next plateau. Maybe we are nearing it.
|
| Like in the curve of a PID controller. It's how most or many
| human improvements go.
| leodriesch wrote:
| I'd say most are thinking of Midjourney's success in image
| generation when talking about this kind of progress.
| berkes wrote:
| I am too.
|
| But I still see no evidence that this keeps improving and
| not plateauing at some (current?) level.
| spacebanana7 wrote:
| The plateau we're heading for is getting professional human
| level output from these models with logarithmic progress.
|
| I suspect this is because the underlying production factors
| like compute, data & model design are steadily improving
| whilst humans have diminishing sensitivity to output
| quality.
|
| In the game of AI generated photorealistic images or
| history essays there's not much improvement left to make.
| Most humans are already convinced by the output of these
| things.
| andrewstuart wrote:
| I felt a great disturbance in the Force, as though all the music
| licensing lawyers in the USA cried out at once.
| shon wrote:
| Perhaps the disturbance you feel is actually the RIAA moving
| their Death Star into firing range of Stability.ai
| emadm wrote:
| stableaudio.com is fully licensed, music is an interesting
| area
|
| https://www.musicbusinessworldwide.com/stability-ai-launches...
| kouteiheika wrote:
| Serious question, I'd genuinely like to know - why?
|
| You didn't license the images when training Stable
| Diffusion, and yet you did for Stable Audio? In both cases
| the training should either be fair use and legal without
| any licensing, or be infringing and need licensing. Why is
| audio different than images? Am I missing something here?
| emadm wrote:
| Law for music is different to other media types
| andbberger wrote:
| wake me up when it can write a fugue
| alacritas0 wrote:
| this can produce some pretty disturbing, but interesting music
| using the prompt "energetic music, violin, voice, orchestra,
| piano, minimalism, john adams, nixon in china":
| https://www.stableaudio.com/1/share/953f079e-d704-4138-904c-...
| FergusArgyll wrote:
| It reminds me a little of breath of the wild guardian music
| seydor wrote:
| Finally, some music from the future
| MrThoughtful wrote:
| So many questions ...
|
| They publish the code to train on your own music, but not the
| weights of their model? So you cannot just upload this thing to
| some EC2 instance and start creating your own music, correct?
|
| Is this the same as https://www.stableaudio.com?
| alacritas0 wrote:
| this sounds like progress, but it is still very bad except for
| highly repetitive music like the EDM examples they give, and
| even then, it still can't get tempo right
| nextworddev wrote:
| StabilityAI is just a marketing machine at this point that is
| praying for an acquisition, since the runway is diminishing
| qwertox wrote:
| I think we still need the step where the AI learns what a high
| quality sound library sounds like and then applies the previously
| learned abilities by triggering sounds of that library via MIDI.
|
| That way you'd get perfect audio quality with the creativity of a
| musical AI.
| eru wrote:
| How would MIDI get you eg a guitar being played dirty? Or some
| subtle echo that comes from recording in a bathroom?
| arrakeen wrote:
| the AI designs and controls the effects chain and mastering
| too
| qwertox wrote:
| It would use a sampler and for the subtle echo effect add a
| reverb to the bus.
|
| https://www.youtube.com/watch?v=EQdp2QLiSYQ&t=187s
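|
| The "sampler plus a reverb on the bus" idea can be sketched in a
| few lines. Everything here is a stand-in chosen for illustration:
| `render_note` fakes a sample library with a decaying sine,
| `add_echo` is a single feedback delay rather than a real reverb,
| and the sample rate and note list are arbitrary. Only the
| MIDI-note-to-frequency formula (A4 = note 69 = 440 Hz, twelve
| equal steps per octave) is the standard convention.

```python
import math

SAMPLE_RATE = 8000  # Hz, kept low so the example runs quickly

def midi_to_hz(note):
    """Equal-tempered pitch for a MIDI note number (69 is A4 at 440 Hz)."""
    return 440.0 * 2 ** ((note - 69) / 12)

def render_note(note, duration_s):
    """Stand-in for a sampler voice: a sine at the note's pitch with a linear fade-out."""
    n_samples = int(SAMPLE_RATE * duration_s)
    hz = midi_to_hz(note)
    return [math.sin(2 * math.pi * hz * n / SAMPLE_RATE) * (1 - n / n_samples)
            for n in range(n_samples)]

def mix_to_bus(events, total_s):
    """Sum note events, given as (start_s, midi_note, duration_s), into one bus."""
    bus = [0.0] * int(SAMPLE_RATE * total_s)
    for start_s, note, duration_s in events:
        offset = int(SAMPLE_RATE * start_s)
        for i, sample in enumerate(render_note(note, duration_s)):
            if offset + i < len(bus):
                bus[offset + i] += sample
    return bus

def add_echo(bus, delay_s=0.1, feedback=0.4):
    """Feedback delay: each output sample adds a scaled copy of the output delay_s earlier."""
    delay = int(SAMPLE_RATE * delay_s)
    out = list(bus)
    for i in range(delay, len(out)):
        out[i] += feedback * out[i - delay]
    return out

# A C major arpeggio as note events, with short gaps between the notes.
events = [(0.0, 60, 0.2), (0.25, 64, 0.2), (0.5, 67, 0.2)]
dry = mix_to_bus(events, 1.0)
wet = add_echo(dry)
```

| In `dry` the gaps between notes are silent; in `wet` the echo
| tail fills them. A "dirty" guitar or a bathroom recording would
| need far richer samples and effects than this, which is the
| objection raised above.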
| sebzim4500 wrote:
| You could have AI do some postprocessing. I think a similar
| approach is the future for image generation, you have a model
| output a 3D scene, use a classical raytracer to do rendering
| and then have a final model apply corrections to achieve
| photorealism.
| jchw wrote:
| I've always wished for something like that for image generation
| AI. It'd be much cooler/more interesting to watch AI try to
| draw/paint pictures with strokes rather than just magically
| iterate into a fully-rendered image. I dunno what kind of
| dataset or architecture you could possibly apply to accomplish
| this, but it would be very interesting.
| AuryGlenz wrote:
| I get what you're saying, but if you watch Stable Diffusion
| do each step it's at least kind of similar. If you keep the
| same seed but change a detail, often the broad "strokes" are
| completely the same.
| 3ds wrote:
| Isn't that what suno.ai does?
| shon wrote:
| Interestingly, Ed Newton-Rex, the person hired to build Stable
| Audio, quit shortly after it was released due to concerns around
| copyright and the training data being used.
|
| He's since founded https://www.fairlytrained.org/
|
| Reference: https://x.com/ednewtonrex
| az226 wrote:
| That's an interesting take. But quite the odd stance since he
| joined Stability and the training of Stable Diffusion was well
| known.
| doctorpangloss wrote:
| For generative models, if the model authors do not publish the
| architecture of their model; and, the model uses a
| transformation from text to another kind of media; you can
| assume that they have delegated some part of their model to a
| text encoder or similar feature which is trained on data that
| they do not have an express license to.
|
| Even for rightsholders with tens of millions to hundreds of
| millions of library items like images or audio snippets, the
| performance of the encoder or similar feature in text-to-X
| generative models is too poor on the fewer than a billion tokens
| of text in the large repositories. This includes Adobe's
| Firefly.
|
| It is also a misconception that large amounts of similar data,
| like the kinds that appear in these libraries, is especially
| useful. Without a powerful text encoder, the net result is that
| most text-to-X models create things that look or sound very
| average.
|
| The simplest way to dispel such issues is to publish the
| architecture of the model.
|
| But anyway, even if it were all true, the only reason we are
| talking about diffusers, and the only reason we are paying
| attention to this author's work Fairly Trained, is because of
| someone training on data that was not expressly licensed.
| sillysaurusx wrote:
| If you require licensing fees for training data, you kill
| open source ML.
|
| That's why it's important for OpenAI to win the upcoming
| court cases.
|
| If they lose, they'll survive. But it will be the end of open
| model releases.
|
| To be clear, I don't like the idea of companies profiting off
| of people's work. I just like open source dying even less.
| sillysaurusx wrote:
| Replying to a deleted comment:
|
| > It sounds as if you imply that would be bad. But what if
| it wasn't?
|
| Entirely possible. The early history of aviation was open
| source in the sense that many unlicensed people
| participated, and died. The world is strictly better with
| licensing requirements in place for that field.
|
| But no one knows. And if history is any guide for software,
| it seems better to err on freedoms that happen to have some
| downside rather than clamping down on them. One could
| imagine a world where BitTorrent was illegal. Or
| cryptography, or bitcoin.
| raverbashing wrote:
| Are you really comparing licensing for a profession with
| licensing of IP?
| sillysaurusx wrote:
| It's much the same. Only authorized people are allowed to
| do X. Since X costs a lot of money, by definition it
| can't be open source. There are no hobbyist pilots that
| carry passengers without a license, and if there are,
| they're quickly told to stop. Generative AI faces a real
| chance of having the same fate. Which means open source
| will look similar to these planes trying to compete with
| commercial aircraft:
| https://pilotinstitute.com/flying-without-a-license/
|
| If you can think of a better example, I'd like to know
| though. I'll use it in future discussions. It's hard to
| think of good analogies when the tech has new social
| effects.
| PeterStuer wrote:
| If I fly a plane and crash, my passengers die. If I
| generate an image using a model whose training included
| some unlicensed imagery... Disney misses out on a
| fraction of a cent?
|
| There is a real reason why some professions are licenced
| and others are not.
|
| Your analogy is nonsensical. Not having a better one is
| irrelevant.
| sillysaurusx wrote:
| If training data requires licensing fees, ML
| practitioners will become a licensed field de facto,
| because no one in the open source world will have the
| resources to pursue it on their own.
|
| Perhaps a better analogy is movies. At least with acting,
| you can make your own movies, even if you're on a
| shoestring budget. With ML, you quite literally can't
| make a useful model. There's not enough uncopyrighted
| data to do anything remotely close to commercial models,
| even in spirit.
| avisser wrote:
| > If training data requires licensing fees, ML
| practitioners will become a licensed field de facto,
|
| You know the word "license" has multiple, dissimilar
| meanings, right?
| deely3 wrote:
| > If you require licensing fees for training data, you kill
| open source ML.
|
| kill open source ML -> decrease speed of improvements for
| some open source ML
| sillysaurusx wrote:
| Sadly not. Making something illegal has social effects,
| not just legal effects. I've grown tired of being
| verbally spit on for books3. One lovely fellow even said
| that he hoped my daughter grows up resenting me for it.
|
| It being legal is the only guard against that kind of
| thing. People will still be angry, but they won't be so
| numerous. Right now everyone outside of AI almost
| universally despises the way AI is trained.
|
| Which means you won't be able to say that you do open
| source ML without risking your job. People will be angry
| enough to try to get you fired for it.
|
| (If that sounds extreme, count yourself lucky that you
| haven't tried to assemble any ML datasets and release
| them. The LAION folks are in the crosshairs for
| supposedly including CSAM in their dataset, and they're
| not even a dataset, just an index.)
| multjoy wrote:
| If everyone is unhappy with your rampant piracy, then
| perhaps that is a sign that you're doing it wrong?
| sillysaurusx wrote:
| Perhaps. The reason I did it was because OpenAI was doing
| it, and it's important for open source to be able to
| compete with ChatGPT. But if OpenAI's actions are ruled
| illegal, then empirically open source wasn't a persuasive
| enough reason to allow it.
| 4bpp wrote:
| Is there evidence that it's actually everyone or even
| close to everyone? The core innovation that the internet
| brought to harassment is that it is sufficient for some
| 0.0...01% of all people to take issue with you and be
| sufficiently dedicated to it for every waking minute of
| your life to be filled with a non-stop torrent of
| vitriol, as a tiny percentage of all internet users still
| amounts to thousands.
| viraptor wrote:
| US copyright has limited reach. There are models trained
| in China, where the IP rules are... not really enforced.
| It would be an interesting world where you use / pay for
| those models because you can't train them locally.
| JAlexoid wrote:
| > Right now everyone outside of AI almost universally
| despises the way AI is trained.
|
| I don't agree with this. Most people don't care at all,
| and at best people would argue about some form of
| compensation.
|
| Saying "everyone" is unsubstantiated.
|
| I mean... "Everyone was angry at Napster" at the same
| time "everyone is angry at the MPAA/RIAA"
| marcyb5st wrote:
| Is there a license that states: if you use this data for ML
| training you must open source model weights and
| architecture?
| sillysaurusx wrote:
| It's deeper than that. The basis of licensing is
| copyright. If the upcoming court cases rule in OpenAI's
| favor, you won't be able to apply copyright to training
| data. Which means you can't license it.
|
| Or rather, you can, but everyone is free to ignore you. A
| license without teeth is no license at all. The GPL is
| only relevant because it's enforceable in court.
|
| I'm sure some countries will try the licensing route
| though, so perhaps there you'd be able to make one.
|
| EDIT: I misread you, sorry. You're saying that if OpenAI
| loses and license fees become the norm, maybe people will
| be willing to let their data be used for open source
| models, and a license could be crafted to that effect.
|
| Probably, yes. But the question is whether there's enough
| training data to compete with the big companies that can
| afford to license much more. I'm doubtful, but it could
| be worth a try.
| JAlexoid wrote:
| >The GPL is only relevant because it's enforceable in
| court.
|
| The irony of the GPL is that its validity with respect to
| users is only now being tested in court.
|
| https://www.dlapiper.com/en/insights/publications/2024/01/sf...
| JoshTriplett wrote:
| > If you require licensing fees for training data, you kill
| open source ML.
|
| And likely proprietary ML as well, hopefully.
|
| (To be clear, I think AI is an absolutely incredible
| innovation, capable of both good and harm; I also think
| it's not unreasonable to expect it to play a safer, slower
| strategy than the Uber "break the rules to grow fast until
| they catch up to you" playbook.)
|
| I'm all for eliminating copyright. Until that happens, I'm
| utterly opposed to AI getting a special pass to ignore it
| while everyone else cannot.
|
| Fair use was intended for things like reviews, commentary,
| education, remixing, non-commercial use, and many other
| things; that doesn't make it appropriate for "slurp in the
| entire Internet and make billions remixing all of it at
| once". The commercial value of AI _should_ utterly break
| the four-factor test.
|
| Here's the four-factor test, as applied to AI:
|
| "What is the character of the use?" - Commercial
|
| "What is the nature of the work to be used?" - Anything and
| everything
|
| "How much of the work will you use?" - All of it
|
| "If this kind of use were widespread, what effect would it
| have on the market for the original or for permissions?" -
| Directly competes with the original, killing or devaluing
| large parts of it
|
| Literally every part of the four-factor test is maximally
| against this being fair use. (Open Source AI fails three of
| four factors, and then many _users_ of the resulting AI
| fail the first factor as well.)
|
| > If they lose, they'll survive.
|
| That seems like an open question. If they lose _these_
| court cases, setting a precedent, then there will be ten
| thousand more on the heels of those, and it seems
| questionable whether they'd survive _those_.
|
| > To be clear, I don't like the idea of companies profiting
| off of people's work. I just like open source dying even
| less.
|
| You're positioning these as opposed because you're focused
| on the case of Open Source AI. There are a massive number
| of Open Source _projects_ whose code is being trained on,
| producing AIs that launder the copyrights of those projects
| and ignore their licenses. I don't want Open Source
| projects serving as the training data for AIs that ignore
| their license.
| viraptor wrote:
| > "How much of the work will you use?" - All of it
|
| That depends on the interpretation of "use", and it would
| be interesting to read what lawyers think. You learned
| the language largely from speech and copyrighted works.
| (All the stories, books, movies, etc. you ever
| read/heard) When you wrote this comment did you use all
| of them for that purpose? Is the case of AI different?
|
| To be clear that's a rhetorical question - I don't expect
| anyone here to actually have a convincing enough argument
| either way.
| JoshTriplett wrote:
| Principles applied to human brains are not automatically
| applicable to AI training. To the best of my knowledge,
| there's no particular law that says a human brain is
| exempt from copyright, but it empirically is, because the
| alternative would be utterly unreasonable. No such
| exemption exists for AI training, nor should it.
|
| Ideas/works/etc _literally_ live rent-free in your head.
| That doesn't mean they should live rent-free in an AI's
| neural network.
|
| Changing that should involve _actually reducing or
| eliminating copyright_, for everyone, not giving a
| special pass to AI.
| JAlexoid wrote:
| > To the best of my knowledge, there's no particular law
| that says a human brain is exempt from copyright, but it
| empirically is, because the alternative would be utterly
| unreasonable.
|
| The human brain most definitely is not exempt. If you read
| Lord of the Rings and then write down a new book, with
| the same characters and same story line - that's plain
| copying (look up the etymology of the verb to copy). If you
| look at a painting and paint a very similar painting -
| that's still copying.
|
| Human brains are the reason we have copyright. Your
| recital of passages from any copyrighted book would
| violate the copyright, if not for fair use doctrine. And
| it has nothing to do with whether you do it yourself, or
| have a TTS engine produce the sound.
| JoshTriplett wrote:
| The human brain is absolutely exempt, insofar as the copy
| _stored in your brain_ does not make your brain subject
| to copyright, even if a subsequent work you produce might
| be. Nobody's filing copyright infringement claims over
| people's memories in and of themselves.
|
| I'm saying that AI does not and should not automatically
| get the exception that a human brain does.
| sillysaurusx wrote:
| It's not so clear cut. Many lawyers believe all that
| matters is whether the _output_ of the model is
| infringing. As much as people love to cite ChatGPT
| spitting out code that violates copyright, the vast
| majority of the outputs do not. Those that do, are
| quickly clamped down on -- you'll find it hard to get
| Dalle to generate an image of anything Nintendo related,
| unless you're using crafty language.
|
| There's also the moral question. Should creators have the
| right to prevent their bits from being copied at all?
| Fundamentally, people are upset that their work is being
| used. But "used" in this case means "copied, then
| transformed." There's precedent for such copying and
| transformation. Fair use is only one example. You're
| allowed to buy someone's book and tear it up; that copy
| is yours. You can also download an image and turn it into
| a meme. That's something that isn't banned either. The
| question hinges on whether ML is quantitatively
| different, not qualitatively different. Scale matters,
| and it's a difference of opinion whether the scale in
| this case is enough to justify banning people from
| training on art and source code. The courts' opinion will
| have the final say.
|
| The thing is, I basically agree with you in terms of what
| you want to happen. Unfortunately the most likely outcome
| is a world where no one except billion dollar
| corporations can afford to pay the fees to create useful
| ML models. Are you sure it's a good outcome? The chance
| that OpenAI will die from lawsuits seems close to nil.
| Open source AI, on the other hand, will be the first on
| the chopping block.
| bryanrasmussen wrote:
| >Those that do, are quickly clamped down on -- you'll
| find it hard to get Dalle to generate an image of
| anything Nintendo related, unless you're using crafty
| language.
|
| really it seems more like someone was afraid of angering
| Nintendo who is a corporate adversary one does not like
| to fight and thus it has a bunch of blocks to keep from
| generating anything that offends Nintendo, that does not
| really translate to quickly and easily stopping and
| blocking offending generations across every copyrighted
| work in the world.
| raphman wrote:
| > Many lawyers believe all that matters is whether the
| output of the model is infringing.
|
| What I don't understand (as a European with little
| knowledge of court decisions on fair use): with the same
| reasoning you might make software piracy a case of 'fair
| use', no? You take stuff someone else wrote - without
| their consent - and use it to create something new. The
| output (e.g. the artwork you create with Photoshop) is
| definitely not copyrighted by the manufacturer of the
| software. But in the case of software piracy, it is not
| about the output. With software, it seems clear that the
| act of taking something you do not have the rights for
| and using it for personal (financial) gain is not covered
| by fair use.
|
| Why can OpenAI steal copyrighted content to create
| transformative works but I cannot steal Photoshop to
| create transformative works? What am I missing?
| JAlexoid wrote:
| > make software piracy a case of 'fair use'
|
| That's not a good example. Making a copy of a record you
| own (for example, ripping an audio CD to MP3) is
| absolutely fair use. Giving your video game to your
| neighbor to play - that's also fair use.
|
| Fair use is limited when it comes to
| transformative/derivative work. Similar laws are in place
| all over the world, just in the US some of those come from
| case law.
|
| > With software, it seems clear that the act of taking
| something you do not have the rights for and using it for
| personal (financial) gain is not covered by fair use.
|
| > Why can OpenAI steal copyrighted content to create
| transformative works but I cannot steal Photoshop to
| create transformative works?
|
| That's not a good analogy. The argument, that is not
| settled yet, is that a model doesn't contain enough
| copyrightable material to produce an infringing output.
|
| Take your software example - you legally acquire Civ6,
| you play Civ6, you learn the concepts and the visuals of
| Civ6... then you take that knowledge and create a game
| that is similar to Civ6. If you're a copyright maximalist
| - then you would say that creating any games that mimic
| Civ6 by people who have played Civ6 is copyright
| infringement. Legally there are definitely lower limits
| to copyright - like no one owns the copyright to the
| phrase "Once upon a time", but there may be a copyright
| on "In a galaxy far far away".
| Ukv wrote:
| > Why can OpenAI steal copyrighted content to create
| transformative works but I cannot steal Photoshop to
| create transformative works? What am I missing?
|
| If Photoshop was hosted online by Adobe, you would be
| free to do so. It's copyrighted, but you'd have an
| implied license to use it by the fact it's being made
| available to you to download. Same reason search engines
| can save and present cached snapshots of a website (Field
| v. Google).
|
| In other situations (e.g: downloading from an unofficial
| source) you're right that private copying is (in the US)
| still prima facie copyright infringement. However, when
| considering a fair use defense, courts do take the
| distinction into strong consideration: "verbatim
| intermediate copying has consistently been upheld as fair
| use if the copy is 'not reveal[ed] . . . to the public.'"
| (Authors Guild v. Google)
|
| If you were using Photoshop in some transformative way
| that gives it new purpose (e.g: documenting the evolution
| of software UIs, rather than just making a photo with it
| as designed) then you may* be able to get away with
| downloading it from unofficial sources via a fair use
| defense.
|
| *: (this is not legal advice)
| dannyobrien wrote:
| So, fair use is seen as a balance, and generally the
| balance is thought of as being codified under four
| factors:
|
| https://www.copyright.gov/title17/92chap1.html#107
|
| (1) the purpose and character of the use, including
| whether such use is of a commercial nature or is for
| nonprofit educational purposes;
|
| (2) the nature of the copyrighted work;
|
| (3) the amount and substantiality of the portion used in
| relation to the copyrighted work as a whole; and
|
| (4) the effect of the use upon the potential market for
| or value of the copyrighted work.
|
| There's more detailed discussion here:
| https://copyright.columbia.edu/basics/fair-use.html
| navjack27 wrote:
| Dalle on Bing is happy to generate Mario and Luigi and
| Sonic and basically everybody from everybody without
| using crafty language so I'm unsure of what you're
| talking about.
| JAlexoid wrote:
| It would be interesting to see if courts agree that
| training+transforming = copying.
|
| If I paint a picture inspired by Starry Night (Van Gogh) -
| does that inherently infringe on the original? I looked
| at that painting, learned the characteristics, looked at
| other similar paintings and painted my own. I basically
| trained my brain. (and I mean the copyright, not the
| individual physical painting)
|
| And I mean cases where I am not intentionally trying to
| recreate the original, but doing a derivative (aka
| inspired) work.
|
| Because it's already settled that recreating the original
| from memory will infringe on copyright.
| magicalhippo wrote:
| > "What is the character of the use?" - Commercial
|
| Your first factor seems to not at all be like that which
| Stanford has in its guidelines[1], which they call the
| transformative factor:
|
| _In a 1994 case, the Supreme Court emphasized this first
| factor as being an important indicator of fair use. At
| issue is whether the material has been used to help
| create something new or merely copied verbatim into
| another work._
|
| LLMs mostly create something new, but sometimes seem to
| be able to regurgitate passages verbatim, so I can see
| arguments for and against, but to my untrained eyes
| doesn't seem as clear cut.
|
| [1]: https://fairuse.stanford.edu/overview/fair-use/four-factors/
| andybak wrote:
| Bear with me here. Rushed and poorly articulated post
| incoming...
|
| In the broadest sense, generative AI helps achieve the
| same goals that copyleft licences aim for. A future where
| software isn't locked away in proprietary blobs and users
| are empowered to create, combine and modify software that
| they use.
|
| Copyleft uses IP law against itself to push people to
| share their work. Generative AI aims to assist in writing
| (or generating) code and make sharing less necessary.
|
| I argue that if you are a strong believer in the ultimate
| goals of copyleft licences you should also be supporting
| the legality of training on open source code.
| TheOtherHobbes wrote:
| The obvious difference is that copyleft is voluntary,
| while having your art style stolen isn't.
|
| If an artist approached a software developer, created a
| painting of them using their Mac, and said "There, I've
| done your job for you" you'd think they were an idiot.
|
| This is the same from the other side. The inability to
| understand why that's a realistic analogy does not change
| the fact that it is.
| llm_trw wrote:
| > The obvious difference is that copyleft is voluntary,
| while having your art style stolen isn't.
|
| What a curious type of theft where the author keeps their
| art and I get different art.
| webmaven wrote:
| > The obvious difference is that copyleft is voluntary,
| while having your art style stolen isn't.
|
| This is why it is important whether you consider that
| infringement occurs upon ingestion or output. If it only
| matters for outputs, then artists have a problem, since
| copyright doesn't protect styles at all, see for example
| the entire fashion industry.
|
| There is a saving grace though: Artists can make a case
| that the association of their distinctive style with
| their name is at least potentially a violation of
| trademark or trade dress, especially if that association
| is being used to promote the outputs to the public. This
| is a fairly clear case of commercial substitution in the
| market for creating new works in that artist's style and
| creating confusion concerning the origin of the resulting
| work.
|
| Note that the market for creating new works in a
| particular artist's distinctive and named style kind of
| goes away upon the artist's passing. What remains is the
| trademark issue of whether a particular work was actually
| created by the artist or not, which existing trademark
| law is well suited to policing, as long as the trademark
| is defended, even past the expiration of the copyright.
|
| Meanwhile, trademark (and copyright) also apply to the
| _subjects_ of works, like Nintendo's Mario or Disney's
| Mickey Mouse or Marvel's Iron Man. But we don't really
| want models to simply be forbidden from producing them as
| outputs, or they become useless as tools for the purpose
| of parody and satire, not to mention the ability to
| create non-commercial fan art. The potential liability
| for violating these trademarks by _publishing_ works
| featuring those characters rests with the users rather
| than the tools, though, and again existing law is fairly
| well suited to policing the market. Similarly,
| celebrities' right of publicity probably shouldn't
| prevent models from learning what they look like or from
| making images that include their likeness when prompted
| with their name, but users better be prepared to justify
| publishing those results if sued.
|
| You can also make the (technical) argument that if you
| just ask for an image of Wonder Woman, and you get an
| image that looks like Gal Gadot as Wonder Woman, that the
| model is overfitting. That's also the issue with the
| recent spate of coverage of Midjourney producing near-
| verbatim screenshots from movies.
|
| It might be appropriate though to regulate commercial
| generative AI services to the extent of requiring them to
| warn users of all the potential copyright/trademark/etc.
| violations, if they ask for images of Taylor Swift as
| Elsa, or Princess Peach, or Wonder Woman, for example.
| kmeisthax wrote:
| The majority of AI models out there (at least by
| popularity / capability) are proprietary; with weights
| and even model architectures that are treated as trade
| secret. Instead of having human-written music and movies
| that you legally can't copy, but practically can; you now
| have slop-generating models that live on a cloud server
| you have no control over. Artists and programmers who
| want to actually publish something - copyright or no -
| now have to compete with AI spam on search engines, while
| ChatGPT gets to merely be "confidently wrong" because it
| was built on the Internet equivalent of low-background
| metal - pre-AI training data. Generative AI is not a road
| that leads to less intellectual property[0], it's just an
| argument for reappropriating it to whoever has the
| fastest GPUs.
|
| This is contrary to the goals of the Free Software
| movement - and also why Free Software people were the
| first to complain about all the copying going on. One of
| the things Generative AI is really good at is plagiarism
| - i.e. taking someone else's work and "rewriting it" in
| different words. If that's fair use, then copyleft is
| functionally useless.
|
| It's important to keep in mind the difference between
| violating the letter of the law and opposing the business
| interests of the people who wrote the law. Copyleft and
| share-alike clauses have the intention of getting in the
| way of copyright as an institution, but it also relies on
| copyright to work, which is why the clauses have power
| even though they violate the spirit of copyright.
| Generative AI might violate the letter of the law, but
| it's very much in the spirit of what the law wants.
|
| [0] Cory Doctorow: "Intellectual property is any law that
| allows you to dictate the conduct of your competitors"
| CaptainFever wrote:
| Is FSF's stance on AI actually clear? I thought they were
| just upset it was made by Microsoft.
|
| Creative Commons has been fairly pro-AI -- they have been
| quite balanced, actually, but they do say that opt-in is
| not acceptable, it should be opt-out at most. EFF is
| fairly pro AI too -- at least, against using copyright to
| legislate against it.
|
| You shouldn't discount progress in the open model
| ecosystem. You can sort of pirate ChatGPT by fine tuning
| on its responses, there's GPU sharing initiatives like
| Stable Horde, there's TabbyML which works very well
| nowadays, and Stable Diffusion is still the most advanced
| way of generating images. There's very much of an anti-IP
| spirit going on there, which is a good thing -- it's what
| copyleft is there for in spirit, isn't it?
| Ukv wrote:
| > Fair use was intended for things like reviews,
| commentary, education, remixing, non-commercial use, and
| many other things
|
| "many other things" has included, for example, Google
| Books scanning millions of in-copyright books, storing
| them internally in full, and making snippets available.
|
| The basis for copyright itself is to "promote the
| progress of science and useful arts". For that reason a
| key consideration of fair use, which you've skipped
| entirely, is the transformative nature of the new work.
| As in Campbell v. Acuff-Rose Music: "The more
| transformative the new work, the less will be the
| significance of other factors", defined as "whether the
| new work merely 'supersede[s] the objects' of the
| original creation [...] or instead adds something new".
|
| > "How much of the work will you use?" - All of it
|
| For the substantiality factor, courts make the
| distinction between intermediate copying and what is
| ultimately made available to the public. As in Sega v.
| Accolade: "Accolade, a commercial competitor of Sega,
| engaged in wholesale copying of Sega's copyrighted code
| as a preliminary step in the development of a competing
| product" yet "where the ultimate (as opposed to direct)
| use is as limited as it was here, the factor is of very
| little weight". Or as in Authors Guild v. Google:
| "verbatim intermediate copying has consistently been
| upheld as fair use if the copy is 'not reveal[ed] . . .
| to the public.'"
|
| The factor also takes into account whether the copying
| was necessary for the purpose. As in Kelly v. Arriba
| Soft: "If the secondary user only copies as much as is
| necessary for his or her intended use, then this factor
| will not weigh against him or her"
|
| While there are still cases of overfitting resulting in
| generated outputs overly similar to training data, I
| think it's more favorable to AI than simply "it trained
| on everything, so this factor is maximally against fair
| use".
|
| > Directly competes with the original, killing or
| devaluing large parts of it
|
| The factor is specifically the effect of the use upon the
| work - not the extent to which your work would be
| devalued even if it had not been trained on your work.
| TheOtherHobbes wrote:
| None of those arguments make sense. The output of AI
| absolutely does supersede the objects of the original
| creation. If it didn't, artists wouldn't care that they
| were no longer able to make a living.
|
| Substantiality of code does not apply to substantiality
| of style. What's being copied is _look and feel_, which
| is very much protected by copyright.
|
| The copying clearly is necessary for the purpose. No
| copying, no model. The fact that the copying is then
| compressed after ingestion doesn't change the fact that
| it's necessary for the modelling process.
|
| Last point - see first point.
|
| IANAL, but if I was a lawyer I'd be referring back to
| look and feel cases. It's the _essence_ of an artist's
| look and feel that's being duplicated and used for
| commercial gain without a license.
|
| That's true whether it's one artist - which it can be,
| with added training - or thousands.
|
| Essentially what MJ etc do is curate a library of looks
| and feels, and charge money for access.
|
| It's a little more subtle than copying fixed objects, but
| the principle remains the same - original work is being
| copied and resold.
| Ukv wrote:
| > None of those arguments make sense. The output of AI
| absolutely does supersede the objects of the original
| creation. If it didn't, artists wouldn't care that they
| were no longer able to make a living.
|
| The question for transformative nature is whether it
| _merely_ supersedes or instead adds something new. E.G:
| Google translate was trained on books/documents
| translated by human translators and may in part displace
| that need, but adds new value in on-demand translation of
| arbitrary text - which the static works it was trained on
| did not provide.
|
| > Substantiality of code does not apply to substantiality
| of style.
|
| I'm not certain what you're saying here.
|
| > The copying clearly is necessary for the purpose. No
| copying, no model.
|
| Which, for the substantiality factor, works in favor of
| the model developers.
|
| > It's the essence of an artist's look and feel that's
| being duplicated and used for commercial gain without a
| license.
|
| Copyright protects works fixed in a tangible medium, not
| ideas in someone's head. It would protect _a work's_
| look/appearance (which can be an issue for AI when
| overfitting causes outputs that are substantially similar
| to a protected work), but not style or "_an artist's_
| look and feel".
| JAlexoid wrote:
| > What's being copied is _look and feel_, which is very
| much protected by copyright.
|
| If that were the case, no one would be able to paint any
| cubist paintings. (Picasso estate would own the
| copyright, to this day)
|
| It's not that clear cut, there are a lot of nuances.
| UncleEntity wrote:
| Ironically, Picasso was notorious for copying other
| artists' 'look and feel'...
| JoshTriplett wrote:
| > "many other things" has included, for example, Google
| Books scanning millions of in-copyright books, storing
| them internally in full, and making snippets available.
|
| That succeeds on a different part of the four-factor
| test, the degree to which it competes with / affects the
| market for the original.
|
| Google Books is not automatically producing new books
| derived from their copies that compete with the original
| books.
| Ukv wrote:
| > That succeeds on a different part of the four-factor
| test, the degree to which it competes with / affects the
| market for the original
|
| It satisfied multiple parts of the four-factor test. It
| was found to satisfy the first factor due to being "highly
| transformative", the second factor was considered not
| dispositive in isolation and favoring Google when
| combined with its transformative purpose, and it
| satisfied the third factor as the usage was "necessary to
| achieve that purpose" - with the court making the
| distinction between what was copied (lots) and what is
| revealed to the public (limited snippets).
|
| As you had all factors as "maximally against" fair use,
| do you believe that AI is significantly less
| transformative than Google Books? I'd say even in cases
| where the output is the same format as the content it was
| trained on, like Google Translate, it's still generally
| highly transformative.
|
| > the degree to which it competes with
|
| Specifically, to be pedantic, it's the effect _of the
| use/copying_ of the original copyrighted work.
| fenomas wrote:
| Where this argument falls down for me is that "use"
| w.r.t. copyright means copying, and neither AI models nor
| their outputs include any material copied from the
| training data, in any usual sense. (Of course the inputs
| are copied during training, but those copies seem clearly
| ephemeral.)
|
| Genuinely curious: for anyone who thinks AI obviously
| violates copyright, how do you resolve this? E.g. do you
| think the violation happens during training or inference?
| And is it the trained model, or the model output, that
| you think should be considered a derived work?
| frabcus wrote:
| Personally I think trained models are derived works of
| all the training data.
|
| Just like a translation of a book is a derived work of
| the original. Or a binary compiled output is a derived
| work of some source code.
| fenomas wrote:
| Wikipedia:
|
| > In copyright law, a derivative work is an expressive
| creation that includes major copyrightable elements of
| ... the underlying work
|
| A trained model fails that on two counts, doesn't it?
| Both the "includes" part, and the fact that a model is
| itself not an expressive work of authorship.
| thuuuomas wrote:
| Curating training data is an exercise in editorial
| judgement.
| JAlexoid wrote:
| You're trying to use words without the legal context
| here. The legal definition of words isn't 1-1 with our
| colloquial usage.
|
| Translation of a book is non-transformative and retains
| the original author's artistic expression.
|
| As a counter example - if you write an essay about
| Picasso's Guernica painting, it is derivative according
| to our colloquial use of the term, but legally it's an
| original work.
| barrkel wrote:
| AI is a genie that you can't really stuff back into a
| bottle. It's out and it's global.
|
| If the US had tighter regulations, China or someone else
| would take over the market. If AI is genuinely
| transformative for productivity, then the US would just
| fall behind, sooner or later.
| beepbooptheory wrote:
| Then let them! If another country put forward tighter
| regulations to help actual people over and above the
| state that holds them, then that is good in itself, and
| either way will pay for itself. Why are we worried about
| China or whoever taking over the market of something that
| we see has bad effects?
|
| Like, we see this line everywhere now, and it simply
| doesn't make sense. At some point you just have to believe
| something, be principled. Treating the entire world as
| this zero sum deadlock of "progress" does nothing but
| prevent one from actually being critical about anything.
|
| This would-be Oppenheimer cosplay is growing really old
| in these discussions.
| iamsaitam wrote:
| The point should be to kill training on unlicensed
| material. There needs to be regulation and tools to
| identify what was the training data. But as always, first
| comes the siphoning part, the massive extraction of value,
| then when the damage is done there will be the slow moving
| reparations and conservationism.
| cornel_io wrote:
| A ton of us out here don't agree with your goals. I think
| these models are transformative enough that the value
| added by organizing and extracting patterns from the data
| outweighs the interests of the extremely diffuse set of
| copyright holders whose data was ingested. So regardless
| of the technical details of copyright law (which I
| _still_ think are firmly in favor of OpenAI et al) I
| would strongly oppose any effort to tighten a legal
| noose here.
| JAlexoid wrote:
| Agreed. And every software engineer writing code should
| pay 10% of their salary to the publishers of the books
| that they learned their programming skills from.
| dkjaudyeqooe wrote:
| That makes no sense. OpenAI must lose and it must not be
| possible to have proprietary models based on copyrighted
| works. It's not fair use because OpenAI is profiting from
| the copyright holders' work and substituting for it while
| not giving them recompense.
|
| The alternative is that any models widely trained on
| copyrighted work are uncopyrightable and must be disclosed,
| along with their data sources. In essence this is forcing
| all such models to be open. This is the only equitable
| outcome. Any use of the model to create works has the same
| copyright issues as existing work creation, i.e. if it
| substantially replicates an existing work it must be
| licenced.
| sillysaurusx wrote:
| For what it's worth, I agree with your second paragraph.
| But it would take legislation to enforce that. For now,
| it's unclear that OpenAI will lose. Quite the opposite;
| I've spoken with a few lawyers who believe OpenAI is on
| solid legal footing, because all that matters is whether
| the model's _output_ is infringing. And it's not. No one
| reads books via ChatGPT, and Dalle 3 has tight controls
| preventing it from generating Pokemon or Mario.
|
| All outcomes suck. The trick is to find the outcome that
| sucks the least for the majority of people. Maybe the
| needs of copyright holders will outweigh the needs of
| open source, but it's basically guaranteed that open
| source ML will die if your first paragraph comes true.
| dr_dshiv wrote:
| Proposal: revenue from Generative AI should be taxed 10%
| for an international endowment for the arts. In exchange,
| copyright claims are settled.
| Filligree wrote:
| With a minimum rate, such that no-one can pretend they're
| getting no income from it.
|
| We might apply that as a $5000 or so surcharge on AI
| accelerators capable of running the models, such as the
| 4090.
| dkjaudyeqooe wrote:
| > But it would take legislation to enforce that.
|
| Absolutely true. That's the end game and we should be
| working toward influencing that. It's within our power.
|
| > I've spoken with a few lawyers who believe OpenAI is on
| solid legal footing
|
| No one knows anything, this is too novel, and even if
| OpenAI gets some fair use ruling, it will be inequitable
| and legislation is inevitable. OpenAI is between a rock
| and a hard place here. If you read the basis for fair use
| and give each aspect serious consideration, as a judge
| should do, I can't see it passing fair use muster. It's
| not a case of simply reproducing work, which is unclear
| here, it's the negative effect on copyright holders, and
| that effect is undeniable.
|
| > All outcomes suck.
|
| I don't think so. It's possible to fashion something
| equitable, but people other than the corporations have to
| get involved.
| Joeri wrote:
| Just because something is not copyrightable doesn't
| automatically mean it must be disclosed. If weights
| aren't copyrightable (and I don't think they should be,
| as the weights are not a human creation), commercial AI's
| just get locked behind API barriers, with terms of usage
| that forbid cloning. Copyright then never enters the
| picture, unless weights get leaked.
|
| Whether or not that's equitable is in the eye of the
| beholder. Copyright is an artificial construct, not a
| natural law. There is nothing that says we must have it,
| or we must have it in its current form, and I would argue
| the current system of copyright has been largely harmful
| to creativity for a long time now. One of the most
| damning statements I've read in this thread about the
| current copyright system is how there's simply not enough
| unlicensed content to train models on. That is the bed
| that the copyright-holding corporations have made for
| themselves by lobbying to extend copyright to a century,
| and it all but assured the current situation.
| dkjaudyeqooe wrote:
| > Just because something is not copyrightable doesn't
| automatically mean it must be disclosed.
|
| No, I'm saying that's what the law should be, because
| models can be built and used without anyone knowing. If
| it's illegal not to disclose them you can punish people.
|
| Copyright is something that protects the little guy as
| much as big corps. But the former has more to lose as a
| group in the world of AI models, and they will lose
| something here no matter what happens.
| Hoasi wrote:
| > I would argue the current system of copyright has been
| largely harmful to creativity for a long time now
|
| I'd love to hear that argument.
|
| How has the current system of copyright been harmful to
| creativity?
| benreesman wrote:
| The reality is always a dynamic tension between law,
| regulation, precedent, and enforceability.
|
| It is possible to strangle OpenAI without strangling AI:
| pmarca is anti-OpenAI in print, but you can bet your butt
| he hopes to invest in whatever replaces it, and he's got
| access to information that like, 10 people do.
|
| A useful example would be the Napster Wars: the music
| industry had been rent seeking (taking the fucking piss
| really) for decades and technology destroyed the free ride
| one way or another. The public (led by the
| technical/hacker/maker public) quickly showed that short of
| disconnecting the internet, we were going to listen to the
| 2 good songs without buying the 8 shitty ones. The
| technical public doesn't flex its muscles in a unified way
| very often, but when it does, it dictates what is and isn't
| on the menu.
|
| The public wants AI, badly. They want it aligned by them
| within the constraints of the law (which is what "aligned"
| should mean to begin with).
|
| The public is getting what it wants on this: you can bet
| the rent. Whether or not OpenAI gets on board or gets run
| the fuck over is up to them.
|
| "You in the market for a Tower Records franchise Eduardo?"
| emadm wrote:
| a16z are investors in openai
| benreesman wrote:
| I'd look again:
| https://twitter.com/pmarca/status/1756803719327621141
| pk-protect-ai wrote:
| I would say that GPT-3 and its successors have nothing to
| do with open source, and if OpenAI uses open source as a
| shield, then we are all doomed. I would distance myself and
| any open source projects from involvement in OpenAI court
| cases as far as possible. Yes, they have delivered some
| open source models, but not all of them. Their defense must
| revolve around fair use and purchased content if they use
| books and materials that were never freely available. It
| should be permissible to purchase a book or other materials
| once and use them for the training of an unlimited number
| of models without incurring licensing fees.
| chasing wrote:
| > If you require licensing fees for training data, you kill
| open source ML.
|
| This is another one of those "well if you treat the people
| fairly it causes problems" sort of arguments. And: Sorry.
| If you want to do this you have to figure out how to do it
| ethically.
|
| There are all sorts of situations where research would go
| much faster if we behaved unethically or illegally.
| Medicine, for example. Or shooting people in rockets to
| Mars. But we can't live in a society where we harm people
| in the name of progress.
|
| Everyone in AI is super smart -- I'm sure they can chin-
| scratch and figure out a way to make progress while
| respecting the people whose work they need to power these
| tools. Those incapable of this are either lazy, predatory,
| or not that smart.
| sillysaurusx wrote:
| "Ethical" in this case is a matter of opinion. The whole
| point of copyright was to promote useful sciences and
| arts. It's in the US constitution. You don't get to
| control your work out of some sense of fairness, but
| rather because it's better for the society you live in.
|
| As an ML researcher, no, there's basically no way to make
| progress without the data. Not in comparison with billion
| dollar corporations that can throw money at the licensing
| problem. Synthetic data is still a pipe dream, and
| arguably still a copyright violation according to you,
| since traditional models generate such data.
|
| To believe that this problem will just go away or that we
| can find some way around it is to close one's eyes and
| shout "la la la, not listening." If you want to kill open
| source AI, that's fine, but do it with eyes open.
| chasing wrote:
| Yes, it's true that open source projects that cannot pay
| to license content owned by other people are at a
| disadvantage versus those who can. Open source projects
| cannot, for example, wholly copy code owned by other
| people.
|
| Also, beware of originalist interpretations of the
| Constitution. I believe there's been about 250 years of
| law clarifying how copyright works, and, not to beat a
| dead horse, I don't think it carves out a special
| exception for open source projects.
| silviot wrote:
| > But anyway, even if it were all true, the only reason we
| are talking about diffusers, and the only reason we are
| paying attention to this author's work Fairly Trained, is
| because of someone training on data that was not expressly
| licensed.
|
| Thanks for putting this into words. I'm of the same opinion
| and this is the best articulation I have so far.
| prmoustache wrote:
| Not that it would have stopped the company from doing it anyway,
| but couldn't he have thought about that before working for them?
|
| Or did he need that, as it is part of the business model of his
| certifications?
| emadm wrote:
| It's a complex topic and perceptions change.
|
| Ed still likes Stability, especially as we fully trained
| stable audio on rights licensed data (bit different in audio
| to other media types), offer opt out of datasets etc.
| ImprobableTruth wrote:
| Calling him "the person hired to build Stable Audio" seems a
| bit misleading? He was in an executive position (VP of product
| for Stability's audio group). An important position, but
| "person hired to build" to me evokes the image of lead
| developer/researcher.
|
| I think that also helps in understanding his departure, since
| he's a founder with a music background.
| a_vanderbilt wrote:
| It isn't unusual for those in leadership positions to use
| such phrasing when talking about projects and products. It's
| not a "taking credit" from the engineers sort of thing, but
| rather about the leadership of the engineers.
| ARandomerDude wrote:
| Person A gets hired to write the software that is the
| company's actual product.
|
| Person B gets hired to observe Person A working, check
| email, and be the audio output buffer for Jira.
|
| Person B says "I built this."
|
| That's dishonesty no matter what the titles are or how
| important the emails were.
| Zetaphor wrote:
| Managing a group of people is not synonymous with doing the
| actual knowledge work of researching and developing
| innovations that enabled this technology. I find it hard to
| believe that the contribution of his management somehow
| uniquely enabled this group of engineers to create this
| using their experience and expertise.
|
| A captain may steer the ship, but they're not the one
| actually creating and maintaining the means by which it
| moves.
| shon wrote:
| Agreed. Leadership can sometimes bring actual value ;)
|
| And to be clear, I'm not sure Ed would call himself that.
| Those are my words, not his.
| gcanko wrote:
| There has to be a solution for the copyright roadblocks that
| companies encounter when training models. I see it as no different
| than an artist creating music which is influenced by the music
| the artist has been listening to throughout his whole life;
| fundamentally it's the exact same thing. You cannot create
| music or art in general in a vacuum.
| jpc0 wrote:
| > Warning: This website may not function properly on Safari. For
| the best experience, please use Google Chrome
|
| Do better
| popalchemist wrote:
| Have you ever heard of an MVP?
| prmoustache wrote:
| That would be pertinent if it wasn't just a static web page
| with just text and some audio files to be played.
| zamadatix wrote:
| Reading about it, that ironically seems to be the exact
| problem Safari has. I mean the page "works" in Safari it's
| just you get these really random delays to the start of
| some of the sounds with all sorts of web discussion threads
| saying different ways to mitigate it on different
| platforms. I don't really fault them for having the goal to
| publish a paper and go the extra bit to make a friendly but
| imperfect webpage instead of being website creators who
| happen to publish papers on the side.
| pmontra wrote:
| By the way, it does work on Firefox Android. No idea of what
| there is in Safari that's not standard in Chrome and Firefox.
| Aachen wrote:
| ...and recommend Firefox
|
| is what you meant to say right? :)
| lopkeny12ko wrote:
| > We append "high-quality, stereo" to our sound effects prompts
| because it is generally helpful.
|
| It's hilarious that we've discovered you can get better outputs
| from LLMs by simply nicely telling them to generate better results.
| nine_k wrote:
| Maybe sometimes you want an old cassette sound, or even older
| scratched 78 rpm sound, etc. Computers, as usual, do what you
| asked them to do, not what you meant.
| ttul wrote:
| I find it interesting that they are releasing the code and lovely
| instructions for training, but no model. They are almost begging
| anonymous folks to hook the data loader up to an Apple Music
| account and go nuts. Not that I am suggesting anyone do that.
| zamadatix wrote:
| Speculatively, it might have been part of an agreement: when they
| were given the licensed stock audio library from AudioSparx to
| train on, they agreed not to redistribute the resulting model.
| jsiepkes wrote:
| > Warning: This website may not function properly on Safari. For
| the best experience, please use Google Chrome.
|
| We've come full circle with the 90's and Internet Explorer. Well,
| I guess this time the dominant browser is open source so that's
| at least something...
|
| Can someone please create an animated GIF button for Chrome which
| says: "Best viewed with Google Chrome"?
| Maxion wrote:
| Chrome isn't open source, chromium is. Best not to confuse the
| two.
| schleck8 wrote:
| Chrome and Chromium are virtually identical except for Google
| services, which aren't required to do anything with the
| browser except for installing Chrome extensions that can
| alternatively be sideloaded, so this is nitpicking.
| berkes wrote:
| It's essential nitpicking
| urbandw311er wrote:
| Jumping in to defend parent comment, there's nothing Open
| Source about Google Chrome and it's highly relevant in this
| context because they are notorious for putting technologies
| and tracking in there that many people find objectionable.
| forgotusername6 wrote:
| Tangential, but I tried to build chromium the other day but
| stopped when it said it required access to Google cloud
| platform to actually build it. If something requires a
| proprietary build system, does it matter that it's open
| source?
| nolist_policy wrote:
| That is not true. See every distribution packaging
| chromium.
|
| In particular, this package[1] by openSUSE builds
| completely offline. Many other distributions require
| packages to build offline.
|
| [1] https://build.opensuse.org/package/show/network:chromium/chr...
| forgotusername6 wrote:
| I think I got my wires crossed with ChromiumOS which when
| I last read the docs seemed to suggest that Google cloud
| platform was required. I now can't find those specific
| docs either so I retract my statement.
| squeaky-clean wrote:
| Don't forget media DRM built into Chrome but not Chromium.
| m463 wrote:
| I found this article to explain it well:
|
| https://www.lifewire.com/chromium-and-chrome-differences-417...
|
| and there is a further ungoogled-chromium:
|
| https://en.wikipedia.org/wiki/Ungoogled-chromium
| superb_dev wrote:
| Website works fine on safari too, I didn't notice any issues
| nness wrote:
| Same, I wonder what issue they thought they had...
| earthnail wrote:
| Safari is known to be troublesome when a webpage contains
| many HTML audio players. It can get extremely slow and
| unresponsive.
|
| Every researcher I know in the audio domain uses Chrome for
| exactly that reason. The alternative would be not to use
| the standard HTML audio tag which would be ridiculous.
| IndisciplineK wrote:
| > Can someone please create an animated GIF button for Chrome
| which says: "Best viewed with Google Chrome"?
|
| Here you go:
|
| <img src="data:image/gif;base64,R0lGODlhWAAfAKIAAAAAAP///zNm/zO
| ZM//MM/8zM8zMzP///yH/C05FVFNDQVBFMi4wAwEAAAAh+QQFZAAHACwAAAAAWA
| AfAAAD/wi63P4wyklnuDjrzbv/YNgpQWGehaGubOu+cCzP81KiJr02uqLvlYnBh
| sv9fEMADXmEFAuRJKBELZR+SRVyAVRym40n6iGtVrG8rI/J7DHETx7RnGpqvYy7
| Hr2Ai/NzGXVuem2FSnwAfnBcREWJbI2RiYt/ayRPWJqbQANPGgShoqGXU1anV5y
| qQDAKA54nFwKzsxejpHimdC9beKsthjuvsCYBtMcBt6RlqKe8iMG/WbzDsMbHyM
| q5VILPh3fQvr2IUuTA1cXY2bfbmc+9auLy8dMuANWe1+oCyezMj+/ClZtX6lK9c
| +j0qes3qt2FYoPskTPIwsGeb9TwKcTGUJuUQys3YkwqtyfSOHMV8b3aWEsZgY83
| IkqbeUelLGQdcTHTIJPmRT4qV2YY4LJUTBR2hnyDt2lBUJXajLpbkusgU01Onw4
| rKrWKEapS6EmKJjKr1qhdT32t4UWpQXhkA957ijZtzERh6el10wBqQ4uBMPQsW1
| UvRb4OqnmE8A8HH1bT3qKUEUSII8c+M/u0Ubmz58+gQ4seTXpBAgAh+QQFyAAHA
| CwNAAMAKwAZAAADf2i63P4wyvkArSBfZ/fqHhcaIJOB55d26VQqKPnJsBxLrBbv
| s3VqOBPN1iu+XMUaTzk8YoAtElCqg01HHid2E916v+DwNQz5+bRka2OcNr3M6hz
| 0R1Xp3jnqOZ6X+vVreVNzf4RmXYV7an6EjCVjhiiCfXeBK5NujZp0bZ2eDAkAOw
| ==">
|
| Edit: View the button:
| https://indiscipline.github.io/post/best-viewed-in-google-ch...
| rikafurude21 wrote:
| Surprised to see an actual gif pop up after adding that to a
| site. I guess that's just base64, still kind of amazing that
| it's all inside a seemingly random string of text
| IndisciplineK wrote:
| By the way, you can simply paste the base64-encoded data
| (everything inside the quotes) into your address bar to
| view it. Probably not the safest action generally, but
| should be OK if it's an image.
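To make the "seemingly random string" less mysterious, here is a minimal Python sketch that decodes just the first 16 characters of the payload posted above and reads the GIF header from them. The helper name `gif_header_info` is hypothetical and only the header is parsed; this is an illustration of how base64 data URIs carry binary image data, not a full GIF decoder.

```python
import base64
import struct


def gif_header_info(data_uri_or_b64: str):
    """Return (signature, width, height) from a base64-encoded GIF."""
    # Drop an optional "data:image/gif;base64," prefix.
    if data_uri_or_b64.startswith("data:"):
        payload = data_uri_or_b64.split(",", 1)[1]
    else:
        payload = data_uri_or_b64
    # Remove line wrapping and other whitespace from the pasted text.
    payload = "".join(payload.split())
    # 16 base64 characters decode to 12 bytes, enough for the
    # 6-byte signature plus the 2-byte width and 2-byte height.
    raw = base64.b64decode(payload[:16])
    signature = raw[:6].decode("ascii")
    width, height = struct.unpack("<HH", raw[6:10])  # little-endian
    return signature, width, height


# First 16 characters of the payload from the comment above.
print(gif_header_info("data:image/gif;base64,R0lGODlhWAAfAKIA"))
```

The decoded header reports a GIF89a image of 88 by 31 pixels, the classic web-button size.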
| ecmascript wrote:
| Just a few days ago I was downvoted for stating AI will be
| better at creating music than humans would be:
| https://news.ycombinator.com/item?id=39273380#39273532
|
| Now this is released and now I feel I got grist to my mill.
|
| Sure it still kind of sucks, but it's very impressive for a
| _demo_. Remember that this tech is very much in its infancy and
| it's very impressive already.
| larschdk wrote:
| I don't find this music to be good in any way. It sounds
| interesting over a few notes, but then completely fails to find
| any kind of progression that goes anywhere interesting, never
| iterating on the theme, never teasing you with subtle or
| surprising variation over a core theme, no build-ups or clear
| resolution. Very annoying to actually listen to.
| webprofusion wrote:
| Music is perfect for AI generation using trained models, because
| artists have been copying each other for at least the past 100
| years and having a computer do it for you is only notionally
| different. Sure a computer can never truly know your pain, but it
| can copy someone else's.
| kleiba wrote:
| My son suggested playing "Calm meditation music to play in a spa
| lobby" and "Drum solo" at the same time - sounds pretty good,
| actually...
| PeterStuer wrote:
| "Gen AI is the only mass-adoption technology that claims it's Ok
| to exploit everyone's work without permission, payment, or
| bringing them any other benefit."
|
| Is it? What about the printing press, photography, the copier,
| the scanner ...
|
| Sure, if a commercial image is used in a commercial setting,
| there is a potential legal case that could argue about
| infringement. This should NOT depend on the production means, but
| on the merit of the comparisons of the produced images.
|
| Xerox should not be sued because you can use a copier to copy a
| book (trust me kids, book copying used to be very, very big).
|
| Art by its social nature is always derivative, I can use
| diffusion models to create uncontestably original imagery. I can
| also try to get them to generate something close to an image in
| the training set if the model was large enough compared to the
| training set or the work is just really formulaic. However, it would
| be far easier and more efficient to just Google the image in the
| first place and patch it up with some Photoshop if that was my
| goal.
| wnkrshm wrote:
| But the social nature of art also means that humans give the
| originator and their influences credit - of course not the
| entire chain but at least the nearest neighbours of influence.
| While a user of a diffusion generator does not even know the
| influences unless specifically asked for.
|
| Shoulders of giants as a service.
| webmaven wrote:
| _> Xerox should not be sued because you can use a copier to
| copy a book (trust me kids, book copying used to be very, very
| big)._
|
| The appropriate analogy here isn't suing Xerox, but suing
| Kinko's (now FedEx Office).
|
| And it isn't just books, but other sorts of copyrighted
| material as well, such as photographs, which are still an
| issue.
| haswell wrote:
| > _Art by its social nature is always derivative, I can use
| diffusion models to create uncontestably original imagery_
|
| How are you defining "uncontestably original" here?
|
| The output could not exist if not for the training set used to
| train the model. While the process of deriving the end result
| is different than the one humans use when creating artwork, the
| end result is still derived from other works, and the degree of
| originality is a difference of degree, not of kind when
| compared to human output. (I acknowledge that the AI tool is
| enabled by a different process than the one humans use, but I'm
| not sure that a change in process changes the derivative nature
| of all subsequent output).
|
| As a thought experiment, imagine that assuming we survive,
| after another million years of human evolution, our brains can
| process imagery at the scale of generative AI models, and can
| produce derivative output taking into account more influences
| than any human could even begin to approach with our 2024
| brains.
|
| Is the output no longer derivative?
|
| Now consider the future human's interpretation of the work vs.
| the 2024 human's interpretation of the work. "I've never seen
| anything like this", says the 2024 human. "The influences from
| 5 billion artists over time are clear in this piece" says the
| future human.
|
| The fundamental question is: on what basis is the output of an
| AI model original? What are the criteria for originality?
| zamadatix wrote:
| Where was this quote pulled from? I can't find it in the site,
| paper, or code repo readmes for some reason. Did the HN link
| get changed?
| slicerdicer1 wrote:
| obviously someone shadowy and non-corporate (eg. an artist) just
| needs to come out and make a model which includes promptable
| artist/producer/singer/instrumentalist/song metadata.
|
| describing music without referring to musicians is so clunky
| because music is never labelled well. of course saying "disco
| house with funk bass and soulful vocals, uplifting" is going to
| be bland. Saying "disco house with nile rodgers rhythm guitar,
| michael mcdonald singing, and a bassline in the style of patrick
| alavi's power" is going to get you some magic
| ever1337 wrote:
| so this model can only ever understand music which is
| classified, described, labelled, standardized. and recombine
| those. sounds boring, sounds like the opposite of what (I would
| like to believe) people listen to music for, outside of a
| corporate stock audio context.
| gregorvand wrote:
| Not trying to knock the progress here, impressive. As a drummer,
| 'drum solo' is about as boring as it gets and has some weird
| interspersed sounds. So, it depends on the intended audience.
|
| FWIW the sound effects also are not 'realistic' to my ear, at the
| moment.
|
| But again, the progress is huge, well done!
| toxik wrote:
| Yeah the drum solo really highlights how badly the model missed
| the point of a drum solo. I'm not a drummer, but this is just
| not pleasing to hear. Sounds like somebody randomly banging
| drums more or less in tempo.
|
| It does okay with muzak-type things though, which I guess
| tracks with my expectations.
| ZoomZoomZoom wrote:
| As a drummer, the 'drum solo' was surprisingly interesting to
| listen to, if you consider it happening over a stable 4/4
| pulse. The random-but-not-quite nature of the part makes for
| very unconventional rhythmic patterns. I'd like to be able to
| syncopate like this on the spot.
|
| Don't ask me to transcribe it.
|
| Tempo consistency is great. Extraneous noises and random cymbal
| tails show the deficiency of the model though.
| redman25 wrote:
| I think I was more disappointed by the music samples not having
| any transitions. Most songs have key changes and percussion
| turnovers.
| TrackerFF wrote:
| Now, if they can also generate MIDI-tracks to accompany - that'd
| be great.
|
| That would add some much-needed levels of customization.
| seydor wrote:
| Trying to describe music with words is awkward! We need a model
| that is trained on dance
| zdimension wrote:
| The few examples I was able to play are very promising,
| unfortunately the host seems to be getting some sort of HN-hug,
| because all the audio files are buffering every other second --
| they seem to throttle at 32 KiB/s.
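A rough back-of-the-envelope check of the buffering described above, as a minimal Python sketch. The clip format is an assumption (uncompressed 16-bit stereo PCM at 44.1 kHz; the thread does not say how the demo files are encoded), and `pcm_kib_per_s` is a hypothetical helper name. Only the 32 KiB/s figure comes from the comment.

```python
def pcm_kib_per_s(sample_rate_hz: int, channels: int, bits_per_sample: int) -> float:
    """Data rate of uncompressed PCM audio in KiB/s."""
    bytes_per_second = sample_rate_hz * channels * bits_per_sample // 8
    return bytes_per_second / 1024


THROTTLE_KIB_PER_S = 32  # figure reported in the comment above

# Assumed format, not confirmed by the source.
needed = pcm_kib_per_s(44_100, 2, 16)

print(f"needed: {needed:.1f} KiB/s, available: {THROTTLE_KIB_PER_S} KiB/s")
print(f"download takes about {needed / THROTTLE_KIB_PER_S:.1f}x real time")
```

Under that assumption the stream needs several times more bandwidth than the throttle allows, which would be consistent with playback stalling every other second. Compressed files would need less, so the real ratio may be smaller.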
| Jeff_Brown wrote:
| Music without changes is boring. I enjoyed the much less stable
| results of OpenAI's Jukebox (2021?) more than any music AI to
| come since. Their sound quality is better but they only seem to
| produce one monotonous texture at a time.
| coldcode wrote:
| As a musician, I found the pieces unremarkable. Of course, a
| lot of contemporary music is forgettable as well, as people try
| to create songs that all sound like hits but, in doing so,
| create uninteresting songs. I wonder what music the model is
| based on. I suppose for game music/sounds, perhaps it's good
| enough?
| 3cats-in-a-coat wrote:
| The reconstruction demo is in effect an audio compression codec.
| And I bet it makes existing audio codecs look like absolute toys.
| emadm wrote:
| This is part of a paper on the prior version of the model:
| https://x.com/stableaudio/status/1755558334797685089?s=20
|
| https://arxiv.org/abs/2402.04825
|
| Which outperforms similar music models.
|
| The pace is accelerating and even better ones are coming with far
| greater cohesion and... stuff. Will be quite the year for music.
| emadm wrote:
| Particularly interesting with the scaled up version of
| https://www.text-description-to-speech.com
|
| Do try https://www.stableaudio.com for rights licensed model
| you can use commercially.
| nprateem wrote:
| The problem with music generation is difficulty in editing.
| Photos and text can be easily edited, but music can't be. Either
| the piece needs to be MIDI, with relevant parameterisation of
| instruments, or a UI needs creating that allows segments of the audio
| to be reworked like in-painting.
| Wistar wrote:
| A small point: Needs to be in something other than 44.1kHz. The
| two to which they make comparisons are at either 32kHz or 48kHz,
| both of which are friendlier for video work, something for which
| I think AI audio will be used a lot.
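One way to look at the sample-rate point is to count audio samples per video frame. The Python sketch below does only that arithmetic; the list of frame rates is an assumption chosen for illustration and `samples_per_frame` is a hypothetical helper. 48 kHz is the customary rate in video work largely by convention, so whole-number alignment is only part of the picture.

```python
from fractions import Fraction


def samples_per_frame(sample_rate_hz: int, fps: Fraction) -> Fraction:
    """Exact number of audio samples that span one video frame."""
    return Fraction(sample_rate_hz) / fps


# Common frame rates, chosen here for illustration only.
FRAME_RATES = {
    "24": Fraction(24),
    "25": Fraction(25),
    "29.97": Fraction(30000, 1001),
    "30": Fraction(30),
}

for rate in (32_000, 44_100, 48_000):
    for name, fps in FRAME_RATES.items():
        spf = samples_per_frame(rate, fps)
        kind = "whole" if spf.denominator == 1 else "fractional"
        print(f"{rate} Hz @ {name} fps: {float(spf):.2f} samples/frame ({kind})")
```

In this table 48 kHz gives a whole number of samples per frame at 24, 25 and 30 fps, while 44.1 kHz does so at 25 and 30 fps but not at 24 fps; none of the three rates aligns at 29.97 fps.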
| m3kw9 wrote:
| Lots of work left to do man
___________________________________________________________________
(page generated 2024-02-13 23:01 UTC)