[HN Gopher] Stable Audio Open
       ___________________________________________________________________
        
       Stable Audio Open
        
       Author : davidbarker
       Score  : 145 points
       Date   : 2024-06-05 17:19 UTC (5 hours ago)
        
 (HTM) web link (stability.ai)
 (TXT) w3m dump (stability.ai)
        
       | minimaxir wrote:
       | Note that this has the typical noncommercial "you have to pay for
       | a membership to use commercially" Stability license.
        
         | simonw wrote:
          | Sigh:
          | 
          |     Stable Audio Open is an open source
          |     text-to-audio model [...]
          | 
          | License: https://huggingface.co/stabilityai/stable-audio-
          | open-1.0/blo...
          | 
          |     STABILITY AI NON-COMMERCIAL
          |     RESEARCH COMMUNITY LICENSE AGREEMENT
          | 
          | Stability are one of the worst offenders for abusing the term
          | "open source" at the moment.
        
           | littlestymaar wrote:
            | What a shame for something that could have been completely
            | free from copyright issues given the training source... (And
            | there's still no legal ground for such a license claim, since
            | training hardly qualifies as a creative process on their
            | side.)
        
           | elpocko wrote:
           | Every time I call out the absurd interpretation of "Open
           | Source" in this space in general, I get showered with
            | downvotes and hateful attacks. One time someone posted their
            | "AI mashup" project on Reddit that egregiously violated the
            | terms of not one but several GPL-licensed projects.
           | Calling this out earned me a lot of downvotes and replies
           | with absolutely insane justifications from people with no
           | clue.
           | 
           | No one cares. Not in this space.
        
             | jeroenhd wrote:
             | AI fans don't seem to care much for copyright, unless it's
             | their work being stolen (remember the people that got mad
             | at "prompt stealing"?).
             | 
             | Companies are more risk-averse, though, and hobbyists on
             | Reddit don't have the money to do anything serious with
             | this software.
        
               | elpocko wrote:
               | Nope. Most of the relevant software is OSS made by
               | hobbyists. ComfyUI, llama.cpp, etc. For example, Nvidia
               | is building stuff based on ComfyUI, a GPL-licensed
               | application.
               | 
               | My complaint is about people ignoring the license of open
               | source software made by hobbyists. I disagree with your
               | ignorant "AI fans" generalization.
        
           | immibis wrote:
            | To be fair, open source is far too corporate-friendly at the
            | moment. It _should_ be more non-commercial; to what extent is
            | an open question.
        
             | jeroenhd wrote:
             | Open Source, as used by the most influential open source
             | projects, is corporate-friendly by definition.
             | 
              | There are good reasons to use something more aggressive.
              | I'm a big fan of the strict copyleft licenses for this,
              | even if that means companies like Google don't want to
              | touch that software anymore.
        
       | hehdhdjehehegwv wrote:
       | Highly commendable:
       | 
       | "The new model was trained on audio data from FreeSound and the
       | Free Music Archive. This allowed us to create an open audio model
       | while respecting creator rights."
       | 
       | This should be standard: commons go in, commons go out.
        
         | mastermedo wrote:
          | Except here the out is not commons, if I understand correctly.
          | 
          | EDIT: might be CC non-commercial
        
       | samfriedman wrote:
       | Free idea because I'm never going to get around to building it:
       | An "AI 8 track" app; click record and hum a melody, then add a
       | prompt and click generate. The model converts your input to an
       | instrument matching your prompt, keeping the same notes/rhythm
       | you hummed in the original. Record up to 8 tracks and do some
       | simple mixing.
       | 
        | Would be a truly amazing thing for sketching songs! All you need
        | is decent humming/singing/whistling pitch. Hum and generate a
        | bass line, guitar lead, strings, etc. And then sing over it -
        | it would let solo musicians sketch out a song far more easily
        | than transcribing a melody to a piano roll.
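        | 
        | A toy sketch of the core trick, using librosa's pyin pitch
        | tracker; a plain sawtooth stands in for the generative
        | "instrument" model (a real app would hand the pitch curve plus
        | the text prompt to something like Stable Audio Open), and
        | hum.wav is a hypothetical input file:
        | 
        |     import librosa
        |     import numpy as np
        |     import soundfile as sf
        | 
        |     # Load the hummed take and track its pitch with pYIN.
        |     y, sr = librosa.load("hum.wav", sr=22050, mono=True)
        |     f0, voiced, _ = librosa.pyin(
        |         y, fmin=librosa.note_to_hz("C2"),
        |         fmax=librosa.note_to_hz("C7"), sr=sr)
        | 
        |     # Resynthesize the same notes/rhythm with a new timbre.
        |     hop = 512  # pyin's default hop length
        |     f0 = np.nan_to_num(f0)  # unvoiced frames -> 0 Hz
        |     f0_smp = np.repeat(f0, hop)[:len(y)]
        |     phase = 2 * np.pi * np.cumsum(f0_smp) / sr
        |     track = 0.3 * ((phase / np.pi) % 2 - 1)  # sawtooth
        |     track[f0_smp == 0] = 0.0  # silence unvoiced stretches
        |     sf.write("track1.wav", track, sr)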
        
         | michaelbrave wrote:
         | the tech is technically there using 2-5 different AI solutions,
         | it mostly lacks an interface that automatically takes one step
         | to the next
        
         | xnx wrote:
          | Google MusicLM (and probably lots of other tools) does this:
         | "MusicLM .. can transform whistled and hummed melodies
         | according to the style described in a text caption."
         | 
         | https://google-research.github.io/seanet/musiclm/examples/
        
           | samspenc wrote:
           | Sounds like Suno AI will soon have this feature as well
           | https://x.com/suno_ai_/status/1794408506407428215
        
         | ortusdux wrote:
         | App name - Beat-it
         | 
         | https://www.youtube.com/watch?v=eZeYw1bm53Y
         | 
         | https://www.nme.com/blogs/nme-blogs/the-incredible-way-micha...
        
           | eman2d wrote:
            | A killer ad would be converting each vocal track back into
            | the original song.
        
         | beoberha wrote:
         | Facebook's MusicGen can pretty much do this
        
       | bufferoverflow wrote:
        | It produces decent audio, but there's something unpleasant about
        | its high frequencies. And no voices; it doesn't seem to talk or
        | sing.
       | 
       | Udio, so far, is undefeated.
       | 
        | And ElevenLabs' music demos were very, very impressive, but it's
        | still not released.
        
         | genericacct wrote:
         | Have you tried suno? It is quite good at least for some genres
        
           | bufferoverflow wrote:
           | Suno is good at generating music, but its voices sound
           | metallic with a dash of high-frequency noise. Which ruins it
           | for me. It's almost there though, I think they will fix it in
           | the next version.
        
         | hierophantical wrote:
         | None of these are impressive in the least. Anything I have
         | heard from Udio is basically trash. It is the AI art equivalent
         | of synthetic cats and pretty face shots. Who cares.
         | 
         | What is ultimately going to be undefeated is training your own
         | model.
        
           | bufferoverflow wrote:
           | I've heard many good things from Udio and the demo tracks
           | from ElevenLabs are very high quality.
           | 
           | https://www.udio.com/songs/ai2uAaBffRGdWdTNNqAbDx
           | 
           | https://www.udio.com/songs/19xQAMG6E1UXG7wNvP7nDW
           | 
           | https://www.udio.com/songs/mPAFYyFgo7Nqjb8ypeFfh9
           | 
           | https://www.udio.com/songs/7sKM9jMwZrXwTTzmMYN9qv
           | 
           | https://www.udio.com/songs/coixNX1gnJ1oWT8z2LQddk
        
       | Uehreka wrote:
       | > The new model was trained on audio data from FreeSound and the
       | Free Music Archive. This allowed us to create an open audio model
       | while respecting creator rights.
       | 
       | This feels like the "Ethereum merge moment" for AI art. Now that
       | there exists a prominent example with the big ethical obstacle
       | (Proof of Work in the case of Ethereum, nonconsensual data-
       | gathering in the case of generative AI) removed, we can actually
       | have interesting conversations about the ethics of these things.
       | 
       | In the past I've pushed back on people who made the argument that
       | "generative AI intrinsically requires theft from artists", but
       | the terrible quality of models trained on public domain data made
       | it difficult to make that argument in earnest, even if I knew I
       | was right in the abstract.
        
         | rfoo wrote:
          | Why is Proof of Work less ethical than Proof of Rich, a.k.a.
          | the rich gradually getting richer without doing anything?
          | 
          | Not saying PoW is safer (it's not), but "less ethical" is a
          | pretty bold claim.
        
           | rpicard wrote:
           | Environmental impact of the proof of work algorithms is my
           | understanding.
        
             | toenail wrote:
             | Proof of work mining is probably the only industry on the
             | planet that has the potential to be carbon negative AND
             | profitable.
        
               | foota wrote:
               | Renewable energy?
        
               | toenail wrote:
               | Waste energy like unused methane, flare gas and the like.
        
               | Uehreka wrote:
               | (sigh)
               | 
               | Is this the methane flaring argument, or Peter Thiel's
               | "windmills in Vermont"?
        
               | toenail wrote:
               | (sigh)
               | 
               | Do you expect a reply when you start like this?
        
               | Uehreka wrote:
               | I don't really care either way. I'm tired of having to
               | debunk the same sloppy arguments year after year.
        
           | Uehreka wrote:
            | The environmental impact. And yes, I know, 0.5%, but my issue
            | was always that if PoW currencies went from being a niche
            | subculture to being used for everyday exchange (many people
            | were arguing that this would and should happen), that 0.5%
            | would surely go up by a great deal. To a point where crypto
            | would have to clear a super high bar of usefulness to
            | counterbalance the harm it would do.
           | 
           | To be fair, AI training also has a big carbon footprint, but
           | I feel like the utility provided by AI makes it easier to
           | argue that its usefulness counterbalances its ecological
           | harm.
        
             | avarun wrote:
             | There is no "environmental impact". Environmental impact
             | comes from energy production, not energy usage. It's
                | incoherent to argue others should tamp down their energy
             | usage because most folks producing energy aren't doing it
             | in an ethical way.
        
               | ben_w wrote:
               | Ultimately any proof-of-work system has to burn joules
               | rather than clock cycles (because any race on cycles-per-
               | joule is rapidly caught up), and that makes it clearer
                | where the waste is: to be economically stable in the
                | face of adversarial actions by other nation states (who
                | sometimes have a vested interest in undermining your
                | currency, and so actively seek the chaos and loss of
                | trust of a double-spend event), your currency has to be
                | backed by more electricity than any hostile power can
                | spend on breaking it.
        
               | skybrian wrote:
               | It seems you've come up with a proof that there's no such
               | thing as wasting electricity. When you prove an
               | extraordinary claim like that, it's time to go back and
               | figure out how you got it wrong.
        
               | lolinder wrote:
                | > It's incoherent to argue others should tamp down
               | their energy usage because most folks producing energy
               | aren't doing it in an ethical way.
               | 
               | There's a general consensus that paying someone else to
               | do your dirty work doesn't free you of the moral (or,
               | usually, legal) culpability for the damage done. If you
               | knowingly direct your money towards unethical providers,
               | you are directly increasing the demand for unethical
               | behavior.
               | 
               | (That's assuming that the producers themselves are
               | responsible for the ethics. If a producer is doing its
                | best to convert to clean energy as fast as possible, they
                | may be entirely in the clear but PoW would _still_ be
                | unethical. In that scenario PoW is placing strain on the
                | limited clean energy supplies, forcing the producer to
                | use more fossil fuels than they'd otherwise need to.
        
               | jncfhnb wrote:
                | Officer, I merely stabbed the man. What he died from was
               | blood loss.
        
             | Sephr wrote:
             | This is greenwashing. You're still positively valuing the
             | past harms from proof of work.
        
           | pa7x1 wrote:
           | How do the rich become gradually richer under PoS? I'm
           | flabbergasted by the level of math education.
           | 
           | Assume we have 2 validators in the network; the first one
            | owns 90% of the network, the second one owns 10%. Let's call
           | them Whale and Shrimpy, respectively.
           | 
            | To make the numbers round, let's assume the total circulating
           | supply of ETH is 100 initially and that the yield resulting
           | from being a validator is 10% per year. After the first year,
           | 10 new ETH will have been minted. Whale would have gotten 9
           | ETH, and Shrimpy would have gotten 1 ETH. OP is assuming that
           | as 9 is bigger than 1, Whale is getting richer faster than
           | Shrimpy. But, let's look at the final situation globally.
           | 
           | At year 0:
           | 
           | Total ETH circulating supply: 100 ETH
           | 
           | Whale has 90 ETH. Owns 90% of the network.
           | 
           | Shrimpy has 10 ETH. Owns 10% of the network.
           | 
           | At year 1:
           | 
           | Total ETH circulating supply: 110 ETH
           | 
           | Whale has 99 ETH. Owns 90% of the network.
           | 
           | Shrimpy has 11 ETH. Owns 10% of the network.
           | 
            | Whale has exactly the same network ownership after validating
            | for 1 whole year; the network is not centralizing at all! The
            | rich are not getting richer any faster than the poor.
           | 
           | TL;DR: Friends don't let friends skip elementary math
           | classes.
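            | 
            | A few lines of Python make the same point (toy numbers as
            | above; the shares stay fixed under proportional yield):
            | 
            |     whale, shrimpy, r = 90.0, 10.0, 0.10
            |     for year in range(5):
            |         whale *= 1 + r
            |         shrimpy *= 1 + r
            |         print(whale / (whale + shrimpy))  # 0.9 every year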
        
             | rfoo wrote:
             | Sure, friends also won't let friends skip the fact that
             | circulating supply of ETH is now decreasing instead of
             | increasing.
             | 
             | Also, only ~30% tokens are staked. The 30% who chose to
              | stake essentially tax the other 70% in use. Each of the
              | validators does the same amount of work (ok, strictly
              | speaking you get to do more when you have more ETH staked,
              | but being a validator is cheap and does not cost
              | significantly more energy even if you are selected more
              | frequently, because running one proposal is so cheap;
              | that's the whole environmental point, right?) except what
              | they receive is proportioned to how much they stake.
             | 
             | I hate being mean, but sorry, remembering to check one's
             | assumption is a habit I gained after elementary school, so
             | maybe that's too hard for you.
        
               | pa7x1 wrote:
               | > Sure, friends also won't let friends skip the fact that
               | circulating supply of ETH is now decreasing instead of
               | increasing.
               | 
                | This changes absolutely nothing about the calculation.
                | Furthermore, the change in circulating supply last year
                | was 0.07%.
               | 
               | > Also, only ~30% tokens are staked.
               | 
               | Correct.
               | 
               | > The 30% who chose to stake essentially tax the other
               | 70% in use.
               | 
               | There is something called opportunity cost. With the
               | existence of liquid staking derivatives the choice to
               | stake or not is one of opportunity cost. Plenty of people
               | may consider the return observed by staking insufficient
               | given the opportunity cost and additional risks.
               | Participating in staking is fully permissionless, stakers
               | are not taxing non-stakers. They are being remunerated
               | for their work.
               | 
               | > Each of the validator do exactly same amount of work
               | (that's the point, right) except what they receive is
               | proportioned to how much they stake.
               | 
                | Incorrect. A staker does an amount of work proportionate
                | to its stake. That's why it gets paid more. A staker gets
                | paid for fulfilling its duties as defined in the protocol
                | (attesting, proposing blocks, participating in sync
                | committees). For each of those things there are some
                | rewards and some punishments in case you fail to fulfill
                | them. If a staker has more validators running, it simply
                | fulfills more of those duties more often; hence the
                | reward scales linearly with the number of validators.
        
               | rfoo wrote:
               | > Participating in staking is fully permissionless,
               | stakers are not taxing non-stakers. They are being
               | remunerated for their work.
               | 
                | That's just a more polite way to say tax. Being
                | permissionless is cool, but it's still a tax in my
                | dictionary.
               | 
               | > There is something called opportunity cost.
               | 
               | And, who is going to be able to have a larger percentage
               | of their funds staked, a poor or a whale? You need a
               | (mostly) fixed amount of liquidity to use the thing.
               | 
               | > Incorrect. A staker does proportionate amount of work
               | to its stake.
               | 
               | Apologies, I edited my original reply which should answer
               | this.
               | 
                | In short, I don't see anything preventing me from running
                | 10000 validators with 32 ETH each at a cost very similar
                | to running just one. It's certainly not linear.
        
               | pa7x1 wrote:
               | > That's just a more polite way to say tax. Being
               | permissionless is cool, but it's still tax in my dict.
               | 
                | It most certainly is not. They are doing work for the
                | network and getting remunerated for it. That's not a tax.
                | That's what is commonly referred to as a job. A kid that
                | delivers newspapers over the weekend is not taxing the
                | kid that decides not to. Both make a free decision on
                | what to do with their time and effort given how much it's
                | worth to them. Running a validator takes skill and time,
                | has an opportunity cost, and you assume certain risks of
                | capital loss. You are getting remunerated for it.
               | 
               | > And, who is going to be able to have a larger
               | percentage of their funds staked, a poor or a whale? You
               | need a (mostly) fixed amount of liquidity to use the
               | thing.
               | 
               | Indeed, the protocol cannot solve wealth inequality.
                | That's an out-of-protocol issue. It cannot cure cancer
               | either.
               | 
                | > In short, I don't see anything preventing me from
                | > running 10000 validators with 32 ETH each at a cost
                | > very similar to running just one. It's certainly not
                | > linear.
               | 
               | There are some fixed costs, indeed. But they are rather
               | negligible. You need a consumer-grade PC (1000 USD) and
               | consumer-grade broadband to solo stake. Or you can use a
               | Liquid Staking Derivative which will have no fixed costs
               | but will have a 10% cut. The curve of APY as a function
                | of stake is very flat. Almost anything else around us has
                | greater barriers to entry or economies of scale.
        
               | everfree wrote:
               | > And, who is going to be able to have a larger
               | percentage of their funds staked, a poor or a whale?
               | 
               | This is a truth that's fundamental to all types of
               | investing. Advantaged people can set aside millions and
               | not touch it for a year or five or twenty. Disadvantaged
               | people can't invest $20 because there's a good chance
               | they'll need it to buy dinner.
               | 
               | Stocks, bonds, CDs, real estate, it all works like this.
               | You've touched on a fundamental property of wealth.
        
               | hanniabu wrote:
               | > Also, only ~30% tokens are staked. The 30% who chose to
               | stake essentially tax the other 70% in use.
               | 
                | And in PoW, miners tax 100% of holders.
               | 
               | > what they receive is proportioned to how much they
               | stake
               | 
                | Wealthy miners with state-of-the-art ASICs benefit more
               | than some kid mining at home with an old GPU.
               | Maintenance/cost of mining equipment benefits from
               | economies of scale too.
               | 
               | I hate being mean, but sorry, remembering to check one's
               | assumption is a habit I gained after elementary school,
               | so maybe that's too hard for you.
        
         | Workaccount2 wrote:
         | The idea that AI trained on artist created content is theft is
         | kind of ridiculous anyway. Transformers aren't large archives
         | of data with needles and thread to sew together pieces. The
         | whole argument is meant to stifle an existential threat, not to
            | halt some illegal transgression. If they cared about the
            | latter, a simple copyright filter on the output of the models
            | would be all that's needed.
        
           | JohnKemeny wrote:
           | I think you should read the case material for NY Times v
           | OpenAI and Microsoft.
           | 
              | It literally says that within ChatGPT are stored, verbatim,
              | large archives of NY Times articles and that they were able
              | to retrieve them through their API.
        
             | Workaccount2 wrote:
             | ..which makes no sense. It is either an argument of
             | ignorance or of purposeful deceit. There is no coherent
             | data corpus (compressed or not) in ChatGPT. What is stored
             | are weights that create a string of tokens that can
              | recreate excerpts of the data it was trained on, with some
             | imperfect level of accuracy.
             | 
             | Which I agree is problematic, and OpenAI doesn't have the
             | right to disseminate that.
             | 
              | _But that doesn't mean OpenAI doesn't have the right to
              | train on it_.
             | 
              | Content creators are doing a purposeful sleight of hand to
              | conflate "outputting copyrighted data" with "training on
              | copyrighted data".
             | 
             | It's illegal for me to read an NYT article and recite it
             | from memory onto my blog.
             | 
             | It's not illegal for me to read an NYT article and write my
             | own summary of the article's contents on my blog. This has
             | been true forever and has forever been a staple in new
             | content creation.
        
               | Philip-J-Fry wrote:
               | When you describe ChatGPT as just a model with weights
               | that can create a string of tokens, is it any different
               | from any lossless compression algorithm?
               | 
               | I'm sure if I had a JPEG of some copyrighted raw image it
               | could still be argued that it is the same image. JPEG is
                | imperfect; the result you get is the same every time you
                | open it, but it's not the same as the original input
                | data.
               | 
               | ChatGPT would give you the same output every time, and it
               | does if you turn off the "temperature" setting. Introduce
               | a bit of randomness into a JPEG decoder and functionally
               | what's the difference? A slightly different string of
               | tokens for ChatGPT versus a slightly different collection
               | of pixels for a JPEG.
        
               | CyberDildonics wrote:
               | Did you mean lossy compression algorithm? That would make
               | sense.
        
               | bckr wrote:
               | > There is no coherent data corpus (compressed or not) in
               | ChatGPT.
               | 
               | I disagree.
               | 
               | If you can get the model to output an article verbatim,
               | then that article is stored in that model.
               | 
                | That it's not stored in the same format is meaningless.
                | It's the same content regardless of whether it's stored
                | as plaintext, compressed text, PDF, PNG, or weights in a
                | model.
               | 
               | Just because you need an algorithm such as a specialized
               | prompt to retrieve this memorized data, is also
               | irrelevant. Text files need to be interpreted in order to
               | display them meaningfully, as well.
        
               | cthalupa wrote:
               | > If you can get the model to output an article verbatim,
               | then that article is stored in that model.
               | 
               | You can't get it to do that, though.[1]
               | 
               | The NYT vs OpenAI case, if anything, shows that even with
               | significant effort trying to get a model to regurgitate
               | specific work, it cannot do it. They found articles it
               | had overfit on due to snippets being reposted elsewhere
               | across the internet, and they could only get it to output
               | those snippets, and not in correct order. The NYT,
               | knowing the correct order, re-arranged them to fit the
               | ordering in the article.
               | 
               | Even doing this, they were only able to get a hundred or
               | so words out of the 15k+ word articles.
               | 
               | No one who knows anything about these models disagrees
               | that overfitting can cause this sort of behavior, but the
               | overwhelming majority of the data in these models is not
               | overfit and they take a lot of care to resolve the issue
               | - overfitting isn't desirable for general purpose model
               | performance even if you don't give a shit about copyright
               | laws at all.
               | 
               | People liken it to compression, like the GP mentioned,
               | and in some ways, it really is. But in the most real
               | sense, even with the incredibly efficient "compression"
               | the models do, there's simply no way for them to actually
               | store all this training data people seem to think is
               | hidden in there, if you just prompt it the right way. The
               | reality is only the tiniest fraction of overfit data can
               | be recovered this way. That doesn't mean that the overfit
               | parts can't be copyright infringing, but that's a very
               | separate argument than the general idea that these are
               | constantly putting out a deluge of copyrighted material.
               | 
               | (None of this goes for toy models with tiny datasets,
               | people intentionally training models to overfit on data,
               | etc. but instead the "big" models like GPT, Claude,
               | Llama, etc.)
               | 
                | 1. https://fingfx.thomsonreuters.com/gfx/legaldocs/byvrkxbmgpe/...
        
               | bckr wrote:
               | > The NYT, knowing the correct order, re-arranged them to
               | fit the ordering in the article.
               | 
               | > Even doing this, they were only able to get a hundred
               | or so words out of the 15k+ word articles.
               | 
               | OK, that's less material than I believed, which shows the
               | details matter. But we agree that the overfit material,
               | while limited, is stored in the model.
               | 
               | Of course, this can be (and surely is) mitigated by
               | filtering the output, as long as the product is the
               | output and not the model itself.
        
               | semi wrote:
               | >Just because you need an algorithm such as a specialized
               | prompt to retrieve this memorized data, is also
               | irrelevant.
               | 
               | I disagree. Granted I'm a layman and not a lawyer so I
               | have no clue how the court feels. But I can certainly
               | make very specialized algorithms to produce whatever
               | output I want from whatever input I want, and that
               | shouldn't let me declare any input as infringing on any
               | rights.
               | 
                | For the reductio ad absurdum example: I demand everyone
                | stop using spaces, because under the algorithm 'remove a
                | space and add my copyrighted text', any input produces an
                | identical copy of my copyrighted text.
               | 
                | For the less absurd example: if I took any clean model
                | without your copyrighted text, and brute-forced prompts
                | and settings until I produced your text, is your model
                | violating the copyright or are my inputs?
        
               | SahAssar wrote:
                | > Content creators are doing a purposeful sleight of hand
                | > to conflate "outputting copyrighted data" with
                | > "training on copyrighted data".
               | 
               | I don't think so, I think it's usually argued as two
               | different things.
               | 
               | The "training on copyrighted data" argument is usually
               | that we never licensed this work for this sort of use and
               | it is different enough from previously licensed uses that
               | it should be treated differently.
               | 
               | The "outputting copyrighted data" argument is somewhat
               | like your output is so similar as to constitute a (at
               | least) partial copy.
               | 
                | Another argument is that licensed data is whitewashed by
                | being run through a model. So you could have GPL-licensed
                | open source code run through a model and then output
                | exactly the same, but because it has been outputted by
                | the model it is considered "cleaned" of the GPL
                | restrictions. Clearly this output should still be GPLed.
               | 
               | > It's not illegal for me to read an NYT article and
               | write my own summary of the article's contents on my
               | blog. This has been true forever and has forever been a
               | staple in new content creation.
               | 
                | What if I compress the NYT article with gzip? What if I
                | build an LLM that always replies with the full article
                | with 99% accuracy? Where is the line?
                | 
                | This is not a technical issue; we need to decide on this
                | just like we did with copyright, trademarks, etc.
                | Regardless of what you think, this is not a non-issue,
                | and we can't use the same rules as we did up until now
                | unless we treat all ML systems as either duplication
                | machines or as humans, and neither seems to solve the
                | issues.
        
               | freedomben wrote:
               | > _Another argument is that licensed data is whitewashed
               | by being run through a model. So you could have GPL
               | licensed code that is open source run through a model and
               | then output exactly the same but because it has been
               | outputted by the model it is considered "cleaned" from
               | the GPL restrictions. Clearly this output should still be
               | GPL:ed._
               | 
               | I don't think anybody is making that argument. The NY
               | Times claims to have gotten ChatGPT to spit out NY Times
               | articles verbatim but there is considerable doubt about
                | that. Regardless, everyone agrees, even OpenAI, that a
                | verbatim (or nearly verbatim) copy is a copyright
                | violation. Every serious model has taken steps to prevent
                | that sort of thing.
        
               | neuralRiot wrote:
               | > It's not illegal for me to read an NYT article and
               | write my own summary of the article's contents on my
               | blog. This has been true forever and has forever been a
               | staple in new content creation.
               | 
                | It's not that clear-cut. It falls under the "fair use
                | doctrine". Section 107 of the US copyright law states
                | that the resolution depends on:
               | 
               | > (1) the purpose and character of the use, including
               | whether such use is of a commercial nature or is for
               | nonprofit educational purposes; (2) the nature of the
               | copyrighted work; (3) the amount and substantiality of
               | the portion used in relation to the copyrighted work as a
               | whole; and (4) the effect of the use upon the potential
               | market for or value of the copyrighted work.
               | 
                | Another thing we need to consider is that the law was
                | drafted with the human mind's limitations as an
                | unconscious factor (i.e., not many people would be able
                | to recite War and Peace verbatim from memory). This just
                | brings up the fact that copyright law needs a complete
                | re-think.
        
           | mey wrote:
            | Our copyright model isn't sufficient yet. Is putting a work
            | through model training sufficient to clear the
            | transformative use bar? That doesn't make you safe from
            | trademarks. If the model can produce outputs on the other
           | side that aren't sufficiently transformative then that single
           | instance is a copyright violation.
           | 
            | Honestly, instead of trying to clean up the output, it's much
            | safer to create a licensed input corpus. People haven't
            | because it's expensive and time-consuming. Every time I
            | engage with an AI vendor, my first question is whether they
            | indemnify against copyright violations in the output. I was
            | shocked that Google Gemini/Bard only added that this year.
        
             | ffsm8 wrote:
             | I'm honestly surprised AI-washing hasn't become way more
              | widespread than it is at this point.
             | 
              | I mean, recording a good song is hard; generating a good
              | song, almost impossible. But my gut feeling would've been
             | that recreating a popular song for plausible deniability
             | would be a lot easier.
             | 
              | Same with republishing bestselling books and related media.
              | (I.e., take Lord of the Rings and feed it paragraph by
              | paragraph into an LLM that you've prompted to rephrase each
              | one in the style of a currently bestselling author.)
        
             | jncfhnb wrote:
             | Nothing will ever protect you from trademark violations
             | because trademarks can be violated completely by accident
             | without knowledge of the real work. Copying is not the
             | issue.
        
           | QuantumGood wrote:
           | NY Times v OpenAI and Microsoft says the opposite, that
           | verbatim, large archives of NY Times articles were retrieved
           | via API. This may or may not matter to how LLMs work, but
           | "large archive" seems accurate, other than semantic arguments
           | (e.g. "Compressed archive" may be semantically more
           | accurate).
        
             | cthalupa wrote:
             | > NY Times v OpenAI and Microsoft says the opposite, that
             | verbatim, large archives of NY Times articles were
             | retrieved via API.
             | 
             | This does not match my understanding of the information
             | available in the complaint. They might claim they were able
             | to do this, but the complaint itself provides some specific
             | examples that OpenAI and Microsoft discuss in a motion to
             | dismiss... and I think the motion does a very strong job of
             | dismantling that argument based on said examples.
             | 
             | https://fingfx.thomsonreuters.com/gfx/legaldocs/byvrkxbmgpe
             | /...
        
           | tomcam wrote:
           | Yet before "safeguards" were added a prompt could say "in the
           | style of Studio Ghibli" and you could get exactly that.
           | 
           | Would it be possible if Studio Ghibli images had not been
           | used in the training?
        
             | semi wrote:
              | If it was trained on a sufficient amount of fan art made in
              | the Studio Ghibli style and tagged as such, yes.
              | 
              | Otherwise those would just be unknown words, same as asking
              | an artist to do that without any examples.
              | 
              | Though I am curious how performance would differ between
              | training on only actual Studio Ghibli art, only fan art, or
              | a mix. Maybe the fan art could convey what we expect
              | 'Studio Ghibli style' to be even more, whereas actual art
              | from them could have other similarities that the tag
              | conveys.
        
             | Unai wrote:
             | I don't understand. If I make a painting (hell, or a whole
             | animated movie) in the style of Studio Ghibli, am I
             | infringing their copyright? I don't think so. A style is
             | just an idea, if you want to protect an idea to the point
             | of no one even getting inspired by it just don't let it out
             | of your brain.
             | 
             | If the produced work is not a copy, why does it matter if
             | it was generated by a biological brain or by a mechanical
             | one?
        
           | jrm4 wrote:
           | I fail to see how the argument is ridiculous; and I'll bet
           | that a jury would find the idea that "there is a copy inside"
           | at least reasonable, especially if you start with the premise
           | that "the machine is not a human being."
           | 
           | What you're left with is a machine that produces "things that
           | strongly resemble the original, that would not have been
           | produced, had you not fed the original into the machine."
           | 
           | The fact that there's no "exact copy inside" the machine
           | seems a lot like splitting hairs; like saying "Well, there's
           | no paper inside the hard drive so the essence of what is
           | copyable in a book can't be in it"
        
             | GaggiX wrote:
                | Having exact copies of the samples inside the model
                | weights would be an extremely inefficient use of space,
                | and it would also fail to generalize. Unless it generated
                | a copy so close to the original that using it would
                | violate copyright law, I wouldn't find it very reasonable
                | to think that there is a memorized copy inside the model
                | weights somewhere.
        
               | ziofill wrote:
                | A program that can produce copies is the same as a copy.
                | How that copy comes into being (whether out of an
                | algorithm or read from a storage medium) is related, but
                | not relevant.
        
               | LordDragonfang wrote:
               | >A program that can produce copies is the same as a copy.
               | 
               | A program that _always_ produces copies is the same as a
               | copy. A program that merely _can_ produce copies
               | categorically is not.
               | 
               | The Library of Babel[1] can produce copyrighted works,
               | and for that matter so can any random number generator,
               | but in almost every normal circumstance will not. The
                | same is true for LLMs and diffusion models. While there
                | are some circumstances in which you can produce copies of
                | a work, in natural use that's only for things that will
                | come up thousands of times in the training set -- by and
                | large, famous works in the public domain, or cultural
                | touchstones so iconic that they're essentially
                | genericized (one main copyrighted example is the
                | officially released promo materials for movies).
               | 
               | [1] https://libraryofbabel.info/
        
               | GaggiX wrote:
                | Yeah, that's right. I doubt that a model would generate
                | an image or text so close to a real one as to violate
                | copyright law just by pure chance; the image/text space
                | is incredibly large.
        
               | Arainach wrote:
               | An MP3 file is a lossy copy, but is still copyright
               | infringement.
               | 
               | Copyright infringement doesn't require exact copies.
        
               | GaggiX wrote:
               | I didn't say it takes an exact copy for copyright
               | infringement.
        
             | Workaccount2 wrote:
              | If I made a bot that read Amazon reviews and then output a
              | meta-review for me, would that be a violation of Amazon's
              | copyright? (I'm sure somewhere in the Amazon ToS they claim
              | all ownership rights to reviews.)
              | 
              | If it output those reviews verbatim, sure, I can see the
              | issue; the model is overfitting. But if I tweak the model
              | or filter the output to avoid verbatim excerpts, does an
              | Amazon lawyer have a solid footing for a "violation of
              | copyright" lawsuit?
        
               | jononor wrote:
               | As far as I understand, according to current copyright
               | practices: If you sing a song that someone else has
               | written, or pieces thereof, you are in violation. This is
               | also the case of you switch out the instrumentation
               | completely, say play trumpet instead of guitar, or a male
               | choir sings a female line. If on would make a medley many
               | such parts, it is not automatically not violation anymore
               | either. So we do have examples of things being very far
               | from verbatim copy, being considered violations.
        
             | lisperforlife wrote:
              | I am curious about models like EnCodec or SoundStream. They
              | are essentially codecs informed by the music they are meant
              | to compress, in order to achieve insane compression ratios.
              | The decompression process is indeed generative, since part
              | of the information to be decoded lives in the decoder
              | weights. Does that pass the smell test from a copyright
              | perspective? I believe such a decoder model is powering
              | GPT-4o's audio decoding.
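              | 
              | Roughly what that looks like with Meta's encodec package
              | (API as in its README; worth double-checking against the
              | current release, and clip.wav is a stand-in path):
              | 
              |     import torch
              |     import torchaudio
              |     from encodec import EncodecModel
              |     from encodec.utils import convert_audio
              | 
              |     # 24 kHz model, 6 kbps target bandwidth
              |     model = EncodecModel.encodec_model_24khz()
              |     model.set_target_bandwidth(6.0)
              | 
              |     wav, sr = torchaudio.load("clip.wav")
              |     wav = convert_audio(wav, sr, model.sample_rate,
              |                         model.channels).unsqueeze(0)
              | 
              |     with torch.no_grad():
              |         frames = model.encode(wav)   # tiny discrete codes
              |         recon = model.decode(frames)
              |     # Much of the reconstructed detail comes from the
              |     # decoder weights, not from the transmitted codes.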
        
           | kimixa wrote:
            | I think the distinction between "Lossy Compression" and
            | "Trained AI" is... vague under the current legal definitions.
            | Or even "lossless" in some cases - as shown by people being
            | able to get written articles output verbatim.
           | 
           | While the extremes are obvious, there's a big stretch of gray
           | in the middle. A similar issue occurs in non-AI art, the
           | difference between inspiration and tracing/copying isn't well
           | defined either, but the current method of dealing with that
           | (being on a case-by-case basis and a human judging the
           | difference) clearly cannot scale to the level that many
           | people intend to use these tools.
        
             | cthalupa wrote:
             | Has anyone been able to actually get a verbatim copy of a
             | written article? The NYT got a ~100 word fragment made up
             | of multiple snippets of a ~15k word article, with the
             | different snippets not even being in order. (The Times had
             | to re-arrange the snippets to match the article after the
             | fact)
             | 
             | I am simply not aware of anyone successfully doing this.
        
               | kimixa wrote:
               | The amount of content required to call it a "Copy" is
               | also a gray area.
               | 
                | Same with the idea of "prompting" and the amount required
                | to generate that copyrighted output - again there are the
                | extremes of "The prompt includes copyrighted information"
                | to "Vague description".
                | 
                | Arguably some of the same issues exist outside AI; it's
                | just that the accessibility, scale, and lack of a "Legal
                | Individual" on one side complicate things. For example,
                | if I describe Mickey Mouse sufficiently accurately to an
                | artist and they reproduce it to the degree it's
                | considered copyright infringement, is it me or the artist
                | that did the infringement? Then what if the artist /had/
                | seen the previously copyrighted artwork, but still
                | produced the same output from that same detailed prompt?
        
           | immibis wrote:
           | What's good for the goose is good for the gander. It may or
           | may not be like theft, but either way, if one of us trained
           | an AI on Hollywood movies, you best believe we'd get sued for
           | eleventy billion dollars and lose. It's only fair that we
           | hold corporations to the same standard.
        
         | hecanjog wrote:
         | I also highly doubt anyone who signed agreements to have their
         | music included in the Free Music Archive would have been OK
         | with this. The particular type of license was important to
         | contributors and there's a difference between allowing for
         | rebroadcast without paying royalties and allowing for
         | derivative works... I don't really care to argue the point, but
         | it's why there were so many different types of licenses for the
         | original FMA. This just glosses over all that.
        
           | blargey wrote:
            | If you look at the repo where the model is actually hosted,
            | they specify:
           | 
           | > All audio files are licensed under CC0, CC BY, or CC
           | Sampling+.
           | 
           | These explicitly permit derivative works and commercial use.
           | 
           | > Attribution for all audio recordings used to train Stable
           | Audio Open 1.0 can be found in this repository.
           | 
           | So it's not being glossed over, and licenses are being abided
           | by in good faith imo.
           | 
           | I wish they'd just added a sentence to their press release
           | specifying this, though, since I agree it looks suspect if
           | all you have to go by is that one line.
           | 
           | (Link: https://huggingface.co/stabilityai/stable-audio-
           | open-1.0#dat... )
        
         | TaylorAlexander wrote:
          | I'm so happy to see this! I've been saying for a while that if
          | they focused on sample efficiency and on building large public
          | datasets, including encouraging Twitter and other social media
          | sites to add image license options and encouraging people to
          | add alt text (which would also help the vision-impaired!), they
          | really could build the models they want while respecting
          | creatives, thus avoiding pissing a bunch of people off. It's
          | nice to see Stability step up and actually train on open data!
        
         | ancientworldnow wrote:
         | This has been Adobe Firefly's value proposition for months now.
         | It works fine and is already being utilized in professional
         | workflows with the blessing of lawyers.
        
         | hapticmonkey wrote:
         | If you're worried about Proof of Work leading to giant server
         | farms using huge amounts of energy, then I've got something to
         | tell you about AI...
        
         | Sephr wrote:
         | The "Etherium merge moment" is entirely different and it irks
         | me to see it compared favorably with this project. It didn't
         | solve proof of work at all, as it assigned positive value to
         | past environmental harms.
         | 
         | The only 'solution' (more a mitigation) to Etherium proof of
         | work's environmental harms is to devalue it.
         | 
         | Unlike your example, this project actually seems to be a net
         | positive for society that wasn't built on top of clear and
         | obvious harms.
        
       | nickthegreek wrote:
       | I keep hearing about the pending death of Stability, but here we
       | are with another release. I am rootin for them.
        
       | treesciencebot wrote:
        | This looks like the one that got leaked a couple of weeks ago,
        | so I guess they decided it's better to open-source it at this
        | point, after the leak [0].
       | 
       | [0]: https://x.com/cto_junior/status/1794632281593893326
        
         | tmabraham wrote:
          | It was already planned for open-sourcing; the leak did not
          | affect the plans in any way.
        
         | washadjeffmad wrote:
         | It is. The model.ckpt from petra-hi-small matches the official
         | HF repo.
         | 
         | SHA256: 6049ae92ec8362804cb4cb8a2845be93071439da2daff9997c285f8
         | 119d7ea40
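          | 
          | Easy to check a local copy against that digest:
          | 
          |     import hashlib
          | 
          |     h = hashlib.sha256()
          |     with open("model.ckpt", "rb") as f:
          |         # stream in 1 MiB chunks; the file is multi-GB
          |         for chunk in iter(lambda: f.read(1 << 20), b""):
          |             h.update(chunk)
          |     print(h.hexdigest())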
        
       | mg wrote:
       | When they released Stable Audio 2.0, I tried to create "unusual"
       | songs with prompts like "roaring dragon tumbling rocks stormy
       | morning". The results are quite interesting:
       | 
       | https://www.youtube.com/@MarekGibney/videos
       | 
        | I find it fascinating that you can put all the information
        | needed to recreate a whole complex song into a string like
        | 
        |     rough stormy morning car rocks hammering
        |     drum solo roaring dragon downtempo
        |     audiosparx-v2-0 seed 5
       | 
       | This means a whole album of these songs could easily fit into a
       | single TCP/IP packet.
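        | 
        | A back-of-envelope check (1460 bytes is the usual TCP payload
        | under a 1500-byte Ethernet MTU):
        | 
        |     album = ["rough stormy morning car rocks hammering "
        |              "drum solo roaring dragon downtempo "
        |              "audiosparx-v2-0 seed %d" % i for i in range(10)]
        |     payload = "\n".join(album).encode("utf-8")
        |     print(len(payload))  # ~1000 bytes for ten songs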
       | 
       | If a music genre evolves in which each song is completely defined
       | by its title, maybe it will be called "promptmusic".
       | 
       | I will try the new model with the same prompts and upload the
       | results.
        
         | TeMPOraL wrote:
         | That's a great example of the fact that information about
         | something, say a song, isn't entirely encoded only in the
         | medium you use to transfer it - it's partially there, and
         | partially in the device you're using to read it! An MP3 file is
         | just gibberish without a program that can decode it.
         | 
          | In this case, the whole album could indeed fit into a single
          | TCP/IP packet - because the bulk of the information that makes
          | up those songs is contained in the model, which weighs however
          | many gigabytes it does. The packet carrying your album is
          | meaningless until the recipient also procures the model.
         | 
         | (Tangent: this observation was my first _mind. blown._
         | experience when reading GEB over a decade ago.)
        
       | drivebyhooting wrote:
        | From the announcement I couldn't figure out if it can do
        | audio-to-audio.
       | 
       | Text to audio is too limiting. I'd rather input a melody or a
       | drum beat and have the AI compose around it.
        
         | duranduran wrote:
         | This kind of exists, but I doubt there are any commercial
         | solutions based on it yet.
         | https://crfm.stanford.edu/2023/06/16/anticipatory-music-tran...
         | 
         | Their paper says that they trained it on the Lakh MIDI dataset,
         | and they have a section on potential copyright issues as a
         | result.
         | 
          | Assuming you don't care about legal issues, theoretically you
         | could do: raw signal -> something like Spotify Basic Pitch
         | (outputs MIDI) -> Anticipatory (outputs composition) -> Logic
         | Pro/Ableton/etc + Native Instruments plugin suite for full song
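          | 
          | The first hop really is a single call with Basic Pitch (per
          | its README; riff.wav is a stand-in path, and the later hops
          | would need the anticipation library plus a DAW):
          | 
          |     from basic_pitch.inference import predict
          | 
          |     # audio in, pretty_midi object out
          |     _, midi_data, note_events = predict("riff.wav")
          |     midi_data.write("riff.mid")  # ready for the next stage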
        
       | ben_w wrote:
       | > Warm arpeggios on an analog synthesizer with a gradually rising
       | filter cutoff and a reverb tail
       | 
       | I appreciate that the underlying tech is completely different and
       | much more powerful, but it is a pretty strange feeling to find a
       | major AI lab's example sounding so similar to an actual Markov
       | chain MIDI generator I made 14-15 years ago:
       | https://youtu.be/depj8C21YHg?si=74a4DHP14EFCeYrB
       | 
       | (Not _that_ similar, just enough for me to go  "huh, what a
       | coincidence").
        
       | lancesells wrote:
       | "a drummer could fine-tune on samples of their own drum
       | recordings to generate new beats"
       | 
       | Yes, this is the reason someone becomes a drummer.
        
       ___________________________________________________________________
       (page generated 2024-06-05 23:01 UTC)