Post AddsrahaAT4Amb7bea by BeAware@social.beaware.live
 (DIR) Post #AddrhftntGNm3FXu8e by Wolven@ourislandgeorgia.net
       2024-01-08T16:49:46Z
       
       0 likes, 2 repeats
       
       So, just to get this out of the way: The idea that you can't train #LargeLanguageModels or other "#generativeAI" systems without copyrighted materials is a) bullshit, and b) a very subtle deflection away from the actual point of these conversations. Consent, compensation, and control of one's works have always been the point. You ask permission, you pay people fairly, and you let them opt out whenever they want. Is this way of building "#AI" substantially more difficult and expensive than the current way? Yep. Would it have been way cheaper and easier to do at the outset AND saved you literally years of ongoing and upcoming civil and criminal litigation? You bet your ass it would have. So. I dunno, y'all. Seems like a skill issue. https://www.theguardian.com/technology/2024/jan/08/ai-tools-chatgpt-copyrighted-material-openai
       
 (DIR) Post #AddsAdCezb1W5XAoEq by kellogh@hachyderm.io
       2024-01-08T16:54:54Z
       
       0 likes, 0 repeats
       
       @Wolven i think you're 100% correct. On the other hand, when you look at early stage tech innovation, it's usually more helpful to look at momentum than cost/revenue/profit lines. so while your conclusion is likely quite true, i think it's more true simply because a lot of momentum wouldn't have been created, so the user base would've been smaller, meaning fewer lawsuits, but also slower innovation and less ideas flowing
       
 (DIR) Post #AddsrahaAT4Amb7bea by BeAware@social.beaware.live
       2024-01-08T17:02:12Z
       
       0 likes, 0 repeats
       
       @Wolven or....just hear me out....OR, the ones who don't want their work stolen, don't upload it to the internet where anyone and their mom can Right Click > Save as....just a thought.
       
 (DIR) Post #AddssTE3Rx7vswjIlU by castironflower@hachyderm.io
       2024-01-08T17:01:23Z
       
       0 likes, 0 repeats
       
       @Wolven i agree in general and am anti llm/etc but i dont think they could have made the same product without stealing or paying for (from large publishers and media companies) copyrighted material. you can obviously train it on an out of copyright, creative commons, and other large free sources but it would, i believe, also limit the quality and scope of what it could do 1/2
       
 (DIR) Post #AddtC34SP0msYjZmLo by arghdos@heads.social
       2024-01-08T17:03:23Z
       
       0 likes, 0 repeats
       
       @Wolven I'm somewhat surprised that, given the news of their recent deals with rights-holders [1][2] that they're not backtracking and insisting on "regulation" (that they, or some friendly party will write) such that only large players who can handle this sort of stuff are allowed.  I.e., building their moat after they have directly benefited from the 'open-water' [3]. [1]: https://www.ap.org/press-releases/2023/ap-open-ai-agree-to-share-select-news-content-and-technology-in-new-collaboration [2]: https://openai.com/blog/axel-springer-partnership [3]: https://www.semianalysis.com/p/google-we-have-no-moat-and-neither
       
 (DIR) Post #AddtC6GyUvaCUQSJmK by Wolven@ourislandgeorgia.net
       2024-01-08T17:06:16Z
       
       0 likes, 0 repeats
       
       @kellogh Honestly? Don't see it. If you told a bunch of people that you had a tool trained on a ridiculous amount of open-source/public domain works that they could play with, and that they could a) be compensated to b) help train further? Face apps where you could make yourself look like old cartoons and 1920's movie stars, as we move into and through the 2020's? I think the adoption of those would have been IMMEDIATE.
       
 (DIR) Post #AddtPLN3ftGz52cGkC by kellogh@hachyderm.io
       2024-01-08T17:08:44Z
       
       0 likes, 0 repeats
       
       @Wolven i think there's a few more dynamics going on. the elephant in the room is that it cost $100M to train GPT-4. no open source project has ever gotten that level of funding, and any corporation has to answer to their investors
       
 (DIR) Post #AddtYUr7mNDz6U2mpM by castironflower@hachyderm.io
       2024-01-08T17:05:48Z
       
       0 likes, 0 repeats
       
       @Wolven as an example how could you do "make me a short story in the prose of <21st century poet> about pikachu" without having trained on both pokemon books and the past 15yrs of published poets? for me that means we shouldnt make it but it does mean a company is going to try
       
 (DIR) Post #AddtYWlseqDb2nbcvo by Wolven@ourislandgeorgia.net
       2024-01-08T17:10:34Z
       
       0 likes, 0 repeats
       
       @castironflower You could instead train them on conversations and writing you have commissioned, as well as a wealth of public domain things; like I said to @kellogh: If you told a bunch of people that you had a tool trained on a ridiculous amount of open-source/public domain works that they could play with, and that they could a) be compensated to b) help train further? Face apps where you could make yourself look like old cartoons and 1920's movie stars, as we move into and through the 2020's? I think the adoption of those would have been IMMEDIATE.
       
 (DIR) Post #Addtd0C1D8iNS4xud6 by Wolven@ourislandgeorgia.net
       2024-01-08T17:11:25Z
       
       0 likes, 0 repeats
       
       @BeAware that's some real shitty victim blaming you got there. Bye.
       
 (DIR) Post #AdduE2VZZ6bDLblwsC by mia@hcommons.social
       2024-01-08T17:18:03Z
       
       0 likes, 0 repeats
       
       @Wolven 'In its submission, OpenAI said it believed that “legally, copyright law does not forbid training”' is disingenuous at best
       
 (DIR) Post #AdduRnh5RPeBSKigkK by Wolven@ourislandgeorgia.net
       2024-01-08T17:20:37Z
       
       0 likes, 0 repeats
       
       @kellogh Using open source materials doesn't always mean you have to make the end result of that project open source; like, it's an expected norm, but in terms of things they're already contravening, this would just be rude, rather than a violation of someone's rights and dignity. That's all to say, they could use OS/PD stuff, and still keep their code in-house and make their money on it. Again: Harder? Sure. Definitely doable, if they cared about consent.
       
 (DIR) Post #AdduZBo0C3JYGPTg6y by kellogh@hachyderm.io
       2024-01-08T17:21:51Z
       
       0 likes, 0 repeats
       
       @Wolven i'm completely agreeing with you, and hearing when you say "harder" and reflecting it back as, "that's why it wasn't done"
       
 (DIR) Post #AddvedT0sNOdSLoy0G by enkiv2@eldritch.cafe
       2024-01-08T17:33:49Z
       
       0 likes, 0 repeats
       
       @Wolven Problems related to the necessary scale of training data are, I think, real & worth taking seriously because they represent a serious limitation of this kind of statistical method. People use it to justify questionable decisions they might have wanted to make anyhow, but some of these decisions (especially those involving IP) would have been avoided if the tech could reasonably manage it. For instance, everybody with a large enough corpus of stock photos they own is trying to create siloed GANs with clear licensing (the problem being there are only a couple players who can do it). Similarly, I've heard from insiders that 2 or 3 of the big 5 publishers are doing the same on the LLM side (unclear how seriously). GANs and LLMs are unpredictable in ways that those using them don't want, and that unpredictability stems in part from the difficulty of even *vetting* that much training data. It's theoretically possible (if not typically financially feasible) to license corpus data so you own the model outright & pay humans to vet the data in order to omit certain kinds of things. Moreso than the substantial resources this would require, it'd require a bit of foresight -- which might instead lead people to use the existing, mature technologies that do the things they want done. With only a few (very fringe) exceptions, commercial use of GANs and LLMs is not the product of rational long-term planning but of hype; the bubble will pop pretty soon.
       
 (DIR) Post #Addw6EeWjEFibXMozo by herrold@dice.camp
       2024-01-08T17:38:59Z
       
       0 likes, 0 repeats
       
       @Wolven hard to build those pyramids with slaves
       
 (DIR) Post #AddwNDQmWXvpjq2DGS by Wolven@ourislandgeorgia.net
       2024-01-08T17:41:59Z
       
       0 likes, 0 repeats
       
       @corbin No, I really don't assume that. But I do assume that they should have sought consent from the people whose work very clearly IS under a valid copyright claim.
       
 (DIR) Post #AddwOu0rgsae5riUwi by Wolven@ourislandgeorgia.net
       2024-01-08T17:42:23Z
       
       0 likes, 0 repeats
       
       @kellogh And I'm saying "that's why they're getting sued"
       
 (DIR) Post #AddwrOULybVKij5z5k by kellogh@hachyderm.io
       2024-01-08T17:47:32Z
       
       0 likes, 0 repeats
       
       @Wolven yep. imo it was a calculated decision (unless i'm overestimating them, which is entirely possible given how that fiasco went down)
       
 (DIR) Post #AddxKu7FSwNPD0MN96 by JakeQuokkaMCM@kolektiva.social
       2024-01-08T17:52:46Z
       
       0 likes, 0 repeats
       
       @Wolven if they had to play fair this AI stuff would be as stupid as the promoters.
       
 (DIR) Post #Addzy5MdmL27ISN1EG by Wolven@ourislandgeorgia.net
       2024-01-08T18:22:19Z
       
       0 likes, 0 repeats
       
       @corbin That is… not how training of musicians or artists works, at all. Educational fair use is a specific carve out, for one thing, and the vast majority of collections of art and music on which others are trained are licensed. Even books that students use in class are (understood to be) either individually bought, or licensed to the library for student access. And ultimately, the point here is that if I tell you I want my work used in some ways but not others, does that not matter? If I say that I don't want a massive corporation to make literal billions of dollars by scraping my work, aping my style, and then promising to put me out of work, does that not matter? Consent and dignity matter.
       
 (DIR) Post #Ade5QquB7ETWcaarx2 by Quilo@elquilosonriente.com
       2024-01-08T19:23:36Z
       
       0 likes, 0 repeats
       
       @Wolven "So. I dunno, y'all. Seems like a skill issue." 🔥 The good doctor with the shaaaade 😆 But completely agree, as if there weren't several ethical sources of knowledge they could have tapped.
       
 (DIR) Post #AdeC5EGWHUoINegDrc by squared99@mastodon.coffee
       2024-01-08T20:38:11Z
       
       0 likes, 0 repeats
       
       @Wolven "DRM for me but not for thee!"
       
 (DIR) Post #AdeDmATaLT0r2CpWUK by matthewmaybe@sigmoid.social
       2024-01-08T20:57:11Z
       
       0 likes, 1 repeats
       
       @Wolven there are two noteworthy experiments with training LLMs on copyright-free data: Bigcode/Huggingface Starcoder and Microsoft's Phi-1.5, both of which have yielded such surprisingly good results that it has changed how people are thinking about data quality vs. data quantity. as such I'm not sure there is even a technical basis for OpenAI's claim anymore.
       
 (DIR) Post #AdeDyi5NDBv4kPUAlM by Netux@mastodon.sdf.org
       2024-01-08T20:59:23Z
       
       0 likes, 0 repeats
       
       @Wolven maybe tech companies should work to get copyright back down to 14 years.  Would put the vast majority of stuff into public domain. Win for everyone but Disney and the record labels.
       
 (DIR) Post #AdeEsL8Tj4jrmoSqwa by samhainnight@mstdn.social
       2024-01-08T21:09:27Z
       
       0 likes, 1 repeats
       
       @Wolven Boy! There's a lot of people responding to this who are neither artists nor writers but feel the need to speak as though they understand being one. Want to use my art or writing to make money? Pay me. I created it and my time is valuable. And I reserve the right to not do business with anyone.
       
 (DIR) Post #AdeHURnlafIdNB4jaa by Wolven@ourislandgeorgia.net
       2024-01-08T21:38:41Z
       
       0 likes, 0 repeats
       
       @samhainnight Precisely this
       
 (DIR) Post #AdeHgdm8tjCbz48716 by Wolven@ourislandgeorgia.net
       2024-01-08T21:40:55Z
       
       0 likes, 0 repeats
       
       If you ever want to have some of the just silliest, most willfully ignorant arguments in your life, attack the status quo of LLM/"AI" development on LinkedIn 😂
       
 (DIR) Post #AdeHtcsLYTTwxQDvwu by Wolven@ourislandgeorgia.net
       2024-01-08T21:43:04Z
       
       0 likes, 0 repeats
       
       @maddiefuzz EXACTLY
       
 (DIR) Post #AdeHxu2nfGInMa9WXA by aud@fire.asta.lgbt
       2024-01-08T21:42:47.201Z
       
       0 likes, 0 repeats
       
       @Wolven@ourislandgeorgia.net I've already spent so much time and money on planning my bank heist though.  This isn't fair!  I'm a white guy!  I deserve everything I want! (also RIP your mentions.  Some real boot lickers in here.  "Your argument must inherently assume [false] that [bunch of bullshit].  If humans were to do this, it [argument that is trivial to prove is false and would require overwhelming proof and is designed to waste your time]".) It's like fucking LinkedIn all of a sudden.
       
 (DIR) Post #AdeHyIOF9BiOs2MzI0 by Wolven@ourislandgeorgia.net
       2024-01-08T21:43:50Z
       
       0 likes, 0 repeats
       
       @aud LITERALLY THIS 😂
       
 (DIR) Post #AdeHzQzYeR9jq2MmAa by lina@neuromatch.social
       2024-01-08T21:44:10Z
       
       0 likes, 0 repeats
       
       @Wolven I had a few of those at a Star Trek convention last year
       
 (DIR) Post #AdeI7vo5xu7Zj63Yy8 by Wolven@ourislandgeorgia.net
       2024-01-08T21:45:40Z
       
       0 likes, 1 repeats
       
       People: There are literally trillions of lines of human text available for free in the public domain, to say nothing of the living present day authors who would LOVE to be part of an "AI" project if they were a) allowed to consent to it and b) fairly compensated FOR it, so to claim that these companies would have to endure too much inconvenience without scraping copyrighted works is just absurd on its face. This argument is like Tonya Harding thinking to herself that it's impossible for her to be the greatest ice skater of all time and inspire a generation without paying someone to hit that pesky Nancy Kerrigan in the knee with a pipe! There's just no other choice! Silliness.
       
 (DIR) Post #AdeIIIXgI63D8OK29g by paninid@mastodon.world
       2024-01-08T21:46:47Z
       
       0 likes, 0 repeats
       
       @Wolven There is an entire generation that doesn’t grok this reference and I am here for it.
       
 (DIR) Post #AdeImedQiOtuLDUCP2 by dalias@hachyderm.io
       2024-01-08T21:53:16Z
       
       0 likes, 0 repeats
       
       @Wolven The people who made the decision to do it the way they did *know* this isn't a profitable business and don't care about the litigation. They've already planned their exits and have no personal liability for crimes the businesses committed. But they needed to move fast to dupe investors into pouring money in. 🙃
       
 (DIR) Post #AdeKljpg6ZuA50uERU by JonnyT@mastodon.me.uk
       2024-01-08T22:15:29Z
       
       0 likes, 0 repeats
       
       @Wolven @samhainnight There's also a few who very clearly don't know what is and isn't permitted by Copyright, either (and that something that's permissible in one country may not be permissible in another). Though it isn't a trivial subject to get your head around, so that isn't too surprising I guess (having been embroiled in a dispute I'm in the "I know enough about it to know I really do not know enough about it... where's our legal expert" camp)
       
 (DIR) Post #AdeUsDsQ0ilQByckjo by Wolven@ourislandgeorgia.net
       2024-01-08T23:45:48Z
       
       0 likes, 0 repeats
       
       @corbin Yikes. There are WAY too many misapprehensions of my point and assumptions about my beliefs based on this one particular mastodon exchange and I have neither the time nor inclination to lay out all of my positions and arguments to a stranger on the internet, so I'm gonna just… Nope on out of this conversation. Have a good one.
       
 (DIR) Post #AdeVZHQ2iSX7d9snCK by bwaber@hci.social
       2024-01-08T22:42:09Z
       
       0 likes, 1 repeats
       
       @Wolven I'm still waiting for the hate mail to come in from my HBR piece today https://hbr.org/2024/01/is-genais-impact-on-productivity-overblown?ab=HP-hero-featured-text-1
       
 (DIR) Post #AdeXXogiFgs88p6sPQ by Wolven@ourislandgeorgia.net
       2024-01-08T22:45:46Z
       
       0 likes, 0 repeats
       
       @bwaber Give it until tomorrow afternoon, and then I guarantee it will 😂
       
 (DIR) Post #AdeYpmhsSNlDABmoZE by bwaber@hci.social
       2024-01-08T22:58:23Z
       
       0 likes, 0 repeats
       
       @Wolven Another piece I have coming up with an economic historian is about the issues with hagiography, and we explicitly point out problems with Steve Jobs, Elon Musk, and Adam Smith. I figure that will get a pretty heroic amount of hate mail 😅
       
 (DIR) Post #AdeZJ3SjEr5HWneqRM by swelljoe@mas.to
       2024-01-09T00:58:26Z
       
       0 likes, 0 repeats
       
       @Wolven did she, though? Maybe I'm misremembering, but I thought it was her skeevy abusive husband's plan, and her only crime was participating in the cover-up after the fact. She had a reasonable shot at being the best skater in the world, even with her strongest competitor still skating. Stupid to risk everything on crime, though it's not like she had any good influences or support from decent friends or family. I recall feeling sorry for her anyway...
       
 (DIR) Post #Adeb9uDiQnFUAMOuBM by gulovsen@mastodon.social
       2024-01-08T23:33:18Z
       
       0 likes, 0 repeats
       
       @Wolven It's true! 😂 Which isn't surprising considering the number of people who have been posting nothing but positive takes on LLM/"AI" there for months (or is it years now? 🤔) because all they use LinkedIn for is recycling the Tech Bro messaging du jour so by defending it & arguing with you they hope maybe Tech Bro Senpai will notice them or they'll finally get the attention of someone willing to invest in their shower thought AI startup.
       
 (DIR) Post #Adeiv5dmUHLSsL2TAW by CatHat@mstdn.party
       2024-01-09T02:46:11Z
       
       0 likes, 0 repeats
       
       @Wolven ouch i feel OLD.  I remember when that mess unfolded.   I thought at the time that the only person who would cheat to win a relatively fair contest is someone who knows they're a loser.   Nothing has changed my mind since then
       
 (DIR) Post #AdekmnH95Hu7Vbcgt6 by zakalwe@plasmatrap.com
       2024-01-09T02:58:42.586Z
       
       0 likes, 0 repeats
       
       @Wolven@ourislandgeorgia.net Would it really have been SO HARD to train them on a corpus consisting solely of public-domain information?
       
 (DIR) Post #AdekmpIdYiHlmoKuUS by Wolven@ourislandgeorgia.net
       2024-01-09T03:07:01Z
       
       0 likes, 0 repeats
       
       @zakalwe No it very definitely would not have
       
 (DIR) Post #Adeuiw9B9othPGhEbA by raytraced@pnw.zone
       2024-01-09T04:58:24Z
       
       0 likes, 0 repeats
       
       @Wolven great pull. 👍🏽
       
 (DIR) Post #AdffcE8CachT0CXeKG by jCarttarBrooke@mastodon.social
       2024-01-09T13:43:48Z
       
       0 likes, 0 repeats
       
       @Wolven I don't have the copyright, but I believe possession is still 90% of the law. However, good statistics are behind a {~%|!# pay Wall..  :-[
       
 (DIR) Post #Adg5yPOO02cf6gLiee by fifilamoura@eldritch.cafe
       2024-01-09T18:39:08Z
       
       0 likes, 0 repeats
       
       @Wolven @kellogh  💯 It never ceases to surprise me how many people underestimate our individual and collective creativity and urge to play/learn because they buy into bullshit ideas about how people only do things for money or personal gain. It reveals how little they actually understand about humans as a species (and other living creatures), innovation and creativity. I suspect many don't even recognize creativity, which is why they keep mistaking rentier layers that generate profit for innovation and creativity.
       
 (DIR) Post #Adg6y14cW4aZdaTsC8 by Aviva_Gary@noc.social
       2024-01-09T18:50:33Z
       
       0 likes, 0 repeats
       
       @fifilamoura @Wolven @kellogh This ^ But I would also add it tells on themselves an awful lot too... 👀