Post AUomvugtntSv5ZFZTM by intelwire@mastodon.social
(DIR) Post #AUomvt1224NhtQecZk by intelwire@mastodon.social
2023-04-19T16:47:16Z
0 likes, 1 repeats
When VDARE is just the tip of the iceberg in your training data, well, you have a problem https://www.washingtonpost.com/technology/interactive/2023/ai-chatbot-learning/
(DIR) Post #AUomvugtntSv5ZFZTM by intelwire@mastodon.social
2023-04-19T16:55:46Z
0 likes, 0 repeats
Google’s AI training data includes text scraped from VDARE, Stormfront, Kiwifarms, threepercentpatriots and 4chan. In case you were wondering. https://www.washingtonpost.com/technology/interactive/2023/ai-chatbot-learning/
(DIR) Post #AUomvwFg04sVviWrJo by intelwire@mastodon.social
2023-04-19T17:56:50Z
0 likes, 0 repeats
But wait, there's more! The neo-Nazi VNN forum, Ben Klassen's Creativity Alliance, the massive Christian Identity archive at Christogenea and at least two other Christian Identity sites, White supremacist media giant Red Ice, virulent anti-gov site Bunkerville, Hunter Wallace's Occidental Dissent, David Duke's website, a major incel website. Many of these sites include vast collections of PDF books, so they probably punch above their weight in the dataset.
(DIR) Post #AUomvxqw32HAtYy824 by intelwire@mastodon.social
2023-04-19T18:06:52Z
0 likes, 0 repeats
Even a lackluster effort to clean the dataset would catch David Duke, Stormfront, Red Ice, and the Daily Stormer, which is, yes, also part of the training set according to the WaPo search tool included with the story. The only thing I thought to check that was missing from the dataset was 8kun, which is a pretty fucking low bar.
(DIR) Post #AUomvzJKcwaJPvGJvs by intelwire@mastodon.social
2023-04-19T18:19:50Z
0 likes, 0 repeats
When I say "punching above their weight," I mean this: Christogenea, a relatively obscure Christian Identity site, ranked 7,727 out of 15 million sites. Stormfront ranked 27,505, Red Ice 157,582, David Duke 229,432. All easily within the top 5 percent of sampled domains.
(DIR) Post #AUonYsy88knH2dtRku by warmbeverageenjoyer@freespeechextremist.com
2023-04-19T18:36:43.674784Z
0 likes, 0 repeats
@intelwire there are like 4,000 websites about jewish things, including jewishworldreview ranked at #366 and 6 total ranked above christogenea. a handful of sites with islam in the name. loads and loads of scientific publication sites, every kind of journalist site (from reputable to disreputable), etc. why exactly does it matter that they are casting a wide net? especially when all of the publicly accessible ai are neutered to avoid saying anything remotely politically incorrect