Post AmKldD6GuGEZHXG1Fg by tim_lavoie@cosocial.ca
Post #AmJgLUboNQwRRVmYi0 by j@noise.j-w.au
2024-09-24T02:29:26Z
1 likes, 3 repeats
Giant Corporations™ are scraping my little git server to feed their ever-hungry, planet-destroying plagiarism machines. So now, instead of getting my code, they get a 10GB treat. Fucking THIEVES.
Post #AmKW9GI6ymkFB83aAy by tim_lavoie@cosocial.ca
2024-09-24T05:06:10Z
0 likes, 0 repeats
@j Zip bombs?
Post #AmKW9H8voOznowllVQ by stunder@mastodon.sdf.org
2024-09-24T13:37:51Z
0 likes, 0 repeats
@tim_lavoie @j I came here just to suggest this. I would imagine they pull anything they can and would try to open a zip file locally. At that point they have an issue.
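A minimal sketch of the zip-bomb idea, assuming the server can send a pre-compressed body with a `Content-Encoding: gzip` header (the thread does not say how it would be served). `make_gzip_bomb` is a hypothetical helper, and whether any particular crawler actually expands such a body without a size cap is unknown.

```python
import gzip
import io


def make_gzip_bomb(uncompressed_size: int, chunk_size: int = 1 << 20) -> bytes:
    """Return gzip bytes that expand to `uncompressed_size` zero bytes.

    Hypothetical helper: the expanded data is never held in memory at once;
    zeros are fed to the compressor one chunk at a time.
    """
    buf = io.BytesIO()
    zero_chunk = bytes(chunk_size)  # a reusable block of NUL bytes
    # compresslevel=9 asks DEFLATE to squeeze long zero runs as hard as it can.
    with gzip.GzipFile(fileobj=buf, mode="wb", compresslevel=9) as gz:
        remaining = uncompressed_size
        while remaining > 0:
            n = min(chunk_size, remaining)
            gz.write(zero_chunk[:n])
            remaining -= n
    # Closing the GzipFile writes the trailer but leaves `buf` open.
    return buf.getvalue()


if __name__ == "__main__":
    payload = make_gzip_bomb(100 * 1024 * 1024)  # 100 MiB once expanded
    print(f"{len(payload)} bytes on the wire")
```

DEFLATE tops out at roughly 1000:1 on runs of zeros, so a body that expands to 10GB would still be on the order of 10MB on the wire; the trick only bites if the client decompresses without a limit.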
Post #AmKWNaBlnTJ2wbv3Zo by Drezil@toot.kif.rocks
2024-09-24T08:16:53Z
0 likes, 1 repeats
@Hex @j Hey! I did my Master's in such things. Word frequencies in English (and all other languages) basically follow Zipf's law (https://en.wikipedia.org/wiki/Zipf's_law). So you just sample from the "opposite" distribution, i.e. draw random words from a distribution weighted by their rank and generate random text with it. Text like that should then basically make a model "unlearn" everything it has learned about the words used. Better: don't use the words and their frequencies, but just the embeddings (https://openai.com/index/new-embedding-models-and-api-updates/).
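A minimal sketch of the rank-weighted sampling described above, assuming the "opposite" distribution means weighting the word of rank r by r instead of Zipf's roughly 1/r (one plausible reading, not necessarily what the poster had in mind). `inverse_zipf_text` is a hypothetical helper, tokenisation is naive whitespace splitting, and the embedding-based variant is not attempted here.

```python
import random
from collections import Counter
from typing import Optional


def inverse_zipf_text(corpus: str, n_words: int, seed: Optional[int] = None) -> str:
    """Sample `n_words` words from `corpus`, weighting each word by its rank.

    Zipf's law says the word of rank r (1 = most frequent) occurs with
    frequency roughly proportional to 1/r. Weighting by r instead flips
    that: the rarest words in the corpus become the most likely picks.
    """
    counts = Counter(corpus.lower().split())
    # most_common() orders by descending count (ties keep first-seen order),
    # so list index + 1 is each word's frequency rank.
    ranked = [word for word, _ in counts.most_common()]
    weights = list(range(1, len(ranked) + 1))
    rng = random.Random(seed)  # seedable for reproducible output
    return " ".join(rng.choices(ranked, weights=weights, k=n_words))


if __name__ == "__main__":
    sample = "the cat sat on the mat and the dog sat on the cat"
    print(inverse_zipf_text(sample, 12, seed=1))
```

Weighting by r is the simplest inversion; a steeper one (r squared, say) would push the output even harder toward rare words.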
Post #AmKWWBV0xgOoye1S4W by jlecour@mastodon.evolix.org
2024-09-24T08:23:57Z
0 likes, 1 repeats
@j @nixCraft As much as I love the idea of dumping shit in their garden, it will cost Hetzner, and every transit and storage provider along the way, a lot. A simple HTTP status code (418, 410 or 403) would achieve the same “fuck off” at a much lower cost. Just my 2 cts.
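A minimal sketch of the cheap "fuck off" using only Python's standard library, assuming crawlers are recognised by substrings of their User-Agent header. The three tokens below are examples to check against each vendor's own documentation, `status_for` and `RefusingHandler` are hypothetical names, and since a User-Agent is trivially spoofed this only stops crawlers that identify themselves honestly.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Example User-Agent substrings for AI crawlers (verify before relying on them).
BLOCKED_AGENTS = ("GPTBot", "CCBot", "ClaudeBot")


def status_for(user_agent: str) -> int:
    """Return 403 for a blocked crawler (case-sensitive match), 200 for everyone else."""
    if any(token in user_agent for token in BLOCKED_AGENTS):
        return 403  # 410 Gone or 418 I'm a teapot would work the same way
    return 200


class RefusingHandler(BaseHTTPRequestHandler):
    def do_GET(self) -> None:
        status = status_for(self.headers.get("User-Agent", ""))
        # Refused requests get an empty body, so they cost almost nothing.
        body = b"hello\n" if status == 200 else b""
        self.send_response(status)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


if __name__ == "__main__":
    # Hypothetical address and port for local testing.
    HTTPServer(("127.0.0.1", 8080), RefusingHandler).serve_forever()
```

In production the same check would more likely live in the reverse proxy in front of the git server; the logic is the same either way.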
Post #AmKldD6GuGEZHXG1Fg by tim_lavoie@cosocial.ca
2024-09-24T16:31:23Z
0 likes, 0 repeats
@stunder @j Sure, they'd need some sort of mechanism to limit the blast radius of time and space. A separate process could use something like ulimit, but I'd need more coffee before considering an in-process mechanism.
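For the space half of the problem, one in-process option (an assumption, not something proposed in the thread) is to decompress in bounded pieces and give up once the output passes a cap. This uses `zlib.decompressobj` with `max_length`; `bounded_gunzip` and `DecompressionLimitExceeded` are hypothetical names, and it does nothing about time, which would still need a timeout or a separate, ulimit-ed process.

```python
import zlib


class DecompressionLimitExceeded(Exception):
    """Raised when a payload would expand past the allowed size."""


def bounded_gunzip(data: bytes, max_output: int, chunk: int = 64 * 1024) -> bytes:
    """Decompress gzip `data` in pieces, giving up once output exceeds `max_output`.

    Memory use stays around max_output + chunk bytes no matter how large
    the payload would expand.
    """
    # 16 + MAX_WBITS tells zlib to expect a gzip header and trailer.
    d = zlib.decompressobj(16 + zlib.MAX_WBITS)
    out = bytearray()
    pending = data
    while not d.eof:
        # max_length caps the output of this single call; input that could
        # not be processed yet is parked in d.unconsumed_tail.
        piece = d.decompress(pending, chunk)
        out += piece
        if len(out) > max_output:
            raise DecompressionLimitExceeded(f"output exceeded {max_output} bytes")
        pending = d.unconsumed_tail
        if not piece and not pending:
            # No input left and no progress: the stream was cut short.
            raise ValueError("truncated gzip stream")
    return bytes(out)
```

Checking the size after every piece, rather than trusting the gzip trailer's length field, matters because a hostile sender controls that field too.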
Post #AmKuivXfzypk9SsPFA by anna@hexile.witches.live
2024-09-24T03:02:04.543853Z
1 likes, 0 repeats
@j i hope it's a bunch of poisoned ai generated shit to feed back into the model and degrade it
Post #AmKulAo8DHaU4ke0gq by Purple@woof.tech
2024-09-24T10:14:58Z
1 likes, 0 repeats
@j Another good trick: when they request images, feed them poisoned images that hurt their dataset lol. A 10GB file will likely never be downloaded in full, but a poisoned image will very likely make it into the model, ruining their efforts ;)
Post #AmKun5LXUMQLiXBM92 by dimin@mastodon.social
2024-09-24T11:41:55Z
1 likes, 0 repeats
@j At one point I thought of just drip-feeding these with /dev/random at just enough of a rate to keep them from disconnecting. Wasting minimal bandwidth on my end and maximum time on theirs! Bonus points if the start looks like a delicious HTML page but the content just keeps coming slowly until they time out.
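A minimal asyncio sketch of the drip-feed tarpit, assuming a bare HTTP/1.1 responder is acceptable. It swaps `/dev/random` for `os.urandom`, which reads the same kernel randomness without blocking; the HTML prefix, the 10-second delay and the port are all made-up values, and `drip`/`handle` are hypothetical names.

```python
import asyncio
import os

# Hypothetical decoy: the response opens like an ordinary HTML page.
HTML_PREFIX = b"<!DOCTYPE html>\n<html><head><title>Repository</title></head><body>\n"


async def drip(chunk_size: int = 1, delay: float = 10.0):
    """Yield the HTML prefix, then `chunk_size` random bytes every `delay` seconds, forever."""
    yield HTML_PREFIX
    while True:
        await asyncio.sleep(delay)
        # os.urandom draws from the OS random source (like /dev/urandom)
        # and, unlike /dev/random on older Linux kernels, never blocks.
        yield os.urandom(chunk_size)


async def handle(reader: asyncio.StreamReader, writer: asyncio.StreamWriter) -> None:
    try:
        await reader.readuntil(b"\r\n\r\n")  # read and discard the request headers
        # No Content-Length: with "Connection: close" the body runs until we hang up.
        writer.write(b"HTTP/1.1 200 OK\r\nContent-Type: text/html\r\nConnection: close\r\n\r\n")
        async for piece in drip():
            writer.write(piece)
            await writer.drain()  # should raise once the client has gone away
    except (asyncio.IncompleteReadError, asyncio.LimitOverrunError, ConnectionError):
        pass  # client disconnected or sent garbage; nothing left to do
    finally:
        writer.close()


async def main() -> None:
    # Hypothetical address and port.
    server = await asyncio.start_server(handle, "127.0.0.1", 8081)
    async with server:
        await server.serve_forever()


if __name__ == "__main__":
    asyncio.run(main())
```

Each stalled connection costs one sleeping coroutine and a few bytes every delay period, which is what keeps the bandwidth side cheap; the catch is that a crawler with a short total-response timeout will simply give up early.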