Post B1yp2SUcIXgrqyCecC by gsuberland@chaos.social
(DIR) Post #B1yp2SUcIXgrqyCecC by gsuberland@chaos.social
2026-01-05T16:09:51Z
0 likes, 0 repeats
at some point I'm going to have to write up a blog post on my silly "make a computer vision model that can find computers in TV and movies" project, and when I do I'm going to have to cover some very funny things about the datasets for animated images (spoiler: sooooo much porn)
(DIR) Post #B1yp2Tut0MIWGjV9CS by gsuberland@chaos.social
2026-01-05T16:18:09Z
0 likes, 1 repeats
boorus constitute by far the largest collection of tagged anime-style images on the planet, and people are remarkably rigorous and judicious with the tagging, so it's kinda excellent as an image corpus if you're not a prude (although using them for commercial purposes or training generative models makes you a thief and a supreme fuckwad) but also you have to deal with a terabyte of VERY diverse porn. rule34 truly knows no bounds.
(DIR) Post #B1yp343IeYQ0LFiNJQ by ozzelot@mstdn.social
2026-01-05T16:19:48Z
0 likes, 0 repeats
@gsuberland Only a terabyte?
(DIR) Post #B1ypSGdoz5kGcqob68 by gsuberland@chaos.social
2026-01-05T16:24:19Z
0 likes, 0 repeats
@ozzelot of still images, with size limits applied and re-encoded to a fixed JPEG quality. it's enough source material that filesystem overhead turns into a major factor and I had to invent a custom archive format to avoid it.