Post B0qMCbZoc7rAibAfBo by okennedy@discuss.systems
(DIR) Post #B0myBnu4JkSyHjqjoW by ricci@discuss.systems
2025-12-01T01:13:34Z
0 likes, 0 repeats
The strategy of "just stick five hundred thousand json files in a directory for processing" is surprisingly ... just fine in 2025. Sometimes being lazy is pretty okay actually.
(DIR) Post #B0mySICVRMT8ssKtRQ by ricci@discuss.systems
2025-12-01T01:16:31Z
0 likes, 0 repeats
I can process em all in python (admittedly, parallelized) in 45 seconds
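For readers wondering what "process em all in python (admittedly, parallelized)" might look like, here is a minimal sketch. It is not ricci's actual script, which isn't shown in the thread: the names `count_keys` and `process_directory`, the `executor_cls` parameter, and the "count top-level items" workload are all made up for illustration.

```python
import json
import os
from concurrent.futures import ProcessPoolExecutor


def count_keys(path):
    """Parse one JSON file; return its number of top-level items (stand-in for real work)."""
    with open(path, "rb") as f:
        doc = json.load(f)
    return len(doc) if isinstance(doc, (dict, list)) else 1


def process_directory(directory, executor_cls=ProcessPoolExecutor, chunksize=1000):
    """Run count_keys over every *.json file in `directory` and sum the results."""
    # os.scandir yields entry names without a separate stat() call per file.
    with os.scandir(directory) as entries:
        paths = [e.path for e in entries if e.name.endswith(".json")]
    with executor_cls() as pool:
        # chunksize batches paths per task, so dispatch overhead is paid
        # once per thousand files rather than once per file.
        return sum(pool.map(count_keys, paths, chunksize=chunksize))
```

When run as a script with the default process pool, the call to `process_directory(...)` should sit under an `if __name__ == "__main__":` guard, since some platforms start workers by re-importing the main module.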
(DIR) Post #B0mybmGP2KxujEgJ9M by ricci@discuss.systems
2025-12-01T01:18:14Z
0 likes, 0 repeats
@SnoopJ extremely true; see also: kubernetes
(DIR) Post #B0myh9KqgTNtjSK3oO by faraiwe@mstdn.social
2025-12-01T01:19:11Z
0 likes, 0 repeats
@ricci 😱
(DIR) Post #B0mytX7c4X84yo2Ko4 by ricci@discuss.systems
2025-12-01T01:21:27Z
0 likes, 0 repeats
@josh0 I have a VM consuming two entire copies of the bluesky firehose (okay not as big as Twitter's but) on a machine in my living room. It's also the same machine that's processing all these json files. Oh and running several hundred torrents comprising a few hundred TB of data for #sciop. No sweat, still has a couple dozen CPU threads doing nothing
(DIR) Post #B0myvY1vfLr6NNL3nk by ricci@discuss.systems
2025-12-01T01:21:51Z
0 likes, 0 repeats
@faraiwe zfs is pretty rad
(DIR) Post #B0myzDvmKuPRCKDMrw by acdha@code4lib.social
2025-12-01T01:22:28Z
0 likes, 0 repeats
@ricci the other one which reminds me how nice the switch to SSDs was is using Git to manage them. Sure, there are more sophisticated tools but I’ve never lost data this way and that’s not true of some alternatives.
(DIR) Post #B0n3ERflvUfDygF1hg by snorerot13@mstdn.social
2025-12-01T02:10:01Z
0 likes, 0 repeats
@ricci "Parallelized"
(DIR) Post #B0nD3zlTpcZi4JZDRg by ricci@discuss.systems
2025-12-01T04:00:11Z
0 likes, 0 repeats
@JustinDerrick zfs on FreeBSD. `ls -l` takes about 5 seconds. `ls -1` takes only 2.4s. We'll see if I still think this is an okay idea once this grows to 5M files, which is where I expect it to end up
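The gap between `ls -l` and `ls -1` is presumably mostly per-file metadata: a long listing needs a stat of every entry, while a names-only listing can get by with reading the directory. A rough way to see the same split from Python is sketched below; `time_listing` is a hypothetical helper, and the numbers it reports depend heavily on caching, so treat it as an illustration rather than a benchmark.

```python
import os
import time


def time_listing(directory):
    """Return (entry_count, seconds_names_only, seconds_with_stat) for one directory."""
    t0 = time.perf_counter()
    with os.scandir(directory) as entries:
        # Names only: roughly what a plain listing has to read.
        names = [e.name for e in entries]
    t1 = time.perf_counter()
    with os.scandir(directory) as entries:
        for e in entries:
            # Per-entry metadata: roughly the extra work a long listing does.
            e.stat(follow_symlinks=False)
    t2 = time.perf_counter()
    return len(names), t1 - t0, t2 - t1
```

The second pass runs against a directory the first pass has just read, so if anything it understates the cost of a cold long listing.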
(DIR) Post #B0pUp5sF1UHwA9h3Oi by ricci@discuss.systems
2025-12-02T06:28:37Z
0 likes, 0 repeats
(DIR) Post #B0pUviSEUih1YVgaQ4 by scott@sfba.social
2025-12-02T06:29:50Z
0 likes, 0 repeats
@ricci 😂
(DIR) Post #B0pUzPWko5NS0qYvlA by dev@discuss.systems
2025-12-02T06:30:30Z
0 likes, 0 repeats
@ricci wait you found my mongo db instance
(DIR) Post #B0pV56Lz0VVYHfEoTo by ricci@discuss.systems
2025-12-02T06:31:31Z
0 likes, 0 repeats
@dev oh yeah that might have been a good way to store them
(DIR) Post #B0pVRbwOFOVycY05Oy by dev@discuss.systems
2025-12-02T06:35:30Z
0 likes, 0 repeats
@ricci wait if you want to store lots of semi structured data why not arrow or parquet
(DIR) Post #B0pVVvwfoRyxp2C63E by ricci@discuss.systems
2025-12-02T06:36:23Z
0 likes, 0 repeats
@dev a what now?
(DIR) Post #B0pVa3qqMPQhV5R3VQ by ricci@discuss.systems
2025-12-02T06:37:08Z
0 likes, 0 repeats
@dev I don't understand what archery or parakeets have to do with any of this
(DIR) Post #B0pVeS5SC8jZpAMZ2e by dev@discuss.systems
2025-12-02T06:37:55Z
0 likes, 0 repeats
@ricci they’re good Apache eggs
(DIR) Post #B0pVrPKRlEr72CYNOa by dev@discuss.systems
2025-12-02T06:38:52Z
0 likes, 0 repeats
@ricci as someone who lives near Telegraph hill, I should learn archery, it will solve my parakeet problem
(DIR) Post #B0pVrQmqL9AFYYqZIO by ricci@discuss.systems
2025-12-02T06:40:14Z
0 likes, 0 repeats
@dev oh okay then you can dump them in my data lake
(DIR) Post #B0pW3IvpN3fYfYi8Tw by ives@mstdn.social
2025-12-02T06:42:18Z
0 likes, 0 repeats
@ricci We had a customer once with about one billion files in... one billion directories. 🙈
(DIR) Post #B0pWJCtnZHuUanLRD6 by ricci@discuss.systems
2025-12-02T06:45:14Z
0 likes, 0 repeats
@ives squad goals
(DIR) Post #B0pYHLaqTJUet1ijZ2 by jadp@mastodon.social
2025-12-02T07:07:20Z
0 likes, 0 repeats
@ricci yes
(DIR) Post #B0qFIJveRROw2jA9bc by ricci@discuss.systems
2025-12-02T15:09:20Z
0 likes, 0 repeats
Lawful neutral: The filesystem is a database
(DIR) Post #B0qHFCo0fXgczSENlI by gnomon@mastodon.social
2025-12-02T15:31:09Z
0 likes, 0 repeats
@ricci it even has indices¹ ² ³!! ¹: https://www.nongnu.org/ext2-doc/ext2.html#contrib-performance ²: https://www.kernel.org/doc/ols/2002/ols2002-pages-425-438.pdf ³: I know you're using BSD and thus likely not ext[2-4], but I expect that whatever filesystem you're using there has a similar optimization
(DIR) Post #B0qHT4pr6C9yKDi02K by ricci@discuss.systems
2025-12-02T15:33:42Z
0 likes, 0 repeats
@gnomon zfs, fwiw
(DIR) Post #B0qHwWiCX90xBGAfcO by mhkohne@mastodon.social
2025-12-02T15:39:00Z
0 likes, 0 repeats
@ricci I mean, they never shipped it, but the WinFS thing from Microsoft was one version of making that explicit.
(DIR) Post #B0qMCbZoc7rAibAfBo by okennedy@discuss.systems
2025-12-02T16:26:45Z
0 likes, 0 repeats
@ricci Don't knock it if it works...
(DIR) Post #B0qRQ8YzmSuKYo3kIa by mdione@en.osm.town
2025-12-02T17:25:04Z
0 likes, 0 repeats
@ricci that's what Hans Reiser wanted to do with reiserfs4. A shame he was also a psychopath :(
(DIR) Post #B0qWRK3Z2DvGGXk4zA by ricci@discuss.systems
2025-12-02T18:21:22Z
0 likes, 0 repeats
@josh0 turtles I think
(DIR) Post #B0qwWdlZmMadeQWbT6 by ricci@discuss.systems
2025-12-02T23:13:45Z
0 likes, 0 repeats
@josh0 elephants