Post AxpK7dYIBpgjOlCw9Q by dave@podcastindex.social
Post #AxpK7bLSOaEUYNBero by js@podcastindex.social
2025-09-01T16:51:13Z
0 likes, 0 repeats
@dave is there something going on with the crawlers? The "feeds updated in the last N days" numbers are really slumping, and although they typically slump a bit in the summer, never to this degree
Post #AxpK7cQoMA6rvGhRqq by js@podcastindex.social
2025-09-02T18:54:49Z
0 likes, 0 repeats
@dave continued unprecedented drop
Post #AxpK7dYIBpgjOlCw9Q by dave@podcastindex.social
2025-09-03T10:52:23Z
0 likes, 0 repeats
@js Looking
Post #AxpLr2eZoINOEaMTwm by dave@podcastindex.social
2025-09-03T11:11:49Z
0 likes, 0 repeats
@js Found the problem. Fixing...
Post #AxpMY5AOLH6WTrlYwq by dave@podcastindex.social
2025-09-03T11:19:36Z
0 likes, 0 repeats
@js Fixed. Thanks for catching that and alerting me. A bad feed had aggregator 1 stuck in a loop. Check out the itunes:author tag in this feed: https://podcastindex.org/podcast/7174450. I was not truncating itunes:author properly, so this one (over 2,000 characters) borked the SQL insert.
Post #AxpYoOgsidGJ7IVcci by js@podcastindex.social
2025-09-03T13:37:02Z
0 likes, 0 repeats
@dave thanks! any idea how long it was borked?
Post #Axpk3mQDrrCQYdFqb2 by dave@podcastindex.social
2025-09-03T15:43:02Z
0 likes, 0 repeats
@js I don't. My guess would be some time in August based on it being agg1, which is a high volume crawler. agg0 and agg1 do as much volume individually as the other 8 crawlers do combined. That precipitous drop on your chart is probably the trigger date.
Post #AxpkD700jPGQz0Racq by dave@podcastindex.social
2025-09-03T15:44:46Z
0 likes, 0 repeats
@js The PI ID's are front loaded with the more popular podcasts, meaning the lower the PI ID, the more likely it is to be a bigger, longer lived show. Since the aggregators split the Index evenly across themselves there is a natural taper down in update frequency as the Index ID's get higher.
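[One way to read "split the Index evenly across themselves" is contiguous, equal-sized ID bands per aggregator, which is what would produce the taper described above. A minimal sketch of that scheme; the feed count, aggregator count, and function name here are assumptions for illustration, not details from the thread:]

```python
def aggregator_for(pi_id: int, total_ids: int = 4_000_000, num_aggs: int = 10) -> int:
    """Hypothetical contiguous-band split: aggregator k owns IDs
    [k * band, (k + 1) * band). Lower bands hold older, more popular
    shows, so a stuck low-numbered aggregator hurts the most."""
    band = total_ids // num_aggs             # size of each contiguous band
    return min(pi_id // band, num_aggs - 1)  # clamp IDs past the last band
```

[Under this split, agg0 owns the lowest (busiest) IDs, matching the "agg7 == small issue, agg0 == big problem" observation below.]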
Post #AxpkJCG05z3r1lpBOC by dave@podcastindex.social
2025-09-03T15:45:52Z
0 likes, 0 repeats
@js Net result is crawler problems on lower numbered aggregators are more impactful. agg7 == small issue. agg0 == big problem.
Post #AxpogM5K7butz882q0 by js@podcastindex.social
2025-09-03T16:34:52Z
0 likes, 0 repeats
@dave got it, thanks. A slashdot commenter wants to know: why not split the index work by the _hash_ of the ID instead of the ID itself? Like, if you have 8 crawlers, take the first hex char of the SHA hash of the PI ID and give 0-1 to crawler 0, 2-3 to crawler 1, ..., e-f to crawler 7. Things look to be catching up. A live look at my internal new-episodes-found queue right now: https://www.youtube.com/watch?v=c9vggKd9VPs
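[The commenter's scheme can be sketched in a few lines. This assumes SHA-1 and the 8-crawler example from the post; the function name is made up for illustration:]

```python
import hashlib

def crawler_for(pi_id: int) -> int:
    """Pick a crawler from the first hex char of the SHA-1 of the PI ID.

    The 16 hex values are split into pairs: 0-1 -> crawler 0,
    2-3 -> crawler 1, ..., e-f -> crawler 7. Numerically adjacent
    IDs scatter across crawlers instead of clumping into one band.
    """
    first_hex = hashlib.sha1(str(pi_id).encode()).hexdigest()[0]
    return int(first_hex, 16) // 2
```

[The assignment is deterministic (the same ID always lands on the same crawler), which matters so a feed isn't re-crawled by multiple aggregators.]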
Post #AxrUWCnuAD97XOmfgG by dave@podcastindex.social
2025-09-04T11:58:21Z
0 likes, 0 repeats
@js Does sha1 give enough of a random distribution to even out the loads? I guess that's the whole point of a good hash, isn't it? The more evenly distributed the output, the less predictable it is. 🤔
Post #AxrV6gwHbDt9QQ6Ime by dave@podcastindex.social
2025-09-04T12:04:58Z
0 likes, 0 repeats
@js Does the first byte of the hash output share the same entropy as the full output? I mean, does being evenly distributed across 100 trillion numbers equate to even distribution across 0-9?
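[The first-nibble question is easy to test empirically. A quick sketch using sequential integers as stand-in PI IDs; if the first hex character were biased, the sharding scheme above would overload some crawlers:]

```python
import hashlib
from collections import Counter

# Tally the first hex character of SHA-1 over 100,000 sequential IDs.
# If the first nibble is as uniform as the full digest, each of the
# 16 hex characters should land near 100_000 / 16 = 6,250.
counts = Counter(
    hashlib.sha1(str(i).encode()).hexdigest()[0] for i in range(100_000)
)
for char in sorted(counts):
    print(char, counts[char])
```

[In practice every bin lands within a few standard deviations of 6,250: each hex character of a cryptographic hash output is individually uniform, so truncating to the first character keeps the even split.]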
Post #AxrVBorJa4rfR6DpBo by js@podcastindex.social
2025-09-04T12:05:52Z
0 likes, 0 repeats
@dave yes, hash functions are the closest thing we have to magic - every time I do this load-balancing/splitting in projects I'm amazed at how well it works. So many things use this property under the hood as well: git, etc.
Post #AxrVKf5nGI3fkhD7q4 by js@podcastindex.social
2025-09-04T12:07:29Z
0 likes, 0 repeats
@dave https://crypto.stackexchange.com/questions/161/should-i-use-the-first-or-last-bits-from-a-sha-256-hash
Post #AxrVMojQOVmmE7m9Zo by DamonHD@mastodon.social
2025-09-04T12:07:52Z
0 likes, 0 repeats
@dave @js The first digit after conversion to decimal from fixed-length binary is almost certainly NOT evenly weighted, assuming all decimal values are padded to a fixed length. You could do a test: uniformly generate some samples across the input range and sample the top output digit. n=1000 would give you decent insight IMHO.
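[The padded-decimal warning can be demonstrated concretely. A SHA-1 digest is 160 bits, and 2**160 is roughly 1.46e48, so a digest zero-padded to a fixed 49 decimal digits can only ever start with 0 or 1. A sketch of that test, again using sequential integers as sample inputs:]

```python
import hashlib
from collections import Counter

# 2**160 - 1 has 49 decimal digits, so pad every digest to that width.
WIDTH = len(str(2**160 - 1))

counts = Counter(
    str(int.from_bytes(hashlib.sha1(str(i).encode()).digest(), "big"))
    .zfill(WIDTH)[0]
    for i in range(10_000)
)
print(counts)
```

[Only '0' and '1' ever appear (about 68% vs 32%, since 10**48 / 2**160 is about 0.68), which is why the split should use the hex digits of the digest directly rather than a fixed-width decimal rendering.]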