[HN Gopher] The Bluesky Dictionary
___________________________________________________________________
The Bluesky Dictionary
Author : gaws
Score : 48 points
Date : 2025-08-06 20:43 UTC (2 hours ago)
(HTM) web link (www.avibagla.com)
(TXT) w3m dump (www.avibagla.com)
| neaden wrote:
| Is this not working, or am I missing something? It just shows 0
| words seen for me. Firefox on a PC.
| SirFatty wrote:
| Same... maybe you need a Bluesky account, which I don't have.
| gpm wrote:
| It doesn't... I can open it in a private browsing window.
| GalaxyNova wrote:
| It's working fine for me on Firefox
| accrual wrote:
| You may need to allow scripts from the domain avibagla.com, it
| shows 0 when the scripts are blocked.
| zem wrote:
| ugh, it ought to be building the results on the server and
| serving up static pages.
| AgentME wrote:
| For me it took a minute to start loading data and switch from
| just showing 0.
| GalaxyNova wrote:
| fascinating! I think it's really cool that this is possible, and
| at the same time kind of sad that the norm is slowly moving
| towards more locked-down APIs.
| timeon wrote:
| > slowly moving towards
|
| Depends what we accept as norm.
| 75345d4c wrote:
| I just saw it indexed "eluvium," but the post was referring to a
| band with that same name
| Kye wrote:
| GeologySky will get to it soon enough.
| atlgator wrote:
| I checked out the author's other projects and this is a common
| issue. For example, he has a "lean checker" for bluesky that
| claims it is right-leaning simply because of all the people
| saying "That's right," "He was right," etc. None of the
| supposed right-leaning posts were actually conservative in
| nature. They just used the word "right" to mean correct.
| avibagla1 wrote:
| one, thank you for checking my website. two, that is the
| joke, 100% - at the time people kept talking about how "left
| leaning" bsky was and that idea came to mind
| wantlotsofcurry wrote:
| I'm very curious as to how this works in the backend. I realize
| it uses Bluesky's firehose to get the posts, but I'm more curious
| about how it's checking whether a post contains any of the available
| words. Any guesses?
| bangaladore wrote:
| Maybe I'm being naive, but with only ~275k words to check
| against, this doesn't seem like a particularly hard problem.
| Ingest post, split by words, check each word via some db,
| hashmap, etc... and update metadata.
| gpm wrote:
| Probably just a big hashtable mapping word -> the number of
| times it's been seen, and another hashset of all the words it
| hasn't seen. When a post comes in you hash all the words in it
| and look them up in the hashtable, increment it, and if the old
| value was 0 remove it from the hash set.
|
| 250k words at a generous 100 bytes per word is only 25MB of
| memory...
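The hashtable-plus-hashset idea described above can be sketched in a few lines of Python. This is only an illustration of the approach gpm guesses at, not the site's actual code; the `WordTracker` class, the `ingest` method, and the tokenizing regex are all hypothetical names and choices.

```python
import re

# Assumption: a "word" is a run of ASCII letters, optionally with one
# apostrophe part ("that's"). The real site may tokenize differently.
WORD_RE = re.compile(r"[a-z]+(?:'[a-z]+)?")


class WordTracker:
    """Counts dictionary words seen so far and tracks the unseen ones."""

    def __init__(self, dictionary):
        # word -> number of times it has been seen
        self.counts = {w.lower(): 0 for w in dictionary}
        # every dictionary word that has not appeared yet
        self.unseen = set(self.counts)

    def ingest(self, post_text):
        """Update counts for one post; return words seen for the first time."""
        newly_seen = []
        for word in WORD_RE.findall(post_text.lower()):
            old = self.counts.get(word)
            if old is None:
                continue  # not a dictionary word
            self.counts[word] = old + 1
            if old == 0:
                self.unseen.discard(word)
                newly_seen.append(word)
        return newly_seen
```

Each post costs one dictionary lookup per token, so throughput is bounded by tokenizing rather than by the lookup structure.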
| f311a wrote:
| You can probably fit all words under 10-15MB of memory, but
| memory optimisations are not even needed for 250k words...
|
| Trie data structures are memory-efficient for storing such
| dictionaries (2-4x better than hashmaps), although not as fast
| as hashmaps for retrieving items. You can hash the top 1k of
| the most common words and check the rest using a trie.
|
| The most CPU-intensive task here is text tokenizing, but there
| are a ton of optimized options developed by orgs that work on
| LLMs.
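The hybrid lookup suggested above (a hash set for the most common words, a trie for the rest) might look roughly like the following Python sketch. The class names `Trie` and `HybridLookup` are made up for illustration. Note that this node-per-character version only demonstrates the lookup logic; in Python it will not show the memory savings mentioned in the comment, which would need a compact trie representation.

```python
class TrieNode:
    __slots__ = ("children", "is_word")

    def __init__(self):
        self.children = {}    # character -> TrieNode
        self.is_word = False  # True if a dictionary word ends here


class Trie:
    def __init__(self, words=()):
        self.root = TrieNode()
        for word in words:
            self.insert(word)

    def insert(self, word):
        node = self.root
        for ch in word:
            node = node.children.setdefault(ch, TrieNode())
        node.is_word = True

    def __contains__(self, word):
        node = self.root
        for ch in word:
            node = node.children.get(ch)
            if node is None:
                return False
        return node.is_word


class HybridLookup:
    """Hash set for the hot words, trie for everything else."""

    def __init__(self, common_words, all_words):
        self.hot = set(common_words)
        self.trie = Trie(w for w in all_words if w not in self.hot)

    def __contains__(self, word):
        # Check the cheap hash set first, then fall back to the trie.
        return word in self.hot or word in self.trie
```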
| stwrzn wrote:
| I very much hope that the backend uses one of the bluesky
| jetstream endpoints. When you only subscribe to new posts, it
| provides a stream of around 20mbit/s last time I checked, while
| the firehose was ~200mbit/s.
| avibagla1 wrote:
| yes it does!
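For readers unfamiliar with Jetstream: it delivers events as JSON messages rather than the firehose's binary format. The sketch below pulls the post text out of one such message. The field names (`kind`, `commit`, `operation`, `collection`, `record`, `text`) reflect my understanding of the Jetstream event shape and should be checked against its documentation; the function name `post_text_from_event` is hypothetical, and the WebSocket connection itself is left out.

```python
import json


def post_text_from_event(raw_message):
    """Return the text of a newly created post, or None for anything else.

    Assumes a Jetstream-style JSON event; the field names used here are
    an assumption about that format, not taken from the thread.
    """
    try:
        event = json.loads(raw_message)
    except json.JSONDecodeError:
        return None
    if not isinstance(event, dict) or event.get("kind") != "commit":
        return None
    commit = event.get("commit")
    if not isinstance(commit, dict):
        return None
    if commit.get("operation") != "create":
        return None  # skip updates and deletes
    if commit.get("collection") != "app.bsky.feed.post":
        return None  # skip likes, follows, reposts, ...
    record = commit.get("record")
    if not isinstance(record, dict):
        return None
    text = record.get("text")
    return text if isinstance(text, str) else None
```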
| avibagla1 wrote:
| Hey! this is my site - it's not all that complex, i'm just
| using a sqlite db with two tables - one for stats, the other
| for all the words that's just word | count | first use | last
| use | post.
|
| I... did not expect this to be so popular
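A two-table SQLite layout like the one the author describes could be set up as follows. Only the column list (word, count, first use, last use, post) comes from the comment above; the column types, the key/value shape of the `stats` table, the choice to store the first post that used a word, and the function names are all assumptions made for this sketch.

```python
import sqlite3


def open_db(path=":memory:"):
    conn = sqlite3.connect(path)
    conn.executescript(
        """
        CREATE TABLE IF NOT EXISTS words (
            word      TEXT PRIMARY KEY,
            count     INTEGER NOT NULL DEFAULT 0,
            first_use TEXT,   -- timestamp of first sighting
            last_use  TEXT,   -- timestamp of latest sighting
            post      TEXT    -- assumed: reference to the first post
        );
        -- Assumed shape; the thread does not say what stats holds.
        CREATE TABLE IF NOT EXISTS stats (
            key   TEXT PRIMARY KEY,
            value INTEGER NOT NULL
        );
        """
    )
    return conn


def load_dictionary(conn, words):
    """Pre-populate every dictionary word with a count of 0."""
    conn.executemany(
        "INSERT OR IGNORE INTO words (word) VALUES (?)",
        [(w,) for w in words],
    )
    conn.commit()


def record_use(conn, word, seen_at, post_ref):
    """Bump a word's count; return False if it is not in the dictionary."""
    cur = conn.execute(
        """
        UPDATE words
           SET count = count + 1,
               first_use = COALESCE(first_use, ?),
               last_use = ?,
               post = COALESCE(post, ?)
         WHERE word = ?
        """,
        (seen_at, seen_at, post_ref, word),
    )
    conn.commit()
    return cur.rowcount == 1
```

With this layout, the "words we haven't seen" list is simply `SELECT word FROM words WHERE count = 0`.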
| spullara wrote:
| I did this against a pretty large tweet archive and got hits on
| about 125k of the words in the unix dictionary.
| pona-a wrote:
| For a moment I thought it would be an AT-Proto based Urban
| Dictionary clone.
| tough wrote:
| Words We Haven't Seen
|
| - Search unseen words
|
| made me chuckle
___________________________________________________________________
(page generated 2025-08-06 23:00 UTC)