[HN Gopher] History of Hacker News Search from 2007 to 2024
       ___________________________________________________________________
        
       History of Hacker News Search from 2007 to 2024
        
       Author : skeptrune
       Score  : 63 points
       Date   : 2024-08-12 20:27 UTC (2 hours ago)
        
 (HTM) web link (trieve.ai)
 (TXT) w3m dump (trieve.ai)
        
       | cactusplant7374 wrote:
       | When is the launch?
        
         | skeptrune wrote:
         | Not sure yet! Fingers crossed it'll be this week or next.
        
       | swyx wrote:
       | i want to publicly thank Algolia for providing an excellent HN
       | search for so long. i was on a call with Linus Lee and we both
       | were referencing something on HN and i started pulling it up and
       | he said "i know exactly what website you're on" without seeing my
       | screen and it was of course the Algolia HN search. unbelievable
       | mind sync.
       | 
       | Idk if it can be replaced (i guess i could do with semantic
       | search + content crawling to start?), but even if it is replaced,
       | Algolia will always have a special place in my heart for doing
       | such a great job for free. thank you whoever worked on it
       | (Algolians - is there a behind the scenes writeup somewhere?)
        
         | squeaky-clean wrote:
         | It is free, but Algolia is a YCombinator backed company (YC
         | W14) so for them it's probably very useful as a sort of low-
         | stakes phase-1-prod environment. Basically a win-win-win.
        
         | davisr wrote:
         | Now, if only it could be used without necessitating
         | JavaScript...
        
       | spasibot wrote:
       | I wonder if they'll let us search flagged and dead posts and
       | comments.
       | 
       | I remember reading some insightful exchanges back in the day that
       | got flagged because of being a controversial topic that other
       | users didn't like.
       | 
       | No way to find them now, even knowing some keywords and
       | approximate month and year.
        
         | yorwba wrote:
         | They're using the firebase API,
         | https://github.com/devflowinc/trieve-hn-discovery/tree/main/...
         | so no showdead, I think.
        
         | greenchair wrote:
         | contributes to the echo chamber
        
         | dang wrote:
         | We had to exclude [dead] and eventually even just [flagged]
         | posts from the public API because many third-party clients and
         | sites were displaying them as if they were regular posts. For
         | the ever-fragile HN ecosystem, that is catastrophic. We would
         | get angry emails saying "how can your _expletive_ site possibly
         | condone such _expletive_ _expletive_ comments as <link>" ...
         | and then it would turn out that <link> was a post by some
         | account that had been banned for years.
         | 
         | It's fine if users turn 'showdead' on in their profile to read
         | everything--just please remember that by doing that, you're
         | subscribing to various bottoms of various barrels. But it's
         | definitely not ok when people browse HN with some app we have
         | nothing to do with, run into horrible things, understandably
         | are outraged and then forever have their view of HN imprinted.
         | 
         | IMO this issue is existential for HN. We've spent years and so
         | much energy trying to find a balance between internet openness
         | and human decency, a task which oscillates between barely-
         | possible and simply-doomed, so the idea that anybody anywhere
         | sees anything labeled "Hacker News" that pours all the toxic
         | waste back into the commons is physically painful to me. Much
         | as I dislike the idea of restricting anyone's curiosity about
         | the entire corpus of what gets posted, I don't see what choice
         | we have.
        
           | spasibot wrote:
           | Thank you dang, I very much appreciate you taking the time to
           | explain.
           | 
           | I was going to suggest maybe the public API could have a
           | "showdead" flag too but I guess that too easily enables the
           | problem you're trying to prevent? As in an enterprising app
           | developer could turn the "showdead" tap to "yes" with every
           | request and then the waste gushes out once more.
        
           | dredmorbius wrote:
           | I can appreciate that concern and see it even with flagging /
           | dead / killed posts and submissions.
           | 
           | I've had my own concerns about HN's moderation, both excesses
           | and insufficiencies. When I've done occasional polls about
           | what people's issues are about HN I'm very often pointed to
           | comments which now show as flagged. I'd found a few which
           | _hadn't_ been flagged and forwarded those to dang, who
           | (admittedly long after the fact) flagged them. As dang's said
           | many times, moderators don't see everything, most moderation
           | is by members, and mods step in relatively rarely.
           | 
           | Based on Whaly's 2021 analysis and looking at dang's own
           | comment post history (via Algolia), HN nets roughly 4 million
           | comments/year and 400k submissions, with about 150k active
           | members. Over his ten years as moderator dang's averaged
           | about 20 comments per day, though there's a great deal more
           | moderation occurring (some automated, some member-based, some
           | manual but not noted with comments, which tend to be reserved
           | for established accounts).
           | 
           | My read is that HN _mostly_ tends toward its stated goals
           | and, frankly, good-netizen behaviour. It _does_ have a
           | pronounced status-quo bias, though it seems to be self-aware
           | on this point. I've a few further concerns I'm still
           | thinking through.
           | 
           | The problem with an overly-open archive is that this makes
           | possible misconstrued assertions about what HN does or
           | doesn't tolerate. An open-access archive and third-party apps
           | which don't reflect moderation actions, say, a third-party
           | app which explicitly _only_ showed flagged, killed, and/or
           | dead posts, comments, and users, would paint a distinctly
           | different picture of HN, and one which would greatly harm the
           | reputation of the site.
           | 
           | There are some ... possible ways around this. HN uses
           | sequentially-numbered IDs for posts and comments (both are
           | treated the same so far as I can tell). UserIDs seem to have
           | an internal representation which is similar (I've seen, for
           | example, names which change over time), but the internal
           | representation doesn't seem to be publicly exposed. If you
           | want to find my own content you'd do it with
           | "UserID=dredmorbius" and not by some numeric identifier.
           | 
           | But the numeric content ID means that a determined scraper
           | could walk (sequentially or randomly) through the entire
           | database, pull out every post and comment, and then glue
           | those back together. That's somewhat north of 40 million
           | items presently.... (There are benefits to using sparse,
           | random / arbitrary UUIDs for systems.)
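
A minimal sketch of the ID walk described above, under stated assumptions. The comment names no endpoint; the URL template below assumes the public Firebase item endpoint, and `item_urls` is a hypothetical helper. It only builds URLs and fetches nothing. With sparse random UUIDs there would be no contiguous range to enumerate this way.

```python
from typing import Iterator

# Assumption: the public Firebase item endpoint; the comment above does not name one.
ITEM_URL = "https://hacker-news.firebaseio.com/v0/item/{item_id}.json"


def item_urls(first_id: int, last_id: int) -> Iterator[str]:
    """Yield one item URL per ID in [first_id, last_id], in ascending order."""
    for item_id in range(first_id, last_id + 1):
        yield ITEM_URL.format(item_id=item_id)


# Sequential IDs make the whole corpus enumerable from a single integer range.
# This builds URLs only; no request is sent.
urls = list(item_urls(1, 3))
print(urls[0])
```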
        
           | saagarjha wrote:
           | I figured as such. Can I ask that you change the website
           | itself to make them visible to logged out accounts? I
           | understand exactly why you did this but I feel like if you
           | showed them collapsed by default on the single-comment page
           | and you have to actively click on a "banned" to expand them
           | you really are out of line when you complain about how Hacker
           | News hosts horrible content or whatever.
        
             | dang wrote:
             | You mean for [dead] comments? Sorry, but strong no. Logged-
             | in-with-showdead-turned-on has proven to be the correct
             | height for that gate. Anyone who wants to can easily clear
             | it, but the small amount of effort and information required
             | means that most people become core community members before
             | turning it on. If we lowered it, naive-casual readers would
             | (through no fault of their own) misunderstand what they
             | were looking at and the dynamic I just described would kick
             | in.
             | 
             | The longer I've worked on HN the more I've come to
             | appreciate PG's design of this critical aspect of the site.
             | No content is hidden from users who want to see it*, but
             | the worst is (mostly) cordoned off so it doesn't destroy
             | the community. Banned users can continue to post, but their
             | comments are autokilled, so they're cordoned off by
             | default.
             | 
             | We're often asked: why allow banned users to continue to
             | post? The answer is that if we didn't, they'd just create
             | new accounts, and then they'd be posting with unbanned
             | accounts until we caught them and banned them again: a
             | strictly worse situation. This is one aspect of PG's design
             | that took me years to appreciate and got me thinking it
             | might even be optimal.
             | 
             | The one major change we made to the original design was
             | adding 'vouching'
             | (https://news.ycombinator.com/item?id=10298512), which lets
             | the community transfer cordoned-off posts back to the
             | commons if there's nothing wrong with them. That bit has
             | worked out really well.
             | 
             | * (Except for [deleted] posts. If you see [deleted] it
             | always means that either the author deleted it or asked us
             | to do that for them.)
        
         | dredmorbius wrote:
         | It's possible to favourite such items. They'll not be visible
         | unless you're logged in, but at least you can go through your
         | fave list and find them.
         | 
         | Or, of course, bookmark them yourself for later reference.
         | 
         | I've run into this issue myself, though dang's reply (still
         | being edited as I write this) does hit on some valid points.
        
       | olalonde wrote:
       | I was looking at Algolia's website recently and it seems they
       | really went all in with the "AI" marketing/SEO.
        
         | swyx wrote:
         | "we put the AI in AIgolia"
        
       | re wrote:
       | I'm a little confused by the context and comments here. Is Trieve
       | associated with HN at all or is this an independent/third-party
       | offering? Is the Algolia search going anywhere? Will the search
       | field on HN still take me to Algolia search or is that changing?
        
         | dang wrote:
         | It's independent/third-party, not officially associated with HN,
         | though Trieve is YC-funded
         | (https://www.ycombinator.com/companies/trieve) and in that
         | sense there's an affiliation.
         | 
         | There's no current plan to change HN Search though I wouldn't
         | rule it out. PG often used to integrate recent YC startups into
         | HN in various ways (search as detailed by the OP, but also an
         | SSO startup at one point, a carbon-reduction startup, I forget
         | what else) as a way of giving them a boost and I could imagine
         | us doing that again. (side note: I guess the mental model in my
         | head is that earlier-stage startups are more closely bonded to
         | YC and that as they succeed and expand, there's certainly still
         | a friendly connection but the attachment becomes a bit weaker.
         | For startups that have been around 10+ years, for example, I'm
         | not sure it still makes sense to have frontpage job ads on HN.)
         | 
         | The change I'd really like to make to HN Search is to bring the
         | front-end part of it into the HN codebase so the search results
         | can be 'real' HN pages*. It's never been a priority to
         | implement that though, and the Algolia system has been fabulous
         | for a long time.
         | 
         | * Funnily enough, I made almost the same point using my pre-
         | dang account back when Algolia was just getting started:
         | https://news.ycombinator.com/item?id=7126635. Credit to pvg,
         | who misses nothing, for spotting this. The reply was from
         | Algolia cofounder ndessaigne who is now a group partner at YC!
         | He was good on his word, btw.
        
       | ChrisArchitect wrote:
       | Related:
       | 
       |  _Vote on Algolia vs. Trieve HN Dataset Blind Search Relevance
       | Poll?_
       | 
       | https://news.ycombinator.com/item?id=41172033
        
       | NKosmatos wrote:
       | Can we get an option to search for users/usernames? Or even
       | better, searching for users based on their karma? ;-)
        
         | dredmorbius wrote:
         | User-specific search is possible in Algolia using
         | "by:<username> <query>". So "by:dredmorbius privacy" will find
         | my posts or comments on privacy.
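
The same author-scoped search can be sketched against Algolia's public HN Search API rather than the web UI's "by:" syntax. This is an assumption-laden illustration: the API endpoint and the `author_<username>` tag are not mentioned in the comment above, and `author_search_url` is a hypothetical helper that only builds the URL.

```python
from urllib.parse import urlencode

# Assumption: Algolia's public HN Search API, not the web UI the comment refers to.
SEARCH_API = "https://hn.algolia.com/api/v1/search"


def author_search_url(username: str, query: str) -> str:
    """Build a search URL restricted to items written by `username`."""
    params = {"query": query, "tags": f"author_{username}"}
    return f"{SEARCH_API}?{urlencode(params)}"


# Rough equivalent of typing "by:dredmorbius privacy" into the search box.
url = author_search_url("dredmorbius", "privacy")
print(url)
```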
        
       | dredmorbius wrote:
       | There are a few capabilities lacking from Algolia which I'd
       | really like to see in a replacement:
       | 
       | - Negative search / exclusion: the ability to _exclude_ terms
       | from a search, as in "procfs -linux", which would look for any
       | references to "procfs" which did not _also_ reference "linux".
       | 
       | - Replies to a specific user, e.g., "by:dredmorbius
       | inreplyto:skeptrune <search terms>". I'm often looking for a
       | specific context of my own previous comments.
       | 
       | - An improved date-bounding interface. If there's one thing that
       | frustrates me about Algolia's interface, it's the GUI (and
       | syntax) for defining dates. It's cumbersome, and at least on my
       | browser, the dates are generally hard to read or invisible. Going
       | back years is especially cumbersome.
       | 
       | I'll add: Algolia _has_ been massively useful, and the fact that
       | I _can_ search HN, especially for my own content, has been a huge
       | part of the value of the site, and is worlds ahead of other
       | online platforms. (Mastodon / the Fediverse _is_ catching up
       | here, Diaspora*'s lack of search was among my main frustrations
       | with the site and explains my absence there after more than a
       | decade of participation.)
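
The exclusion semantics requested in the first bullet above ("procfs -linux") can be illustrated with a small in-memory filter. This is a plain-Python sketch of the intended behaviour only, not how Algolia or Trieve implement it; `search_with_exclusion` and the sample documents are hypothetical.

```python
from typing import Iterable, List


def search_with_exclusion(query: str, docs: Iterable[str]) -> List[str]:
    """Return docs containing every plain term and none of the '-' terms."""
    terms = query.lower().split()
    required = [t for t in terms if not t.startswith("-")]
    # Strip the leading '-' and ignore a bare '-' with nothing after it.
    excluded = [t[1:] for t in terms if t.startswith("-") and len(t) > 1]
    results = []
    for doc in docs:
        words = set(doc.lower().split())
        has_required = all(t in words for t in required)
        has_excluded = any(t in words for t in excluded)
        if has_required and not has_excluded:
            results.append(doc)
    return results


# Hypothetical sample corpus for illustration.
docs = [
    "procfs on linux exposes process state",
    "procfs also exists on plan9",
    "linux kernel scheduling notes",
]
matches = search_with_exclusion("procfs -linux", docs)
print(matches)
```

Matching here is whole-word and case-insensitive; a real search engine would also handle punctuation, stemming, and ranking.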
        
         | dang wrote:
         | Algolia does do search term exclusion. Compare
         | https://hn.algolia.com/?dateRange=all&page=0&prefix=true&que...
         | and
         | https://hn.algolia.com/?dateRange=all&page=0&prefix=false&qu....
         | 
         | On the second point - HN has an undocumented endpoint
         | https://news.ycombinator.com/replies?id=skeptrune&by=dredmor...
         | but that doesn't give you search of course.
        
           | tptacek wrote:
           | _whoah_
        
             | spasibot wrote:
             | Just guessed another two undocumented endpoints
             | "deadcomments" and "deadstories", these take you to a
             | special admin login page if you're logged out, or say
             | "Unknown." if logged in:
             | 
             | https://news.ycombinator.com/deadcomments
             | 
             | https://news.ycombinator.com/deadstories
             | 
             | And the endpoint "flagged", which is empty for me:
             | 
             | https://news.ycombinator.com/flagged
             | 
             | But if you add a username, e.g. "?id=dang", it says "Can't
             | display that." instead:
             | 
             | https://news.ycombinator.com/flagged?id=dang
             | 
             | Interesting to stumble upon these even if they do nothing
             | for a non-admin user!
        
               | tptacek wrote:
               | `/flagged` is the list of stories you've flagged.
        
               | spasibot wrote:
               | That makes sense, thank you.
        
           | dredmorbius wrote:
           | Search term exclusion: TIL! I swear I've tried that w/o luck
           | before. Possibly was confounding that with OR pairing,
           | usually given as "(termA|termB)". I'm pretty sure that
           | doesn't work as expected (and just tested it). It's also
           | annoyingly absent from DDG search, which is out of scope for
           | here.
           | 
           | I'd been made aware of the undocumented endpoint but didn't
           | want to spill the beans ;-) It's apparently expensive. OK to
           | run manually, but don't script it.
        
         | skeptrune wrote:
         | >Replies to a specific user
         | 
         | Will definitely make sure to add that to our search before
         | launch! That is a really good idea.
         | 
         | >Improved date-bounding interface
         | 
         | Note taken :)
         | 
         | We also have negated terms which will work the same way.
        
       | simonw wrote:
       | On the topic of Hacker News search... a useful trick that not
       | enough people know is that you can take the ID of a story and use
       | search to return all comments ordered by most recent first -
       | great for keeping up with what's new in a specific conversation.
       | 
       | Eg for this thread the most recent comments can be found here:
       | https://hn.algolia.com/?dateRange=all&page=0&prefix=true&que...
       | 
       | I built an Observable notebook to save me from having to manually
       | construct those searches here:
       | https://observablehq.com/@simonw/hacker-news-homepage
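
A rough sketch of the trick described above, built on Algolia's public HN Search API instead of the web UI link (which is truncated in this dump). The `search_by_date` endpoint, the `story_<id>` tag, and `hitsPerPage` are assumptions not taken from the comment; `latest_comments_url` is a hypothetical helper, and the ID used is the related poll linked elsewhere in this thread, chosen only as an example.

```python
from urllib.parse import urlencode

# Assumption: Algolia's public HN Search API endpoint that sorts newest first.
SEARCH_BY_DATE_API = "https://hn.algolia.com/api/v1/search_by_date"


def latest_comments_url(story_id: int, per_page: int = 50) -> str:
    """Build a URL listing a story's comments, most recent first."""
    params = {"tags": f"comment,story_{story_id}", "hitsPerPage": per_page}
    # Keep the comma literal so the tag filter stays readable in the URL.
    return f"{SEARCH_BY_DATE_API}?{urlencode(params, safe=',')}"


# Example ID only: the related poll item linked elsewhere in this thread.
url = latest_comments_url(41172033)
print(url)
```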
        
         | pogue wrote:
         | HN could really use some client software. If nothing else for
         | choosing how to read thread replies (ie: most recent, most
         | upvoted/popular, most downvoted, by replies from the OP, etc) +
         | a more advanced built in search.
        
           | skeptrune wrote:
           | > more advanced built-in search
           | 
           | Do you have any specific feature requests? I would love more
           | suggestions and ideas!
        
       | skeptrune wrote:
       | Updated the blog at the link to include PG's HN post and the
       | archive-available ycombinator.com post documenting the
       | Octopart/ThriftDB search launch.
       | 
       | Commit here -
       | https://github.com/devflowinc/trieve-website/commit/ab563475...
       | 
       | Links:
       | 
       | - https://news.ycombinator.com/item?id=2619736
       | 
       | - https://web.archive.org/web/20110618105517/http://ycombinato...
        
       | krackers wrote:
       | One thing I'd like is ability to search your (or others')
       | favorites.
        
         | skeptrune wrote:
         | That's a really cool idea and also likely doable. Probably
         | won't ship it before we release, but will add it to our
         | backlog.
        
       ___________________________________________________________________
       (page generated 2024-08-12 23:00 UTC)