[HN Gopher] History of Hacker News Search from 2007 to 2024
___________________________________________________________________
History of Hacker News Search from 2007 to 2024
Author : skeptrune
Score : 63 points
Date : 2024-08-12 20:27 UTC (2 hours ago)
(HTM) web link (trieve.ai)
(TXT) w3m dump (trieve.ai)
| cactusplant7374 wrote:
| When is the launch?
| skeptrune wrote:
| Not sure yet! Fingers crossed it'll be this week or next.
| swyx wrote:
| i want to publicly thank Algolia for providing an excellent HN
| search for so long. i was on a call with Linus Lee and we both
| were referencing something on HN and i started pulling it up and
| he said "i know exactly what website you're on" without seeing my
| screen and it was of course the Algolia HN search. unbelievable
| mind sync.
|
| Idk if it can be replaced (i guess i could do with semantic
| search + content crawling to start?), but even if it is replaced,
| Algolia will always have a special place in my heart for doing
| such a great job for free. thank you whoever worked on it
| (Algolians - is there a behind the scenes writeup somewhere?)
| squeaky-clean wrote:
| It is free, but Algolia is a YCombinator backed company (YC
| W14) so for them it's probably very useful as a sort of low-
| stakes phase-1-prod environment. Basically a win-win-win.
| davisr wrote:
| Now, if only it could be used without necessitating
| JavaScript...
| spasibot wrote:
| I wonder if they'll let us search flagged and dead posts and
| comments.
|
| I remember reading some insightful exchanges back in the day that
| got flagged because of being a controversial topic that other
| users didn't like.
|
| No way to find them now, even knowing some keywords and
| approximate month and year.
| yorwba wrote:
| They're using the firebase API,
| https://github.com/devflowinc/trieve-hn-discovery/tree/main/...
| so no showdead, I think.
| greenchair wrote:
| contributes to the echo chamber
| dang wrote:
| We had to exclude [dead] and eventually even just [flagged]
| posts from the public API because many third-party clients and
| sites were displaying them as if they were regular posts. For
| the ever-fragile HN ecosystem, that is catastrophic. We would
| get angry emails saying "how can your _expletive_ site possibly
| condone such _expletive_ _expletive_ comments as <link>" ...
| and then it would turn out that <link> was a post by some
| account that had been banned for years.
|
| It's fine if users turn 'showdead' on in their profile to read
| everything--just please remember that by doing that, you're
| subscribing to various bottoms of various barrels. But it's
| definitely not ok when people browse HN with some app we have
| nothing to do with, run into horrible things, understandably
| are outraged and then forever have their view of HN imprinted.
|
| IMO this issue is existential for HN. We've spent years and so
| much energy trying to find a balance between internet openness
| and human decency, a task which oscillates between barely-
| possible and simply-doomed, so the idea that anybody anywhere
| sees anything labeled "Hacker News" that pours all the toxic
| waste back into the commons is physically painful to me. Much
| as I dislike the idea of restricting anyone's curiosity about
| the entire corpus of what gets posted, I don't see what choice
| we have.
| spasibot wrote:
| Thank you dang, I very much appreciate you taking the time to
| explain.
|
| I was going to suggest maybe the public API could have a
| "showdead" flag too but I guess that too easily enables the
| problem you're trying to prevent? As in an enterprising app
| developer could turn the "showdead" tap to "yes" with every
| request and then the waste gushes out once more.
| dredmorbius wrote:
| I can appreciate that concern and see it even with flagging /
| dead / killed posts and submissions.
|
| I've had my own concerns about HN's moderation, both excesses
| and insufficiencies. When I've done occasional polls about
| what people's issues with HN are, I'm very often pointed to
| comments which now show as flagged. I'd found a few which
| _hadn't_ been flagged and forwarded those to dang, who
| (admittedly long after the fact) flagged them. As dang's said
| many times, moderators don't see everything, most moderation
| is by members, and mods step in relatively rarely.
|
| Based on Whaly's 2021 analysis and looking at dang's own
| comment post history (via Algolia), HN nets roughly 4 million
| comments/year and 400k submissions, with about 150k active
| members. Over his ten years as moderator dang's averaged
| about 20 comments per day, though there's a great deal more
| moderation occurring (some automated, some member-based, some
| manual but not noted with comments, which tend to be reserved
| for established accounts).
|
| My read is that HN _mostly_ tends toward its stated goals
| and, frankly, good-netizen behaviour. It _does_ have a
| pronounced status-quo bias, though it seems to be self-aware
| on this point. I've a few further concerns I'm still
| thinking through.
|
| The problem with an overly-open archive is that this makes
| possible misconstrued assertions about what HN does or
| doesn't tolerate. An open-access archive and third-party apps
| which don't reflect moderation actions, say, a third-party
| app which explicitly _only_ showed flagged, killed, and/or
| dead posts, comments, and users, would paint a distinctly
| different picture of HN, and one which would greatly harm the
| reputation of the site.
|
| There are some ... possible ways around this. HN uses
| sequentially-numbered IDs for posts and comments (both are
| treated the same so far as I can tell). UserIDs seem to have
| an internal representation which is similar (I've seen, for
| example, names which change over time), but the internal
| representation doesn't seem to be publicly exposed. If you
| want to find my own content you'd do it with
| "UserID=dredmorbius" and not by some numeric identifier.
|
| But the numeric content ID means that a determined scraper
| could walk (sequentially or randomly) through the entire
| database, pull out every post and comment, and then glue
| those back together. That's somewhat north of 40 million
| items presently.... (There are benefits to using sparse,
| random / arbitrary UUIDs for systems.)
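The contrast between dense sequential IDs and sparse random ones can be sketched roughly as follows. This is only an illustration: the 40 million figure is the rough estimate from the comment above, and `walk_sequential` and `guess_hit_probability` are hypothetical helper names, not part of any HN API.

```python
import uuid

# Rough item count taken from the comment above (an estimate, not a measured value).
ITEM_COUNT = 40_000_000


def walk_sequential(max_id, start=1):
    """Yield every candidate ID. With dense sequential IDs, each one is a real item."""
    yield from range(start, max_id + 1)


def guess_hit_probability(item_count, random_bits=122):
    """Chance that one random guess matches an existing version-4 UUID.

    A version-4 UUID carries 122 random bits; the rest are fixed version/variant bits.
    """
    return item_count / 2 ** random_bits


# Sequential scheme: knowing the newest ID is enough to enumerate everything.
first_five = list(walk_sequential(5))

# Random scheme: each identifier is independent of the previous one.
sample_uuid = uuid.uuid4()

print(first_five)
print(guess_hit_probability(ITEM_COUNT))
```

With sequential IDs every guess in range hits an item, whereas with random 128-bit identifiers a blind guess almost never does, which is the benefit the comment alludes to.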
| saagarjha wrote:
| I figured as such. Can I ask that you change the website
| itself to make them visible to logged out accounts? I
| understand exactly why you did this but I feel like if you
| showed them collapsed by default on the single-comment page
| and you have to actively click on a "banned" to expand them
| you really are out of line when you complain about how Hacker
| News hosts horrible content or whatever.
| dang wrote:
| You mean for [dead] comments? Sorry, but strong no. Logged-
| in-with-showdead-turned-on has proven to be the correct
| height for that gate. Anyone who wants to can easily clear
| it, but the small amount of effort and information required
| means that most people become core community members before
| turning it on. If we lowered it, naive-casual readers would
| (through no fault of their own) misunderstand what they
| were looking at and the dynamic I just described would kick
| in.
|
| The longer I've worked on HN the more I've come to
| appreciate PG's design of this critical aspect of the site.
| No content is hidden from users who want to see it*, but
| the worst is (mostly) cordoned off so it doesn't destroy
| the community. Banned users can continue to post, but their
| comments are autokilled, so they're cordoned off by
| default.
|
| We're often asked: why allow banned users to continue to
| post? The answer is that if we didn't, they'd just create
| new accounts, and then they'd be posting with unbanned
| accounts until we caught them and banned them again: a
| strictly worse situation. This is one aspect of PG's design
| that took me years to appreciate and got me thinking it
| might even be optimal.
|
| The one major change we made to the original design was
| adding 'vouching'
| (https://news.ycombinator.com/item?id=10298512), which lets
| the community transfer cordoned-off posts back to the
| commons if there's nothing wrong with them. That bit has
| worked out really well.
|
| * (Except for [deleted] posts. If you see [deleted] it
| always means that either the author deleted it or asked us
| to do that for them.)
| dredmorbius wrote:
| It's possible to favourite such items. They'll not be visible
| unless you're logged in, but at least you can go through your
| fave list and find them.
|
| Or, of course, bookmark them yourself for later reference.
|
| I've run into this issue myself, though dang's reply (still
| being edited as I write this) does hit on some valid points.
| olalonde wrote:
| I was looking at Algolia's website recently and it seems they
| really went all in with the "AI" marketing/SEO.
| swyx wrote:
| "we put the AI in AIgolia"
| re wrote:
| I'm a little confused by the context and comments here. Is Trieve
| associated with HN at all or is this an independent/third-party
| offering? Is the Algolia search going anywhere? Will the search
| field on HN still take me to Algolia search or is that changing?
| dang wrote:
| It's independent/thirdparty, not officially associated with HN,
| though Trieve is YC-funded
| (https://www.ycombinator.com/companies/trieve) and in that
| sense there's an affiliation.
|
| There's no current plan to change HN Search though I wouldn't
| rule it out. PG often used to integrate recent YC startups into
| HN in various ways (search as detailed by the OP, but also an
| SSO startup at one point, a carbon-reduction startup, I forget
| what else) as a way of giving them a boost and I could imagine
| us doing that again. (side note: I guess the mental model in my
| head is that earlier-stage startups are more closely bonded to
| YC and that as they succeed and expand, there's certainly still
| a friendly connection but the attachment becomes a bit weaker.
| For startups that have been around 10+ years, for example, I'm
| not sure it still makes sense to have frontpage job ads on HN.)
|
| The change I'd really like to make to HN Search is to bring the
| front-end part of it into the HN codebase so the search results
| can be 'real' HN pages*. It's never been a priority to
| implement that though, and the Algolia system has been fabulous
| for a long time.
|
| * Funnily enough, I made almost the same point using my pre-
| dang account back when Algolia was just getting started:
| https://news.ycombinator.com/item?id=7126635. Credit to pvg,
| who misses nothing, for spotting this. The reply was from
| Algolia cofounder ndessaigne who is now a group partner at YC!
| He was good on his word, btw.
| ChrisArchitect wrote:
| Related:
|
| _Vote on Algolia vs. Trieve HN Dataset Blind Search Relevance
| Poll?_
|
| https://news.ycombinator.com/item?id=41172033
| NKosmatos wrote:
| Can we get an option to search for users/usernames? Or even
| better, searching for users based on their karma? ;-)
| dredmorbius wrote:
| User-specific search is possible in Algolia using
| "by:<username> <query>". So "by:dredmorbius privacy" will find
| my posts or comments on privacy.
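For anyone scripting this rather than typing it into the search box, a minimal sketch follows. It assumes the public HN Search API at `hn.algolia.com/api/v1/search`, where an author filter is passed as `tags=author_<username>`; the helper name `build_search_url` and the parsing of the `by:` prefix are illustrative assumptions, not how the Algolia front end necessarily does it.

```python
from urllib.parse import urlencode

# Public HN Search API endpoint (relevance-sorted search).
API_BASE = "https://hn.algolia.com/api/v1/search"


def build_search_url(user_query):
    """Translate 'by:<username> <terms>' into an HN Search API URL.

    Hypothetical helper: only a single 'by:' token is honoured (the last one wins).
    """
    author = None
    terms = []
    for token in user_query.split():
        if token.startswith("by:") and len(token) > 3:
            author = token[3:]
        else:
            terms.append(token)

    params = {"query": " ".join(terms)}
    if author is not None:
        # The API expresses the author filter as a tag.
        params["tags"] = "author_" + author
    return API_BASE + "?" + urlencode(params)


print(build_search_url("by:dredmorbius privacy"))
```

The function only builds the URL; fetching and paging through the JSON response is left out.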
| dredmorbius wrote:
| There are a few capabilities lacking from Algolia which I'd
| really like to see in a replacement:
|
| - Negative search / exclusion: the ability to _exclude_ terms
| from a search, as in "procfs -linux", which would look for any
| references to "procfs" which did not _also_ reference "linux".
|
| - Replies to a specific user, e.g., "by:dredmorbius
| inreplyto:skeptrune <search terms>". I'm often looking for a
| specific context of my own previous comments.
|
| - An improved date-bounding interface. If there's one thing that
| frustrates me about Algolia's interface, it's the GUI (and
| syntax) for defining dates. It's cumbersome, and at least on my
| browser, the dates are generally hard to read or invisible. Going
| back years is especially cumbersome.
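The exclusion semantics in the first bullet can be sketched as a tiny local filter. This only illustrates what "procfs -linux" is meant to match; it is not how Algolia or Trieve implement it, and the whitespace tokenization and the sample documents are assumptions made for the example.

```python
def matches(query, text):
    """True if text contains every plain term and none of the '-' prefixed terms."""
    words = set(text.lower().split())
    for token in query.lower().split():
        if token.startswith("-") and len(token) > 1:
            # Excluded term: reject the document if it appears.
            if token[1:] in words:
                return False
        elif token not in words:
            # Required term: reject the document if it is missing.
            return False
    return True


# Made-up sample documents, for illustration only.
docs = [
    "procfs on linux exposes process state",
    "procfs in plan9 predates the linux version",
    "procfs on freebsd is optional",
]

hits = [d for d in docs if matches("procfs -linux", d)]
print(hits)
```

Only the document that mentions "procfs" without also mentioning "linux" survives the filter.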
|
| I'll add: Algolia _has_ been massively useful, and the fact that
| I _can_ search HN, especially for my own content, has been a huge
| part of the value of the site, and is worlds ahead of other
| online platforms. (Mastodon / the Fediverse _is_ catching up
| here; Diaspora*'s lack of search was among my main frustrations
| with the site and explains my absence there after more than a
| decade of participation.)
| dang wrote:
| Algolia does do search term exclusion. Compare
| https://hn.algolia.com/?dateRange=all&page=0&prefix=true&que...
| and https://hn.algolia.com/?dateRange=all&page=0&prefix=false&qu....
|
| On the second point - HN has an undocumented endpoint
| https://news.ycombinator.com/replies?id=skeptrune&by=dredmor...
| but that doesn't give you search of course.
| tptacek wrote:
| _whoah_
| spasibot wrote:
| Just guessed another two undocumented endpoints
| "deadcomments" and "deadstories", these take you to a
| special admin login page if you're logged out, or say
| "Unknown." if logged in:
|
| https://news.ycombinator.com/deadcomments
|
| https://news.ycombinator.com/deadstories
|
| And the endpoint "flagged", which is empty for me:
|
| https://news.ycombinator.com/flagged
|
| But if you add a username, e.g. "?id=dang", it says "Can't
| display that." instead:
|
| https://news.ycombinator.com/flagged?id=dang
|
| Interesting to stumble upon these even if they do nothing
| for a non-admin user!
| tptacek wrote:
| `/flagged` is the list of stories you've flagged.
| spasibot wrote:
| That makes sense, thank you.
| dredmorbius wrote:
| Search term exclusion: TIL! I swear I've tried that w/o luck
| before. Possibly was confounding that with OR pairing,
| usually given as "(termA|termB)". I'm pretty sure that
| doesn't work as expected (and just tested it). It's also
| annoyingly absent from DDG search, which is out of scope for
| here.
|
| I'd been made aware of the undocumented endpoint but didn't
| want to spill the beans ;-) It's apparently expensive. OK to
| run manually, but don't script it.
| skeptrune wrote:
| >Replies to a specific user
|
| Will definitely make sure to add that to our search before
| launch! That is a really good idea.
|
| >Improved date-bounding interface
|
| Note taken :)
|
| We also have negated terms which will work the same way.
| simonw wrote:
| On the topic of Hacker News search... a useful trick that not
| enough people know is that you can take the ID of a story and use
| search to return all comments ordered by most recent first -
| great for keeping up with what's new in a specific conversation.
|
| Eg for this thread the most recent comments can be found here:
| https://hn.algolia.com/?dateRange=all&page=0&prefix=true&que...
|
| I built an Observable notebook to save me from having to manually
| construct those searches here:
| https://observablehq.com/@simonw/hacker-news-homepage
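A rough Python equivalent of the trick above, assuming the public HN Search API: its `search_by_date` endpoint returns results most recent first, and `tags=comment,story_<id>` restricts them to comments on one story. The helper name `latest_comments_url` is hypothetical, and the ID used is just the poll thread linked earlier on this page, picked as an example.

```python
from urllib.parse import urlencode

# Date-sorted variant of the HN Search API (most recent results first).
API_BASE = "https://hn.algolia.com/api/v1/search_by_date"


def latest_comments_url(story_id, hits_per_page=50):
    """Build a URL listing a story's comments, newest first (hypothetical helper)."""
    params = {
        # Comma-separated tags are ANDed: comments AND belonging to this story.
        "tags": f"comment,story_{story_id}",
        "hitsPerPage": hits_per_page,
    }
    # Keep the comma literal so the tags value stays readable.
    return API_BASE + "?" + urlencode(params, safe=",")


# Example ID: the poll thread linked earlier in this discussion.
print(latest_comments_url(41172033))
```

Only the URL is constructed here; fetching it and reading the `hits` array from the JSON response is left to the caller.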
| pogue wrote:
| HN could really use some client software. If nothing else for
| choosing how to read thread replies (ie: most recent, most
| upvoted/popular, most downvoted, by replies from the OP, etc) +
| a more advanced built in search.
| skeptrune wrote:
| > more advanced built-in search
|
| Do you have any specific feature requests? I would love more
| suggestions and ideas!
| skeptrune wrote:
| Updated the blog at the link to include PG's HN post and the
| archive-available ycombinator.com post documenting the
| Octopart/ThriftDB search launch.
|
| Commit here - https://github.com/devflowinc/trieve-website/commit/ab563475...
|
| Links:
|
| - https://news.ycombinator.com/item?id=2619736
|
| - https://web.archive.org/web/20110618105517/http://ycombinato...
| krackers wrote:
| One thing I'd like is ability to search your (or others')
| favorites.
| skeptrune wrote:
| That's a really cool idea and also likely doable. Probably
| won't ship it before we release, but will add it to our
| backlog.
___________________________________________________________________
(page generated 2024-08-12 23:00 UTC)