[HN Gopher] Show HN: Quality News - Towards a fairer ranking alg...
       ___________________________________________________________________
        
       Show HN: Quality News - Towards a fairer ranking algorithm for
       Hacker News
        
       Hello HN!  TLDR;  - Quality News is a Hacker News client that
       provides additional data and insights on submissions, notably, the
       upvoteRate metric.  - We propose that this metric could be used to
       improve the Hacker News ranking score.  - In-depth explanation:
       https://github.com/social-protocols/news#readme  The Hacker News
       ranking score is directly proportional to upvotes, which is a
       problem because it creates a feedback loop: higher rank leads to
       more upvotes leads to higher rank, and so on...
        
            Higher Rank --> More Upvotes --> Higher Rank --> ...
        
       As a consequence, success on HN
       depends almost entirely on getting enough upvotes in the first hour
       or so to make the front page and get caught in this feedback loop.
       And getting these early upvotes is largely a matter of timing,
       luck, and moderator decisions. And so the best stories don't always
       make the front page, and the stories on the front page are not
       always the best.  Our proposed solution is to use upvoteRate
       instead of upvotes in the ranking formula. upvoteRate is an
       estimate of how much more or less likely users are to upvote a
       story compared to the average story, taking into account how much
       attention the story has received, based on a history of the ranks
       and times at which it has been shown. You can read about how we
       calculate this metric in more detail here:
       https://github.com/social-protocols/news#readme  About 1.5 years
       ago, we published an article with this basic idea of counteracting
       the rank-upvotes feedback loop by using attention as negative
       feedback. We received very valuable input from the HN community
       (https://news.ycombinator.com/item?id=28391659). Quality News has
       been created based largely on this feedback.  Currently, Quality
       News shows the upvoteRate metric for live Hacker News data, as well
       as charts of the rank and upvote history of each story. We have not
       yet implemented an alternative ranking algorithm, because we don't
       have access to data on flags and moderator actions, which are a
       major component of the HN ranking score.  We'd love to see the
       Hacker News team experiment with the new formula, perhaps on an
       alternative front page. This will allow the community to evaluate
       whether the new ranking formula is an improvement over the current
       one.  We look forward to discussing our approach with you!

       Links:
       Site: https://news.social-protocols.org/
       Readme: https://github.com/social-protocols/news#readme
       Previous Blog Post: https://felx.me/2021/08/29/improving-the-hacker-news-ranking...
       Previous Discussion: https://news.ycombinator.com/item?id=28391659
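
A minimal sketch of the upvoteRate idea described in the post, under stated assumptions: the story's upvotes are divided by the upvotes an average story would be expected to collect at the same ranks and times. The `RANK_SHARE` values, the `Observation` type and the function name are hypothetical and only illustrate the ratio; the readme linked above describes the actual calculation.

```python
from dataclasses import dataclass

# Hypothetical share of sitewide upvotes that an average story receives
# at each rank (illustrative numbers, not measured data).
RANK_SHARE = {1: 0.10, 2: 0.06, 3: 0.04}


@dataclass
class Observation:
    rank: int              # rank the story held during the interval
    sitewide_upvotes: int  # upvotes cast on the whole site in the interval
    story_upvotes: int     # upvotes the story itself received


def upvote_rate(history: list[Observation]) -> float:
    """Ratio of actual upvotes to the upvotes expected for an average story."""
    actual = sum(o.story_upvotes for o in history)
    expected = sum(
        RANK_SHARE.get(o.rank, 0.0) * o.sitewide_upvotes for o in history
    )
    if expected == 0:
        raise ValueError("story received no measurable attention")
    return actual / expected


# A story that spent one interval at rank 3 and one at rank 1.
history = [Observation(3, 100, 6), Observation(1, 100, 14)]
print(upvote_rate(history))
```

A value above 1 would mean the story attracted more upvotes than an average story given the same attention; a value below 1 would mean fewer.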
        
       Author : manx
       Score  : 114 points
       Date   : 2023-03-16 15:37 UTC (7 hours ago)
        
 (HTM) web link (news.social-protocols.org)
 (TXT) w3m dump (news.social-protocols.org)
        
       | password4321 wrote:
       | I wonder how long this project will run. So many Hacker News
       | interface reimplementations have gone dark over the years.
       | 
       | Success to you!
        
         | manx wrote:
         | Thank you! We built a lightweight page on purpose (server-side
         | rendered go templates), to never involve a big hosting cost. In
         | fact, right now, it's hosted on a fly.io free tier. But it's
          | open source and anybody could host it if we decide to shut it
          | down.
        
       | 23B1 wrote:
       | I guess I don't understand what the problem is with the current
       | way of doing things? Like, HN is one of the few communities
       | online now where the content and the conversation seem
       | interesting, varied, and polite.
       | 
       | What's the core problem you're trying to solve here?
        
         | user3939382 wrote:
         | My biggest gripe, which may not be solvable and isn't unique to
         | HN, is this vote/point system that ends up as "truth by
         | consensus". Controversial opinions, which may actually be the
         | correct ones, are hidden and buried and even have their text
         | slowly disappear and fade out which I think is ridiculous.
         | 
         | It promotes groupthink and encourages users to just repeat
         | mainstream opinions.
        
           | SllX wrote:
           | Not as much as you may think.
           | 
           | "showdead" is a feature if you have enough karma and it's
           | usually easy to see why something is dead.
           | 
           | Controversial arguments made well, or at least to the best of
           | your ability that fall within the guidelines can and do get
           | upvoted. A lot of the dead comments I see are really just ad
           | hominem attacks or near enough to it.
        
         | jwarden wrote:
         | The problem certainly isn't that HN content and conversation
         | isn't good. But that doesn't mean it couldn't be better.
         | 
          | The core problem we are trying to solve is that the community
         | sometimes misses out on the opportunity to discuss content that
         | many people would find valuable and that would engender quality
         | discussion.
        
         | cycomanic wrote:
         | I can tell you one issue I observe as someone living in a
         | different timezone. HN ranking highly depends on timing. As a
         | consequence during hours the US is asleep my impression (I have
         | not investigated this thoroughly) is that front page stories
          | are dominated by old stories and new submissions don't get
          | enough votes to get to the front page. Then once the US wakes,
          | new stories get enough votes and make it to the front page.
         | This leads to biases in the news, which I find unfortunate,
         | because I believe everyone would benefit from news being
         | geographically (for lack of a better word) broader.
         | 
         | I certainly welcome someone playing with different algorithms
         | to see how they affect ranking.
        
           | jnakayama wrote:
           | If you're interested in investigating timing effects: We
           | collected a dataset last year where we took a snapshot of the
           | newest 1500 stories on HN every minute for several months
           | which should contain the information required. Feel free to
           | play around with it and get in touch with us if you find
           | something interesting!
           | 
           | [1] Dataset: https://osf.io/bnysw/
           | 
            | [2] Exploratory analyses:
            | https://github.com/social-protocols/hacker-news-data
        
         | chasebank wrote:
         | The only problem I have with HN is the endless content. I wish
         | you could say, 'Show me stories that reached the front page',
         | then hide or watch each story as you deem. That way I could
         | clear out the front page like I do my inbox.
        
           | yesenadam wrote:
           | Sounds like https://hckrnews.com/ would be perfect for you.
        
           | manx wrote:
           | Some feed readers like feedly have a fifo mode, where you,
           | for example, can go through all the stories which appeared on
           | hacker news best.
        
       | supernova87a wrote:
       | Are you sure / do you have info that HN doesn't use some kind of
       | "holding pen" for stories to have a fixed amount of time to see
       | if they get a certain % of votes before being kicked down the
       | list?
       | 
       | This is a classic problem with forums, and I wonder if HN already
       | has something in place that you might not have factored in (which
       | could then just be tuned better).
        
         | jwarden wrote:
         | I think there may well be some sort of "holding pen" system.
         | But we don't have the info. We only know what the "raw" ranking
         | formula is. But the actual rankings differ significantly from
         | what the raw ranking formula says. You can actually see the
         | difference in charts on our site. For example:
         | https://news.social-protocols.org/stats?id=35183317
         | 
         | We have noticed that the "raw" rank (black line) will sometimes
          | initially put a story on page 1, while it still has no actual
          | rank (orange line). But then sometimes the orange line
         | suddenly jumps up. This seems to support the "holding pen"
         | hypothesis.
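
For reference, a sketch of the "raw" formula mentioned in the comment above. It assumes the widely quoted form `(points - 1)^0.8 / (age_hours + 2)^1.8`; the exact constants HN uses today are an assumption here, and flags, penalties and moderator actions are deliberately not modeled.

```python
def raw_rank_score(points: int, age_hours: float, gravity: float = 1.8) -> float:
    """Commonly cited approximation of the raw HN score (assumed constants)."""
    return max(points - 1, 0) ** 0.8 / (age_hours + 2) ** gravity


# Hypothetical stories: name -> (points, age in hours).
stories = {"a": (50, 5.0), "b": (10, 0.5), "c": (200, 24.0)}
ranked = sorted(stories, key=lambda s: raw_rank_score(*stories[s]), reverse=True)
print(ranked)
```

Comparing such a raw ordering with the observed ranks is one way to see where the unpublished adjustments kick in.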
        
       | mhb wrote:
       | Would just increasing the number of posts on the front page be an
       | improvement?
        
         | manx wrote:
          | I think so. The first page gets many more upvotes than the
         | second page. There is a visible step in the data:
         | https://github.com/social-protocols/news#upvote-share-by-ran...
         | 
         | Once a story drops to the second page, it receives fewer
         | upvotes and can't sustain any growth anymore. Having a longer
          | front page (we're showing 90 ranks) smooths out that effect.
        
       | mtVessel wrote:
       | This is interesting, but it would be even better if I could see it
       | sorted by descending upvoteRate.
        
         | manx wrote:
         | That's a great idea! Similar to the hacker news /best page,
         | where stories of the past 7 days are sorted by their upvote
         | count, we could provide a page where those stories are sorted
         | by their upvoteRate. Should be easy to do.
        
       | MilnerRoute wrote:
       | I'd like to see these alternate algorithms implemented. The API
       | exists - and isn't that really the only way to ultimately judge
       | if it's better or worse?
       | 
       | Another random idea: have the parameters affecting rankings be
       | visible and adjustable with interactive sliders- so you could
       | customize the various weights to try to attain the ideal mix of
       | stories for you.
       | 
       | Or does that defeat the purpose. Is the joy of HN in knowing that
       | when a story reaches the front page, you know it's on everyone's
       | front page...
        
         | jwarden wrote:
         | One thing that prevents us from actually implementing the
         | algorithm is that there is some "secret sauce" to HN rankings
         | that is not publicly available. There are flags, vote ring
         | detectors, domain penalties, the second chance queue, and other
         | means by which HN moderators change the rank of stories. And
         | these make a *huge* difference. Our initial implementation of
         | an alternative ranking algorithm was not an improvement over
         | the existing HN home page for this reason.
        
           | zamalek wrote:
           | > secret sauce
           | 
           | This is a feature FWIW. It prevents blatant gaming of
           | rankings.
        
         | jnakayama wrote:
         | We played around with customization options like that (URL
         | parameters etc.), but ultimately decided against it. The
         | reasoning behind it was that a lack of personalization might be
         | a _feature_ not a _bug_ for a news aggregator like HN. One
         | issue that arises with personalization is that it is
         | detrimental to a sense of shared experience and we thought that
         | the global frontpage might be a distinct reason for the sense
         | of community on HN.
         | 
         | This issue was also discussed previously on HN: -
         | https://news.ycombinator.com/item?id=31375092
        
       | exolymph wrote:
       | I like https://hckrnews.com/ as an alternative front page
        
       | akomtu wrote:
       | HN needs to separate emotion from reason in upvotes and
       | downvotes. I bet that many readers here confuse the upvote button
       | with "I like it" and the downvote button with "I dislike it"
       | while it should be about "is this comment truthful and
       | informative, does it add something novel to the discussion?"
       | 
       | HN could implement this with a cosmetic change: all upvotes and
       | downvotes would show a form to provide explanation. Those
       | explanations will be reviewed, randomly, to spot emotional users
       | and suspend their voting power for a month or two. As for those
       | who can't be bothered with explaining their voting decision, they
       | shouldn't be able to influence the global ranking. Rage
       | downvoting and hive-mind upvoting will be gone very quickly.
        
         | manx wrote:
         | I think this boils down to:
         | 
         | - Users upvote, because they want that story to get MORE
         | attention, BECAUSE they agree
         | 
         | - Users downvote, because they want that story to get LESS
         | attention, BECAUSE they disagree
         | 
         | So the intent is still attention control, but the reason is
         | (dis)agreement.
         | 
         | But in the case of HN, the downvote is a moderation mechanism,
         | instead of a community poll. So this might be confusing to the
         | user. Treating downvotes differently, based on a top-level
         | reason (disagreement, violating ToC, false or misleading, not
         | interesting, etc) makes a lot of sense to me.
        
           | JohnFen wrote:
           | Usually (but not always), I upvote not because I agree, but
           | because someone said something that I think was worth
           | reading. Usually (but not always), I downvote not because I
           | disagree, but because someone said something that I think is
           | worth negative attention (trolling, etc.).
           | 
           | But I would love a separate Like/Dislike mechanism. It's a
           | bit painful to upvote an insightful (thus upvote-worthy)
           | comment that expresses a view that I disagree with.
        
         | layer8 wrote:
         | You expect emotional up-/downvoters to be objective and
         | truthful about their up-/downvote motivation?
        
       | abecedarius wrote:
       | Idea: if you know all of a user's votes, you can estimate that
       | they at least glanced over the items up to the lowest-placed one
       | they voted on on the same page. This is a bit more information
       | than "users tend to read higher-placed items following a known
       | distribution" like the formula from your readme. I guess you'd
       | have to be HN to implement this.
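
The idea in the comment above can be sketched as follows, assuming one user's votes on a single page are known. The function names are hypothetical, and the assumption that a vote at rank r implies a glance at every item at ranks 1..r is the commenter's heuristic, not something measured.

```python
def inferred_views(voted_ranks: list[int]) -> range:
    """Ranks a user is assumed to have at least glanced at on one page.

    Assumption: a user who voted on the item at rank r has scanned
    every item placed at ranks 1..r.
    """
    if not voted_ranks:
        return range(0)  # no votes, so nothing can be inferred
    return range(1, max(voted_ranks) + 1)


def per_user_upvote_fraction(voted_ranks: list[int]) -> float:
    """Fraction of the inferred-viewed items that the user upvoted."""
    seen = inferred_views(voted_ranks)
    return len(set(voted_ranks)) / len(seen) if seen else 0.0


print(list(inferred_views([2, 7, 4])))
print(per_user_upvote_fraction([2, 7, 4]))
```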
        
         | jwarden wrote:
         | Interesting idea. Yes that's probably true. One issue is that a
         | story could appear on multiple pages (top, new, show, etc.),
         | and we don't know where the upvote came from. But I think we
         | could deal with that issue and we might be able to use that as
         | a datapoint to refine the upvoteRate calculation, and we could
         | experiment with adding that to our model.
        
           | kqr wrote:
           | What percentage do you believe does not come from the front
           | page? Is it big enough to actually be worried about?
        
             | jwarden wrote:
             | If it's on rank 1 of the front page, then the vast majority
             | of votes come from the front page. But if it's at rank 90
             | of the front page, and rank 1 of the new or the best page,
             | then in those cases only a minority of upvotes may be
             | coming from the front page.
             | 
             | If HN implemented this, they would know where the vote is
             | coming from. But on Quality News we could just assume there
             | was an X% chance the vote was from the front page, a Y% it
             | was the new page, etc., and adjust our upvoteRate formula
             | based on that.
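
A rough sketch of the "X% front page, Y% new page" estimate described above. The traffic split and the 1/rank attention curve are made-up placeholders, and the function name is hypothetical; only the weighting-and-normalizing step is the point.

```python
from typing import Callable


def vote_origin_probabilities(
    rank_by_page: dict[str, int],
    page_traffic: dict[str, float],
    share_at_rank: Callable[[int], float],
) -> dict[str, float]:
    """Estimate the probability that an upvote came from each page.

    Each page is weighted by its (assumed) traffic times the (assumed)
    attention share of the rank the story holds on that page.
    """
    weights = {
        page: page_traffic[page] * share_at_rank(rank)
        for page, rank in rank_by_page.items()
    }
    total = sum(weights.values())
    return {page: weight / total for page, weight in weights.items()}


# Hypothetical inputs: rank 90 on the front page, rank 1 on the new page.
probs = vote_origin_probabilities(
    rank_by_page={"front": 90, "new": 1},
    page_traffic={"front": 0.8, "new": 0.2},  # made-up traffic split
    share_at_rank=lambda rank: 1.0 / rank,    # toy attention curve
)
print(probs)
```

With these toy inputs most of the probability lands on the new page, which matches the rank 90 versus rank 1 case described in the comment.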
        
       | troydavis wrote:
       | How does your model compare to using:
       | 
       | (Users who upvoted a given submission / Users who saw a page that
       | includes the submission and its vote icon)
       | 
       | This would be a percent between 0 (no one who saw a page
       | containing a given submission upvoted it) and 100 (everyone who
       | saw it upvoted it). Receiving more impressions wouldn't change
       | that percentage.
       | 
       | Weaknesses: It can only be calculated by HN itself. On pages that
       | list lots of submissions (like the home page), it may need to
       | compensate for relative position on the page. These pages may
       | already randomize position enough for this not to be an issue, or
       | to only be an issue for the first 3-5 items on the home page.
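
The ratio proposed in the comment above, as a small sketch. The viewer count is something only HN could supply, so the inputs here are hypothetical.

```python
def upvote_percentage(upvoters: int, viewers: int) -> float:
    """Percent of users who upvoted among those who saw the submission."""
    if viewers <= 0:
        raise ValueError("viewers must be positive")
    if not 0 <= upvoters <= viewers:
        raise ValueError("upvoters must be between 0 and viewers")
    return 100.0 * upvoters / viewers


# Ten times the impressions at the same upvote propensity gives the same value.
print(upvote_percentage(5, 100), upvote_percentage(50, 1000))
```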
        
         | jwarden wrote:
         | Interesting question.
         | 
         | One difference is that upvoteRate formula adjusts for _where_
         | the submission appears on the page (the rank). It also adjusts
         | for _how many site-wide upvotes_ occurred during that time
         | period.
         | 
          | You are right: since we don't know the number of users who saw
          | the page + the vote icon, we can't calculate the probability
         | Pr(upvote|saw submission with upvote button). But the
         | upvoteRate formula would be _proportional_ to this probability,
         | times additional factors for rank and time.
         | 
         | We talked about this in our original blog article here:
         | https://felx.me/2021/08/29/improving-the-hacker-news-ranking...
        
       | jrussino wrote:
       | Lots of discussion here about this approach and alternatives. Not
       | sure how feasible this is but I think it would be even cooler to
       | turn this into a site where users can define their own custom
       | ranking algorithm and/or select from a set of available
       | algorithms (including the one you're currently using). Maybe even
       | provide a meta-ranking of the most popular ranking algorithms?
        
       | taubek wrote:
       | Can I somehow see the historical data for some of my old
       | submissions? It seems to me that on https://news.social-protocols.org/
       | I can get the data for the past 24 hours.
       | 
       | I think that you have a point about the feedback loop.
        
         | manx wrote:
         | We just reset the history yesterday and plan to keep data for
         | about one month for now. In the future, it should definitely be
         | possible to retain a much longer time span.
        
       | mostcallmeyt wrote:
       | See also https://news.ycombinator.com/item?id=23286140
        
       | sokoloff wrote:
       | There's a related question of "what's the _purpose_ of the
       | ranking algorithm?"
       | 
       | Is it to ensure that the #1 article is strictly "better" (via
       | whatever function) than the #2 and the #2 better than the #3?
       | 
       | Or is it to ensure that at least N of the top 30 (page 1)
       | submissions will tend to be interesting to many users on the site
       | (driving engagement and discussion)?
       | 
       | As a user, I'm a lot more interested in the second goal than I am
       | the first goal. This change seems to serve the first goal much
       | more than it serves the second goal. The reinforcement loop of
       | "on the front page => gets more votes" is a property that
       | supports the second goal more than it supports the first. Looking
       | at the top 30 on social-protocols (this algorithm) vs the front
       | page on HN, I saw 1 additional story on HN that would motivate me
       | to click through (5 vs 4), so not a massive difference.
        
         | jwarden wrote:
         | I would say the ranking algorithm has many purposes. Driving
         | quality engagement and discussion, as you suggest, is probably
         | the most important. But I think simply being "interesting
         | enough" to many users is not the goal. I think the goal is to
         | make the front page as interesting as possible. That's why HN
          | has already put so much effort into ranking algorithms and
         | moderation and why we think it is still worth improving if
         | possible.
         | 
         | We actually haven't implemented a new algorithm (for reasons
         | discussed in the readme). What you see when you click on our
         | site is the exact same rankings, but with the upvoteRate next
         | to each in addition to the score, which you can click on to see
         | charts with a history of the story's rank and upvoteRate.
        
       | unethical_ban wrote:
       | I would like the HN UI to have a "favorite" button equally
       | accessible to the "upvotes" button.
       | 
       | I know favorites are a feature, but they require clicking into
       | the comments. I end up using upvote as a bookmark function, not
       | as a method of approving of a post, because that's easier.
       | 
       | As it relates to this post, the HN UI encourages the feedback
       | loop this submission is trying to fix.
       | 
       | Put a bookmark icon next to the upvote icon. Provide a unified
       | view of upvotes+bookmarked for a user so they can see everything
       | that got their interest.
        
         | jwarden wrote:
         | Good point, I hadn't thought that some people are using the
         | upvote button effectively as a bookmark button. Interesting
         | idea the unified upvotes+bookmarked view.
         | 
         | And your comment raises the question, what does an upvote mean?
         | Why do people upvote? There may be lots of strange reasons. But
         | whatever upvotes mean -- whether it means people want to
         | bookmark it, or people find it valuable, or people want to
         | bring something they _disagree_ with to the attention of other
         | people -- an upvote is a rough signal of  "this should get more
         | attention". The whole concept of a link aggregator like HN only
         | makes sense if we assume that upvotes can be interpreted as a
         | proxy for what people think deserves the attention of other
         | users.
        
         | nick__m wrote:
         | If something is interesting enough to be bookmarked it is
          | surely interesting enough to be upvoted!
        
       | wpietri wrote:
       | As long as we're talking about redoing this, let me suggest
       | letting authors see the names of upvoters (and only upvoters).
       | 
       | Quora had this and it did a fair bit to create positive community
       | feelings for me. It also let people signal agreement/support
       | without having to create a comment to do so, which I would find
       | handy.
        
         | 999900000999 wrote:
         | I actually wouldn't like this, it would make me afraid to
          | upvote unpopular opinions I agree with. As it is, Hacker News falls
         | into the same trap as Reddit where there's a bit of a hive mind
         | effect.
        
         | JohnFen wrote:
         | If this were the system, I'd just stop voting. Which may or may
         | not be a bad thing.
        
         | luckylion wrote:
         | Wouldn't that lead to you seeing a pattern in who upvotes you
         | which would make you more likely to upvote their submissions or
         | comments, slowly guiding you towards bubble-forming?
         | 
         | And if someone doesn't upvote your "let's not eat babies"
         | comment, do you go after them for being pro-baby-eating?
        
       | woollyhat wrote:
       | I think it would be interesting if users could spend their karma
       | on performing moderator actions, perhaps with some sort of
       | algorithmic exchange rate that converts acquired karma into
       | modcoins.
       | 
       | For example, it might cost 1000 modcoins to pin an article to the
       | top of the page for ten minutes. Or perhaps more of a bold
       | change: 2000 modcoins to make the text of your comment glow with
       | a golden hue to make it more noticeable. 5000 modcoins to display
       | an image of Paul Graham at the top of the thread, smiling
       | beatifically at all the comments below Him. And so on.
       | 
       | This would of course be of no interest to users such as myself
       | who habitually generate throwaway accounts and discard them, but
       | I would be curious to see how high karma users would use such a
       | feature.
        
         | mtlmtlmtlmtl wrote:
         | Interesting idea, but this reminds me too much of Tinder.
        
       | PaulHoule wrote:
       | When my RSS reader shows me an arXiv paper about ML with 'fair'
       | in the title I hit the reject button. What is 'fair' is
       | subjective and what I want is a feed relevant to my interests
       | (also subjective.)
       | 
       | This is 2023 and text classification problems that I struggled
       | with at a startup 5 years ago are now _easy_ and the power of
       | transformer models is obscured by the ChatGPT hype. It is time
       | that we turn our backs on the collaborative filtering algorithms
       | that made social media a hellscape and embrace content-based
       | filtering.
       | 
       | I have a model that predicts if an article will front page or get
       | a high ratio of comments/votes. It has a terrible ROCAUC because
       | it is such a fuzzy problem but it is well calibrated and just
       | today my RSS reader told me a story I thought was a nothingburger
       | would succeed on both metrics and... It did!
       | 
       | I did make an attempt to take into account the factors you're
       | concerned about and I was surprised that the AUC didn't go up.
       | Probably I did it wrong though.
       | 
       | Look up my profile, I'd love to chat about it.
        
         | petercooper wrote:
         | _it is well calibrated and just today my RSS reader told me a
         | story I thought was a nothingburger would succeed on both
         | metrics and... It did!_
         | 
         | I was originally going to joke that maybe you should turn your
         | script on to the stock market, but I'm guessing with your
         | background you may have some experience in that regard!
        
           | PaulHoule wrote:
           | I've tried that and failed but I was using a crappy
           | commercial sentiment analysis engine.
           | 
           | These guys succeeded though and wrote a great book about it
           | 
           | https://www.amazon.com/Trading-Sentiment-Power-Markets-
           | Finan...
        
             | petercooper wrote:
             | That sounds like the sort of thing I'd enjoy reading -
             | thanks!
        
         | mgraczyk wrote:
         | I don't agree that "fair" is inherently subjective. There are
         | many sensible ways to objectively define fair. For example you
         | could say HN ranking is "fair" if the probability any viewer
         | would upvote that article is independent of its position-
         | history on HN. That is an objective definition that is "fair"
         | with respect to positions.
         | 
         | This and other notions of "fairness" are very common problems
         | in ranking (I used to rank things on Instagram) that have to be
         | addressed, even if you're only doing content based ranking.
        
           | JohnFen wrote:
           | "Fair" is inherently subjective because, in a vacuum,
           | everyone has a subjective definition of the word. It can
           | legitimately mean many different things, after all. People
           | will choose the definition they read it as according to their
           | own subjective experiences.
           | 
           | I reacted poorly to the use of the word "fair" here, too,
           | because I didn't see how "fairness" really entered into it.
           | Naturally, you can provide a specific definition of the word
           | to make it objectively measurable, but if you use a word
           | before you've said what your definition is, people are going
           | to use the common definition -- and therefore, it's
           | subjective.
        
             | mistermann wrote:
             | The word "is" is also subjective in this context, since
             | "subjective" is not a binary.
        
         | manx wrote:
         | Just looked up your profile. There's some super interesting
         | stuff you worked on. We'll get in touch!
        
         | nico wrote:
          | > It is time that we turn our backs on the collaborative
         | filtering algorithms that made social media a hellscape and
         | embrace content-based filtering.
         | 
         | Except people deeply care about what other people are doing.
         | 
         | That was the whole point of Google's Pagerank algorithm.
         | 
         | So, it might not be what you personally want. But to a lot of
         | people, it's more important to read/consume something popular
         | (ie. that a lot of others care about), rather than something
         | related to their own interests.
        
           | ClapperHeid wrote:
           | >So, it might not be what you personally want. But to a lot
           | of people, it's more important to read/consume something
           | popular...
           | 
           | Count me in the "don't care" group. Which is why I always
           | browse HN by "New" [0]. I neither know nor care which stories
           | have been voted to prominence on the default frontpage, nor
           | how this ranking has been determined. I just want to see
           | what's new and select for myself what looks interesting.
           | 
           | [0] https://news.ycombinator.com/newest
        
             | jwarden wrote:
             | I would suggest that one of the reasons you find the
             | submissions on the New page valuable is that submitters are
             | actively seeking out stories they think the HN community
             | would upvote. So the content of the New page indirectly
             | reflects popularity within the HN community.
        
           | jwarden wrote:
            | Further, in a community like HN, it's not just about
           | collaborative filtering to find things that you as an
           | individual will like. It is about focusing the _collective
           | attention of the community_ on a small set of topics to drive
           | rich discussions.
        
           | PaulHoule wrote:
           | A good system uses both but it's not trivial to blend them.
           | My system is right now showing me maybe 30% of what it
           | ingests, if I was seeing just 3% I'd have to cut back more
           | harshly and a popularity score would help. Fundamentally a
           | popularity score has a much larger dynamic range than a
           | relevance score.
           | 
           | Google has both a document-query relevance score plus a
           | document quality score.
           | 
           | I've heard from a lot of people who like reading HN from a
           | comment-centric point of view and I tried feeding all the
           | comments into my system and it was really too much. When I
            | fed in high-scoring comments, however, I liked the results. I
           | had somebody suggest comments from Metafilter and I think
           | that could be a winner but of course comments have a network
           | structure of relatedness to other comments and the submission
           | that a comment-oriented reader could take advantage of.
        
         | jwarden wrote:
         | Yes we understand taking issue with the word _fair_. But we
         | should say we mean _fair_ in a very specific way. We would say
          | our algorithm is more fair in the sense that, in some ways, it
          | more _fairly reflects the intent of the HN community as
         | revealed by their upvote behavior_. We talk about this more in
         | the Readme: https://github.com/social-protocols/news#readme
        
       ___________________________________________________________________
       (page generated 2023-03-16 23:01 UTC)