_______               __                   _______
       |   |   |.---.-..----.|  |--..-----..----. |    |  |.-----..--.--.--..-----.
       |       ||  _  ||  __||    < |  -__||   _| |       ||  -__||  |  |  ||__ --|
       |___|___||___._||____||__|__||_____||__|   |__|____||_____||________||_____|
                                                              on Gopher (unofficial)
 (HTM) Visit Hacker News on the Web
       
       
       COMMENT PAGE FOR:
 (HTM)   Auto-grading decade-old Hacker News discussions with hindsight
       
       
        xpe wrote 13 hours 26 min ago:
        Many people are impressed by this, and I can see why. Still, this much
        isn't surprising: the Karpathy + LLM combo can deliver quickly. But
        there are downsides of blazing speed.
        
         If you dig in, there are substantial flaws in the project's
         analysis and framing: how a prediction is defined, how comments
         are assessed, overall data quality, and more. Go spelunking
         through the comments here and notice people asking about
         methodology and checking the results.
        
        Social science research isn't easy; it requires training, effort, and
        patience. I would be very happy if Karpathy added a Big Flashing Red
        Sign to this effect. It would raise awareness and focus community
         attention on what I think are the hardest and most important aspects
        of this kind of project: methodology, rigor, criticism, feedback, and
        correction.
       
        bretpiatt wrote 15 hours 24 min ago:
         10 Years Ago, December 11, 2015 - Introducing OpenAI -- very meta: [1]
        The company has changed and it seems the mission has as well.
        
 (HTM)  [1]: https://karpathy.ai/hncapsule/2015-12-11/index.html#article-10...
       
          bspammer wrote 7 hours 39 min ago:
           Yes, very funny to see their own model betray them like this:
          
          > The original “non‑profit, open, patents shared” promise now
          reads almost like an alternate timeline. Today OpenAI is a
          capped‑profit entity with a massive corporate partner, closed
          frontier models, and an aggressive product roadmap.
       
        nomel wrote 15 hours 39 min ago:
        > I realized that this task is actually a really good fit for LLMs
        
         I've found the opposite, since these models still fail pretty wildly
         at nuance. I think it's a conceptual "needle in the haystack" sort
         of problem.
        
         A good test is to find some thread where there's a disagreement and
         have it try to analyze the discussion. It will usually strongly
         misrepresent what each side was saying and strongly align with one
         user, missing the actual divide that's causing the disagreement
         (the needle).
       
          gowld wrote 15 hours 10 min ago:
          As always, which model versions did you use in your test?
       
            nomel wrote 10 hours 48 min ago:
             Claude Opus 4.5, Gemini 3 Pro, ChatGPT 5.1. Haven't tried
             ChatGPT 5.2.
             
             Seeing the failure requires a discussion with real nuance.
             Gemini is, by far, the worst at this (which fits my suspicion
             that they heavily weighted reddit posts).
             
             I don't think this is all that strange, though. The human on
             one side of the argument is also missing the nuance, which is
             the source of the conflict. Is there a belief that AI has
             surpassed the average human at conversational nuance!?
       
        tgtweak wrote 16 hours 36 min ago:
        Cool - now make it analyze all of those and come up with the 10
        commandments of commenting factually and insightfully on HN posts...
       
        jeffnappi wrote 17 hours 31 min ago:
        The analysis of the 2015 article about Triplebyte is fascinating [1].
        Particularly the Awards section.
        
 (HTM)  [1]: https://karpathy.ai/hncapsule/2015-12-08/index.html#article-10...
       
        bbcisking wrote 18 hours 56 min ago:
        Why not rank ESP for each HN user, with evidence?
       
        alister wrote 21 hours 11 min ago:
         I wonder why ChatGPT refused to analyze this one? [1]
         
         The HN article was "Brazil declares emergency after 2,400 babies are
         born with brain damage", but the page says "No analysis available".
        
 (HTM)  [1]: https://karpathy.ai/hncapsule/2015-12-24/index.html#article-10...
       
          bspammer wrote 19 hours 29 min ago:
           My guess is that it’s because there are a lot of very negative
           comments about Brazil in that thread. Trying to grade people for
           their opinions on a topic like that gets into dangerous territory.
       
        pnt12 wrote 22 hours 4 min ago:
        On the site itself:
        
        it's great that this was produced in 1h with 60$. This is amazing to
        create small utilities, explore your curiosity, etc.
        
        But the site is also quite confusing and messy. OK for a vibe coded
        experiment, sure, but wouldn't be for a final product. But I fear we're
        gonna see more and more of this. Big companies downsizing their tech
        departments and embracing vibe coded. Comparing to inflation,
        shrinkflation and skimpflation/ enshittification , will we soon adopt
        some word for this? AIflation? LLMflation?
        
        And how will this comment score in a couple of years? :)
       
        nixpulvis wrote 1 day ago:
         Quick, give everyone colors to indicate their rank here and ban
         anyone with a grade less than C-.
        
        Seriously, while I find this cool and interesting, I also fear how
        these sorts of things will work out for us all.
       
        DeathArrow wrote 1 day ago:
         > I believe it is quite possible and desirable to train your
         forward future predictor given training and effort.
        
        That's interesting. I wouldn't have thought that a decent generic
        forward future predictor would be possible.
       
        Tossrock wrote 1 day ago:
        So where do I collect my prize for this 2015 comment?
        
 (HTM)  [1]: https://news.ycombinator.com/item?id=9882217
       
          johncolanduoni wrote 1 day ago:
          Never call a man happy until he is dead. Also I don’t think your
          argument generalizes well - there are plenty of private research
          investment bubbles that have popped and not reached their original
          peaks (e.g. VR).
       
            Tossrock wrote 1 day ago:
            It wasn't a generalized argument, though, it was a specific one,
            about AI.
       
              xpe wrote 11 hours 53 min ago:
              Here is one sentence from the referenced prediction:
              
              > I don't think there will be any more AI winters.
              
               This isn't enough to qualify as a testable prediction, in the
               eyes of people who care about such things, because there is
               no good way to formulate resolution criteria for a claim that
               extends indefinitely into the future. See [1] for a great
               introduction.
              
 (HTM)        [1]: https://www.astralcodexten.com/p/prediction-market-faq
       
              johncolanduoni wrote 22 hours 15 min ago:
              Okay, but the only part that’s specific to AI (that the
              companies investing the money are capturing more value than
              they’re putting into it) is now false. Even the hyperscalers
              are not capturing nearly the value they’re investing, though
              they’re not using debt to finance it. OpenAI and Anthropic are
              of course blowing through cash like it’s going out of style,
              and if investor interest drops drastically they’ll likely need
              to look to get acquired.
       
        anshulbhide wrote 1 day ago:
        I often summarise HN comments (which are sometimes more insightful than
        the original article) using an LLM. Total game-changer.
       
        NooneAtAll3 wrote 1 day ago:
         UX feedback: I wish clicking on a new thread scrolled the right-hand
         pane back to the top.
         
         Reading from the end isn't really useful, y'know :)
       
        popinman322 wrote 1 day ago:
        It doesn't look like the code anonymizes usernames when sending the
        thread for grading. This likely induces bias in the grades based on
        past/current prevailing opinions of certain users. It would be
        interesting to see the whole thing done again but this time randomly
        re-assigning usernames, to assess bias, and also with procedurally
        generated pseudonyms, to see whether the bias can be removed that way.
        
         I'd expect de-biasing to deflate grades for well-known users.
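         
         As a sketch of that username re-assignment idea: assuming comments
         arrive as (username, text) pairs, something like the following
         could pseudonymize a thread before grading (the function and names
         are hypothetical, not the project's actual code):
         
           import random
           
           def pseudonymize(comments, seed=0):
               # Replace usernames with stable generated pseudonyms
               # before the thread is sent off for grading.
               rng = random.Random(seed)
               users = sorted({user for user, _ in comments})
               rng.shuffle(users)
               alias = {u: f"user_{i:03d}" for i, u in enumerate(users)}
               return [(alias[u], text) for u, text in comments], alias
           
           thread = [("alice", "This will flop."), ("bob", "Disagree.")]
           anon_thread, mapping = pseudonymize(thread)
           # Grade anon_thread, then map grades back through `mapping`.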
        
        It might also be interesting to use a search-grounded model that
        provides citations for its grading claims. Gemini models have access to
        this via their API, for example.
       
          ProllyInfamous wrote 16 hours 44 min ago:
           What a human-like criticism of human-like behavior.
           
           I [as a human] also do the same thing when observing others in
           real-life and forum interactions. Reputation matters™
          
          ----
          
           A further question is whether a bespoke username could influence
           the reading of a particular comment (e.g. a username like
           HatesPython might color the interpretation of that commenter's
           take on the Python programming language, even when the comment is
           actually expressing positivity — the username's irony lost on
           the AI?).
       
          khafra wrote 1 day ago:
           You can't anonymize comments from well-known users to an LLM:
          
 (HTM)    [1]: https://gwern.net/doc/statistics/stylometry/truesight/index
       
            WithinReason wrote 23 hours 4 min ago:
            That's an overly strong claim, an LLM could also be used to
            normalise style
       
              wetpaws wrote 18 hours 23 min ago:
              How would you possibly grade comments if you change them?
       
                koakuma-chan wrote 17 hours 49 min ago:
                 You don’t need the comments themselves, just the facts in
                 them, to check whether they’re accurate.
       
                strken wrote 17 hours 49 min ago:
                Extract the concrete predictions, evaluate them as
                true/false/indeterminate, and grade the user on the number of
                true vs false?
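                 
                 A sketch of that tally, assuming each extracted prediction
                 has already been labeled by some judge (the names here are
                 hypothetical):
                 
                   from collections import Counter
                   
                   def grade_user(verdicts):
                       # verdicts: "true" / "false" / "indeterminate",
                       # one label per extracted prediction.
                       c = Counter(verdicts)
                       resolved = c["true"] + c["false"]
                       if resolved == 0:
                           return None  # nothing gradable
                       return c["true"] / resolved
                   
                   print(grade_user(
                       ["true", "false", "true", "indeterminate"]))  # ~0.67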
       
                  Natsu wrote 13 hours 48 min ago:
                  This doesn't even seem to look at "predictions" if you dig
                  into what it actually did.  Looking at my own example (#210
                  on [1] with 4 comments), very little of what I said could be
                  construed as "predictions" at all.
                  
                  I got an A for commenting on DF saying that I had not
                  personally seen save corruption and listing weird bugs.  It's
                  true that weird bugs have long been a defining feature of DF,
                  but I didn't predict it would remain that way or say that
                  save corruption would never be a big thing, just that I
                  hadn't personally seen it.
                  
                  Another A for a comment on Google wallet just pointing out
                  that users are already bad at knowing what links to trust. 
                  Sure, that's still true (and probably will remain true until
                  something fundamental changes), but it was at best half a
                  prediction as it wasn't forward looking.
                  
                  Then something on hospital airships from the 1930s.  I
                  pointed out that one could escape pollution, I never said I
                  thought it would be a big thing.  Airships haven't really
                  ever been much of a thing, except in fiction.  Maybe that
                  could change someday, but I kinda doubt it.
                  
                  Then lastly there was the design patent famously referred to
                  as the "rounded corner" patent.  It dings me for simplifying
                  it to that label, despite my actual statements being that
                  yes, there's more, but just minor details like that can be
                  sufficient for infringement.  But the LLM says I'm right
                  about ties to the Samsung case and still oversimplifying it. 
                  Either way, none of this was really a prediction to begin
                  with.
                  
 (HTM)            [1]: https://karpathy.ai/hncapsule/hall-of-fame.html
       
        apparent wrote 1 day ago:
        > And then when you navigate over to the Hall of Fame, you can find the
        top commenters of Hacker News in December 2015, sorted by imdb-style
        score of their grade point average.
        
        Now let's make a Chrome extension that subtly highlights these users'
        comments when browsing HN.
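         
         For reference, an "imdb-style score" presumably means a Bayesian
         weighted average that shrinks small samples toward a prior, like
         IMDb's published Top 250 formula; the prior values below are
         assumptions, not the project's:
         
           def weighted_score(gpa, n, prior_mean=2.0, prior_weight=5):
               # Shrink users with few graded comments toward the prior
               # so one lucky A doesn't top the leaderboard.
               w = n / (n + prior_weight)
               return w * gpa + (1 - w) * prior_mean
           
           print(weighted_score(4.0, 1))   # ~2.33: one A barely counts
           print(weighted_score(3.5, 40))  # ~3.33: a long record dominates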
       
        DonHopkins wrote 1 day ago:
         I'd love to see an "Annie Hall" analysis of HN posts, for incidents
         where somebody says something about some piece of software or
         whatever, and the person who created it replies, like Marshall
         McLuhan stepping out from behind a sign in Annie Hall.
        
 (HTM)  [1]: https://www.youtube.com/watch?v=vTSmbMm7MDg
       
        Uptrenda wrote 1 day ago:
         Dude, please do this for every year until today. This idea is
         actually amazing. If you need more money for API credits, I'm sure
         people here could help donate.
       
        intheitmines wrote 1 day ago:
         Interesting that for the December 16, 2015 thread "geohot is
         building Comma", it graded geohot's own comments as only a B.
       
          snowwrestler wrote 1 day ago:
          Presumably because of how things went with Comma since then.
       
        npunt wrote 1 day ago:
         One of the few use cases for LLMs that I have high hopes for, and
         feel is still underappreciated, is grading qualitative things. LLMs
         are the first tech (afaik) that can do top-down analysis of
         phenomena in a manner similar to humans, which means a lot of
         important judgement-oriented human use cases can become more
         standardized, faster, and more readily available.
        
         For instance, one of the things that has made social media so
         unsustainable and destructive to modern society is how it exposes
         us to many more people and hot takes than we have the ability to
         adequately judge. We're overwhelmed. This has led to conversation
         being dominated by really shitty takes and really shitty people,
         who rarely if ever suffer reputational consequences.
        
        If we build our mediums of discourse with more reputational awareness
        using approaches like this, we can better explore the frontier of
        sustainable positive-sum conversation at scale.
        
         Implementation-wise, the key question is how we grade the grader
         and ensure it is predictable and accurate.
       
          Arodex wrote 11 hours 51 min ago:
           This is wrong; just look at this comment here: [1] LLMs can't
           reliably grade human text. They don't understand it.
          
 (HTM)    [1]: https://news.ycombinator.com/item?id=46222523
       
        Sophira wrote 1 day ago:
        It somehow feels right to see what GPT-5 thinks of the article titled
        "Machine learning works spectacularly well, but mathematicians aren’t
        sure why" and its discussion:
        
 (HTM)  [1]: https://karpathy.ai/hncapsule/2015-12-04/index.html#article-10...
       
        pierrec wrote 1 day ago:
        "the distributed “trillions of Tamagotchi” vision never
        materialized"
        
        I begrudgingly accept my poor grade.
       
        dw_arthur wrote 1 day ago:
        Reading this I feel the same sense of dread I get watching those highly
        choreographed Chinese holiday drone shows.
       
        sigmar wrote 1 day ago:
        Gotta auto grade every HN comment for how good it is at predicting
        stock market movement then check what the "most frequently correct"
        user is saying about the next 6 months.
       
          xpe wrote 1 day ago:
          I hope this is a joke.
          
          Forecasting and the meta-analysis of forecasters is fairly well
          studied. [1] is a good place to start.
          
 (HTM)    [1]: https://en.wikipedia.org/wiki/Superforecaster
       
            sigmar wrote 1 day ago:
            > The conclusion was that superforecasters' ability to filter out
            "noise" played a more significant role in improving accuracy than
            bias reduction or the efficient extraction of information.
            
            >In February 2023, Superforecasters made better forecasts than
            readers of the Financial Times on eight out of nine questions that
            were resolved at the end of the year.[19] In July 2024, the
            Financial Times reported that Superforecasters "have consistently
            outperformed financial markets in predicting the Fed's next move"
            
            >In particular, a 2015 study found that key predictors of
            forecasting accuracy were "cognitive ability [IQ], political
            knowledge, and open-mindedness".[23] Superforecasters "were better
            at inductive reasoning, pattern detection, cognitive flexibility,
            and open-mindedness".
            
            I'm really not sure what you want me to take from this article? Do
            you contend that everyone has the same competency at forecasting
            stock movements?
       
              xpe wrote 11 hours 16 min ago:
              > I'm really not sure what you want me to take from this article?
              
               I linked to the Wikipedia page as a way of pointing to the
               book Superforecasting by Tetlock and Gardner. If forecasting
               interests you, I recommend using it as a jumping-off point.
               
               > Do you contend that everyone has the same competency at
               forecasting stock movements?
               
               No, and I'm not sure why you are asking me this.
               Superforecasting does not make that claim.
              
              > I'm really not sure what you want me to take from this article?
              
               If you read the book and process and internalize its lessons
               properly, I predict you will view what you wrote above in a
               different light:
              
              > Gotta auto grade every HN comment for how good it is at
              predicting stock market movement then check what the "most
              frequently correct" user is saying about the next 6 months.
              
              Namely, you would have many reasons to doubt such a project from
              the outset and would pursue other more fruitful directions.
       
          Rychard wrote 1 day ago:
          As the saying goes, "past performance is not indicative of future
          results"
       
        SequoiaHope wrote 1 day ago:
        This is great! Now I want to run this to analyze my own comments and
        see how I score and whether my rhetoric has improved in
        quality/accuracy over time!
       
        jacquesm wrote 1 day ago:
         Predictions are only valuable when they're actually made ahead of
         the knowledge becoming available. "A man will walk on Mars by 2030"
         is falsifiable; "a man will walk on Mars" is not. A lot of these
         entries have very low to no predictive value, or were already known
         at the time and merely related. It would be nice if future 'judges'
         put in more work to ensure quality judgments.
         
         I would grade this article B-, but then again, nobody wrote it... ;)
       
        LeroyRaz wrote 1 day ago:
        I am surprised the author thought the project passed quality control.
        The LLM reviews seem mostly false.
        
        Looking at the comment reviews on the actual website, the LLM seems to
        have mostly judged whether it agreed with the takes, not whether they
        came true, and it seems to have an incredibly poor grasp of it's actual
        task of accessing whether the comments were predictive or not.
        
        The LLM's comment reviews are of often statements like "correctly
        characterized [program language] as [opinion]."
        
        This dynamic means the website mostly grades people on having the most
        confirmist take (the take most likely to dominate the training data,
        and be selected for in the LLM RL tuning process of pleasing the
        average user).
       
          andy99 wrote 1 day ago:
           I haven’t looked at the output yet, but came here to say: LLM
           grading is crap. They miss things, they ignore instructions, they
           bring in their own views, they have no calibration, and in
           general they are extremely poorly suited to this task. “Good”
           LLM-as-a-judge type products (and none are great) use LLMs to
           make binary decisions - “do these atomic facts match, yes/no”
           type stuff - and aggregate them to get a score.
           
           I understand this is just a fun exercise, so it’s basically what
           LLMs are good at - generating plausible-sounding stuff without
           regard for correctness. I would not extrapolate from this to
           their utility on real evaluation tasks.
       
          LeroyRaz wrote 1 day ago:
           Examples: tptacek gets an 'A' for his comment on DF, with the LLM
           claiming that the user "captured DF's unforgiving nature, where
           'can't do x or it crashes' is just another feature to learn,
           which remained true until it was fixed on ..."
           
           Link to LLM review: [1]
           
           So the LLM is praising a comment for describing DF as unforgiving
           (a characterization of the then-present, not a statement about
           the future). And worse, it seems tptacek may in fact have been
           implying the opposite about the future (e.g., that x would
           continue to crash, when it was eventually fixed).
           
           Here is the original comment:
           
           tptacek on Dec 2, 2015 | root | parent | next [–]
           
           "If you're not the kind of person who can take flaws like crashes
           or game-stopping frame-rate issues and work them into your
           gameplay, DF is not the game for you. It isn't a friendly game.
           It can take hours just to figure out how to do core game tasks.
           'Don't do this thing that crashes the game' is just another task
           to learn."
           
           Note: I am paraphrasing the LLM review, as the website is also
           poorly designed, with one unable to select the text of the LLM
           review!
           
           N.b., this choice of comment review is not overly cherry-picked.
           I just scanned the "best commenters" list and tptacek was number
           two, with this particular egregiously unrelated-to-prediction LLM
           summary given as justification for his #2 rating.
          
 (HTM)    [1]: https://karpathy.ai/hncapsule/2015-12-02/index.html#article-...
       
          hathawsh wrote 1 day ago:
           Are you sure? The third section of each review lists the “Most
           prescient” and “Most wrong” comments. That sounds exactly
           like what you're looking for. For example, on the "Kickstarter is
           Debt" article, here is the LLM's analysis of the most prescient
           comment (by phire). The analysis seems accurate and helpful to
           me. [1]
           
             > “Oculus might end up being the most successful
             product/company to be kickstarted… Product wise, Pebble is
             the most successful so far… Right now they are up to major
             version 4 of their product. Long term, I don't think they will
             be more successful than Oculus.”
           
             With hindsight:
           
             Oculus became the backbone of Meta’s VR push, spawning the
             Rift/Quest series and a multi‑billion‑dollar strategic bet.
             Pebble, despite early success, was shut down and absorbed by
             Fitbit barely a year after this thread.
           
             That’s an excellent call on the relative trajectories of the
             two flagship Kickstarter hardware companies.
          
 (HTM)    [1]: https://karpathy.ai/hncapsule/2015-12-03/index.html#article-...
       
            karmickoala wrote 1 day ago:
             I get what you're saying, but looking at some examples, they
             look kinda right, yet there are a lot of misleading facts
             sprinkled in, making the grading wrong. It is useful, but I'd
             suggest being careful about using this to make decisions.
             
             Some of the issues could be resolved with better prompting (it
             was biased to always interpret every comment through the lens
             of predictions) and LLM-as-a-judge, but still. For example,
             Anthropic's Deep Research prompts sub-agents to pass along
             original quotes instead of paraphrasing, because paraphrasing
             can degrade the original message.
            
            Some examples:
            
              Swift is Open Source (2015)
              ===========================
            
            sebastiank123 got a C-, and was quoted by the LLM as saying:
            
              > “It could become a serious Javascript competitor due to its
            elegant syntax, the type safety and speed.”
            
            Now, let's read his full comment:
            
              > Great news! Coding in Swift is fantastic and I would love to
            see it coming to more platforms, maybe even on servers. It could
            become a serious Javascript competitor due to its elegant syntax,
            the type safety and speed.
            
             I don't interpret it as a prediction, but as a desire. The user
             is praising Swift. If it went the server way, perhaps it could
             replace JS, per the user's wishes. To make it even clearer: if
             someone had asked the commenter right after, "Is that a
             prediction? Are you saying Swift is going to become a serious
             Javascript competitor?", I don't think the answer would have
             been 'yes' in this context.
            
              How to be like Steve Ballmer (2015)
              ===================================
              
              Most wrong
              ----------
              
              >    corford (grade: D) (defending Ballmer’s iPhone
            prediction):
              >        Cited an IDC snapshot (Android 79%, iOS 14%) and
            suggested Ballmer was “kind of right” that the iPhone
            wouldn’t gain significant share.
              >        In 2025, iOS is one half of a global duopoly, dominates
            profits and premium segments, and is often majority share in key
            markets. Any reasonable definition of “significant” is
            satisfied, so Ballmer’s original claim—and this defense of
            it—did not age well.
            
            Full quote:
            
              > And in a funny sort of way he was kind of right :)
            http://www.forbes.com/sites/dougolenick/2015/05/27/apple-ios...
              > Android: 79% versus iOS: 14%
            
            "Any reasonable definition of 'significant' is satisfied"? That's
            not how I would interpret this. We see it clearly as a duopoly in
            North America. It's not wrong per se, but I'd say misleading. I
            know we could take this argument and see other slices of the data
            (premium phones worldwide, for instance), I'm just saying it's not
            as clear cut as it made it out to be.
            
              > volandovengo (grade: C+) (ill-equipped to deal with
            Apple/Google):
              >  
              >    Wrote that Ballmer’s fast-follower strategy “worked
            great” when competitors were weak but left Microsoft ill-equipped
            for “good ones like Apple and Google.”
              >    This is half-true: in smartphones, yes. But in cloud,
            office suites, collaboration, and enterprise SaaS, Microsoft became
            a primary, often leading competitor to both Apple and Google. The
            blanket claim underestimates Microsoft’s ability to adapt outside
            of mobile OS.
            
            That's not what the user was saying:
            
              > Despite his public perception, he's incredibly intelligent. He
            has an IQ of 150.
              > 
              > His strategy of being a fast follower worked great for
            Microsoft when it had crappy competitors - it was ill equipped to
            deal with good ones like Apple and Google.
            
             He was praising Ballmer, and Microsoft did miss opportunities
             at first. The OC did not make predictions about Microsoft's
             later days.
            
              [Let's Encrypt] Entering Public Beta (2015)
              ===========================================
            
              - niutech: F "(endorsed StartSSL and WoSign as free options; both
            were later distrusted and effectively removed from the trusted
            ecosystem)"
            
             Full quote:
             
               > There are also StartSSL and WoSign, which provide the A+
             certificates for free (see example WoSign domain audit:
             https://www.ssllabs.com/ssltest/analyze.html?d=checkmyping.c...)
             
               - pjbrunet: F "(dismissed HTTPS-by-default arguments as
             paranoid, incorrectly asserted ISPs had stopped injection, and
             underestimated exactly the use cases that later moved to HTTPS)"
            
            Full quote:
            
              > "We want to see HTTPS become the default."
              > 
              > Sounds fine for shopping, online banking, user authorizations.
            But for every website? If I'm a blogger/publisher or have a
            brochure type of website, I don't see point of the extra overhead.
              > 
              > Update: Thanks to those who answered my question. You pointed
            out some things I hadn't considered. Blocking the injection of
            invisible trackers and javascripts and ads, if that's what this is
            about for websites without user logins, then it would help to
            explicitly spell that out in marketing communications to promote
            adoption of this technology. The free speech angle argument is not
            as compelling to me though, but that's just my opinion.
            
            I thought the debate was useful and so did pjbrunet, per his
            update.
            
            I mean, we could go on, there are many others like these.
       
            xpe wrote 1 day ago:
            Until someone publishes a systematic quality assessment, we're
            grasping at anecdotes.
            
            It is unfortunate that the questions of "how well did the LLM do?"
            and "how does 'grading' work in this app?" seem to have gone out
            the window when HN readers see something shiny.
       
              voidhorse wrote 1 day ago:
               Yes. And the article is a perfect example of the dangerous
               sort of automation bias that people will increasingly slide
               into when it comes to LLMs. I realize Karpathy is somewhat
               incentivized toward this bias given his career, but he
               doesn't spend even a single sentence suggesting that the
               results might be inaccurate or would need further inspection.
               
               The LLM is consulted like a perfect oracle, flawless in its
               ability to perform a task, and it's left at that. Its results
               are presented totally uncritically.
               
               For this project, of course, the stakes are nil. But how long
               until this unfounded trust in LLMs works its way into
               high-stakes problems? The reign of deterministic machines
               over the past few centuries has ingrained in us a trust in
               the reliability of machines that should be suspended when
               dealing with an inherently stochastic device like an LLM.
       
        dschnurr wrote 1 day ago:
         Nice! Something must be in the air – last week I built a very
         similar project using the historical archive of All-In podcast
         episodes:
        
 (HTM)  [1]: https://allin-predictions.pages.dev/
       
          sanex wrote 1 day ago:
          I'll use this as evidence supporting my continued demand for a
          Friedberg only spinoff.
       
        godelski wrote 1 day ago:
        > I was reminded again of my tweets that said "Be good, future LLMs are
        watching". You can take that in many directions, but here I want to
        focus on the idea that future LLMs are watching. Everything we do today
        might be scrutinized in great detail in the future because doing so
        will be "free". A lot of the ways people behave currently I think make
        an implicit "security by obscurity" assumption. But if intelligence
        really does become too cheap to meter, it will become possible to do a
        perfect reconstruction and synthesis of everything. LLMs are watching
        (or humans using them might be). Best to be good.
        
         Can we take a second and talk about how dystopian this is? Such an
         outcome is not inevitable; it relies on us making it. The future is
         not deterministic; the future is determined by us. Moreover,
         Karpathy has significantly more influence on that future than your
         average HN user.
         
         We are doing something very *very* wrong if we are operating under
         the belief that this future is unavoidable. That future is simply
         unacceptable.
       
          acyou wrote 1 day ago:
           I call this the "judgement day" scenario. I would be interested
           to know if there is some science fiction based on this premise.
           
           If you believe in God of a certain kind, you don't think that
           being judged for your sins is unacceptable, or even good or bad
           in itself; you consider it inevitable. We have been talking it
           over for 2000 years; people like the idea.
       
            godelski wrote 1 day ago:
             You'll be interested in Clarke's "The Light of Other Days".
             Basically, a wormhole lets people look back at any point in
             time, ending all notion of privacy.
             
             God is different, though. People like God because they believe
             God is fair and infallible. That is not true of machines or
             men. Similarly, I do not think people will like this idea. I'm
             sure some will, but look at people today and their religious
             fervor, or look at the past: they'll want it, but it is
             fleeting. Cults don't last forever, even when they're
             governments. Sounds like a great way to start wars, every one
             of them easily justified.
            
 (HTM)      [1]: https://en.wikipedia.org/wiki/The_Light_of_Other_Days
       
          jacquesm wrote 1 day ago:
           Given the quality of the judgment I'm not worried; there is no
           value here.
           
           Tossing an idea off like this, rather than properly executing it
           and putting in the work to make it valuable, is exactly what
           irritates me about a lot of AI work. You can be 900 times as
           productive at producing mental popcorn, but if there was value to
           be had here we're not getting it, just a whiff of it. Sure, fun
           project. But I don't feel particularly judged here. The funniest
           bit is the judgment on things that clearly could not yet have
           come to pass (for instance because there is an exact date
           mentioned that we have not yet reached). QA could be better.
       
            godelski wrote 1 day ago:
             I think you're missing the actual problem.
             
             I'm not worried about this project, but about harvesting and
             analyzing all that data and deanonymizing people.
             
             That's exactly what Karpathy is saying. He's not being shy
             about it. He said "behave, because the future panopticon can
             look into the past". Which makes the panopticon effectively
             exist now.
             
               Be good, future LLMs are watching
               ...
               or humans using them might be
             
             That's the problem. Not the accuracy of this toy project, but
             the idea of monitoring everyone and their entire history.
             
             The idea that we have to behave as if we're being actively
             watched by the government is literally the setting of 1984 lol.
             The idea that we have to behave that way now, because a future
             government will use the panopticon to look into the past, is
             absolutely unhinged. You don't even know what the rules of that
             world will be!
             
             Did we forget how unhinged the NSA's "harvest now, decrypt
             later" strategy is? Did we forget those giant data centers that
             were all the news talked about for a few weeks?
             
             That's not the future I want to create. Is it the one you want?
             
             To act as if that future is unavoidable is a failure of *us*.
       
              jacquesm wrote 1 day ago:
               Yes, you are right, this is a real problem. But it really is
               just a variation on 'the internet never forgets', for
               instance in relation to teen behavior online. AI, though,
               allows for the weaponization of such information. I wish the
               wannabe politicians of 2050 much good luck with their
               careers; they are going to be the most boring people
               available.
       
                godelski wrote 1 day ago:
                 The internet never forgets, but you could be anonymous, or
                 at least somewhat. That's getting harder and harder,
                 though.
                 
                 If such a thing isn't already possible (it is, to a certain
                 extent), we are headed towards a point where your words
                 alone will be enough to fingerprint you.
       
                  jacquesm wrote 23 hours 20 min ago:
                   Stylometry killed that a long time ago. There was a
                   website, stylometry.net, that matched HN accounts based
                   on text comparison and ranked the 10 best candidates. It
                   was incredibly accurate and allowed id'ing a bunch of
                   people who had gotten banned but came back again. Based
                   on that, I would expect anybody who has written more than
                   a few KB of text to be id'able in the future.
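                   
                   For the curious, character n-grams plus cosine
                   similarity is one classic approach to this kind of
                   matching; a minimal sketch with scikit-learn, using
                   stand-in corpora:
                   
                     from sklearn.feature_extraction.text import (
                         TfidfVectorizer)
                     from sklearn.metrics.pairwise import cosine_similarity
                     
                     banned = "all comments of the banned account ..."
                     candidates = ["corpus of account A ...",
                                   "corpus of account B ..."]
                     
                     # Character 3-5 grams capture writing style well.
                     vec = TfidfVectorizer(analyzer="char",
                                           ngram_range=(3, 5))
                     m = vec.fit_transform([banned] + candidates)
                     scores = cosine_similarity(m[0], m[1:])[0]
                     # Highest-similarity candidates first.
                     print(sorted(zip(scores, candidates), reverse=True))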
       
                    godelski wrote 22 hours 44 min ago:
                     You need a person's text paired with their actual
                     identity to pull that off. Normally that's pretty hard,
                     especially since you'll get different formats. Like, I
                     don't write the same way on Twitter as on HN. But yeah,
                     this stuff has been advancing, and I don't think it is
                     okay.
       
                      jacquesm wrote 22 hours 28 min ago:
                       The AOL scandal pretty much proved that anonymity is
                       a mirage. You may think you are anonymous, but it
                       just takes combining a few unrelated databases to
                       de-anonymize you. HN users think they are anonymous
                       but they're not; they drop factoids all over the
                       place about who they are. 33 bits of entropy (2^33 is
                       roughly the world's population) is all it takes to
                       single anyone out... it is one of my recurring
                       favorite themes, and anybody in the business of
                       managing other people's data should be well aware of
                       the risks.
       
                        godelski wrote 6 hours 44 min ago:
                         I think you're being too much of a conspiracy
                         theorist here, making everything black and white.
                         
                         Besides, the main question is how difficult it is
                         to deanonymize, not whether it's possible.
                        
                         Privacy and security both lack a perfect defense.
                         For example, there are no passwords that are
                         unhackable, only passwords that cannot be cracked
                         within our current technology, budgets, and
                         lifetimes. You could brute-force my HN password; it
                         would just take billions of years.
                        
                         The same distinction is important here. My threat
                         model on HN doesn't care if you need to spend
                         millions of dollars or thousands of hours to
                         deanonymize me. My handle is here to discourage
                         that and to allow me to speak more freely about
                         certain topics. I'm not trying to hide from nation
                         states; I'm trying to hide from my peers in AI and
                         tech, so I can freely discuss my opinions, which
                         includes criticizing my own community (something I
                         think everyone should do! Be critical of the
                         communities we associate with). And more than that,
                         I want people to consider my points on their merit
                         alone, not on my identity or status.
                        
                         If I were trying to hide from nation states, I'd do
                         things very, very differently, such as not posting
                         on HN.
                        
                        I'm not afraid of my handle being deanonymized, but I
                        still think we should recognize the dangers of the
                        future we are creating.
                        
                         By oversimplifying, you've created the position
                         that this is a lost cause, as if we have already
                         lost and, because we lost, can't change. There are
                         multiple fallacies here. The future has yet to be
                         written.
                         
                         If you really believe it is deterministic, then
                         what is the point of anything? Of having desires or
                         opinions? Are we just waiting to see which
                         algorithm wins out? Or are we the algorithms
                         playing themselves out? If it's deterministic,
                         wouldn't you be happy if the freedom algorithm won
                         and this moment were an inflection in your
                         programming? I guess that's impossible to say in an
                         objective manner, but I'd hope that's how it plays
                         out.
       
        ComputerGuru wrote 1 day ago:
         Looking at the results and the prompt, I would tweak the prompt to:
        
        * ignore comments that do not speculate on something that was unknown
        or had not achieved consensus as of the date of yyyy-mm-dd
        
        * at the same time, exclude speculations for which there still isn’t
        a definitive answer or consensus today
        
        * ignore comments that speculate on minor details or are stating a
        preference/opinion on a subjective matter
        
        * it is ok to generate an empty list of users for a thread if there are
        no comments meeting the speculation requirements laid out above
        
        * etc
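         
         As a sketch, those filters could be appended to the grading prompt
         along these lines (the wording and the CUTOFF_DATE placeholder are
         assumptions, not the project's actual prompt):
         
           CUTOFF_DATE = "yyyy-mm-dd"  # the thread's date, as above
           
           PREDICTION_FILTER = f"""
           Only grade comments that speculate about something that was
           unknown, or lacked consensus, as of {CUTOFF_DATE}. Skip
           speculations still unresolved today, minor details, and
           subjective preferences. If no comment in the thread qualifies,
           return an empty list of users.
           """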
       
          losvedir wrote 1 day ago:
           Agreed. I feel like it's more just a collection of good comments.
           It doesn't surprise me to see tptacek, patio11, etc. there. I
           think the "prediction" aspect is underweighted.
           
           But it reminds me that I miss Manishearth's comments! Whatever
           happened to him? I recall him being a big Rust contributor. I'd
           think he'd be all over the place, given Rust's adoption since
           then. I also liked tokenadult. Interesting blast from the past.
       
          xpe wrote 1 day ago:
           Good points. To summarize: for a given thread, one presumably
           must downselect to the comments that can reasonably be
           interpreted as forecasts. I see some indications that the creator
           of the project (despite his amazing reputation) skated over this
           part.
       
          janalsncm wrote 1 day ago:
          You would also need to exclude “predictions” for things which
          already happened at the time they were predicted.
       
        smugma wrote 1 day ago:
         I believe the GPA calculation is off, maybe just for F's.
         
         I scrolled to the bottom of the hall of fame/shame and saw that
         entry #1505 had 3 F's and a D, with an average grade of D+ (1.46).
         
         Grades no better than a D shouldn't average to a D+; I'd expect
         something closer to 0.25.
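         
         A quick check of the arithmetic, assuming the standard 4.0 scale
         (the site's exact grade-point mapping is an assumption here):
         
           GRADE_POINTS = {"A": 4.0, "B": 3.0, "C": 2.0,
                           "D+": 1.3, "D": 1.0, "F": 0.0}
           
           def gpa(grades):
               return sum(GRADE_POINTS[g] for g in grades) / len(grades)
           
           print(gpa(["F", "F", "F", "D"]))  # 0.25 -- nowhere near 1.46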
       
        karmickoala wrote 1 day ago:
         I understand the exercise, but I think it should have a disclaimer:
         some of the LLM reviews show a bias, and when I read the comments
         they turned out not to be as bad as the LLM made them out to be. As
         this hits the front page, some people will only read the title and
         not the accompanying blog post, losing all of the nuance.
         
         That said, I understand the concept and love what you did here. By
         exposing this to the best disinfectant, sunlight, I hope it will
         raise awareness and show how people and corporations should be
         careful about its usage. This tech is now accessible to anyone, not
         only big tech, in a couple of hours.
         
         It also shows how we should take with a grain of salt the result of
         any analysis at this scale by an LLM. Our private channels and
         messages on software like Teams and Slack can now be analyzed to
         hell by our AI overlords. I'm probably going to remove a lot of
         things from cloud drives just in case. Perhaps online discourse
         will deteriorate into more inane, LinkedIn-style content.
         
         Also, I like that your prompt itself has some purposefully leaked
         bias, which shows other risks—¹for instance, "fsflover: F", which
         may align the LLM to grade handles related to free software and
         open source more harshly.
         
         As a meta concept, I wonder how I'll be graded by our AI overlords
         in the future, now that I have posted something dismissive of them.
         
         ¹Alt+0151
       
        Rperry2174 wrote 1 day ago:
        One thing this really highlights to me is how often the "boring" takes
        end up being the most accurate. The provocative, high-energy threads
        are usually the ones that age the worst.
        
        If an LLM were acting as a kind of historian revisiting today’s
        debates with future context, I’d bet it would see the same pattern
        again and again: the sober, incremental claims quietly hold up, while
        the hyperconfident ones collapse.
        
        Something like "Lithium-ion battery pack prices fall to $108/kWh" is
        classic cost-curve progress. Boring, steady, and historically extremely
        reliable over long horizons. Probably one of the most likely headlines
        today to age correctly, even if it gets little attention.
        
        On the flip side, stuff like "New benchmark shows top LLMs struggle in
        real mental health care" feels like high-risk framing. Benchmarks
        rotate constantly, and “struggle” headlines almost always age badly
        as models jump whole generations.
        
         I bet there are many "boring but right" takes we overlook today,
         and I wonder if there's a practical way to surface them before
         hindsight does.
       
          schoen wrote 1 day ago:
          I predict that, in 2035, 1+1=2. I also predict that, in 2045, 2+2=4.
          I also predict that, in 2055, 3+3=6.
          
          By 2065, we should be in possession of a proof that 0+0=0. Hopefully
          by the following year we will also be able to confirm that 0*0=0.
          
          (All arithmetic here is over the natural numbers.)
       
          0manrho wrote 1 day ago:
           It's because algorithmic feeds based on "user engagement" reward
           antagonism. If your goal is to get eyes on content, being boring,
           predictable, and nuanced is a sure way to get lost in the
           ever-increasing noise.
       
          jimbokun wrote 1 day ago:
           The one about LLMs and mental health is not a prediction but a
           current news report, the way you phrased it.
           
           Also, the boring consistent-progress case for AI ends with humans
           no longer viable as economic agents, requiring a complete
           reordering of our economic and political systems in the near
           future. So the “boring but right” prediction today is
           completely terrifying.
       
            p-e-w wrote 1 day ago:
            “Boring” predictions usually state that things will continue to
            work the way they do right now. Which is trivially correct, except
            in cases where it catastrophically isn’t.
            
            So the correctness of boring predictions is unsurprising, but also
            quite useless, because predicting the future is precisely about
            predicting those events which don’t follow that pattern.
       
          xpe wrote 1 day ago:
          > One thing this really highlights to me is how often the "boring"
          takes end up being the most accurate.
          
           Would the commenter above mind sharing the method behind their
           generalization? Many people would spot-check maybe five items --
           which is enough for our brains to start guessing at potential
           patterns -- and stop there.
          
          On HN, when I see a generalization, one of my mental checklist items
          is to ask "what is this generalization based on?" and "If I were to
          look at the problem with fresh eyes, what would I conclude?".
       
          copperx wrote 1 day ago:
          Is this why depressed people often end up making the best
          predictions?
          
           In personal situations there's clearly a self-fulfilling prophecy
           going on, but when it comes to the external world, the
           predictions come out pretty accurate.
       
          johnfn wrote 1 day ago:
           This suggests that the best way to grade predictions is some sort
           of weighting by how unlikely they were at the time. Like, if you
           were to open a prediction market for statement X, grade the delta
           between your stated confidence in the event and the market's
           “expected” value, summed over all your predictions.
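           
           One standard way to formalize that delta is a Brier-style skill
           score against a baseline probability (a crowd or market prior);
           the numbers below are invented for illustration:
           
             def skill_score(p_user, p_baseline, outcome):
                 # Brier score is (p - outcome)^2; lower is better, so a
                 # positive difference means you beat the baseline.
                 brier = lambda p: (p - outcome) ** 2
                 return brier(p_baseline) - brier(p_user)
             
             # Contrarian and right: crowd said 10%, commenter said 70%.
             print(skill_score(0.7, 0.1, 1))  # +0.72
             # Safe consensus call: both said 90% and it happened.
             print(skill_score(0.9, 0.9, 1))  # 0.0 -- no credit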
       
            jacquesm wrote 1 day ago:
             Exactly, that's the element that is missing. If there are 50
             comments against and one pro, and that pro turns out right in
             the longer term, then that is worth noticing -- not when there
             are 50 comments pro and you were one of the 'pros'.
             
             Going against the grain and turning out right is far more
             valuable than being right when the crowd is already with you.
       
              mcmoor wrote 1 day ago:
               Yeah, a simple tally of total points of pro comments vs total
               points of con comments may be simple and exact enough to
               simulate a prediction market. I don't know if it can be
               included in the prompt or is better vibecoded in directly.
       
          yunwal wrote 1 day ago:
          "Boring but right" generally means that this prediction is already
          priced in to our current understanding of the world though. Anyone
          can reliably predict "the sun will rise tomorrow", but I'm not giving
          them high marks for that.
       
            Gravityloss wrote 1 day ago:
            something like correctness^2 x novel information content rank?
       
              Gravityloss wrote 16 hours 21 min ago:
              Actually now thinking about it, incorrect information has
              negative value so the metric should probably reflect that.
       
            onraglanroad wrote 1 day ago:
            I'm giving them higher marks than the people who say it won't.
            
            LLMs have seen huge improvements over the last 3 years. Are you
            going to make the bet that they will continue to make similarly
            huge improvements, taking them well past human ability, or do you
            think they'll plateau?
            
            The former is the boring, linear prediction.
       
              Dylan16807 wrote 1 day ago:
              LLMs aren't getting better that fast.  I think a linear
              prediction says they'd need quite a while to maybe get "well past
              human ability", and if you incorporate the increases in training
              difficulty the timescale stretches wide.
       
              bigiain wrote 1 day ago:
              LaunchHN: Announcing Twoday, our new YC backed startup coming out
              of stealth mode.
              
              We’re launching a breakthrough platform that leverages frontier
              scale artificial intelligence to model, predict, and dynamically
              orchestrate solar luminance cycles, unlocking the world’s first
              synthetic second sunrise by Q2 2026. By combining physics
              informed multimodal models with real time atmospheric
              optimisation, we’re redefining what’s possible in climate
              scale AI and opening a new era of programmable daylight.
       
                rznicolet wrote 1 day ago:
                You joke, but, alas, there is a _real_ company kinda trying to
                do this.  Reflect Orbital[1] wants to set up space mirrors, so
                you can have daytime at night for your solar panels!  (Various
                issues, like around light pollution and the fact that looking
                up at the proposed satellites with binoculars could cause eye
                damage... don't seem to be on their roadmap.)  This is one idea
                that's going to age badly whether or not they actually launch
                anything, I suspect.
                
                Battery tech is too boring, but seems more likely to manage
                long-term effectiveness.
                
 (HTM)          [1]: https://www.reflectorbital.com
       
                  mananaysiempre wrote 17 hours 59 min ago:
                  Reflecting sunlight from orbit is an idea that had been
                  talked about for a couple of decades even before Znamya-2[1]
                  launched in 1992. The materials science needed to unfurl
                  large surfaces in space seems to be very difficult, whether
                  mirrors or sails.
                  
 (HTM)            [1]: https://en.wikipedia.org/wiki/Znamya_(satellite)
       
              bryanrasmussen wrote 1 day ago:
              >The former is the boring, linear prediction.
              
              right, because if there is one thing that history shows us
              again and again, it is that things that have a period of huge
              improvements never plateau but instead continue improving to
              infinity.
              
              Improvement to infinity, that is the sober and wise bet!
       
                pixl97 wrote 1 day ago:
                Tiger: humans will never beat tigers because tigers are
                purpose-built killing machines and they are just generalists
                --40,000 BC
       
                  OccamsMirror wrote 1 day ago:
                  You don't think humans hunted tigers in 40,000BC?
       
                p-e-w wrote 1 day ago:
                The prediction that a new technology that is being heavily
                researched plateaus after just 5 years of development is
                certainly a daring one. I can’t think of an example from
                history where that happened.
       
                  OccamsMirror wrote 1 day ago:
                  Perhaps the fact that you think this field is only 5 years
                  old means you're probably not enough of an authority to
                  comment confidently on it?
       
                    p-e-w wrote 1 day ago:
                    Claiming that AI in anything resembling its current form is
                    older than 5 years is like claiming the history of the
                    combustion engine started when an ape picked up a burning
                    stick.
       
                      OccamsMirror wrote 22 hours 39 min ago:
                      Your analogy fails because picking up a burning stick
                      isn’t a combustion engine, whereas decades of
                      neural-net and sequence-model work directly enabled
                      modern LLMs. LLMs aren’t “five years old”; the
                      scaling-transformer regime is. The components are old,
                      the emergent-capability configuration is new.
                      
                      Treating the age of the lineage as evidence of future
                      growth is equivocation across paradigms. Technologies
                      plateau when their governing paradigm saturates, not when
                      the calendar says they should continue. Supersonic flight
                      stalled immediately, fusion has stalled for seventy
                      years, and neither cared about “time invested.”
                      
                      Early exponential curves routinely flatten: solar cells,
                      battery density, CPU clocks, hard-disk areal density. The
                      only question that matters is whether this paradigm shows
                      signs of saturation, not how long it has existed.
       
                        bryanrasmussen wrote 20 hours 29 min ago:
                        I think this is the first time I have ever posted one
                        of these but thank you for making the argument so well.
       
                  gitremote wrote 1 day ago:
                  Neural network research and development has existed since
                  at least the 1980s, so 40+ years. One of the bottlenecks
                  back then was not enough compute.
       
              yunwal wrote 1 day ago:
              >  Are you going to make the bet that they will continue to make
              similarly huge improvements
              
              Sure yeah why not
              
              > taking them well past human ability,
              
              At what? They're already better than me at reciting historical
              facts. You'd need some actual prediction here for me to give you
              "prescience".
       
                Terr_ wrote 1 day ago:
                I imagine "better" in this case depends on how one scores "I
                don't know" or confident-sounding falsehoods.
                
                Failures aren't just a ratio, they're a multi-dimensional
                shape.
       
                irishcoffee wrote 1 day ago:
                > At what? They're already better than me at reciting
                historical facts.
                
                I wonder what happens if you ask deepseek about Tiananmen
                Square…
                
                Edit: my “subtle” point was, we already know LLMs censor
                history. Trusting them to honestly recite historical facts is
                how history dies. “The victor writes history” has never
                been more true. Terrifying.
       
                  Dylan16807 wrote 1 day ago:
                  > Edit: my “subtle” point was, we already know LLMs
                  censor history. Trusting them to honestly recite historical
                  facts is how history dies.
                  
                  I mean, that's true but not very relevant.  You can't trust a
                  human to honestly recite historical facts either.  Or a book.
                  
                  > “The victor writes history” has never been more true.
                  
                  I don't see how.
       
                janalsncm wrote 1 day ago:
                “At what?” is really the key question here.
                
                A lot of the press likes to paint “AI” as a uniform field
                that continues to improve together. But really it’s a bunch
                of related subfields. Once in a blue moon a technique from one
                subfield crosses over into another.
                
                “AI” can play chess at superhuman skill. “AI” can also
                drive a car. That doesn’t mean Waymo gets safer when we
                increase Stockfish’s elo by 10 points.
       
                onraglanroad wrote 1 day ago:
                At every intellectual task.
                
                They're already better than you at reciting historical facts.
                I'd guess they're probably better at composing poems (they're
                not great but far better than the average person).
                
                Or do you agree with me? I'm not looking for prescience
                marks; I'm just less convinced that people really make the
                more boring and obvious predictions.
       
                  autoexec wrote 1 day ago:
                  > They're already better than you at reciting historical
                  facts.
                  
                  They're better at regurgitating historical facts than me
                  because they were trained on historical facts written by many
                  humans other than me who knew a lot more historical facts.
                  None of those facts came from an LLM. Every historical fact
                  that isn't entirely LLM generated nonsense came from a human.
                  It's the humans that were intelligent, not the fancy
                  autocomplete.
                  
                  Now that LLMs have consumed the bulk of humanity's written
                  knowledge on history what's left for it to suck up will be
                  mainly its own slop. Exactly because LLMs are not even a
                  little bit intelligent they will regurgitate that slop with
                  exactly as much ignorance as to what any of it means as when
                  it was human-generated facts, and they'll still spew it back
                  out with all the confidence they've been programmed to
                  emulate. I predict that the resulting output will
                  increasingly shatter the illusion of intelligence you've so
                  thoroughly fallen for so far.
       
                  blibble wrote 1 day ago:
                  > They're already better than you at reciting historical
                  facts.
                  
                  so is a textbook, but no-one argues that's intelligent
       
                  janalsncm wrote 1 day ago:
                  To be clear, you are suggesting “huge improvements” in
                  “every intellectual task”?
                  
                  This is unlikely for the trivial reason that some tasks are
                  roughly saturated. Modest improvements in chess playing
                  ability are likely. Huge improvements probably not. Even more
                  so for arithmetic. We pretty much have that handled.
                  
                  But the more substantive issue is that intellectual tasks are
                  not all interconnected. Getting significantly better at
                  drawing hands doesn’t usually translate to executive
                  planning or information retrieval.
       
                    yunwal wrote 1 day ago:
                    There’s plenty of room to grow for LLMs in terms of
                    chess-playing ability, considering chess engines have
                    them beat by around 1500 Elo points.
       
                      janalsncm wrote 1 day ago:
                      Sorry, I now realize this thread is about whether LLMs
                      can improve on tasks and not whether AI can. Agreed
                      there’s a lot of headroom for LLMs, less so for AI as a
                      whole.
       
                  yunwal wrote 1 day ago:
                  What is an intellectual task? Once again, there's tons of
                  stuff LLMs won't be trained on in the next 3 years. So it
                  would be trivial to just find one of those things and say
                  voila! LLMs aren't better than me at that.
                  
                  I'll make one prediction that I think will hold up. No
                  LLM-based system will be able to take a generic ask like
                  "hack the nytimes website and retrieve emails and password
                  hashes of all user accounts" and do better than the best
                  hackers and penetration testers in the world, despite having
                  plenty of training data to go off of. It requires out-of-band
                  thinking that they just don't possess.
       
                    hathawsh wrote 1 day ago:
                    I'll take a stab at this: LLMs currently seem to be rather
                    good at details, but they seem to struggle greatly with the
                    overall picture, in every subject.
                    
                    - If I want Claude Code to write some specific code, it
                    often handles the task admirably, but if I'm not sure what
                    should be written, consulting Claude takes a lot of time
                    and doesn't yield much insight, whereas 2 minutes with a
                    human is 100x more valuable.
                    
                    - I asked ChatGPT about some political event. It mirrored
                    the mainstream press. After I reminded it of some obvious
                    facts that revealed a mainstream bias, it agreed with me
                    that its initial answer was wrong.
                    
                    These experiences and others serve to remind me that
                    current LLMs are mostly just advanced search engines. They
                    work especially well on code because there is a lot of
                    reasonably good code (and tutorials) out there to train on.
                    LLMs are a lot less effective on intellectual tasks that
                    humans haven't already written and published about.
       
                      medler wrote 1 day ago:
                      > it agreed with me that its initial answer was wrong.
                      
                      Most likely that was just its sycophancy programming
                      taking over and telling you what you wanted to hear.
       
            SubiculumCode wrote 1 day ago:
            Perhaps a new category: 'highest-risk guess but right the most
            often'. Those are the high-impact predictions.
       
              arjie wrote 1 day ago:
              Prediction markets have pretty much obviated the need for these
              things. Rather than rely on "was that really a hot take?" you
              have a market system that rewards those with accurate hot takes.
              The massive fees and lock-up period discourage low-return bets.
       
                gammarator wrote 1 day ago:
                Can’t wait for the brave new world of individuals “match
                fixing” outcomes on Polymarket.
       
                  Karrot_Kream wrote 1 day ago:
                  As opposed to the current world of brigading social media
                  threads to make consensus look like it goes your way and then
                  getting journalists scraping by on covering clickbait to
                  cover your brigading as fact?
       
                Karrot_Kream wrote 1 day ago:
                FWIW Polymarket (which is one of the big markets) has no
                lock-up period and, for now while they're burning VC coins, no
                fees. Otherwise agree with your point though.
       
          simianparrot wrote 1 day ago:
          Instead of "LLM's will put developers out of jobs" the boring reality
          is going to be "LLM's are a useful tool with limited use".
       
            jimbokun wrote 1 day ago:
            That is at odds with predicting based on recent rates of progress.
       
        mistercheph wrote 1 day ago:
        A majority don't seem to be predictions about the future, and it
        seems to mostly like comments that give extended airing to what was
        then and is still the consensus viewpoint, e.g. the top comment from
        pcwalton, the highest-scored user: [1] > (Copying my comment here
        from Reddit /r/rust:)
        Just to repeat, because this was somewhat buried in the article: Servo
        is now a multiprocess browser, using the gaol crate for sandboxing.
        This adds (a) an extra layer of defense against remote code execution
        vulnerabilities beyond that which the Rust safety features provide; (b)
        a safety net in case Servo code is tricked into performing insecure
        actions.
        There are still plenty of bugs to shake out, but this is a major
        milestone in the project.
        
 (HTM)  [1]: https://news.ycombinator.com/item?id=10657401
       
        tptacek wrote 1 day ago:
        'pcwalton, I'm coming for you. You're going down.
        
        Kidding aside, the comments it picks out for us are a little random.
        For instance, this was an A+ predictive thread (it appears to be rating
        threads and not individual comments): [1] But there's just 11 comments,
        only 1 for me, and it's like a 1-sentence comment.
        
        I do love that my unaccredited-access-to-startup-shares take is on that
        leaderboard, though.
        
 (HTM)  [1]: https://news.ycombinator.com/item?id=10703512
       
          mvkel wrote 16 hours 26 min ago:
          Hilariously, it seems you anticipated this happening and copyrighted
          your comments. Is karpathy's tool in violation of your copyright?!
       
            tptacek wrote 16 hours 19 min ago:
            Karpathy, I'm coming for you next.
       
          n4r9 wrote 22 hours 59 min ago:
          Yeah, I'm having to pinch myself a little here. Another slightly odd
          example it picked out from your history: [1] It's a good comment, but
          "prescient" isn't a word I'd apply to it. This is more like a list of
          solid takes. To be fair there probably aren't even that many
          explicit, correct predictions in one month of comments in 2015.
          
 (HTM)    [1]: https://news.ycombinator.com/item?id=10735398
       
          kbenson wrote 1 day ago:
          I noticed from reviewing my own entry (which honestly I'm surprised
          exists) that the idea of what it thinks constitutes a "prediction" is
          fairly open to interpretation, or at least that adding some nuance
          to a small aspect of someone else's prediction in a thread counts
          quite heavily. I don't really view how I've participated here over
          the years in any way as making predictions. I actually thought I
          had done
          a fairly good job at not making predictions, by design.
       
        scosman wrote 1 day ago:
        Anyone have a branch that I can run to target my own comments? I'd love
        to see where I was right and where I was off base. Seems like a
        genuinely great way to learn about my own biases.
       
          xpe wrote 1 day ago:
          I appreciate your intent, but this tool needs a lot of work -- maybe
          an entire redesign -- before it would be suitable for the purpose you
          seek. See discussion at [1].
          
          Besides, in my experience, only a tiny fraction of HN comments can be
          interpreted as falsifiable predictions.
          
          Instead I would recommend learning about calibration [2] and ways to
          improve one's calibration, which will likely lead you into literature
          reviews of cognitive biases and what we can do about them. Also,
          jumping into some prediction markets (as long as they don't become
          too much of a distraction) is good practice.
          
          [1]
          
 (HTM)    [1]: https://news.ycombinator.com/item?id=46223959
 (HTM)    [2]: https://www.lesswrong.com/w/calibration
       
        0xWTF wrote 1 day ago:
        Now: compared to what? Is there a better source than HN? How's it
        compare to Reddit or lobsters?
        
        Compared to what happens next? Does tptacek's commentary become market
        signal equivalent to the Fed Chair or the BLS labor and inflation
        reports?
       
          tptacek wrote 1 day ago:
          What makes you think it already isn't?
       
            jacquesm wrote 1 day ago:
            You've made me billions by now! Thank you...
       
        btbuildem wrote 1 day ago:
        I've spent a weekend making something similar for my Gmail account
        (which Google keeps nagging me about being 90% full). It's
        fascinating to be able to classify 65k+ emails (surprise: more than
        half are garbage), as well as summarize and trace the nature of
        communication between specific senders/recipients. It took about 50
        hours on a dual RTX 3090 setup running Qwen 3.
        
        My original goal was to prune the account deleting all the useless
        things and keeping just the unique, personal, valuable communications
        -- but the other day, an insight has me convinced that the safer /
        smarter thing to do in the current landscape is the opposite: remove
        any personal, valuable, memorable items, and leave Google (and whoever
        else is scraping these repositories) with useless flotsam of
        newsletters, updates, subscription receipts, etc.
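        
        For the curious, a minimal sketch of what such a pass can look like
        (the endpoint, model name, prompt, and categories below are
        illustrative placeholders, not the exact setup; it assumes a Google
        Takeout mbox export and a local OpenAI-compatible server, e.g.
        llama.cpp or vLLM, serving Qwen 3):
        
          import mailbox
          import requests
          
          CATEGORIES = "personal, newsletter, receipt, notification, spam"
          
          def body_text(msg):
              # Take the first text/plain part (walk() also covers the
              # non-multipart case); good enough for rough triage.
              for part in msg.walk():
                  if part.get_content_type() == "text/plain":
                      raw = part.get_payload(decode=True) or b""
                      return raw.decode("utf-8", errors="replace")
              return ""
          
          def classify(msg):
              prompt = (
                  f"Classify this email as one of: {CATEGORIES}.\n"
                  f"From: {msg.get('From', '')}\n"
                  f"Subject: {msg.get('Subject', '')}\n\n"
                  f"{body_text(msg)[:2000]}\n"
                  "Answer with the category name only."
              )
              r = requests.post(
                  "http://localhost:8000/v1/chat/completions",
                  json={"model": "qwen3",
                        "messages": [{"role": "user", "content": prompt}],
                        "temperature": 0},
                  timeout=120)
              return r.json()["choices"][0]["message"]["content"].strip()
          
          for msg in mailbox.mbox("all_mail.mbox"):
              print(classify(msg), "|", msg.get("Subject", ""))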
       
          subscriptzero wrote 11 hours 46 min ago:
          I would love to do something like this, and weirdly I even have a
          dual 3090 home setup.
          
          Any chance you can outline the steps/prompts/tools you used to run
          this?
          
          I've been building a second-brain type project that plugs into all
          my workplaces, and a custom classifier has been on the list of
          things that would enhance it.
       
          red-iron-pine wrote 12 hours 1 min ago:
          so then what do you do with the useful stuff?
       
        neilv wrote 1 day ago:
        > I spent a few hours browsing around and found it to be very
        interesting.
        
        This seems to be the result of the exercise?  No evaluation?
        
        My concern is that, even if the exercise is only an amusing curiosity,
        many people will take the results more seriously than they should, and
        be inspired to apply the same methods to products and initiatives that
        adversely affect people's lives in real ways.
       
          cootsnuck wrote 1 day ago:
          > My concern is that, even if the exercise is only an amusing
          curiosity, many people will take the results more seriously than they
          should, and be inspired to apply the same methods to products and
          initiatives that adversely affect people's lives in real ways.
          
          That will most definitely happen. We have already known for a while
          that algorithmic methods have been applied "to products and
          initiatives that adversely affect people's lives in real ways": [1]
          I guess the question is whether LLMs for some reason will
          reinvigorate public sentiment / pressure for governing bodies to
          sincerely take up the ongoing responsibility of trying to lessen
          the unique harms that can be amplified by reckless implementation
          of algorithms.
          
 (HTM)    [1]: https://www.scientificamerican.com/blog/roots-of-unity/revie...
       
        slg wrote 1 day ago:
        This is a perfect example of the power and problems with LLMs.
        
        I took the narcissistic approach of searching for myself. Here's a
        grade of one of my comments[1]:
        
        >slg: B- (accurate characterization of PH’s “networking & facade”
        feel, but implicitly underestimates how long that model can persist)
        
        And here's the actual comment I made[2]:
        
        >And maybe it is the cynical contrarian in me, but I think the "real
        world" aspect of Product Hunt it what turned me off of the site before
        these issues even came to the forefront. It always seemed like an echo
        chamber were everyone was putting up a facade. Users seemed more
        concerned with the people behind products and networking with them than
        actually offering opinions of what was posted.
        
        >I find the more internet-like communities more natural. Sure, the top
        comment on a Show HN is often a critique. However I find that more
        interesting than the usual "Wow, another great product from John
        Developer. Signing up now." or the "Wow, great product. Here is why you
        should use the competing product that I work on." that you usually see
        on Product Hunt.
        
        I did not say nor imply anything about "how long that model can
        persist", I just said I personally don't like using the site.  It's a
        total hallucination to claim I was implying doom for "that model" and
        you would only know that if you actually took the time to dig into the
        details of what was actually said, but the summary seems plausible
        enough that most people never would.
        
        The LLM processed and analyzed a huge amount of data in a way that no
        human could, but the single in-depth look I took at that analysis was
        somewhere between misleading and flat out wrong.  As I said, a perfect
        example of what LLMs do.
        
        And yes, I do recognize the funny coincidence that I'm now doing the
        exact thing I described as the typical HN comment a decade ago.  I
        guess there is a reason old me said "I find that more interesting".
        
 (HTM)  [1]: https://karpathy.ai/hncapsule/2015-12-18/index.html#article-10...
 (HTM)  [2]: https://news.ycombinator.com/item?id=10761980
       
          npunt wrote 7 hours 20 min ago:
          I'm not so sure; that may not have been what you meant, but that
          doesn't mean it's not what others read into it. The broader context
          is HN is a startup forum and one of the most common discussion
          patterns is 'I don't like it' that is often a stand-in for 'I don't
          think it's viable as-is'. Startups are default dead, after all.
          
          With that context, if someone were to read your comment and be asked
          'does this person think the product's model is viable in the long
          run' I think a lot of people would respond 'no'.
       
        hackthemack wrote 1 day ago:
        I noticed the Hall of Fame grading of predictive comments has a
        quirk. It grades some comments on whether they came true or not, but
        in the grading of a comment on the article [1] The Cannons on the
        B-29 Bomber: "accurate account of LeMay stripping turrets and
        shifting to incendiary area bombing; matches mainstream history"
        
        It gave a good grade to user cstross, but to my reading of the
        comment, cstross just recounted a bit of old history. Did the
        evaluation reward cstross just for giving a history lesson, or no?
        
 (HTM)  [1]: https://news.ycombinator.com/item?id=10654216
       
          karpathy wrote 1 day ago:
          Yes I noticed a few of these around. The LLM is a little too willing
          to give out grades for comments that were good/bad in a bit more
          general sense, even if they weren't making strong predictions
          specifically. Another thing I noticed is that the LLM has a very
          impressive recognition of the various usernames and who they belong
          to, and I think shows a little bit of a bias in its evaluations based
          on the identity of the person. I tuned the prompt a little bit based
          on some low-hanging fruit mistakes but I think one can most likely
          iterate it quite a bit further.
       
            patcon wrote 1 day ago:
            I think you were getting at this, but in case others didn't know:
            cstross is a famous sci-fi author and futurist :)
       
        mvdtnz wrote 1 day ago:
        Do we need more AI slop on the front page?
       
        Bjartr wrote 1 day ago:
        Neat, I got a shout-out. Always happy to share the random stuff I
        remember exists!
       
        GaggiX wrote 1 day ago:
        I was reading the Anki article on 2015-12-13, and the best prediction
        was by markm248 saying: "Remember that you read it here first, there
        will be a unicorn built on the concept of SRS"
        
        They were right, Duolingo.
       
          mtlynch wrote 1 day ago:
          Duolingo existed for a while at that point and was already valued at
          $500M by end of 2015.
       
            GaggiX wrote 1 day ago:
            It became a unicorn in December 2019 tho, 4 years later.
       
        modeless wrote 1 day ago:
        This is a cool idea. I would install a Chrome extension that shows a
        score by every username on this site grading how well their expressed
        opinions match what subsequently happened in reality, or the accuracy
        of any specific predictions they've made. Some people's opinions are
        closer to reality than others and it's not always correlated with
        upvotes.
        
        An extension of this would be to grade people on the accuracy of the
        comments they upvote, and use that to weight their upvotes more in
        ranking. I would love to read a version of HN where the only upvotes
        that matter are from people who agree with opinions that turn out to be
        correct. Of course, only HN could implement this since upvotes are
        private.
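        
        A rough sketch of that second idea (usernames, accuracy numbers, and
        the 0.5 default below are hypothetical, purely for illustration):
        
          # Weight each upvote by the voter's historical accuracy: the
          # fraction of claims they upvoted that later graded correct.
          def weighted_score(upvoters, accuracy, default=0.5):
              # default treats voters with no track record as neutral
              return sum(accuracy.get(u, default) for u in upvoters)
          
          accuracy = {"alice": 0.9, "bob": 0.8, "carol": 0.3, "dave": 0.4}
          votes = {
              "c1": ["alice", "bob"],           # few, well-calibrated voters
              "c2": ["carol", "dave", "erin"],  # more votes, worse records
          }
          ranked = sorted(votes, reverse=True,
                          key=lambda c: weighted_score(votes[c], accuracy))
          # c1 (1.7) outranks c2 (1.2) despite fewer raw upvotes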
       
          emaro wrote 21 hours 38 min ago:
          I like the idea and would certainly try it, although I feel that in
          a way this would be an antithesis to HN. HN tries to foster
          curiosity, but if you're (only) ranked by the accuracy of your
          predictions, this could give an incentive to always fall back to a
          safe and boring position.
       
          prawn wrote 23 hours 46 min ago:
          Didn't Slashdot have something like the second point with their
          meta-moderation, many many years ago?
       
            ssl-3 wrote 10 hours 54 min ago:
            Sorta.
            
            IIRC, when comment moderation and scoring came to Slashdot, only a
            random (and changing) selection of users were able to moderate.
            
            Meta-moderation came a bit later.  It allowed people to review
            prior moderation actions and evaluate the worth of those actions.
            
            Those users who made good moderations were more likely to become a
            mod again in the future than those who made bad moderations.
            
            The meta-mods had no idea whose actions they were evaluating, and
            previous/potential mods had no idea what their score was.  That
            anonymity helped keep it honest and harder to game.
       
          potato3732842 wrote 1 day ago:
          >This is a cool idea. I would install a Chrome extension that shows a
          score by every username on this site grading how well their expressed
          opinions match what subsequently happened in reality, or the accuracy
          of any specific predictions they've made.
          
          Why stop there?
          
          If you can do that you can score them on all sorts of things. You
          could make a "this person has no moral convictions and says
          whatever makes the number go up" score. Or some other kind of
          score.
          
          Stuff like this makes the community "smaller" in a way.  Like back in
          the old days on forums and IRC you knew who the jerks were.
       
          leobg wrote 1 day ago:
          That’s what Elon’s vision was before he ended up buying Twitter.
          Keep a digital track record for journalists. He wanted to call it
          Pravda.
          
          (And we do have that in real life. Just as, among friends, we do keep
          track of who is in whose debt, we also keep a mental map of whose
          voice we listen to. Old school journalism still had that, where
          people would be reading someone’s column over the course of
          decades. On the internet, we don’t have that, or we have it
          rarely.)
       
          8organicbits wrote 1 day ago:
          The problem seems underspecified; what does it mean for a comment to
          be accurate? It would seem that comments like "the sun will rise
          tomorrow" would rank highest, but they aren't surprising.
       
            smeeger wrote 18 hours 51 min ago:
            just because an idea is qualitative doesn't mean it's invalid
       
          TrainedMonkey wrote 1 day ago:
          I long had a similar idea for stocks. Analyze posts of people giving
          stock tips on WSB, Twitter, etc and rank by accuracy. I would be very
          surprised if this had not been done a thousand times by various
          trading firms and enterprising individuals.
          
          Of course in the above example of stocks there are clear predictions
          (HNWS will go up) and an oracle who resolves it (stock market). This
          seems to be a way harder problem for generic free form comments. Who
          resolves what prediction a particular comment has made and whether it
          actually happened?
       
            mvkel wrote 16 hours 35 min ago:
            Out of curiosity, I built this. I extended karpathy's code and
            widened the date range to see what stocks these users would pick
            given their sentiments.
            
            What came back were the usual suspects: GLP-1 companies and AI.
            
            Back to the "boring but right" thesis. Not much alpha to be found
       
            miki123211 wrote 21 hours 39 min ago:
            > Analyze posts of people giving stock tips on WSB, Twitter, etc
            and rank by accuracy.
            
            Didn't somebody make an ETF once that went against the
            predictions of some famous CNBC stock picker, showing that it
            would have given you alpha in the past?
            
            > seems to be a way harder problem for generic free form comments.
            
            That's what prediction markets are for. People for whom truth and
            accuracy matter (often concentrated around the rationalist
            community) will often very explicitly make annual lists of
            concrete and quantifiable predictions, and then self-grade on
            them later.
       
              red-iron-pine wrote 12 hours 4 min ago:
              Cramer is the stock picker guy. There is a well-known "Cramer
              Effect" or "Cramer Bounce" where the stock peaks then drops
              hard.
              
              Makes for a great pump-n-dump if you're day trading and willing
              to ride it. [1] Long-term his choices don't do well, so the
              Inverse Cramer basically says "do the opposite of this goober"
              and has solid returns (sorta; it depends a lot on methodology,
              and the sole hedge fund playing that strategy shut down).
              
 (HTM)        [1]: https://www.investopedia.com/terms/c/cramerbounce.asp
       
              Natsu wrote 13 hours 45 min ago:
              You probably mean Inverse Cramer:
              
 (HTM)        [1]: https://finbold.com/inverse-cramer-leaves-sp-nasdaq-and-...
       
            Karrot_Kream wrote 1 day ago:
            I ran across Sybil [1] the other day which tries to offer a
            reputation score based on correct predictions in prediction
            markets.
            
            [1] 
            
 (HTM)      [1]: https://sybilpredicttrust.info/
       
          cootsnuck wrote 1 day ago:
          The RES (Reddit Enhancement Suite) browser extension indirectly does
          this for me since it tracks the lifetime number of upvotes I give
          other users. So when I stumble upon a thread with a user with like
          +40 I know "This is someone whom I've repeatedly found to have good
          takes" (depending on the context).
          
          It's subjective of course but at least it's transparently so.
          
          I just think it's neat that it's kinda sorta a loose proxy for what
          you're talking about but done in arguably the simplest way possible.
       
            janalsncm wrote 1 day ago:
            That assumes your upvotes in the past were a good proxy for being
            correct today. You could have both been wrong.
       
            nickff wrote 1 day ago:
            I am not a Redditor, but RES sounds like it would increase the
            ‘echo-chamber’ effect, rather than improving one’s
            understanding of contributors’ calibration.
       
              baq wrote 21 hours 34 min ago:
              Echo chamber of rational, thoughtful and truthful speakers is
              what I’m looking for in Internet forums.
       
                red-iron-pine wrote 12 hours 9 min ago:
                flat earth creationists would describe their colleagues the
                same way.
                
                a group of them certainly is an echo chamber; why isn't your
                view?
       
                  xmprt wrote 6 hours 3 min ago:
                  An echo chamber is a product of your own creation. If you're
                  willing to upvote people who disagree with you and actively
                  seek out opposite takes to be genuinely curious about, then
                  you're probably not in an echo chamber.
                  
                  The tools for controlling your feed are shrinking on social
                  media like Instagram, TikTok, YouTube, etc., but simply
                  saying that you follow and respect the opinions of a select
                  group doesn't necessarily mean you're forming an echo
                  chamber.
                  
                  This is different from something like flat earth/other
                  conspiracy theories where when confronted with opposite
                  evidence, they aren't likely to engage with it in good faith.
       
                  ahf8Aithaex7Nai wrote 8 hours 20 min ago:
                  He doesn't deny that his point of view forms an echo chamber.
       
                jrmg wrote 12 hours 39 min ago:
                That’s what everyone living in an echo chamber (and
                especially one of their own creation) thinks they’re in.
       
                  XorNot wrote 11 hours 21 min ago:
                  "you're in an echo chamber" is one of the most frightfully
                  overused opinions.
       
                    ssl-3 wrote 11 hours 6 min ago:
                    The expression is an echo chamber in and of itself; it is
                    a self-fulfilling prophecy.
       
                  baq wrote 11 hours 42 min ago:
                  My problem is that I don't think I'm in any (HN is better
                  than most, but that doesn't mean it's good in absolute
                  terms...)
       
              intended wrote 22 hours 14 min ago:
              Echo chambers will always emerge on social media. I don't think
              you can come up with a format that will not result in
              consolidated blocs.
       
              PunchyHamster wrote 1 day ago:
              More than having the exact same system but with any random
              reader voting? I'd say as long as you don't do "I disagree,
              therefore I downvote", it would probably be more accurate than
              the essentially identical voting system driven by randoms that
              Reddit/HN already have.
       
              modeless wrote 1 day ago:
              Reddit's current structure very much produces an echo chamber
              with only one main prevailing view. If everyone used an extension
              like this I would expect it to increase overall diversity of
              opinion on the site, as things that conflict with the main echo
              chamber view could still thrive in their own communities rather
              than getting downvoted with the actual spam.
       
                XorNot wrote 11 hours 20 min ago:
                Hacker News' structure is identical though. Topics invite
                different demographics and downvotes suppress unpopular
                opinions. The front page shows the most upvoted stories. It's
                the same system.
       
                  morshu9001 wrote 2 hours 56 min ago:
                  HN has some built-in ways to reduce this, like not allowing
                  everyone to downvote everything.
       
                  modeless wrote 9 hours 53 min ago:
                  HN's moderation and ranking is better. But there's definitely
                  an echo chamber effect here too.
       
              mistercheph wrote 1 day ago:
              it depends on if you vote based on the quality of contribution to
              the discussion or based on how much you agree/disagree.
       
                miki123211 wrote 21 hours 46 min ago:
                I don't think you can change user behavior like this.
                
                You can give them a "venting sink" though. Instead of having a
                downvote button that just downvotes, have it pop up a little
                menu asking for a downvote reason, with "spam" and "disagree"
                as options. You could then weigh downvotes by which option was
                selected, along with an algorithm to discover "user honesty"
                based on whether their downvotes correlate with others or just
                with the people on their end of the political spectrum, a la
                Birdwatch.
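                
                A toy sketch of that weighting in Python (the reason weights
                and honesty scores below are invented for illustration, not
                from any real system):
                
                  REASON_WEIGHT = {"spam": 1.0, "disagree": 0.2}
                  
                  def downvote_penalty(votes, honesty, default=0.5):
                      # votes: (user, reason) pairs; honesty maps a user
                      # to [0, 1], e.g. how often their flags agreed with
                      # wider consensus (Birdwatch-style).
                      return sum(REASON_WEIGHT[r] * honesty.get(u, default)
                                 for u, r in votes)
                  
                  votes = [("alice", "spam"), ("bob", "disagree")]
                  downvote_penalty(votes, {"alice": 0.9, "bob": 0.9})
                  # -> 1.0*0.9 + 0.2*0.9 = 1.08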
       
                  morshu9001 wrote 2 hours 56 min ago:
                  You can't change it for other users, only for yourself, which
                  is what the original comment about the extension said.
       
        jeffbee wrote 1 day ago:
        I'm delighted to see that one of the users who makes the same negative
        comments on every Google-related post gets a "D" for saying Waymo was
        smoke and mirrors. Never change, I guess.
       
        gaigalas wrote 1 day ago:
        I am not sure if we need a karma precog analogue.
        
        It does seem better than just upvotes and downvotes though.
       
        collinmcnulty wrote 1 day ago:
        > But if intelligence really does become too cheap to meter, it will
        become possible to do a perfect reconstruction and synthesis of
        everything. LLMs are watching (or humans using them might be). Best to
        be good.
        
        I cannot believe this is just put out there unexamined of any level of
        "maybe we shouldn't help this happen". This is complete moral
        abdication. And to be clear, being "good" is no defense. Being good
        often means being unaligned with the powerful, so being good is often
        the very thing that puts you in danger.
       
          consumer451 wrote 1 day ago:
          It's nice that the LLM-enabled panopticon still cannot find this very
          recent related media, [1] but my silly mind can. It is actually an
          interesting commentary from a non-tech point of view. This is how the
          rest of the world feels:
          
          Anyway, back to work trying to make my millions using Opus and such.
          
          [1]
          
 (HTM)    [1]: https://old.reddit.com/r/funny/comments/1pj5bg9/al_companies...
       
          cootsnuck wrote 1 day ago:
          To be clear...prior to this recent explosive interest in LLMs, this
          was already true. Snowden was over 10 years ago.
          
          We can't start clutching our pearls now as if programmatic mass
          surveillance hasn't been running on all cylinders for over 20 years.
          
          Don't get me wrong, we should absolutely care about this, everyone
          should. I'm just saying that any vague gesturing at imminent
          privacy-doom thanks to LLMs is liable to do some big favors by
          inadvertently sanitizing the history of prior (and ongoing)
          egregious privacy offenders.
          
          I'm just suggesting more "Yes and" and less "pearl clutching" is all.
       
            panarky wrote 1 day ago:
            Who, exactly, is the "we" who you see "pearl clutching" instead of
            "yes and-ing"?
       
          thatguy0900 wrote 1 day ago:
          Well, the companies that facilitate this have found themselves in a
          position where if they go down they take the US economy with them,
          so the "maybe this shouldn't happen" thing is a moot point. At
          least we know this stuff is in stable, secure hands though, like
          how the Palantir CEO does recorded interviews while obviously
          blasted out of his mind on drugs.
       
          doctoboggan wrote 1 day ago:
          I've had the same thought as Karpathy over the past couple of
          months/years. I don't think it's good, exciting, or something to
          celebrate, but I also have no idea how to prevent it.
          
          I would read his "Best to be good." as a warning or reminder that
          everything you do or say online will be collected and analyzed by an
          "intelligence". You can't count on hiding amongst the mass of online
          noise. Imagine if someone were to collect everything you've written
          or uploaded to the internet and compiled it into a long document.
          What sort of story would that tell about who you are? What would a
          clever person (or LLM) be able to do with that document?
          
          If you have any ideas on how to stop everyone from building the
          torment nexus, I am willing to listen.
       
            tensor wrote 1 day ago:
            I think we need to stop focusing only on the AI aspect of this.
            Yes, it's an important component to the sort of mass surveillance
            system you're describing, but it's not the only component. The
            internet, advertising, privacy, all of these are integral to this
            outcome.
            
            While I don't have a general solution, I do believe that the
            solution will need to be multi-faceted and address multiple aspects
            of the technologies enabling this. My first step would be for
            society to re-evaluate and shift its views towards information,
            both locally and internationally.
            
            For example, if you proposed to get rid of all physical borders
            between countries, everyone would likely be aghast. Obviously there
            are too many disagreements and conflicting value sets between
            countries for this to happen. Yet in the west we think nothing of
            having no digital information borders, despite the fact that the
            lack of them in part enables this data collection and other issues
            such as election interference. Yes, erecting firewalls is extremely
            unpalatable to people in the west, but is almost certainly part of
            the solution on the national level. Countries like China long ago
            realized this, though they also use firewalls as a means of
            control, not just protection (it doesn't have to be this way).
            
            But within countries we also need to shift away from a default
            position of "I have the right to say whatever I want so therefore I
            should" and into one of "I'm not putting anything online unless I'm
            willing to have my employer, parents, literally everyone, read it."
            Also, we need to systematically attack and dismantle the
            advertising industry. That industry is one of the single biggest
            driving factors behind the extreme systematic collection and
            correlation of data on people. Advertising needs to switch to a
            "you come to me" approach not a "I'm coming to you" approach.
       
            flir wrote 1 day ago:
            That's not my department, says Wernher von Braun.
            
            Don't know why that just popped into my head.
       
            collinmcnulty wrote 1 day ago:
            This is my plan at least
            
            1. Don't build the Torment Nexus yourself. Don't work for them and
            don't give them your money.
            
            2. When people you know say they're taking a new job to work at
            Torment Nexus, act like that's super weird, like they said they're
            going to work for the Sinaloa cartel. Treat rich people working on
            the Torment Nexus like it's cringe to quote them.
            
            3. Get hostile to bots. Poison the data. Use AdNauseum and Anubis.
            
            4. Give your non-tech friends the vague sense that this stuff is
            bad. Some might want to listen more, but most just take their sense
            of what's cool and good from people they trust in the area.
       
              magic_hamster wrote 1 day ago:
              This seems to me like a form of social engineering, or to some
              extent, being a bit insufferable. And, rest assured it will not
              result in anything useful. The only result of this is that you
              will alienate your friends and colleagues if they work for an
              employer you don't like.
       
              Teever wrote 1 day ago:
              Do you have any suggestions on how to interact online with people
              who work at Torment Nexus?
       
            karpathy wrote 1 day ago:
            Thank you
       
          Teever wrote 1 day ago:
          The time for discussion and action on this was over 15 years ago,
          when Snowden and the NSA with their Utah data centre were a big
          story.
          
          Governments around the world have profiles on people and spiders that
          quietly amass the data that continuously updates those profiles.
          
          It's just a matter of time before hardware improves and we see
          another holocaust-scale purge facilitated by robots.
          
          Surveillance capitalism won.
       
        artur44 wrote 1 day ago:
        Interesting experiment. Using modern LLMs to retroactively grade
        decade-old HN discussions is a clever way to measure how well our
        collective predictions age. It’s impressive how little time and
        compute it now takes to analyze something that would’ve required days
        of manual reading. My only caution is that hindsight grading can
        overvalue outcomes instead of reasoning — good reasoning can still
        lead to wrong predictions. But as a tool for calibrating forecasting
        and identifying real signal in discussions, this is a very cool
        direction.
       
        lapcat wrote 1 day ago:
        Does anyone else think that HN engages in far too much navel-gazing?
        Nothing gets upvotes faster than a HN submission about HN.
       
          dang wrote 1 day ago:
          It's true that meta is the crack of internet forums, so we, er, crack
          down on it quite a bit. That's a longstanding view: [1] Alternate
          metaphor: evil catnip - [2] But yesterday's thread and this one are
          clearly exceptions—far above the median. [3] was particularly
          incredible I think!
          
 (HTM)    [1]: https://hn.algolia.com/?dateRange=all&page=0&prefix=false&qu...
 (HTM)    [2]: https://hn.algolia.com/?dateRange=all&page=0&prefix=true&que...
 (HTM)    [3]: https://news.ycombinator.com/item?id=46212180
       
            DonHopkins wrote 23 hours 27 min ago:
            Dang, posting links to searches for your own comments is so meta,
            no matter the topic, but even more meta when about meta crack. I
            love how the first hit of meta crack is this, your own message
            about meta crack.
       
              dang wrote 10 hours 26 min ago:
              I'm higher than my supplier!
       
            latexr wrote 1 day ago:
            I love it when you share some insight about HN or internet
            communication for which you have relevant searches at the ready,
            pointing to explanations of the concept.
            
            A personal favourite is “the contrarian dynamic”.
            
            Do you have a list of those at the ready or do you just remember
            them? If you feel like sharing, what’s your process and is there
            a list of those you’d make public?
            
            I imagine having one would be useful, e.g. for onboarding someone
            like tomhow, though that doesn’t really happen often.
       
              dang wrote 1 day ago:
              I just remember them. Or forget them!
              
              The process is simply that moderation is super repetitive, so
              eventually certain pathways get engraved in one's memory. A lot
              of the time, though, I can't quite remember one of these patterns
              and I'm unable to dig up my past comments about it. That's
              annoying, in that particular way when your brain can feel
              something's there but is unable to retrieve it.
       
                Terretta wrote 1 day ago:
                Well, you're #24 in this article's hall of fame, and the LLM
                thinks your moderation views stood the test of time.  Perhaps
                it can already retrieve them for you.
       
                  dang wrote 1 day ago:
                  There are so many interesting points and patterns that I've
                  just lost track of over the years.
                  
 (HTM)            [1]: https://hn.algolia.com/?dateRange=all&page=0&prefix=...
       
          CamperBob2 wrote 1 day ago:
          As moultano suggests, this is likely because most other websites make
          it completely impossible to navel-gaze.  We can't possibly give the
          HN admins too much praise and credit for their commitment to open and
          stable availability of legacy data.
       
          yellow_lead wrote 1 day ago:
          It's weird that HN viewers are interested in HN
       
        bgwalter wrote 1 day ago:
        "If LLMs are watching, humans will be on their best behavior".
        Karpathy, paraphrasing Larry Ellison.
        
        The EU may give LLM surveillance an F at some point.
       
        swalsh wrote 1 day ago:
        I have never felt less confident in the future than I do in 2025... and
        it's such a stark contrast.  I guess if you split things down the
        middle, AI probably continues to change the world in dramatic ways but
        not in the all-or-nothing way people expect.
        
        A non-trivial number of people get laid off, likely due to a
        financial crisis which is used as an excuse for companies to scale up
        use of AI. Good chance the financial crisis was partly caused by AI
        companies, which ironically makes AI cheaper as infra is bought up on
        the cheap (so there is a consolidation, but the bountiful infra keeps
        things cheap). That results in increased usage (over a longer period
        of time), and even when the economy starts coming back the jobs
        numbers stay abysmal.
        
        Politics are divided into 2 main groups: those who are employed, and
        those who are retired. The retired group is VERY large, and has a lot
        of power. They mostly care about entitlements. People of working age
        focus on AI, which is making the job market quite tough. There are 3
        large political forces (but 2 parties): the Left, the Right, and the
        Tech Elite. The left and the right both hate AI, but the tech elite,
        though a minority, has outsized power in their tie-breaker role. The
        age distributions would surprise most. Most older people are now on
        the left, and most younger people are split by gender. The right
        focuses on limiting entitlements, and the left focuses on growing
        them by taxing the tech elite. The right maintains power by not
        threatening the tech elite.
        
        Unlike in the 20th century, America has a more focused global agenda.
        We're not policing everyone, just the core trading powers. We have
        not gone to war with China; China has not taken over Taiwan.
        
        Physical robotics is becoming a pretty big thing, and space travel is
        becoming cheaper. We have at least one robot on an asteroid, mining
        it. The yield is trivial, but we all thought it was neat.
        
        Energy is much, much greener, and you wouldn't have guessed it... but
        it was the data centers that got us there. The Tech Elite needed it
        quickly, and used their political connections to cut red tape and
        build really quickly.
       
          Karrot_Kream wrote 1 day ago:
          Are you in the wrong thread?
       
          1121redblackgo wrote 1 day ago:
          We do not currently have the political apparatus in place to stop the
          dystopian nightmares depicted in movies and media. They were supposed
          to be cautionary tales. Maybe they still can be, but there are
          basically zero guardrails in non-progressive forms of government to
          prevent massive accumulations of power being wielded in ways most of
          the population disapproves of.
       
            samdoesnothing wrote 1 day ago:
            That's the whole point of democracy, to prevent the ruling parties
            from doing wildly unpopular things. Unlike a dictatorship, where
            they can do anything (including good things, that otherwise
            wouldn't happen in a democracy).
            
            I know that "X is destroying democracy, vote for Y" has been a
            prevalent narrative lately, but is there any evidence that it's
            true? I get that it's death by a thousand cuts, or "one step at a
            time" as they say.
       
              xpe wrote 11 hours 40 min ago:
              > I know that "X is destroying democracy, vote for Y" has been a
              prevalent narrative lately, but is there any evidence that it's
              true? I get that it's death by a thousand cuts, or "one step at a
              time" as they say.
              
              I suggest reading [1], [2], and [3]. From there, you'll probably
              have lots of background to pose your own research questions.
              According to [4], until you write about something, your thinking
              will be incomplete, and I tend to agree nearly all of the time.
              
              [4] "Neuroscientists, psychologists and other experts on
              thinking have very different ideas about how our brains work,
              but, as Levy writes: “no matter how internal processes are
              implemented, (you) need to understand the extent to which the
              mind is reliant upon external scaffolding.” (2011, 270) If
              there is one thing the experts agree on, then it is this: You
              have to externalise your ideas, you have to write. Richard
              Feynman stresses it as much as Benjamin Franklin. If we
              write, it is more likely that we understand what we read,
              remember what we learn and that our thoughts make sense."
              - Sönke Ahrens, How to Take Smart Notes (p. 30)
              
 (HTM)        [1]: https://en.wikipedia.org/wiki/Democratic_backsliding
 (HTM)        [2]: https://hub.jhu.edu/2024/08/12/anne-applebaum-autocracy-...
 (HTM)        [3]: https://carnegieendowment.org/research/2025/08/us-democr...
       
        MBCook wrote 1 day ago:
        #272, I got a B+! Neat.
        
        It would be very interesting to see this applied year after year,
        to track whether the accuracy of people's judgments gets better or
        worse over time.
        
        It would also be interesting to correlate accuracy with scores, but
        I kind of doubt that can be done meaningfully.  Between comments
        that merely express popular sentiment, and early commenters getting
        more votes for the same comment than people who come later, it
        probably wouldn’t be very useful data.
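        
        The mechanics would be a one-liner, assuming scipy and some made-up
        (karma, grade) pairs; it's the confounders above that would make
        the number hard to interpret:
        
          from scipy.stats import spearmanr
          
          # Hypothetical data: comment karma vs. hindsight grade (4.0 scale)
          karma  = [120, 8, 45, 3, 60]
          grades = [3.7, 2.0, 3.0, 1.3, 2.3]
          
          rho, p = spearmanr(karma, grades)  # rank correlation + p-value
          print(f"Spearman rho={rho:.2f}, p={p:.2f}")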
       
          pjc50 wrote 1 day ago:
          #250, but then I wasn't trying to make predictions for a future
          AI.  Or anyone else, really.  I got a high score mostly for
          status quo bias, e.g. visual languages going nowhere and FPGAs
          remaining niche.
       
            embedding-shape wrote 1 day ago:
            Yeah, it'd be much more interesting to see the people who made
            (at the time) outrageous claims that came true, rather than a
            list of people who could state that the status quo would most
            likely stay as it is.
       
        GaggiX wrote 1 day ago:
        I think the most fun thing is to go to [1] and scroll down to the
        bottom.
        
 (HTM)  [1]: https://karpathy.ai/hncapsule/hall-of-fame.html
       
          MBCook wrote 1 day ago:
          It’s interesting: if you go down near the bottom, you see some
          people with both A’s and D’s.
          
          According to the ratings, for example, one person had extremely
          racist ideas but also made a couple of accurate points about how
          some tech concepts would evolve.
       
            brian_spiering wrote 1 day ago:
            That is interesting because of the Halo effect: the cognitive
            bias of assuming that a person who is right in one area will
            also be right in another, unrelated area.
            
            I try to temper my tendency toward the Halo effect with Warren
            Buffett's notion of the Circle of Competence: there is often
            only a very narrow domain in which any given person is
            significantly knowledgeable.
       
        exasperaited wrote 1 day ago:
        > Everything we do today might be scrutinized in great detail in the
        future because it will be "free".
        
        s/"free"/stolen/
        
        The bit about college courses for future prediction was just
        silly, I'm afraid; it reminds me of how Conan Doyle has Sherlock
        not knowing the Earth revolves around the Sun.  Almost all serious
        study concerns itself with predicting, modelling, and influencing
        the future behaviour of some system; the problem is only that
        people don't fucking listen to the predictions of experts.  They
        aren't going to value refined, academic general-purpose futurology
        any more than they have in the past; it's not even a new area of
        study.
       
        moultano wrote 1 day ago:
        Notable how this is only possible because the website is a good
        "web citizen."  It has URLs that maintain their state over a
        decade.  They contain a whole conversation.  You don't have to log
        in to see anything.  The value of old, proper websites increases
        with our ability to process them.
       
          dietr1ch wrote 1 day ago:
          > because the website is a good "web citizen." It has urls that
          maintain their state over a decade.
          
          It's a shame that maintaining the web is so hard that only a few
          websites are "good citizens".  I wish the web were a bit more
          like git: it should be easier to crawl the web and to serve it.
          
          Say, you browse and get things cached and shared, but only your
          "local bookmarks" persist. I guess it's like pinning in IPFS.
       
            drdec wrote 1 day ago:
            > It's a shame that maintaining the web is so hard that only a few
            websites are "good citizens"
            
            It's not hard actually.  There is a lack of will and forethought on
            the part of most maintainers.  I suspect that monetization also
            plays a role.
       
            DANmode wrote 1 day ago:
            Let Reddit and friends continue to out themselves for who they are.
            
            Keeps the spotlight on carefully protected communities like this
            one.
       
            moultano wrote 1 day ago:
            Yes, I wish we could serve static content more like
            BitTorrent, where your URI has an associated hash, and any
            intermediate router or cache could be an equivalent source of
            truth, with the origin server only needing to play a role if
            nothing else has it.
            
            It is not possible right now to make hosting
            democratized/distributed/robust, because there's no seamless
            way for people to donate their own resources to keeping things
            published.  In an ideal world, the Internet Archive would drop
            in to serve any content that goes down, in a fashion
            transparent to the user.
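            
            A minimal sketch of the idea in Python, with made-up mirror
            URLs: the URI is just a hash of the bytes, so any cache, peer,
            or archive can serve them, and the client verifies whatever it
            receives.
            
              import hashlib
              import urllib.request
              
              def content_uri(data: bytes) -> str:
                  # Name content by its digest, not by its location.
                  return "sha256-" + hashlib.sha256(data).hexdigest()
              
              def fetch(uri: str, mirrors: list[str]) -> bytes:
                  # Any mirror may answer; the hash proves the response
                  # is authentic no matter who served it.
                  for base in mirrors:
                      try:
                          with urllib.request.urlopen(f"{base}/{uri}") as r:
                              data = r.read()
                      except OSError:
                          continue
                      if content_uri(data) == uri:
                          return data
                  raise LookupError(f"no mirror could serve {uri}")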
       
              oncallthrow wrote 1 day ago:
              This is IPFS
       
                shpx wrote 1 day ago:
                In my experience, from the couple of times I clicked an
                IPFS link years ago, it loaded for a long time and never
                actually delivered anything, failing the "serve static
                content" part right away.
                
                If you make it possible for people to donate bandwidth,
                you might just discover that no one wants to.
       
                  dietr1ch wrote 9 hours 52 min ago:
                  I think many people could toss an almost permanently
                  online Raspberry Pi into their homes, and that's
                  probably enough to sustain a decently good distributed
                  CAS network for sharing small text files.
                  
                  The wanting to is, to my mind, the harder part.  How do
                  you convince people that having the network is valuable
                  enough?  It's easy to compare it unfavourably with a web
                  backed by a few fiefdoms that offer, for the most part,
                  really good performance, availability, and somewhat good
                  discovery.
       
          jeffbee wrote 1 day ago:
          There are things that you have to log in to see, and the mods
          sometimes move conversations from one place to another, and also, for
          some reason, whole conversations get reset to a single timestamp.
       
            embedding-shape wrote 1 day ago:
            > and the mods sometimes move conversations from one place to
            another
            
            This only manipulates the children references, though, never
            the item ID itself.  So if you have the ID of an item
            (submission, comment, poll, pollItem), it'll remain available
            at that ID as long as moderators don't remove it, which happens
            very seldom.
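            
            For what it's worth, the public HN Firebase API works the same
            way: items are addressed purely by ID.  A tiny sketch, using
            the "Introducing OpenAI" submission ID cited elsewhere in this
            thread:
            
              import json
              import urllib.request
              
              def hn_item(item_id: int) -> dict:
                  # The official API serves an item by ID, regardless of
                  # where moderators may have moved its parent thread.
                  url = ("https://hacker-news.firebaseio.com"
                         f"/v0/item/{item_id}.json")
                  with urllib.request.urlopen(url) as resp:
                      return json.load(resp)
              
              item = hn_item(10720176)  # "Introducing OpenAI" (2015)
              print(item["type"], item["title"], item["time"])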
       
            latexr wrote 1 day ago:
            > for some reason, whole conversations get reset to a single
            timestamp.
            
            What do you mean?
       
              embedding-shape wrote 1 day ago:
              Submissions put in the second-chance pool briefly appear
              (sometimes "again") on the frontpage, and the conversation
              timestamps are reset so it looks as if the comments were
              written after the second-chance submission, not before.
       
                Y_Y wrote 1 day ago:
                I never noticed that. What a weird lie!
                
                I suppose they want to make the comments seem "fresh" but it's
                a deliberate misrepresentation. You could probably even
                contrive a situation where it could be damaging, e.g. somebody
                says something before some relevant incident, but the website
                claims they said it afterwards.
       
                  embedding-shape wrote 1 day ago:
                  I think the reason is much simpler than that.  Resetting
                  the timestamp lets them easily resurface things on the
                  frontpage, because the (current time - posting time)
                  delta becomes a lot smaller, so the item ranks higher
                  again.  And avoiding a special case lets the rest of the
                  codebase work exactly as before: you basically just add
                  a "set submission time to now" function and get the rest
                  for free.
                  
                  But I'm just guessing here based on my own refactoring
                  experience through the years; it may be a completely
                  different reason, or even a mistake.  Who knows? :)
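                  
                  A toy illustration, using the widely cited approximation
                  of HN's gravity formula (not the real implementation):
                  
                    def rank(points: int, age_hours: float) -> float:
                        # Often-quoted approximation of HN's ranking.
                        return (points - 1) / (age_hours + 2) ** 1.8
                    
                    print(rank(100, 30.0))  # day-old story: ~0.19
                    print(rank(100, 0.5))   # timestamp reset: ~19.0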
       
              jeffbee wrote 1 day ago:
              There is some action that moderators can take that throws one of
              yesterday's articles back on the front page and when that happens
              all the comments have the same timestamp.
       
                consumer451 wrote 1 day ago:
                I believe that this is called "the second chance pool." It is a
                bit strange when it unexpectedly happens to one's own post.
       
          chrisweekly wrote 1 day ago:
          Yes! See "Cool URIs Don't Change"^1 by Sir TBL himself.
          
 (HTM)    [1]: https://www.w3.org/Provider/Style/URI
       
        jasonthorsness wrote 1 day ago:
        It's fun to read some of these historic comments! A while back I
        wrote a replay system to better capture how the discussions evolved
        in real time.  Here's Karpathy's list of graded articles, in the
        replay visualizer:
        
        Swift is Open Source [1]
        Launch of Figma, a collaborative interface design tool [2]
        Introducing OpenAI [3]
        The first person to hack the iPhone is building a self-driving car [4]
        SpaceX launch webcast: Orbcomm-2 Mission [video] [5]
        At Theranos, Many Strategies and Snags [6]
        
 (HTM)  [1]: https://hn.unlurker.com/replay?item=10669891
 (HTM)  [2]: https://hn.unlurker.com/replay?item=10685407
 (HTM)  [3]: https://hn.unlurker.com/replay?item=10720176
 (HTM)  [4]: https://hn.unlurker.com/replay?item=10744206
 (HTM)  [5]: https://hn.unlurker.com/replay?item=10774865
 (HTM)  [6]: https://hn.unlurker.com/replay?item=10799261
       
          matsemann wrote 20 hours 33 min ago:
          I like the "past" functionality here, though I wish there were
          week/month views I could scroll back through as well.
          
          I miss that for Reddit too: top of day/week/month/all-time makes
          it hard to find the top posts from, say, a month in 2018.
       
          arowthway wrote 20 hours 35 min ago:
          Comment dates on the HN frontend are sometimes altered when
          submissions are merged; do you handle this case properly?
       
            jasonthorsness wrote 15 hours 55 min ago:
            It is handled on the Unlurker front page (you will see a
            little note that says “time adjusted for second chance”).
            The replay doesn’t do any adjustment for it, but I think that
            makes it reflect the reality of when the comments came in,
            since the adjustments are just a temporary bump.
       
          SauntSolaire wrote 1 day ago:
          I'd love to see sentiment analysis done based on time of day.
          I'm sure it's largely time-zone differences, but I see a large
          variance in the types of opinions posted to HN in the morning
          versus the evening, and I'd be curious to see it quantified.
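          
          The bucketing itself would be straightforward; a sketch with a
          toy stand-in scorer (you'd swap in a real sentiment model):
          
            from collections import defaultdict
            from datetime import datetime, timezone
            
            def sentiment(text: str) -> float:
                # Toy stand-in; replace with VADER, an LLM judge, etc.
                words = text.lower()
                pos = sum(w in words for w in ("great", "love", "agree"))
                neg = sum(w in words for w in ("bad", "hate", "wrong"))
                return float(pos - neg)
            
            def mean_sentiment_by_hour(comments):
                # comments: iterable of (unix_time, text) pairs
                buckets = defaultdict(list)
                for ts, text in comments:
                    hour = datetime.fromtimestamp(ts, timezone.utc).hour
                    buckets[hour].append(sentiment(text))
                return {h: sum(v) / len(v)
                        for h, v in sorted(buckets.items())}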
       
            red-iron-pine wrote 12 hours 12 min ago:
            e.g. how many are Cali tech bros vs NYC fintech vs 10am Moscow
            shillbot time
       
            embedding-shape wrote 1 day ago:
            Yeah, I see this constantly any time Europe is mentioned in a
            submission.  In the early European morning/day, regular
            discussions; but as the European afternoon/evening comes
            around, you start noticing a lot of anti-union sentiment,
            discussions shift toward over-regulation, and the typical
            boring anti-Europe/EU talking points come out.
       
              nostrebored wrote 1 day ago:
              “Regular” to whom?  Pro-EU sentiment almost only comes
              from the EU, which is what you’re observing.  Pro-US
              sentiment is relatively mixed in distribution (as is anti-US
              sentiment).
       
                gilrain wrote 17 hours 10 min ago:
                > Pro EU sentiment almost only comes from the EU
                
                Says who? But also, it doesn’t suggest what you imply. I
                could as easily conclude: “Oh wow, the people who actually
                experience the system like it that much? Awesome!”
       
          HanClinto wrote 1 day ago:
          Okay, your site is a ton of fun. Thank you! :)
       
        siliconc0w wrote 1 day ago:
        Random bets for 2035:
        
        * Nvidia GPUs will see heavy competition, with most chat-like
        use-cases switching to cheaper models and inference-specific
        silicon, but they will still be used at the high end for critical
        applications and frontier science.
        
        * Most software and UIs will be primarily AI-generated.  There
        will be no "App Stores" as we know them.
        
        * ICE cars will become niche, largely replaced by EVs; solar will
        be widely deployed and will be the dominant source of power.
        
        * Climate change will be widely recognized due to escalating
        consequences, and there will be many mitigation efforts (e.g.,
        climate engineering, climate-resistant crops).
       
          rafaelmn wrote 1 day ago:
          I'd take the other side on most of these.  The Nvidia one is too
          vague (some could argue it's already seeing "heavy competition"
          from Google and other players in the space), so something more
          concrete: I doubt they will fall below 50% market share.
       
          pu_pe wrote 1 day ago:
          The infamous Dropbox comment might turn out to be right in 10 more
          years, when LLMs might just build an entire application from scratch
          for you.
       
          xattt wrote 1 day ago:
          You’re about 20 days short or 345 days late for this HN tradition.
          ;)
       
        gen6acd60af wrote 1 day ago:
        Commenters of HN:
        
        Your past thoughts have been dredged up and judged.
        
        For each $TOPIC, you have been awarded a grade by GPT-5.1 Thinking.
        
        Your grade is based on OpenAI's aligned worldview and what OpenAI's
        blob of weights considers Truth in 2025.
        
        Did you think well, netizen?
        
        Are you an Alpha or a Delta-Minus?
        
        Where will the dragnet grading of your online history happen next?
       
        bediger4000 wrote 1 day ago:
        LLMs are watching (or humans using them might be). Best to be good.
        
        Shades of Roko's Basilisk!
       
          ambicapter wrote 1 day ago:
          More like a Panopticon. As the parenthesis notes, this is just as bad
          when humans are the final link in the eyeball chain.
       
       
 (DIR) <- back to front page