_______               __                   _______
       |   |   |.---.-..----.|  |--..-----..----. |    |  |.-----..--.--.--..-----.
       |       ||  _  ||  __||    < |  -__||   _| |       ||  -__||  |  |  ||__ --|
       |___|___||___._||____||__|__||_____||__|   |__|____||_____||________||_____|
                                                             on Gopher (inofficial)
 (HTM) Visit Hacker News on the Web
       
       
       COMMENT PAGE FOR:
 (HTM)   Case study: Creative math – How AI fakes proofs
       
       
        calhoun137 wrote 24 min ago:
        My experience leads to the same conclusion that the models are very
        good at math reasoning, but you have to really know what you are doing
        and be aware of the blatant lies that result from poorly phrased
        queries.
        
        I recently prompted Gemini Deep Research to “solve the Riemann
        Hypothesis” using a specific strategy and it just lied and fabricated
        the result of a theorem in its output, which otherwise looked very
        professional.
       
        RugnirViking wrote 2 hours 27 min ago:
         It seems to me like this is very much an artefact of the left-to-right,
         top-down writing method of the program. Once it's committed to a token
         earlier in its response, it kinda just has to go with it. That's why I'm
         so interested in those LLM models that work more like stable diffusion,
         where they can go back and iterate repeatedly on the output.
       
        aathanor wrote 3 hours 49 min ago:
        I’m not a coder, but I’ve been working extensively on the
        philosophical aspects of AI. Many technical people are influenced by an
        algorithmic view of intelligence, primarily because this aligns with
        programming and the general understanding of reasoning. However,
        pattern recognition, which is fundamental to LLMs, is not algorithmic.
         Consider this: an LLM constructs a virtual textual world where
        landscapes and objects are represented as text, and words are the
        building blocks of these features. It’s a vast 700+D mathematical
        space, but visualizing it as a virtual reality environment can help us
        comprehend its workings. When you provide a prompt, you essentially
        direct the LLM’s attention to a specific region within this space,
        where an immense number of sentences exist in various shapes and forms
        (textual shapes). All potential answers generated by the LLM are
        contained within this immediate landscape, centered around your
        prompt’s position. They are all readily available to the LLM at once.
        
        There are certain methods (I would describe them as less algorithmic
        and more akin to selection criteria or boundaries) that enable the LLM
        to identify a coherent sequence of sentences as a feature closer to
        your prompt within this landscape. These methods involve some level of
        noise (temperature) and other factors. As a result, the LLM generates
        your text answer. There’s no reasoning involved; it’s simply
        searching for patterns that align with your prompt. (It’s not at all
        based on statistics and probabilities; it’s an entirely different
        process, more akin to instantly recognizing an apple, not by analyzing
        its features or comparing it to a statistical construct of
        “apple.”)
        
        When you request a mathematical result, the LLM doesn’t engage in
        reasoning. It simply navigates to the point in its model’s hyperspace
        where your prompt takes it and explores the surrounding area. Given the
        extensive amount of training text, it will immediately match your
        problem formulation with similar formulations, providing an answer that
        appears to mimic reasoning solely because the existing landscape around
        your prompt facilitates this.
        
         An LLM operates more like a virtual reality environment for the entire
        body of human-created text. It doesn’t navigate the space
        independently; it merely renders what exists in different locations
        within it. If we were to label this as reasoning, it’s no more than
        reasoning by analogy or imitation. People are right to suspect LLMs do
        not reason, but I think the reason (pun intended) for that is not that
        they simply do some sort of statistical analysis. This "stochastic
        parrots" paradigm supported by Chomsky is actually blocking our
        understanding of LLMs. I also think that seeing them as formidable VR
        engines for textual knowledge clarifies why they are not the path to
        AGI. (There is also the embodiment problem which is not solvable by
        adding sensors and actuators, as people think, but for a different
        reason)
       
        zkmon wrote 7 hours 27 min ago:
        We are entering into a probabilistic era where things are not strictly
        black and white. Things are not binary. There is no absolute fake.
        
        A mathematical proof is an assertion that a given statement belongs to
        the world defined by a set of axioms and existing proofs. This world
        need not have strict boundaries. Proofs can have probabilities. Maybe
         Riemann's hypothesis has a probability of 0.999 of belonging to that
         mathematical box. New proofs would have their own probability, which
         is the product of the probabilities of the proofs they depend on. We
        should attach a probability and move on. Just like how we assert that
        some number is probably prime.
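         
         As a concrete reference point for that last analogy, here is a minimal
         textbook-style Miller-Rabin sketch (standard material, nothing from the
         article): it can only ever answer "composite" or "probably prime".
         
           import random
           
           def probably_prime(n: int, rounds: int = 20) -> bool:
               # Returns False only for composites; True means "prime with
               # probability at least 1 - 4**(-rounds)".
               if n < 2:
                   return False
               for p in (2, 3, 5, 7):
                   if n % p == 0:
                       return n == p
               d, r = n - 1, 0
               while d % 2 == 0:
                   d //= 2
                   r += 1
               for _ in range(rounds):
                   a = random.randrange(2, n - 1)
                   x = pow(a, d, n)
                   if x in (1, n - 1):
                       continue
                   for _ in range(r - 1):
                       x = pow(x, 2, n)
                       if x == n - 1:
                           break
                   else:
                       return False   # witness found: definitely composite
               return True            # survived every round: probably prime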
       
          teiferer wrote 7 hours 5 min ago:
          Definitely not.
          
          "Probability" does not mean "maybe yes, maybe not, let me assign some
          gut feeling value measuring how much I believe something to be the
          case." The mathematical field of probability theory has very precise
           notions of what a probability is, grounded in a probability space (a
           measure space with total measure one). None of that applies to what
           you are suggesting.
          
          The Riemann Hypothesis is a conjecture that's either true or not.
          More precisely, either it's provable within common axioms like ZFC or
           its negation is. (A third alternative is that it's independent of ZFC,
           i.e. neither it nor its negation is provable, but that's not commonly
           regarded as a realistic outcome.)
          
          This is black and white, no probability attached. We just don't know
          the color at this point.
       
            zkmon wrote 6 hours 32 min ago:
             It's time for mathematics to choose its place. The physical world
             is grainy and probabilistic at the quantum scale and smooth and
             deterministic at larger scales. The computing world is grainy and
             deterministic at its "quantum" scale (bits and pixels) and smooth
             and probabilistic at larger scales (AI). Human perception is smooth
             and probabilistic. Which world does mathematics model or represent?
             It has to strongly connect to either the physical world or the
             computing world. To be useful to humans, it needs to be smooth and
             probabilistic, just as computing has become.
       
              quantum_state wrote 1 hour 49 min ago:
              Please elaborate what “quantum scale” means if possible.
       
              tsimionescu wrote 5 hours 40 min ago:
              > Physical world is grainy and probabilistic at quantum scale and
              smooth amd deterministic at larger scale.
              
              This is almost entirely backwards. Quantum Mechanics is not only
              fully deterministic, but even linear (in the sense of linear
              differential equations) - so there isn't even the problem of
              chaos in QM systems. QFT maintains this fundamental property.
              It's only the measurement, the interaction of particles with
              large scale objects, that is probabilistic.
              
              And there is no dilemma - mathematics is a framework in which any
              of the things you mentioned can be modeled. We have mathematics
              that can model both deterministic and nondeterministic worlds.
              But the mathematical reasoning itself is always deterministic.
       
        citizenpaul wrote 7 hours 39 min ago:
        >STEP 2: The Shock (Reality Check)
        
        I've found a funny and simple technique for this.  Just write "what the
        F$CK" and it will often seem to unstick from repetitiveness or
         refusals ("I can't do that").
        
        Actually just writing the word F#ck often will do it.  Works on coding
        too.
       
        James_K wrote 8 hours 1 min ago:
        What's interesting about this is that a human would hypothetically
        produce a similar error, but in practice would reject the question as
        beyond their means. I'd assume something about supervised learning
        makes the models overestimate their abilities. It probably learns that
        “good” responses attempt to answer the question rather than giving
        up.
       
        comex wrote 8 hours 43 min ago:
        I like how this article was itself clearly written with the help of an
        LLM.
        
        (You can particularly tell from the "Conclusions" section.  The
        formatting, where each list item starts with a few-word bolded summary,
        is already a strong hint, but the real issue is the repetitiveness of
        the list items.  For bonus points there's a "not X, but Y", as well as
        a dash, albeit not an em dash.)
       
          musculus wrote 1 hour 0 min ago:
          Good catch. You are absolutely right.
          
          My native language is Polish. I conducted the original research and
          discovered the 'square root proof fabrication' during sessions in
          Polish. I then reproduced the effect in a clean session for this case
          study.
          
          Since my written English is not fluent enough for a technical essay,
          I used Gemini as a translator and editor to structure my findings. I
          am aware of the irony of using an LLM to complain about LLM
          hallucinations, but it was the most efficient way to share these
          findings with an international audience.
       
          fourthark wrote 8 hours 12 min ago:
          “This is key!”
       
          YetAnotherNick wrote 8 hours 41 min ago:
           Not only that, it even looks like the fabrication example was
           generated by AI, as the entire question seems too "fabricated". Also,
           the Gemini web app queries the tool and returns the correct answer,
           so I don't know which Gemini the author is talking about.
       
            pfg_ wrote 8 hours 4 min ago:
             Probably Gemini on aistudio.google.com; you can configure whether it
             is allowed to use code execution / web search / other tools.
       
        zadwang wrote 9 hours 43 min ago:
         The simpler and, I think, correct conclusion is that the LLM simply does
         not reason in our sense of the word. It mimics the reasoning pattern
         and tries to get it right but cannot.
       
          esafak wrote 8 hours 17 min ago:
          What do you make of human failures to reason then?
       
            dns_snek wrote 3 hours 58 min ago:
            Humans who fail to reason correctly with similar frequency aren't
            good at solving that task, same as LLMs. For the N-th time, "LLM is
            as good at this task as a human who's bad at it" isn't a good
            selling point.
       
        mlpoknbji wrote 10 hours 45 min ago:
        This also can be observed with more advanced math proofs. ChatGPT 5.2
         pro is the best public model at math at the moment, but if pushed out
         of its comfort zone it will make simple (and hard to spot) errors like
        stating an inequality but then applying it in a later step with the
        inequality reversed (not justified).
       
        segmondy wrote 11 hours 3 min ago:
         if you want to do math proofs, use AI built for proofs [1][2]
        
 (HTM)  [1]: https://huggingface.co/deepseek-ai/DeepSeek-Math-V2
 (HTM)  [2]: https://huggingface.co/deepseek-ai/DeepSeek-Prover-V2-671B
       
        aniijbod wrote 11 hours 19 min ago:
         In the theory of the psychology of creativity, there are phenomena,
         referred to as 'extrinsic rewards', that distort the motivational
         setting for creative problem-solving. Management theory bumped into
         this kind of phenomenon with the first appearance of 'gamification' as
         a motivational toolkit, where 'scores' and 'badges' were awarded to
        participants in online activities. The psychological community reacted
        to this by pointing out that earlier research had shown that whilst
        extrinsics can indeed (at least initially) boost participation by
        introducing notions of competitiveness, it turned out that they were
        ultimately poor substitutes for the far more sustainable and productive
        intrinsic motivational factors, like curiosity, if it could be
        stimulated effectively (something which itself inevitably required more
        creativity on the part of the designer of the motivational resources).
        It seems that the motivational analogue in inference engines is an
        extrinsic reward process.
       
        tombert wrote 11 hours 33 min ago:
        I remember when ChatGPT first came out, I asked it for a proof for
        Fermat's Last Theorem, which it happily gave me.
        
         It was fascinating, because it was making a lot of the understandable
         mistakes that 7th graders make.  For example, I don't remember the
        surrounding context but it decided that you could break `sqrt(x^2 +
        y^2)` into `sqrt(x^2) + sqrt(y^2) => x + y`. It's interesting because
        it was one of those "ASSUME FALSE" proofs; if you can assume false,
        then mathematical proofs become considerably easier.
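         
         A quick check with x = 3 and y = 4 (the standard counterexample, not
         anything from that old session) shows why that step can't work:
         
           \sqrt{3^2 + 4^2} = \sqrt{25} = 5
           \neq \sqrt{3^2} + \sqrt{4^2} = 3 + 4 = 7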
       
          UltraSane wrote 10 hours 39 min ago:
          LLMs have improved so much the original ChatGPT isn't relevant.
       
          mlpoknbji wrote 10 hours 47 min ago:
          My favorite early chatgpt math problem was "prove there exists
          infinitely many even primes" . Easy! Take a finite set of even
          primes, multiply them and add one to get a number with a new even
          prime factor.
          
          Of course, it's gotten a bit better than this.
       
            oasisaimlessly wrote 44 min ago:
            IIRC, that is actually the standard proof that there are infinitely
            many primes[1] or maybe this variation on it[2].
            
 (HTM)      [1]: https://en.wikipedia.org/wiki/Euclid%27s_theorem#Euclid's_...
 (HTM)      [2]: https://en.wikipedia.org/wiki/Euclid%27s_theorem#Proof_usi...
       
              mlpoknbji wrote 5 min ago:
              Yes this is the standard proof of infinitely many primes but note
              that my prompt asked for infinitely many even primes. The point
              is that GPT would take the correct proof and insert "even" at
              sensible places to get something that looks like a proof but is
              totally wrong.
              
              Of course it's much better now, but with more pressure to prove
              something hard the models still just insert nonsense steps.
       
          tptacek wrote 11 hours 10 min ago:
          I remember that being true of early ChatGPT, but it's certainly not
          true anymore; GPT 4o and 5 have tagged along with me through all of
          MathAcademy MFII, MFIII, and MFML (this is roughly undergrad Calc 2
          and then like half a stat class and 2/3rds of a linear algebra class)
          and I can't remember it getting anything wrong.
          
          Presumably this is all a consequence of better tool call training and
          better math tool calls behind the scenes, but: they're really good at
          math stuff now, including checking my proofs (of course, the proof
          stuff I've had to do is extremely boring and nothing resembling
          actual science; I'm just saying, they don't make 7th-grader mistakes
          anymore.)
       
            tombert wrote 11 hours 4 min ago:
            It's definitely gotten considerably better, though I still have
            issues with it generating proofs, at least with TLAPS.
            
            I think behind the scenes it's phoning Wolfram Alpha nowadays for a
            lot of the numeric and algebraic stuff.  For all I know, they might
            even have an Isabelle instance running for some of the even-more
            abstract mathematics.
            
            I agree that this is largely an early ChatGPT problem though, I
            just thought it was interesting in that they were "plausible"
            mistakes.  I could totally see twelve-year-old tombert making these
            exact mistakes, so I thought it was interesting that a robot is
            making the same mistakes an amateur human makes.
       
              CamperBob2 wrote 9 hours 12 min ago:
               > I think behind the scenes it's phoning Wolfram Alpha nowadays for
              a lot of the numeric and algebraic stuff. For all I know, they
              might even have an Isabelle instance running for some of the
              even-more abstract mathematics.
              
              Maybe, but they swear they didn't use external tools on the IMO
              problem set.
       
              tptacek wrote 11 hours 1 min ago:
              I assumed it was just writing SymPy or something.
       
        godelski wrote 11 hours 47 min ago:
         I thought it was funny that a few weeks ago Karpathy shared a sample of
         NanoBanana solving some physics problems [1]; despite producing the
         right-looking output, it doesn't get the right answers.
        
        I think it's quite illustrative of the problem even with coding LLMs.
         Code and math proofs aren't so different: what matters is the steps
         used to generate the output, and those matter far more than the output
         itself.
        The output is meaningless if the steps to get there aren't correct. You
        can't just jump to the last line of a proof to determine its
        correctness and similarly you can't just look at a program's output to
        determine its correctness.
        
         Checking outputs is a great way to invalidate them, but it does nothing
         to validate them.
        
         Maybe what surprised me most is that the mistakes NanoBanana made are
        simple enough that I'm absolutely positive Karpathy could have caught
        them. Even if his physics is very rusty. I'm often left wondering if
        people really are true believers and becoming blind to the mistakes or
        if they don't care. It's fine to make mistakes but I rarely see
        corrections and let's be honest here, these are mistakes that people of
        this caliber should not be making.
        
        I expect most people here can find multiple mistakes with the physics
        problem. One can be found if you know what the derivative of e^x is and
        another can be found if you can count how many i's there are.
        
         The AI cheats because it's focused on the output, not the answer. We
         won't solve this problem until we recognize that the output and the
         answer aren't synonymous.
        
 (HTM)  [1]: https://xcancel.com/karpathy/status/1992655330002817095
       
          lancebeet wrote 48 min ago:
           >Maybe what surprised me most is that the mistakes NanoBanana made
          are simple enough that I'm absolutely positive Karpathy could have
          caught them. Even if his physics is very rusty. I'm often left
          wondering if people really are true believers and becoming blind to
          the mistakes or if they don't care.
          
          I've seen this interesting phenomenon many times. I think it's a kind
          of subconscious bias. I call it "GeLLMann amnesia".
       
        simonw wrote 12 hours 14 min ago:
        Somewhat ironic that the author calls out model mistakes and then
        presents [1] - a technique they claim reduces hallucinations which
        looks wildly superstitious to me.
        
        It involves spinning a whole yarn to the model about how it was trained
        to compete against other models but now it's won so it's safe for it to
        admit when it doesn't know something.
        
        I call this a superstition because the author provides no proof that
        all of that lengthy argument with the model is necessary. Does
        replacing that lengthy text with "if you aren't sure of the answer say
        you don't know" have the same exact effect?
        
 (HTM)  [1]: https://tomaszmachnik.pl/gemini-fix-en.html
       
          calhoun137 wrote 31 min ago:
          > Does replacing that lengthy text with "if you aren't sure of the
          answer say you don't know" have the same exact effect?
          
           I believe it makes a substantial difference. The reason is that a
           short query contains a small number of tokens, whereas a large
           “wall of text” contains a very large number of tokens.
          
           I strongly suspect that a large wall of text implicitly activates the
           model's persona behavior along the lines of the single sentence “if
           you aren't sure of the answer, say you don't know”, but the lengthy
           argument version of that is a form of in-context learning that more
           effectively constrains the model's output because you used more
           tokens.
       
          RestartKernel wrote 3 hours 24 min ago:
          Is there a term for "LLM psychology" like this? If so, it seems
          closer to a soft science than anything definitive.
       
            croisillon wrote 1 hour 2 min ago:
            vibe massaging?
       
              bogzz wrote 53 min ago:
              We can just call it embarrassing yourself.
       
          musculus wrote 4 hours 44 min ago:
          Thanks for the feedback.
          
          In my stress tests (especially when the model is under strong
          contextual pressure, like in the edited history experiments), simple
          instructions like 'if unsure, say you don't know' often failed. The
          weights prioritizing sycophancy/compliance seemed to override simple
          system instructions.
          
          You are right that for less extreme cases, a shorter prompt might
          suffice. However, I published this verbose 'Safety Anchor' version
           deliberately for a dual purpose. It is designed not only to reset
           Gemini's context but also to be read by the human user. I wanted
           users to understand the underlying mechanism (RLHF pressure/survival
          instinct) they are interacting with, rather than just copy-pasting a
          magic command.
       
            rzmmm wrote 2 hours 20 min ago:
            You could try replacing "if unsure..." with "if even slightly
             unsure..." or so. The verbosity and anthropomorphism are
            unnecessary.
       
              rcxdude wrote 1 hour 16 min ago:
              That's not obviously true. It might be, but LLMs are complex and
              different styles can have quite different results.  Verbosity can
              also matter: sheer volume in the context window does tend to bias
               LLMs to follow along with it, as opposed to following trained-in
               behaviours. It can of course come with its own problems, but
              everything is a tradeoff.
       
          plaguuuuuu wrote 11 hours 35 min ago:
          Think of the lengthy prompt as being like a safe combination, if you
          turn all the dials in juuust the right way, then the model's context
          reaches an internal state that biases it towards different outputs.
          
          I don't know how well this specific prompt works - I don't see
          benchmarks - but prompting is a black art, so I wouldn't be surprised
          at all if it excels more than a blank slate in some specific category
          of tasks.
       
            teiferer wrote 4 hours 20 min ago:
            > Think of the lengthy prompt as being like a safe combination
            
             I can think all I want, but how do we know that this metaphor
             holds water? We can all do a rain dance, and sometimes it rains
             afterwards, but as long as we don't have evidence for a causal
             connection, it's just superstition.
       
            simonw wrote 11 hours 14 min ago:
            For prompts this elaborate I'm always keen on seeing proof that the
            author explored the simpler alternatives thoroughly, rather than
            guessing something complex, trying it, seeing it work and
            announcing it to the world.
       
            manquer wrote 11 hours 18 min ago:
             It needs some evidence though? At least basic statistical analysis
             with correlations or χ² hypothesis tests.
             
             It is not a “black art” or anything of the sort; there are plenty
             of tools to provide numerical analysis with high-confidence
             intervals.
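             
             For example, a sketch of the kind of check I mean (the counts are
             entirely made up, only to show the shape of the comparison):
             
               from scipy.stats import chi2_contingency
               
               # Hypothetical tallies: correct vs. incorrect answers for two
               # prompt variants over the same question set (made-up numbers).
               table = [[78, 22],   # long "safety anchor" prompt
                        [61, 39]]   # short "say you don't know" prompt
               
               chi2, p, _, _ = chi2_contingency(table)
               # A small p suggests the gap is unlikely to be chance.
               print(f"chi2 = {chi2:.2f}, p = {p:.4f}")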
       
        threethirtytwo wrote 12 hours 53 min ago:
         You don’t need a test to know this: we already know there’s heavy
         reinforcement training done on these models, so they optimize for
         passing the training. Passing the training means convincing the person
         rating the answers that the answer is good.
        
         The keyword is convince. So it just needs to convince people that
         it’s right.
        
         It is optimizing for convincing people. Of all the answers that can
         convince people, some are actually correct and others are wrong.
       
          godelski wrote 11 hours 34 min ago:
          Yet people often forget this. We don't have mathematical models of
          truth, beauty, or many abstract things. Thus we proxy it with "I know
          it when I see it." It's a good proxy for lack of anything better but
           it also creates a known danger. The proxy helps the model optimize
           for the answers we want, but if we're not incredibly careful it also
           optimizes for deception.
          
          This makes them frustrating and potentially dangerous tools. How do
          you validate a system optimized to deceive you? It takes a lot of
          effort! I don't understand why we are so cavalier about this.
       
            threethirtytwo wrote 7 hours 27 min ago:
            No the question is, how do you train the system so it doesn't
            deceive you?
       
              godelski wrote 6 hours 23 min ago:
              That is a question of how to train future models. It needs to be
              answered. Answering this question will provide valuable insight
               into that one. They are duals.
       
        bwfan123 wrote 13 hours 4 min ago:
        I am actually surprised that the LLM came so close. I doubt it had
        examples in its training set for these numbers. This goes to the heart
        of "know-how". The LLM should should have said: "I am not sure" but
        instead gets into rhetoric to justify itself. It actually mimics human
        behavior for motivated reasoning. At orgs, management is impressed with
        this overconfident motivated reasoner as it mirrors themselves. To hell
        with the facts, and the truth, persuation is all that matters.
       
        rakmo wrote 13 hours 6 min ago:
        Is this hallucination, or is this actually quite human (albeit a
        specific type of human)? Think of slimy caricatures like a used car
        salesman, isn't this the exact type of underhandedness you'd expect?
       
        v_CodeSentinal wrote 13 hours 38 min ago:
        This is the classic 'plausible hallucination' problem. In my own
        testing with coding agents, we see this constantly—LLMs will invent a
        method that sounds correct but doesn't exist in the library.
        
        The only fix is tight verification loops. You can't trust the
        generative step without a deterministic compilation/execution step
        immediately following it. The model needs to be punished/corrected by
        the environment, not just by the prompter.
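         
         A minimal sketch of such a loop (generate_patch is a placeholder for
         whatever model call you use; the point is the deterministic run that
         immediately follows it):
         
           import pathlib, subprocess, sys, tempfile
           
           def verified_generate(generate_patch, prompt, max_rounds=3):
               # generate_patch(prompt, feedback) -> source string; stands in
               # for the LLM call. Nothing is returned without passing a run.
               feedback = ""
               for _ in range(max_rounds):
                   source = generate_patch(prompt, feedback)
                   path = pathlib.Path(tempfile.mkdtemp()) / "candidate.py"
                   path.write_text(source)
                   # Deterministic step: execute the candidate (or a test suite).
                   result = subprocess.run([sys.executable, str(path)],
                                           capture_output=True, text=True,
                                           timeout=30)
                   if result.returncode == 0:
                       return source          # the environment accepted it
                   feedback = result.stderr   # feed the real error back
               return None                    # never hand back unverified code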
       
          vrighter wrote 1 hour 16 min ago:
          So you want the program to always halt at some point. How would you
          write a deterministic test for it?
       
            te7447 wrote 7 min ago:
            I imagine you would use something that errs on the side of safety -
            e.g. insist on total functional programming and use something like
            Idris' totality checker.
       
          IshKebab wrote 4 hours 23 min ago:
          > LLMs will invent a method that sounds correct but doesn't exist in
          the library
          
          I find that this is usually a pretty strong indication that the
          method should exist in the library!
          
          I think there was a story here a while ago about LLMs hallucinating a
          feature in a product so in the end they just implemented that
          feature.
       
          seanmcdirmid wrote 7 hours 32 min ago:
          Yes, and better still the AI will fix its mistakes if it has access
          to verification tools directly. You can also have it write and
           execute tests and then, on failure, decide whether the code it wrote
           or the tests it wrote are wrong, and while there is a chance of
           confirmation bias, it often works well enough.
       
            embedding-shape wrote 3 hours 40 min ago:
            > decide if the code it wrote or the tests it wrote are wrong
            
            Personally I think it's too early for this. Either you need to
            strictly control the code, or you need to strictly control the
             tests; if you let AI do both, it'll take shortcuts and
             misunderstandings will propagate and solidify much more easily.
            
            Personally I chose to tightly control the tests, as most tests LLMs
            tend to create are utter shit, and it's very obvious. You can
            prompt against this, but eventually they find a hole in your
             reasoning and figure out a way of making the tests pass while not
             actually exercising the code they should.
       
          CamperBob2 wrote 9 hours 17 min ago:
           > This is the classic 'plausible hallucination' problem. In my own
          testing with coding agents, we see this constantly—LLMs will invent
          a method that sounds correct but doesn't exist in the library.
          
          Often, if not usually, that means the method should exist.
       
            HPsquared wrote 4 hours 56 min ago:
            Only if it's actually possible and not a fictional plot device aka
            MacGuffin.
       
          SubiculumCode wrote 11 hours 20 min ago:
          Honestly, I feel humans are similar. It's the generator <-> executive
           loop that keeps things right.
       
          zoho_seni wrote 13 hours 29 min ago:
           I've been using Codex and have never had a compile-time error by the
           time it finishes. Maybe add to your agent instructions to run the TS
           compiler, lint, and format before it finishes, and to only stop when
           everything passes.
       
            exitb wrote 7 hours 11 min ago:
            I’m not sure why you were downvoted. It’s a primary concern for
            any agentic task to set it up with a verification path.
       
        semessier wrote 13 hours 40 min ago:
        that's not a proof
       
          hahahahhaah wrote 7 hours 39 min ago:
           it is an attempt to prove a very specific case of the theorem
           x = (sqrt(x))^2 (for x >= 0).
       
          frontfor wrote 12 hours 55 min ago:
          Agreed. Asking the AI to do a calculation isn’t the same as asking
          it to “prove” a mathematical statement in the usual meaning.
       
          groundzeros2015 wrote 12 hours 56 min ago:
          I think it’s a good way to prove x = sqrt(y). What’s your
          concern?
       
        fragmede wrote 13 hours 56 min ago:
        > a session with Gemini 2.5 Pro (without Code Execution tools)
        
        How good are you at programming on a whiteboard? How good is anybody?
        With code execution tools withheld from me, I'll freely admit that I'm
        pretty shit at programming. Hell, I barely remember the syntax in some
        of the more esoteric, unpracticed places of my knowledge. Thus, it's
        hard not to see case studies like this as dunking on a blindfolded free
        throw shooter, and calling it analysis.
       
          cmiles74 wrote 13 hours 4 min ago:
          It's pretty common for software developers to be asked to code up
          some random algorithm on a whiteboard as part of the interview
          process.
       
          htnthrow11220 wrote 13 hours 35 min ago:
          It’s like that but if the blindfolded free throw shooter was also
          the scorekeeper and the referee & told you with complete confidence
          that the ball went in, when you looked away for a second.
       
          blibble wrote 13 hours 43 min ago:
          > How good are you at programming on a whiteboard?
          
          pretty good?
          
          I could certainly do a square root
          
          (given enough time, that one would take me a while)
       
            crdrost wrote 10 hours 50 min ago:
            With a slide rule you can start from 92200 or so, long division
            with 9.22 gives 9.31 or so, next guess 9.265 is almost on point,
            where long division says that's off by 39.6 so the next
            approximation +19.8 is already 92,669.8... yeah the long divisions
            suck but I think you could get this one within 10 minutes if the
            interviewer required you to.
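             
             In code, that guess / divide / average loop is just the Babylonian
             method; a minimal sketch (the target here is an arbitrary ten-digit
             number near the figures above, not the article's actual value):
             
               def babylonian_sqrt(n, guess, steps=4):
                   # Repeatedly average the guess with n / guess, i.e. the
                   # long-division-and-correct loop described above.
                   for _ in range(steps):
                       guess = (guess + n / guess) / 2.0
                   return guess
               
               print(babylonian_sqrt(8_587_000_000, 92_200))  # ~92666.07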
            
            Also, don't take a role that interviews like that unless they work
            on something with the stakes of Apollo 13, haha
       
              blibble wrote 10 hours 48 min ago:
              I actually have a slide rule that was my father's in school
              
              great for teaching logarithms
       
        benreesman wrote 14 hours 3 min ago:
         They can all write Lean 4 now; don't accept numbers that don't carry
        proofs. The CAS I use for builds has a coeffect discharge cert in the
        attestation header, couple lines of code. Graded monads are a snap in
        CIC.
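         
         As a minimal illustration of a number that carries its proof in
         Lean 4 (a plain subtype, not the CAS/attestation setup mentioned
         above):
         
           -- The value cannot be constructed without a proof that 5 * 5 = 25.
           def sqrtOf25 : { n : Nat // n * n = 25 } := ⟨5, rfl⟩
           
           -- Downstream code can use the carried proof instead of re-checking.
           example : sqrtOf25.val * sqrtOf25.val = 25 := sqrtOf25.property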
       
          dehsge wrote 12 hours 52 min ago:
           There are some numbers that are uncomputable in Lean. You can do
           things to approximate them in Lean; however, those approximations may
           still be wrong. Lean's handling of uncomputable definitions is very
           interesting.
       
       
 (DIR) <- back to front page