[HN Gopher] LLMs are mortally terrified of exceptions
       ___________________________________________________________________
        
       LLMs are mortally terrified of exceptions
        
       https://x.com/karpathy/status/1976082963382272334
        
       Author : nought
       Score  : 91 points
       Date   : 2025-10-09 17:16 UTC (5 hours ago)
        
        
       | mwkaufma wrote:
       | Even when they're not AI slop, these kinds of "paranoid sanity
       | checks" are the software equivalent of security-theater.
        
         | bwfan123 wrote:
         | Form over function is what they are trained for. So, verbose
         | commentary, needless readmes, and emojis all serve that
         | purpose.
        
           | mwkaufma wrote:
           | Coding for the reviewer, not the user.
        
         | simonw wrote:
         | Yeah, I really hate code like this because it generally ends up
         | full of codepaths that have never been exercised, so there's
         | all sorts of potential for weird behavior and unexpected edge
         | cases. Plus it's harder to review.
        
       | shiandow wrote:
       | Is there a way to read the rest?
        
         | hugo1789 wrote:
         | https://xcancel.com/karpathy/status/1976082963382272334
        
       | wffurr wrote:
       | Turns out computer math is actually super hard. Basic operations
       | entail all kinds of undefined behavior and such. This code is a
       | bit verbose but otherwise familiar.
        
         | Den_VR wrote:
         | If we wanted defined behavior we'd build systems with Karnaugh
         | maps all the way down.
        
         | falcor84 wrote:
         |     # Step 3: Preemptively check for catastrophic magnitude differences
         |     if abs(a) > sys.float_info.max / 2:
         |         logging.warning("Value of a might cause overflow. Returning infinity just to be sure")
         |         return math.copysign(float('inf'), a)
         |     if abs(b) < sys.float_info.epsilon:
         |         logging.warning("Value of b dangerously close to zero. Returning NaN defensively.")
         |         return math.nan
         | 
         | Does the above code make any sense? I've not worked with this
         | sort of stuff before, but it seems entirely unreasonable to me
         | to check them individually. E.g. if 1 < b < a, then it seems
         | insane to me to return float('inf') for a large but finite a.
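The objection above can be checked directly. This is a minimal sketch using only the standard library; the operand values are illustrative assumptions, not taken from the original post. It shows the per-operand guard firing on a large but finite a even though the quotient is an ordinary float, and that the sign of b is ignored.

```python
import math
import sys

# Illustrative operands (assumptions): a large but finite a, and 1 < b < a.
a = sys.float_info.max * 0.75
b = 2.0

# The guard from the quoted snippet looks at a alone, so it fires here...
guard_fires = abs(a) > sys.float_info.max / 2

# ...even though the actual quotient is a perfectly finite float.
quotient = a / b

# The guard's return value takes its sign from a only, so a negative b
# would still yield +inf where the true quotient is negative.
guard_value = math.copysign(float("inf"), a)
true_negative_quotient = a / -b

print(guard_fires, math.isfinite(quotient))      # True True
print(guard_value > 0, true_negative_quotient < 0)  # True True
```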
        
           | im3w1l wrote:
           | Ignoring the sign of _b_ for big _a_ can't be right.
        
       | OutOfHere wrote:
       | Is this Claude? GPT is not like this. To me it looks like
       | Anthropic is just maximizing billable token use as usual, and it
       | has nothing really to do with exceptions per se.
        
         | TuxSH wrote:
         | From the UI it indeed seems to be Claude
        
       | constantcrying wrote:
       | If you are dividing two numbers with no prior knowledge of these
       | numbers or any reasonable assumptions you can make and this code
       | is used where you can not rely on the caller to catch an
       | exception and the code is critical for the product, then this is
       | necessary.
       | 
       | If you are actually doing safety critical software, e.g.
       | aerospace, medicine or automotive, then this is a good
       | precaution, although you will not be writing in Python.
        
         | mewpmewp2 wrote:
         | I might agree with that, and maybe the example posted by
         | Karpathy is not the greatest, but what I'm constantly being
         | faced with is try catches where it will fail silently or return
         | a fallback/mock response, which essentially means that system
         | will behave unexpectedly in a more subtle way down the line
         | while leaving you clueless as to what the issue was.
         | 
         | I have to constantly remind Claude that we want to fail fast.
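A small sketch of the difference being described, assuming a hypothetical config loader (both function names and the sample input are made up for illustration). The silent version hides a parse error behind a fallback; the fail-fast version reports it where it happens.

```python
import json

def load_config_silent(text):
    """Anti-pattern: swallow the error and return a fallback value."""
    try:
        return json.loads(text)
    except Exception:
        # The caller cannot tell "empty config" from "broken config".
        return {}

def load_config_fail_fast(text):
    """Fail fast: let the parse error propagate to the caller."""
    return json.loads(text)

broken = '{"timeout": 30'  # hypothetical input with a missing closing brace

silent_result = load_config_silent(broken)

try:
    load_config_fail_fast(broken)
    failed_fast = False
except json.JSONDecodeError:
    failed_fast = True

print(silent_result, failed_fast)
```

With the silent loader, the missing timeout only shows up later, far from the broken input.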
        
           | isoprophlex wrote:
           | A good 10% of my Claude.md is yelling at it that no i don't
           | want you to silently handle exceptions six calls deep into
           | the stack and no please don't wrap my return values in weird
           | classes full of dumb status enums "for safety"
           | 
           | Just raise god damn it
        
         | hyperpape wrote:
         | I'm not sure returning None is any safer than an Exception,
         | because the caller still has to check.
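To make that point concrete, here is a sketch with two hypothetical helpers (neither name comes from the original code). When the caller forgets to check, the None version fails later with an unrelated error, while the raising version fails at the division itself.

```python
def divide_or_none(a, b):
    """Hypothetical helper: signals failure with None instead of raising."""
    if b == 0:
        return None
    return a / b

def divide_or_raise(a, b):
    """Hypothetical helper: lets ZeroDivisionError propagate."""
    return a / b

# With None, a forgotten check surfaces as a TypeError at the point of use.
try:
    total = divide_or_none(1.0, 0.0) + 1.0
    none_error = None
except TypeError as exc:
    none_error = type(exc).__name__

# With an exception, the failure is reported at the division.
try:
    total = divide_or_raise(1.0, 0.0) + 1.0
    raise_error = None
except ZeroDivisionError as exc:
    raise_error = type(exc).__name__

print(none_error, raise_error)
```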
        
       | metalcrow wrote:
       | Given that the output describes the function as being done "with
       | extraordinary caution, because you never know what can go wrong",
       | i would guess that the undisclosed prompt was something similar
       | to "generate a division function in python that handles all
       | possible edge cases. be extremely careful". Which seems to say
       | less about LLM training and more about them doing exactly what
       | they are told.
        
         | freehorse wrote:
         | Aside from the absurdity and obvious satirical intention,
         | 
         | 1. the code is actually wrong (and is wrong regardless of the
         | absurd exception handling situation)
         | 
         | 2. some of the exception handling makes no sense regardless, or
         | is incoherent
         | 
         | 3. a less absurd version of this actually happens (edit:
         | commonly in actual irl scenarios) if you put emphasis on
         | exception handling in the prompt
        
         | angry_albatross wrote:
         | I interpreted the function code as being a deliberately
         | exaggerated satirical example that was illustrative of the
         | experience he was having. So yes, in that example it was
         | probably told to be overly cautious, but I agree with him that
         | the default of LLMs seems to be a bit more cautious than I
         | would like.
        
       | sarchertech wrote:
       | Expert beginners program like this. I call it what-if-driven
       | development. Turns out a lot of code was written by expert
       | beginners because by many metrics they are prolifically
       | productive.
       | 
       | In Go, all SOTA agents are obsessed with being ludicrously
       | defensive against concurrency bugs. Probably because in addition
       | to what-if-driven development, there are a lot of blog posts
       | warning about concurrency bugs.
        
       | bobogei81123 wrote:
       | This is just AI trying to tell us how badly we designed our
       | programming languages to be when exceptions can be thrown pretty
       | much anywhere
        
         | recursive wrote:
         | So you think java's checked exceptions are a better model? No
         | opinion myself, but that way seems widely considered bad too.
        
           | nivertech wrote:
           | Why do you need exceptions at all? They're just different
           | return types in disguise...
           | 
           | Also, division by zero should return Inf
        
             | threeducks wrote:
             | Or -Inf, depending on the sign of the zero, which might
             | catch some programmers by surprise, but is of course the
             | correct thing to do.
        
               | nivertech wrote:
               |     a/0 =  Inf when a>0
               |     a/0 = -Inf when a<0
               |     a/0 =  NaN when a=0
        
               | dmoy wrote:
               | No this doesn't work either
               | 
               | In the context of say a/-0.001, a/-0.00000001,
               | a/-0.0000000001, a/<negative minimum epsilon for
               | denormalized floating point>, a/0
               | 
               | Then a/0 is negative when a>0, and positive when a<0
        
               | nivertech wrote:
               | Why not just use IEEE 754?
               | 
               |  _> According to the IEEE 754 standard, floating-point
               | division by zero is not an error but results in special
               | values: positive infinity, negative infinity, or Not a
               | Number (NaN). The specific result depends on the
               | numerator_
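Plain Python does not follow this rule for floats: `1.0 / 0.0` raises ZeroDivisionError. The sketch below is a hypothetical helper (the name `ieee754_divide` is an assumption) that emulates the IEEE 754 results for a zero divisor, including the signed-zero case mentioned earlier in the thread.

```python
import math

def ieee754_divide(a: float, b: float) -> float:
    """Hypothetical helper emulating IEEE 754 results for a zero divisor."""
    try:
        return a / b
    except ZeroDivisionError:
        # 0/0 and NaN/0 are NaN.
        if a == 0 or math.isnan(a):
            return math.nan
        # Otherwise the result is an infinity whose sign is the product of
        # the operand signs; copysign can see the sign of a negative zero.
        sign = math.copysign(1.0, a) * math.copysign(1.0, b)
        return math.copysign(math.inf, sign)

print(ieee754_divide(1.0, 0.0))   # inf
print(ieee754_divide(1.0, -0.0))  # -inf
print(ieee754_divide(0.0, 0.0))   # nan
```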
        
               | dmoy wrote:
               | Because sometimes it's very wrong
               | 
               | Way back when during my EE course days, we had like a
               | whole semester devoted to weird edge cases like this, and
               | spent a month on IEEE 754 (precision loss, NaN, divide by
               | zero, etc.)
               | 
               | When you took an ieee754 divide by zero value as gospel
               | and put it in the context of a voltage divisor that is
               | always negative or zero, getting a positive infinity
               | value out of divide by zero was very wrong, in the sense
               | of "flip the switch and oh shit there's the magic smoke".
               | The solution was a custom divide function that would know
               | the context, and yield negative infinity (or some
               | placeholder value). It was a contrived example for EE
               | lab, but the lesson was - sometimes the standard is wrong
               | and you will cause problems if it's blindly followed.
               | 
               | Sometimes it's fine, but it depends on the domain
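A rough sketch of what such a context-aware divide could look like. The function name and the exact policy are assumptions; the lab code is not shown in the comment. The only domain knowledge encoded is that the divisor is never positive, so a zero divisor is treated as the limit from below.

```python
import math

def divide_by_nonpositive(a: float, b: float) -> float:
    """Hypothetical divide for a domain where the divisor is never positive.

    A zero divisor is treated as the limit b -> 0 from below, so the result
    takes the opposite sign of the numerator.
    """
    if b > 0:
        raise ValueError("divisor must be negative or zero in this domain")
    if b == 0:
        if a == 0 or math.isnan(a):
            return math.nan
        return math.copysign(math.inf, -a)
    return a / b

print(divide_by_nonpositive(1.0, 0.0))   # -inf
print(divide_by_nonpositive(1.0, -2.0))  # -0.5
```

Returning a placeholder value instead of an infinity would be an equally valid policy; the point is that the choice comes from the domain, not from the standard.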
        
               | nivertech wrote:
               | With IEEE 754 you can always explicitly check for edge
               | cases.
               | 
               | But with exceptions you can't use SIMD / vectorization.
        
               | dmoy wrote:
               | Yea that's totally fair, you'd need to build it in as a
               | first class behavior of your code, doesn't necessarily
               | mean that exceptions are the right way to do it.
        
             | johnyzee wrote:
             | Unchecked exceptions are more like a shutdown event, which
             | can be intercepted at any point along the call stack, which
             | is useful and not like a return type.
        
               | nivertech wrote:
               | Why do you need the call stack at all?
        
             | layer8 wrote:
             | What about division of zero by zero?
        
             | dmoy wrote:
             | > division by zero should return Inf
             | 
             | Sometimes yes, sometimes no?
             | 
             | It's a domain specific answer, even ignoring the 0/0 case.
             | 
             | And also even ignoring the "which side of the limit are you
             | coming from?" where "a" and/or "b" might be negative. (Is
             | it positive infinity or negative infinity? The sign of "a"
             | alone doesn't tell you the answer)
             | 
             | Because sometimes the question is like "how many things per
             | box if there's N boxes"? Your answer isn't infinity, it's
             | an invalid answer altogether.
             | 
             | The _limit_ of 1/x or -1/x might be infinity (or negative
             | infinity), and in some cases that might be what you want.
             | But sometimes it's not.
        
           | Terr_ wrote:
           | > So you think java's checked exceptions are a better model?
           | 
           | Checked Exceptions are a good concept which just needed more
           | syntactic-sugar. (Like easily specifying that one kind of
           | exception should be wrapped into another.) The badness is not
           | in the logic but in the ecology, the ways that junior/lazy
           | developers are incentivized to make a long-term mess for
           | short-term gain.
           | 
           | Meme reaction: http://imgur.com/iYE5nLA
           | 
           | Prior discussion:
           | https://news.ycombinator.com/item?id=42946597
        
       | criemen wrote:
       | It's also logically incoherent - division by zero can't occur,
       | because if b=0 then abs(b) < sys.float_info.epsilon.
       | 
       | Furthermore, the code is happy to return NaN from the pre-checks,
       | but replaces a NaN result from the division by None. That doesn't
       | make any sense from an API design standpoint.
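The first point is easy to verify with the standard library. The second half of the sketch is an aside that goes beyond the comment: the same epsilon pre-check also rejects small divisors whose quotient is perfectly finite (the value 1e-20 is an illustrative assumption).

```python
import sys

# Machine epsilon: the gap between 1.0 and the next float, about 2.22e-16.
eps = sys.float_info.epsilon

# b == 0 is always caught by the epsilon pre-check, so a later
# ZeroDivisionError handler in the same function can never run.
zero_caught_by_precheck = abs(0.0) < eps

# Aside: the pre-check also rejects small but harmless divisors.
b = 1e-20
rejected = abs(b) < eps
quotient = 1.0 / b  # a finite value on the order of 1e20

print(zero_caught_by_precheck, rejected, quotient < float("inf"))
```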
        
       | dijksterhuis wrote:
       | I mean, the first three cases are just attempting to turn dynamic
       | into static typed... right? maybe just don't aim for uber-safety
       | in a dynamically typed language? :shrugs:
       | 
       | (I used to look out for Karpathy's papers ten years ago... I tend
       | to let out an audible sigh when I see his name today)
        
         | falcor84 wrote:
         | You shouldn't have the same expectations from a person's tweet
         | as you would from a paper. I don't see any issue with high
         | profile people who are careful in their professional work,
         | putting less thought-through output on social media. At least
         | as long as they aren't intentionally/negligently spreading
         | misinformation, which I've never seen Karpathy do.
         | 
         | I for one really enjoy both his longer form work and his
         | shorter takes.
        
       | falcor84 wrote:
       | That code has many issues, but the one that bothers me the most
       | in practice is this tendency of adding imports inside functions.
       | I can only assume that it's an artifact of them optimizing for a
       | minimal number of edits somewhere in the process, but I expect
       | better.
        
         | kccqzy wrote:
         | It's to make imports lazy, to solve the issue of slow import at
         | startup.
        
           | falcor84 wrote:
           | While there are some cases where lazy imports are
           | appropriate, this function, and the vast majority of such
           | lazy imports that I get from Claude are not.
           | 
           | In particular, I can't think of any non-pathological
           | situation where a python developer should import logging and
           | update logging.basicConfig within an inner function.
        
       | jpcompartir wrote:
       | Most comments seem to be taking the code seriously, when it's
       | clearly satirical?
        
       | fkyoureadthedoc wrote:
       | Not sure why but it made me think of FizzBuzzEnterpriseEdition
       | https://github.com/EnterpriseQualityCoding/FizzBuzzEnterpris...
        
         | ineedasername wrote:
         | Woah, were they using junit 4.8.3 in that project? Someone was
         | flying by the seat of their pants, I hope they got sign-off on
         | that by legal & the CTO, that's the kind of cowboy coding
         | choice that can hurt a career.
        
       | glitchc wrote:
       | But what's the prompt that led to this output? Is it just a
       | simple "Write code to divide a by b?" or are there instructions
       | added for code safety or specific behaviours?
       | 
       | I know it's Karpathy, which is why the entire prompt is all the
       | more important to see.
        
         | johnisgood wrote:
         | "Write me a code that divides a by b and make sure it is safe
         | and handles all edge cases"[1] or something and some languages
         | have more than others.
         | 
         | [1] Probably with some "make sure you handle ALL cases in
         | existence", or emphasis, along those lines.
        
       | stargrazer wrote:
       | but then, why code with exceptions, why not perform pre-
       | flight/pre-validation checks and minimize exceptions to the truly
       | unknown?
        
       | jampekka wrote:
       | I've noted that LLMs tend to produce defensive code to a fault.
       | Lots of unnecessary checks, e.g. checking for null/None/undefined
       | multiple times for the same value. This can lead to code that is
       | really hard to read, even for the LLM itself.
       | 
       | The RL objectives probably heavily penalize exceptions, but don't
       | reward much for code readability or simplicity.
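A contrived sketch of the pattern, with hypothetical function names. The first version repeats a None check that can no longer be true; the second validates once at the boundary and then relies on the type.

```python
from typing import Optional

def normalize_name_defensive(name: Optional[str]) -> str:
    """The pattern described above: the same None check, repeated."""
    if name is None:
        return ""
    stripped = name.strip() if name is not None else ""
    if stripped is None:  # can never be true: strip() always returns a str
        return ""
    return stripped.lower() if stripped is not None else ""

def normalize_name(name: str) -> str:
    """Validate once at the boundary, then trust the type."""
    if name is None:
        raise TypeError("name must be a str, not None")
    return name.strip().lower()

print(normalize_name_defensive("  Ada "), normalize_name("  Ada "))
```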
        
       | comex wrote:
       | This is a parody but the phenomenon is real.
       | 
       | My uninformed suspicion is that this kind of defensive
       | programming somehow improves performance during RLVR. Perhaps the
       | model sometimes comes up with programs that are buggy enough to
       | emit exceptions, but close enough to correct that they produce
       | the right answer after swallowing the exceptions. So the model
       | learns that swallowing exceptions sometimes improves its reward.
       | It also learns that swallowing exceptions rarely _reduces_ its
       | reward, because if the model does come up with fully correct
       | code, that code usually won't raise exceptions in the first place
       | (at least not in the test cases it's being judged on), so adding
       | exception swallowing won't fail the tests even if it's
       | theoretically incorrect.
       | 
       | Again, this is pure speculation. Even if I'm right, I'm sure
       | another part of the reason is just that the training set contains
       | a lot of code written by human beginners, who also like to ignore
       | errors.
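A toy illustration of the mechanism being speculated about; it is not evidence for it. The function and the "grader" expectations are hypothetical. The solution forgets the empty-list case, the blanket except happens to produce the expected answer, and correct inputs never reach the handler.

```python
def mean_swallowing(xs):
    """Hypothetical buggy solution: forgets the empty list, then hides it."""
    try:
        return sum(xs) / len(xs)
    except Exception:
        return 0.0

# A hypothetical test that expects 0.0 for an empty list passes...
assert mean_swallowing([]) == 0.0

# ...and valid inputs never reach the except branch, so the handler
# costs nothing on the cases being judged.
assert mean_swallowing([1, 2, 3]) == 2.0

# The cost only shows up outside the tests: a type error is reported as 0.0.
hidden = mean_swallowing(["a"])
print(hidden)
```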
        
       | karpathy wrote:
       | Sorry I thought it would be clear and could have clarified that
       | the code itself is just a joke illustrating the point, as an
       | exaggeration. This was the thread if anyone is interested
       | 
       | https://chatgpt.com/share/68e82db9-7a28-8007-9a99-bc6f0010d1...
        
         | chis wrote:
         | I think there's always a danger of these foundational model
         | companies doing RLHF on non-expert users, and this feels like a
         | case of that.
         | 
         | The AIs in general feel really focused on making the user happy
         | - your example, and another one is how they love adding emojis
         | to stdout and over-commenting simple code.
        
           | cma wrote:
           | And more advanced users are more likely to opt out of
           | training on their data, Google gets around it with a free api
           | period where you can't opt out and I think from did some of
           | that too, through partnerships with tool companies, but not
           | sure if you can ever opt out there.
        
         | why_at wrote:
         | This part from the first try made me laugh:
         |     if random.random() < 0.01:
         |         logging.warning("This feels wrong. Aborting just in case.")
         |         return None
        
         | bjourne wrote:
         | This is stunning English: "Perfect setup for satire. Here's a
         | Python function that fully commits to the bit -- a
         | traumatically over-trained LLM trying to divide numbers while
         | avoiding any conceivable danger:" "Traumatically over-trained",
         | while scoring zero google hits, is an amazingly good
         | description. How can it intuitively know what "traumatic over-
         | training" should mean for LLMs without ever having been taught
         | the concept?
        
       ___________________________________________________________________
       (page generated 2025-10-09 23:00 UTC)