[HN Gopher] LLMs are mortally terrified of exceptions
___________________________________________________________________
LLMs are mortally terrified of exceptions
https://x.com/karpathy/status/1976082963382272334
Author : nought
Score : 91 points
Date : 2025-10-09 17:16 UTC (5 hours ago)
| mwkaufma wrote:
| Even when they're not AI slop, these kinds of "paranoid sanity
| checks" are the software equivalent of security-theater.
| bwfan123 wrote:
| Form over function is what they are trained for. So, verbose
| commentary, needless readmes, and emojis all serve that
| purpose.
| mwkaufma wrote:
| Coding for the reviewer, not the user.
| simonw wrote:
| Yeah, I really hate code like this because it generally ends up
| full of codepaths that have never been exercised, so there's
| all sorts of potential for weird behavior and unexpected edge
| cases. Plus it's harder to review.
| shiandow wrote:
| Is there a way to read the rest?
| hugo1789 wrote:
| https://xcancel.com/karpathy/status/1976082963382272334
| wffurr wrote:
| Turns out computer math is actually super hard. Basic operations
| entail all kinds of undefined behavior and such. This code is a
| bit verbose but otherwise familiar.
| Den_VR wrote:
| If we wanted defined behavior we'd build systems with Karnaugh
| maps all the way down.
| falcor84 wrote:
| # Step 3: Preemptively check for catastrophic magnitude differences
| if abs(a) > sys.float_info.max / 2:
|     logging.warning("Value of a might cause overflow. Returning infinity just to be sure")
|     return math.copysign(float('inf'), a)
| if abs(b) < sys.float_info.epsilon:
|     logging.warning("Value of b dangerously close to zero. Returning NaN defensively.")
|     return math.nan
|
| Does the above code make any sense? I've not worked with this
| sort of stuff before, but it seems entirely unreasonable to me
| to check them individually. E.g. if 1 < b < a, then it seems
| insane to me to return float('inf') for a large but finite a.
| im3w1l wrote:
| Ignoring the sign of _b_ for big _a_ can't be right.
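The two objections above can be checked directly. The sketch below is a hypothetical wrapper (the name `paranoid_divide` is an assumption, not from the tweet) that keeps only the quoted overflow pre-check and then divides normally, so the pre-check can be compared with plain division for a large but finite `a`.

```python
import math
import sys


def paranoid_divide(a: float, b: float) -> float:
    """Hypothetical wrapper: the quoted overflow pre-check, then a plain division."""
    if abs(a) > sys.float_info.max / 2:
        # Looks only at a, so neither the size nor the sign of b matters here.
        return math.copysign(float("inf"), a)
    return a / b


big = 1e308  # finite, but above sys.float_info.max / 2

for b in (2.0, -2.0):
    print(f"b={b}: pre-check gives {paranoid_divide(big, b)}, plain division gives {big / b}")
```

In both cases the plain quotient is finite, and for the negative divisor the pre-check also reports the wrong sign.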
| OutOfHere wrote:
| Is this Claude? GPT is not like this. To me it looks like
| Anthropic is just maximizing billable token use as usual, and it
| has nothing really to do with exceptions per se.
| TuxSH wrote:
| From the UI it indeed seems to be Claude
| constantcrying wrote:
| If you are dividing two numbers with no prior knowledge of these
| numbers or any reasonable assumptions you can make and this code
| is used where you can not rely on the caller to catch an
| exception and the code is critical for the product, then this is
| necessary.
|
| If you are actually doing safety critical software, e.g.
| aerospace, medicine or automotive, then this is a good
| precaution, although you will not be writing in Python.
| mewpmewp2 wrote:
| I might agree with that, and maybe the example posted by
| Karpathy is not the greatest, but what I'm constantly being
| faced with is try catches where it will fail silently or return
| a fallback/mock response, which essentially means that system
| will behave unexpectedly in a more subtle way down the line
| while leaving you clueless as to what the issue was.
|
| I have to constantly remind Claude that we want to fail fast.
| isoprophlex wrote:
| A good 10% of my Claude.md is yelling at it that no i don't
| want you to silently handle exceptions six calls deep into
| the stack and no please don't wrap my return values in weird
| classes full of dumb status enums "for safety"
|
| Just raise god damn it
| hyperpape wrote:
| I'm not sure returning None is any safer than an Exception,
| because the caller still has to check.
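A small sketch of the contrast discussed in this subthread, under assumed names: `load_port_swallowing` and `load_port_fail_fast` are hypothetical helpers, and the misspelled config key is invented for illustration. It shows how a swallowed error tends to resurface later as an unrelated failure, while the fail-fast version reports the actual cause.

```python
import json


def load_port_swallowing(raw: str):
    """Swallows every error and returns None; the caller must remember to check."""
    try:
        return int(json.loads(raw)["port"])
    except Exception:
        return None


def load_port_fail_fast(raw: str) -> int:
    """Lets the original exception propagate, so the failure shows up at its source."""
    return int(json.loads(raw)["port"])


bad = '{"prot": 8080}'  # hypothetical config with a misspelled key

port = load_port_swallowing(bad)
try:
    port + 1  # the caller forgot to check for None
except TypeError as exc:
    print("swallowed version fails later, far from the cause:", exc)

try:
    load_port_fail_fast(bad)
except KeyError as exc:
    print("fail-fast version names the missing key:", exc)
```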
| metalcrow wrote:
| Given that the output describes the function as being done "with
| extraordinary caution, because you never know what can go wrong",
| i would guess that the undisclosed prompt was something similar
| to "generate a division function in python that handles all
| possible edge cases. be extremely careful". Which seems to say
| less about LLM training and more about them doing exactly what
| they are told.
| freehorse wrote:
| Aside from the absurdity and obvious satirical intention,
|
| 1. the code is actually wrong (and is wrong regardless of the
| absurd exception handling situation)
|
| 2. some of the exception handling makes no sense regardless, or
| is incoherent
|
| 3. a less absurd version of this actually happens (edit:
| commonly in actual irl scenarios) if you put emphasis on
| exception handling in the prompt
| angry_albatross wrote:
| I interpreted the function code as being a deliberately
| exaggerated satirical example that was illustrative of the
| experience he was having. So yes, in that example it was
| probably told to be overly cautious, but I agree with him that
| the default of LLMs seems to be a bit more cautious than I
| would like.
| sarchertech wrote:
| Expert beginners program like this. I call it what if driven
| development. Turns out a lot of code was written by expert
| beginners because by many metrics they are prolifically
| productive.
|
| In go all SOTA agents are obsessed with being ludicrously
| defensive against concurrency bugs. Probably because in addition
| to what if driven development, there are a lot of blog posts
| warning about concurrency bugs.
| bobogei81123 wrote:
| This is just AI trying to tell us how badly we designed our
| programming languages to be when exceptions can be thrown pretty
| much anywhere
| recursive wrote:
| So you think java's checked exceptions are a better model? No
| opinion myself, but that way seems widely considered bad too.
| nivertech wrote:
| Why do you need exceptions at all? They're just different
| return types in disguise...
|
| Also, division by zero should return Inf
| threeducks wrote:
| Or -Inf, depending on the sign of the zero, which might
| catch some programmers by surprise, but is of course the
| correct thing to do.
| nivertech wrote:
| a/0 = Inf when a>0
| a/0 = -Inf when a<0
| a/0 = NaN when a=0
| dmoy wrote:
| No this doesn't work either
|
| In the context of say a/-0.001, a/-0.00000001,
| a/-0.0000000001, a/<negative minimum epsilon for
| denormalized floating point>, a/0
|
| Then a/0 is negative when a>0, and positive when a<0
| nivertech wrote:
| Why not just use IEEE 754?
|
| _> According to the IEEE 754 standard, floating-point
| division by zero is not an error but results in special
| values: positive infinity, negative infinity, or Not a
| Number (NaN). The specific result depends on the
| numerator_
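Python itself raises `ZeroDivisionError` for float division by zero, so the IEEE 754 behavior under discussion has to be emulated to try it out. The sketch below is one way to do that; `ieee_divide` is a hypothetical helper name. It also shows the point made earlier in the thread that the result depends on the sign of the zero, not only on the numerator.

```python
import math


def ieee_divide(a: float, b: float) -> float:
    """Hypothetical helper: return IEEE 754 special values instead of raising."""
    try:
        return a / b
    except ZeroDivisionError:
        if a == 0 or math.isnan(a):
            return math.nan  # 0/0 and NaN/0 are NaN
        # The result's sign combines both operand signs;
        # copysign can see the sign bit of -0.0.
        sign = math.copysign(1.0, a) * math.copysign(1.0, b)
        return math.copysign(math.inf, sign)


for a, b in [(1.0, 0.0), (1.0, -0.0), (-1.0, 0.0), (-1.0, -0.0), (0.0, 0.0)]:
    print(f"{a} / {b} -> {ieee_divide(a, b)}")
```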
| dmoy wrote:
| Because sometimes it's very wrong
|
| Way back when during my EE course days, we had like a
| whole semester devoted to weird edge cases like this, and
| spent a month on IEEE 754 (precision loss, NaN, divide by
| zero, etc)
|
| When you took an ieee754 divide by zero value as gospel
| and put it in the context of a voltage divisor that is
| always negative or zero, getting a positive infinity
| value out of divide by zero was very wrong, in the sense
| of "flip the switch and oh shit there's the magic smoke".
| The solution was a custom divide function that would know
| the context, and yield negative infinity (or some
| placeholder value). It was a contrived example for EE
| lab, but the lesson was - sometimes the standard is wrong
| and you will cause problems if it's blindly followed.
|
| Sometimes it's fine, but it depends on the domain
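A context-aware divide of the kind described above might look roughly like the following. This is only a sketch under the stated assumption that the divisor is always negative or zero; the name `divide_by_nonpositive` and the choice to raise on a positive divisor are illustrative, not taken from the lab exercise.

```python
import math


def divide_by_nonpositive(a: float, b: float) -> float:
    """Hypothetical helper: divide a by b when the domain guarantees b <= 0."""
    if math.isnan(a) or math.isnan(b):
        return math.nan
    if b > 0:
        raise ValueError("b must be negative or zero in this domain")
    if b == 0:
        if a == 0:
            return math.nan  # 0/0 stays undefined
        # b approaches zero from below, so the quotient takes the opposite sign of a.
        return math.copysign(math.inf, -a)
    return a / b


print(divide_by_nonpositive(1.0, 0.0))
print(divide_by_nonpositive(-1.0, 0.0))
print(divide_by_nonpositive(1.0, -0.5))
```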
| nivertech wrote:
| With IEEE 754 you can always explicitly check for edge
| cases.
|
| But with exceptions you can't use SIMD / vectorization.
| dmoy wrote:
| Yea that's totally fair, you'd need to build it in as a
| first class behavior of your code, doesn't necessarily
| mean that exceptions are the right way to do it.
| johnyzee wrote:
| Unchecked exceptions are more like a shutdown event, which
| can be intercepted at any point along the call stack, which
| is useful and not like a return type.
| nivertech wrote:
| Why do you need the call stack at all?
| layer8 wrote:
| What about division of zero by zero?
| dmoy wrote:
| > division by zero should return Inf
|
| Sometimes yes, sometimes no?
|
| It's a domain specific answer, even ignoring the 0/0 case.
|
| And also even ignoring the "which side of the limit are you
| coming from?" where "a" and/or "b" might be negative. (Is
| it positive infinity or negative infinity? The sign of "a"
| alone doesn't tell you the answer)
|
| Because sometimes the question is like "how many things per
| box if there's N boxes"? Your answer isn't infinity, it's
| an invalid answer altogether.
|
| The _limit_ of 1/x or -1/x might be infinity (or negative
| infinity), and in some cases that might be what you want.
| But sometimes it's not.
| Terr_ wrote:
| > So you think java's checked exceptions are a better model?
|
| Checked Exceptions are a good concept which just needed more
| syntactic-sugar. (Like easily specifying that one kind of
| exception should be wrapped into another.) The badness is not
| in the logic but in the ecology, the ways that junior/lazy
| developers are incentivized to make a long-term mess for
| short-term gain.
|
| Meme reaction: http://imgur.com/iYE5nLA
|
| Prior discussion:
| https://news.ycombinator.com/item?id=42946597
| criemen wrote:
| It's also logically incoherent - division by zero can't occur,
| because if b=0 then abs(b) < sys.float_info.epsilon.
|
| Furthermore, the code is happy to return NaN from the pre-checks,
| but replaces a NaN result from the division by None. That doesn't
| make any sense from an API design standpoint.
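The first point can be verified with the standard library alone. The snippet below is a minimal check, not part of the original code: `sys.float_info.epsilon` is the gap between 1.0 and the next float, which is far larger than the smallest positive float, so the pre-check both makes the zero-division case unreachable and rejects many ordinary divisors.

```python
import math
import sys

eps = sys.float_info.epsilon  # gap between 1.0 and the next representable float
smallest_normal = sys.float_info.min  # smallest positive normalized float

print(f"epsilon = {eps}, smallest normal = {smallest_normal}")

# b == 0 always satisfies the pre-check, so later zero-division handling never runs.
print(abs(0.0) < eps)

# Many perfectly usable divisors are also below epsilon.
b = 1e-100
print(b < eps, math.isfinite(1.0 / b))
```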
| dijksterhuis wrote:
| I mean, the first three cases are just attempting to turn dynamic
| into static typed... right? maybe just don't aim for uber-safety
| in a dynamically typed language? :shrugs:
|
| (I used to look out for Karpathy's papers ten years ago... i tend
| to let out an audible sigh when i see his name today)
| falcor84 wrote:
| You shouldn't have the same expectations from a person's tweet
| as you would from a paper. I don't see any issue with high
| profile people who are careful in their professional work,
| putting less thought-through output on social media. At least
| as long as they aren't intentionally/negligently spreading
| misinformation, which I've never seen Karpathy do.
|
| I for one really enjoy both his longer form work and his
| shorter takes.
| falcor84 wrote:
| That code has many issues, but the one that bothers me the most
| in practice is this tendency of adding imports inside functions.
| I can only assume that it's an artifact of them optimizing for a
| minimal number of edits somewhere in the process, but I expect
| better.
| kccqzy wrote:
| It's to make imports lazy, to solve the issue of slow import at
| startup.
| falcor84 wrote:
| While there are some cases where lazy imports are
| appropriate, this function, and the vast majority of such
| lazy imports that I get from Claude are not.
|
| In particular, I can't think of any non-pathological
| situation where a python developer should import logging and
| update logging.basicConfig within an inner function.
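For comparison, a conventional layout keeps the import at module level and configures logging once at the program's entry point. This is a minimal sketch of that common pattern; the `divide` and `main` functions are placeholders.

```python
import logging

# Module-level logger; library code only emits records and never configures handlers.
logger = logging.getLogger(__name__)


def divide(a: float, b: float) -> float:
    logger.debug("dividing %r by %r", a, b)
    return a / b


def main() -> None:
    # Configure logging once, at the application entry point.
    logging.basicConfig(level=logging.INFO)
    print(divide(6.0, 3.0))


if __name__ == "__main__":
    main()
```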
| jpcompartir wrote:
| Most comments seem to be taking the code seriously, when it's
| clearly satirical?
| fkyoureadthedoc wrote:
| Not sure why but it made me think of FizzBuzzEnterpriseEdition
| https://github.com/EnterpriseQualityCoding/FizzBuzzEnterpris...
| ineedasername wrote:
| Woah, were they using junit 4.8.3 in that project? Someone was
| flying by the seat of their pants, I hope they got sign-off on
| that by legal & the CTO, that's the kind of cowboy coding
| choice that can hurt a career.
| glitchc wrote:
| But what's the prompt that led to this output? Is it just a
| simple "Write code to divide a by b?" or are there instructions
| added for code safety or specific behaviours?
|
| I know it's Karpathy, which is why the entire prompt is all the
| more important to see.
| johnisgood wrote:
| "Write me a code that divides a by b and make sure it is safe
| and handles all edge cases"[1] or something and some languages
| have more than others.
|
| [1] Probably with some "make sure you handle ALL cases in
| existence", or emphasis, along those lines.
| stargrazer wrote:
| but then, why code with exceptions, why not perform pre-
| flight/pre-validation checks and minimize exceptions to the truly
| unknown?
| jampekka wrote:
| I've noted that LLMs tend to produce defensive code to a fault.
| Lots of unnecessary checks, e.g. check for null/None/undefined
| multiple times for same value. This can lead to really hard to
| read code, even for the LLM itself.
|
| The RL objectives probably heavily penalize exceptions, but don't
| reward much for code readability or simplicity.
| comex wrote:
| This is a parody but the phenomenon is real.
|
| My uninformed suspicion is that this kind of defensive
| programming somehow improves performance during RLVR. Perhaps the
| model sometimes comes up with programs that are buggy enough to
| emit exceptions, but close enough to correct that they produce
| the right answer after swallowing the exceptions. So the model
| learns that swallowing exceptions sometimes improves its reward.
| It also learns that swallowing exceptions rarely _reduces_ its
| reward, because if the model does come up with fully correct
| code, that code usually won't raise exceptions in the first place
| (at least not in the test cases it's being judged on), so adding
| exception swallowing won't fail the tests even if it's
| theoretically incorrect.
|
| Again, this is pure speculation. Even if I'm right, I'm sure
| another part of the reason is just that the training set contains
| a lot of code written by human beginners, who also like to ignore
| errors.
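The speculated mechanism can be illustrated with a toy grader. Everything here is invented for illustration (the two candidate functions, the test cases, and the scoring rule); it says nothing about how any real training pipeline works. It only shows that an output-only check can score an exception-swallowing program higher than a strict one.

```python
def parse_total_strict(fields):
    """Raises ValueError on any malformed field."""
    return sum(int(f) for f in fields)


def parse_total_swallowing(fields):
    """Silently skips anything that fails to parse."""
    total = 0
    for f in fields:
        try:
            total += int(f)
        except Exception:
            pass
    return total


# Toy test cases: (input, expected output). The second contains an empty field.
TESTS = [(["1", "2", "3"], 6), (["4", "", "5"], 9)]


def score(fn) -> int:
    """Toy grader: one point per test whose output matches; a crash earns nothing."""
    passed = 0
    for fields, expected in TESTS:
        try:
            if fn(fields) == expected:
                passed += 1
        except Exception:
            pass
    return passed


print("strict:", score(parse_total_strict))
print("swallowing:", score(parse_total_swallowing))
```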
| karpathy wrote:
| Sorry I thought it would be clear and could have clarified that
| the code itself is just a joke illustrating the point, as an
| exaggeration. This was the thread if anyone is interested
|
| https://chatgpt.com/share/68e82db9-7a28-8007-9a99-bc6f0010d1...
| chis wrote:
| I think there's always a danger of these foundational model
| companies doing RLHF on non-expert users, and this feels like a
| case of that.
|
| The AIs in general feel really focused on making the user happy
| - your example, and another one is how they love adding emojis
| to the stdout and over-commenting simple code.
| cma wrote:
| And more advanced users are more likely to opt out of
| training on their data, Google gets around it with a free api
| period where you can't opt out and I think from did some of
| that too, through partnerships with tool companies, but not
| sure if you can ever opt out there.
| why_at wrote:
| This part from the first try made me laugh:
| if random.random() < 0.01:
|     logging.warning("This feels wrong. Aborting just in case.")
|     return None
| bjourne wrote:
| This is stunning English: "Perfect setup for satire. Here's a
| Python function that fully commits to the bit -- a
| traumatically over-trained LLM trying to divide numbers while
| avoiding any conceivable danger:" "Traumatically over-trained",
| while scoring zero google hits, is an amazingly good
| description. How can it intuitively know what "traumatic over-
| training" should mean for LLMs without ever having been taught
| the concept?
___________________________________________________________________
(page generated 2025-10-09 23:00 UTC)