[HN Gopher] Null References: The Billion Dollar Mistake (2009) [...
       ___________________________________________________________________
        
       Null References: The Billion Dollar Mistake (2009) [video]
        
       Author : colinprince
       Score  : 28 points
       Date   : 2023-02-28 18:03 UTC (4 hours ago)
        
 (HTM) web link (www.infoq.com)
 (TXT) w3m dump (www.infoq.com)
        
       | [deleted]
        
       | brundolf wrote:
       | (2009)
        
         | dang wrote:
         | Added. Thanks!
        
       | WalterBright wrote:
       | Nah. The billion dollar mistake is actually C arrays decaying to
       | pointers, enabling buffer overflows, the #1 cause of bugs and
       | malware injection in shipped C programs.
       | 
       | https://www.digitalmars.com/articles/C-biggest-mistake.html
       | 
       | It's simple to fix this in C, too.
        
         | marcodiego wrote:
         | Genuinely curious: why don't you suggest that to WG14?
         | 
         | (WG14 is the ISO working group that maintains the C
         | specification.)
        
           | Arch-TK wrote:
           | And what would it achieve to "fix" this (without re-designing
           | the rest of the language)?
           | 
           | Every piece of code such as:
           | 
           |     int a[SIZE];
           |     foo(a, SIZE);
           | 
           | Would have to be rewritten to:
           | 
           |     int a[SIZE];
           |     foo(&a[0], SIZE);
           | 
           | And this additional noise would just make C harder to write
           | for no reason. Rather than making C harder to write, just
           | pick a different programming language.
        
             | WalterBright wrote:
             | The article I referenced says how this is fixed. It is not
             | harder to write at all.
        
         | Arch-TK wrote:
         | Nothing is decaying. It is an implicit conversion. To say
         | something "decays" implies something else is lost, the array is
         | still there.
        
           | WalterBright wrote:
           | Something is lost - the array length. "Decays" is the correct
           | word.
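
A minimal sketch of what the two comments above are arguing about, in standard C. The function names are made up for illustration. The array itself still exists after the conversion, but the callee only receives an address, so the length is no longer recoverable from the argument.

```c
#include <assert.h>
#include <stddef.h>

#define SIZE 8

/* Callee side: the argument arrives as a plain pointer, so sizeof
   reports the size of the pointer, not of the caller's array. */
static size_t bytes_seen_by_callee(const int *a) {
    return sizeof a;
}

/* Owner side: in the scope that declares the array, the full size
   is still part of its type. */
static size_t bytes_seen_by_owner(void) {
    int a[SIZE] = {0};
    assert(sizeof a == SIZE * sizeof(int));
    return sizeof a;
}
```

Calling `bytes_seen_by_callee(a)` with an `int a[SIZE]` compiles without any cast, which is the implicit conversion under discussion.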
        
             | Arch-TK wrote:
             | * * *
        
         | slaymaker1907 wrote:
         | I don't think it's quite so simple to fix since you still need
         | to decide when to actually check the size vs. when it is safe
         | to omit. While branch predictors are great, you can end up
         | having so many potential branches that it starts to lose
         | effectiveness. In practice, I rarely see anything besides
         | strings that lack the size variable, it's just that the bounds
         | may not be checked.
         | 
         | Null terminated strings were a horrible mistake though and
         | really should have been fat pointers.
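
One way to emulate a fat pointer in today's C is a small struct that carries the address and the length together. This is only a sketch: `int_slice`, `SLICE_OF`, `slice_get` and `slice_sum` are hypothetical names, not an existing library and not the syntax proposed in the linked article. It also illustrates the check-versus-omit question: random access is checked, while a loop bounded by `len` needs no per-element check.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical fat pointer: address and element count travel together. */
typedef struct {
    int   *ptr;
    size_t len;
} int_slice;

/* Build a slice from a real array while its length is still known.
   Only valid for true arrays, not for pointers. */
#define SLICE_OF(arr) ((int_slice){ (arr), sizeof(arr) / sizeof((arr)[0]) })

/* Checked access: returns false instead of reading out of range. */
static bool slice_get(int_slice s, size_t i, int *out) {
    if (i >= s.len) return false;
    *out = s.ptr[i];
    return true;
}

/* The loop is bounded by s.len, so no per-element check is needed
   and the caller passes no separate SIZE argument. */
static long slice_sum(int_slice s) {
    assert(s.ptr != NULL || s.len == 0);
    long total = 0;
    for (size_t i = 0; i < s.len; i++) total += s.ptr[i];
    return total;
}
```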
        
         | nicoburns wrote:
         | They're probably BOTH quite literally billion-dollar mistakes.
        
           | WalterBright wrote:
           | I remember the bad old DOS days where a null pointer write
           | would scramble DOS. I absolutely hated that, and it cost me a
           | _lot_ of extra work. Enter protected mode programming. It was
           | a miracle! Null pointer writes now meant a seg fault with a
           | traceback, and voila! A few minutes of fix and I'm on my
           | way. I immediately switched all my dev work to a protected
           | mode system, and ported to real mode only as a last step.
           | 
           | Buffer overflows are the primary entry point for malware. Seg
           | faults are not. Hence the former are _far_ more costly.
        
           | unxdfa wrote:
           | I can confirm I have fucked up on numerous occasions which
           | resulted in NPEs which caused business processes to fail
           | spectacularly and write off millions of dollars instantly.
           | Fortunately _in every case_ it was possible to recover,
           | replay or repair the data so the only real cost was a little
           | bit of reputation and time and money. But that adds up across
           | tens of thousands of engineers, probably to billions by now
           | globally.
           | 
           | I learned a lot from this and discovered that the real issue
           | is that a process can fail spectacularly and do any damage at
           | all. There are so many other concerns other than NPEs which
           | need to be considered.
        
         | xjay wrote:
         | Not to forget that classic increment of a signed integer
         | waiting to overflow and trigger an exception on some critical
         | (unpatched) hard disk drive controller out there..
         | ++countdown;
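
For reference, in standard C a signed increment past `INT_MAX` is undefined behavior rather than a guaranteed exception, so the check has to happen before the increment. A minimal sketch, with `checked_increment` as a hypothetical helper name:

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/* Increments *counter unless that would overflow.
   Returns false and leaves the value unchanged at INT_MAX. */
static bool checked_increment(int *counter) {
    assert(counter != NULL);
    if (*counter == INT_MAX) return false;
    ++*counter;
    return true;
}
```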
        
         | live_video wrote:
         | the choice of von neumann architecture over harvard
         | architecture enabled this
        
         | dpkirchner wrote:
         | We could go further: the billion dollar mistake was allowing
         | values intended to be used as data to be executed (pre-NX bit).
         | Zero-terminated C strings is up there as well.
        
           | renox wrote:
           | > Zero-terminated C strings is up there as well.
           | 
           | I disagree here. Remember that at the time some
           | length-prefixed string implementations used only one or two
           | bytes for the length, which can create lots of issues that
           | zero-terminated strings don't have.
           | 
           | Of course nowadays zero terminated strings don't make sense
           | anymore.
        
           | GuB-42 wrote:
           | The code vs data distinction is a hardware thing, this is not
           | about C. More precisely, it is the difference between a
           | Harvard and a Von Neumann architecture. A Harvard
           | architecture has completely separate paths between
           | instructions and data: different buses, different memories. A
           | Von Neumann architecture has common instruction and data
           | paths, and therefore, naturally, data is executable. You can
           | write C code for both.
           | 
           | Modern PC-style hardware is kind of a hybrid, acts like
           | Harvard with regard to cache, and like Von Neumann with
           | regard to RAM. Furthermore, it has a fancy MMU that allows
           | for things like the NX bit. All C compilers/linkers I am
           | aware of know the difference between code and data and are
           | able to put each one in the appropriate section, what is done
           | after that is the OS/hardware responsibility.
           | 
           | As for zero-terminated strings, I also think it is mostly a
           | mistake, though it does have a few advantages. You can still
           | work with size+pointer though, using mem- instead of the str-
           | functions, and "%.*s" in printf(), not ideal though.
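
A short sketch of the size+pointer style mentioned above, using `memcmp` and the `"%.*s"` precision form of `snprintf`. The `str_view` type and both helper names are assumptions for illustration; only the standard library calls are real.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical pointer+length string view; no terminating NUL required.
   ptr is assumed to be non-NULL. */
typedef struct {
    const char *ptr;
    size_t      len;
} str_view;

/* Compare with a mem- function instead of strcmp. */
static bool view_equal(str_view a, str_view b) {
    return a.len == b.len && memcmp(a.ptr, b.ptr, a.len) == 0;
}

/* Format with "%.*s"; the precision argument must be an int. */
static int view_format(char *buf, size_t cap, str_view v) {
    assert(v.len <= INT_MAX);
    return snprintf(buf, cap, "[%.*s]", (int)v.len, v.ptr);
}
```

The `(int)` cast is the "not ideal" part: the precision is an `int`, so lengths above `INT_MAX` cannot be printed this way.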
        
         | Mindless2112 wrote:
         | Absolutely. Null pointers don't even make the list of things
         | that I worry about when writing C code.
        
         | LanceH wrote:
         | Which would all be a rounding error if C gets to take credit for
         | everything produced using it.
        
           | asguy wrote:
           | This is what C/C++ haters, anti x86/amd64 snobs, and Rust
           | elitists always forget: what works, works. Sure, it could be
           | a local maximum, hopefully there will be better, but who
           | knows?
           | 
           | To quote Sean Connery in The Rock:
           | 
           | Losers always whine about their best. Winners go home and
           | fuck the prom queen!
           | 
           | https://m.youtube.com/watch?v=gXDSxgDUv-c
        
             | Oxidation wrote:
             | It doesn't even have to be a local maximum, just higher up
             | some hillside will do (maybe it's even on a different
             | hillside to the one you're on at the moment).
        
             | pjmlp wrote:
             | Some of us are old enough to have been coding when C was
             | only relevant for university departments privileged to have
             | UNIX boxes.
             | 
             | So we know there are other ways, we used systems with zero
             | lines of C in them.
             | 
             | The prom queen came naked offering herself to everyone and
             | the party was done for the other folks.
        
               | asguy wrote:
               | That excuse sounds like the guy in highschool who would
               | have treated the girl so much better, but she was into
               | assholes.
               | 
               | If you are old enough to remember those days, then you
               | remember COBOL, Algol, Fortran, Pascal, BASIC, Ada,
               | Oberon, Lisp/Scheme, Forth, O'Caml etc. They're all great
               | languages, some still have their uses. There's a reason
               | all of the major operating systems have cores written in
               | C/C++. It's entirely because they're pragmatic and
               | "work", and not some conspiracy.
               | 
               | Edit: although now that I write it, what if C/C++ was
               | planted on earth by an alien intelligence in order to
               | slow down the development of the human race.
        
               | pjmlp wrote:
               | The power of free beer with source tapes is very mighty.
               | 
               | Thankfully governments have finally started paying
               | attention regarding software liability.
        
               | WalterBright wrote:
               | > Thankfully governments have finally started paying
               | attention regarding software liability.
               | 
               | And we'll see a great slowdown in the software industry.
        
               | pjmlp wrote:
               | When people buy damaged goods, they ask for a refund,
               | they don't expect to close and reopen the box and have
               | the product reappear in perfect shape.
               | 
               | The industry has miseducated them, and now it is finally
               | happening, software products aren't a special snowflake.
               | 
               | Digital stores with returns, consulting contracts with
               | warranty clauses with fixes at the expense of provider,
               | and naturally cyber security bills.
               | 
               | Move fast and break things only works due to lack of
               | liability.
        
           | renox wrote:
           | While I agree with you, it's still infuriating that these
           | (serious) flaws in C apparently can't be fixed/evolved inside
           | the language and that you have to use a different language
           | instead.
           | 
           | That's throwing the baby out with the bathwater :-(
        
         | munchler wrote:
         | That's a problem that's pretty much limited to C. Null pointers
         | have infected many other languages as well.
        
         | mjevans wrote:
         | I think golang's slices are a better solution than the linked
         | article.
         | 
         | https://go.dev/ref/spec#Slice_types
         | 
         | Slices can still be nil (null), but it isn't an unsafe memory
         | access operation, just another type of potentially useful or
         | potentially errant invocation to handle.
        
           | kristoff_it wrote:
           | Go's slices are more akin to ArrayLists / Vectors in other
           | languages, since they also manage the underlying buffer.
        
           | josephg wrote:
           | Zig and Rust also support array slices. But they can't be
           | null, because that's - as the article says - a mistake.
           | 
           | https://ziglang.org/documentation/master/#Slices
           | 
           | https://doc.rust-lang.org/book/ch04-03-slices.html
        
             | WalterBright wrote:
             | D slices can be null, but they're not a mistake, as the
             | runtime will not let you read/write a 0 length array.
        
             | pjmlp wrote:
             | Other languages avoid the mistake by preventing direct
             | access unless either there is a null check or they are
             | declared as non nullable.
        
         | convolvatron wrote:
         | I think this is actually an artifact of the 'check the single
         | return value for errors' that requires that the return domain
         | contain at least one error code point.
         | 
         | how else you might structure errors without changing much of
         | the rest of C is left as an exercise
        
           | amenghra wrote:
           | just require the programmer to check errno after every
           | function call? /s
        
             | Gibbon1 wrote:
             | Strange but true you can create functions where you pass a
             | pointer to an error variable.
             | 
             |     error_t oops;
             |     int x = foo(10, &oops);
             |     if(oops)
             |       goto whoops;
             | 
             |     int x = foo(10, 0);  // yolo!
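
A self-contained version of that pattern might look as follows. Everything here is assumed for illustration: the `err_code` enum stands in for `error_t`, and this `foo` simply halves a non-negative number. The point is that the error travels through the out-parameter, and a caller can opt out by passing a null pointer.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { ERR_NONE = 0, ERR_RANGE } err_code;

/* Hypothetical foo(): halves n and reports failure through *err.
   Passing NULL for err is allowed and means "don't care". */
static int foo(int n, err_code *err) {
    err_code status = (n < 0) ? ERR_RANGE : ERR_NONE;
    if (err != NULL) *err = status;
    return (status == ERR_NONE) ? n / 2 : 0;
}

/* Careful caller: check the flag before trusting the return value. */
static int half_or_minus_one(int n) {
    err_code oops;
    int x = foo(n, &oops);
    if (oops)
        goto whoops;
    return x;
whoops:
    return -1;
}

/* "yolo" caller: passes NULL, so it must guarantee the precondition itself. */
static int half_unchecked(int n) {
    assert(n >= 0);
    return foo(n, NULL);
}
```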
        
             | convolvatron wrote:
             | omg. I forgot about that. the one thing even worse than
             | overloading the return domain.
        
       | revskill wrote:
       | The compiler should check for null reference before dereferencing
       | here.
        
         | josephg wrote:
         | Why? So the program can crash? It'll usually crash anyway when
         | you read from the first memory page.
        
         | Arch-TK wrote:
         | At the cost of basically all performance or incredible compiler
         | complexity.
         | 
         | A better solution: Implement a different language with a better
         | type system instead. Or pick one of the hundreds that already
         | exist and can represent the concept of a tagged union without
         | having to implement it manually.
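
For comparison, this is roughly what implementing a tagged union manually looks like in C. The type and function names are hypothetical. Nothing stops a caller from reading the wrong union member, so the tag check has to be enforced by convention or, as here, by a runtime assertion in an accessor.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { RES_OK, RES_ERR } res_tag;

/* Hand-rolled tagged union: the tag says which member is valid. */
typedef struct {
    res_tag tag;
    union {
        size_t      index;   /* valid when tag == RES_OK  */
        const char *reason;  /* valid when tag == RES_ERR */
    } as;
} find_result;

/* Linear search that returns either an index or a reason for failure. */
static find_result find_index(const int *xs, size_t n, int needle) {
    find_result r;
    if (xs == NULL) {
        r.tag = RES_ERR;
        r.as.reason = "null array";
        return r;
    }
    for (size_t i = 0; i < n; i++) {
        if (xs[i] == needle) {
            r.tag = RES_OK;
            r.as.index = i;
            return r;
        }
    }
    r.tag = RES_ERR;
    r.as.reason = "not found";
    return r;
}

/* Accessor that enforces the tag check the compiler will not enforce. */
static size_t result_index(find_result r) {
    assert(r.tag == RES_OK);
    return r.as.index;
}
```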
        
           | pjmlp wrote:
           | As proven by languages like Eiffel or Kotlin, it is quite
           | alright.
        
           | kaba0 wrote:
           | I am absolutely in favor of languages that exclude nulls from
           | their type systems, but your first point is a very simplistic
           | take.
           | 
           | Null pointer checks are _very, very_ cheap, no additional
           | memory fetch since the pointer value is needed either way,
           | easy to predict the slow path _and_ if we are talking about
           | languages that can deoptimize, then it is literally free
           | (checked by hardware either way) -> a null value will cause
           | a segfault, which will deoptimize the code on the slow path
           | and continue from there. So it is not a problem from a
           | performance point of view.
        
       | pjmlp wrote:
       | Besides the approach taken by some FP languages, others like
       | Eiffel already fixed it in the 1990's, while having nullable
       | types, by making it a compiler error to access reference types
       | without checking for null, unless the types are declared as
       | non-nullable.
       | 
       | So it just took some time to get mainstream.
        
       | mkoubaa wrote:
       | I have a hard time taking this seriously. The alternative to null
       | pointers isn't no null pointers, it's programmers hand rolling
       | their own conventions around optional pointers, and it would have
       | been a nightmare (at least in C).
        
       | petilon wrote:
       | If null references are a mistake, isn't initializing an array
       | index variable to -1 a mistake too?
        
         | itronitron wrote:
         | Yes, I suppose null references are only a problem if you are
         | unable to write a conditional statement.
        
           | josephg wrote:
           | The problem with null references is you _have to_ write those
           | conditional statements everywhere, otherwise your program
           | might crash. You generally know as the programmer which
           | pointers you expect to be nullable and which should always be
           | an object. But the compiler has no idea, so it can't help
           | make sure you have null checks in all the places you need
           | them.
        
       | dang wrote:
       | Related:
       | 
       |  _Tony Hoare's Null References: The Billion Dollar Mistake_ -
       | https://news.ycombinator.com/item?id=30719472 - March 2022 (13
       | comments)
       | 
       |  _Null References: The Billion Dollar Mistake_ -
       | https://news.ycombinator.com/item?id=22019627 - Jan 2020 (150
       | comments)
       | 
       |  _Null References: The Billion Dollar Mistake - Tony Hoare (2009)
       | [video]_ - https://news.ycombinator.com/item?id=11798518 - May
       | 2016 (79 comments)
       | 
       |  _Tony Hoare / Historically Bad Ideas: "Null References: The
       | Billion Dollar Mistake"_ -
       | https://news.ycombinator.com/item?id=473158 - Feb 2009 (2
       | comments)
        
       | vippy wrote:
       | Big facts. Write Scala, stay frosty.
        
         | dang wrote:
         | Could you please stop posting unsubstantive comments? You've
         | unfortunately done that repeatedly, and we're trying for
         | something else here.
         | 
         | Fortunately you've also posted good comments, so this should be
         | easy to fix.
         | 
         | If you wouldn't mind reviewing
         | https://news.ycombinator.com/newsguidelines.html and taking the
         | intended spirit of the site more to heart, we'd be grateful.
        
       ___________________________________________________________________
       (page generated 2023-02-28 23:01 UTC)