[HN Gopher] The Descent to C (2013)
       ___________________________________________________________________
        
       The Descent to C (2013)
        
       Author : x-sp
       Score  : 111 points
       Date   : 2021-09-01 11:51 UTC (2 days ago)
        
 (HTM) web link (www.chiark.greenend.org.uk)
 (TXT) w3m dump (www.chiark.greenend.org.uk)
        
       | 1vuio0pswjnm7 wrote:
       | "The Python interpreter is written in C, for example."
       | 
       | https://github.com/RustPython/RustPython
        
       | dang wrote:
       | Some past threads:
       | 
       |  _The Descent to C_ -
       | https://news.ycombinator.com/item?id=15445059 - Oct 2017 (2
       | comments)
       | 
       |  _The Descent to C_ -
       | https://news.ycombinator.com/item?id=8127499 - Aug 2014 (15
       | comments)
       | 
       |  _The descent to C_ -
       | https://news.ycombinator.com/item?id=7134798 - Jan 2014 (230
       | comments)
        
       | nihilist_t21 wrote:
       | Hey! The author wrote PuTTY!
       | 
       | That was a very informative read. I feel like I am the exact
       | target audience: I'm coming from a programming background steeped
       | in C# and I'm learning C from the K&R book.
        
         | tempodox wrote:
         | There is also the excellent "Modern C" book:
         | 
         | https://modernc.gforge.inria.fr
         | 
         | The page has a link to a free PDF version.
        
         | besnn00 wrote:
         | Never thought pointers could expire, but it makes sense.
        
           | jandrese wrote:
           | That's a fairly common trap for new C coders. Setting a
           | pointer to something on the stack and then returning that
           | pointer when you exit the function. It's extra insidious
           | because it will appear to be working fine until you call
           | another function and then try to dereference the pointer.
           | Then your data suddenly becomes corrupt even though your
           | program was nowhere near it.
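           |
           | A minimal sketch of the trap and two common ways around it. The
           | function names are made up for illustration, and the broken
           | variant appears only inside a comment, since using its result
           | would be undefined behaviour.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* BROKEN (do not use): buf lives on the stack and expires on return.
 *
 *     char *make_greeting_broken(void) {
 *         char buf[32];
 *         strcpy(buf, "hello");
 *         return buf;   // pointer outlives the object it points to
 *     }
 */

/* Fix 1: the caller owns the storage, so it outlives the call. */
void fill_greeting(char *out, size_t out_size)
{
    assert(out != NULL && out_size > 0);
    snprintf(out, out_size, "%s", "hello");
}

/* Fix 2: allocate on the heap; the object lives until free(). */
char *make_greeting_heap(void)
{
    char *p = malloc(sizeof "hello");
    if (p != NULL)
        strcpy(p, "hello");
    return p;               /* caller must free() this */
}
```

           | With the first fix the caller declares `char buf[32]` and passes
           | it in; with the second the caller is responsible for `free()`.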
        
           | wizzwizz4 wrote:
           | This is (sort of) the idea behind Rust's lifetimes. (Rust
           | focuses more on scope, whereas pointer expiry is more about
           | "object lifetimes", but it's basically the same thing.)
        
             | tialaramex wrote:
             | I don't think that's true, it's just that the lifetime of a
             | stack object is limited by its scope. Clearly when it goes
             | out of scope its lifetime needs to end. You can end that
             | lifetime prematurely, for example if you drop(x) then the
             | lifetime of x ends immediately [ in fact the implementation
             | of drop is entirely empty, but it takes the actual x itself
             | as a parameter, not a reference to it, and so when the
             | empty function exits the parameter's lifetime ends and x
             | goes away ] even though it hasn't gone out of scope in your
             | function where you created x.
             | 
             | I believe once upon a time Rust needed a lot more hand-
             | holding rather than just inferring lifetimes from scope,
             | but in modern Rust the behaviour feels pretty natural for
             | scopes.
        
               | _kst_ wrote:
               | Scope and lifetime are two different things, but they're
               | related in the case of automatic (local) variables.
               | 
               |  _Scope_ is the region of program text in which an
               | identifier is visible.
               | 
               |  _Lifetime_ is the duration during program execution in
               | which an object (stored in memory) exists.
               | 
               | If you define a local variable, the scope of its
               | identifier extends from its declaration to the end of the
               | enclosing block; its lifetime is the execution of the
               | enclosing block.
               | 
               | If you allocate an object on the heap by calling
               | `malloc()`, its lifetime ends when it's deallocated by
               | calling `free()`.
               | 
               | An object defined at file scope or with the `static`
               | keyword has a lifetime that's the entire execution of the
               | program.
               | 
               | In each case, if you have a pointer to the object, that
               | pointer becomes invalid when the object it points to
               | reaches the end of its lifetime. (And C doesn't make it
               | particularly difficult to cause problems by trying to
               | access an object that no longer exists.)
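               |
               | The three lifetimes described above can be sketched side
               | by side. The function names here are hypothetical and only
               | serve to label each kind of storage.

```c
#include <assert.h>
#include <stdlib.h>

/* Static storage duration: one object for the whole program run. */
int next_id(void)
{
    static int counter = 0;   /* initialised once, survives between calls */
    return ++counter;
}

/* Automatic storage duration: a fresh object on every call. */
int fresh_local(void)
{
    int local = 0;            /* lifetime ends when the block exits */
    return ++local;           /* returning the *value* is fine */
}

/* Allocated storage duration: lives from malloc() until free(). */
int *make_boxed_int(int value)
{
    int *p = malloc(sizeof *p);
    if (p != NULL)
        *p = value;
    return p;                 /* still valid after this function returns */
}
```

               | Calling `next_id()` twice yields different values, while
               | `fresh_local()` starts from scratch each time.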
        
               | steveklabnik wrote:
               | Rust used to use lexical scope, but now uses scope based
               | on the control flow graph.
        
       | markhahn wrote:
       | Surprise: there's a whole different reality beneath the
       | convenient illusions of your "high level" programming
       | language.
       | 
       | Hardware is the reality. It's not very much like the Java or
       | Python programming model. We shouldn't hide this from
       | programmers.
        
       | adamrezich wrote:
       | great article, very informative and easy to read. I wish I
       | would've known about this when I was learning C/C++ (around the
       | time it was published no less), coming from higher-level
       | languages like C# and PHP.
        
         | jimbob45 wrote:
         | How is C# higher-level than C? I'm not aware of anything you
         | can do in C that you can't do in C# as a first-class feature.
         | 
         | Edit: I define the level of a language as the lowest-possible
         | feature. I guess others define level as the highest-possible
         | feature? I don't really know who's right here.
        
           | Koshkin wrote:
           | In the same way as C is a higher-level language compared to
           | assembler.
        
           | jonsen wrote:
           | "...a high-level programming language is a programming
           | language with strong abstraction from the details of the
           | computer.":
           | 
           | https://en.m.wikipedia.org/wiki/High-
           | level_programming_langu...
        
           | dahfizz wrote:
           | C# has exceptions, OO classes / interfaces, `unsafe`, etc.
           | Lots of features that make it a higher level language than
           | pure C.
        
       | globular-toast wrote:
       | > No object orientation
       | 
       | I feel like C exists at a level below such concepts. Simply being
       | able to define a function `void do_stuff(struct mystruct *obj)`
       | opens the door to object-oriented style programming. A lot of
       | people seem to define OOP by the presence of superficial stuff
       | like inheritance, polymorphism etc, but really those are
       | additional concepts that aren't useful for every program. The
       | real difference is mutating state on the heap. So you could say C
       | _is_ an object-oriented language, by default, because it doesn't
       | stop you doing this stuff, unlike a higher-level language like
       | Clojure which simply doesn't have mutation (for the most part).
       | Or you could say C is a functional language because if you don't
       | explicitly pass pointers then you get copies. Really it's both
       | and it's neither. It's whatever you want it to be.
        
         | PEJOE wrote:
         | That you can do functional or OOP in C does not make C either
         | kind of language, it just means that C is flexible enough that
         | you can make the computer do things the way you want it to, no
         | matter what that means, and other languages purposefully
         | prevent you from doing what you might want to do.
         | 
         | C++ is object oriented not because it has compile time support
         | for polymorphism or any of that other bad programming practice,
         | but because classes have code sections that live with them,
         | whether on the stack or in the heap, that can operate only on
         | memory belonging to that instance of the class.
         | 
         | Object oriented programming is a coding style and choice. Some
         | languages make it a first class part of the language design. It
         | is purposefully not part of C.
         | 
         | However you can do OOP like things in C: a popular paradigm is
         | to pass around pointers to structs that (should) live in the
         | heap, and to have a number of functions which work on these
         | structs. This is very similar in practice and mental modelling
         | to OOP as users of C++ might know it, but is distinct in that
         | no code ever lives in the stack or heap, and no code is
         | restricted from operating on any of the program memory.
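         |
         | A minimal sketch of that struct-plus-functions style. The
         | `counter` type and its functions are invented for illustration.

```c
#include <assert.h>
#include <stdlib.h>

/* The "class": plain data, no code stored with it. */
struct counter {
    int value;
    int step;
};

/* "Constructor": returns a heap object, or NULL on allocation failure. */
struct counter *counter_new(int step)
{
    struct counter *c = malloc(sizeof *c);
    if (c != NULL) {
        c->value = 0;
        c->step = step;
    }
    return c;
}

/* "Method": an ordinary function taking the object as its first argument. */
int counter_bump(struct counter *c)
{
    assert(c != NULL);
    c->value += c->step;
    return c->value;
}

/* "Destructor". */
void counter_free(struct counter *c)
{
    free(c);
}
```

         | Nothing stops other code from touching `c->value` directly;
         | the encapsulation here is a convention, not a language rule.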
        
           | AnimalMuppet wrote:
           | I mostly agree with you. But "or any of that other bad
           | programming practice"? Polymorphism is not a bad programming
           | practice. Yes, it can be misused. No, that doesn't make it
           | bad in and of itself.
        
           | tialaramex wrote:
           | Hmm. In what sense do you believe that class has a code
           | section that "lives" on the stack or heap?
           | 
           | On a modern system you can't usually do that because of W^X
           | rules (also on a non-x86 modern system the performance would
           | be _abysmal_ if you tried because why waste transistors
           | supporting something only crazy people would want?)
           | 
           | So perhaps _notionally_ in the abstract machine if I have
           | sixteen Clowns in a C++ vector there are sixteen copies of
           | the Clown method squirt_water_at() in the vector too, but I
           | assure you all the compiler emits is one copy of
           | squirt_water_at() for Clowns, to the text segment with the
           | rest of the program code, and _maybe_ if Clowns are virtual,
           | a pointer to a table of such functions lives with each Clown
           | just in case there are Jugglers and LionTamers in the vector
           | too - although compilers can sometimes figure out a rationale
           | for not bothering.
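           |
           | The same layout can be written by hand in C, as a rough
           | sketch: one copy of each function, one table per "class",
           | and a single table pointer in each object. All names below
           | are hypothetical, and real C++ compilers may lay things out
           | differently.

```c
#include <assert.h>
#include <string.h>

struct performer;

/* One table of function pointers per "class", shared by all instances. */
struct performer_vtable {
    const char *(*act)(const struct performer *self);
};

/* Each object carries only a pointer to its class's table. */
struct performer {
    const struct performer_vtable *vt;
};

static const char *clown_act(const struct performer *self)
{
    (void)self;
    return "squirts water";
}

static const char *juggler_act(const struct performer *self)
{
    (void)self;
    return "juggles";
}

const struct performer_vtable clown_vt = { clown_act };
const struct performer_vtable juggler_vt = { juggler_act };

/* Dynamic dispatch: look up the function through the object's table. */
const char *performer_act(const struct performer *p)
{
    assert(p != NULL && p->vt != NULL);
    return p->vt->act(p);
}
```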
        
             | kaba0 wrote:
             | Regarding W^X, doesn't the Linux kernel have some optional
             | expensive debug operation that can be turned on/off through
             | self-modifying code that removes the expensive branching?
        
         | AnimalMuppet wrote:
         | No, mutating stuff on the heap is _not_ the real definition of
         | OOP. That's the definition of "mutable" programming, which is
         | not a term that we use a lot, but it obviously is the opposite
         | of "immutable" programming, which is where you _can't_ change
         | stuff on the heap.
        
       | DannyB2 wrote:
       | The following is NOT a criticism of C. Just pointing out
       | different problem domains.
       | 
       | > Modern high-level languages generally try to arrange that you
       | don't need to think
       | 
       | > or even know - about how the memory in a computer is actually
       | organised
       | 
       | Modern high-level languages try to arrange that you don't need to
       | focus on the irrelevant. If you're working on, say, an accounting
       | system, memory layout is not part of the problem you are trying
       | to solve.
       | 
       | For certain applications, C is simply too low level.
       | 
       | A language is too low level when it forces you to focus on the
       | irrelevant.
       | 
       | For low level operations you probably cannot beat C.
        
         | kaba0 wrote:
         | As per the somewhat famous blog post: C is not a low-level
         | language. Especially on today's CPUs I fail to see why we would
         | consider C anything close to truly low level. It has no real
         | way of managing cache, has absolutely zero support for vector
         | instructions, etc. *
         | 
         | With these in mind, Rust is lower and higher level than C at
         | the same time.
         | 
         | * other than some compiler specific pragmas, but I would be
         | hesitant to call that natively supported
        
           | munificent wrote:
           | _> Especially on today's CPUs I fail to see why we would
           | consider C anything close to truly low level._
           | 
           | It's effectively the lowest you can go _without throwing
           | portability out the window._ It doesn't let you manage
           | caches directly, but it gives you good control over memory
           | layout in general, and that's often enough to give you good
           | cache usage across a variety of chips.
           | 
           | If you want to go lower than that, you're probably looking
           | for assembly.
        
             | tialaramex wrote:
             | It's a poor fit for this role, even if it is in practice
             | what we have available today.
             | 
             | One of my favourite examples is volatile. Volatile is
             | _more_ crazy in C++ but it's pretty crazy even in C. In
             | both cases the standard basically shrugs, "Hope you know
             | what you're doing because we sure don't" and offers no real
             | insight into what this feature promises to do for you. But
             | there is no other mechanism provided for MMIO.
             | 
             | Think of Rust's std::ptr::read_volatile (and
             | write_volatile). These intrinsics do the thing you actually
             | wanted (reading, or writing, a fixed size blob of "memory"
             | that presumably wasn't really just RAM) and thus are
             | important for writing device drivers and so on with MMIO.
             | 
             | [ You may be thinking, "But I need the _correct_ size of
             | blob read or written or my driver won't work". Rust has
             | generics, so these functions are generic over the integer
             | type you're reading/writing: if you read a u64 that's a
             | 64-bit read, if you write four u8s that's 4 x 8-bit writes
             | and so on ]
             | 
             | But C's volatile is a type qualifier instead. Why? Would it
             | mean anything to, for example, integer divide an MMIO fetch
             | by fifteen and write it back? a/= 15; No. So then why make
             | it a type qualifier? When volatile was added to C they'd
             | only just invented simple optimisations like re-ordering so
             | they had no idea this was a bad idea, and it seemed simpler
             | than adding an intrinsic (though not by much) but today we
             | know better.
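             |
             | For comparison, C code can approximate the intrinsic style
             | by funnelling every access through small helpers. This is
             | only a sketch: `mmio_read32`, `mmio_write32` and the fake
             | register are hypothetical names, and an ordinary variable
             | stands in for real device memory.

```c
#include <assert.h>
#include <stdint.h>

/* Perform a 32-bit read through a volatile-qualified lvalue. */
static inline uint32_t mmio_read32(const volatile uint32_t *reg)
{
    return *reg;
}

/* Perform a 32-bit write through a volatile-qualified lvalue. */
static inline void mmio_write32(volatile uint32_t *reg, uint32_t value)
{
    *reg = value;
}

/* Stand-in for a device register; real code would use a fixed address
 * from the hardware manual instead of an ordinary variable. */
volatile uint32_t fake_status_register = 0;

/* Set one bit with an explicit read followed by an explicit write,
 * rather than hiding both accesses behind `reg |= mask`. */
void set_status_bit(unsigned bit)
{
    assert(bit < 32);
    uint32_t v = mmio_read32(&fake_status_register);
    mmio_write32(&fake_status_register, v | (UINT32_C(1) << bit));
}
```

             | The qualifier still lives on the pointer type, but callers
             | see each read and write as a separate, named operation.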
        
       | smcameron wrote:
       | > (In fact, that's what array[i] means - the language defines it
       | to be a synonym for *(array+i).)
       | 
       | To really drive home the primitiveness of C arrays, should
       | probably also mention that, because addition is commutative, you
       | _could_ also write
       |
       |     i[array]
       | 
       | and somewhat surprisingly, it will compile, and work, and it
       | means "*(i + array)" which is equivalent to "*(array + i)"
       | 
       | But nobody really does that, because that would be kind of
       | insane.
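       |
       | A small sketch that exercises all three spellings on the same
       | array. The function name is made up; the asserts simply confirm
       | that the forms agree.

```c
#include <assert.h>
#include <stddef.h>

/* Sum an array while spelling each access three equivalent ways. */
int sum_three_ways(const int *array, size_t n)
{
    int total = 0;
    for (size_t i = 0; i < n; i++) {
        assert(array[i] == *(array + i));   /* the definition */
        assert(array[i] == i[array]);       /* same sum, operands swapped */
        total += i[array];
    }
    return total;
}
```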
        
         | lifthrasiir wrote:
         | If array were, say, uint32_t* then what `*(array + i)` would do
         | is actually `(intptr_t)array + i * 4` and not `(intptr_t)array
         | + i`. If array were uint16_t* then it's `(intptr_t)array + i *
         | 2`. In short the way the pointer arithmetic gets translated
         | greatly depends on the type of the pointee and thus is not as
         | primitive as it can be.
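         |
         | The scaling can be observed by measuring the distance in bytes
         | between `p` and `p + i`. A sketch with hypothetical helper
         | names:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Byte distance between p and p + i for 32-bit elements. */
ptrdiff_t byte_offset_u32(const uint32_t *p, size_t i)
{
    const char *base = (const char *)p;
    const char *elem = (const char *)(p + i);   /* scaled by sizeof *p */
    return elem - base;
}

/* Same thing for 16-bit elements. */
ptrdiff_t byte_offset_u16(const uint16_t *p, size_t i)
{
    return (const char *)(p + i) - (const char *)p;
}
```

         | The pointer passed in must point into an array with at least
         | `i` elements, since forming a pointer further out is undefined.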
        
           | kzrdude wrote:
           | I think it depends on the coder if it makes more sense to
           | them to write out the scale by element size or not. For me,
           | the `+` is just pointer arithmetic and of course an addition
           | in number of elements (so I don't think of the scaling at
           | all).
           | 
           | Just saying that "actually it's just array + i" makes more
           | sense - for me(!).
        
         | Zababa wrote:
         | This is actually a really good way to drive home that arrays
         | don't really "exist" in C and are just syntactic sugar for
         | pointer arithmetic.
        
         | unwind wrote:
         | Once the point is home, you can drive it a little bit more by
         | exploiting the fact that string literals can be converted to
         | pointers to the first characters, and do
         | putc(2["ABCDEF"], stdout);
         | 
         | This prints 'C'.
        
           | [deleted]
        
           | [deleted]
        
           | danielozcpp wrote:
           | int const* const x; // C
           | 
           | int const& x; // C++
           | 
           | A reference is functionally equivalent to a const pointer.
           | (Reference reassignment is disallowed. Likewise, you cannot
           | reassign a const pointer. A const pointer is meant to keep
           | its pointee [address].) The difference between them is that
           | C++ const references also allow non-lvalue arguments
           | (temporaries).
           | 
           | It is much easier to read from right to left when decoding
           | types. Look for yourself:
           | 
           | - double (* const convert_to_deg)(double const x) // const
           | pointer to function taking a const double and returning
           | double
           | 
           | - int const (* ptr_to_arr)[42]; // pointer to array of 42
           | const ints
           | 
           | - int const * arr_of_ptrs[42]; // array of 42 pointers to
           | const ints
           | 
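           | - int fun_returning_array_of_ints()[42]; // function returning
           | array of 42 ints (reads this way, but C forbids functions
           | that return arrays)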
           | 
           | Try it out yourself: https://cdecl.org/
           | 
           | Hence, I am an "East conster". (Many people are "West
           | consters" though.)
           | 
           | You can return function pointers:
           | 
           |     typedef struct player_t player_t; // let it be opaque ;)
           |
           |     int game_strategy1(player_t const * const p)
           |     {
           |         /* Eliminate player */
           |         return 666;
           |     }
           |
           |     int game_strategy2(player_t const * const p)
           |     {
           |         /* Follow player */
           |         return 007;
           |     }
           |
           |     int (* const game_strategy(int const strategy_to_use))(player_t const * const p)
           |     {
           |         if (strategy_to_use == 0)
           |             return &game_strategy1;
           |         return &game_strategy2;
           |     }
           | 
           | Functional programming = immutable (const) values + pure
           | functions (no side effects).
           | 
           | Consting for me is also a form of
           | documentation/specification.
           | 
           | "East const" for life! :)
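           |
           | The declarator for a function returning a function pointer
           | is usually easier to read with a typedef. A sketch of that
           | alternative, where `strategy_fn` and the function names are
           | hypothetical:

```c
#include <assert.h>
#include <stddef.h>

typedef struct player_t player_t;            /* opaque, as in the comment */

/* Name the function-pointer type once; declarators then read normally. */
typedef int (*strategy_fn)(player_t const *p);

static int eliminate_player(player_t const *p) { (void)p; return 666; }
static int follow_player(player_t const *p)    { (void)p; return 7; }

/* Returns a pointer to the chosen strategy function. */
strategy_fn choose_strategy(int strategy_to_use)
{
    return strategy_to_use == 0 ? eliminate_player : follow_player;
}
```

           | A caller can then write `choose_strategy(0)(player)` to pick
           | and invoke a strategy in one expression.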
        
             | harry8 wrote:
             | 10 or 42?
        
               | danielozcpp wrote:
               | Thank you. 42. I edited my comment above.
        
         | bachmeier wrote:
         | > nobody really does that, because that would be kind of insane
         | 
         | Yeah. That's why people don't do things in C. It's more like
         | most C programmers probably weren't aware of this. After your
         | comment, we'll start to see C codebases everywhere with that.
        
           | dahfizz wrote:
           | I learned of this in college. Most C programmers know how C
           | arrays work.
        
           | LeifCarrotson wrote:
           | Most _programmers_ probably weren't aware of this, but no
           | true Scottish C programmer would be unaware of it.
           | 
           | If you want to manipulate memory directly - which is risky
           | though sometimes useful - C is one of the best languages in
           | which to do it. Memory addresses are numbers, and C will let
           | you work with those numbers in whatever way you want: add,
           | subtract, multiply, divide... and if you didn't shudder at
           | the suggestion of dividing a pointer because there are very,
           | very few reasons to do so then C is not the language for you!
           | 
           | If you don't want to manipulate memory directly, you probably
           | shouldn't be using C; stick with a nice garbage-collected,
           | type-safe, object-oriented, cross-platform language. If you
           | do want to manipulate memory directly, but you want more
           | guarantees on what you can do with pointers, try Rust.
        
             | [deleted]
        
             | bluejekyll wrote:
             | Just wanted to clarify that Rust allows for the same
             | manipulation as C, it just all has to happen in the context
             | of an unsafe code block. I think your comment might be
             | taken to imply that Rust isn't as capable as C in that
             | regard.
        
           | junon wrote:
           | Eh? Says who? Every C programmer I've ever met knows about
           | this. It's basic C.
        
             | jart wrote:
             | Yeah you have to do something like this if you want to
             | truly raise eyebrows.
             |
             |     /\
             |     */ best c comment
             |     *\
             |     /
        
               | Zababa wrote:
               | One trick that I like is replacing { and } with <% and
               | %>.
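               |
               | A short sketch of what that looks like, using the
               | standard digraphs `<%` `%>` for braces and `<:` `:>` for
               | brackets. The function name is made up.

```c
#include <assert.h>

/* Written with digraphs: <% %> for { }, <: :> for [ ]. */
int sum_first_three(void)
<%
    int values<:3:> = <% 1, 2, 3 %>;
    int total = 0;
    for (int i = 0; i < 3; i++)
        total += values<:i:>;
    return total;
%>
```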
        
               | junon wrote:
               | Wow, a new C trick I didn't know about. What's the
               | history here?
               | 
               | Also, gross.
        
           | colejohnson66 wrote:
           | This "trick" has been known at least as far back as 2008:
           | https://stackoverflow.com/questions/381542/with-arrays-
           | why-i...
        
             | tialaramex wrote:
             | It's actually spelled out in K&R in more technical
             | language:
             | 
             | "The array subscripting operation is defined so that E1[E2]
             | is identical to *(E1+E2). Therefore, despite its
             | asymmetrical appearance, subscripting is a _commutative
             | operation_. "
             | 
             | (my emphasis)
        
             | lifthrasiir wrote:
             | One of the very first IOCCC winners used the trick in 1984
             | (1984/anonymous):
             |
             |     int i;main(){for(;i["]<i;++i){--i;}"];read('-'-'-',i+++"hell\
             |     o, world!\n",'/'/'/'));}read(j,i,p){write(j/p+p,i---j,i/i);}
        
               | jakeva wrote:
               | what does this program do? i compiled it and ran it and
               | got no output. i hope i haven't fork bombed myself or
               | something
        
               | lifthrasiir wrote:
               | Prints "hello, world!", assuming sizeof(int) ==
               | sizeof(char*) (and the same alignments and ABI, etc).
        
               | jfrunyon wrote:
               | I think this is equivalent to:
               |
               |     int i;
               |     int main(){
               |         for(
               |             ;
               |             i["]<i;++i){--i;}"]; // loop until i == 14? (the NUL byte on the end)
               |             read('-'-'-',i+++"hello, world!\n",'/'/'/') // read(0, <one byte of "hello, world!\n" at a time>, 1)
               |         ) {};
               |     }
               |     int read(j,i,p){
               |         write(j/p+p,i---j,i/i); // write(0/1+1, i-- - 0, 1) --> write(1, i, 1) --> write a byte to STDOUT
               |     }
        
         | travelbuffoon wrote:
         | Speaking of such funny business: unfortunately I have seen
         | *(array + i) quite a few times.
         | 
         | To make it worse, it's in a parser for external data in binary
         | format, where you really shouldn't be playing funny tricks.
        
         | a3n wrote:
         | This sub-thread is better than a Vim thread. :)
        
         | lebuffon wrote:
         | Interesting. Looks like Forth. :-)
        
       | aidenn0 wrote:
       | Some nitpicks with the article:
       | 
       | 1. Much of section 2 is wrong except the part about arrays
       | representing contiguous objects. The rest is largely an
       | implementation detail.
       | 
       | Zeta C and Vacietis work considerably differently, as allowed by
       | the standard
       | 
       | In addition there are many (mostly obsolete now) architectures in
       | which, when you convert a pointer to an integer, you can't
       | perform arithmetic and convert back because a pointer isn't just
       | an integer address; it could represent segments or support
       | hardware tags.
       | 
       | > C will typically let you just construct any pointer value you
       | like by casting an integer to a pointer type, or by taking an
       | existing pointer to one type and casting it so that it becomes a
       | pointer to an entirely different type.
       | 
       | To be fair, they do say "typically" in here, but these behaviors
       | are (depending on the case) all either implementation defined or
       | undefined; the C standard specifies a union as the only well-
       | defined way to type-pun to non character types.
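       |
       | A sketch of the union approach, reading the bits of a float.
       | It assumes `float` is 32 bits wide, which holds on mainstream
       | platforms but is not guaranteed. A `memcpy` variant is included
       | for comparison, since copying bytes is also commonly treated as
       | a well-defined route. Function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Assumes float is 32 bits wide (true on mainstream platforms). */

/* Type punning through a union: write one member, read the other. */
uint32_t float_bits_union(float f)
{
    union { float f; uint32_t u; } pun;
    pun.f = f;
    return pun.u;
}

/* Copying the bytes with memcpy is another commonly used route. */
uint32_t float_bits_memcpy(float f)
{
    uint32_t u;
    assert(sizeof u == sizeof f);
    memcpy(&u, &f, sizeof u);
    return u;
}
```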
       | 
       | > The undefined-behaviour problem with integer overflow wouldn't
       | happen in machine code; that's a consequence of C needing to run
       | fast on lots of very different kinds of computer, which is a
       | problem machine code doesn't even try to solve
       | 
       | Some architectures trap on integer overflow, which I suspect is
       | the reason why integer overflow is undefined rather than
       | implementation defined. Certainly compilers _today_ take
       | advantage of the fact that it is undefined to make certain useful
       | optimizations, but from what I can tell of the history that's
       | not why it was undefined in the first place.
        
         | junon wrote:
         | To be clear, _signed_ integer overflow is undefined. _Unsigned_
         | integer overflow is well defined.
         | 
         | This is why some C programmers dictate that all code must use
         | signed integers to avoid unexpected bugs, but many others
         | (including myself) disagree that's a good way of going about it
         | since, as you said, it's not guaranteed to trap or do anything
         | to help the programmer.
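         |
         | The difference can be sketched as follows: unsigned arithmetic
         | may simply wrap, while signed arithmetic has to be checked
         | before the operation. `checked_add` is a hypothetical helper,
         | not a standard function.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/* Unsigned arithmetic wraps modulo 2^N; this is fully defined. */
unsigned wrapping_increment(unsigned x)
{
    return x + 1u;              /* UINT_MAX + 1u == 0u */
}

/* Signed overflow is undefined, so test *before* adding.
 * Returns false (and leaves *out alone) if a + b would not fit. */
bool checked_add(int a, int b, int *out)
{
    assert(out != NULL);
    if ((b > 0 && a > INT_MAX - b) || (b < 0 && a < INT_MIN - b))
        return false;
    *out = a + b;
    return true;
}
```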
        
           | aidenn0 wrote:
           | I've never seen signed-integers being required. They tend to
           | introduce unexpected bugs rather than prevent them. Of course
           | you can accidentally end up with signed integers when you
           | didn't intend it. Here's my favorite accidental undefined
           | signed integer overflow, assuming 32-bit integer size (the
           | 64-bit version is similar)
           |
           |     uint32_t foo(uint8_t x) { return x << 24; }
           |
           | Yes, this is signed integer overflow, since x gets promoted
           | to a _signed_ integer before the shift. If an integer is
           | 32 bits in size, this can result in undefined behavior if the
           | top bit of x is set. Fortunately I've never seen a compiler
           | optimize this to stupidity.
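           |
           | One way to sidestep the promotion, sketched under the same
           | 32-bit `int` assumption, is to convert to the unsigned
           | destination type before shifting. The function name is
           | hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Convert to the destination type *before* shifting, so that with a
 * 32-bit int the shift happens in unsigned arithmetic, not signed int. */
uint32_t to_top_byte(uint8_t x)
{
    return (uint32_t)x << 24;
}
```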
        
             | junon wrote:
             | If memory serves, the LLVM Project prescribes signed
             | integers be used in all cases except where unsigned is
             | mandatory.
        
       | Animats wrote:
       | It's not inherent in C being a low-level language that it has
       | such a painful memory model. It's a consequence of having to fit
       | the compiler into a really tiny machine by modern standards.
       | Originally, there were no function prototypes.
       | 
       | I once proposed a backwards-compatible way out for C.[1]
       | Basically, you get to talk about arrays as objects with a length,
       | even if that length is in some other variable. And you get slices
       | and references. It was discussed enough to establish that it
       | could work, but the effort to push it forward wasn't worth it.
       | 
       | Slices are important. They let you do most of the things people
       | do with pointer arithmetic. Once you have size and slices, you
       | can still operate near the memory address level.
       | 
       | [1] http://www.animats.com/papers/languages/safearraysforc43.pdf
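       |
       | For readers unfamiliar with the idea, a slice can be emulated in
       | today's C as a pointer that travels with its length. This is
       | only an illustrative sketch with invented names; it is not the
       | syntax or semantics of the proposal linked above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* A pointer that travels together with its length. */
struct int_slice {
    int *data;
    size_t len;
};

/* Sub-range [start, end) of an existing slice; empty slice if out of range. */
struct int_slice slice_of(struct int_slice s, size_t start, size_t end)
{
    struct int_slice out = { NULL, 0 };
    if (s.data != NULL && start <= end && end <= s.len) {
        out.data = s.data + start;
        out.len = end - start;
    }
    return out;
}

/* Bounds-checked read: returns false instead of running off the end. */
bool slice_get(struct int_slice s, size_t i, int *out)
{
    assert(out != NULL);
    if (i >= s.len)
        return false;
    *out = s.data[i];
    return true;
}
```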
        
       | AceJohnny2 wrote:
       | > _So why is C like this, anyway?_
       | 
       | Worth mentioning that C is over 40 years old, and was designed to
       | be easily portable across a range of machines that had less
       | compute power and memory than today's smaller microcontrollers.
       | 
       | As a result, a lot of things were left undefined, or were
       | designed in a way to be _easy to implement_ rather than _easy to
       | program for_.
       | 
       | There existed other programming languages that were better, but
       | their compilers weren't as broadly available, and their better
       | features came at the cost of speed, which at the time was a
       | premium.
        
         | fsckboy wrote:
         | > left undefined, or were designed in a way to be easy to
         | implement rather than easy to program for.
         | 
         | I'd tweak your statement a little, or even a lot: "left
         | undefined" most often meant "left to be defined by the compiler
         | writers to fit the architecture of the underlying hardware, in
         | a way that would make it easy to program to beneficially
         | exploit features of the architecture"; and (yes) in a way "that
         | would not be very portable, and might even be subject to change
         | between compilers".
         | 
         | did the underlying machine use 2's complement? did the
         | underlying machine have addressable bytes? big endian? 8, 16,
         | 32 or 36 bits?
         | 
         | These are all things you need to know to write tight efficient
         | code in the days of slow clockspeeds and limited RAM. C let you
         | do that without using assembly, but by using the "undefined"
         | features of the language, because they were clearly defined
         | locally and were features that were very important to be easy
         | to write code for.
         | 
         | consider how you would implement setjmp and longjmp, or even
         | printf, or efficiently unpack or serialize bits for a
         | communications protocol, without these supposedly "undefined
         | features", or how you would write those if those features were
         | actually undefined. People who put strlen(str) or a divide and
         | a mod in the control expression for a loop would know better if
         | they understood a bit more about the undefined features.
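         |
         | As a small illustration of the serialization case: byte order
         | is one of those locally defined properties. The sketch below
         | (hypothetical helper names) shows a probe that reports what
         | the host does, next to shift-based packing that gives the same
         | wire format on any host.

```c
#include <assert.h>
#include <stdint.h>

/* Store a 32-bit value most-significant byte first, whatever the
 * host's own byte order is. */
void put_be32(uint8_t out[4], uint32_t v)
{
    out[0] = (uint8_t)(v >> 24);
    out[1] = (uint8_t)(v >> 16);
    out[2] = (uint8_t)(v >> 8);
    out[3] = (uint8_t)v;
}

/* Inverse of put_be32. */
uint32_t get_be32(const uint8_t in[4])
{
    return ((uint32_t)in[0] << 24) | ((uint32_t)in[1] << 16) |
           ((uint32_t)in[2] << 8)  |  (uint32_t)in[3];
}

/* Reports the host byte order by inspecting an object's first byte. */
int host_is_little_endian(void)
{
    const uint16_t probe = 1;
    return *(const unsigned char *)&probe == 1;
}
```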
         | 
         | this is in contrast btw with some other things that actually
         | are undefined, such as what the order of evaluation would be
         | for complex expressions making up argument lists, etc.
         | 
         | I'm writing this explanation not so much to explain these
         | technical details to noobs, but rather to get the people who
         | understand this stuff to stop throwing around the term
         | "undefined" with regard to C because they are cooperating in
         | the evisceration of some ideas that are really worth exploring
         | or understanding more deeply.
        
       ___________________________________________________________________
       (page generated 2021-09-03 23:02 UTC)