[HN Gopher] The Descent to C (2013)
___________________________________________________________________
The Descent to C (2013)
Author : x-sp
Score : 111 points
Date : 2021-09-01 11:51 UTC (2 days ago)
(HTM) web link (www.chiark.greenend.org.uk)
(TXT) w3m dump (www.chiark.greenend.org.uk)
| 1vuio0pswjnm7 wrote:
| "The Python interpreter is written in C, for example."
|
| https://github.com/RustPython/RustPython
| dang wrote:
| Some past threads:
|
| _The Descent to C_ -
| https://news.ycombinator.com/item?id=15445059 - Oct 2017 (2
| comments)
|
| _The Descent to C_ -
| https://news.ycombinator.com/item?id=8127499 - Aug 2014 (15
| comments)
|
| _The descent to C_ -
| https://news.ycombinator.com/item?id=7134798 - Jan 2014 (230
| comments)
| nihilist_t21 wrote:
| Hey! The author wrote PuTTY!
|
| That was a very informative read. I feel like I am the exact
| target audience: I'm coming from a programming background steeped
| in C# and I'm learning C from the K&R book.
| tempodox wrote:
| There is also the excellent "Modern C" book:
|
| https://modernc.gforge.inria.fr
|
| The page has a link to a free PDF version.
| besnn00 wrote:
| Never thought pointers could expire, but it makes sense.
| jandrese wrote:
| That's a fairly common trap for new C coders. Setting a
| pointer to something on the stack and then returning that
| pointer when you exit the function. It's extra insidious
| because it will appear to be working fine until you call
| another function and then try to dereference the pointer.
| Then your data suddenly becomes corrupt even though your
| program was nowhere near it.
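A minimal sketch of the trap described above, with two common ways around it. The function names are made up for illustration. The broken version appears only inside a comment, because dereferencing its result is undefined behaviour.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* BROKEN (do not use): buf lives on the stack and stops existing when
 * the function returns, so the caller receives a dangling pointer.
 *
 *     char *make_greeting_broken(const char *name) {
 *         char buf[64];
 *         snprintf(buf, sizeof buf, "hello, %s", name);
 *         return buf;
 *     }
 */

/* Fix 1: the caller owns the storage and passes it in. */
void make_greeting_into(char *out, size_t out_size, const char *name) {
    snprintf(out, out_size, "hello, %s", name);
}

/* Fix 2: allocate on the heap; the caller must free() the result.
 * Returns NULL if the allocation fails. */
char *make_greeting_heap(const char *name) {
    size_t len = strlen("hello, ") + strlen(name) + 1; /* +1 for NUL */
    char *out = malloc(len);
    if (out != NULL) {
        snprintf(out, len, "hello, %s", name);
    }
    return out;
}
```

A caller would either declare `char buf[32];` and call `make_greeting_into(buf, sizeof buf, "world")`, or call `make_greeting_heap("world")` and later `free()` the result.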
| wizzwizz4 wrote:
| This is (sort of) the idea behind Rust's lifetimes. (Rust
| focuses more on scope, whereas pointer expiry is more about
| "object lifetimes", but it's basically the same thing.)
| tialaramex wrote:
| I don't think that's true, it's just that the lifetime of a
| stack object is limited by its scope. Clearly when it goes
| out of scope its lifetime needs to end. You can end that
| lifetime prematurely, for example if you drop(x) then the
| lifetime of x ends immediately [ in fact the implementation
| of drop is entirely empty, but it takes the actual x itself
| as a parameter, not a reference to it, and so when the
| empty function exits the parameter's lifetime ends and x
| goes away ] even though it hasn't gone out of scope in your
| function where you created x.
|
| I believe once upon a time Rust needed a lot more hand-
| holding rather than just inferring lifetimes from scope,
| but in modern Rust the behaviour feels pretty natural for
| scopes.
| _kst_ wrote:
| Scope and lifetime are two different things, but they're
| related in the case of automatic (local) variables.
|
| _Scope_ is the region of program text in which an
| identifier is visible.
|
| _Lifetime_ is the duration during program execution in
| which an object (stored in memory) exists.
|
| If you define a local variable, the scope of its
| identifier extends from its declaration to the end of the
| enclosing block; its lifetime is the execution of the
| enclosing block.
|
| If you allocate an object on the heap by calling
|       `malloc()`, its lifetime ends when it's deallocated by
| calling `free()`.
|
| An object defined at file scope or with the `static`
| keyword has a lifetime that's the entire execution of the
| program.
|
|       In each case, if you have a pointer to the object, that
| pointer becomes invalid when the object it points to
| reaches the end of its lifetime. (And C doesn't make it
| particularly difficult to cause problems by trying to
| access an object that no longer exists.)
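The three lifetimes listed above can be sketched in a few lines of C. All identifiers here are hypothetical and chosen only for the example.

```c
#include <stdlib.h>

/* File scope: static storage, alive for the whole program run. */
int total_calls = 0;

/* The identifier `calls` is only in scope inside this function, but the
 * object it names has static storage duration, so it keeps its value
 * between calls. Scope and lifetime differ here. */
int count_calls(void) {
    static int calls = 0;
    calls += 1;
    total_calls += 1;
    return calls;
}

/* `local` has automatic storage: its lifetime is one execution of the
 * block. Returning its value (a copy) is fine; returning its address
 * would hand out a pointer that is already invalid. */
int square(int x) {
    int local = x * x;
    return local;
}

/* Allocated storage: the object lives from malloc() until free(),
 * regardless of which scopes come and go in between.
 * Returns NULL if the allocation fails. */
int *make_boxed_int(int value) {
    int *p = malloc(sizeof *p);
    if (p != NULL) {
        *p = value;
    }
    return p;
}
```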
| steveklabnik wrote:
| Rust used to use lexical scope, but now uses scope based
| on the control flow graph.
| markhahn wrote:
| Surprise: there's a whole bunch of different reality under the
| convenient illusions of your "high level" programming
| language.
|
| Hardware is the reality. It's not very much like the Java or
| Python programming model. We shouldn't hide this from
| programmers.
| adamrezich wrote:
| great article, very informative and easy to read. I wish I
| would've known about this when I was learning C/C++ (around the
| time it was published no less), coming from higher-level
| languages like C# and PHP.
| jimbob45 wrote:
| How is C# higher-level than C? I'm not aware of anything you
| can do in C that you can't do in C# as a first-class feature.
|
| Edit: I define the level of a language as the lowest-possible
| feature. I guess others define level as the highest-possible
| feature? I don't really know who's right here.
| Koshkin wrote:
| In the same way as C is a higher-level language compared to
| assembler.
| jonsen wrote:
| "...a high-level programming language is a programming
| language with strong abstraction from the details of the
| computer.":
|
| https://en.m.wikipedia.org/wiki/High-
| level_programming_langu...
| dahfizz wrote:
| C# has exceptions, OO classes / interfaces, `unsafe`, etc.
| Lots of features that make it a higher level language than
| pure C.
| globular-toast wrote:
| > No object orientation
|
| I feel like C exists at a level below such concepts. Simply being
| able to define a function `void do_stuff(struct mystruct *obj)`
| opens the door to object-oriented style programming. A lot of
| people seem to define OOP by the presence of superficial stuff
| like inheritance, polymorphism etc, but really those are
| additional concepts that aren't useful for every program. The
| real difference is mutating state on the heap. So you could say C
| _is_ an object-oriented language, by default, because it doesn't
| stop you doing this stuff, unlike a higher-level language like
| Clojure which simply doesn't have mutation (for the most part).
| Or you could say C is a functional language because if you don't
| explicitly pass pointers then you get copies. Really it's both
| and it's neither. It's whatever you want it to be.
| PEJOE wrote:
| That you can do functional or OOP in C does not make C either
| kind of language, it just means that C is flexible enough that
| you can make the computer do things the way you want it to, no
| matter what that means, and other languages purposefully
| prevent you from doing what you might want to do.
|
| C++ is object oriented not because it has compile time support
| for polymorphism or any of that other bad programming practice,
| but because classes have code sections that live with them,
| whether on the stack or in the heap, that can operate only on
| memory belonging to that instance of the class.
|
| Object oriented programming is a coding style and choice. Some
| languages make it a first class part of the language design. It
| is purposefully not part of C.
|
| However you can do OOP like things in C: a popular paradigm is
| to pass around pointers to structs that (should) live in the
| heap, and to have a number of functions which work on these
| structs. This is very similar in practice and mental modelling
| to OOP as users of C++ might know it, but is distinct in that
| no code ever lives in the stack or heap, and no code is
| restricted from operating on any of the program memory.
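The struct-plus-functions style described above might look like the following sketch. The `counter` type and its functions are invented for illustration; nothing stops other code from touching the struct's fields directly, which is the distinction the comment draws.

```c
#include <stdlib.h>

/* A hypothetical "class": plain data, no code stored with it. */
struct counter {
    int value;
    int step;
};

/* "Constructor": allocate on the heap; returns NULL on failure. */
struct counter *counter_new(int step) {
    struct counter *c = malloc(sizeof *c);
    if (c != NULL) {
        c->value = 0;
        c->step = step;
    }
    return c;
}

/* "Methods": ordinary functions taking the object as first argument. */
void counter_bump(struct counter *c) { c->value += c->step; }
int counter_get(const struct counter *c) { return c->value; }

/* "Destructor": release the heap storage. */
void counter_free(struct counter *c) { free(c); }
```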
| AnimalMuppet wrote:
| I mostly agree with you. But "or any of that other bad
| programming practice"? Polymorphism is not a bad programming
| practice. Yes, it can be misused. No, that doesn't make it
| bad in and of itself.
| tialaramex wrote:
| Hmm. In what sense do you believe that class has a code
| section that "lives" on the stack or heap?
|
| On a modern system you can't usually do that because of W^X
| rules (also on a non-x86 modern system the performance would
| be _abysmal_ if you tried because why waste transistors
| supporting something only crazy people would want?)
|
| So perhaps _notionally_ in the abstract machine if I have
| sixteen Clowns in a C++ vector there are sixteen copies of
| the Clown method squirt_water_at() in the vector too, but I
| assure you all the compiler emits is one copy of
| squirt_water_at() for Clowns, to the text segment with the
| rest of the program code, and _maybe_ if Clowns are virtual,
| a pointer to a table of such functions lives with each Clown
| just in case there are Jugglers and LionTamers in the vector
| too - although compilers can sometimes figure out a rationale
| for not bothering.
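A rough C rendering of the layout described above: one shared table of function pointers per type, and one pointer to that table in each instance. This is a sketch of the general idea, not what any particular C++ compiler emits, and every name in it is hypothetical.

```c
struct performer;                      /* forward declaration */

/* One table of function pointers per "class", shared by all instances. */
struct performer_vtable {
    const char *(*act)(const struct performer *self);
};

/* Each instance carries only its data plus a pointer to the table. */
struct performer {
    const struct performer_vtable *vt;
    int skill;
};

const char *clown_act(const struct performer *self) {
    (void)self;                        /* unused in this toy example */
    return "squirts water";
}

const char *juggler_act(const struct performer *self) {
    (void)self;
    return "juggles";
}

/* The code exists once; these tables just point at it. */
const struct performer_vtable clown_vt = { clown_act };
const struct performer_vtable juggler_vt = { juggler_act };

/* Dynamic dispatch: look the function up through the instance. */
const char *perform(const struct performer *p) {
    return p->vt->act(p);
}
```

Sixteen clowns in an array would hold sixteen copies of the `vt` pointer, all pointing at the single `clown_vt` table.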
| kaba0 wrote:
|       Regarding W^X, doesn't the Linux kernel have some optional
| expensive debug operation that can be turned on/off through
| a self-modifying code removing the expensive branching?
| AnimalMuppet wrote:
| No, mutating stuff on the heap is _not_ the real definition of
|   OOP. That's the definition of "mutable" programming, which is
|   not a term that we use a lot, but it obviously is the opposite
|   of "immutable" programming, which is where you _can't_ change
| stuff on the heap.
| DannyB2 wrote:
| The following is NOT a criticism of C. Just pointing out
| different problem domains.
|
| > Modern high-level languages generally try to arrange that you
| don't need to think
|
| > or even know - about how the memory in a computer is actually
| organised
|
| Modern high-level languages try to arrange that you don't need to
| focus on the irrelevant. If you're working on, say, an accounting
| system, memory layout is not part of the problem you are trying
| to solve.
|
| For certain applications, C is simply too low level.
|
| A language is too low level when it forces you to focus on the
| irrelevant.
|
| For low level operations you probably cannot beat C.
| kaba0 wrote:
| As per the somewhat famous blog post: C is not a low-level
|   language. Especially on today's CPUs I fail to see why we would
| consider C anything close to truly low level. It has no real
| way of managing cache, has absolutely zero support for vector
| instructions, etc. *
|
| With these in mind, Rust is lower and higher level than C at
| the same time.
|
| * other than some compiler specific pragmas, but I would be
| hesitant to call that natively supported
| munificent wrote:
|     _> Especially on today's CPUs I fail to see why we would
| consider C anything close to truly low level._
|
| It's effectively the lowest you can go _without throwing
|     portability out the window._ It doesn't let you manage
| caches directly, but it gives you good control over memory
| layout in general, and that's often enough to give you good
| cache usage across a variety of chips.
|
| If you want to go lower than that, you're probably looking
| for assembly.
| tialaramex wrote:
| It's a poor fit for this role, even if it is in practice
| what we have available today.
|
| One of my favourite examples is volatile. Volatile is
|       _more_ crazy in C++ but it's pretty crazy even in C. In
| both cases the standard basically shrugs, "Hope you know
| what you're doing because we sure don't" and offers no real
| insight into what this feature promises to do for you. But
| there is no other mechanism provided for MMIO.
|
| Think of Rust's std::ptr::read_volatile (and
| write_volatile). These intrinsics do the thing you actually
| wanted (reading, or writing, a fixed size blob of "memory"
| that presumably wasn't really just RAM) and thus are
| important for writing device drivers and so on with MMIO.
|
|       [ You may be thinking, "But I need the _correct_ size of
|       blob read or written or my driver won't work", Rust has
|       generics, so these functions are generic over the integer
|       type you're reading/writing, if you read a u64 that's a
|       64-bit read, if you write four u8s that's 4 x 8-bit writes
|       and so on ]
|
| But C's volatile is a type qualifier instead. Why? Would it
| mean anything to, for example, integer divide an MMIO fetch
|       by fifteen and write it back? a /= 15; No. So then why make
| it a type qualifier? When volatile was added to C they'd
| only just invented simple optimisations like re-ordering so
| they had no idea this was a bad idea, and it seemed simpler
| than adding an intrinsic (though not by much) but today we
| know better.
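In C, the usual way to approximate those explicit read and write operations is to wrap the volatile access in small helper functions. The sketch below uses hypothetical helper names. On real hardware the register address would come from the device's documentation; here any ordinary `uint32_t` variable can stand in for a register.

```c
#include <stdint.h>

/* Read one 32-bit register. Going through a volatile-qualified lvalue
 * tells the compiler the access must actually be performed rather than
 * cached or optimised away. */
uint32_t mmio_read32(const volatile uint32_t *reg) {
    return *reg;
}

/* Write one 32-bit register. */
void mmio_write32(volatile uint32_t *reg, uint32_t value) {
    *reg = value;
}

/* Read-modify-write spelled out as two distinct accesses, instead of
 * hiding them behind a compound assignment on a volatile object. */
void mmio_set_bits32(volatile uint32_t *reg, uint32_t mask) {
    uint32_t old = mmio_read32(reg);
    mmio_write32(reg, old | mask);
}
```

With helpers like these, the qualifier appears only at the point of access, which is closer in spirit to the intrinsic-style interface described above.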
| smcameron wrote:
| > (In fact, that's what array[i] means - the language defines it
| to be a synonym for *(array+i).)
|
| To really drive home the primitiveness of C arrays, should
| probably also mention that, because addition is commutative, you
| _could_ also write i[array]
|
| and somewhat surprisingly, it will compile, and work, and it
| means "*(i + array)" which is equivalent to "*(array + i)"
|
| But nobody really does that, because that would be kind of
| insane.
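A small sketch confirming the equivalence. The function names are invented for the example; the second function uses the string-literal variant that appears further down the thread.

```c
/* All four expressions name the same element.
 * Returns 1 if they agree, 0 otherwise. */
int same_element_four_ways(const int *array, int i) {
    int a = array[i];
    int b = *(array + i);
    int c = *(i + array);
    int d = i[array];       /* legal, but please don't */
    return a == b && b == c && c == d;
}

/* Subscripting a string literal "backwards". */
char third_letter(void) {
    return 2["ABCDEF"];     /* same as "ABCDEF"[2] */
}
```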
| lifthrasiir wrote:
| If array were, say, uint32_t* then what `*(array + i)` would do
| is actually `(intptr_t)array + i * 4` and not `(intptr_t)array
| + i`. If array were uint16_t* then it's `(intptr_t)array + i *
| 2`. In short the way the pointer arithmetic gets translated
| greatly depends on the type of the pointee and thus is not as
| primitive as it can be.
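The scaling can be observed by measuring the distance in bytes between `array` and `array + i`. This is a sketch with hypothetical function names; `i` must stay within the array (or one past its end) for the pointer arithmetic to be valid.

```c
#include <stddef.h>
#include <stdint.h>

/* Distance in bytes between array and array + i, for 32-bit elements.
 * Converting to char * makes the subtraction count bytes. */
size_t byte_offset_u32(const uint32_t *array, size_t i) {
    return (size_t)((const char *)(array + i) - (const char *)array);
}

/* The same measurement for 16-bit elements. */
size_t byte_offset_u16(const uint16_t *array, size_t i) {
    return (size_t)((const char *)(array + i) - (const char *)array);
}
```

For the same `i`, the two functions return different byte offsets, because the compiler multiplies `i` by the size of the pointed-to type.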
| kzrdude wrote:
| I think it depends on the coder if it makes more sense to
| them to write out the scale by element size or not. For me,
| the `+` is just pointer arithmetic and of course an addition
| in number of elements (so I don't think of the scaling at
| all).
|
| Just saying that "actually it's just array + i" makes more
| sense - for me(!).
| Zababa wrote:
|   This is actually a really good way to drive home that arrays
|   don't really "exist" in C and are just syntactic sugar for
|   pointer arithmetic.
| unwind wrote:
| Once the point is home, you can drive it a little bit more by
| exploiting the fact that string literals can be converted to
| pointers to the first characters, and do
| putc(2["ABCDEF"], stdout);
|
| This prints 'C'.
| [deleted]
| [deleted]
| danielozcpp wrote:
| int const* const x; // C
|
| int const& x; // C++
|
| A reference is functionally equivalent to a const pointer.
| (Reference reassignment is disallowed. Likewise, you cannot
| reassign a const pointer. A const pointer is meant to keep
| its pointee [address].) The difference between them is that
| C++ const references also allow non-lvalue arguments
| (temporaries).
|
| It is much easier to read from right to left when decoding
| types. Look for yourself:
|
| - double (* const convert_to_deg)(double const x) // const
| pointer to function taking a const double and returning
| double
|
| - int const (* ptr_to_arr)[42]; // pointer to array of 42
| const ints
|
| - int const * arr_of_ptrs[42]; // array of 42 pointers to
| const ints
|
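|         - int fun_returning_array_of_ints()[42]; // reads as "function
|         returning array of 42 ints", but is not valid C: a function
|         cannot return an array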
|
| Try it out yourself: https://cdecl.org/
|
| Hence, I am an "East conster". (Many people are "West
| consters" though.)
|
| You can return function pointers:
|
|         typedef struct player_t player_t; // let it be opaque ;)
|
|         int game_strategy1(player_t const * const p)
|         {
|           /* Eliminate player */
|           return 666;
|         }
|
|         int game_strategy2(player_t const * const p)
|         {
|           /* Follow player */
|           return 007;
|         }
|
|         int (* const game_strategy(int const strategy_to_use))
|             (player_t const * const p)
|         {
|           if (strategy_to_use == 0)
|             return &game_strategy1;
|           return &game_strategy2;
|         }
|
| Functional programming = immutable (const) values + pure
| functions (no side effects).
|
| Consting for me is also a form of
| documentation/specification.
|
| "East const" for life! :)
| harry8 wrote:
| 10 or 42?
| danielozcpp wrote:
| Thank you. 42. I edited my comment above.
| bachmeier wrote:
| > nobody really does that, because that would be kind of insane
|
| Yeah. That's why people don't do things in C. It's more like
| most C programmers probably weren't aware of this. After your
| comment, we'll start to see C codebases everywhere with that.
| dahfizz wrote:
| I learned of this in college. Most C programmers know how C
| arrays work.
| LeifCarrotson wrote:
|   Most _programmers_ probably weren't aware of this, but no
| true Scottish C programmer would be unaware of it.
|
| If you want to manipulate memory directly - which is risky
| though sometimes useful - C is one of the best languages in
| which to do it. Memory addresses are numbers, and C will let
| you work with those numbers in whatever way you want: add,
| subtract, multiply, divide... and if you didn't shudder at
| the suggestion of dividing a pointer because there are very,
| very few reasons to do so then C is not the language for you!
|
| If you don't want to manipulate memory directly, you probably
| shouldn't be using C; stick with a nice garbage-collected,
| type-safe, object-oriented, cross-platform language. If you
| do want to manipulate memory directly, but you want more
| guarantees on what you can do with pointers, try Rust.
| [deleted]
| bluejekyll wrote:
| Just wanted to clarify that Rust allows for the same
| manipulation as C, it just all has to happen in the context
| of an unsafe code block. I think your comment might be
| taken to imply that Rust isn't as capable as C in that
| regard.
| junon wrote:
| Eh? Says who? Every C programmer I've ever met knows about
| this. It's basic C.
| jart wrote:
|       Yeah you have to do something like this if you want to
|       truly raise eyebrows.
|
|           /\
|           */ best c comment *\
|           /
| Zababa wrote:
| One trick that I like is replacing { and } with <% and
| %>.
| junon wrote:
| Wow, a new C trick I didn't know about. What's the
| history here?
|
| Also, gross.
| colejohnson66 wrote:
|   This "trick" has been known at least as far back as 2008:
| https://stackoverflow.com/questions/381542/with-arrays-
| why-i...
| tialaramex wrote:
| It's actually spelled out in K&R in more technical
| language:
|
| "The array subscripting operation is defined so that E1[E2]
| is identical to *(E1+E2). Therefore, despite its
| asymmetrical appearance, subscripting is a _commutative
| operation_. "
|
| (my emphasis)
| lifthrasiir wrote:
|     One of the very first IOCCC winners used the trick in 1984
|     (1984/anonymous):
|
|         int i;main(){for(;i["]<i;++i){--i;}"];read('-'-'-',i+++"hell\
|         o, world!\n",'/'/'/'));}read(j,i,p){write(j/p+p,i---j,i/i);}
| jakeva wrote:
| what does this program do? i compiled it and ran it and
| got no output. i hope i haven't fork bombed myself or
| something
| lifthrasiir wrote:
| Prints "hello, world!", assuming sizeof(int) ==
| sizeof(char*) (and the same alignments and ABI, etc).
| jfrunyon wrote:
|       I think this is equivalent to:
|
|           int i;
|           int main(){
|             for(
|               ;
|               // loop until i == 14? (the NUL byte on the end)
|               i["]<i;++i){--i;}"];
|               // read(0, <one byte of "hello, world!\n" at a time>, 1)
|               read('-'-'-',i+++"hello, world!\n",'/'/'/')
|             ) {};
|           }
|           int read(j,i,p){
|             // write(0/1+1, i-- - 0, 1) --> write(1, i, 1)
|             // --> write a byte to STDOUT
|             write(j/p+p,i---j,i/i);
|           }
| travelbuffoon wrote:
| Speaking of such funny business: unfortunately I have seen
| *(array + i) quite a few times.
|
| To make it worse, it's in a parser for external data in binary
| format, where you really shouldn't be playing funny tricks.
| a3n wrote:
| This sub-thread is better than a Vim thread. :)
| lebuffon wrote:
| Interesting. Looks like Forth. :-)
| aidenn0 wrote:
| Some nitpicks with the article:
|
| 1. Much of section 2 is wrong except the part about arrays
| representing contiguous objects. The rest is largely an
| implementation detail.
|
| Zeta C and Vacietis work considerably differently, as allowed by
| the standard
|
| In addition there are many (mostly obsolete now) architectures in
| which, when you convert a pointer to an integer, you can't
| perform arithmetic and convert back because a pointer isn't just
| an integer address; it could represent segments or support
| hardware tags.
|
| > C will typically let you just construct any pointer value you
| like by casting an integer to a pointer type, or by taking an
| existing pointer to one type and casting it so that it becomes a
| pointer to an entirely different type.
|
| To be fair, they do say "typically" in here, but these behaviors
| are (depending on the case) all either implementation defined or
| undefined; the C standard specifies a union as the only well-
| defined way to type-pun to non character types.
|
| > The undefined-behaviour problem with integer overflow wouldn't
| happen in machine code; that's a consequence of C needing to run
| fast on lots of very different kinds of computer, which is a
| problem machine code doesn't even try to solve
|
| Some architectures trap on integer overflow, which I suspect is
| the reason why integer overflow is undefined rather than
| implementation defined. Certainly compilers _today_ take
| advantage of the fact that it is undefined to make certain useful
| optimizations, but from what I can tell of the history that's
| not why it was undefined in the first place.
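For the type-punning point above, here is a sketch of reading a `float`'s bytes as an integer through a union. It assumes `float` is 32 bits wide, which holds on mainstream platforms but is not guaranteed by the standard. A `memcpy`-based version is shown next to it because it is another widely used approach; both function names are hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Assumption: sizeof(float) == sizeof(uint32_t). */
uint32_t float_bits_union(float f) {
    union { float f; uint32_t u; } pun;
    pun.f = f;
    return pun.u;            /* read the same bytes as an integer */
}

/* Copy the object representation byte for byte. Compilers typically
 * reduce this to a plain register move. */
uint32_t float_bits_memcpy(float f) {
    uint32_t u;
    memcpy(&u, &f, sizeof u);
    return u;
}
```

Casting `&f` to `uint32_t *` and dereferencing it is the variant to avoid, since it runs into the aliasing rules mentioned above.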
| junon wrote:
| To be clear, _signed_ integer overflow is undefined. _Unsigned_
| integer overflow is well defined.
|
| This is why some C programmers dictate that all code must use
| signed integers to avoid unexpected bugs, but many others
| (including myself) disagree that's a good way of going about it
| since, as you said, it's not guaranteed to trap or do anything
| to help the programmer.
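The difference can be sketched as follows: unsigned arithmetic wraps by definition, while a signed addition has to be checked before it is performed. The function names are invented for the example.

```c
#include <limits.h>
#include <stdbool.h>

/* Unsigned arithmetic is defined to wrap modulo 2^N,
 * so UINT_MAX + 1u is 0. */
unsigned wrap_increment(unsigned x) {
    return x + 1u;
}

/* Signed overflow is undefined, so test before adding rather than
 * after. Returns false (and leaves *out alone) if a + b would not fit. */
bool checked_add_int(int a, int b, int *out) {
    if ((b > 0 && a > INT_MAX - b) || (b < 0 && a < INT_MIN - b)) {
        return false;
    }
    *out = a + b;
    return true;
}
```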
| aidenn0 wrote:
|     I've never seen signed integers being required. They tend to
|     introduce unexpected bugs rather than prevent them. Of course
|     you can accidentally end up with signed integers when you
|     didn't intend it. Here's my favorite accidental undefined
|     signed integer overflow, assuming 32-bit integer size (the
|     64-bit version is similar):
|
|         uint32_t foo(uint8_t x) { return x << 24; }
|
|     Yes, this is signed integer overflow: x gets promoted to a
|     _signed_ integer before the shift, so if an integer is
|     32 bits in size, this can result in undefined behavior when
|     the top bit of x is set. Fortunately I've never seen a
|     compiler optimize this to stupidity.
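One way to sidestep the promotion is to convert to the unsigned result type before shifting. A minimal sketch, with a hypothetical function name:

```c
#include <stdint.h>

/* x would be promoted to (signed) int before the shift, so with a
 * 32-bit int any x >= 0x80 would shift a 1 into the sign bit.
 * Converting to uint32_t first keeps the whole computation unsigned. */
uint32_t to_top_byte(uint8_t x) {
    return (uint32_t)x << 24;
}
```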
| junon wrote:
| If memory serves, the LLVM Project prescribes signed
| integers be used in all cases except where unsigned is
| mandatory.
| Animats wrote:
| It's not inherent in C being a low-level language that it has
| such a painful memory model. It's a consequence of having to fit
| the compiler into a really tiny machine by modern standards.
| Originally, there were no function prototypes.
|
| I once proposed a backwards-compatible way out for C.[1]
| Basically, you get to talk about arrays as objects with a length,
| even if that length is in some other variable. And you get slices
| and references. It was discussed enough to establish that it
| could work, but the effort to push it forward wasn't worth it.
|
| Slices are important. They let you do most of the things people
| do with pointer arithmetic. Once you have size and slices, you
| can still operate near the memory address level.
|
| [1] http://www.animats.com/papers/languages/safearraysforc43.pdf
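To illustrate what a slice buys you, here is a library-level sketch in today's C: a pointer that carries its length, a bounds-checked read, and a sub-slice operation. This is not the syntax or semantics of the linked proposal; the struct and function names are invented for the example.

```c
#include <stdbool.h>
#include <stddef.h>

/* A hypothetical slice: a pointer that carries its length with it. */
struct int_slice {
    int *data;
    size_t len;
};

/* Sub-slice covering [start, end). Returns an empty slice if the
 * requested range does not fit inside s. */
struct int_slice slice_sub(struct int_slice s, size_t start, size_t end) {
    struct int_slice empty = { NULL, 0 };
    if (start > end || end > s.len) {
        return empty;
    }
    struct int_slice out = { s.data + start, end - start };
    return out;
}

/* Bounds-checked read: returns false if i is past the end. */
bool slice_get(struct int_slice s, size_t i, int *out) {
    if (i >= s.len) {
        return false;
    }
    *out = s.data[i];
    return true;
}
```

Taking a sub-slice replaces the `ptr + offset` idiom, with the difference that the remaining length travels along and can be checked.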
| AceJohnny2 wrote:
| > _So why is C like this, anyway?_
|
| Worth mentioning that C is over 40 years old, and was designed to
| be easily portable across a range of machines that had less
| compute power and memory than today's smaller microcontrollers.
|
| As a result, a lot of things were left undefined, or were
| designed in a way to be _easy to implement_ rather than _easy to
| program for_.
|
| There existed other programming languages that were better, but
| their compilers weren't as broadly available, and their better
| features came at the cost of speed, which at the time was a
| premium.
| fsckboy wrote:
| > left undefined, or were designed in a way to be easy to
| implement rather than easy to program for.
|
| I'd tweak your statement a little, or even a lot: "left
| undefined" most often meant "left to be defined by the compiler
| writers to fit the architecture of the underlying hardware, in
| a way that would make it easy to program to beneficially
| exploit features of the architecture"; and (yes) in a way "that
| would not be very portable, and might even be subject to change
| between compilers".
|
| did the underlying machine use 2's complement? did the
| underlying machine have addressable bytes? big endian? 8, 16,
| 32 or 36 bits?
|
| These are all things you need to know to write tight efficient
| code in the days of slow clockspeeds and limited RAM. C let you
| do that without using assembly, but by using the "undefined"
| features of the language, because they were clearly defined
| locally and were features that were very important to be easy
| to write code for.
|
|   consider how you would implement setjmp and longjmp, or even
| printf, or efficiently unpack or serialize bits for a
| communications protocol, without these supposedly "undefined
| features", or how you would write those if those features were
| actually undefined. People who put strlen(str) or a divide and
| a mod in the control expression for a loop would know better if
| they understood a bit more about the undefined features.
|
| this is in contrast btw with some other things that actually
| are undefined, such as what the order of evaluation would be
| for complex expressions making up argument lists, etc.
|
| I'm writing this explanation not so much to explain these
| technical details to noobs, but rather to get the people who
| understand this stuff to stop throwing around the term
| "undefined" with regard to C because they are cooperating in
| the evisceration of some ideas that are really worth exploring
| or understanding more deeply.
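The `strlen(str)` remark above can be made concrete. `strlen()` has to walk the whole string, so placing it in the loop condition can mean a rescan on every iteration unless the compiler manages to hoist it. The sketch below, with a hypothetical function name, computes the length once.

```c
#include <stddef.h>
#include <string.h>

/* Count occurrences of `wanted` in str. The length is computed once,
 * before the loop, rather than in the loop condition. */
size_t count_char(const char *str, char wanted) {
    size_t n = strlen(str);      /* one scan for the length */
    size_t hits = 0;
    for (size_t i = 0; i < n; i++) {
        if (str[i] == wanted) {
            hits++;
        }
    }
    return hits;
}
```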
___________________________________________________________________
(page generated 2021-09-03 23:02 UTC)