[HN Gopher] C Portability Lessons from Weird Machines
___________________________________________________________________
C Portability Lessons from Weird Machines
Author : rsecora
Score : 102 points
Date : 2022-02-21 16:45 UTC (6 hours ago)
(HTM) web link (begriffs.com)
(TXT) w3m dump (begriffs.com)
| zokier wrote:
| I don't think there is anything wrong in writing platform-
| specific code; in certain circles there is this weird
| fetishization of portability, placing it on the highest
| pedestal as a metric of quality. This happens in C programming
| and also for example in shell scripting, people advocating for
| relying only on POSIX-defined behavior. If a platform specific
| way of doing something works better in some use-case then there
| should be no shame in using that. What is important is that the
| code relies on well-defined behavior, and also that the platform
| assumptions/requirements are documented to a degree.
|
| Of course it is wonderful that you can make programs that are
| indeed portable between this huge range of computers; just that
| not every program needs to do so.
| Filligree wrote:
| > Of course it is wonderful that you can make programs that are
| indeed portable between this huge range of computers; just that
| not every program needs to do so.
|
| Isn't most code that would behave differently on different
| architectures subject to undefined behaviour, however? The
| signed overflow case mentioned, for example.
|
| Sure, some of it is implementation-defined, but in practice you
| need to write ultra-portable code anyway in order for your
| compiler not to pull the rug out underneath you.
| LeifCarrotson wrote:
| > If a platform specific way of doing something works better in
| some use-case then there should be no shame in using that.
|
| I fully agree, but the real problem is not the limitation to
| platform-specific logic but the conflation of fundamental
| requirements and incidental requirements. There's no good way
| to know when you meant "int" as just a counter of a user's
| files, or "int" as in a 32-bit bitmask, or "int" as the value
| of a pointer. For the first, it probably doesn't matter if
| someone later compiles it for a different architecture; for the
| latter two, if you mean int32_t or uintptr_t, use that!
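To make the distinction above concrete, here is a minimal sketch in which each of the three uses gets a type that states its intent. The function names are hypothetical and chosen only for illustration; `uintptr_t` is optional in the C standard, although mainstream platforms provide it.

```c
#include <stddef.h>
#include <stdint.h>

/* A plain count: any sufficiently wide integer works; size_t is the
   conventional type for counting objects. */
size_t count_nonempty(const char *const *names, size_t n)
{
    size_t count = 0;
    for (size_t i = 0; i < n; i++) {
        if (names[i] != NULL && names[i][0] != '\0')
            count++;
    }
    return count;
}

/* A 32-bit bitmask: the width is part of the requirement, so spell it out. */
uint32_t set_flag(uint32_t mask, unsigned bit)
{
    return mask | (UINT32_C(1) << (bit & 31u));
}

/* A pointer carried around as an integer: uintptr_t exists for this. */
uintptr_t pointer_to_integer(const void *p)
{
    return (uintptr_t)p;
}

const void *integer_to_pointer(uintptr_t v)
{
    return (const void *)v;
}
```

A reader porting this later can tell which widths are fundamental (the mask, the pointer) and which are incidental (the count).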
| ithinkso wrote:
| I love how the concept of 'platform independent' evolved - you
| would think that it means you can run it anywhere but almost
| all software that uses 'platform-independent' code is very
| platform-dependent
|
| Because if you make an Electron app it's only logical that
| because it is platform-independent it can only be run on macOS
| xemdetia wrote:
| I would say most _modern_ code is not the kind of code written
| with the absurd platform assumptions that old code used to do.
| There are no magic addresses you have to know to talk to
| hardware, there's no implicit memory segmentation/memory
| looping, and so on. Ever since most modern OSs started to prevent
| direct access, randomize address spaces and so on, it is just
| hard to write code as insane as what we used to write.
|
| So the question is are you contesting the POSIX defined
| behaviour over using more logical interfaces from the OS or the
| wild west where people were hacking around broken, platform-
| specific features and often broken or awkward system libraries
| or just the more modern case where people use the standard
| interface instead of a more performant one. In the latter case
| I agree I wish there was a more 'nice' way of doing more
| dynamic and efficient feature detection without making simple C
| programs crazy complex.
| Someone wrote:
| I think it also helps that modern hardware is a lot less
| diverse. Most of the tools only run on systems that are
| little endian, where _NULL_ is zero, chars are 8 bits, ints
| are two's complement and silently wrap around, floats are 32
| bits IEEE 754, etc, so code that erroneously assumes those to
| be true in theory isn't portable, but in practice is.
|
| And newer C standards may even unbreak such code. Ints
| already are two's complement in the latest version of C++ and
| will be in the next version of C, for example.
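As an illustration of code that does not lean on one of those assumptions (host byte order), here is a hedged sketch that reads and writes a 32-bit little-endian value one octet at a time. The names `load_le32` and `store_le32` are made up for this example.

```c
#include <stdint.h>

/* Decode a 32-bit little-endian value from four octets.
   The result is the same on little- and big-endian hosts. */
uint32_t load_le32(const unsigned char *p)
{
    return  ((uint32_t)(p[0] & 0xFFu))
         | (((uint32_t)(p[1] & 0xFFu)) << 8)
         | (((uint32_t)(p[2] & 0xFFu)) << 16)
         | (((uint32_t)(p[3] & 0xFFu)) << 24);
}

/* Encode a 32-bit value as four little-endian octets. */
void store_le32(unsigned char *p, uint32_t v)
{
    p[0] = (unsigned char)(v & 0xFFu);
    p[1] = (unsigned char)((v >> 8) & 0xFFu);
    p[2] = (unsigned char)((v >> 16) & 0xFFu);
    p[3] = (unsigned char)((v >> 24) & 0xFFu);
}
```

The alternative, casting the buffer to `uint32_t *` and dereferencing, happens to work on most current machines but bakes in both endianness and alignment assumptions.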
| thesuperbigfrog wrote:
| >> in certain circles there is this weird fetishization of
| portability, placing it on the highest pedestal as a metric of
| quality.
|
| It's not a fetish if you have ever ported legacy code that was
| not written with portability in mind.
|
| "ints are always 32-bit and can be used to store pointers with
| a simple cast" may have worked when the legacy program was
| written, but it sure makes porting it a pain.
| nyanpasu64 wrote:
| And `unsigned long` can store pointers just fine on Linux 32,
| Linux 64, Windows x86-32, and MSYS2 64... but not Windows
| x86-64. https://github.com/cc65/cc65/issues/1680#issuecomment
| -104641...
| bombcar wrote:
| C (and to some extent shell) programmers are the ones with the
| most experience of the machine under them changing, perhaps
| drastically - few other languages have even been around long
| enough for that to have happened.
|
| Java sidesteps this, of course, by defining a JVM to run on and
| leaving the underlying details to the implementation.
| rwmj wrote:
| C23 just dropped support for any non-twos-complement
| architectures. No more C on Unisys for you!
| http://www.open-std.org/jtc1/sc22/wg14/www/docs/n2412.pdf
| eqvinox wrote:
| That doesn't preclude C23 on Unisys, it just forces the
| compiler to hide that fact from the programmer ;D
|
| (SCNR)
| titzer wrote:
| It's amazing the abilities that 50 years can bring a
| programming language. Longest, most painful design debate ever.
| viddi wrote:
| Haven't read the article yet, but I have noticed that the tab
| keeps loading even after 10 minutes. Aborting the loading process
| leads to broken media.
|
| I am no expert in HTML video delivery and haven't tried it out,
| but maybe setting the preload attribute to "none" or "metadata"
| might help?
| PhantomGremlin wrote:
| _the MIPS R3000 processor ... raises an exception for signed
| integer overflow, unlike many other processors which silently
| wrap to negative values._
|
| Too bad programmer laziness won and most current hardware doesn't
| support this.
|
| As a teenager I remember getting hit by this all the time in
| assembly language programming for the IBM S/360. (FORTRAN turned
| it off). S0C8 Fixed-point overflow exception
|
| When you're a kid you just do things quickly. This was the
| machine's way of slapping you upside your head and saying: "are
| you sure about that?"
| laumars wrote:
| > When you're a kid you just do things quickly.
|
| I don't think this is an age problem. Plenty of adults are lazy
| and plenty of kids aren't.
| masklinn wrote:
| > Too bad programmer laziness won and most current hardware
| doesn't support this.
|
| There were discussions around this a few years back when Regehr
| brought up the subject, and one of the issues commonly brought
| up is if you want to handle (or force handling of) overflow,
| traps are pretty shit, because it means you have to update the
| trap handler _before each instruction which can overflow_ ,
| because a global interrupt handler won't help you as it will
| just be a slower overflow flag (at which point you might as
| well just use an overflow flag). Traps are fine if you can set
| up a single trap handler then go through the entire program,
| but that's not how high-level languages deal with these issues.
|
| 32b x86 had INTO, and compilers didn't bother using it.
| flohofwoe wrote:
| Modulo wraparound is just as much a feature in some situations
| as it is a bug in others. And signed vs unsigned are just
| different views on the same bag of bits (assuming two's
| complement numbers), most operations on two's complement
| numbers are 'sign agnostic' - I guess from a hardware
| designer's pov, that's the whole point :)
|
| The question is rather: was it really a good idea to bake
| 'signedness' into the type system? ;)
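The "same bag of bits" point can be sketched in C by doing the addition on unsigned values, where wraparound is defined, and then reinterpreting the result as signed. This is an illustrative sketch rather than anyone's production code; the helper names are hypothetical, and it relies on `int32_t` being two's complement without padding, which the standard requires whenever that type exists.

```c
#include <stdint.h>
#include <string.h>

/* Reinterpret a 32-bit pattern as a signed value without relying on
   implementation-defined integer conversion. */
int32_t as_signed(uint32_t bits)
{
    int32_t v;
    memcpy(&v, &bits, sizeof v);
    return v;
}

/* Signed addition that wraps modulo 2^32 instead of invoking
   undefined behaviour on overflow. */
int32_t wrapping_add_i32(int32_t a, int32_t b)
{
    /* Conversion to unsigned is well defined (modular), and so is
       unsigned addition. */
    return as_signed((uint32_t)a + (uint32_t)b);
}
```

For in-range results the function returns exactly what `a + b` would; only the overflowing cases differ, where plain signed `+` is undefined.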
| pornel wrote:
| That's why Rust has separate operations for wrapping and non-
| wrapping arithmetic. When wrapping matters (e.g. you're
| writing a hash function), you make it explicit you want
| wrapping. Otherwise arithmetic can check for overflow (and
| does by default in debug builds).
| zozbot234 wrote:
| Modulo wraparound _is_ convenient in non-trivial expressions
| involving addition, subtraction and multiplication because it
| will always give a correct in-range result if one exists.
| "Checking for overflow" in such cases is necessarily more
| complex than a simple check per operation; it must be
| designed case by case.
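A small sketch of that property, using unsigned 32-bit arithmetic: the intermediate sum wraps, yet the final result is correct because it fits in range. The function name is hypothetical.

```c
#include <stdint.h>

/* Compute a + b - c modulo 2^32. If the mathematically exact result
   lies in [0, 2^32), this returns it, even when a + b wrapped. */
uint32_t add_sub_mod32(uint32_t a, uint32_t b, uint32_t c)
{
    return a + b - c;
}
```

With a = 4000000000, b = 1000000000 and c = 3000000000, the sum a + b does not fit in 32 bits, but the result 2000000000 does, and that is what comes back. A per-operation overflow check would have rejected the computation at the first addition.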
| zozbot234 wrote:
| Overflow checks are trivial, there's no need for special
| hardware support. It's pretty much exclusively a language-level
| concern.
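For reference, a language-level check for a single signed addition can be written in portable C by testing against the limits before adding, so the overflowing operation is never executed. This is a minimal sketch; `checked_add_int` is a name invented for the example.

```c
#include <limits.h>
#include <stdbool.h>

/* Store a + b in *result and return true, or return false (leaving
   *result untouched) if the sum would not fit in an int. */
bool checked_add_int(int a, int b, int *result)
{
    if ((b > 0 && a > INT_MAX - b) ||
        (b < 0 && a < INT_MIN - b))
        return false;
    *result = a + b;
    return true;
}
```

GCC and Clang also offer `__builtin_add_overflow`, which expresses the same check and typically lets the compiler use the hardware overflow flag; it is a compiler extension, not standard C.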
| addaon wrote:
| Overflow checks can be very expensive without hardware
| support. Even on platforms with lightweight support (e.g. x86
| 'INTO'), you're replacing one of the fastest instructions out
| there -- think of how many execution units can handle a basic
| add -- with a sequence of two dependent instructions.
| zozbot234 wrote:
| A vast majority of the cost is missed optimization due to
| having to compute partial states in connection to overflow
| errors. The checks themselves are trivially predicted, and
| that's when the compiler can't optimize them out.
| monocasa wrote:
| In practice the vast majority of MIPS code uses addu, the non
| trapping variant.
|
| And in x86 land there's the into instruction, interrupt if
| overflow bit set, so you're left with the same options.
| spc476 wrote:
| Which has to be done after every instruction
| (http://boston.conman.org/2015/09/05.2) but it is quite slow.
| Using a conditional jump after each instruction is faster
| than using INTO (http://boston.conman.org/2015/09/07.1).
| monocasa wrote:
| It's more complicated than shows up in micro benchmarks
| like that. Since when you do it, it's pretty much every
| add, you end up polluting your branch predictor by using jo
| instructions everywhere and it can lead to worse overall
| perf.
| colejohnson66 wrote:
| My guess would be a pipelining issue where `INTO` isn't
| treated as a `Jcc`, but as an `INT` (mainly because it _is_
| an interrupt). Agner Fog 's instruction tables[0] show (for
| the Pentium 4) `Jcc` takes one uOP with a throughput of
| 2-4. `INTO`, OTOH, when _not taken_ uses four uOPs with a
| throughput of _18_! Zen 3 is much better with a throughput
| of 2, but that 's still worse than `JO raiseINTO`.
|
| [0]: https://www.agner.org/optimize/instruction_tables.pdf
| rjsw wrote:
| There are C compilers for the PDP-10, it must count as fairly
| weird.
| AnimalMuppet wrote:
| Overall a good article. I was a bit amused and/or disgruntled to
| see a TRS80 in the "Motorola 68000" section, though...
| rjsw wrote:
| Why disgruntled? I never saw a Model 16 [1] but they did exist.
|
| [1] https://en.wikipedia.org/wiki/TRS-80_Model_II#model16
| astrobe_ wrote:
| > Everyone who writes about programming the Intel 286 says what a
| pain its segmented memory architecture was
|
| Actually this concerns more pre-80286 processors, since 80286
| introduced virtual memory, and the segment registers were less
| prominent in "protected mode". Moreover I wouldn't say it was a
| pain, at least at the assembly level, once you understood the
| trick. C had no concept of segmented memory, so you had to tell
| the compiler which "memory model" it should use.
|
| > One significant quirk is that the machine is very sensitive to
| data alignment.
|
| I remembered from school time about a "barrel register" that
| allowed to remove this limitation, but it was introduced in
| 68020.
|
| On the topic itself, I like to say that a program is portable if
| it has been ported once (likewise a module is reusable if it has
| been reused once). I remember porting a program from a 68K
| descendant to ARM; the only non-obvious portability issue was
| that in C, the standard doesn't mandate whether the _char_ type
| is signed or unsigned (it's implementation-defined).
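The `char` signedness issue can be sketched as follows. The helpers are hypothetical; the point is that converting through `unsigned char` gives the same byte value whether plain `char` is signed (common on x86) or unsigned (common on ARM).

```c
#include <limits.h>
#include <stdbool.h>

/* Report how this implementation treats plain char. */
bool plain_char_is_signed(void)
{
    return CHAR_MIN < 0;
}

/* Map a plain char to its byte value in 0..UCHAR_MAX, regardless of
   whether char is signed here. Safe to use as a table index. */
int byte_value(char c)
{
    return (unsigned char)c;
}
```

Code such as `table[c]` or `c > 127` on a plain `char` behaves differently between the two conventions; routing the value through `byte_value` (or declaring the data as `unsigned char` in the first place) removes the difference.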
| spc476 wrote:
| The segment registers were less prominent on the 80386 in
| protected mode since you also have paging, and each segment can
| be 4G in size. On the 80286 in protected mode the segment
| registers are still very much there (no paging, each segment is
| still limited to 64k).
| zwieback wrote:
| > > Everyone who writes about programming the Intel 286 says
| what a pain its segmented memory architecture was
|
| > Actually this concerns more pre-80286 processors, since 80286
| introduced virtual memory,
|
| 86 had segments, 286 added protected mode, 386 added virtual. I
| would agree, though, 286 wasn't as bad as people make it sound.
| In OS/2 1.x it was quite usable.
| shadowofneptune wrote:
| Having done some 8086 programming recently, I did find segments
| rather helpful once you get used to them. They make it easier
| to think about handling data in a modular fashion; a large (64k
| maximum) data structure can be given its own segment. The 286
| went farther by providing protection to allocated segments. I
| have a feeling overlays only really become a nuisance once you
| start working on projects far bigger than were ever intended
| for that generation of '86. MS-DOS not having a true successor
| didn't help either.
| zwieback wrote:
| I wrote a fair amount of code for TI's TMS320C4x DSPs. They had
| 32 bit sized char, short, int, long, float and double and a long
| double with 40 bits.
|
| Took a bit to get used to but really the only way to get to the
| good stuff was by writing assembly code and hand-tuning all the
| pipeline stuff.
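On a machine like that, `sizeof(int)` is 1 and a "byte" is 32 bits, so code that wants bit widths has to go through `CHAR_BIT` rather than assuming 8. A minimal sketch, with hypothetical helper names:

```c
#include <limits.h>
#include <stddef.h>

/* Storage bits occupied by an object of the given sizeof() size.
   sizeof counts chars, and a char is CHAR_BIT bits, not always 8. */
size_t storage_bits(size_t size_in_chars)
{
    return size_in_chars * CHAR_BIT;
}

/* Count the value bits of unsigned int without assuming any byte size. */
unsigned value_bits_unsigned_int(void)
{
    unsigned bits = 0;
    for (unsigned v = UINT_MAX; v != 0; v >>= 1)
        bits++;
    return bits;
}
```

Buffer-size arithmetic such as `n_bits / 8` is where the 8-bit assumption usually hides; `n_bits / CHAR_BIT` is the portable spelling.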
| rsecora wrote:
| It still amazes me how the PDP-11 has the NUXI [1] problem at
| nibble level and how the PDP-11 was bytesexual [2].
|
| [1] http://catb.org/jargon/html/N/NUXI-problem.html
|
| [2] http://catb.org/jargon/html/B/bytesexual.html
| gus_massa wrote:
| [If you remove the spaces at the beginning of the line, HN will
| make the links clicky. You probably need to add an enter in
| between to get the correct formatting.]
| rsecora wrote:
| Done, thank you
| gwern wrote:
| Note: "weird machines" here has nothing to do with the well-known
| security concept, just referring to unusual or obscure computers.
| nivertech wrote:
| The author forgot to mention that 8051 has a bit-addressable
| lower part of RAM.
|
| PDP-11 had a weird RAM overlay scheme of squeezing 256KB RAM into
| a 64KB 16-bit address space.
|
| IBM System/360 also had a weird addressing scheme with base
| register and up to 4KB offsets.
|
| https://en.wikipedia.org/wiki/IBM_System/360_architecture#Ad...
| ChuckMcM wrote:
| I scored 7. (I have written C code on seven of the architectures
| mentioned: PDP 11, i86, VAX, 68K, IBM 360, AT&T 3B2, and DG
| Eclipse.) I have also written C code on the DEC KL-10 (36 bit
| machine) which isn't covered. And while I have a PDP-8, I only
| have FOCAL and FORTRAN for it rather than C. I'm sure there is a
| C compiler out there somewhere :-).
|
| With the imminent C23 spec I'm really amazed at how well C has
| held up over the last half century. A lot of things in computers
| are very 'temporal' (in that there are a lot of things that are
| all available at a certain point in time that are required for
| the system to work) but C has managed to dodge much of that.
| eqvinox wrote:
| On a slightly related note, chances are good anyone reading this
| has an 8051 within a few meters of them - they're close to
| omnipresent in USB chips, particularly hubs, storage bridges and
| keyboards / mice. The architecture is equally atrocious as the
| 6502.
|
| btw: a good indicator is GCC support - AVR, also an 8-bit uC - is
| perfectly well supported by GCC. 8051 and 6502, you need
| something like SDCC [http://sdcc.sourceforge.net/]
| dfox wrote:
| One thing to keep in mind while programming AVR in C is that it
| still is small-ish MCU with different address spaces. This
| means that if you do not use correct compiler-specific type
| annotations for read-only data these will get copied into RAM
| on startup (both avr-libc and arduino contain some macrology
| that catches the obvious cases like some string constants, but
| you still need to keep this in mind).
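A hedged sketch of what those annotations look like with avr-gcc and avr-libc: `PROGMEM` keeps the data in flash, and `pgm_read_byte()` reads it back. The `FLASH_READ_BYTE` macro, the host fallback branch, and the function names are assumptions added so the same file also builds on a desktop compiler.

```c
#include <stddef.h>

#ifdef __AVR__
#include <avr/pgmspace.h>          /* avr-libc: PROGMEM, pgm_read_byte() */
#define FLASH_READ_BYTE(p) pgm_read_byte(p)
#else
/* Host fallback (illustration only): one flat address space. */
#define PROGMEM
#define FLASH_READ_BYTE(p) (*(const unsigned char *)(p))
#endif

/* On AVR this string stays in flash and is not copied to RAM at startup. */
static const char greeting[] PROGMEM = "hello";

/* strlen for a flash-resident string: every access goes through the
   flash read macro, because a plain dereference would read RAM on AVR. */
size_t flash_strlen(const char *s)
{
    size_t n = 0;
    while (FLASH_READ_BYTE(s + n) != 0)
        n++;
    return n;
}

size_t greeting_length(void)
{
    return flash_strlen(greeting);
}
```

The catch described above is visible here: the pointer type does not record which address space it refers to, so passing `greeting` to the ordinary `strlen` would compile but read the wrong memory on AVR.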
| RicoElectrico wrote:
| Hope RISC-V will displace 8051 over time. It's such an absurd
| thing to extend this architecture in myriad non-interoperable
| (although backwards-compatible with OG 8051) ways. And don't
| forget about the XRAM nonsense. Yuck.
| unwiredben wrote:
| For 6502 fans, there's a new port of Clang and LLVM that seems
| to be doing some nice code generation. See
| https://llvm-mos.org/
| yuubi wrote:
| the 6502 has a single 16-bit address space with some parts
| (zero page, stack) addressable by means other than full 16-bit
| addresses. the 8051 has 16-bit read-only code space, 16-bit
| read/write external memory space, and 8-bit internal memory
| space, except half of it is special: if you use indirect access
| (address in an 8-bit register), you get memory. but if you
| encode that same address literally in an instruction, you get a
| peripheral register.
|
| at least that's the part I remember
| jazzyjackson wrote:
| > The architecture is equally atrocious as the 6502.
|
| I only ever hear glowing/nostalgic reviews of 6502 programming,
| I guess from the retro/8bit gaming scene, curious what you find
| so atrocious.
| tenebrisalietum wrote:
| 6502 is awesome to program from assembly.
|
| What makes the 6502 atrocious for C is:
|
| - internal CPU registers are 8 bits, no more, no less and you
| only have 3 of them (5 if you count the stack pointer and
| processor status register).
|
| - fixed 8-bit stack pointer so things like automatic
| variables and pass-by-value can't take up a lot of space.
|
| - things like "access memory location Z + contents of
| register A + this offset" aren't possible without a lot of
| instructions.
|
| - no hardware divide or multiply.
|
| Many CPUs have instructions that map neatly to C operations,
| but not 6502. With enough instructions C is hostable by any
| CPU (e.g. ZPU) but a lot of work is needed to do that on the
| 6502 and the real question is - will it fit in 16K, 32K, as
| most 6502 CPUs only have 16 address lines - meaning they only
| see 64K of addresses at once. Mechanisms exist to extend that
| but they are platform specific.
|
| IMHO Z80 is better in this regard with its 16-bit stack
| pointer and combinable index registers.
| adrian_b wrote:
| 6502 was nice only in comparison with Intel 8080/8085, but it
| was very limited in comparison with better architectures.
|
| The great quality of 6502 was that it allowed a very cheap
| implementation, resulting in a very low price.
|
| A very large number of computers used 6502 because it was the
| cheapest, not because it was better than the alternatives.
|
| For a very large number of programmers, 6502 was the 1st CPU
| whose assembly language they have learned and possibly the
| only one, as later they have used only high-level languages,
| so it is fondly remembered. That does not mean that it was
| really good.
|
| I also have nostalgic happy memories about my first programs,
| which I have entered on punched cards. That does not mean
| that using punched cards is preferable to modern computers.
|
| Programming a 6502 was tedious, because you had only 8-bit
| operations (even Intel 8080 had 16-bit additions, which
| simplified a lot multiply routines and many other
| computations) and you had only a small number of 8-bit
| registers with dedicated functions.
|
| So most or all variables of normal sizes had to be stored in
| memory and almost everything that would require a single
| instruction in modern microcontrollers, e.g. ARM, required a
| long sequence of instructions on 6502. (e.g. a single 32-bit
| integer addition would have required a dozen instructions in
| the best case, for statically allocated variables, but up to
| 20 instructions or even more when run-time address
| computations were also required, for dynamically-allocated
| variables.)
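To show why a 32-bit addition expands into so many steps on an 8-bit CPU, here is a sketch in C that mimics the byte-at-a-time approach: add the low bytes, then propagate the carry through the remaining three. It models the idea only and is not 6502 code; the function name is hypothetical.

```c
#include <stdint.h>

/* Add two 32-bit numbers stored as four little-endian bytes each,
   using only 8-bit pieces plus a carry. Returns the final carry. */
unsigned add32_bytewise(const uint8_t a[4], const uint8_t b[4],
                        uint8_t sum[4])
{
    unsigned carry = 0;
    for (int i = 0; i < 4; i++) {
        /* Per byte: load, add with carry, store. */
        unsigned t = (unsigned)a[i] + b[i] + carry;   /* at most 0x1FF */
        sum[i] = (uint8_t)(t & 0xFFu);
        carry = t >> 8;
    }
    return carry;
}
```

Each loop iteration stands for roughly a load, an add-with-carry and a store, which is how four bytes add up to about a dozen instructions before any address computation is involved.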
|
| A good macroassembler could greatly simplify programming
| on a 6502, by writing all programs with a set of
| macroinstructions designed to simulate a more powerful CPU.
|
| I do not know whether good macro-assemblers existed for 6502,
| as in those days I have used mostly CP/M computers, which had
| a good macro-assembler from Microsoft, or the much more
| powerful Motorola 6809. I have programmed 6502 only a couple
| of times, at some friends who had Commodore computers, and it
| was weak in comparison with more recent CPUs, e.g. Zilog Z80
| (which appeared one year later than 6502).
| mwcampbell wrote:
| > I do not know whether good macro-assemblers existed for
| 6502
|
| They certainly did. I don't know about the communities that
| grew around Commodore, Atari, or other 6502-based
| computers, but in the Apple II world, there were multiple
| macro assemblers available. Possibly the most famous was
| Merlin. As a pre-teen, I used the Mindcraft Assembler.
| Mindcraft even sold another product called Macrosoft, which
| was a macro library for their assembler that tried to
| combine the speed of assembly language with a BASIC-like
| syntax. The major downside, compared to both hand-coded
| assembly and Applesoft BASIC (which was stored in a pre-
| tokenized binary format), was the size of the executable.
|
| Edit: Speaking of simulating a more powerful CPU, Steve
| Wozniak implemented the SWEET16 [1] bytecode interpreter as
| part of his original Apple II ROM. Apple Pascal used
| p-code. And a more recent bytecode interpreter for the
| Apple II is PLASMA [2].
|
| [1]: https://en.wikipedia.org/wiki/SWEET16
|
| [2]: https://github.com/dschmenk/plasma
| le-mark wrote:
| Wozniak's sweet 16 was along these lines:
|
| https://en.m.wikipedia.org/wiki/SWEET16
| eqvinox wrote:
| > curious what you find so atrocious.
|
| In the context of the original post, it's a bad target for C
| -- I have no clue about other 6502 use :)
| kwertyoowiyop wrote:
| Given C's origin on the PDP-11, it's amazing it ended up so
| portable to all these crazy architectures. Even as an old-timer,
| the 8051 section made me say "WTF"!
___________________________________________________________________
(page generated 2022-02-21 23:00 UTC)