[HN Gopher] The Byte Order Fiasco (2021)
___________________________________________________________________
The Byte Order Fiasco (2021)
Author : fanf2
Score : 35 points
Date : 2024-06-30 17:42 UTC (5 hours ago)
(HTM) web link (justine.lol)
(TXT) w3m dump (justine.lol)
| throw0101b wrote:
| > _Clang and GCC are reaching for any optimization they can get.
| Undefined Behavior may be hostile and dangerous, but it's legal.
| So don't let your code become a casualty._
|
| Perhaps we need a _-Wundefined-behaviour_ so that compilers print
| out messages when they use those types of 'tricks'. If you see
| them, you can then choose to adjust your code so that it follows
| the defined path(s) of the standard(s) in question.
| Uvix wrote:
| Isn't that what _-fsanitize=undefined_ does?
| MaulingMonkey wrote:
| `-fsanitize=undefined` enables some runtime warnings/checks,
| `-Wundefined-behaviour` would presumably enable some kind of
| compile time warnings/checks.
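For context, a minimal sketch (file and function names are made up) of the kind of code `-fsanitize=undefined` instruments: a plain compile emits no warning here, while UBSan reports "signed integer overflow" at runtime if `x == INT_MAX`.

```c
/* ub_demo.c -- hypothetical example. Build with
 *   cc -O2 -fsanitize=undefined ub_demo.c
 * to get the runtime check; a plain build stays silent. */
#include <limits.h>

int increment(int x) {
    /* Undefined behavior when x == INT_MAX; UBSan instruments
     * this addition and reports the overflow at runtime. */
    return x + 1;
}
```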
| saagarjha wrote:
| You'll have far too many messages.
| amelius wrote:
| Another approach would be to implement undefined behavior in
| a way that is random.
|
| https://en.wikipedia.org/wiki/Chaos_engineering
| tedunangst wrote:
| Do you want a warning on every for (int i = 0; i < n; i++)
| loop?
| karatinversion wrote:
| I always thought the problem with this was that the compilers
| do loads of these optimisations in very mundane ways. E.g. if I
| have
|
|     #define FOO 17
|
|     void bar(int x, int y) {
|         if (x + y >= FOO) {
|             // do stuff
|         }
|     }
|
|     void baz(int x) { bar(x, FOO); }
|
| the compiler can inline the call to bar in baz, and then
| optimise the condition to (x>=0)... because signed integer
| overflow is undefined, so can't happen, so the two conditions
| are equivalent.
|
| The countless messages about optimisations like that would
| swamp ones about real dangerous optimisations.
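Concretely, a sketch of the two forms of the condition (my own illustration, not actual compiler output): assuming `x + FOO` never wraps, the condition as written and the rewritten `x >= 0` agree on every input, which is exactly why the optimizer is allowed to substitute one for the other.

```c
#define FOO 17

/* The condition as written in bar(), after inlining baz(). */
int cond_as_written(int x) { return x + FOO >= FOO; }

/* What the optimizer may rewrite it to, given the no-overflow
 * assumption that signed arithmetic licenses. */
int cond_optimized(int x) { return x >= 0; }
```

The two only diverge for inputs where `x + FOO` overflows, and the standard says those inputs simply cannot occur in a correct program.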
| userbinator wrote:
| Or better yet, perhaps they should just do the sane thing
| instead of being a hostile pedantic smartass.
|
| Undefined shouldn't mean "whatever", it should be an
| opportunity to consider what makes the most sense.
| amelius wrote:
| But then you have to write a standard.
| IshKebab wrote:
| These days I think the sane option is to just add a static assert
| that the machine is little endian and move on with your life.
| Unless you're writing glibc or something do you _really_ care
| about supporting ancient IBM mainframes?
|
| Also recommending octal is sadistic!
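A minimal sketch of that approach, assuming the GCC/Clang predefined byte-order macros (MSVC targets would need a different check): compilation simply fails on a big-endian machine, and the load becomes a bare memcpy.

```c
/* Refuse to build on anything but a little-endian target. */
#if !defined(__BYTE_ORDER__) || __BYTE_ORDER__ != __ORDER_LITTLE_ENDIAN__
#error "little-endian target required"
#endif

#include <stdint.h>
#include <string.h>

/* With endianness pinned down at compile time, the whole "codec"
 * is just an alignment-safe memcpy load. */
uint32_t load_u32(const void *p) {
    uint32_t v;
    memcpy(&v, p, sizeof v);
    return v;
}
```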
| bvrmn wrote:
| > add a static assert that the machine is little endian and
| move on with your life
|
| It's not clear how it would free you from interpreting BE data
| from incoming streams/blobs.
| forrestthewoods wrote:
| I feel like we're at a point where you should assume little
| endian serialization and treat anything big endian as a slow
| path you don't care about. There's no real reason for any
| blob, stream, or socket to use big endian for anything
| afaict.
|
| If some legacy system still serializes big endian data then
| call bswap and call it a day.
| bvrmn wrote:
| AFAIK quite a number of protocols and file formats use BE,
| with no sign of becoming legacy even in the distant
| future.
| saagarjha wrote:
| You do realize that most of the networking stack is big-
| endian, right?
| syncsynchalt wrote:
| The internet is big-endian, and generally data sent over
| the wire is converted to/from BE. For example the numbers
| in IP or TCP headers are big-endian, and any RFC that
| defines a protocol including binary data will generally go
| with big-endian numbers.
|
| I believe this dates from Bolt Beranek and Newman basing
| the IMP on a BE architecture. Similarly computers tend to
| be LE these days because that's what the "winning" PC
| architecture (x86) uses.
| userbinator wrote:
| Only the early protocols below the application layer are
| BE. A lot of the later stuff switched to LE.
| kortilla wrote:
| Yes, those "early protocols" carry everything. Until
| applications stop opening sockets, this problem doesn't
| go away.
| forrestthewoods wrote:
| > any RFC that defines a protocol including binary data
| will generally go with big-endian numbers
|
| I'm not sure this is true. And if it is true it really
| shouldn't be. There are effectively no modern big endian
| CPUs. If designing a new protocol there is, afaict, zero
| benefit to serializing anything as big endian.
|
| It's unfortunate that TCP headers and networking are big
| endian. It's a historical artifact.
|
| Converting data to/from BE is a waste. I've designed and
| implemented a variety of simple communication protocols.
| They all define the wire format to be LE. Works great,
| zero issues, zero regrets.
| bla3 wrote:
| https://commandcenter.blogspot.com/2012/04/byte-order-
| fallac... covers that part.
| MenhirMike wrote:
| Big Endian is also called Network Order because some networking
| protocols use it. And of course, UTF-16 BE is a thing.
|
| There is a non-trivial chance that you will have to deal with
| BE data regardless of whether your machine is LE or BE.
| o11c wrote:
| When I dealt with this, there were a couple major gotchas:
|
| * Compilers seem to reliably detect byteswap, but are(were) very
| hit-or-miss with the shift patterns for reading/writing to memory
| directly, so you still need(ed) an ifdef. I know compilers have
| improved but there are so many patterns that I'm still paranoid.
|
| * There are a _lot_ of "optimized" headers that actually
| pessimize by inserting inline assembly that the compiler can't
| optimize through (in particular, the compiler can't inline
| constants and can't choose `movbe` instead of `bswap`), so do not
| trust any standard API; write your own with memcpy + ifdef'd
| C-only swapping.
|
| * For speaking wire protocols, generating (struct-based?) code is
| far better than writing code that mentions offsets directly,
| which in turn is far better than the `mempcpy`-like code which
| the link suggests.
| akira2501 wrote:
| > Now you don't need to use those APIs because you know the
| secret.
|
| Was that a desired outcome? The endian.3 and byteorder.3 manual
| pages make it easy.
| pizlonator wrote:
| > Modern compiler policies don't even permit code that looks like
| that anymore. Your compiler might see that and emit assembly
| that formats your hard drive with btrfs.
|
| This is total FUD. Some sanitizers might be unhappy with that
| code, but that's just sanitizers creating problems where there
| need not be any.
|
| The llvm policy here is that must alias trumps TBAA, so clang
| will reliably compile the cast of char* to uint32_t* and do what
| systems programmers expect. If it didn't then too much stuff
| would break.
___________________________________________________________________
(page generated 2024-06-30 23:01 UTC)