[HN Gopher] Possible reasons for 8-bit bytes
___________________________________________________________________
Possible reasons for 8-bit bytes
Author : cpach
Score : 89 points
Date : 2023-03-07 13:14 UTC (9 hours ago)
(HTM) web link (jvns.ca)
(TXT) w3m dump (jvns.ca)
| billpg wrote:
| This was three or four jobs ago, but I remember reviewing
| someone's C code and they kept different collections of char* and
| int* pointers where they could have used a single collection of
| void* and the handler code would have been a lot simpler.
|
| The justification was that on this particular platform, char*
| pointers were differently structured to int* pointers, because
| char* pointers had to reference a single byte and int* pointers
| didn't.
|
| EDIT - I appear to have cut this story short. See my response to
| "wyldfire" for the rest. Sorry for causing confusion.
| dahart wrote:
| It is true that on at least some platforms an int* that is 4
| byte aligned is _faster_ to access than a pointer that is not
| aligned. I don't know if there are platforms where int* is
| assumed to be 4-byte aligned, or if the C standard allows or
| disallows that, but it seems plausible that some compiler
| somewhere defaulted to assuming an int* is aligned. Some
| compilers might generate 2 load instructions for an unaligned
| load, which incurs extra latency even if your data is already
| in the cache line. These days usually you might use some kind
| of alignment directive to enforce these things, which works on
| any pointer type, but it does seem possible that the person's
| code you reviewed wasn't incorrect to assume there's a
| difference between those pointer types, even if there was a
| better option.
| wyldfire wrote:
| > because char* pointers had to reference a single byte and
| int* pointers didn't.
|
| I must be missing some context or you have a typo. Probably
| most architectures I've ever worked with had `int *` refer to a
| register/word-sized value, and I've not yet worked with an
| architecture that had single-byte registers.
|
| Decades ago I worked on a codebase that used void * everywhere
| and rampant casting of pointer types to and fro. It was a total
| nightmare - the compiler was completely out of the loop and
| runtime was the only place to find your bugs.
| mlyle wrote:
| There are architectures where all you have is word addressing
| to memory. If you want to get a specific byte out, you need
| to retrieve it and shift/mask yourself. In turn, a pointer to
| a byte is a software construct rather than something there's
| actual direct architectural support for.
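| As a rough C sketch (assuming 32-bit words and little-endian byte
| order within the word -- both assumptions for illustration), the
| software byte pointer boils down to something like:
|
|     #include <stdint.h>
|
|     /* Memory is an array of 32-bit words; "byte i" has to be dug
|        out of the word that contains it with a shift and a mask. */
|     uint8_t load_byte(const uint32_t *mem, uint32_t byte_index)
|     {
|         uint32_t word  = mem[byte_index / 4];    /* which word       */
|         unsigned shift = (byte_index % 4) * 8;   /* which byte in it */
|         return (uint8_t)((word >> shift) & 0xFF);
|     }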
| loeg wrote:
| Do C compilers for those platforms transparently implement
| this for your char pointers as GP suggests? I would expect
| that you would need to do it manually and that native C
| pointers would only address the same words as the machine
| itself.
| billpg wrote:
| Depends on how helpful the compiler is. This particular
| compiler had an option to switch off adding in bit
| shifting code when reading characters and instead set
| CHAR_BIT to 32, meaning strings would have each character
| taking up 32 bits of space. (So many zero bits, but
| already handles emojis.)
| mlyle wrote:
| > Do C compilers for those platforms transparently
| implement this for your char pointers as GP suggests?
|
| Yes. Lots of little microcontrollers and older big
| machines have this "feature" and C compilers fix it for
| you.
|
| There are nightmarish microcontrollers with Harvard
| architectures and standards-compliant C compilers that
| fix this up all behind the scenes for you. E.g. the 8051
| is ubiquitous, and it has a Harvard architecture: there
| are separate buses/instructions to access program memory
| and normal data memory. The program memory is only word
| addressable, and the data memory is byte addressable.
|
| So, a "pointer" in many C environments for 8051 says what
| bus the data is on and stashes in other bits what the
| byte address is, if applicable. And dereferencing the
| pointer involves a whole lot of conditional operations.
|
| Then there's things like the PDP-10, where there's
| hardware support for doing fancy things with byte
| pointers, but the pointers still have a different format
| than word pointers (e.g. they stash the byte offset in
| the high bits, not the low bits).
|
| The C standard makes relatively few demands upon
| pointers so that you can do interesting things if
| necessary for an architecture.
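| As a sketch in C of what such a "generic pointer" can look like
| (the layout and the read_data/read_xdata/read_code helpers are
| made up here to illustrate the idea, not what any real 8051
| compiler emits):
|
|     #include <stdint.h>
|
|     /* Space-specific accessors standing in for the different
|        instructions (MOV A,@Ri / MOVX A,@DPTR / MOVC A,@A+DPTR). */
|     extern uint8_t read_data(uint16_t addr);
|     extern uint8_t read_xdata(uint16_t addr);
|     extern uint8_t read_code(uint16_t addr);
|
|     enum space { SPACE_DATA, SPACE_XDATA, SPACE_CODE };
|
|     struct generic_ptr {
|         uint8_t  space;  /* which bus/memory space the target is on */
|         uint16_t addr;   /* address within that space               */
|     };
|
|     uint8_t deref(struct generic_ptr p)
|     {
|         switch (p.space) {           /* runtime dispatch on every access */
|         case SPACE_DATA:  return read_data(p.addr);
|         case SPACE_XDATA: return read_xdata(p.addr);
|         default:          return read_code(p.addr);
|         }
|     }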
| yurish wrote:
| I have seen a DSP processor that could address only
| 16-bit words. And the C compiler did not fix it; bytes had 16
| bits there.
| loeg wrote:
| Yeah, this is what I have heard of and was expecting.
| Sibling comment says it's not universal -- some C
| compilers for these platforms emulate byte addressing.
| csense wrote:
| x86 is byte addressable, but internally, the x86 memory bus
| is word addressable. So an x86 CPU does the shift/mask
| process you're referring to internally. Which means it's
| actually slower to access (for example) a 32-bit value that
| is not aligned to a 4-byte boundary.
|
| C/C++ compilers often by default add extra bytes if
| necessary to make sure everything's aligned. So if you have
| struct X { int a; char b; int c; char d; } and struct Y {
| int a; int b; char c; char d; } actually X takes up more
| memory than Y, because X needs 6 extra bytes to align the
| int fields to 32-bit boundaries (or 14 bytes to align to a
| 64-bit boundary) while Y only needs 2 bytes (or 6 bytes for
| 64-bit).
|
| Meaning you can sometimes save significant amounts of
| memory in a C/C++ program by re-ordering struct fields [1].
|
| [1] http://www.catb.org/esr/structure-packing/
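| Those numbers are easy to check (assuming a typical ABI with
| 4-byte, 4-byte-aligned ints; the exact padding is
| implementation-defined):
|
|     #include <stdio.h>
|
|     struct X { int a; char b; int c; char d; };  /* 4+1+3pad+4+1+3pad */
|     struct Y { int a; int b; char c; char d; };  /* 4+4+1+1+2pad      */
|
|     int main(void)
|     {
|         /* On most 32/64-bit ABIs this prints 16 and 12. */
|         printf("sizeof(struct X) = %zu\n", sizeof(struct X));
|         printf("sizeof(struct Y) = %zu\n", sizeof(struct Y));
|         return 0;
|     }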
| mlyle wrote:
| Sure, unaligned access to memory is always expensive (on
| architectures that allow it at all).
|
| But I'm talking about retrieving the 9th to 16th bit of a
| word, which is a little different. x86 does this just
| fine/quickly, because bytes are addressable.
| kjs3 wrote:
| _an architecture that had single-byte registers_
|
| Wild guess, but the OP might be talking about the Intel 8051.
| Single-byte registers, and depending on the C compiler (and
| there are a few of them) 8-bit int* pointing to the first
| 128/256 bytes of memory, but up to 64K of (much slower)
| memory is supported in different memory spaces with different
| instructions and a 16-bit register called DPTR (and some
| implementations have 2 DPTR registers). C support for these
| additional spaces is mostly via compiler extensions analogous
| to but different from the old 8086 NEAR and FAR pointers. I'm
| obviously greatly simplifying and leaving out a ton of
| details.
|
| Oh, yeah...on 8051 you need to support bit addressing as
| well, at least for the 16 bytes from 20h to 2Fh. It's an odd
| chip.
| billpg wrote:
| I forget the details (long time ago) but char* and int*
| pointers had a different internal structure. The assembly
| generated by the compiler when code accessed a char* pointer
| was optimized for accessing single bytes and was very
| different to the code generated for an int* pointer.
|
| Digging deeper, this particular microcontroller was tuned for
| accessing 32 bits at a time. Accessing individual bytes
| needed extra bit-shuffling code to be added by the compiler.
| wyldfire wrote:
| > char* and int* pointers had a different internal
| structure. The assembly generated by the compiler when code
| accessed a char* pointer was optimized for accessing single
| bytes and was very different to the code generated for an
| int* pointer.
|
| But -- they _are_ different. Architectures where they're
| treated the same are probably the exception. Depending on
| what you mean by "very different" - most architectures will
| emit different code for byte access versus word access.
| billpg wrote:
| Accessing a 32 bit word was a simple read op.
|
| Accessing an 8 bit byte from a pointer, the compiler
| would insert assembly code into the generated object
| code. The "normal" part of the pointer would be read,
| loading four characters into a 32 bit register. Two extra
| bits were squirreled away somewhere in the pointer and
| these would feed into a shift instruction so the
| requested byte would appear in the least significant 8
| bits of the register. Finally, an AND instruction would
| clear the top 24 bits.
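| In C-ish terms the generated code amounted to something like
| this (a sketch only -- where exactly the two extra bits were
| squirreled away is made up here):
|
|     #include <stdint.h>
|
|     /* "char pointer" = word index in the low bits plus a 2-bit
|        byte selector stashed in the top of the pointer.          */
|     uint8_t load_char(uint32_t char_ptr, const uint32_t *mem)
|     {
|         uint32_t word_index = char_ptr & 0x3FFFFFFFu;  /* "normal" part   */
|         unsigned byte_sel   = char_ptr >> 30;          /* two extra bits  */
|         uint32_t word       = mem[word_index];         /* one 32-bit load */
|         return (uint8_t)((word >> (byte_sel * 8)) & 0xFF); /* shift + AND */
|     }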
| leeter wrote:
| Sounds like M68K or something similar, although Alpha AXP
| had similar byte-level access issues. A compiler on either
| of those platforms likely would add a lot of fix-up code to
| deal with the fact that they have to load the aligned word
| (either 16-bit in the M68K case or 32-bit IIRC in Alpha) and then
| do bitwise ANDs and shifts depending on the pointer's lower bits.
|
| Raymond's blog on the Alpha https://devblogs.microsoft.com/
| oldnewthing/20170816-00/?p=96...
| monocasa wrote:
| M68k was byte addressable just fine. Early Alpha had that
| issue though, as did later Cray compilers. Alpha fixed it
| with BWX (Byte-Word Extension). Early Cray compilers
| simply defined char as being 64 bits, but later ones added
| support for the shift/mask/thick-pointer scheme to pack 8
| chars in a word.
| leeter wrote:
| Must have depended on variant, the one we used in college
| would throw a GP fault for misaligned access. It
| literally didn't have an A0 line. That said it's been
| over 10 years and I could be remembering the very hard
| instruction alignment rules as applying to data too...
| monocasa wrote:
| 16-bit accesses had to be aligned. It didn't have an A0 line
| because of the 16-bit data pathway, but it did have byte select
| lines (#UDS, #LDS) for when you'd move.b d0,ADDR so that
| devices external to the CPU could see an 8-bit data
| access if that's what you were doing.
| 908B64B197 wrote:
| > ("a word is the natural unit of data used by a particular
| processor design") Apparently on x86 the word size is 16 bits,
| even though the registers are 64 bits.
|
| That's true for the original x86 instruction set. IA-32 has a
| 32-bit word size and x86-64 has... you guessed it, 64.
|
| 16 and 32 bit registers are still retained for compatibility
| reasons (just look at the instruction set!).
| stefan_ wrote:
| It's extra fun because not only are the registers retained,
| they were only _extended_. So you can use their 16 and 32 bit
| names to refer to smaller sized parts of them.
| loeg wrote:
| Some x86 categorizations would call those dwords and qwords
| respectively.
| tom_ wrote:
| Words seem to always be 16 bits for the x86 and derivatives -
| see the data types section of the software developer manual.
| dragonwriter wrote:
| I think "word" _as a datatype_ evolved from its original and
| more general computing meaning of "the natural unit of data
| used by a processor design" (the thing we talk about with an
| "x-bit processor") to "16 bits" during the period of 16-bit
| dominance and the explosion of computing that was happening
| around it.
|
| Essentially, enough stuff got written assuming that "word"
| was 16 bits (and double word, quad word, etc., had their
| obvious relationship) that even though the term had not
| previously been fixed, it would break the world to let it
| change, even as processors with larger word sizes (in the
| "natural unit of data" sense) became available, then popular,
| then dominant.
| IIAOPSW wrote:
| Binary coded decimal makes perfect sense if you're going to
| output the value to a succession of 7 segment displays (such as
| in a calculator). You would have to do that conversion in
| hardware anyway. A single repeated circuit mapping 4 bits to 7
| segments gets you the rest of the way to readable output. Now
| that I think about it, it's surprising ASCII wasn't designed
| around ease of translation to segmented display.
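| The digit-to-segments step really is just a 16-entry lookup -- a
| sketch in C (putting segments a..g in bits 0..6 is an arbitrary
| choice here):
|
|     #include <stdint.h>
|
|     /* One entry per 4-bit BCD digit; each set bit lights one of the
|        seven segments (bit 0 = a ... bit 6 = g). 0xA-0xF left blank. */
|     static const uint8_t seg7[16] = {
|         0x3F, 0x06, 0x5B, 0x4F, 0x66,   /* 0 1 2 3 4 */
|         0x6D, 0x7D, 0x07, 0x7F, 0x6F,   /* 5 6 7 8 9 */
|         0, 0, 0, 0, 0, 0
|     };
|
|     uint8_t bcd_digit_to_segments(uint8_t nibble)
|     {
|         return seg7[nibble & 0x0F];
|     }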
| kibwen wrote:
| _> Now that I think about it, it's surprising ASCII wasn't
| designed around ease of translation to segmented display._
|
| Wikipedia has a section on the design considerations of ASCII:
| https://en.wikipedia.org/wiki/ASCII#Design_considerations
| jodrellblank wrote:
| I love that there's a fractal world down there: that the digits
| 0-9 start with bit pattern 0011 followed by their value in binary
| to make conversion to/from BCD easy, and that
| control codes Start Message and End Message were positioned
| to maximise the Hamming distance so they're maximally
| different and least likely to be misinterpreted as each other
| in case of bits being mixed up, that 7-bit ASCII used on
| 8-bit tape drives left room for a parity bit for each
| character, that lowercase and uppercase letters differ only
| by the toggling of a single bit, that some of the
| digit/shift-symbol pairings date back to the first typewriter
| with a shift key in 1878...
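| A couple of those properties are easy to poke at in C (assuming
| an ASCII execution character set):
|
|     #include <stdio.h>
|
|     int main(void)
|     {
|         /* Digits are 0011 followed by the BCD value: '7' is 0x37. */
|         printf("'7' & 0x0F = %d\n", '7' & 0x0F);    /* prints 7 */
|
|         /* Upper and lower case differ only in bit 0x20. */
|         printf("'a' ^ 0x20 = %c\n", 'a' ^ 0x20);    /* prints A */
|         printf("'A' | 0x20 = %c\n", 'A' | 0x20);    /* prints a */
|         return 0;
|     }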
| bregma wrote:
| Maybe because ASCII is from the early 1960s and 7-segment
| displays didn't become widespread until 15 years or so later.
| karmakaze wrote:
| EBCDIC 1963/64 (i.e. E-BCD-IC) was an extension of BCD to
| support characters.
|
| [0] https://en.wikipedia.org/wiki/EBCDIC
| ant6n wrote:
| Maybe another vague reason: when PCs came about in the era of the
| 8008...8086s, 64K of RAM was a high but reasonable amount. So
| you need 16-bit pointers, which require exactly 2 bytes.
| kleton wrote:
| ML might benefit a lot from 10-bit bytes. Accelerators have a
| separate memory space from the CPU after all, and have their own
| HBM DRAM as close as possible to the dies. In exchange, you could
| get a decent exponent size on a float10 that might not kill your
| gradients when training a model.
| londons_explore wrote:
| There seems to be as-yet no consensus on the best math
| primitives for ML.
|
| People have invented new ones for ML (eg the Brain Float16),
| but even then some people have demonstrated training on int8 or
| even int4.
|
| There isn't even consensus on how to map the state space onto
| the number line - is linear (as in ints) or exponential (as in
| floats) better? Perhaps some entirely new mapping?
|
| And obviously there could be different optimal numbersystems
| for different ML applications or different phases of training
| or inference.
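| bfloat16 is a nice example of how simple these formats can be: it
| keeps float32's 8-bit exponent and just drops mantissa bits, so a
| quick-and-dirty conversion is a 16-bit truncation (real
| implementations round to nearest-even; this sketch doesn't):
|
|     #include <stdint.h>
|     #include <string.h>
|
|     /* Keep the sign, the full 8-bit exponent and the top 7 mantissa
|        bits -- i.e. just the top 16 bits of the float32 encoding.   */
|     uint16_t float_to_bfloat16(float f)
|     {
|         uint32_t bits;
|         memcpy(&bits, &f, sizeof bits);
|         return (uint16_t)(bits >> 16);
|     }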
| kibwen wrote:
| The reason to have a distinction between bits and bytes in the
| first place is so that you can have a unit of addressing that is
| different from the smallest unit of information.
|
| But what would we lose if we just got rid of the notion of bytes
| and just let every bit be addressable?
|
| To start, we'd still be able to fit the entire address space into
| a 64-bit pointer. The maximum address space would merely be
| reduced from 16 exabytes to 2 exabytes.
|
| I presume there's some efficiency reason why we can't address
| bits in the first place. How much does that still apply? I admit,
| I'd just rather live in a world where I don't have to think about
| alignment or padding ever again. :P
| jecel wrote:
| The TMS340 family used bit addresses, but pointers were 32
| bits.
|
| https://en.wikipedia.org/wiki/TMS34010
| ElevenLathe wrote:
| 64 bits of addressing is actually much more than most (any?)
| actually-existing processors have, for the simple reason that
| there is little demand for processors that can address 16
| exabytes of memory and all those address lines still cost
| money.
| FullyFunctional wrote:
| More to the point, storing the _pointers_ cost memory.
| Switching from 32-bit to 64-bit effectively halved the caches
| for pointer-rich programs. AMD64 was a win largely due to all
| the things they did to compensate (including doubling the
| number of registers).
| cpleppert wrote:
| There are a couple of efficiency reasons besides the simple fact
| that every piece of hardware in existence operates on data
| sizes that are multiples of the byte. To start with, it would be
| fantastically inefficient to build a CPU that could load
| arbitrary bit locations, so you would either be restricted to
| loading memory locations that are some reasonable fraction of
| the internal cache line or pay a massive performance penalty to
| load a bit address. Realistically, what would you gain by doing
| this when the CPU would have to divide any location by eight
| (or some other factor) to figure out which cache line it
| needs to load?
|
| The article touches on this but having your addressable unit
| fit a single character is incredibly convenient. If you are
| manipulating text you will never worry about single bits in
| isolation. Ditto for mathematical operations: do you really
| need a numeric type that can't even hold 0-255? It is a lot more
| convenient to think about memory locations as some reasonable
| unit that covers 99% of your computing use cases.
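| For a sense of what every access would involve, here is a
| hypothetical bit-addressed load in C terms (the split is just
| shifts, but every pointer spends three bits on it and every load
| pays for the extract):
|
|     #include <stdint.h>
|
|     int load_bit(const uint8_t *mem, uint64_t bit_addr)
|     {
|         uint64_t byte_addr  = bit_addr >> 3;   /* the "divide by eight" */
|         unsigned bit_offset = bit_addr & 0x7;  /* which bit in the byte */
|         return (mem[byte_addr] >> bit_offset) & 1;
|     }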
| beecafe wrote:
| [dead]
| AdamH12113 wrote:
| For those who are confused about bytes vs. words:
|
| The formal definition of a byte is that it's the smallest
| _addressable_ unit of memory. Think of a memory as a linear
| string of bits. A memory address points to a specific group of
| bits (say, 8 of them). If you add 1 to the address, the new
| address points to the group of bits immediately after the first
| group. The size of those bit groups is 1 byte.
|
| In modern usage, "byte" has come to mean "a group of 8 bits",
| even in situations where there is no memory addressing. This is
| due to the overwhelming dominance of systems with 8-bit bytes.
| Another term for a group of 8 bits is "octet", which is used in
| e.g. the TCP standard.
|
| Words are a bit fuzzier. One way to think of a word is that it's
| the largest number of bits acted on in a single operation without
| any special handling. The word size is typically the size of a
| CPU register or memory bus. x86 is a little weird with its
| register addressing, but if you look at an ARM Cortex-M you will
| see that its general-purpose CPU registers are 32 bits wide.
| There are instructions for working on smaller or larger units of
| data, but if you just do a generic MOV, LDR (load), or ADD
| instruction, you will act on 32 register bits. This is what it
| means for 32 bits to be the "natural" unit of data. So we say
| that an ARM Cortex-M is a 32-bit CPU, even though there are a few
| instructions that modify 64 bits (two registers) at once.
|
| Some of the fuzziness in the definition comes from the fact that
| the sizes of the CPU registers, address space, and physical
| address bus can all be different. The original AMD64 CPUs had
| 64-bit registers, implemented a 48-bit address space, and brought
| out 40 address lines. x86-64 CPUs now have 256-bit SIMD
| instructions. "32-bit" and "64-bit" were also used as marketing
| terms, with the definitions stretched accordingly.
|
| What it comes down to is that "word" is a very old term that is
| no longer quite as useful for describing CPUs. But memories also
| have word sizes, and here there is a concrete definition. The
| word size of a memory is the number of bits you can read or write
| at once -- that is, the number of data lines brought out from the
| memory IC.
|
| (Note that a memory "word" is technically also a "byte" from the
| memory's point of view -- it's both the natural unit of data and
| the smallest addressable unit of data. CPU bytes are split out
| from the memory word by the memory bus or the CPU itself. Since
| computers are all about running software, we take the CPU's
| perspective when talking about byte size.)
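| (In C terms, the platform's byte size and the sizes of the wider
| types are all queryable -- a quick way to see what a given machine
| calls a byte:)
|
|     #include <limits.h>
|     #include <stdio.h>
|
|     int main(void)
|     {
|         /* CHAR_BIT is the number of bits in the platform's byte;
|            it is 8 almost everywhere, but C only requires >= 8.   */
|         printf("bits per byte : %d\n", CHAR_BIT);
|         printf("sizeof(int)   : %zu bytes\n", sizeof(int));
|         printf("sizeof(void*) : %zu bytes\n", sizeof(void *));
|         return 0;
|     }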
| FullyFunctional wrote:
| It's not entirely historically accurate. Early machines were
| "word addressable" (where the word wasn't 8 bits), which by your
| definition should have been called "byte addressable".
|
| There were even bit addressable computers, but it didn't catch
| on :)
|
| If it wasn't for text, there would be nothing "natural" about
| an 8-bit byte (but powers-of-two are natural in binary
| computers).
| fanf2 wrote:
| In the Microsoft world, "word" generally means 16 bits, because
| their usage dates back to the 16 bit era. Other sizes are
| double words and quad words
|
| In the ARM ARM, a word is 32 bits, because that was the Arm's
| original word size. Other sizes are half words and double
| words.
|
| It is a very context-sensitive term.
| AdamH12113 wrote:
| >In the Microsoft world, "word" generally means 16 bits,
| because their usage dates back to the 16 bit era. Other sizes
| are double words and quad words
|
| Ah, yes. That terminology is still used in the Windows
| registry, although Windows 10 seems to be limited to DWORD
| and QWORD. Probably dates back to the 286 or earlier. :-)
| ajross wrote:
| FWIW, those conventions come from Intel originally, Microsoft
| took it from them. ARM borrowed from VAX Unix conventions,
| which got it from DEC.
| cwoolfe wrote:
| Because humans have 10 fingers and 8 is the closest power-of-two
| to that.
| gtop3 wrote:
| The article points out that a power of two bit count is
| actually less important than many of us assume at first.
| williamDafoe wrote:
| I worked on the UIUC PLATO system in the 1970s: CDC-6600 and 7600
| CPUs with 60-bit words. Back then everything used magnetic core
| memory and that memory was unbelievably expensive! Sewn together
| by women in southeast Asia, maybe $1 per word!
|
| Having 6-bit bytes on a CDC was a terrific PITA! The byte size
| was a tradeoff between saving MONEY (RAM) and the hassle of
| shift codes (070) used to get uppercase letters and rare symbols!
| Once semiconductor memory began to be available (2M words of
| 'ECS' - "extended core storage" - actually semiconductor memory -
| was added to our 1M byte memory in ~1978) computer architects
| could afford to burn the extra 2 bits in every word to make
| programming easier...
|
| At about the same time microprocessors like the 8008 were
| starting to take off (1975). If the basic instruction could not
| support a 0-100 value it would be virtually useless! There was
| only 1 microprocessor that DID NOT use the 8-bit byte and that
| was the 12-bit Intersil 6100, which copied the PDP-8 instruction
| set!
|
| Also the invention of double precision floating point made 32-bit
| floating point okay. From the 40s till the 70s the most critical
| decision in computer architecture was the size of the floating
| point word: 36, 48, 52, 60 bits ... and 32 is clearly inadequate.
| But the idea that you could have a second, larger FPU that
| handled 32 AND 64-bit words made 32-bit floating point
| acceptable.
|
| Also in the early 1970s text processing took off, partly from the
| invention of ASCII (1963), partly from 8-bit microprocessors,
| partly from a little known OS whose fundamental idea was that
| characters should be the only unit of I/O (Unix - father of
| Linux).
|
| So why do we have 8-bit bytes? Thank you, Gordon Moore!
| kjs3 wrote:
| I worked on the later CDC Cyber 170/180 machines, and yeah
| there was a C compiler (2, in fact). 60-bit words, 18-bit
| addresses and index registers, and the choice of 5-bit or
| 12-bit chars. The highly extended CDC Pascal dialect papered
| over more of this weirdness and was much less torturous to use.
| The Algol compiler was interesting as well.
|
| The 180 introduced a somewhat less wild, certainly more C
| friendly, 64-bit arch revision.
|
| _There was only 1 microprocessor that DID NOT use the 8-bit
| byte_
|
| Toshiba had a 12-bit single-chip processor at one time that I'm
| pretty sure you could make a similar claim about. More of a
| microcontroller for automotive use than a general-purpose
| processor, tho.
| gumby wrote:
| Author doesn't mention that several of those machines with 36-bit
| words had byte instructions allowing you to point at a particular
| byte (your choice as to width, from 1-36 bits wide) and/or to
| stride through memory byte by byte (so an array of 3-bit fields
| was as easy to manipulate as any other size).
|
| Also the ones I used to program (PDP-6/10/20) had an 18-bit
| address space - two addresses fit in one 36-bit word, which you
| may note is a CONS cell. In fact the
| PDP-6 (first installed in 1964) was designed with LISP in mind
| and several of its common instructions were LISP primitives (like
| CAR and CDR).
| drfuchs wrote:
| Even more so, 6-bit characters were often used (supporting
| upper case only), in order to squeeze six characters into a
| word. Great for filenames and user id's. And for text files,
| 7-bit was enough to get upper and lower case and all the
| symbols, and you could pack five characters into a word. What
| could be better?
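| A sketch of the packing, using a 64-bit integer to stand in for a
| 36-bit word (the 6-bit codes themselves were whatever character
| set the machine defined, and layouts differed -- this one puts the
| first character in the most significant position):
|
|     #include <stdint.h>
|
|     /* Pack six 6-bit character codes into the low 36 bits of a word. */
|     uint64_t pack6(const uint8_t code[6])
|     {
|         uint64_t word = 0;
|         for (int i = 0; i < 6; i++)
|             word = (word << 6) | (code[i] & 0x3F);
|         return word;             /* only bits 0..35 end up used */
|     }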
| downvotetruth wrote:
| The obvious or most commonly occurring characters - upper case,
| lower case, digits, dot and space ([A-Za-z0-9. ]), exactly 64
| symbols - would fit a 6-bit encoding, yet such an encoding seems
| absent.
| samtho wrote:
| I'm kind of disappointed that embedded computing was not
| mentioned. It is the longest running use-case for resource
| constrained applications, and there are cases where not only are
| you using 8-bit bytes but also an 8-bit CPU. BCD is still widely
| used in this case to encode data to 7 segment displays or just as
| data is relayed over the wire between chips.
| williamDafoe wrote:
| I agree completely! See my answer up above. Only 7 or 8 bits
| makes sense for a microprocessor - it's not useful if you cannot
| store 0-100 in a byte! With ASCII (1963) becoming ubiquitous,
| the 8008 had to be 8 bits! Otherwise it would have been the
| 7007 lol ...
| [deleted]
| [deleted]
| moremetadata wrote:
| > why was BCD popular?
|
| https://www.truenorthfloatingpoint.com/problem
|
| Floating point arithmetic has its problems.
|
| [1] Ariane 5 ROCKET, Flight 501
|
| [2] Vancouver Stock Exchange
|
| [3] PATRIOT MISSILE FAILURE
|
| [4] The sinking of the Sleipner A offshore platform
|
| [1] https://en.wikipedia.org/wiki/Ariane_flight_V88
|
| [2] https://en.wikipedia.org
| /wiki/Vancouver_Stock_Exchange#Rounding_errors_on_its_Index_price
|
| [3] https://www-users.cse.umn.edu/~arnold/disasters/patriot.html
|
| [4] https://en.wikipedia.org/wiki/Sleipner_A#Collapse
| elpocko wrote:
| Can you elaborate? How/why is BCD a better alternative to
| floating point arithmetic?
| moremetadata wrote:
| For the reasons others have mentioned, plus BCD doesn't suffer
| data type issues in the same way unless the output data type
| is wrong, but then the coder has more problems than they
| realise.
|
| The only real disadvantage of BCD is that it's not as quick as
| floating point arithmetic or bit-swapping data types, but
| with today's faster processors, for most people I'd say the
| slower speed of BCD is a non-issue.
|
| Throw in other hardware issues, like bit flips in non-ECC
| memory, and the chances of errors accumulating grow if not
| using BCD.
| finnh wrote:
| floating point error. BCD guarantees you that 1/10th,
| 1/100th, 1/1000th, etc (to some configurable level) will be
| perfectly accurate, without accumulating error during repeated
| calculations.
|
| Floating point cannot do that; its precision is based on
| powers of 2 (1/2, 1/4, 1/8, and so on). For small values (in
| the range 0-1), there are _so many_ values represented that
| the powers of 2 map pretty tightly to the powers of 10. But
| as you repeat calculations, or get into larger values (say,
| in the range 1,000,000 - 1,000,001), the floating point values
| become more sparse and errors crop up that much more easily.
|
| For example, using 32 bit floating point values, each
| consecutive floating point in the range 1,000,000 - 1,000,001
| is 0.0625 away from the next.
|
|     jshell> Math.ulp((float)1_000_000)
|     $5 ==> 0.0625
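| The same check in C, using nextafterf from math.h:
|
|     #include <math.h>
|     #include <stdio.h>
|
|     int main(void)
|     {
|         float x = 1000000.0f;
|         /* Gap to the next representable float above 1,000,000:
|            prints 0.0625, matching the jshell result above.      */
|         printf("%g\n", nextafterf(x, 2.0f * x) - x);
|         return 0;
|     }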
| ajross wrote:
| As others are pointing out, decimal fidelity and "error"
| are different things. Any fixed point mantissa
| representation in any base has a minimal precision of one
| unit in its last place, the question is just which numbers
| are exactly representable and which results have only
| inexact representations that can accumulate error.
|
| BCD is attractive to human beings programming computers to
| duplicate algorithms (generally financial ones) intended
| for other human beings to execute using arabic numerals.
| But it's not any more "accurate" (per transistor, it's
| actually less accurate due to the overhead).
| danbruc wrote:
| You are confusing two things. Usually you represent decimal
| numbers as rational fractions p/q with two integers. If you
| fix q, you get a fixed point format; if you allow q to
| vary, you get a floating point format. Unless you are
| representing rational numbers you usually limit the
| possible values of q, usually either powers of two or ten.
| Powers of two will give you your familiar floating point
| numbers but there are also base ten floating point numbers,
| for example currency data types.
|
| BCD is a completely different thing, instead of tightly
| encoding an integer you encode it digit by digit, wasting
| some fraction of a bit each time but making conversion to and
| from decimal numbers much easier. But there is no advantage
| compared to a base ten fixed or floating point
| representation when it comes to representable numbers.
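| For concreteness, packed BCD just stores one decimal digit per
| 4-bit nibble -- a sketch of the encode/decode in a 32-bit word
| (an illustration, good for values up to 99,999,999):
|
|     #include <stdint.h>
|
|     /* Each nibble only ever holds 0-9, which is the wasted
|        fraction of a bit per digit.                           */
|     uint32_t to_packed_bcd(uint32_t n)
|     {
|         uint32_t bcd = 0;
|         for (int shift = 0; n != 0; shift += 4, n /= 10)
|             bcd |= (n % 10) << shift;
|         return bcd;
|     }
|
|     uint32_t from_packed_bcd(uint32_t bcd)
|     {
|         uint32_t n = 0;
|         for (int place = 28; place >= 0; place -= 4)
|             n = n * 10 + ((bcd >> place) & 0xF);
|         return n;
|     }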
| elpocko wrote:
| You can have infinite precision in pretty much any accurate
| representation though, no? Where is the advantage in using
| BCD over any other fixed point representation?
| KMag wrote:
| The Ariane bug was an overflow casting 64-bit floating point to
| 16-bit integer. It would still have overflowed at the same
| point if it had been 64-bit decimal floating point using the
| same units. The integer part of the floating point number still
| wouldn't have fit in a signed 16-bit integer.
|
| As per the provided link, the Patriot missile error was 24-bit
| fixed point arithmetic, not floating point. Granted, a fixed-
| point representation in tenths of a second would have fixed
| this particular problem, as would have using a clock frequency
| that's a power of 1/2 (in Hz). Though, using a base 10
| representation would have prevented this rounding error, it
| would also have reduced the time before overflow.
|
| I think IEEE-754r decimal floating point is a huge step
| forward. In particular, I think a huge opportunity was missed
| when the open spreadsheet formats were defined: a decimal
| floating point option wasn't introduced.
|
| However, binary floating point rounding is irrelevant to the
| Patriot fixed-point bug.
|
| It's not reasonable to expect accountants and laypeople to
| understand binary floating point rounding. I've seen plenty of
| programmers make goofy rounding errors in financial models and
| trading systems. I've encountered a few developers who
| literally believed the least significant few bits of a floating
| point calculation are literally non-deterministic. (As best I
| can tell, they thought spilling/loading x87 80-bit floats from
| 64-bit stack-allocated storage resulted in whatever bits were
| already present in the low-order bits in the x87 registers.)
| pestatije wrote:
| BCD is not floating point
| coldtea wrote:
| That's the parent's point
| pflanze wrote:
| Avoiding floating point doesn't imply BCD. Any
| representation for integers would do fine, including
| binary.
|
| There are two reasons for BCD, (1) to avoid the cost of
| division for conversion to human readable representation as
| implied in the OP, (2) when used to represent floating
| point, to avoid "odd" representations in the human format
| resulting from the conversion (like 1/10 not shown as 0.1).
| (2) implies floating point.
|
| Even in floating point represented using BCD you'd have
| rounding errors when doing number calculations, that's
| independent of the conversion to human readable formats; so
| I don't see any reason to think that BCD would have avoided
| any disasters unless humans were involved. BCD or not is
| all about talking to humans, not to physics.
| coldtea wrote:
| > _Avoiding floating point doesn't imply BCD_
|
| Parent didn't say it's a logical necessity, as in "avoid
| floating point ==> MUST use BCD".
|
| Just casually mentioned that one reason BCD got popular was
| to sidestep such issues in floating point.
|
| (I'm not saying that's the reason, or that it's the best
| such option. It might even be historically untrue that
| this was the reason - just saying the parent's statements
| can and probably should be read like that).
| pflanze wrote:
| Sidestep which issue? The one of human representation, or
| the problems with floating point?
|
| If they _just_ want to sidestep problems with floating
| point rounding targeting the physical world, they need
| to go with integers. Choosing BCD to represent those
| integers makes no sense at all for that purpose. All I
| sense is a conflation of issues.
|
| Also, thinking about it from a different angle, avoiding
| issues with the physical world is a matter of calculating
| properly so that rounding errors don't become issues.
| Choosing integers probably helps with that more in the
| sense that it makes the programmer aware. Integers
| are still discrete and you'll have rounding issues.
| Higher precision can hide risks from rounding errors
| becoming relevant, which is why f64 is often chosen over
| f32. Going with an explicit resolution and range will
| presumably (I'm not a specialist in this area) make
| issues more upfront. Maybe at the risk of missing some
| others (like with the Ariane rocket that blew up because
| of a range overflow on integer numbers -- Edit: that
| didn't happen _on_ the integer numbers though, but when
| converting to them).
|
| A BCD number representation helps over the binary
| representation when humans are involved who shouldn't be
| surprised by the machine having different rounding than
| what the human is used to from base 10. And _maybe_
| historically the cost of conversion. That 's all. (Pocket
| calculators, and finance are the only areas I'm aware of
| where that matters.)
|
| PS. danbruc
| (https://news.ycombinator.com/item?id=35057850) says it
| better than me.
| sargstuff wrote:
| Modern-day vacuum tube hobby take on 8-bit ASCII from an
| unabstracted signal processing point of view (pre-type punning):
|
| 1920's-1950's were initially reusing prior experience/knowledge
| of each punch card hole as an individual electric on/off
| switch [1],
|
| Electronic relays required 4 electrical inputs [2] (flow
| control/reset done per 'end of current row' hole punches).
|
| 10 holes per line -> 3 relays!; 8 holes per line -> 2 relays,
| where each relay deals with 4 bits.
|
| Switching away from physical punch card media to electric/audio:
| 7 holes per line, with an extra bit for indicating 'done' with the
| current set of row holes.
|
| 8 holes per line needed 'software support' or to make use of the
| hardware for a 3rd relay (formerly needed for 10 holes in a line).
|
| Numbers are faster because with 6 bits you don't need the 3rd
| relay to do flow control.
|
| Wonder if the pairing of a binary sequence with a graphic glyph
| could be considered to be the origin of the closure concept.
|
| modern day abstractions based on '4 wire relay' concept:
|
| tcp/ip twisted pair
|
| usb prior to 3.2 vs. usb 3.2 variable lane width
|
| PCIe fixed lanes vs. latest PCIe spec with variable-width lanes
|
| -----
|
| [1] : http://quadibloc.com/comp/cardint.htm
|
| [2] : https://en.wikipedia.org/wiki/Vacuum_tube
| PeterWhittaker wrote:
| Or maybe it was C? http://www.catb.org/~esr/faqs/things-every-
| hacker-once-knew/...
| cpleppert wrote:
| The transition started before C, EBCDIC was 8 bits and ASCII
| was essentially a byte encoding. Unless you were designing some
| exotic hardware you probably needed to handle text and that was
| an eight bit byte. One motivation for the C type system was to
| extend the B programming language to support ASCII characters.
___________________________________________________________________
(page generated 2023-03-07 23:00 UTC)