[HN Gopher] Possible reasons for 8-bit bytes
___________________________________________________________________
Possible reasons for 8-bit bytes
Author : cpach
Score : 89 points
Date : 2023-03-07 13:14 UTC (9 hours ago)
(HTM) web link (jvns.ca)
(TXT) w3m dump (jvns.ca)
| billpg wrote:
| This was three or four jobs ago, but I remember reviewing
| someone's C code and they kept different collections of char* and
| int* pointers where they could have used a single collection of
| void* and the handler code would have been a lot simpler.
|
| The justification was that on this particular platform, char*
| pointers were differently structured to int* pointers, because
| char* pointers had to reference a single byte and int* pointers
| didn't.
|
| EDIT - I appear to have cut this story short. See my response to
| "wyldfire" for the rest. Sorry for causing confusion.
| dahart wrote:
| It is true that on at least some platforms an int* that is 4
| byte aligned is _faster_ to access than a pointer that is not
| aligned. I don't know if there are platforms where int* is
| assumed to be 4-byte aligned, or if the C standard allows or
| disallows that, but it seems plausible that some compiler
| somewhere defaulted to assuming an int* is aligned. Some
| compilers might generate 2 load instructions for an unaligned
| load, which incurs extra latency even if your data is already
| in the cache line. These days usually you might use some kind
| of alignment directive to enforce these things, which works on
| any pointer type, but it does seem possible that the person's
| code you reviewed wasn't incorrect to assume there's a
| difference between those pointer types, even if there was a
| better option.
| wyldfire wrote:
| > because char* pointers had to reference a single byte and
| int* pointers didn't.
|
| I must be missing some context or you have a typo. Probably
| most architectures I've ever worked with had `int *` refer to a
| register/word-sized value, and I've not yet worked with an
| architecture that had single-byte registers.
|
| Decades ago I worked on a codebase that used void * everywhere
| and rampant casting of pointer types to and fro. It was a total
| nightmare - the compiler was completely out of the loop and
| runtime was the only place to find your bugs.
| mlyle wrote:
| There are architectures where all you have is word addressing
| to memory. If you want to get a specific byte out, you need
| to retrieve it and shift/mask yourself. In turn, a pointer to
| a byte is a software construct rather than something there's
| actual direct architectural support for.
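| As a rough C sketch (assuming 32-bit words and little-endian byte
| order within the word -- both assumptions for illustration), the
| software byte pointer boils down to something like:
|
|     #include <stdint.h>
|
|     /* Memory is an array of 32-bit words; "byte i" has to be dug
|        out of the word that contains it with a shift and a mask. */
|     uint8_t load_byte(const uint32_t *mem, uint32_t byte_index)
|     {
|         uint32_t word  = mem[byte_index / 4];    /* which word       */
|         unsigned shift = (byte_index % 4) * 8;   /* which byte in it */
|         return (uint8_t)((word >> shift) & 0xFF);
|     }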
| loeg wrote:
| Do C compilers for those platforms transparently implement
| this for your char pointers as GP suggests? I would expect
| that you would need to do it manually and that native C
| pointers would only address the same words as the machine
| itself.
| billpg wrote:
| Depends on how helpful the compiler is. This particular
| compiler had an option to switch off adding in bit
| shifting code when reading characters and instead set
| CHAR_BIT to 32, meaning strings would have each character
| taking up 32 bits of space. (So many zero bits, but
| already handles emojis.)
| mlyle wrote:
| > Do C compilers for those platforms transparently
| implement this for your char pointers as GP suggests?
|
| Yes. Lots of little microcontrollers and older big
| machines have this "feature" and C compilers fix it for
| you.
|
| There are nightmarish microcontrollers with Harvard
| architectures and standards-compliant C compilers that
| fix this up all behind the scenes for you. E.g. the 8051
| is ubiquitous, and it has a Harvard architecture: there
| are separate buses/instructions to access program memory
| and normal data memory. The program memory is only word
| addressable, and the data memory is byte addressable.
|
| So, a "pointer" in many C environments for 8051 says what
| bus the data is on and stashes in other bits what the
| byte address is, if applicable. And dereferencing the
| pointer involves a whole lot of conditional operations.
|
| Then there's things like the PDP-10, where there's
| hardware support for doing fancy things with byte
| pointers, but the pointers still have a different format
| than word pointers (e.g. they stash the byte offset in
| the high bits, not the low bits).
|
| The C standard makes relatively few demands upon
| pointers so that you can do interesting things if
| necessary for an architecture.
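| As a sketch in C of what such a "generic pointer" can look like
| (the layout and the read_data/read_xdata/read_code helpers are
| made up here to illustrate the idea, not what any real 8051
| compiler emits):
|
|     #include <stdint.h>
|
|     /* Space-specific accessors standing in for the different
|        instructions (MOV A,@Ri / MOVX A,@DPTR / MOVC A,@A+DPTR). */
|     extern uint8_t read_data(uint16_t addr);
|     extern uint8_t read_xdata(uint16_t addr);
|     extern uint8_t read_code(uint16_t addr);
|
|     enum space { SPACE_DATA, SPACE_XDATA, SPACE_CODE };
|
|     struct generic_ptr {
|         uint8_t  space;  /* which bus/memory space the target is on */
|         uint16_t addr;   /* address within that space               */
|     };
|
|     uint8_t deref(struct generic_ptr p)
|     {
|         switch (p.space) {           /* runtime dispatch on every access */
|         case SPACE_DATA:  return read_data(p.addr);
|         case SPACE_XDATA: return read_xdata(p.addr);
|         default:          return read_code(p.addr);
|         }
|     }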
| yurish wrote:
| I have seen a DSP processor that could address only
| 16-bit words. And the C compiler did not fix it; bytes had 16
| bits there.
| loeg wrote:
| Yeah, this is what I have heard of and was expecting.
| Sibling comment says it's not universal -- some C
| compilers for these platforms emulate byte addressing.
| csense wrote:
| x86 is byte addressable, but internally, the x86 memory bus
| is word addressable. So an x86 CPU does the shift/mask
| process you're referring to internally. Which means it's
| actually slower to access (for example) a 32-bit value that
| is not aligned to a 4-byte boundary.
|
| C/C++ compilers often by default add extra bytes if
| necessary to make sure everything's aligned. So if you have
| struct X { int a; char b; int c; char d; } and struct Y {
| int a; int b; char c; char d; } actually X takes up more
| memory than Y, because X needs 6 extra bytes to align the
| int fields to 32-bit boundaries (or 14 bytes to align to a
| 64-bit boundary) while Y only needs 2 bytes (or 6 bytes for
| 64-bit).
|
| Meaning you can sometimes save significant amounts of
| memory in a C/C++ program by re-ordering struct fields [1].
|
| [1] http://www.catb.org/esr/structure-packing/
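| Those numbers are easy to check (assuming a typical ABI with
| 4-byte, 4-byte-aligned ints; the exact padding is
| implementation-defined):
|
|     #include <stdio.h>
|
|     struct X { int a; char b; int c; char d; };  /* 4+1+3pad+4+1+3pad */
|     struct Y { int a; int b; char c; char d; };  /* 4+4+1+1+2pad      */
|
|     int main(void)
|     {
|         /* On most 32/64-bit ABIs this prints 16 and 12. */
|         printf("sizeof(struct X) = %zu\n", sizeof(struct X));
|         printf("sizeof(struct Y) = %zu\n", sizeof(struct Y));
|         return 0;
|     }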
| mlyle wrote:
| Sure, unaligned access to memory is always expensive (on
| architectures that allow it at all).
|
| But I'm talking about retrieving the 9th to 16th bit of a
| word, which is a little different. x86 does this just
| fine/quickly, because bytes are addressable.
| kjs3 wrote:
| _an architecture that had single-byte registers_
|
| Wild guess, but the OP might be talking about the Intel 8051.
| Single-byte registers, and depending on the C compiler (and
| there are a few of them) 8-bit int* pointing to the first
| 128/256 bytes of memory, but up to 64K of (much slower)
| memory is supported in different memory spaces with different
| instructions and a 16-bit register called DPTR (and some
| implementations have 2 DPTR registers). C support for these
| additional spaces is mostly via compiler extensions analogous
| to but different from the old 8086 NEAR and FAR pointers. I'm
| obviously greatly simplifying and leaving out a ton of
| details.
|
| Oh, yeah...on 8051 you need to support bit addressing as
| well, at least for the 16 bytes from 20h to 2Fh. It's an odd
| chip.
| billpg wrote:
| I forget the details (long time ago) but char* and int*
| pointers had a different internal structure. The assembly
| generated by the compiler when code accessed a char* pointer
| was optimized for accessing single bytes and was very
| different to the code generated for an int* pointer.
|
| Digging deeper, this particular microcontroller was tuned for
| accessing 32 bits at a time. Accessing individual bytes
| needed extra bit-shuffling code to be added by the compiler.
| wyldfire wrote:
| > char* and int* pointers had a different internal
| structure. The assembly generated by the compiler when code
| accessed a char* pointer was optimized for accessing single
| bytes and was very different to the code generated for an
| int* pointer.
|
| But -- they _are_ different. Architectures where they're
| treated the same are probably the exception. Depending on
| what you mean by "very different" - most architectures will
| emit different code for byte access versus word access.
| billpg wrote:
| Accessing a 32 bit word was a simple read op.
|
| Accessing an 8 bit byte from a pointer, the compiler
| would insert assembly code into the generated object
| code. The "normal" part of the pointer would be read,
| loading four characters into a 32 bit register. Two extra
| bits were squirreled away somewhere in the pointer and
| these would feed into a shift instruction so the
| requested byte would appear in the least significant 8
| bits of the register. Finally, an AND instruction would
| clear the top 24 bits.
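| In C-ish terms the generated code amounted to something like
| this (a sketch only -- where exactly the two extra bits were
| squirreled away is made up here):
|
|     #include <stdint.h>
|
|     /* "char pointer" = word index in the low bits plus a 2-bit
|        byte selector stashed in the top of the pointer.          */
|     uint8_t load_char(uint32_t char_ptr, const uint32_t *mem)
|     {
|         uint32_t word_index = char_ptr & 0x3FFFFFFFu;  /* "normal" part   */
|         unsigned byte_sel   = char_ptr >> 30;          /* two extra bits  */
|         uint32_t word       = mem[word_index];         /* one 32-bit load */
|         return (uint8_t)((word >> (byte_sel * 8)) & 0xFF); /* shift + AND */
|     }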
| leeter wrote:
| Sounds like M68K or something similar, although Alpha AXP
| had similar byte-level access issues. A compiler on either
| of those platforms likely would add a lot of fix-up code to
| deal with the fact that they have to load the aligned word
| (either 16-bit in the M68K case or 32-bit IIRC in Alpha) and then
| do bitwise ANDs and shifts depending on the pointer's lower bits.
|
| Raymond's blog on the Alpha https://devblogs.microsoft.com/
| oldnewthing/20170816-00/?p=96...
| monocasa wrote:
| M68k was byte addressable just fine. Early Alpha had that
| issue though, as did later Cray compilers. Alpha fixed it
| with BWX (Byte-Word Extension). Early Cray compilers
| simply defined char as being 64 bits, but later ones added
| support for the shift/mask/thick-pointer scheme to pack 8
| chars in a word.
| leeter wrote:
| Must have depended on variant, the one we used in college
| would throw a GP fault for misaligned access. It
| literally didn't have an A0 line. That said it's been
| over 10 years and I could be remembering the very hard
| instruction alignment rules as applying to data too...
| monocasa wrote:
| 16-bit accesses had to be aligned. It didn't have an A0 line
| because of the 16-bit data pathway, but it did have byte select
| lines (#UDS, #LDS) for when you'd move.b d0,ADDR so that
| devices external to the CPU could see an 8-bit data
| access if that's what you were doing.
| 908B64B197 wrote:
| > ("a word is the natural unit of data used by a particular
| processor design") Apparently on x86 the word size is 16 bits,
| even though the registers are 64 bits.
|
| That's true for the original x86 instruction set. IA-32 has a
| 32-bit word size and x86-64 has... you guessed it, 64.
|
| 16 and 32 bit registers are still retained for compatibility
| reasons (just look at the instruction set!).
| stefan_ wrote:
| It's extra fun because not only are the registers retained,
| they were only _extended_. So you can use their 16 and 32 bit
| names to refer to smaller sized parts of them.
| loeg wrote:
| Some x86 categorizations would call those dwords and qwords
| respectively.
| tom_ wrote:
| Words seem to always be 16 bits for the x86 and derivatives -
| see the data types section of the software developer manual.
| dragonwriter wrote:
| I think "word" _as a datatype_ evolved from its original and
| more general computing meaning of "the natural unit of data
| used by a processor design" (the thing we talk about with an
| "x-bit processor") to "16 bits" during the period of 16-bit
| dominance and the explosion of computing that was happening
| around it.
|
| Essentially, enough stuff got written assuming that "word"
| was 16 bits (and double word, quad word, etc., had their
| obvious relationship) that even though the term had not
| previously been fixed, it would break the world to let it
| change, even as processors with larger word sizes (in the
| "natural unit of data" sense) became available, then popular,
| then dominant.
| IIAOPSW wrote:
| Binary coded decimal makes perfect sense if you're going to
| output the value to a succession of 7 segment displays (such as
| in a calculator). You would have to do that conversion in
| hardware anyway. A single repeated circuit mapping 4 bits to 7
| segments gets you the rest of the way to readable output. Now
| that I think about it, it's surprising ASCII wasn't designed
| around ease of translation to segmented display.
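| The digit-to-segments step really is just a 16-entry lookup -- a
| sketch in C (putting segments a..g in bits 0..6 is an arbitrary
| choice here):
|
|     #include <stdint.h>
|
|     /* One entry per 4-bit BCD digit; each set bit lights one of the
|        seven segments (bit 0 = a ... bit 6 = g). 0xA-0xF left blank. */
|     static const uint8_t seg7[16] = {
|         0x3F, 0x06, 0x5B, 0x4F, 0x66,   /* 0 1 2 3 4 */
|         0x6D, 0x7D, 0x07, 0x7F, 0x6F,   /* 5 6 7 8 9 */
|         0, 0, 0, 0, 0, 0
|     };
|
|     uint8_t bcd_digit_to_segments(uint8_t nibble)
|     {
|         return seg7[nibble & 0x0F];
|     }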
| kibwen wrote:
| _> Now that I think about it, it's surprising ASCII wasn't
| designed around ease of translation to segmented display._
|
| Wikipedia has a section on the design considerations of ASCII:
| https://en.wikipedia.org/wiki/ASCII#Design_considerations
| jodrellblank wrote:
| I love that there's a fractal world down there: that the digits
| 0-9 start with bit pattern 0011 followed by their value in binary
| to make conversion to/from BCD easy, and that
| control codes Start Message and End Message were positioned
| to maximise the Hamming distance so they're maximally
| different and least likely to be misinterpreted as each other
| in case of bits being mixed up, that 7-bit ASCII used on
| 8-bit tape drives left room for a parity bit for each
| character, that lowercase and uppercase letters differ only
| by the toggling of a single bit, that some of the
| digit/shift-symbol pairings date back to the first typewriter
| with a shift key in 1878...
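| A couple of those properties are easy to poke at in C (assuming
| an ASCII execution character set):
|
|     #include <stdio.h>
|
|     int main(void)
|     {
|         /* Digits are 0011 followed by the BCD value: '7' is 0x37. */
|         printf("'7' & 0x0F = %d\n", '7' & 0x0F);    /* prints 7 */
|
|         /* Upper and lower case differ only in bit 0x20. */
|         printf("'a' ^ 0x20 = %c\n", 'a' ^ 0x20);    /* prints A */
|         printf("'A' | 0x20 = %c\n", 'A' | 0x20);    /* prints a */
|         return 0;
|     }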
| bregma wrote:
| Maybe because ASCII is from the early 1960s and 7-segment
| displays didn't become widespread until 15 years or so later.
| karmakaze wrote:
| EBCDIC 1963/64 (i.e. E-BCD-IC) was an extension of BCD to
| support characters.
|
| [0] https://en.wikipedia.org/wiki/EBCDIC
| ant6n wrote:
| Maybe another vague reason: when PCs came about in the era of the
| 8008...8086s, 64K of RAM was a high but reasonable amount. So
| you need 16-bit pointers, which require exactly 2 bytes.
| kleton wrote:
| ML might benefit a lot from 10-bit bytes. Accelerators have a
| separate memory space from the CPU after all, and have their own
| HBM DRAM as close as possible to the dies. In exchange, you could
| get a decent exponent size on a float10 that might not kill your
| gradients when training a model.
| londons_explore wrote:
| There seems to be as-yet no consensus on the best math
| primitives for ML.
|
| People have invented new ones for ML (eg the Brain Float16),
| but even then some people have demonstrated training on int8 or
| even int4.
|
| There isn't even consensus on how to map the state space onto
| the number line - is linear (as in ints) or exponential (as in
| floats) better? Perhaps some entirely new mapping?
|
| And obviously there could be different optimal numbersystems
| for different ML applications or different phases of training
| or inference.
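| bfloat16 is a nice example of how simple these formats can be: it
| keeps float32's 8-bit exponent and just drops mantissa bits, so a
| quick-and-dirty conversion is a 16-bit truncation (real
| implementations round to nearest-even; this sketch doesn't):
|
|     #include <stdint.h>
|     #include <string.h>
|
|     /* Keep the sign, the full 8-bit exponent and the top 7 mantissa
|        bits -- i.e. just the top 16 bits of the float32 encoding.   */
|     uint16_t float_to_bfloat16(float f)
|     {
|         uint32_t bits;
|         memcpy(&bits, &f, sizeof bits);
|         return (uint16_t)(bits >> 16);
|     }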
| kibwen wrote:
| The reason to have a distinction between bits and bytes in the
| first place is so that you can have a unit of addressing that is
| different from the smallest unit of information.
|
| But what would we lose if we just got rid of the notion of bytes
| and just let every bit be addressable?
|
| To start, we'd still be able to fit the entire address space into
| a 64-bit pointer. The maximum address space would merely be
| reduced from 16 exabytes to 2 exabytes.
|
| I presume there's some efficiency reason why we can't address
| bits in the first place. How much does that still apply? I admit,
| I'd just rather live in a world where I don't have to think about
| alignment or padding ever again. :P
| jecel wrote:
| The TMS340 family used bit addresses, but pointers were 32
| bits.
|
| https://en.wikipedia.org/wiki/TMS34010
| ElevenLathe wrote:
| 64 bits of addressing is actually much more than most (any?)
| actually-existing processors have, for the simple reason that
| there is little demand for processors that can address 16
| exabytes of memory and all those address lines still cost
| money.
| FullyFunctional wrote:
| More to the point, storing the _pointers_ cost memory.
| Switching from 32-bit to 64-bit effectively halved the caches
| for pointer-rich programs. AMD64 was a win largely due to all
| the things they did to compensate (including doubling the
| number of registers).
| cpleppert wrote:
| There are a couple of efficiency reasons besides the simple fact
| that every piece of hardware in existence operates on data
| sizes that are multiples of the byte. To start with, it would be
| fantastically inefficient to build a CPU that could load
| arbitrary bit locations, so you would either be restricted to
| loading memory locations that are some reasonable fraction of
| the internal cache line or pay a massive performance penalty to
| load a bit address. Realistically, what would you gain by doing
| this when the CPU would have to divide any location by eight
| (or some other factor) to figure out which cache line it
| needs to load?
|
| The article touches on this but having your addressable unit
| fit a single character is incredibly convenient. If you are
| manipulating text you will never worry about single bits in
| isolation. Ditto for mathematical operations: do you really
| need a numeric type that can't even hold 0-255? It is a lot more
| convenient to think about memory locations as some reasonable
| unit that covers 99% of your computing use cases.
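| For a sense of what every access would involve, here is a
| hypothetical bit-addressed load in C terms (the split is just
| shifts, but every pointer spends three bits on it and every load
| pays for the extract):
|
|     #include <stdint.h>
|
|     int load_bit(const uint8_t *mem, uint64_t bit_addr)
|     {
|         uint64_t byte_addr  = bit_addr >> 3;   /* the "divide by eight" */
|         unsigned bit_offset = bit_addr & 0x7;  /* which bit in the byte */
|         return (mem[byte_addr] >> bit_offset) & 1;
|     }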
| beecafe wrote:
| [dead]
| AdamH12113 wrote:
| For those who are confused about bytes vs. words:
|
| The formal definition of a byte is that it's the smallest
| _addressable_ unit of memory. Think of a memory as a linear
| string of bits. A memory address points to a specific group of
| bits (say, 8 of them). If you add 1 to the address, the new
| address points to the group of bits immediately after the first
| group. The size of those bit groups is 1 byte.
|
| In modern usage, "byte" has come to mean "a group of 8 bits",
| even in situations where there is no memory addressing. This is
| due to the overwhelming dominance of systems with 8-bit bytes.
| Another term for a group of 8 bits is "octet", which is used in
| e.g. the TCP standard.
|
| Words are a bit fuzzier. One way to think of a word is that it's
| the largest number of bits acted on in a single operation without
| any special handling. The word size is typically the size of a
| CPU register or memory bus. x86 is a little weird with its
| register addressing, but if you look at an ARM Cortex-M you will
| see that its general-purpose CPU registers are 32 bits wide.
| There are instructions for working on smaller or larger units of
| data, but if you just do a generic MOV, LDR (load), or ADD
| instruction, you will act on 32 register bits. This is what it
| means for 32 bits to be the "natural" unit of data. So we say
| that an ARM Cortex-M is a 32-bit CPU, even though there are a few
| instructions that modify 64 bits (two registers) at once.
|
| Some of the fuzziness in the definition comes from the fact that
| the sizes of the CPU registers, address space, and physical
| address bus can all be different. The original AMD64 CPUs had
| 64-bit registers, implemented a 48-bit address space, and brought
| out 40 address lines. x86-64 CPUs now have 256-bit SIMD
| instructions. "32-bit" and "64-bit" were also used as marketing
| terms, with the definitions stretched accordingly.
|
| What it comes down to is that "word" is a very old term that is
| no longer quite as useful for describing CPUs. But memories also
| have word sizes, and here there is a concrete definition. The
| word size of a memory is the number of bits you can read or write
| at once -- that is, the number of data lines brought out from the
| memory IC.
|
| (Note that a memory "word" is technically also a "byte" from the
| memory's point of view -- it's both the natural unit of data and
| the smallest addressable unit of data. CPU bytes are split out
| from the memory word by the memory bus or the CPU itself. Since
| computers are all about running software, we take the CPU's
| perspective when talking about byte size.)
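| (In C terms, the platform's byte size and the sizes of the wider
| types are all queryable -- a quick way to see what a given machine
| calls a byte:)
|
|     #include <limits.h>
|     #include <stdio.h>
|
|     int main(void)
|     {
|         /* CHAR_BIT is the number of bits in the platform's byte;
|            it is 8 almost everywhere, but C only requires >= 8.   */
|         printf("bits per byte : %d\n", CHAR_BIT);
|         printf("sizeof(int)   : %zu bytes\n", sizeof(int));
|         printf("sizeof(void*) : %zu bytes\n", sizeof(void *));
|         return 0;
|     }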
| FullyFunctional wrote:
| It's not entirely historically accurate. Early machines were
| "word addressable" (where the word wasn't 8 bits), which by your
| definition should have been called "byte addressable".
|
| There were even bit addressable computers, but it didn't catch
| on :)
|
| If it wasn't for text, there would be nothing "natural" about
| an 8-bit byte (but powers-of-two are natural in binary
| computers).
| fanf2 wrote:
| In the Microsoft world, "word" generally means 16 bits, because
| their usage dates back to the 16 bit era. Other sizes are
| double words and quad words
|
| In the ARM ARM, a word is 32 bits, because that was the Arm's
| original word size. Other sizes are half words and double
| words.
|
| It is a very context-sensitive term.
| AdamH12113 wrote:
| >In the Microsoft world, "word" generally means 16 bits,
| because their usage dates back to the 16 bit era. Other sizes
| are double words and quad words
|
| Ah, yes. That terminology is still used in the Windows
| registry, although Windows 10 seems to be limited to DWORD
| and QWORD. Probably dates back to the 286 or earlier. :-)
| ajross wrote:
| FWIW, those conventions come from Intel originally, Microsoft
| took it from them. ARM borrowed from VAX Unix conventions,
| which got it from DEC.
| cwoolfe wrote:
| Because humans have 10 fingers and 8 is the closest power-of-two
| to that.
| gtop3 wrote:
| The article points out that a power of two bit count is
| actually less important than many of us assume at first.
| williamDafoe wrote:
| I worked on the UIUC PLATO system in the 1970s: CDC-6600 and 7600
| CPUs with 60-bit words. Back then everything used magnetic core
| memory and that memory was unbelievably expensive! Sewn together
| by women in southeast Asia, maybe $1 per word!
|
| Having 6-bit bytes on a CDC was a terrific PITA! The byte size
| was a tradeoff between saving MONEY (RAM) and the hassle of
| shift codes (070) used to get uppercase letters and rare symbols!
| Once semiconductor memory began to be available (2M words of
| 'ECS' - "extended core storage" - actually semiconductor memory -
| was added to our 1M byte memory in ~1978) computer architects
| could afford to burn the extra 2 bits in every word to make
| programming easier...
|
| At about the same time microprocessors like the 8008 were
| starting to take off (1975). If the basic instruction could not
| support a 0-100 value it would be virtually useless! There was
| only 1 microprocessor that DID NOT use the 8-bit byte and that
| was the 12-bit Intersil 6100, which copied the PDP-8 instruction
| set!
|
| Also the invention of double precision floating point made 32-bit
| floating point okay. From the 40s till the 70s the most critical
| decision in computer architecture was the size of the floating
| point word: 36, 48, 52, 60 bits ... and 32 is clearly inadequate.
| But the idea that you could have a second, larger FPU that
| handled 32 AND 64-bit words made 32-bit floating point
| acceptable.
|
| Also in the early 1970s text processing took off, partly from the
| invention of ASCII (1963), partly from 8-bit microprocessors,
| partly from a little known OS whose fundamental idea was that
| characters should be the only unit of I/O (Unix - father of
| Linux).
|
| So why do we have 8-bit bytes? Thank you, Gordon Moore!
| kjs3 wrote:
| I worked on the later CDC Cyber 170/180 machines, and yeah
| there was a C compiler (2, in fact). 60-bit words, 18-bit
| addresses and index registers, and the choice of 5-bit or
| 12-bit chars. The highly extended CDC Pascal dialect papered
| over more of this weirdness and was much less torturous to use.
| The Algol compiler was interesting as well.
|
| The 180 introduced a somewhat less wild, certainly more C
| friendly, 64-bit arch revision.
|
| _There was only 1 microprocessor that DID NOT use the 8-bit
| byte_
|
| Toshiba had a 12-bit single-chip processor at one time that I'm
| pretty sure you could make a similar claim about. More of a
| microcontroller for automotive use than a general-purpose
| processor, tho.
| gumby wrote:
| Author doesn't mention that several of those machines with 36-bit
| words had byte instructions allowing you to point at a particular
| byte (your choice as to width, from 1-36 bits wide) and/or to
| stride through memory byte by byte (so an array of 3-bit fields
| was as easy to manipulate as any other size).
|
| Also the ones I used to program (PDP-6/10/20) had an 18-bit
| address space - two addresses fit in one 36-bit word, which you
| may note is a CONS cell. In fact the
| PDP-6 (first installed in 1964) was designed with LISP in mind
| and several of its common instructions were LISP primitives (like
| CAR and CDR).
| drfuchs wrote:
| Even more so, 6-bit characters were often used (supporting
| upper case only), in order to squeeze six characters into a
| word. Great for filenames and user id's. And for text files,
| 7-bit was enough to get upper and lower case and all the
| symbols, and you could pack five characters into a word. What
| could be better?
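| A sketch of the packing, using a 64-bit integer to stand in for a
| 36-bit word (the 6-bit codes themselves were whatever character
| set the machine defined, and layouts differed -- this one puts the
| first character in the most significant position):
|
|     #include <stdint.h>
|
|     /* Pack six 6-bit character codes into the low 36 bits of a word. */
|     uint64_t pack6(const uint8_t code[6])
|     {
|         uint64_t word = 0;
|         for (int i = 0; i < 6; i++)
|             word = (word << 6) | (code[i] & 0x3F);
|         return word;             /* only bits 0..35 end up used */
|     }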
| downvotetruth wrote:
| The obvious or most commonly occurring characters - upper case,
| lower case, digits, dot and space ([A-Za-z0-9. ]), exactly 64
| symbols - would fit a 6-bit encoding, yet such an encoding seems
| absent.
| samtho wrote:
| I'm kind of disappointed that embedded computing was not
| mentioned. It is the longest running use-case for resource
| constrained applications, and there are cases where not only are
| you using 8-bit bytes but also an 8-bit CPU. BCD is still widely
| used in this case to encode data to 7 segment displays or just as
| data is relayed over the wire between chips.
| williamDafoe wrote:
| I agree completely! See my answer up above. Only 7 or 8 bits
| makes sense for a microprocessor - it's not useful if you cannot
| store 0-100 in a byte! With ASCII (1963) becoming ubiquitous,
| the 8008 had to be 8 bits! Otherwise it would have been the
| 7007 lol ...
| [deleted]
| [deleted]
| moremetadata wrote:
| > why was BCD popular?
|
| https://www.truenorthfloatingpoint.com/problem
|
| Floating point arithmetic has its problems.
|
| [1] Ariane 5 ROCKET, Flight 501
|
| [2] Vancouver Stock Exchange
|
| [3] PATRIOT MISSILE FAILURE
|
| [4] The sinking of the Sleipner A offshore platform
|
| [1] https://en.wikipedia.org/wiki/Ariane_flight_V88
|
| [2] https://en.wikipedia.org
| /wiki/Vancouver_Stock_Exchange#Rounding_errors_on_its_Index_price
|
| [3] https://www-users.cse.umn.edu/~arnold/disasters/patriot.html
|
| [4] https://en.wikipedia.org/wiki/Sleipner_A#Collapse
| elpocko wrote:
| Can you elaborate? How/why is BCD a better alternative to
| floating point arithmetic?
| moremetadata wrote:
| For the reasons others have mentioned, plus BCD doesn't suffer
| data type issues in the same way unless the output data type
| is wrong, but then the coder has more problems than they
| realise.
|
| The only real disadvantage of BCD is that it's not as quick as
| floating point arithmetic or bit-swapping data types, but
| with today's faster processors, for most people I'd say the
| slower speed of BCD is a non-issue.
|
| Throw in other hardware issues, like bit flips in non-ECC
| memory, and the chances of errors accumulating grow if not
| using BCD.
| finnh wrote:
| floating point error. BCD guarantees you that 1/10th,
| 1/100th, 1/1000th, etc (to some configurable level) will be
| perfectly accurate, without accumulating error during repeated
| calculations.
|
| Floating point cannot do that; its precision is based on
| powers of 2 (1/2, 1/4, 1/8, and so on). For small values (in
| the range 0-1), there are _so many_ values represented that
| the powers of 2 map pretty tightly to the powers of 10. But
| as you repeat calculations, or get into larger values (say,
| in the range 1,000,000 - 1,000,001), the floating point values
| become more sparse and errors crop up that much more easily.
|
| For example, using 32 bit floating point values, each
| consecutive floating point in the range 1,000,000 - 1,000,001
| is 0.0625 away from the next.
|
|     jshell> Math.ulp((float)1_000_000)
|     $5 ==> 0.0625
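| The same check in C, using nextafterf from math.h:
|
|     #include <math.h>
|     #include <stdio.h>
|
|     int main(void)
|     {
|         float x = 1000000.0f;
|         /* Gap to the next representable float above 1,000,000:
|            prints 0.0625, matching the jshell result above.      */
|         printf("%g\n", nextafterf(x, 2.0f * x) - x);
|         return 0;
|     }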
| ajross wrote:
| As others are pointing out, decimal fidelity and "error"
| are different things. Any fixed point mantissa
| representation in any base has a minimal precision of one
| unit in its last place, the question is just which numbers
| are exactly representable and which results have only
| inexact representations that can accumulate error.
|
| BCD is attractive to human beings programming computers to
| duplicate algorithms (generally financial ones) intended
| for other human beings to execute using arabic numerals.
| But it's not any more "accurate" (per transistor, it's
| actually less accurate due to the overhead).
| danbruc wrote:
| You are confusing two things. Usually you represent decimal
| numbers as rational fractions p/q with two integers. If you
| fix q, you get a fixed point format; if you allow q to
| vary, you get a floating point format. Unless you are
| representing rational numbers you usually limit the
| possible values of q, usually either powers of two or ten.
| Powers of two will give you your familiar floating point
| numbers but there are also base ten floating point numbers,
| for example currency data types.
|
| BCD is a completely different thing, instead of tightly
| encoding an integer you encode it digit by digit, wasting
| some fraction of a bit each time but making conversion to and
| from decimal numbers much easier. But there is no advantage
| compared to a base ten fixed or floating point
| representation when it comes to representable numbers.
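| For concreteness, packed BCD just stores one decimal digit per
| 4-bit nibble -- a sketch of the encode/decode in a 32-bit word
| (an illustration, good for values up to 99,999,999):
|
|     #include <stdint.h>
|
|     /* Each nibble only ever holds 0-9, which is the wasted
|        fraction of a bit per digit.                           */
|     uint32_t to_packed_bcd(uint32_t n)
|     {
|         uint32_t bcd = 0;
|         for (int shift = 0; n != 0; shift += 4, n /= 10)
|             bcd |= (n % 10) << shift;
|         return bcd;
|     }
|
|     uint32_t from_packed_bcd(uint32_t bcd)
|     {
|         uint32_t n = 0;
|         for (int place = 28; place >= 0; place -= 4)
|             n = n * 10 + ((bcd >> place) & 0xF);
|         return n;
|     }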
| elpocko wrote:
| You can have infinite precision in pretty much any accurate
| representation though, no? Where is the advantage in using
| BCD over any other fixed point representation?
| KMag wrote:
| The Ariane bug was an overflow casting 64-bit floating point to
| 16-bit integer. It would still have overflowed at the same
| point if it had been 64-bit decimal floating point using the
| same units. The integer part of the floating point number still
| wouldn't have fit in a signed 16-bit integer.
|
| As per the provided link, the Patriot missile error was 24-bit
| fixed point arithmetic, not floating point. Granted, a fixed-
| point representation in tenths of a second would have fixed
| this particular problem, as would have using a clock frequency
| that's a power of 1/2 (in Hz). Though, using a base 10
| representation would have prevented this rounding error, it
| would also have reduced the time before overflow.
|
| I think IEEE-754r decimal floating point is a huge step
| forward. In particular, I think a huge opportunity was missed
| when the open spreadsheet formats were defined: a decimal
| floating point option wasn't introduced.
|
| However, binary floating point rounding is irrelevant to the
| Patriot fixed-point bug.
|
| It's not reasonable to expect accountants and laypeople to
| understand binary floating point rounding. I've seen plenty of
| programmers make goofy rounding errors in financial models and
| trading systems. I've encountered a few developers who
| literally believed the least significant few bits of a floating
| point calculation are literally non-deterministic. (As best I
| can tell, they thought spilling/loading x87 80-bit floats from
| 64-bit stack-allocated storage resulted in whatever bits were
| already present in the low-order bits in the x87 registers.)
| pestatije wrote:
| BCD is not floating point
| coldtea wrote:
| That's the parent's point
| pflanze wrote:
| Avoiding floating point doesn't imply BCD. Any
| representation for integers would do fine, including
| binary.
|
| There are two reasons for BCD, (1) to avoid the cost of
| division for conversion to human readable representation as
| implied in the OP, (2) when used to represent floating
| point, to avoid "odd" representations in the human format
| resulting from the conversion (like 1/10 not shown as 0.1).
| (2) implies floating point.
|
| Even in floating point represented using BCD you'd have
| rounding errors when doing number calculations, that's
| independent of the conversion to human readable formats; so
| I don't see any reason to think that BCD would have avoided
| any disasters unless humans were involved. BCD or not is
| all about talking to humans, not to physics.
| coldtea wrote:
| > _Avoiding floating point doesn't imply BCD_
|
| Parent didn't say it's a logical necessity, as in "avoid
| floating point ==> MUST use BCD".
|
| Just casually mentioned that one reason BCD got popular was
| to sidestep such issues in floating point.
|
| (I'm not saying that's the reason, or that it's the best
| such option. It might even be historically untrue that
| this was the reason - just saying the parent's statements
| can and probably should be read like that).
| pflanze wrote:
| Sidestep which issue? The one of human representation, or
| the problems with floating point?
|
| If they _just_ want to sidestep problems with floating
| point rounding targeting the physical world, they need
| to go with integers. Choosing BCD to represent those
| integers makes no sense at all for that purpose. All I
| sense is a conflation of issues.
|
| Also, thinking about it from a different angle, avoiding
| issues with the physical world is a matter of calculating
| properly so that rounding errors don't become issues.
| Choosing integers probably helps with that more in the
| sense that it makes the programmer aware. Integers
| are still discrete and you'll have rounding issues.
| Higher precision can hide risks from rounding errors
| becoming relevant, which is why f64 is often chosen over
| f32. Going with an explicit resolution and range will
| presumably (I'm not a specialist in this area) make
| issues more upfront. Maybe at the risk of missing some
| others (like with the Ariane rocket that blew up because
| of a range overflow on integer numbers -- Edit: that
| didn't happen _on_ the integer numbers though, but when
| converting to them).
|
| A BCD number representation helps over the binary
| representation when humans are involved who shouldn't be
| surprised by the machine having different rounding than
| what the human is used to from base 10. And _maybe_
| historically the cost of conversion. That 's all. (Pocket
| calculators, and finance are the only areas I'm aware of
| where that matters.)
|
| PS. danbruc
| (https://news.ycombinator.com/item?id=35057850) says it
| better than me.
| sargstuff wrote:
| Modern-day vacuum tube hobby take on 8-bit ASCII from an
| unabstracted signal processing point of view (pre-type punning):
|
| 1920's-1950's were initially reusing prior experience/knowledge
| of each punch card hole as an individual electric on/off
| switch [1],
|
| Electronic relays required 4 electrical inputs [2] (flow
| control/reset done per 'end of current row' hole punches).
|
| 10 holes per line -> 3 relays!; 8 holes per line -> 2 relays,
| where each relay deals with 4 bits.
|
| Switching away from physical punch card media to electric/audio:
| 7 holes per line, with an extra bit for indicating 'done' with the
| current set of row holes.
|
| 8 holes per line needed 'software support' or to make use of the
| hardware for a 3rd relay (formerly needed for 10 holes in a line).
|
| Numbers are faster because with 6 bits you don't need the 3rd
| relay to do flow control.
|
| Wonder if the pairing of a binary sequence with a graphic glyph
| could be considered to be the origin of the closure concept.
|
| modern day abstractions based on '4 wire relay' concept:
|
| tcp/ip twisted pair
|
| usb prior to 3.2 vs. usb 3.2 variable lane width
|
| PCIe fixed lanes vs. latest PCIe spec with variable-width lanes
|
| -----
|
| [1] : http://quadibloc.com/comp/cardint.htm
|
| [2] : https://en.wikipedia.org/wiki/Vacuum_tube
| PeterWhittaker wrote:
| Or maybe it was C? http://www.catb.org/~esr/faqs/things-every-
| hacker-once-knew/...
| cpleppert wrote:
| The transition started before C, EBCDIC was 8 bits and ASCII
| was essentially a byte encoding. Unless you were designing some
| exotic hardware you probably needed to handle text and that was
| an eight bit byte. One motivation for the C type system was to
| extend the B programming language to support ASCII characters.
___________________________________________________________________
(page generated 2023-03-07 23:00 UTC)