[HN Gopher] New integer types I'd like to see
       ___________________________________________________________________
        
       New integer types I'd like to see
        
       Author : ibobev
       Score  : 89 points
       Date   : 2022-09-30 09:30 UTC (13 hours ago)
        
 (HTM) web link (www.foonathan.net)
 (TXT) w3m dump (www.foonathan.net)
        
       | dheera wrote:
       | > So here's my wish: a signed integer where INT_MIN == -INT_MAX
       | 
       | Just set INT_MIN to -127 and never use -128.
       | 
       | > Second, you're getting symmetry back. All the operations
       | mentioned above are now symmetric and can't overflow. This makes
       | them a lot easier to reason about.
       | 
       | But you seem to want the actual bits to be symmetric with the
       | sign bit just being a literal + or -. This makes it easier for
        | _humans_ to reason with, but it does NOT make it easier for a
        | CPU's adder to add -15 and +63, which in the current standard
        | signed int implementation does not even require the CPU adder
        | to know that it's a signed int.
       | 
       | Unsigned and signed int adding work exactly the same bit-wise,
       | and can use the same hardwired circuits to process at blazing
       | speed. That's the beauty of the current standard implementation.
        
       | 0xblood wrote:
       | If I want to do bitwise operations on an integer, and I want to
       | set/mask the MSB (maybe it's just a bitfield for me), I mask with
       | INT_NAN? That also sounds very weird
        
       | codeflo wrote:
       | > Instead, let's just say arithmetic on INT_NAN is undefined
       | behavior; sanitizers can then insert assertions in debug mode.
       | 
        | I mean, if using that value is supposed to be UB anyway, then
        | the current behavior of standard ints conforms to the spec.
        | Doesn't that make the proposal largely pointless?
        
       | sharikous wrote:
        | From a computational point of view it would be nice to see,
        | actually, a type that has no zero and is symmetric around it:
        | ... -3/2, -1/2, 1/2, 3/2 ... It would be trivial to implement
        | efficiently in hardware (as opposed to the checks required for
        | INT_NAN or INT_MIN) and it would make sense in some cases (like
        | some kinds of discretizations)
       | 
        | From a general programming viewpoint pervasive bigints, ints with
        | NaN/+Inf/-Inf and bitfields would be interesting too, but I don't
        | know if they are worth the complexity they introduce
        
       | helloooooooo wrote:
       | Ah so they want to replace undefined behaviour with a different
       | kind of undefined behaviour
        
         | klysm wrote:
         | I'm not sure this is a particularly useful critique. I learned
         | a good amount from this post.
        
       | shultays wrote:
        | Any reason for not adding these as a library rather than
        | expecting them from the language? C++ gives you all the tools
        | for that
        
         | pornel wrote:
          | LLVM already supports types like i31 and i7, but without
          | language support it may be difficult to _reliably_ convince
          | the optimizer that's what you mean.
        
         | manwe150 wrote:
         | Hardware support. It is trivial to write libraries that can
         | emulate these in many languages, but the performance drop is
         | going to be pretty sharp if you are coding your basic
         | arithmetic by hand. Unless you are writing code for an FPGA...
        
           | jeffffff wrote:
            | the one about unsigned integers with one bit missing would
            | be trivial to implement as a library in C++ with no
            | significant downside. all you have to do is make a class
            | wrapping a signed int and put debug checks for whether the
            | high bit is set behind an if constexpr in operator= and the
            | copy constructor. in most other languages this would bring
            | a big performance penalty, but this is one thing that C++
            | is actually very good at.
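 
A rough sketch of the wrapper described above (the class name is made
up, and the check is a debug-only assert rather than if constexpr; a
complete version would also cover +=, ++, shifts, and the other
mutating operators):

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical 63-bit unsigned value stored in a signed 64-bit int.
// The assert compiles away in release builds (NDEBUG).
class uint63_t_sketch {
    int64_t v_ = 0;
    static void check(int64_t v) { assert(v >= 0 && "high bit set"); }
public:
    uint63_t_sketch() = default;
    uint63_t_sketch(int64_t v) : v_(v) { check(v_); }
    uint63_t_sketch& operator=(int64_t v) { check(v); v_ = v; return *this; }
    uint63_t_sketch operator+(uint63_t_sketch o) const { return {v_ + o.v_}; }
    int64_t value() const { return v_; }
};
```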
        
             | eklitzke wrote:
             | This is incorrect. For one thing, you're going to need to
             | overload all the operators that mutate the integer in
             | place, e.g. operator++, operator +=, operator -=, operator
             | *=, shifts, etc. And those checks can be quite expensive.
             | For example, your code is probably littered with for loops
             | that are implemented using operator++ or operator+=, and
             | that means on every loop iteration you need to check for
             | overflow, which is expensive if the loop body is simple.
             | GCC and Clang already implement -ftrapv which does
             | something similar (it adds compiler checks at all places
             | where signed integers can overflow and trap if overflow
             | occurs). I've used -ftrapv in debug builds but for most
             | programs you don't want it in non-debug builds because the
             | overhead is too high.
        
           | shultays wrote:
           | You could use the same hardware support in your library
           | couldn't you? Compile flags to check your target hardware and
           | enable/disable code if you have the support
        
       | amluto wrote:
       | I would much rather see most of this be generalized to a
       | parametrized bounded integer type, e.g. int<lowerbound,
       | upperbound>.
        
         | jeffffff wrote:
         | i think you could actually do that in C++ as a library with a
         | bit of metaprogramming. the arithmetic operations will get
         | weird though unless everything is the same type, for example
         | what is the result type of int<-10, 10>+int<0, 20>? (the
         | "right" answer is probably a megabyte of compiler errors)
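 
A sketch of that metaprogramming approach (int_range is a hypothetical
name): widening the bounds in the result type is one answer that avoids
the megabyte of errors, at the cost of types growing with every
operation.

```cpp
#include <cassert>
#include <cstdint>

// The result type of operator+ widens the bounds at compile time,
// so int_range<-10,10> + int_range<0,20> yields int_range<-10,30>.
template <int64_t Lo, int64_t Hi>
struct int_range {
    static_assert(Lo <= Hi, "empty range");
    int64_t v;
    constexpr explicit int_range(int64_t x) : v(x) {
        assert(Lo <= x && x <= Hi); // debug-only bounds check
    }
};

template <int64_t L1, int64_t H1, int64_t L2, int64_t H2>
constexpr int_range<L1 + L2, H1 + H2>
operator+(int_range<L1, H1> a, int_range<L2, H2> b) {
    return int_range<L1 + L2, H1 + H2>(a.v + b.v);
}
```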
        
           | evouga wrote:
           | Surely just `int`?
        
         | IAmLiterallyAB wrote:
          | Exactly. In fact, I would go so far as to make this the
          | _only_ integer type. u16, u32 etc can just be typedefs.
         | 
         | The ability to be precise with integer ranges could prevent
         | many types of arithmetic and OOB errors.
        
           | tialaramex wrote:
           | WUFFS types can be refined as well as having a native
           | representation so e.g.
           | 
           | base.u8[0 .. =99] is a single byte with values from 0 to 99
           | inclusive
           | 
           | base.u16[0 .. =99] is a 16-bit type with the same values
           | 
           | base.u16[100 .. =400] is still 16 bits but values from 100 to
           | 400 inclusive
           | 
           | In WUFFS all arithmetic overflow or array bounds misses are
           | compile time errors. From the point of view of WUFFS if
            | pixels is an array with 200 elements, and idx is a
            | base.u8[0 .. =240] then pixels[idx] is a type error as it
            | wouldn't make sense to index 240 into an array of 200
            | elements so that doesn't compile.
           | 
           | As well as the obvious safety implication, this also means
           | WUFFS can in principle go just as fast as if you were
           | inhumanly careful with a conventional low level language and
           | skipped any safety checks, since it knows it already took
           | care of that. In practice it transpiles to C (which doesn't
           | know what safety checks are).
        
         | astrange wrote:
          | This would be a big improvement over how a lot of languages
          | handle types - namely, it'd let you stop calling "i16" and
          | "i32" types when they're actually storage sizes.
         | 
         | "Safe" languages like Java try to achieve safety by making you
         | write boilerplate casts here - "short = int + int" is illegal
         | without a (short) cast even though that doesn't change the
         | meaning of the program.
         | 
         | If it was "x = i32<0,3> + i32<0,3>" then it highlights how
         | silly this is and maybe they wouldn't make you write a
         | (i16<0,6>) cast.
        
       | MarkusWandel wrote:
       | This is probably not what you're looking for because it might not
        | be high performance. But in hardware synthesis/modeling, we use
        | the so-called "AC data types"; I just googled that and found it
        | on GitHub.
       | 
       | https://github.com/hlslibs/ac_types
       | 
       | Arbitrary bit length, symmetrical and unsymmetrical, rounding and
       | wraparound behaviour specifiable etc. etc.
        
       | CalChris wrote:
       | Maybe this is just too prosaic, but I'd like 64-bit unsigned
       | integers in Java. I'd like Unums, which are not integers,
       | especially the low precision variants.
        
         | Viliam1234 wrote:
          | What is the purpose of having a 64-bit unsigned integer?
          | Could you tell me an example of a use case where 9x10^18 is
          | not enough (so you cannot use a signed 64-bit integer), but
          | 18x10^18 will definitely be enough (so you do not need
          | BigInteger)?
        
       | [deleted]
        
       | [deleted]
        
       | IYasha wrote:
       | Sorry, bro. Even if we have time machines, we're in the bad
       | timeline.
        
       | torstenvl wrote:
       | Congratulations on inventing slower one's complement.
       | 
       | > _I'd like to have . . . a 63 bit unsigned integer. . . .
       | However, it is undefined behavior if it ever stores a negative
       | value._
       | 
       | No. There is no "debug mode" for undefined behavior. Undefined
       | behavior is always undefined behavior.
       | 
          | It also doesn't "automatically come[] with a precondition
          | that it cannot be negative" - only that _if_ it's ever
          | negative, your program might crash for reasons that cannot
          | be found in your code.
       | 
       | If you really want this, use uint64_t and treat any value over
       | 0x7fffffffffffffff as an error.
       | 
       | Curious about the drive-by downvoters: Can you point out where
       | you think I'm wrong?
        
         | Kranar wrote:
         | > Undefined behavior is always undefined behavior.
         | 
         | This is not true; the standard explicitly allows
         | implementations to, and I quote directly from the standard,
         | "behave during translation or program execution in a documented
         | manner characteristic of the environment".
         | 
         | This idea that undefined behavior is always undefined behavior
         | and there's no way to reason about it is purely academic,
         | incorrect, and not in any way justified either by source
         | material or in practice.
         | 
         | As for debugging undefined behavior, UBSan [1] is an excellent
         | tool that is well supported by GCC and clang, and MSVC is
         | working to add support for it as well.
         | 
         | [1] https://clang.llvm.org/docs/UndefinedBehaviorSanitizer.html
        
         | erk__ wrote:
          | 63 bit integers are actually used in OCaml, where the spare
          | bit is used to figure out if a value is an integer or a
          | pointer
         | https://blog.janestreet.com/what-is-gained-and-lost-with-63-...
        
           | matheusmoreira wrote:
           | Also used in Ruby virtual machines and no doubt many others.
        
           | [deleted]
        
         | jeffffff wrote:
         | torstenvl is correct. UB is UB whether you are in debug mode or
         | release mode. making it UB for it to store a negative value
         | doesn't make sense. you could put debug checks for it behind an
         | if constexpr or a macro but please C++ has enough UB already,
         | don't add more. disappointing to see that this is downvoted.
        
       | Sniffnoy wrote:
       | Interestingly, in Ethereum smart contract languages (e.g.
       | Solidity and Vyper), having separate types for bit vectors vs
       | integers is common. I don't know why it's caught on there but
       | nowhere else. Obviously those languages have a pretty specialized
       | niche and aren't really going to be used outside of that context.
        
         | zelphirkalt wrote:
          | Does this really help? Sometimes being able to treat integers
          | as bitvectors and vice versa is desirable. I think one needs
          | appropriate functions, which treat integers or bitvectors as
          | such, rather than a split in types that would require type
          | conversions everywhere.
        
       | wyldfire wrote:
        | I'd like to see integer types for C/C++ that are able to
        | saturate or trap on overflow - or both. I'm pretty sure it's
        | even been proposed for the C standard but was rejected or
        | postponed.
       | 
       | It seems like a really obvious feature that's easy to support but
       | isn't.
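 
For reference, both behaviors can be sketched today on GCC/Clang with
the __builtin_add_overflow intrinsic (the function names here are made
up); the wish above is for types that do this implicitly.

```cpp
#include <cstdint>
#include <limits>

// Saturating add: clamp to the nearest representable bound on overflow.
int32_t sat_add(int32_t a, int32_t b) {
    int32_t r;
    if (__builtin_add_overflow(a, b, &r))
        return a > 0 ? std::numeric_limits<int32_t>::max()
                     : std::numeric_limits<int32_t>::min();
    return r;
}

// Trapping add: abort the program on overflow.
int32_t trap_add(int32_t a, int32_t b) {
    int32_t r;
    if (__builtin_add_overflow(a, b, &r))
        __builtin_trap(); // or raise a signal / throw
    return r;
}
```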
        
         | Symmetry wrote:
         | Likewise. That's what I was expecting going in rather than
         | Lovecraftian monstrosities like INT_NAN. Plus they're actually
         | something that's supported in some existing ISAs.
        
           | [deleted]
        
         | mikewarot wrote:
          | Free Pascal (fpc) supports trapping on overflow: it uses the
          | overflow flag and tests it after integer operations, which is
          | pretty basic. Here's the assembly output of an Int64 add:
          | 
          |     # [16] c := a + b;
          |     movq U_$P$TEST_$$_A(%rip),%rdx
          |     movq U_$P$TEST_$$_B(%rip),%rax
          |     addq %rdx,%rax
          |     jno .Lj9
          |     call FPC_OVERFLOW
          |   .Lj9:
          |     movq %rax,U_$P$TEST_$$_C(%rip)
         | 
         | It should be easy to support in C/C++, as FPC uses the same
         | LLVM backend.
        
           | chrisseaton wrote:
           | That's bizarre - it jumps on _not_ overflow? And jumps
           | forward? But that's statically predicted not-taken - it'll
           | statically mispredict and stall the processor every time!
        
             | magicalhippo wrote:
              | My gut feeling says this is due to FreePascal supporting
              | platforms[1] that only have short conditional jumps (-128
              | to +127 bytes), like i8086[2]. The alternative on those
              | platforms would be an unconditional jump after the
              | conditional, to jump over the call to the error handler.
              | 
              | On i8086 at least a non-taken conditional jump is 4
              | cycles while an unconditional jump is 15, for a total of
              | 19 cycles, while a taken conditional jump is only 16[3].
              | So it makes sense to almost always take the conditional
              | jump.
             | 
             | [1]:
             | https://wiki.freepascal.org/Free_Pascal_supported_targets
             | 
             | [2]: https://edge.edx.org/c4x/BITSPilani/EEE231/asset/8086_
             | family... (page 2-45, or 60 in the PDF)
             | 
             | [3]: page 2-58 in[2] or 73 in the PDF
        
             | mikewarot wrote:
             | It's the way that error handling is done in Free Pascal.
             | I've never worried about the code it generates before, I
             | just care about correct results. Yeah, it could be better.
        
           | csmpltn wrote:
           | That's hardware specific, right? You're not guaranteed to
           | have a solution on every platform.
        
             | rep_lodsb wrote:
             | The only modern architecture which does not have overflow
             | detection is RISC-V.
        
               | ama5322 wrote:
               | Interesting. Is it a deliberate decision or just an
               | omission?
        
               | snvzz wrote:
               | Deliberate.
               | 
                | RISC-V architects weighed the pros and cons of having
                | a flags register, and of having overflow exceptions.
                | 
                | They concluded it is best to have neither flags
                | (conditional branches do their own testing, and no
                | flag dependencies need to be tracked, which simplifies
                | superscalar implementations) nor overflow checks (flow
                | breaking and costly; if you need the check, the cost
                | of a software check is minimal, by design).
        
               | [deleted]
        
             | mikequinlan wrote:
             | All platforms can detect overflow in one way or another;
             | for example by checking for an unexpected sign change.
        
               | chrisseaton wrote:
               | AMD64 addressing modes cannot detect overflow, and this
               | is where in practice 90% of arithmetic is done!
        
               | mikequinlan wrote:
               | I don't know specific AMD64 addressing modes, but you
               | should always be able to detect overflow in signed 2s
               | complement addition by checking to see if both addends
               | are positive and the result is negative. There is a
               | similar check for subtraction.
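 
That check can be written so the detection itself never overflows, e.g.
(a sketch generalized to both signs; __builtin_add_overflow on
GCC/Clang does the same in one call):

```cpp
#include <cstdint>

// Report whether a + b would overflow int32_t. The addition is done
// in unsigned arithmetic, which wraps instead of invoking UB.
bool add_would_overflow(int32_t a, int32_t b) {
    uint32_t sum = static_cast<uint32_t>(a) + static_cast<uint32_t>(b);
    int32_t s = static_cast<int32_t>(sum);
    // Overflow iff the addends share a sign and the result's differs.
    return (a >= 0) == (b >= 0) && (s >= 0) != (a >= 0);
}
```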
        
               | chrisseaton wrote:
               | Yeah, but the point is addressing modes fuse addition and
               | multiplication and using the result, so there's no
               | opportunity to check anything after the operations.
        
               | mikewarot wrote:
               | That can't be true. Why would a compiler use memory
               | access to do math? Math operations are register based,
               | none of the quantities are addresses.
               | 
               |  _How_ can you use memory addressing modes to do math,
               | like 987*765?
        
               | chrisseaton wrote:
               | > That can't be true.
               | 
               | Lol. See for yourself if you don't believe me.
               | 
               | https://godbolt.org/z/o14f7cdsc
               | 
               | > Why would a compiler use memory access to do math?
               | 
               | Because there are efficient instructions for it. Have you
               | written an arithmetic code generation pass? It's normal
               | to do it like this.
               | 
               | > Math operations are register based, none of the
               | quantities are addresses.
               | 
               | An address is just a number. Yeah they're in registers or
               | can be immediate. So what?
               | 
               | > How can you use memory addressing modes to do math,
               | like 987*765?
               | 
               | Because AMD64 addressing modes have a base, a scale, and
               | an offset component. That's multiplying and adding.
               | 
               | Let me know how you think you can work out if that lea
               | overflowed!
        
               | mikewarot wrote:
               | Wow, that's nuts. I'd have never expected that abuse.
               | 
               | You can use -ftrapv to check for integer overflows, like
               | this
               | 
               | https://godbolt.org/z/snbdadv8b
               | 
               | >Let me know how you think you can work out if that lea
               | overflowed!
               | 
               | LEA (Load Effective Address) can't overflow... memory is
               | supposed to wrap, the byte after the last byte is byte 0.
        
               | chrisseaton wrote:
               | Right but have you seen what that's done to your machine
               | code? That's why people don't do it normally.
               | 
               | > LEA (Load Effective Address) can't overflow...
               | 
               | You're mistaken - it's defined to overflow.
               | 
               | https://reverseengineering.stackexchange.com/questions/11
               | 442...
        
               | zorgmonkey wrote:
               | I'm not 100% sure, but I think that the behavior of LEA
               | is to wrap, but it does not set the overflow flag
        
         | zajio1am wrote:
          | Standard (signed) integers in C could be implemented to trap
          | on overflow. Overflow on a signed integer is an undefined
          | operation, like NULL dereferencing or division by zero, so a
          | trap is perfectly acceptable behavior.
         | 
         | The issue is that unsigned integers must not trap on overflow,
         | and CPUs, if they could be configured to trap on arithmetic
         | overflow, usually do not have separate opcodes for overflowing
         | arithmetic operations and trapping arithmetic operations.
        
           | wyldfire wrote:
           | I don't understand. It sounds like you're describing the
           | existing behavior for the existing types. But I'm asking for
           | new types with new behavior. The rules could be defined
           | behavior for these types, which is what I want. I don't want
           | types that happen-to-trap-because-it's-UB. I want a type that
           | traps because that's its mission. And another type that
           | saturates.
           | 
           | I understand the existing undefined behavior, that's not
           | particularly interesting. What would be interesting is well-
           | defined behavior for new types that does not interfere with
           | however defined or undefined behavior exists for the existing
           | types.
           | 
           | > usually do not have separate opcodes for overflowing
           | arithmetic operations and trapping arithmetic operations.
           | 
           | I don't care how good or bad the codegen is. Just emit the
           | code for the right behavior, because I'm using this type that
           | requires this behavior.
        
           | adrian_b wrote:
           | Most compilers, like gcc or clang, have a compile option to
           | trap on integer overflow, instead of leaving the behavior to
           | be undefined for such cases.
           | 
           | For example with gcc one should always use these compile
           | options, unless there exists an extremely serious reason to
           | do otherwise:
           | 
           | -fsanitize=address,undefined -fsanitize-undefined-trap-on-
           | error
           | 
           | These options provide correct behavior not only on overflows,
           | but also on out-of-bounds accesses and other C undefined
           | operations.
        
             | Unklejoe wrote:
             | It's fine for debug builds, but people should be aware that
             | building with the address sanitizer enabled balloons the
             | size of the executable and slows it down tremendously.
        
               | adrian_b wrote:
               | One should always measure to see if there is a noticeable
               | slowdown or not.
               | 
               | That is what I have meant by "an extremely serious reason
               | to do otherwise".
               | 
                | For many programs the speed is determined by things
                | like the throughput or latency of accessing the main
                | memory, or by I/O operations, or by interaction with a
                | user, in which case the sanitize options have
                | negligible influence on speed.
               | 
               | When necessary, they should be turned off only for
               | specific well tested functions whose performance is
               | limited by CPU computation in registers or in cache
               | memory, and they should remain on for the rest of the
               | program.
        
               | jeffbee wrote:
                | ASAN can easily slow your program by 100x or worse.
                | Your bias should always be that its impact would not be
                | acceptable in production. If you were willing to accept
                | the cost of ASAN you should have simply chosen a
                | slower, safer language in the first place.
        
         | klysm wrote:
         | Totally agree. Overflowing silently is almost never what you
         | want. Sometimes it's useful but that's the rare case.
        
         | beardyw wrote:
         | > trap on overflow
         | 
         | What do you see happening in that case?
        
           | wyldfire wrote:
            | The program terminates - depending on context, I suppose.
            | I don't know what all toolchains/backends do, but some
            | will generate an unaligned load or some other
            | architecturally illegal instruction (__builtin_trap). Or
            | you could trap to some handler.
        
             | colejohnson66 wrote:
             | x86 has had the `INTO` (interrupt on overflow) instruction
             | (just a one byte form of `INT 4`) for decades. After an
             | `ADD` or other arithmetic operation, you would `INTO` to
             | trap it. However, because compilers love exploiting
             | undefined behavior for speedups, they don't emit that
             | instruction, so it's actually fairly slow on modern
             | processors.
        
               | wyldfire wrote:
               | Right, well a new type would give them that flexibility
               | because I can opt in to the trap or saturate.
        
               | josefx wrote:
               | I am not sure if I remember it correctly, but wasn't the
               | problem that you had to explicitly call INTO after every
               | operation? You didn't have a global flag to set or a flag
               | bit in the add instruction, you had to explicitly emit
               | the check. So using it carried a significant performance
               | penalty over leaving the operation unchecked.
               | 
               | > However, because compilers love exploiting undefined
               | behavior for speedups
               | 
               | Except if the behavior had been well defined any attempt
               | to catch overflow would violate the standard - see
               | unsigned integers for reference.
               | 
               | > so it's actually fairly slow on modern processors.
               | 
               | Are you sure that having a potential branch every second
               | operation, that depends directly on the result of the
               | preceding operation, has at any point in time not been
               | horribly slow?
        
               | colejohnson66 wrote:
               | > I am not sure if I remember it correctly, but wasn't
               | the problem that you had to explicitly call INTO after
               | every operation? You didn't have a global flag to set or
               | a flag bit in the add instruction, you had to explicitly
               | emit the check. So using it carried a significant
               | performance penalty over leaving the operation unchecked.
               | 
               | Correct.
               | 
               | > Except if the behavior had been well defined any
               | attempt to catch overflow would violate the standard -
               | see unsigned integers for reference.
               | 
               | Then the compiler could emit them only on signed
               | operations, but they don't.
               | 
               | > Are you sure that having a potential branch every
               | second operation, that depends directly on the result of
               | the preceding operation, has at any point in time not
               | been horribly slow?
               | 
               | Yes; it's been shown that `INTO` is slower than `JO
               | raiseOverflowError` because, by the nature of `INTO`
               | being an interrupt, it doesn't get the advantage of
               | branch prediction, but `JO` would.
               | 
               | On an older processor without a pipeline, sure, it'd be
               | just as slow as an equivalent check, but with how massive
               | the pipelines are on today's processors, it's bad.
        
               | amluto wrote:
               | > Yes; it's been shown that `INTO` is slower than `JO
               | raiseOverflowError` because, by the nature of `INTO`
               | being an interrupt, it doesn't get the advantage of
               | branch prediction, but `JO` would.
               | 
               | A CPU could predict INTO -- simply predicting it as not
               | taken would be a good guess. This would require
               | designing, implementing, verifying, and maintaining, and
               | it might add some area to important parts of the CPU. The
               | vendors haven't seen it as a priority.
               | 
               | (AIUI at least older CPUs couldn't predict branches in
               | microcode, and INTO is presumably fully microcoded, so it
               | doesn't get predicted.)
        
         | ljosifov wrote:
         | +1. Would be great to have integer NAN, INF and -INF supported
         | in h/w.
        
           | icsa wrote:
           | The k language has implemented this capability (in software)
           | for decades.
           | 
           | https://code.kx.com/q/ref/#datatypes
        
         | phkahler wrote:
         | Saturating arithmetic is available on some DSPs and may be
         | available to C programs via #pragma on those architectures.
          | IMHO this is a burden to every ALU and instruction that does
          | _not_ need the feature. We should not push application
          | specific features like that down to the basic hardware. We'd
          | also need to provide language support for it.
        
           | brandmeyer wrote:
            | Saturating arithmetic is probably not all that expensive,
            | since it's been in every ARM SIMD unit (and sometimes also
            | scalar) for many years. You just pay a little bit in
            | latency when using those instructions.
        
       | 0xblood wrote:
        | I don't really understand it, wouldn't this just trade the
        | checks and corner cases you have to pay attention to for a
        | different set of corner cases? If I have an integer that can
        | approach INT_MIN, I know I have to pay attention to overflows
        | when using it, and this proposal helps with at most some
        | fraction of the problems that can arise. And what's the great
        | benefit, that it gets a bit more "elegant"? If you want to
        | write "safe" software and have the compiler enforce it, go use
        | Rust/Ada/Spark instead.
        
       | wyager wrote:
       | If you find yourself wanting any of this, just use a language
       | with algebraic data types and save everyone (especially yourself)
       | some pain.
        
       | jmull wrote:
       | The argument starts here:
       | 
       | > I really don't like this asymmetry - it leads to annoying edge
       | cases in all sort of integer APIs.
       | 
       | And gets to here:
       | 
       | > Third, you're getting an unused bit pattern 0b1'0000000, the
       | old INT_MIN, which you can interpret however you like.
       | 
       | This isn't solving the original problem. If you don't like
       | annoying edge cases in all sorts of integer APIs, it's worse.
       | 
       | edit: I can add: the article goes on to suggest that "old
       | INT_MIN" should be called INT_NAN and "let's just say arithmetic
       | on INT_NAN is undefined behavior". So now what's the result of a
       | + b? Undefined behavior! a * b? Undefined behavior! The general
       | result of any function taking an integer or returning one is
       | undefined behavior!
        
         | turminal wrote:
         | Arithmetic on INT_MIN is already undefined behaviour.
        
         | kzrdude wrote:
         | In C, with int a, b then a + b is already potential UB. Just
         | depends on the values of the variables. It is remarkable to
         | note this... but it's not new! :)
        
       | mlatu wrote:
       | The symmetric signed int with INT_NAN I would like to have
       | too.
       | 
       | But with the uint63_t I don't quite see the point, though I
       | admit it would be neat having an unsigned int that can't
       | overflow.
        
       | e63f67dd-065b wrote:
       | I'm far more partial to the idea that NaN was a mistake than a
       | desired feature. Including it then making operations that involve
       | it UB is a road to hell and sadness, far worse than abs(INT_MIN).
       | 
       | The idea to separate bit vectors from numbers is a good one
       | though, it always seemed a little weird to me that we're passing
       | around numeric types for FLAG_FOO & FLAG_BAR.
        
         | jltsiren wrote:
         | If you are doing bit manipulation that's more complicated than
         | vectors of boolean flags, you are likely going to use both
         | bitwise and arithmetic operations on the same data. If you need
         | separate types for those operations, you will need many
         | explicit casts in your code. That is usually a sign that
         | something is wrong.
         | 
         | In low-level code, integer types are not numbers. They are
         | fixed-length bit fields with a numerical interpretation. One
         | thing I like in Rust over C/C++ is that it makes this clear.
         | You always specify the size and signedness of the integer
         | explicitly, and the standard library provides many useful
         | operations beyond the operators.
        
         | adrian_b wrote:
         | Already in 1964 the language PL/I (still named NPL in 1964) had
         | "bit strings" as a distinct type from integer numbers.
         | 
         | Also "character strings" were a distinct type.
         | 
         | I also believe that these types must be clearly distinguished,
         | even if conversion functions must be provided between them, and
         | even if in certain contexts it may be convenient for the
         | conversions to be implicit.
        
           | gumby wrote:
           | > Already in 1964 the language PL/I (still named NPL in 1964)
           | had "bit strings" as a distinct type from integer numbers.
           | 
           | That's because lots of hardware back then had support for bit
           | addressing. The machine I love the best, the PDP-6/PDP-10 had
           | addressable bytes of width ranging from 1-36 bits, which were
           | extremely handy.
           | 
           | The few C compilers for those machines didn't support them of
           | course because the PDP-11s didn't support them, but C on the
           | PDP-10 was only used for porting stuff anyway.
           | 
           | Honestly I'm surprised they haven't been revived. Packed
           | bitfields are _really_ handy and should be fast.
        
             | kevin_thibedeau wrote:
             | > I'm surprised they haven't been revived.
             | 
             | ARM has bit addressing with their bit-banding feature where
             | individual bits are mapped onto byte addresses.
        
         | codeflo wrote:
         | I mean, strongly typed flag values are a solved problem. It's
         | not rocket science, just bothersome to do in C++.
        
       | pfdietz wrote:
       | Arbitrary length integers. That is, integers that actually act
       | like integers.
        
         | tialaramex wrote:
         | I guess this could arguably make sense as a vocabulary type
         | (provided in the standard library so that everybody agrees on
         | it, not for performance reasons).
         | 
         | Right now though you will probably get such a type for free
         | with a library of common maths features you'd likely want to
         | use with it. So the only question is whether there are several
         | different libraries and people would prefer to mix and match.
        
           | pfdietz wrote:
           | One can get reasonable performance from them if the language
           | (and ABI) is designed for it. Many implementations of lisp
           | family languages, for example. Most computations will be on
           | fixnums, with a fallback to bignums as required. The fixnums
           | are unboxed.
           | 
           | It's nice to require analysis for maximum performance rather
           | than for correctness, since maximum performance usually
           | doesn't matter but correctness usually does.
        
         | nemo1618 wrote:
         | This is being considered for Go 2:
         | https://github.com/golang/go/issues/19623
         | 
         | Personally I'm wary of it: it's not hard to think of
         | adversarial scenarios where this could lead to OOM. On the
         | other hand, adversarially overflowing an int can cause plenty
         | of havoc too...
        
         | dahfizz wrote:
         | As long as you don't want fast code, sure.
        
         | Zamiel_Snawley wrote:
         | I agree, this could be a good solution for a lot of situations.
         | People fret about the performance impact, but I'm confident
         | that clever engineers could come up with a way to treat numbers
         | within the typical range (say, signed 64-bit) just as they
         | are
         | today, but if an operation would go out of bounds do the
         | expensive, arbitrary length operation. Emitting some kind of
         | extra warning when those expensive operations are actually done
         | would also be nice.
        
         | [deleted]
        
         | chrisshroba wrote:
         | I like this feature of python, but it's wise to remember that
         | operations that we typically think of as O(1) are actually
         | usually dependent on the length of the input, so allowing
         | arbitrary length integers can cause issues (such as DOS
         | opportunities) in places some people don't think to look.
         | Specifically, Python recently changed the str -> int conversion
         | logic to error on strings beyond a certain length (e.g.
         | `int('1'*4301)` will error.) [1]
         | 
         | [1] https://news.ycombinator.com/item?id=32753235
        
           | astrange wrote:
           | They're also much more expensive in memory, because you have
           | to separately allocate anything you don't know the size of.
           | 
           | Something being O(N) can also be a security issue since it
           | introduces a timing side channel.
           | 
           | I don't think I've ever needed a bignum in my life or even a
           | 64-bit integer (excluding pointers and file sizes). Of course
           | I've used them inside black box crypto libraries but they
           | have to be careful with them because of said security issues.
        
             | pfdietz wrote:
             | As implemented in lisps they typically don't use more
              | memory than 64-bit longs. That's because fixnums (typically
              | 61-bit signed values) are unboxed, existing in the
             | "pointer" itself; only when numbers fall outside that range
             | do they become bignums that must be heap allocated (and in
             | practice almost all integers are fixnums.)
        
       | tialaramex wrote:
       | In the first part (balanced signed integers) I had sort of
       | expected this to eventually say "Oh, this is basically how Rust's
       | niches work, huh" but it doesn't. This is especially weird
       | because Jonathan explicitly calls out the "new languages" like
       | Carbon and whatever Herb will call his thing.
       | 
       | Rust's Option<NonZeroU32> is _guaranteed_ to be the exact same
       | size as u32 (4 bytes) and the standard library could do the same
       | thing with a BalancedI32 type if it wanted to.
        
         | TakeBlaster16 wrote:
         | I was expecting that too. It seems strictly more expressive,
         | since once you have the niche, you can do anything you like
         | with it.
         | 
         | And INT_MIN is a better place for it than zero imo. If Rust had
         | a `BalancedI32` I would reach for it a lot more than I use the
         | NonZero types. In my code at least, I've found zero is a pretty
         | useful number.
        
           | tialaramex wrote:
           | AIUI The sticky problem is that the Rust compiler today wants
           | to see a single contiguous range of scalar values to enable
           | this trick and from its perspective e.g. BalancedI8 would
           | have _two_ ranges, 0 to 127 inclusive and then 129 to 255
            | inclusive; it doesn't see the signed interpretation when
           | thinking about this problem.
           | 
           | That's clearly not impossible to solve, but if I'm correct it
           | means significant compiler development work rather than just
           | a fun weekend chore writing and testing a custom type
           | intended to work only in the standard library.
        
             | codeflo wrote:
             | It's obviously 129 to 127 inclusive, hoping that the
             | implied unsigned overflow has the correct consequences. ;)
        
       | liminal wrote:
       | Completely agree that moving bit operations to library calls and
       | freeing up their syntax characters makes a lot of sense. Also
       | seems impossible!
        
         | dahfizz wrote:
         | Do you feel the same about +-/*% ?
         | 
         | I think people say this because they rarely use bitwise
         | operators, and so are scared of them.
        
           | liminal wrote:
           | I'm not scared of them, but very rarely use them and believe
           | that is generally true for others. Basic arithmetic operators
           | that operate on numbers (as opposed to bitwise) are used much
           | more frequently, so I feel very differently about them.
        
           | samatman wrote:
           | I don't, no.
           | 
           | LuaJIT uses a bit library, while Lua 5.3 and up have the
           | native bitwise operators. I prefer the former.
        
       | gwbas1c wrote:
       | > Distinct bit vectors vs integer type
       | 
       | This. I always find bitwise operations an unneeded exercise in
       | mental gymnastics.
       | 
       | On the other hand, I wonder if a lot of the other quirks about
       | numbers could be handled via a library?
        
         | klysm wrote:
         | > unneeded exercise in mental gymnastics
         | 
         | Great characterization! I've always been confused why most
         | languages don't have better tools for bit vectors. They're
          | incredibly common, and it's really confusing to have to use
          | the
         | underlying representation of numbers to do anything with them.
        
         | morelisp wrote:
         | It is solved by the standard library.
         | 
         | https://en.cppreference.com/w/cpp/utility/bitset
        
           | Scaless wrote:
           | It is not "solved" at all. std::bitset has terrible
           | properties and an awful interface.
           | 
           | 1. No dynamic resize, have to know size at compile time or
           | allocate based on max expectations. And yes,
           | std::vector<bool> sucks too.
           | 
            | 2. Despite being statically sized, several classes of bugs
            | are not prevented at compile time. For example:
            | 
            |     std::bitset<4> fail1{"10001"};
            |     // silently produces 0b1000; no warning or exception
            | 
            |     std::bitset<10> fail2;
            |     fail2.set(11);
            |     // throws std::out_of_range at runtime; why is this
            |     // not a static_assert?
           | 
           | 3. Size is implementation defined. std::bitset<1> can use up
           | to 8 bytes depending on compiler/platform.
           | 
           | 4. Debug performance is 20x slower than Release. In many
           | cases you are going from what would be a single assembly
           | instruction to multiple function calls.
           | 
           | 5. Limited options for accessing underlying storage
           | efficiently (for serialization, etc). to_ullong() will work
           | up to 64 bits, but beyond that it will throw exceptions.
           | 
           | 6. Uses exceptions. This is still a deal breaker for many.
           | 
           | 7. Cannot toggle a range of bits at once. It's either one or
           | all.
        
             | [deleted]
        
           | gwbas1c wrote:
           | Uhm, you know that there are other languages than c++?
        
             | morelisp wrote:
             | Sure, the point is that it doesn't take language machinery
             | to do it. And as the other post demonstrates, trying to
              | make a standard one will mostly just get people pissed that
              | you
             | made different tradeoffs.
             | 
             | (But actually a ton of other stdlibs do have one too.)
        
       | churnedgodotdev wrote:
       | Ada solved these problems in 1983.
       | 
       | More recently, you can use Frama-C to constrain allowable
       | sequences of 0's and 1's for C types and formally verify
       | correctness.
       | 
       | In Ada since 1983 you can, e.g., declare your own 8-bit signed
       | symmetric type without the -128 wart like so:
       | 
       |     type Sym_8 is new Integer range -127 .. 127;
       | 
       | Then this fails at compile time:
       | 
       |     My_Signed_Byte : Sym_8 := -128;
       | 
       | SPARK can prove all execution paths through your program are free
       | of such constraint violations. This means safe SPARK code can
       | disable runtime checks and run faster than the safest Rust/Zig
       | dev settings, which insert runtime checks for over- and
       | underflow.
       | 
       | In Frama-C, say you want a function that returns an absolute
       | value. This function will fail to verify:
       | 
       |     /*@ ensures (x >= 0 ==> \result == x) &&
       |                 (x < 0 ==> \result == -x);
       |         assigns \nothing;  */
       |     int abs (int x) {
       |         if (x >= 0)
       |             return x;
       |         return -x;
       |     }
       | 
       | It fails to verify because you might have x == INT_MIN. So this
       | will verify:
       | 
       |     #include <limits.h>
       | 
       |     /*@ requires x > INT_MIN;
       |         ensures (x >= 0 ==> \result == x) &&
       |                 (x < 0 ==> \result == -x);
       |         assigns \nothing;  */
       |     int abs (int x) {
       |         if (x >= 0)
       |             return x;
       |         return -x;
       |     }
        
       ___________________________________________________________________
       (page generated 2022-09-30 23:01 UTC)