[HN Gopher] New integer types I'd like to see
___________________________________________________________________
New integer types I'd like to see
Author : ibobev
Score : 89 points
Date : 2022-09-30 09:30 UTC (13 hours ago)
(HTM) web link (www.foonathan.net)
(TXT) w3m dump (www.foonathan.net)
| dheera wrote:
| > So here's my wish: a signed integer where INT_MIN == -INT_MAX
|
| Just set INT_MIN to -127 and never use -128.
|
| > Second, you're getting symmetry back. All the operations
| mentioned above are now symmetric and can't overflow. This makes
| them a lot easier to reason about.
|
| But you seem to want the actual bits to be symmetric with the
| sign bit just being a literal + or -. This makes it easier for
| _humans_ to reason with, but it does NOT make it easier for a
| CPU's adder to add -15 and +63, which in the current standard
| signed int implementation does not even require the CPU adder to
| know that it's a signed int.
|
| Unsigned and signed int adding work exactly the same bit-wise,
| and can use the same hardwired circuits to process at blazing
| speed. That's the beauty of the current standard implementation.
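|
| A quick illustration of that point (a small sketch of my own,
| not from the article): reinterpret the same bit patterns as
| signed or unsigned, add, and you get the same result bits.
      #include <cstdint>
      #include <cstdio>

      int main() {
          int8_t  a = -15, b = 63;
          // Same bit patterns, viewed as unsigned (0xF1 and 0x3F).
          uint8_t ua = static_cast<uint8_t>(a);
          uint8_t ub = static_cast<uint8_t>(b);

          // In two's complement the adder produces identical bits either way.
          uint8_t signed_sum   = static_cast<uint8_t>(static_cast<int8_t>(a + b));
          uint8_t unsigned_sum = static_cast<uint8_t>(ua + ub);
          std::printf("%02x %02x\n", signed_sum, unsigned_sum); // prints "30 30"
      }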
| 0xblood wrote:
| If I want to do bitwise operations on an integer, and I want to
| set/mask the MSB (maybe it's just a bitfield for me), I mask with
| INT_NAN? That also sounds very weird
| codeflo wrote:
| > Instead, let's just say arithmetic on INT_NAN is undefined
| behavior; sanitizers can then insert assertions in debug mode.
|
| I mean, if using that value is supposed to be UB anyway, then the
| current behavior of standard ints conforms to the spec. Doesn't
| that make the proposal largely pointless?
| sharikous wrote:
| From a computational point of view it would be nice to see,
| actually, a type that has no zero and is symmetric around it:
| ... -3/2, -1/2, 1/2, 3/2 ... It would be trivial to implement
| efficiently in hardware (as opposed to the checks required for
| INT_NAN or INT_MIN) and it would make sense in some cases (like
| some kinds of discretizations).
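|
| For instance, one cheap software encoding (my own sketch, not
| something from the article) stores an integer k and reads it as
| the value k + 1/2; negation then becomes a plain bitwise NOT,
| with no overflow case at all:
      #include <cstdint>

      // The stored integer k represents k + 1/2, so the representable values
      // are ..., -3/2, -1/2, 1/2, 3/2, ...; there is no zero and the range is
      // symmetric.
      struct half_int {
          int64_t k; // value represented: k + 0.5

          double as_double() const { return static_cast<double>(k) + 0.5; }

          // -(k + 1/2) = (-k - 1) + 1/2 = (~k) + 1/2, so negation is bitwise NOT.
          half_int operator-() const { return half_int{~k}; }
      };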
|
| From a general programming viewpoint pervasive bigints, ints with
| NaN/+Inf/-Inf and bitfields would be interesting too, but I don't
| know if they are worth the complexity they introduce.
| helloooooooo wrote:
| Ah so they want to replace undefined behaviour with a different
| kind of undefined behaviour
| klysm wrote:
| I'm not sure this is a particularly useful critique. I learned
| a good amount from this post.
| shultays wrote:
| Any reason for not adding these as a library instead of expecting
| them from the language? C++ gives you all the tools for that.
| pornel wrote:
| LLVM already supports types like i31 and i7, but without
| language support it may be difficult to _reliably_ convince the
| optimizer that's what you mean.
| manwe150 wrote:
| Hardware support. It is trivial to write libraries that can
| emulate these in many languages, but the performance drop is
| going to be pretty sharp if you are coding your basic
| arithmetic by hand. Unless you are writing code for an FPGA...
| jeffffff wrote:
| the one about unsigned integers with one bit missing would be
| trivial to implement as a library in C++ with no significant
| downside. all you have to do is make a class wrapping a
| signed int and put debug checks for whether the high bit is set
| behind an if constexpr in operator= and the copy constructor.
| in most other languages this would bring a big performance
| penalty, but this is one thing that C++ is actually very good
| at.
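|
| something like this rough sketch (the class name and debug-flag
| plumbing are my own, not anything standard):
      #include <cassert>
      #include <cstdint>

      inline constexpr bool kDebugChecks =
      #ifdef NDEBUG
          false;
      #else
          true;
      #endif

      // "uint63": an unsigned value stored in a signed 64-bit integer.
      // in debug builds we assert that it never goes negative.
      class uint63 {
      public:
          constexpr uint63(int64_t v = 0) : value_(v) { check(); }
          constexpr uint63(const uint63& other) : value_(other.value_) { check(); }
          constexpr uint63& operator=(int64_t v) { value_ = v; check(); return *this; }
          constexpr int64_t get() const { return value_; }

      private:
          constexpr void check() const {
              if constexpr (kDebugChecks)
                  assert(value_ >= 0 && "uint63 must never hold a negative value");
          }
          int64_t value_;
      };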
| eklitzke wrote:
| This is incorrect. For one thing, you're going to need to
| overload all the operators that mutate the integer in
| place, e.g. operator++, operator +=, operator -=, operator
| *=, shifts, etc. And those checks can be quite expensive.
| For example, your code is probably littered with for loops
| that are implemented using operator++ or operator+=, and
| that means on every loop iteration you need to check for
| overflow, which is expensive if the loop body is simple.
| GCC and Clang already implement -ftrapv which does
| something similar (it adds compiler checks at all places
| where signed integers can overflow and trap if overflow
| occurs). I've used -ftrapv in debug builds but for most
| programs you don't want it in non-debug builds because the
| overhead is too high.
| shultays wrote:
| You could use the same hardware support in your library
| couldn't you? Compile flags to check your target hardware and
| enable/disable code if you have the support
| amluto wrote:
| I would much rather see most of this be generalized to a
| parametrized bounded integer type, e.g. int<lowerbound,
| upperbound>.
| jeffffff wrote:
| i think you could actually do that in C++ as a library with a
| bit of metaprogramming. the arithmetic operations will get
| weird though unless everything is the same type, for example
| what is the result type of int<-10, 10>+int<0, 20>? (the
| "right" answer is probably a megabyte of compiler errors)
| evouga wrote:
| Surely just `int`?
| IAmLiterallyAB wrote:
| Exactly. In fact, I would go so far as to make this the _only_
| integer type. u16, u32 etc. can just be typedefs.
|
| The ability to be precise with integer ranges could prevent
| many types of arithmetic and OOB errors.
| tialaramex wrote:
| WUFFS types can be refined as well as having a native
| representation so e.g.
|
base.u8[0 ..= 99] is a single byte with values from 0 to 99 inclusive
|
base.u16[0 ..= 99] is a 16-bit type with the same values
|
base.u16[100 ..= 400] is still 16 bits but values from 100 to 400 inclusive
|
| In WUFFS all arithmetic overflow or array bounds misses are
| compile time errors. From the point of view of WUFFS if
| pixels is an array with 200 elements, and idx is a
| base.u8[0 ..= 240] then pixels[idx] is a type error as it
| wouldn't make sense to index 240 into an array of 200
| elements so that doesn't compile.
|
| As well as the obvious safety implication, this also means
| WUFFS can in principle go just as fast as if you were
| inhumanly careful with a conventional low level language and
| skipped any safety checks, since it knows it already took
| care of that. In practice it transpiles to C (which doesn't
| know what safety checks are).
| astrange wrote:
| This would be a big improvement over how a lot of languages
| handle types - namely, it'd let you stop calling types "i16" or
| "i32" when those are really just storage sizes.
|
| "Safe" languages like Java try to achieve safety by making you
| write boilerplate casts here - "short = int + int" is illegal
| without a (short) cast even though that doesn't change the
| meaning of the program.
|
| If it were "x = i32<0,3> + i32<0,3>" then it would highlight how
| silly this is and maybe they wouldn't make you write an
| (i16<0,6>) cast.
| MarkusWandel wrote:
| This is probably not what you're looking for because it might not
| be high performance. But in hardware synthesis/modeling, we use
| the so-called "AC data types", and I just googled that and found
| it on GitHub.
|
| https://github.com/hlslibs/ac_types
|
| Arbitrary bit length, symmetrical and unsymmetrical, rounding and
| wraparound behaviour specifiable etc. etc.
| CalChris wrote:
| Maybe this is just too prosaic, but I'd like 64-bit unsigned
| integers in Java. I'd like Unums, which are not integers,
| especially the low precision variants.
| Viliam1234 wrote:
| What is the purpose of having a 64-bit unsigned integer? Could
| you tell me an example of a use case where 9x10^18 is not
| enough (so you cannot use a signed 64-bit integer), but
| 18x10^18 will definitely be enough (so you do not need
| BigInteger)?
| [deleted]
| [deleted]
| IYasha wrote:
| Sorry, bro. Even if we have time machines, we're in the bad
| timeline.
| torstenvl wrote:
| Congratulations on inventing slower one's complement.
|
| > _I'd like to have . . . a 63 bit unsigned integer. . . .
| However, it is undefined behavior if it ever stores a negative
| value._
|
| No. There is no "debug mode" for undefined behavior. Undefined
| behavior is always undefined behavior.
|
| It also doesn't "automatically come[] with a precondition that it
| cannot be negative" - only that _if_ it's ever negative, your
| program might crash for reasons that cannot be found in your
| code.
|
| If you really want this, use uint64_t and treat any value over
| 0x7fffffffffffffff as an error.
|
| Curious about the drive-by downvoters: Can you point out where
| you think I'm wrong?
| Kranar wrote:
| > Undefined behavior is always undefined behavior.
|
| This is not true; the standard explicitly allows
| implementations to, and I quote directly from the standard,
| "behave during translation or program execution in a documented
| manner characteristic of the environment".
|
| This idea that undefined behavior is always undefined behavior
| and there's no way to reason about it is purely academic,
| incorrect, and not in any way justified either by source
| material or in practice.
|
| As for debugging undefined behavior, UBSan [1] is an excellent
| tool that is well supported by GCC and clang, and MSVC is
| working to add support for it as well.
|
| [1] https://clang.llvm.org/docs/UndefinedBehaviorSanitizer.html
| erk__ wrote:
| 63-bit integers are actually used in OCaml, where the spare bit
| is used to figure out whether a value is an integer or a pointer
|
| https://blog.janestreet.com/what-is-gained-and-lost-with-63-...
| matheusmoreira wrote:
| Also used in Ruby virtual machines and no doubt many others.
| [deleted]
| jeffffff wrote:
| torstenvl is correct. UB is UB whether you are in debug mode or
| release mode. making it UB for it to store a negative value
| doesn't make sense. you could put debug checks for it behind an
| if constexpr or a macro but please C++ has enough UB already,
| don't add more. disappointing to see that this is downvoted.
| Sniffnoy wrote:
| Interestingly, in Ethereum smart contract languages (e.g.
| Solidity and Vyper), having separate types for bit vectors vs
| integers is common. I don't know why it's caught on there but
| nowhere else. Obviously those languages have a pretty specialized
| niche and aren't really going to be used outside of that context.
| zelphirkalt wrote:
| Does this really help? Sometimes being able to treat integers
| as bitvectors and vice versa is something desirable. I think
| one needs appropriate functions, which treat integers or
| bitvectors as such, but not necessarily a split in types, which
| would possibly require all kinds of type conversions.
| wyldfire wrote:
| I'd like to see integer types for C/C++ that are able to saturate
| or trap on overflow - or both types. I'm pretty sure it's even
| been proposed as a C standard but was rejected or postponed.
|
| It seems like a really obvious feature that would be easy to
| support, but isn't.
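|
| In the meantime you can approximate the saturating flavor with
| compiler builtins (a sketch using GCC/Clang's
| __builtin_add_overflow; the wrapper name is mine):
      #include <cstdint>
      #include <limits>

      // Saturating signed 32-bit add: clamp to INT32_MIN/INT32_MAX instead of wrapping.
      int32_t sat_add(int32_t a, int32_t b) {
          int32_t result;
          if (__builtin_add_overflow(a, b, &result)) {
              // Overflow only happens when both operands have the same sign,
              // so saturate in that direction.
              return b > 0 ? std::numeric_limits<int32_t>::max()
                           : std::numeric_limits<int32_t>::min();
          }
          return result;
      }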
| Symmetry wrote:
| Likewise. That's what I was expecting going in rather than
| Lovecraftian monstrosities like INT_NAN. Plus they're actually
| something that's supported in some existing ISAs.
| [deleted]
| mikewarot wrote:
| Free Pascal (fpc) supports trapping on overflow: it uses the
| overflow flag and tests it after integer operations; it's
| pretty basic. Here's the assembly output of an Int64 add:
        # [16] c := a + b;
        movq    U_$P$TEST_$$_A(%rip),%rdx
        movq    U_$P$TEST_$$_B(%rip),%rax
        addq    %rdx,%rax
        jno     .Lj9
        call    FPC_OVERFLOW
.Lj9:
        movq    %rax,U_$P$TEST_$$_C(%rip)
|
| It should be easy to support in C/C++, as FPC uses the same
| LLVM backend.
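|
| With GCC or Clang you can get the same shape today from builtins
| (a sketch of mine, not FPC output):
      #include <cstdint>

      // Trapping signed 64-bit add: add, then take an error path if it
      // overflowed -- roughly what the FPC code above does.
      int64_t checked_add(int64_t a, int64_t b) {
          int64_t result;
          if (__builtin_add_overflow(a, b, &result))
              __builtin_trap(); // analogous to the call to FPC_OVERFLOW
          return result;
      }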
| chrisseaton wrote:
| That's bizarre - it jumps on _not_ overflow? And jumps
| forward? But that's statically predicted not-taken - it'll
| statically mispredict and stall the processor every time!
| magicalhippo wrote:
| My gut feeling says this is due to FreePascal supporting
| platforms[1] that only have short conditional jumps (-128 to
| +127 bytes), like i8086[2]. The alternative on those
| platforms would be an unconditional jump after the
| conditional, to jump over the call to the error handler.
|
| On i8086 at least, a non-taken conditional jump is 4 cycles
| while an unconditional jump is 15, for a total of 19 cycles,
| while a taken conditional jump is only 16. So it makes
| sense to almost always take the conditional jump.
|
| [1]:
| https://wiki.freepascal.org/Free_Pascal_supported_targets
|
| [2]: https://edge.edx.org/c4x/BITSPilani/EEE231/asset/8086_
| family... (page 2-45, or 60 in the PDF)
|
| [3]: page 2-58 in[2] or 73 in the PDF
| mikewarot wrote:
| It's the way that error handling is done in Free Pascal.
| I've never worried about the code it generates before, I
| just care about correct results. Yeah, it could be better.
| csmpltn wrote:
| That's hardware specific, right? You're not guaranteed to
| have a solution on every platform.
| rep_lodsb wrote:
| The only modern architecture which does not have overflow
| detection is RISC-V.
| ama5322 wrote:
| Interesting. Is it a deliberate decision or just an
| omission?
| snvzz wrote:
| Deliberate.
|
| RISC-V architects weighed the pros and cons of having a
| flags register, and the pros and cons of having overflow
| exceptions.
|
| They concluded it is best to have neither flags
| (conditional branches do their own testing, and no flag
| dependencies need to be tracked, which simplifies
| superscalar implementations) nor overflow checks (flow-
| breaking and costly; if you need the check, the cost of a
| software check is minimal, by design).
| [deleted]
| mikequinlan wrote:
| All platforms can detect overflow in one way or another;
| for example by checking for an unexpected sign change.
| chrisseaton wrote:
| AMD64 addressing modes cannot detect overflow, and this
| is where in practice 90% of arithmetic is done!
| mikequinlan wrote:
| I don't know specific AMD64 addressing modes, but you
| should always be able to detect overflow in signed 2s
| complement addition by checking to see if both addends
| are positive and the result is negative. There is a
| similar check for subtraction.
| chrisseaton wrote:
| Yeah, but the point is addressing modes fuse addition and
| multiplication and using the result, so there's no
| opportunity to check anything after the operations.
| mikewarot wrote:
| That can't be true. Why would a compiler use memory
| access to do math? Math operations are register based,
| none of the quantities are addresses.
|
| _How_ can you use memory addressing modes to do math,
| like 987*765?
| chrisseaton wrote:
| > That can't be true.
|
| Lol. See for yourself if you don't believe me.
|
| https://godbolt.org/z/o14f7cdsc
|
| > Why would a compiler use memory access to do math?
|
| Because there are efficient instructions for it. Have you
| written an arithmetic code generation pass? It's normal
| to do it like this.
|
| > Math operations are register based, none of the
| quantities are addresses.
|
| An address is just a number. Yeah they're in registers or
| can be immediate. So what?
|
| > How can you use memory addressing modes to do math,
| like 987*765?
|
| Because AMD64 addressing modes have a base, a scale, and
| an offset component. That's multiplying and adding.
|
| Let me know how you think you can work out if that lea
| overflowed!
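|
| For a concrete example (mine, not from the godbolt link above),
| something like this commonly compiles to a single lea on x86-64,
| with no flags written and nowhere to test for overflow:
      #include <cstdint>

      // Typically becomes:  lea rax, [rdi + rsi*8 + 40]
      // i.e. the multiply and both additions happen in the addressing mode.
      int64_t index_math(int64_t base, int64_t i) {
          return base + i * 8 + 40;
      }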
| mikewarot wrote:
| Wow, that's nuts. I'd have never expected that abuse.
|
| You can use -ftrapv to check for integer overflows, like
| this
|
| https://godbolt.org/z/snbdadv8b
|
| >Let me know how you think you can work out if that lea
| overflowed!
|
| LEA (Load Effective Address) can't overflow... memory is
| supposed to wrap, the byte after the last byte is byte 0.
| chrisseaton wrote:
| Right but have you seen what that's done to your machine
| code? That's why people don't do it normally.
|
| > LEA (Load Effective Address) can't overflow...
|
| You're mistaken - it's defined to overflow.
|
| https://reverseengineering.stackexchange.com/questions/11
| 442...
| zorgmonkey wrote:
| I'm not 100% sure, but I think that the behavior of LEA
| is to wrap, but it does not set the overflow flag
| zajio1am wrote:
| Standard (signed) integers in C could be implemented to trap on
| overflow. Overflow on a signed integer is an undefined operation,
| like NULL dereferencing or division by zero, so a trap is
| perfectly acceptable behavior.
|
| The issue is that unsigned integers must not trap on overflow,
| and CPUs, if they could be configured to trap on arithmetic
| overflow, usually do not have separate opcodes for overflowing
| arithmetic operations and trapping arithmetic operations.
| wyldfire wrote:
| I don't understand. It sounds like you're describing the
| existing behavior for the existing types. But I'm asking for
| new types with new behavior. The rules could be defined
| behavior for these types, which is what I want. I don't want
| types that happen-to-trap-because-it's-UB. I want a type that
| traps because that's its mission. And another type that
| saturates.
|
| I understand the existing undefined behavior, that's not
| particularly interesting. What would be interesting is well-
| defined behavior for new types that does not interfere with
| however defined or undefined behavior exists for the existing
| types.
|
| > usually do not have separate opcodes for overflowing
| arithmetic operations and trapping arithmetic operations.
|
| I don't care how good or bad the codegen is. Just emit the
| code for the right behavior, because I'm using this type that
| requires this behavior.
| adrian_b wrote:
| Most compilers, like gcc or clang, have a compile option to
| trap on integer overflow, instead of leaving the behavior to
| be undefined for such cases.
|
| For example with gcc one should always use these compile
| options, unless there exists an extremely serious reason to
| do otherwise:
|
| -fsanitize=address,undefined -fsanitize-undefined-trap-on-error
|
| These options provide correct behavior not only on overflows,
| but also on out-of-bounds accesses and other C undefined
| operations.
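|
| As a toy example (mine), with those flags a signed overflow
| aborts the program at the faulting add instead of silently
| wrapping:
      // overflow.cpp -- build with:
      //   g++ -O2 -fsanitize=address,undefined -fsanitize-undefined-trap-on-error overflow.cpp
      #include <climits>
      #include <cstdio>

      int main(int argc, char**) {
          int x = INT_MAX;
          int y = x + argc; // argc >= 1, so this signed add overflows and traps
          std::printf("%d\n", y);
      }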
| Unklejoe wrote:
| It's fine for debug builds, but people should be aware that
| building with the address sanitizer enabled balloons the
| size of the executable and slows it down tremendously.
| adrian_b wrote:
| One should always measure to see if there is a noticeable
| slowdown or not.
|
| That is what I meant by "an extremely serious reason to do
| otherwise".
|
| For many programs the speed is determined by things like the
| throughput or latency of accessing main memory, or by I/O
| operations, or by interaction with a user, in which case the
| sanitize options have negligible influence on speed.
|
| When necessary, they should be turned off only for
| specific well tested functions whose performance is
| limited by CPU computation in registers or in cache
| memory, and they should remain on for the rest of the
| program.
| jeffbee wrote:
| ASAN can easily slow your program by 100x or worse. Your
| bias should always be that its impact would not be
| acceptable in production. If you were willing to accept
| the cost of ASAN you should have simply chosen a slower,
| safer language in the first place.
| klysm wrote:
| Totally agree. Overflowing silently is almost never what you
| want. Sometimes it's useful but that's the rare case.
| beardyw wrote:
| > trap on overflow
|
| What do you see happening in that case?
| wyldfire wrote:
| The program terminates - depending on context, I suppose. I
| don't know what all toolchains/backends do but some will
| generate an unaligned load or some other architecturally
| illegal code (__builtin_trap). or you could trap to some
| handler.
| colejohnson66 wrote:
| x86 has had the `INTO` (interrupt on overflow) instruction
| (just a one byte form of `INT 4`) for decades. After an
| `ADD` or other arithmetic operation, you would `INTO` to
| trap it. However, because compilers love exploiting
| undefined behavior for speedups, they don't emit that
| instruction, so it's actually fairly slow on modern
| processors.
| wyldfire wrote:
| Right, well a new type would give them that flexibility
| because I can opt in to the trap or saturate.
| josefx wrote:
| I am not sure if I remember it correctly, but wasn't the
| problem that you had to explicitly call INTO after every
| operation? You didn't have a global flag to set or a flag
| bit in the add instruction, you had to explicitly emit
| the check. So using it carried a significant performance
| penalty over leaving the operation unchecked.
|
| > However, because compilers love exploiting undefined
| behavior for speedups
|
| Except if the behavior had been well defined any attempt
| to catch overflow would violate the standard - see
| unsigned integers for reference.
|
| > so it's actually fairly slow on modern processors.
|
| Are you sure that having a potential branch every second
| operation, that depends directly on the result of the
| preceding operation, has at any point in time not been
| horribly slow?
| colejohnson66 wrote:
| > I am not sure if I remember it correctly, but wasn't
| the problem that you had to explicitly call INTO after
| every operation? You didn't have a global flag to set or
| a flag bit in the add instruction, you had to explicitly
| emit the check. So using it carried a significant
| performance penalty over leaving the operation unchecked.
|
| Correct.
|
| > Except if the behavior had been well defined any
| attempt to catch overflow would violate the standard -
| see unsigned integers for reference.
|
| Then the compiler could emit them only on signed
| operations, but they don't.
|
| > Are you sure that having a potential branch every
| second operation, that depends directly on the result of
| the preceding operation, has at any point in time not
| been horribly slow?
|
| Yes; it's been shown that `INTO` is slower than `JO
| raiseOverflowError` because, by the nature of `INTO`
| being an interrupt, it doesn't get the advantage of
| branch prediction, but `JO` would.
|
| On an older processor without a pipeline, sure, it'd be
| just as slow as an equivalent check, but with how massive
| the pipelines are on today's processors, it's bad.
| amluto wrote:
| > Yes; it's been shown that `INTO` is slower than `JO
| raiseOverflowError` because, by the nature of `INTO`
| being an interrupt, it doesn't get the advantage of
| branch prediction, but `JO` would.
|
| A CPU could predict INTO -- simply predicting it as not
| taken would be a good guess. This would require
| designing, implementing, verifying, and maintaining, and
| it might add some area to important parts of the CPU. The
| vendors haven't seen it as a priority.
|
| (AIUI at least older CPUs couldn't predict branches in
| microcode, and INTO is presumably fully microcoded, so it
| doesn't get predicted.)
| ljosifov wrote:
| +1. Would be great to have integer NAN, INF and -INF supported
| in h/w.
| icsa wrote:
| The k language has implemented this capability (in software)
| for decades.
|
| https://code.kx.com/q/ref/#datatypes
| phkahler wrote:
| Saturating arithmetic is available on some DSPs and may be
| available to C programs via #pragma on those architectures.
| IMHO this is a burden on every ALU and instruction that does
| _not_ need the feature. We should not push application-specific
| features like that down to the basic hardware. We'd also need
| to provide language support for it.
| brandmeyer wrote:
| Saturating arithmetic is probably not all that expensive,
| since it's been in every ARM SIMD unit (and sometimes also
| scalar) for many years. You just pay a little bit in latency
| when using those instructions.
| 0xblood wrote:
| I don't really understand it, wouldn't this just trade the checks
| and corner cases you have to pay attention to with different set
| of corner cases? If I have an integer that can approach INT_MIN,
| I know I have to pay attention to overflows when using it and
| this proposal helps with at most some fraction of the problems
| that can arise. And what's the great benefit, that it gets a bit
| more "elegant"? If you want to write "safe" software and have the
| compiler enforce it, go use Rust/Ada/Spark instead.
| wyager wrote:
| If you find yourself wanting any of this, just use a language
| with algebraic data types and save everyone (especially yourself)
| some pain.
| jmull wrote:
| The argument starts here:
|
| > I really don't like this asymmetry - it leads to annoying edge
| cases in all sort of integer APIs.
|
| And gets to here:
|
| > Third, you're getting an unused bit pattern 0b1'0000000, the
| old INT_MIN, which you can interpret however you like.
|
| This isn't solving the original problem. If you don't like
| annoying edge cases in all sorts of integer APIs, it's worse.
|
| edit: I can add: the article goes on to suggest that "old
| INT_MIN" should be called INT_NAN and "let's just say arithmetic
| on INT_NAN is undefined behavior". So now what's the result of a
| + b? Undefined behavior! a * b? Undefined behavior! The general
| result of any function taking an integer or returning one is
| undefined behavior!
| turminal wrote:
| Arithmetic on INT_MIN is already undefined behaviour.
| kzrdude wrote:
| In C, with int a, b then a + b is already potential UB. Just
| depends on the values of the variables. It is remarkable to
| note this... but not new! :)
| mlatu wrote:
| the symmetric signed int with int_nan i would like to have too
|
| but with the uint63_t i don't quite see the point, but admit it
| would be neat having an unsigned int that can't overflow
| e63f67dd-065b wrote:
| I'm far more partial to the idea that NaN was a mistake than a
| desired feature. Including it then making operations that involve
| it UB is a road to hell and sadness, far worse than abs(INT_MIN).
|
| The idea to separate bit vectors from numbers is a good one
| though, it always seemed a little weird to me that we're passing
| around numeric types for FLAG_FOO & FLAG_BAR.
| jltsiren wrote:
| If you are doing bit manipulation that's more complicated than
| vectors of boolean flags, you are likely going to use both
| bitwise and arithmetic operations on the same data. If you need
| separate types for those operations, you will need many
| explicit casts in your code. That is usually a sign that
| something is wrong.
|
| In low-level code, integer types are not numbers. They are
| fixed-length bit fields with a numerical interpretation. One
| thing I like in Rust over C/C++ is that it makes this clear.
| You always specify the size and signedness of the integer
| explicitly, and the standard library provides many useful
| operations beyond the operators.
| adrian_b wrote:
| Already in 1964 the language PL/I (still named NPL in 1964) had
| "bit strings" as a distinct type from integer numbers.
|
| Also "character strings" were a distinct type.
|
| I also believe that these types must be clearly distinguished,
| even if conversion functions must be provided between them, and
| even if in certain contexts it may be convenient for the
| conversions to be implicit.
| gumby wrote:
| > Already in 1964 the language PL/I (still named NPL in 1964)
| had "bit strings" as a distinct type from integer numbers.
|
| That's because lots of hardware back then had support for bit
| addressing. The machine I love the best, the PDP-6/PDP-10, had
| addressable bytes of widths ranging from 1 to 36 bits, which were
| extremely handy.
|
| The few C compilers for those machines didn't support them of
| course because the PDP-11s didn't support them, but C on the
| PDP-10 was only used for porting stuff anyway.
|
| Honestly I'm surprised they haven't been revived. Packed
| bitfields are _really_ handy and should be fast.
| kevin_thibedeau wrote:
| > I'm surprised they haven't been revived.
|
| ARM has bit addressing with their bit-banding feature where
| individual bits are mapped onto byte addresses.
| codeflo wrote:
| I mean, strongly typed flag values are a solved problem. It's
| not rocket science, just bothersome to do in C++.
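|
| For example, an enum class plus a couple of operator overloads
| gives you flags that won't silently mix with plain ints (a
| sketch; the names are invented):
      #include <cstdint>

      enum class Permissions : uint8_t {
          None  = 0,
          Read  = 1 << 0,
          Write = 1 << 1,
          Exec  = 1 << 2,
      };

      constexpr Permissions operator|(Permissions a, Permissions b) {
          return static_cast<Permissions>(static_cast<uint8_t>(a) |
                                          static_cast<uint8_t>(b));
      }
      constexpr bool has(Permissions set, Permissions flag) {
          return (static_cast<uint8_t>(set) & static_cast<uint8_t>(flag)) != 0;
      }

      // p is a Permissions, not an int, so `p + 1` or `p & 42` won't compile.
      constexpr Permissions p = Permissions::Read | Permissions::Write;
      static_assert(has(p, Permissions::Read));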
| pfdietz wrote:
| Arbitrary length integers. That is, integers that actually act
| like integers.
| tialaramex wrote:
| I guess this could arguably make sense as a vocabulary type
| (provided in the standard library so that everybody agrees on
| it, not for performance reasons).
|
| Right now though you will probably get such a type for free
| with a library of common maths features you'd likely want to
| use with it. So the only question is whether there are several
| different libraries and people would prefer to mix and match.
| pfdietz wrote:
| One can get reasonable performance from them if the language
| (and ABI) is designed for it. Many implementations of lisp
| family languages, for example. Most computations will be on
| fixnums, with a fallback to bignums as required. The fixnums
| are unboxed.
|
| It's nice to require analysis for maximum performance rather
| than for correctness, since maximum performance usually
| doesn't matter but correctness usually does.
| nemo1618 wrote:
| This is being considered for Go 2:
| https://github.com/golang/go/issues/19623
|
| Personally I'm wary of it: it's not hard to think of
| adversarial scenarios where this could lead to OOM. On the
| other hand, adversarially overflowing an int can cause plenty
| of havoc too...
| dahfizz wrote:
| As long as you don't want fast code, sure.
| Zamiel_Snawley wrote:
| I agree, this could be a good solution for a lot of situations.
| People fret about the performance impact, but I'm confident
| that clever engineers could come up with a way to treat numbers
| within the typical range (say, signed 64-bit) just as they are
| today, but if an operation would go out of bounds, do the
| expensive, arbitrary-length operation. Emitting some kind of
| extra warning when those expensive operations are actually done
| would also be nice.
| [deleted]
| chrisshroba wrote:
| I like this feature of python, but it's wise to remember that
| operations that we typically think of as O(1) are actually
| usually dependent on the length of the input, so allowing
| arbitrary-length integers can cause issues (such as DoS
| opportunities) in places some people don't think to look.
| Specifically, Python recently changed the str -> int conversion
| logic to error on strings beyond a certain length (e.g.
| `int('1'*4301)` will error.) [1]
|
| [1] https://news.ycombinator.com/item?id=32753235
| astrange wrote:
| They're also much more expensive in memory, because you have
| to separately allocate anything you don't know the size of.
|
| Something being O(N) can also be a security issue since it
| introduces a timing side channel.
|
| I don't think I've ever needed a bignum in my life or even a
| 64-bit integer (excluding pointers and file sizes). Of course
| I've used them inside black box crypto libraries but they
| have to be careful with them because of said security issues.
| pfdietz wrote:
| As implemented in lisps they typically don't use more
| memory than 64 bit longs. That's because fixnums (typically
| 61 bit signed values) are unboxed, existing in the
| "pointer" itself; only when numbers fall outside that range
| do they become bignums that must be heap allocated (and in
| practice almost all integers are fixnums.)
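|
| The usual trick looks roughly like this (a simplified sketch of
| the tagging scheme, not any particular implementation's layout):
      #include <cassert>
      #include <cstdint>

      // Low bit 1 => fixnum stored in the upper 63 bits; low bit 0 => heap
      // pointer (objects are at least 2-byte aligned, so real pointers
      // always have a clear low bit).
      using value = uintptr_t;

      inline value   make_fixnum(int64_t n) { return (static_cast<uint64_t>(n) << 1) | 1; }
      inline bool    is_fixnum(value v)     { return (v & 1) != 0; }
      inline int64_t fixnum_val(value v)    { return static_cast<int64_t>(v) >> 1; }

      int main() {
          value v = make_fixnum(-42);
          assert(is_fixnum(v) && fixnum_val(v) == -42);
      }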
| tialaramex wrote:
| In the first part (balanced signed integers) I had sort of
| expected this to eventually say "Oh, this is basically how Rust's
| niches work, huh" but it doesn't. This is especially weird
| because Jonathan explicitly calls out the "new languages" like
| Carbon and whatever Herb will call his thing.
|
| Rust's Option<NonZeroU32> is _guaranteed_ to be the exact same
| size as u32 (4 bytes) and the standard library could do the same
| thing with a BalancedI32 type if it wanted to.
| TakeBlaster16 wrote:
| I was expecting that too. It seems strictly more expressive,
| since once you have the niche, you can do anything you like
| with it.
|
| And INT_MIN is a better place for it than zero imo. If Rust had
| a `BalancedI32` I would reach for it a lot more than I use the
| NonZero types. In my code at least, I've found zero is a pretty
| useful number.
| tialaramex wrote:
| AIUI the sticky problem is that the Rust compiler today wants
| to see a single contiguous range of scalar values to enable
| this trick, and from its perspective e.g. BalancedI8 would
| have _two_ ranges, 0 to 127 inclusive and then 129 to 255
| inclusive; it doesn't see the signed interpretation when
| thinking about this problem.
|
| That's clearly not impossible to solve, but if I'm correct it
| means significant compiler development work rather than just
| a fun weekend chore writing and testing a custom type
| intended to work only in the standard library.
| codeflo wrote:
| It's obviously 129 to 127 inclusive, hoping that the
| implied unsigned overflow has the correct consequences. ;)
| liminal wrote:
| Completely agree that moving bit operations to library calls and
| freeing up their syntax characters makes a lot of sense. Also
| seems impossible!
| dahfizz wrote:
| Do you feel the same about +-/*% ?
|
| I think people say this because they rarely use bitwise
| operators, and so are scared of them.
| liminal wrote:
| I'm not scared of them, but very rarely use them and believe
| that is generally true for others. Basic arithmetic operators
| that operate on numbers (as opposed to bitwise) are used much
| more frequently, so I feel very differently about them.
| samatman wrote:
| I don't, no.
|
| LuaJIT uses a bit library, while Lua 5.3 and up have the
| native bitwise operators. I prefer the former.
| gwbas1c wrote:
| > Distinct bit vectors vs integer type
|
| This. I always find bitwise operations an unneeded exercise in
| mental gymnastics.
|
| On the other hand, I wonder if a lot of the other quirks about
| numbers could be handled via a library?
| klysm wrote:
| > unneeded exercise in mental gymnastics
|
| Great characterization! I've always been confused why most
| languages don't have better tools for bit vectors. They're
| incredibly common and it's really confusing to have to use the
| underlying representation of numbers to do anything with them.
| morelisp wrote:
| It is solved by the standard library.
|
| https://en.cppreference.com/w/cpp/utility/bitset
| Scaless wrote:
| It is not "solved" at all. std::bitset has terrible
| properties and an awful interface.
|
| 1. No dynamic resize, have to know size at compile time or
| allocate based on max expectations. And yes,
| std::vector<bool> sucks too.
|
| 2. Despite being only statically sized, several classes of
| bugs are not prevented at compile-time. For example:
|
| std::bitset<4> fail1{"10001"}; // This produces a value of
| 0b1000, no warnings or exceptions thrown
|
| std::bitset<10> fail2; fail2.set(11); // Exception at
| runtime. Why is this not a static_assert?
|
| 3. Size is implementation defined. std::bitset<1> can use up
| to 8 bytes depending on compiler/platform.
|
| 4. Debug performance is 20x slower than Release. In many
| cases you are going from what would be a single assembly
| instruction to multiple function calls.
|
| 5. Limited options for accessing underlying storage
| efficiently (for serialization, etc). to_ullong() will work
| up to 64 bits, but beyond that it will throw exceptions.
|
| 6. Uses exceptions. This is still a deal breaker for many.
|
| 7. Cannot toggle a range of bits at once. It's either one or
| all.
| [deleted]
| gwbas1c wrote:
| Uhm, you know that there are other languages than c++?
| morelisp wrote:
| Sure, the point is that it doesn't take language machinery
| to do it. And as the other post demonstrates, trying to
| make a standard one will mostly just get people pissed you
| made different tradeoffs.
|
| (But actually a ton of other stdlibs do have one too.)
| churnedgodotdev wrote:
| Ada solved these problems in 1983.
|
| More recently, you can use Frama-C to constrain allowable
| sequences of 0's and 1's for C types and formally verify
| correctness.
|
| In Ada, since 1983, you can, e.g., declare your own 8-bit signed
| symmetric type without the -128 wart like so:
      type Sym_8 is new Integer range -127 .. 127;
|
| Then this fails at compile time:
      My_Signed_Byte : Sym_8 := -128;
|
| SPARK can prove all execution paths through your program are free
| of such constraint violations. This means safe SPARK code can
| disable runtime checks and run faster than the safest Rust/Zig
| dev settings, which insert runtime checks for over/underflow.
|
| In Frama-C, say you want a function that returns an absolute
| value. This function will fail to verify:
      /*@ ensures (x >= 0 ==> \result == x) && (x < 0 ==> \result == -x);
          assigns \nothing; */
      int abs (int x) {
          if (x >= 0) return x;
          return -x;
      }
|
| It fails to verify because you might have x == INT_MIN. So this
| will verify:
      #include <limits.h>

      /*@ requires x > INT_MIN;
          ensures (x >= 0 ==> \result == x) && (x < 0 ==> \result == -x);
          assigns \nothing; */
      int abs (int x) {
          if (x >= 0) return x;
          return -x;
      }
___________________________________________________________________
(page generated 2022-09-30 23:01 UTC)