[HN Gopher] Go 1.21 may have a clear(x) builtin
       ___________________________________________________________________
        
       Go 1.21 may have a clear(x) builtin
        
       Author : aviramha
       Score  : 79 points
       Date   : 2022-11-19 18:33 UTC (4 hours ago)
        
 (HTM) web link (utcc.utoronto.ca)
 (TXT) w3m dump (utcc.utoronto.ca)
        
       | unwind wrote:
       | That was weird. Makes sense once you know about NaN's
       | incomparability, but still surprising.
       | 
       | I had to try it in Python (3.10.4, Windows on x86), it worked
       | fine:                   >>> import math         >>> a={math.nan:
       | 1}         >>> a         {nan: 1}         >>> del(a[math.nan])
       | >>> a         {}
        
         | masklinn wrote:
         | It's because cpython shortcuts through identity:
         | https://github.com/python/cpython/blob/main/Objects/object.c...
         | /* Quick result when objects are the same.
         | Guarantees that identity implies equality. */         if (v ==
         | w) {             if (op == Py_EQ)                 return 1;
         | else if (op == Py_NE)                 return 0;         }
         | 
         | `math.nan` is always itself, and thus this shortcut is taken
         | and the key works.
        
         | [deleted]
        
         | vhcr wrote:
         | This works as expected only because id(math.nan) is always the
         | same, if you build the nan dynamically, it will fail:
         | >>> d = {float('nan'): 1}         >>> d         {nan: 1}
         | >>> del d[float('nan')]         Traceback (most recent call
         | last):           File "<stdin>", line 1, in <module>
         | KeyError: nan
        
       | kangalioo wrote:
       | Would anything actually break if NaN == NaN equalled true?
        
         | josephcsible wrote:
         | Isn't "if(x != x)" the canonical way to test whether x is NaN?
         | That seems like quite a big thing to break.
        
           | jeroenhd wrote:
           | Most languages have some kind of isNaN(value) method in my
           | experience. Even C and C++ have an isnan method (C since
           | +-1999, but other platforms had the macro before: https://rep
           | o.or.cz/w/glibc.git/blob/HEAD:/sysdeps/ieee754/db...)
           | 
           | People do probably rely on NaN [?] NaN but I think that's
           | more of a side effect of compatibility than anything else.
        
         | sedatk wrote:
         | Yes that would mean that sqrt(-1) equals to 0/0 for example,
         | which might open a new portal to hell.
        
         | moomin wrote:
         | Yes, in surprising places. But the most obvious thing is that
         | comparing float values would become significantly more
         | expensive.
        
           | derriz wrote:
           | Could you explain how/why?
        
             | crote wrote:
             | Virtually all architectures have floats implemented in
             | hardware - including NaN behavior. This means things like
             | IEEE 754 float comparison is done in a single instruction,
             | executed in a single cycle.
             | 
             | Explicitly checking for NaNs would (at least) double the
             | cycle count in cases where that branch is _not_ taken, and
             | having to deal with the custom comparison is even worse.
             | This is not helped by the fact that there isn 't a single
             | NaN value: there are both positive and negative NaNs, and
             | there are 23 bits which are _usually_ ignored but for which
             | some values do have specific meaning.
             | 
             | To make it even worse, modern CPUs include vector
             | extensions, allowing you to operate on multiple values at
             | once. With AVX-512, you can compare 32 floats per clock
             | cycle. I would not be surprised if switching to a custom
             | comparison made some edge cases over 100x slower.
        
               | anonymoushn wrote:
               | Comparing the raw bits using instructions intended for
               | integers is quite fast.
        
               | masklinn wrote:
               | NaN is not a single value, it's a multitude of them: a
               | NaN is defined by all exponent bits being set to 1, so in
               | a double there are 2^53 different possible NaN bit
               | patterns (though only 2^52 values, because the sign bit
               | is not part of the "NaN value", and possibly half that if
               | you interpret signaling as a variable property of a fixed
               | value)
        
               | anonymoushn wrote:
               | Yes. Many applications do not require that at most 1 nan
               | and at most 1 zero be allowed in each collection but do
               | have trouble if some items are not equal to themselves.
        
         | derriz wrote:
         | I can't see it. I've no idea what problem was solved by the
         | current weird IEEE behavior.
         | 
         | It must have been a biggy because this solution means your
         | domain now loses the property that, in general, x == x. This
         | property is a fundamental axiomatic feature of any notion of
         | equality. A non-reflexive equality isn't just weird - it's
         | daft/stupid/surprising.
        
       | martisch wrote:
       | > This for loop is less efficient than clearing a map in one
       | operation
       | 
       | For maps with keys that are reflexive with == the Go compiler
       | already optimizes the range loop to a single efficient runtime
       | map clear call: https://go-review.googlesource.com/c/go/+/110055
        
       | kazinator wrote:
       | This looks like catching up to GNU Awk, which allows "delete a"
       | to clear an associative array. That's an extension over POSIX,
       | which specifies "delete a[key]".
        
       | inglor wrote:
       | Interestingly JavaScript handles this correctly and specified
       | Object.is equality that is different than `===` equality around
       | NaNs so `const m = new Map(); m.set(NaN, 3); m.get(NaN)` returns
       | 3 and `.delete` similarly works.
        
       | yawaramin wrote:
       | OK, now I understand why OCaml's Float.compare function compares
       | NaNs as equal to each other:
       | https://discuss.ocaml.org/t/assertions-involving-nan/10762/4
        
       | ch_123 wrote:
       | I'm curious why this is not implemented as a method on the map
       | type (and others like list) instead of being a top-level builtin.
       | I suppose it is consistent with other collection operations such
       | as append and len... which I guess makes me wonder why those are
       | builtins as well.
        
         | urzaj wrote:
         | From the go FAQ:
         | 
         | > Why is len a function and not a method?
         | 
         | > We debated this issue but decided implementing len and
         | friends as functions was fine in practice and didn't complicate
         | questions about the interface (in the Go type sense) of basic
         | types.
         | 
         | I imagine similar reasoning applies to "clear" here.
         | 
         | https://go.dev/doc/faq#methods_on_basics
        
           | saghm wrote:
           | That's a pretty weird rationale IMO. "It doesn't break
           | anything" should be the minimum requirement for a major
           | design choice, but it certainly isn't sufficient on its own.
           | They sort of imply that making it a method would "complicate
           | questions about the interface of basic types", but I don't
           | really understand what that would even mean "in the Go type
           | sense". Are they worried that someone might make an interface
           | with a `Len` method and therefore be able to pass slices and
           | maps to it? That honestly seems like a feature, not a bug,
           | and certainly not worth throwing out the nicer syntax for.
        
             | icholy wrote:
             | In Go, it's common to create new types from the built-in
             | ones. Here's an example: https://pkg.go.dev/net/http#Header
             | 
             | When you create a new type like this, you lose all methods
             | from the original type. If the delete operation was
             | implemented as a method on the `map` type, `Header.Del`
             | would need to convert back to a `map` type to call it.
             | (map[string][]string(h)).Delete(key)
        
               | saghm wrote:
               | Interesting, I was aware of that newtype wrapper pattern,
               | but I did not realize that methods weren't automatically
               | forwarded to the wrapped type. This isn't the design I'd
               | personally choose, but it does seem more consistent with
               | what I know of Go's philosophy now that I understand
               | this.
        
         | [deleted]
        
         | mseepgood wrote:
         | Russ Cox explains in the issue why they don't want it to be a
         | standard library function:
         | 
         | https://github.com/golang/go/issues/56351#issuecomment-12914...
         | 
         | "We talked about defining maps.Clear in #55002, but how? It
         | would not be possible to write in pure Go code. Instead it
         | would need some kind of hidden entry point into the runtime.
         | 
         | In the past I've referred to that kind of library change as a
         | "back-door language change", meaning it's a language change -
         | it adds something not possible in pure Go - but pretends not to
         | be one. We try hard to avoid doing that."
        
         | esprehn wrote:
         | The answer is lack of generics in all previous versions of Go.
         | There's no way to express a generic interface to have methods
         | for map or slice (which are not interfaces, but compiler magic
         | instead). All the operations on builtin types must be
         | implemented as globals handled by the compiler.
        
           | hesdeadjim wrote:
           | Been using Go since the early days, and 'make/append' always
           | felt clunky and out of place. Especially when you could just
           | use literals for a map/slice...
        
             | chimeracoder wrote:
             | > Been using Go since the early days, and 'make/append'
             | always felt clunky and out of place. Especially when you
             | could just use literals for a map/slice...
             | 
             | Most people do use the literals for creating the empty
             | map/slice. "make" is mostly useful when specifying a known
             | length or capacity, which saves unnecessary allocations and
             | improves performance.
        
         | AtNightWeCode wrote:
         | Most obvious flaw in Go is the top-level bs. Why on earth are
         | they adding things to that? It is off course down to other poor
         | decisions made historically that forces this design. But
         | anyway. They should fix the lang and scrap the top-level
         | nonsense all together.
         | 
         | For a beginner in Go it makes absolutely no sense at all. Most
         | of the builtins should be methods.
        
       | _old_dude_ wrote:
       | In Java, the primitive type double uses the IEEE-754 semantics
       | and java.lang.Double uses the bits by bits comparison semantics
       | [1] so List<Double>.remove() works correctly.
       | 
       | [1]
       | https://docs.oracle.com/en/java/javase/19/docs/api/java.base...
        
         | uluyol wrote:
         | This won't help you in a future version of Java where you can
         | write List<double>, right?
        
           | _old_dude_ wrote:
           | It depends if List<double> is an alias for
           | List<@PleaseSpecialize @NotNull Double> or not.
        
           | masklinn wrote:
           | No.
           | 
           | What will help you is that `clear()` is part of
           | `Java.until.Collection` so just about every container has it.
        
       | derriz wrote:
       | Java has a specific hack for this which I discovered by accident
       | a few years ago. Normally, given (primitive) double values d0 and
       | d1, then boxing them does not affect equality testing - i.e. d0
       | == d1 if and only if
       | Double.valueOf(d0).equals(Double.valueOf(d1)). However if d0 and
       | d1 are both NaN, then the boxed versions ARE considered equal
       | while d0 == d1 is false.
       | 
       | This inconsistency infuriated me when I discovered it but the
       | Javadoc for Double.equals explicitly states that this anomaly is
       | there to "allow hash tables to work properly".
        
         | bobbylarrybobby wrote:
         | Using floats as hash keys is insane, no?
        
           | vbezhenar wrote:
           | No. JavaScript does that all the time.
        
             | ithkuil wrote:
             | (Because JS has no integers)
        
               | yawaramin wrote:
               | That used to be the case but JavaScript has BigIntegers
               | now.
        
           | metadat wrote:
           | Yes, why would you ever want a map with floating point keys?
           | 
           | I'm struggling to think of a valid use case where there is no
           | better alternative.
           | 
           | Any design utilizing this language "feature" seems
           | masochistic and begging to get sliced by sheet metal edges.
        
             | kazinator wrote:
             | > _Yes, why would you ever want a map with floating point
             | keys?_
             | 
             | One possible answer: you're in Javascript or something
             | similar. You wanted integer keys, but all you have is
             | floating-point numbers.
        
               | metadat wrote:
               | Or.. convert to string form of each integer and use that.
               | Wacky nonsense situation averted.
        
             | kazinator wrote:
             | Say I'm in a dynamic language and have a dictionary in
             | which an object of any type whatsoever can be a key. Why
             | would I arbitrarily disallow floating-point objects, if a
             | character, string, lexical closure, database connection,
             | file handle, ..., or lists or arrays of these can all be
             | hash keys.
             | 
             | A hash table of floating-point values can be used for, say,
             | memoizing a function whose argument (or arguments) are
             | floating-point.
             | 
             | A compiler could use a floating-point-keyed hash table for
             | deduplicating identical floating-point constants. Say that
             | constants are stored in some static area, and referenced by
             | address: it's wasteful to repeat those constants. Some
             | constant defining mechanisms (like #define in C)
             | proliferate copies of a constant as a repeated
             | subexpression.
        
               | derefr wrote:
               | Because all those other things are either identity types
               | (handles, closures, etc) or are value types where the
               | only possible way to represent them is with a single
               | "canonical form."
               | 
               | But IEEE754 allows not only NaNs, but also "negative
               | zero" and denormals. Floating-point numbers, in other
               | words, allow for multiple different bit-encodings that
               | represent _mathematically equal_ , but _not_ identical,
               | number values. There are _non-canonical_ forms of the
               | "same" numbers. And programming-language runtimes don't
               | do anything to prevent CPUs from returning these non-
               | canonical numbers; nor do they massage them back into
               | canonical forms upon receiving them. They just end up
               | blindly holding these non-canonical numbers -- numbers
               | which, if they ask the CPU if they're equal, they are;
               | but if they look at the bit-patterns and compare _those_
               | for equality, they 're not.
               | 
               | > A compiler could use a floating-point-keyed hash table
               | for deduplicating identical floating-point constants.
               | 
               | Bad example. A compiler wouldn't _want_ a hash table
               | whose key type is a floating-point number, because that
               | would imply that the hash table is operating using
               | IEEE754 definition of equality via-a-vis key presence
               | collision checking.
               | 
               | Rather, a compiler would use a bitstring key type, where
               | the keys are the bit patterns representing either the
               | target ISA's native FP-register encodings of the given
               | floating-point numbers; or the abstract/formal bit-packed
               | form of those floating-point numbers according to
               | IEEE754.
               | 
               | The difference between the two is that the latter uses
               | bitstring collation (ordering and equality), not
               | floating-point collation.
        
             | masklinn wrote:
             | > Yes, why would you ever want a map with floating point
             | keys?
             | 
             | Because it's a cache and you didn't specifically handle
             | that edge case.
        
               | malcolmgreaves wrote:
               | I don't know what precisely the poster meant. However,
               | from my perspective, I see it as a simpler design to
               | explicitly hash float values into keys.
        
               | masklinn wrote:
               | That... still requires the foresight to see that edge
               | case, and being in a situation where you're not colliding
               | your float hashes with random integers?
        
               | kazinator wrote:
               | Externally hashing outside of the hash table requires you
               | to take on ugly complications. What if two distinct
               | floating point keys map to the same key? The values of
               | your hash table then have to be lists or arrays so you
               | can keep all colliding entries. You've then wastefully
               | built chained hashing around a hash table!
        
           | chrisseaton wrote:
           | It's done all the time in JavaScript. Why not in Java?
        
       | akira2501 wrote:
       | You've already got zero and negative zero as an outlying case.
       | I'm not sure why anyone would feel comfortable using floats as
       | keys in a map. To anyone who has done this.. why? What was the
       | use case?
        
         | [deleted]
        
         | craigching wrote:
         | Go has no built-in set data structure, you use map if you don't
         | want to bring in a third party. Maybe this is why?
        
       | zmj wrote:
       | C# had the same NaN behavior until recently:
       | https://learn.microsoft.com/en-us/dotnet/core/compatibility/...
        
         | rawling wrote:
         | Oof, I don't like that Matrix3x2 type. "It's a 3x2 matrix, but
         | we're going to pretend it has a last column going 0/0/1 so we
         | can get a determinant or multiply them together, but when we
         | negate or add or subtract them it's still going to be 0/0/1."
        
       | chubot wrote:
       | Meh using floating point as keys for maps in any language is just
       | asking for trouble -- it's not just NaN
       | 
       | I don't think there's any real use case for it
       | 
       | I'd say clear() is good for clarity, and that's it
        
         | mprime1 wrote:
         | > I don't think there's any real use case for it
         | 
         | Histogram where you want to keep a count for each float value
         | you've seen (maybe rounded to some precision to reduce the
         | number of buckets)
         | 
         | Not disagreeing with your comment, just saying it's not
         | uncommon
        
       | kazinator wrote:
       | > _This for loop is less efficient than clearing a map in one
       | operation_
       | 
       | That is not obvious; a compiler could have a pattern match for
       | that exact AST pattern, and transform it to a delete operation.
       | (Except for that pesky issue where two fail to be equivalent due
       | to NaN keys.)
       | 
       | Quick and dirty, not entirely correct proof-of-concept in TXR
       | Lisp:                 1> (macroexpand '(my-dohash (k v
       | some.obj.hash) (print [some.obj.hash k])))       (dohash (k v
       | some.obj.hash                ())         (print [some.obj.hash
       | k]))       2> (macroexpand '(my-dohash (k v some.obj.hash) (del
       | [some.obj.hash k])))       (clearhash hash)
       | 
       | Impl:                 (defmacro my-dohash ((kvar vvar hash :
       | result) . body)         (if-match @(require ((del [@hash @kvar]))
       | (null result))                   body           ^(clearhash hash)
       | ^(dohash (,kvar ,vvar ,hash ,result) ,*body)))
        
       | karmakaze wrote:
       | Usage of NaN as hash key reminds me of the two uses of NULL in
       | SQL. One is as unknown values which are never equal to anything
       | even other NULL values. Another use is as a key for aggregate
       | grouping. In that case the entry represents the aggregate for all
       | the unknown values which aren't equal but still grouped together.
       | Different uses have different meanings so invalid in one use does
       | not invalidate other uses.
        
       | halpmeh wrote:
       | The issue with NaN equality is interesting, but is that really
       | why they're adding a clear(x) builtin? What if you want to remove
       | a single-NaN key from a map? clear(x) seems like a band-aid at
       | best if the Go team is strictly trying to fix removing NaN keys
       | in maps.
        
         | erk__ wrote:
         | The way Rust dels with this issue is by saying that the Key
         | must implement both `Hash` and `Eq` where Eq is a strict
         | equality compared to `PartialEq` which allows the equality
         | relation to be incomplete. While integers implement both
         | PartialEq and Eq, floating point values only implement
         | PartialEq.
         | 
         | https://doc.rust-lang.org/std/collections/struct.HashMap.htm...
         | 
         | https://doc.rust-lang.org/std/cmp/trait.Eq.html
         | 
         | https://doc.rust-lang.org/std/cmp/trait.PartialEq.html
         | 
         | There is also a partial ordering relation `PartialOrd` (and
         | `Ord` for complete relations) Floating point values only
         | implement the partial relation as well because of NaN's. This
         | means that it is a bit harder to sort them, though the floating
         | point standard also has a seperate complete comparison relation
         | with some extra rules which you can use instead.
         | 
         | https://doc.rust-lang.org/std/primitive.f64.html#method.tota...
        
           | duped wrote:
           | This is one of those rough edges where Rust is more annoying
           | than helpful imo. It's also inconsistent.
           | 
           | I'd rather panic on NaN comparisons than complicate the
           | fundamental comparison traits. Treat it like integer overflow
           | or array bounds access. We don't have different traits for
           | indexing that might fail or overflow that might happen. There
           | are methods on the trait to handle those cases when you care
           | about them.
           | 
           | But that said, this feels like an XY issue. If you're
           | comparing hashes of floats as keys somewhere you don't want
           | IEEE 754 defined equivalence. You almost certainly want bit
           | equivalence, which means the keys should be type punned to
           | u32/u64. Classic example is memoizing function calls with
           | floats. You don't want to compare the semantic values of the
           | arguments, but the literal bits. For floats that's not the
           | same.
        
             | moomin wrote:
             | No you don't, you want to use the hardware defined equality
             | function. And that one is IEEE 754 compliant.
        
               | duped wrote:
               | My point is sometimes I want NaN to compare with NaN as
               | true when I'm using a float as a key in a table.
        
               | tialaramex wrote:
               | You are welcome to build a type which has this property
               | and declares that it is Eq.
               | 
               | But Rust's built-in floating point types f32 and 64 do
               | not have that property.
        
               | anonymoushn wrote:
               | How are you so sure what other people want? Using the
               | hardware defined equality function for integers is
               | reasonable a lot of the time.
        
               | jeroenhd wrote:
               | In a hash table, I want my equality function to be based
               | on the actual value, not the super special comparison for
               | a specific type.
               | 
               | If I write a UniqueNumber that just returns false on the
               | == operator, I'd be stupid but the hash map should still
               | work. Java has a separate getHashCode() and equals()
               | method for a reason.
               | 
               | Either provide a hash key override or use the raw bits
               | for deep compares. Custom equality algorithms usually
               | don't make sense for arbitrary key-value stores.
        
               | tialaramex wrote:
               | Because of the Pigeonhole Principle what you're
               | describing is nonsense.
               | 
               | In Rust you can make UniqueNumber, indeed
               | misfortunate::OnewayGreater is such a type.
               | misfortunate::Maxwell even more so.
               | 
               | However of course the HashMap can't meaningfully "work"
               | for this type. Rust promises your program doesn't have
               | Undefined Behaviour despite such a prank, but it might
               | spin forever when you try to insert this type into a map
               | for example, a well defined but undesirable behaviour
               | brought on by your poor choices.
               | 
               | You seem to imagine that Java's hash map doesn't need to
               | compare types, that it can just use getHashCode() --
               | however because of the Pigeonhole Principle that can't
               | actually work.
               | 
               | And NaN != NaN isn't a "custom equality algorithm" it's
               | literally how the floating point numbers are defined in
               | your CPU.
        
           | tialaramex wrote:
           | Rust's HashMap and similar containers have clear() and many
           | of them have drain()
           | 
           | Drain is interesting, because it resolves the other problem
           | if things are in the collection which can never be retrieved
           | from it. Drain gives you each of the things in the
           | collection, one at a time, to do with as you wish, the
           | collection is now empty. So this means even if the collection
           | has a dozen SillyNonsenses in it, all of which insist they're
           | not the one you were looking for whenever you go looking for
           | a SillyNonsense in the collection, you get all twelve of them
           | out in Drain, and can examine them as you wish.
           | 
           | Of course if you just wanted to fish one thing out and then
           | throw the rest away, you can do that, once you drop the
           | result of Drain the rest of the things being drained are
           | dropped.
        
       ___________________________________________________________________
       (page generated 2022-11-19 23:00 UTC)