[HN Gopher] Go 1.21 may have a clear(x) builtin
___________________________________________________________________
Go 1.21 may have a clear(x) builtin
Author : aviramha
Score : 79 points
Date : 2022-11-19 18:33 UTC (4 hours ago)
(HTM) web link (utcc.utoronto.ca)
(TXT) w3m dump (utcc.utoronto.ca)
| unwind wrote:
| That was weird. Makes sense once you know about NaN's
| incomparability, but still surprising.
|
| I had to try it in Python (3.10.4, Windows on x86), it worked
| fine: >>> import math >>> a={math.nan:
| 1} >>> a {nan: 1} >>> del(a[math.nan])
| >>> a {}
| masklinn wrote:
| It's because cpython shortcuts through identity:
| https://github.com/python/cpython/blob/main/Objects/object.c...
| /* Quick result when objects are the same.
| Guarantees that identity implies equality. */ if (v ==
| w) { if (op == Py_EQ) return 1;
| else if (op == Py_NE) return 0; }
|
| `math.nan` is always itself, and thus this shortcut is taken
| and the key works.
| [deleted]
| vhcr wrote:
| This works as expected only because id(math.nan) is always the
| same, if you build the nan dynamically, it will fail:
| >>> d = {float('nan'): 1} >>> d {nan: 1}
| >>> del d[float('nan')] Traceback (most recent call
| last): File "<stdin>", line 1, in <module>
| KeyError: nan
| kangalioo wrote:
| Would anything actually break if NaN == NaN equalled true?
| josephcsible wrote:
| Isn't "if(x != x)" the canonical way to test whether x is NaN?
| That seems like quite a big thing to break.
| jeroenhd wrote:
| Most languages have some kind of isNaN(value) method in my
| experience. Even C and C++ have an isnan method (C since
| +-1999, but other platforms had the macro before: https://rep
| o.or.cz/w/glibc.git/blob/HEAD:/sysdeps/ieee754/db...)
|
| People do probably rely on NaN [?] NaN but I think that's
| more of a side effect of compatibility than anything else.
| sedatk wrote:
| Yes that would mean that sqrt(-1) equals to 0/0 for example,
| which might open a new portal to hell.
| moomin wrote:
| Yes, in surprising places. But the most obvious thing is that
| comparing float values would become significantly more
| expensive.
| derriz wrote:
| Could you explain how/why?
| crote wrote:
| Virtually all architectures have floats implemented in
| hardware - including NaN behavior. This means things like
| IEEE 754 float comparison is done in a single instruction,
| executed in a single cycle.
|
| Explicitly checking for NaNs would (at least) double the
| cycle count in cases where that branch is _not_ taken, and
| having to deal with the custom comparison is even worse.
| This is not helped by the fact that there isn 't a single
| NaN value: there are both positive and negative NaNs, and
| there are 23 bits which are _usually_ ignored but for which
| some values do have specific meaning.
|
| To make it even worse, modern CPUs include vector
| extensions, allowing you to operate on multiple values at
| once. With AVX-512, you can compare 32 floats per clock
| cycle. I would not be surprised if switching to a custom
| comparison made some edge cases over 100x slower.
| anonymoushn wrote:
| Comparing the raw bits using instructions intended for
| integers is quite fast.
| masklinn wrote:
| NaN is not a single value, it's a multitude of them: a
| NaN is defined by all exponent bits being set to 1, so in
| a double there are 2^53 different possible NaN bit
| patterns (though only 2^52 values, because the sign bit
| is not part of the "NaN value", and possibly half that if
| you interpret signaling as a variable property of a fixed
| value)
| anonymoushn wrote:
| Yes. Many applications do not require that at most 1 nan
| and at most 1 zero be allowed in each collection but do
| have trouble if some items are not equal to themselves.
| derriz wrote:
| I can't see it. I've no idea what problem was solved by the
| current weird IEEE behavior.
|
| It must have been a biggy because this solution means your
| domain now loses the property that, in general, x == x. This
| property is a fundamental axiomatic feature of any notion of
| equality. A non-reflexive equality isn't just weird - it's
| daft/stupid/surprising.
| martisch wrote:
| > This for loop is less efficient than clearing a map in one
| operation
|
| For maps with keys that are reflexive with == the Go compiler
| already optimizes the range loop to a single efficient runtime
| map clear call: https://go-review.googlesource.com/c/go/+/110055
| kazinator wrote:
| This looks like catching up to GNU Awk, which allows "delete a"
| to clear an associative array. That's an extension over POSIX,
| which specifies "delete a[key]".
| inglor wrote:
| Interestingly JavaScript handles this correctly and specified
| Object.is equality that is different than `===` equality around
| NaNs so `const m = new Map(); m.set(NaN, 3); m.get(NaN)` returns
| 3 and `.delete` similarly works.
| yawaramin wrote:
| OK, now I understand why OCaml's Float.compare function compares
| NaNs as equal to each other:
| https://discuss.ocaml.org/t/assertions-involving-nan/10762/4
| ch_123 wrote:
| I'm curious why this is not implemented as a method on the map
| type (and others like list) instead of being a top-level builtin.
| I suppose it is consistent with other collection operations such
| as append and len... which I guess makes me wonder why those are
| builtins as well.
| urzaj wrote:
| From the go FAQ:
|
| > Why is len a function and not a method?
|
| > We debated this issue but decided implementing len and
| friends as functions was fine in practice and didn't complicate
| questions about the interface (in the Go type sense) of basic
| types.
|
| I imagine similar reasoning applies to "clear" here.
|
| https://go.dev/doc/faq#methods_on_basics
| saghm wrote:
| That's a pretty weird rationale IMO. "It doesn't break
| anything" should be the minimum requirement for a major
| design choice, but it certainly isn't sufficient on its own.
| They sort of imply that making it a method would "complicate
| questions about the interface of basic types", but I don't
| really understand what that would even mean "in the Go type
| sense". Are they worried that someone might make an interface
| with a `Len` method and therefore be able to pass slices and
| maps to it? That honestly seems like a feature, not a bug,
| and certainly not worth throwing out the nicer syntax for.
| icholy wrote:
| In Go, it's common to create new types from the built-in
| ones. Here's an example: https://pkg.go.dev/net/http#Header
|
| When you create a new type like this, you lose all methods
| from the original type. If the delete operation was
| implemented as a method on the `map` type, `Header.Del`
| would need to convert back to a `map` type to call it.
| (map[string][]string(h)).Delete(key)
| saghm wrote:
| Interesting, I was aware of that newtype wrapper pattern,
| but I did not realize that methods weren't automatically
| forwarded to the wrapped type. This isn't the design I'd
| personally choose, but it does seem more consistent with
| what I know of Go's philosophy now that I understand
| this.
| [deleted]
| mseepgood wrote:
| Russ Cox explains in the issue why they don't want it to be a
| standard library function:
|
| https://github.com/golang/go/issues/56351#issuecomment-12914...
|
| "We talked about defining maps.Clear in #55002, but how? It
| would not be possible to write in pure Go code. Instead it
| would need some kind of hidden entry point into the runtime.
|
| In the past I've referred to that kind of library change as a
| "back-door language change", meaning it's a language change -
| it adds something not possible in pure Go - but pretends not to
| be one. We try hard to avoid doing that."
| esprehn wrote:
| The answer is lack of generics in all previous versions of Go.
| There's no way to express a generic interface to have methods
| for map or slice (which are not interfaces, but compiler magic
| instead). All the operations on builtin types must be
| implemented as globals handled by the compiler.
| hesdeadjim wrote:
| Been using Go since the early days, and 'make/append' always
| felt clunky and out of place. Especially when you could just
| use literals for a map/slice...
| chimeracoder wrote:
| > Been using Go since the early days, and 'make/append'
| always felt clunky and out of place. Especially when you
| could just use literals for a map/slice...
|
| Most people do use the literals for creating the empty
| map/slice. "make" is mostly useful when specifying a known
| length or capacity, which saves unnecessary allocations and
| improves performance.
| AtNightWeCode wrote:
| Most obvious flaw in Go is the top-level bs. Why on earth are
| they adding things to that? It is off course down to other poor
| decisions made historically that forces this design. But
| anyway. They should fix the lang and scrap the top-level
| nonsense all together.
|
| For a beginner in Go it makes absolutely no sense at all. Most
| of the builtins should be methods.
| _old_dude_ wrote:
| In Java, the primitive type double uses the IEEE-754 semantics
| and java.lang.Double uses the bits by bits comparison semantics
| [1] so List<Double>.remove() works correctly.
|
| [1]
| https://docs.oracle.com/en/java/javase/19/docs/api/java.base...
| uluyol wrote:
| This won't help you in a future version of Java where you can
| write List<double>, right?
| _old_dude_ wrote:
| It depends if List<double> is an alias for
| List<@PleaseSpecialize @NotNull Double> or not.
| masklinn wrote:
| No.
|
| What will help you is that `clear()` is part of
| `Java.until.Collection` so just about every container has it.
| derriz wrote:
| Java has a specific hack for this which I discovered by accident
| a few years ago. Normally, given (primitive) double values d0 and
| d1, then boxing them does not affect equality testing - i.e. d0
| == d1 if and only if
| Double.valueOf(d0).equals(Double.valueOf(d1)). However if d0 and
| d1 are both NaN, then the boxed versions ARE considered equal
| while d0 == d1 is false.
|
| This inconsistency infuriated me when I discovered it but the
| Javadoc for Double.equals explicitly states that this anomaly is
| there to "allow hash tables to work properly".
| bobbylarrybobby wrote:
| Using floats as hash keys is insane, no?
| vbezhenar wrote:
| No. JavaScript does that all the time.
| ithkuil wrote:
| (Because JS has no integers)
| yawaramin wrote:
| That used to be the case but JavaScript has BigIntegers
| now.
| metadat wrote:
| Yes, why would you ever want a map with floating point keys?
|
| I'm struggling to think of a valid use case where there is no
| better alternative.
|
| Any design utilizing this language "feature" seems
| masochistic and begging to get sliced by sheet metal edges.
| kazinator wrote:
| > _Yes, why would you ever want a map with floating point
| keys?_
|
| One possible answer: you're in Javascript or something
| similar. You wanted integer keys, but all you have is
| floating-point numbers.
| metadat wrote:
| Or.. convert to string form of each integer and use that.
| Wacky nonsense situation averted.
| kazinator wrote:
| Say I'm in a dynamic language and have a dictionary in
| which an object of any type whatsoever can be a key. Why
| would I arbitrarily disallow floating-point objects, if a
| character, string, lexical closure, database connection,
| file handle, ..., or lists or arrays of these can all be
| hash keys.
|
| A hash table of floating-point values can be used for, say,
| memoizing a function whose argument (or arguments) are
| floating-point.
|
| A compiler could use a floating-point-keyed hash table for
| deduplicating identical floating-point constants. Say that
| constants are stored in some static area, and referenced by
| address: it's wasteful to repeat those constants. Some
| constant defining mechanisms (like #define in C)
| proliferate copies of a constant as a repeated
| subexpression.
| derefr wrote:
| Because all those other things are either identity types
| (handles, closures, etc) or are value types where the
| only possible way to represent them is with a single
| "canonical form."
|
| But IEEE754 allows not only NaNs, but also "negative
| zero" and denormals. Floating-point numbers, in other
| words, allow for multiple different bit-encodings that
| represent _mathematically equal_ , but _not_ identical,
| number values. There are _non-canonical_ forms of the
| "same" numbers. And programming-language runtimes don't
| do anything to prevent CPUs from returning these non-
| canonical numbers; nor do they massage them back into
| canonical forms upon receiving them. They just end up
| blindly holding these non-canonical numbers -- numbers
| which, if they ask the CPU if they're equal, they are;
| but if they look at the bit-patterns and compare _those_
| for equality, they 're not.
|
| > A compiler could use a floating-point-keyed hash table
| for deduplicating identical floating-point constants.
|
| Bad example. A compiler wouldn't _want_ a hash table
| whose key type is a floating-point number, because that
| would imply that the hash table is operating using
| IEEE754 definition of equality via-a-vis key presence
| collision checking.
|
| Rather, a compiler would use a bitstring key type, where
| the keys are the bit patterns representing either the
| target ISA's native FP-register encodings of the given
| floating-point numbers; or the abstract/formal bit-packed
| form of those floating-point numbers according to
| IEEE754.
|
| The difference between the two is that the latter uses
| bitstring collation (ordering and equality), not
| floating-point collation.
| masklinn wrote:
| > Yes, why would you ever want a map with floating point
| keys?
|
| Because it's a cache and you didn't specifically handle
| that edge case.
| malcolmgreaves wrote:
| I don't know what precisely the poster meant. However,
| from my perspective, I see it as a simpler design to
| explicitly hash float values into keys.
| masklinn wrote:
| That... still requires the foresight to see that edge
| case, and being in a situation where you're not colliding
| your float hashes with random integers?
| kazinator wrote:
| Externally hashing outside of the hash table requires you
| to take on ugly complications. What if two distinct
| floating point keys map to the same key? The values of
| your hash table then have to be lists or arrays so you
| can keep all colliding entries. You've then wastefully
| built chained hashing around a hash table!
| chrisseaton wrote:
| It's done all the time in JavaScript. Why not in Java?
| akira2501 wrote:
| You've already got zero and negative zero as an outlying case.
| I'm not sure why anyone would feel comfortable using floats as
| keys in a map. To anyone who has done this.. why? What was the
| use case?
| [deleted]
| craigching wrote:
| Go has no built-in set data structure, you use map if you don't
| want to bring in a third party. Maybe this is why?
| zmj wrote:
| C# had the same NaN behavior until recently:
| https://learn.microsoft.com/en-us/dotnet/core/compatibility/...
| rawling wrote:
| Oof, I don't like that Matrix3x2 type. "It's a 3x2 matrix, but
| we're going to pretend it has a last column going 0/0/1 so we
| can get a determinant or multiply them together, but when we
| negate or add or subtract them it's still going to be 0/0/1."
| chubot wrote:
| Meh using floating point as keys for maps in any language is just
| asking for trouble -- it's not just NaN
|
| I don't think there's any real use case for it
|
| I'd say clear() is good for clarity, and that's it
| mprime1 wrote:
| > I don't think there's any real use case for it
|
| Histogram where you want to keep a count for each float value
| you've seen (maybe rounded to some precision to reduce the
| number of buckets)
|
| Not disagreeing with your comment, just saying it's not
| uncommon
| kazinator wrote:
| > _This for loop is less efficient than clearing a map in one
| operation_
|
| That is not obvious; a compiler could have a pattern match for
| that exact AST pattern, and transform it to a delete operation.
| (Except for that pesky issue where two fail to be equivalent due
| to NaN keys.)
|
| Quick and dirty, not entirely correct proof-of-concept in TXR
| Lisp: 1> (macroexpand '(my-dohash (k v
| some.obj.hash) (print [some.obj.hash k]))) (dohash (k v
| some.obj.hash ()) (print [some.obj.hash
| k])) 2> (macroexpand '(my-dohash (k v some.obj.hash) (del
| [some.obj.hash k]))) (clearhash hash)
|
| Impl: (defmacro my-dohash ((kvar vvar hash :
| result) . body) (if-match @(require ((del [@hash @kvar]))
| (null result)) body ^(clearhash hash)
| ^(dohash (,kvar ,vvar ,hash ,result) ,*body)))
| karmakaze wrote:
| Usage of NaN as hash key reminds me of the two uses of NULL in
| SQL. One is as unknown values which are never equal to anything
| even other NULL values. Another use is as a key for aggregate
| grouping. In that case the entry represents the aggregate for all
| the unknown values which aren't equal but still grouped together.
| Different uses have different meanings so invalid in one use does
| not invalidate other uses.
| halpmeh wrote:
| The issue with NaN equality is interesting, but is that really
| why they're adding a clear(x) builtin? What if you want to remove
| a single-NaN key from a map? clear(x) seems like a band-aid at
| best if the Go team is strictly trying to fix removing NaN keys
| in maps.
| erk__ wrote:
| The way Rust dels with this issue is by saying that the Key
| must implement both `Hash` and `Eq` where Eq is a strict
| equality compared to `PartialEq` which allows the equality
| relation to be incomplete. While integers implement both
| PartialEq and Eq, floating point values only implement
| PartialEq.
|
| https://doc.rust-lang.org/std/collections/struct.HashMap.htm...
|
| https://doc.rust-lang.org/std/cmp/trait.Eq.html
|
| https://doc.rust-lang.org/std/cmp/trait.PartialEq.html
|
| There is also a partial ordering relation `PartialOrd` (and
| `Ord` for complete relations) Floating point values only
| implement the partial relation as well because of NaN's. This
| means that it is a bit harder to sort them, though the floating
| point standard also has a seperate complete comparison relation
| with some extra rules which you can use instead.
|
| https://doc.rust-lang.org/std/primitive.f64.html#method.tota...
| duped wrote:
| This is one of those rough edges where Rust is more annoying
| than helpful imo. It's also inconsistent.
|
| I'd rather panic on NaN comparisons than complicate the
| fundamental comparison traits. Treat it like integer overflow
| or array bounds access. We don't have different traits for
| indexing that might fail or overflow that might happen. There
| are methods on the trait to handle those cases when you care
| about them.
|
| But that said, this feels like an XY issue. If you're
| comparing hashes of floats as keys somewhere you don't want
| IEEE 754 defined equivalence. You almost certainly want bit
| equivalence, which means the keys should be type punned to
| u32/u64. Classic example is memoizing function calls with
| floats. You don't want to compare the semantic values of the
| arguments, but the literal bits. For floats that's not the
| same.
| moomin wrote:
| No you don't, you want to use the hardware defined equality
| function. And that one is IEEE 754 compliant.
| duped wrote:
| My point is sometimes I want NaN to compare with NaN as
| true when I'm using a float as a key in a table.
| tialaramex wrote:
| You are welcome to build a type which has this property
| and declares that it is Eq.
|
| But Rust's built-in floating point types f32 and 64 do
| not have that property.
| anonymoushn wrote:
| How are you so sure what other people want? Using the
| hardware defined equality function for integers is
| reasonable a lot of the time.
| jeroenhd wrote:
| In a hash table, I want my equality function to be based
| on the actual value, not the super special comparison for
| a specific type.
|
| If I write a UniqueNumber that just returns false on the
| == operator, I'd be stupid but the hash map should still
| work. Java has a separate getHashCode() and equals()
| method for a reason.
|
| Either provide a hash key override or use the raw bits
| for deep compares. Custom equality algorithms usually
| don't make sense for arbitrary key-value stores.
| tialaramex wrote:
| Because of the Pigeonhole Principle what you're
| describing is nonsense.
|
| In Rust you can make UniqueNumber, indeed
| misfortunate::OnewayGreater is such a type.
| misfortunate::Maxwell even more so.
|
| However of course the HashMap can't meaningfully "work"
| for this type. Rust promises your program doesn't have
| Undefined Behaviour despite such a prank, but it might
| spin forever when you try to insert this type into a map
| for example, a well defined but undesirable behaviour
| brought on by your poor choices.
|
| You seem to imagine that Java's hash map doesn't need to
| compare types, that it can just use getHashCode() --
| however because of the Pigeonhole Principle that can't
| actually work.
|
| And NaN != NaN isn't a "custom equality algorithm" it's
| literally how the floating point numbers are defined in
| your CPU.
| tialaramex wrote:
| Rust's HashMap and similar containers have clear() and many
| of them have drain()
|
| Drain is interesting, because it resolves the other problem
| if things are in the collection which can never be retrieved
| from it. Drain gives you each of the things in the
| collection, one at a time, to do with as you wish, the
| collection is now empty. So this means even if the collection
| has a dozen SillyNonsenses in it, all of which insist they're
| not the one you were looking for whenever you go looking for
| a SillyNonsense in the collection, you get all twelve of them
| out in Drain, and can examine them as you wish.
|
| Of course if you just wanted to fish one thing out and then
| throw the rest away, you can do that, once you drop the
| result of Drain the rest of the things being drained are
| dropped.
___________________________________________________________________
(page generated 2022-11-19 23:00 UTC)