[HN Gopher] a[low:high:max] in Golang - A Rare Slice Trick
___________________________________________________________________
a[low:high:max] in Golang - A Rare Slice Trick
Author : weird_user
Score : 84 points
Date : 2023-03-18 16:53 UTC (6 hours ago)
(HTM) web link (build-your-own.org)
(TXT) w3m dump (build-your-own.org)
| [deleted]
| tommodev wrote:
| Question: what is the reason for the silent copy when append
| exceeds the original slice cap?
|
| It's a footgun avoided by reading the spec and (maybe)
| remembering it in practice, but it feels like it would be safer
| to throw a comp error and force the user to deal with it when a
| user is trying to exceed the cap of the underlying array?
|
| Alternative is defensively using len() and cap() for slice ops in
| which case error-ing out feels more ergonomic.
| nutate wrote:
| My fave golang slice trick is the len of an empty slice is 0, but
| the slice itself is == to nil, but the len of nil won't compile.
| Can't understand that one.
|
| https://go.dev/play/p/MslCkBphl7q?v=gotip
| kjksf wrote:
| Because nil is a special thing in Go.
|
| nil is not a value, it's a predeclared identifier.
|
| it represents zero value for pointers, interfaces, maps,
| slices, channels and function types, representing an
| uninitialized value
|
| len(nil) feels like it should work if you think of nil as the
| same as "value of empty array"
|
| but what should be: var p *Struct len(p)
| ???
|
| There's no "length of zero-valued pointer".
| morelisp wrote:
| Because the bare value `nil` has no type. Typing it, e.g.
| `len([]int(nil))` works fine.
| gus_massa wrote:
| This paragraph is scary. It looks like a good idea for the Golang
| version of the Underhanded C Contest.
|
| > _This trick is useful for returning a slice from an immutable
| array; if you accidentally append to the supposedly immutable
| slice, a copy is forced and no data is overwritten because there
| is no more capacity left._
| gleenn wrote:
| I don't know much Go but what does it mean to "accidentally"
| append to an immutable array? Are there just no particular
| guards for arrays and they are saying something which should be
| treated as immutable?
| chrsig wrote:
| Here's a succinct demonstration:
| https://go.dev/play/p/Li6_Rpe2R5L
|
| Basically: a slice is a triple of {ptr to allocation, number
| of elements used, size of allocation}
|
| the slices will share the same underlying allocation. By
| specifying the third parameter in the slice function, you set
| the capacity equal to the number of elements in your slice.
| This forces the next append to reallocate and copy the
| contents into a new region.
|
| Without bounding the capacity when creating a new slice, the
| append operation _could possibly_ continue using the same
| allocation, shared with the original slice. It _could
| possibly_ resize and copy as well. It depends on how full the
| slice is, and is an implementation detail subject to change
| between versions.
| ErikCorry wrote:
| Here another illustration of how slices sometimes alias
| each other, sometimes not. https://mobile.twitter.com/erikc
| orry/status/1561635841495236...
| thomaslkjeldsen wrote:
| and yet another one (thanks to Erik):
| https://github.com/tlk/go-
| wat/blob/41cc21294a953dc7ad07a9b40...
| tedunangst wrote:
| There are no immutable arrays. If you have a slice of
| capacity ten with four elements, you can write a fifth
| element to it. If you reduce the capacity to four, then a new
| slice must be allocated to store five elements.
|
| The backing store for a slice is just a pointer. The rule is
| you don't give people pointers that you don't want them to
| write to.
| andrewstuart2 wrote:
| That's just not true. There are immutable arrays in length.
| [4]int cannot be appended to. The backing of all slices are
| array types and array doubling is used for appends that go
| beyond the capacity.
|
| https://go.dev/ref/spec#Array_types
| tedunangst wrote:
| Yeah, that was a mistake.
| Groxx wrote:
| Append mutates or copies-and-appends based on the capacity of
| the slice it's given. And since slicing an array or a slice
| gives you just a _view_ , not a copy, it can cause "spooky
| action at a distance" and mutate things you didn't expect,
| especially if it was a slice _with extra capacity_ of data
| used somewhere else. Which is what this article is
| describing.
|
| Plus the knowledge of the change in length (to see the
| appended data) is only visible with the returned slice (a new
| view with the larger length), but the underlying data is
| changed either way.
|
| It's not _often_ a source of errors, but when it is it can be
| extremely hard to diagnose.
| DougBTX wrote:
| So in other words, appending to a slice _may_ overwrite
| data in the middle of a different slice?
| Groxx wrote:
| Yep. Depending on how the slice was _constructed_ , not
| how it is used.
|
| Slices (largely) behave like this in most languages, Go's
| contribution is mostly that slices and append are
| ubiquitous, so pretty much every Go coder is exposed to
| it. Prior to generics, literally any alternative was so
| much more work they essentially haven't been used. That
| may change in the future now that we do have (very
| simplistic) generics, but only time will tell.
| iainmerrick wrote:
| _Slices (largely) behave like this in most languages_
|
| Is that really the case? I can't think of many languages
| that let you construct a view into the middle of array,
| pass that view as an argument to a function call, and
| allow that function to add new values into your array via
| the view.
|
| In Python or JS, for example, the "slice" would just be a
| brand new array, right?
| Groxx wrote:
| If the _slice syntax_ creates a copy, I would argue that
| it 's simply syntactic sugar for array copying, not
| actually producing a slice (i.e. there is no _slice type_
| ). But yes, `somefunc(ary[1:])` in Python produces a
| copy, not a reference to the underlying value. You could
| build a more "true" slice class, but the builtin stuff
| doesn't do that. JavaScript is similar.
|
| Java however has `List<>.sublist` which largely behaves
| like Go: https://docs.oracle.com/javase/8/docs/api/java/u
| til/List.htm... . Sometimes these kinds of things are
| also referred to as "views". Go calls them slices though,
| and this is a Go article.
|
| edit: C# apparently has "spans":
| https://learn.microsoft.com/en-us/archive/msdn-
| magazine/2018...
| iainmerrick wrote:
| It doesn't look like any of those allow you to
| arbitrarily add and remove elements from the parent list
| in the same way Go does (if I understand Go slices
| correctly).
|
| Java does let you make modifications, but only under very
| tight restrictions:
|
| _The semantics of the list returned by this method
| become undefined if the backing list (i.e., this list)
| is_ structurally modified _in any way other than via the
| returned list. (Structural modifications are those that
| change the size of this list, or otherwise perturb it in
| such a fashion that iterations in progress may yield
| incorrect results.)_
|
| I take that to mean the _parent_ list can't be modified
| at the same time, nor can multiple views be modified.
|
| So, I think Go's ability to do those structural
| modifications on slices is rather unusual (as well as
| error-prone and not obviously useful).
| klodolph wrote:
| Another example is typed arrays in JavaScript, which have
| both .slice() and .subarray(). One creates a copy, and
| one creates a new view into the same underlying memory.
| iainmerrick wrote:
| You can't change the length of a JS ArrayBuffer.
| masklinn wrote:
| > Slices (largely) behave like this in most languages
|
| Only on assignment, not in appending.
|
| > Go's contribution is mostly that slices and append are
| ubiquitous
|
| Go's contribution is the conflation of slices and
| vectors, which in most languages are separate (or really
| most languages only have the latter and don't provide
| access to backing arrays, thus precluding this specific
| confusion).
| klodolph wrote:
| In C++, until C++20's std::span, you would just use
| std::vector for stuff you can modify and const
| std::vector for stuff you can't.
|
| I think the real problem is that Go's type system doesn't
| catch the common error of keeping a reference to
| something you don't own, or similar errors. C# catches
| some of these errors by letting you return a
| IReadOnlyList<T> or some other restricted type. Golang
| has ways to narrow certain types, but the slice type is
| primitive and has no narrowed read-only variation. C#
| instead forces you to pay a higher runtime cost, because
| you're paying a lot more for indirection with
| IReadOnlyList<T>.
|
| People fret about the difference between T[] and List<T>
| in C#, and they do come with different performance
| characteristics... Go's choice to use slices everywhere
| does have a certain advantage that you're getting the
| fast path everywhere, and you're spending less time
| thinking about which one to use.
|
| Not trying to say that Go is "right" here, it's just my
| viewpoint that these language design decisions are
| rational.
| TwentyPosts wrote:
| Go has no way to ensure "const correctness". If you pass
| around a slice (which you'll be doing a lot, since it's
| essentially Go's equivalent of a vector), everyone can modify
| the slice however they please. It's just a fat pointer.
|
| So yes, something which should be treated as immutable.
|
| The way Go works here easily leads to bugs, especially when
| concurrency is involved and slices get captured or used by
| goroutines.
|
| If you send a slice to a function, they essentially receive a
| pointer pointing at the same block of memory as in your
| calling function. They can modify it. If they append to the
| slice, however, and surpass the capacity of the slice they
| received, then their slice gets reallocated and doesn't point
| at the same block of memory anymore.
|
| In other words, if you receive a slice, append to it, and
| then change the first value of the slice, then this might
| modify a slice (or array) somewhere completely different,
| depending on whether your append surpasses the allocated
| capacity of the slice or not.
|
| Welcome to Golang.
| pstuart wrote:
| On the bright side, if one tests with race detection
| enabled these problems are _usually_ made apparent.
| masklinn wrote:
| This issue can trivially occur in sequential code.
| eternalban wrote:
| "one goroutine has access to the value" :O
|
| If you stick to "values" Go works as originally advertised
| (see below). With values iirc the consensus became that for
| high performance the channel overhead was too much of a hit
| and so back to locks and 'traditional' concurrency.
|
| The concurrency issue in PLTs is not a logical puzzle. It's
| a performance challenge revolving around copying stuff and
| hand-offs. With multicores thrown in, it seems to really
| require addressing the challenges once and for all (as
| services) at the OS and possibly even hardware layer. IF we
| have efficiencies at the hw & os around 'message passing',
| any variation on CSP would address the concurrency issue,
| "by design", as it says below in Go's "Effective Go"
| documentation.
|
| https://web.archive.org/web/20091111073232/http://golang.or
| g...
|
| https://web.archive.org/web/20091113154825/http://golang.or
| g...
|
| _Share by communicating
|
| Concurrent programming is a large topic and there is space
| only for some Go-specific highlights here.
|
| Concurrent programming in many environments is made
| difficult by the subtleties required to implement correct
| access to shared variables. Go encourages a different
| approach in which shared values are passed around on
| channels and, in fact, never actively shared by separate
| threads of execution. Only one goroutine has access to the
| value at any given time. Data races cannot occur, by
| design. To encourage this way of thinking we have reduced
| it to a slogan:
|
| Do not communicate by sharing memory; instead, share memory
| by communicating."_
| Groxx wrote:
| But also strings _are_ actually immutable sometimes (at the
| very least when hardcoded), and they can be converted to a
| `[]byte` slice that looks mutable, but panics when
| modified. AFAIK no other slice-like data in Go behaves like
| this.
|
| Fun!
| morelisp wrote:
| This is misleading enough it's a lie; to do this you need
| to use a function named `unsafe.Pointer` with a type
| named `reflect.StringHeader`, not just `[]byte` and
| `string`.
| Groxx wrote:
| It means there is special protection for string-data that
| is not available to anything else, and that library-
| authors get panic reports about immutable byte slices. To
| that degree, it's something that the specialized type
| exposes you to but the type system does not adequately
| protect or signal.
|
| But yes, AFAIK this requires use of `unsafe`. Which does
| change things, and puts the blame for misuse squarely on
| the immutable-byte-slice creators.
| morelisp wrote:
| > It means there is special protection for strings that
| is not available to anything else
|
| Lots of languages have no C/C++-style const objects yet
| immutable strings. It's weird to pick on Go specifically
| for this. (Since it's the JVM model, at this point it may
| even be a majority of mainstream languages.)
|
| If you mean specifically the panic when modifying a
| static string, is this not just the default `mprotect` on
| rodata? You can do that to any memory page you want. It's
| a feature of the kernel's memory management, not the type
| system or even the language runtime.
|
| > library-authors get panic reports about immutable byte
| slices.
|
| I'm really skeptical this happens in any meaningful
| amount. Go offers the feature to transform a byte slice
| you know you won't want anymore into a string without a
| memory allocation; or to pass a string to a function
| which wants a []byte and _will not mutate it_ , usually
| to wrap an optimized implementation working on both
| strings and []byte. Modifying a string's contents is
| always undefined behavior, even if it doesn't panic
| immediately - the compiler will assume strings are
| immutable and make "as if" judgements accordingly.
| tedunangst wrote:
| I don't think you can convert a string to []byte without
| unsafe except by copying?
| wyager wrote:
| This is a pretty good vignette of how Go is designed.
| "Users don't need this feature (immutability,
| polymorphism, etc.), let's not include it. Ah crap, turns
| out we actually need it for a core language feature. Is
| there a lesson we can take away here? Nah, just make an
| ad-hoc implementation of the functionality in this one
| place."
| Groxx wrote:
| "X for me but not for thee" is very much the vibe Go
| gives me, yeah.
|
| I mean, I have completely replaced my adhoc Python use
| with it and I'm much happier. But it's terrifying to use
| to build large systems.
| ikiris wrote:
| Part of it is that "large systems" are almost all
| combinations of small systems with proto boundaries, so
| it's not much of an actual risk unless you're making
| giant monolith code.
| andrewflnr wrote:
| "Proto" meaning protobuffers?
| klodolph wrote:
| Const is a hard language feature to design well. There
| are a lot of pitfalls in the way const is used in C++ and
| TypeScript (I'm including "readonly" in the discussion).
|
| Rust gets it right, but Rust is relatively complicated.
|
| The languages which are most similar to Go are Java and
| C#, both of which also lack const types, or have a very
| limited version of constness.
| avgcorrection wrote:
| For what it's worth it sounds no worse than Java: you
| have immutable strings but you have no way to enforce
| that you can't modify something that you have recieved,
| which is why the underlying char array for a string has
| to be defensively copied if the user asks for `asArray()`
| or whatever it is.
|
| Strings being immutable doesn't seem like a special case,
| though. The underlying mutable array is just
| encapsulated, which is something that you can implement
| yourself. (But _reflection_ ... maybe you can break the
| rules with reflection.)
| Izkata wrote:
| Sounds like they're just badly describing copy-on-write.
|
| The _original_ slice uses no extra memory because as long as
| you treat it as immutable it won 't actually make a copy. As
| soon as you try to modify it, then the copy is made and the
| extra memory used.
| mook wrote:
| Yes-ish; because go has no immutable slices, it's just
| copy-on-realloc. Which is if course obvious if realize
| that's what it's actually doing...
| [deleted]
| UncleEntity wrote:
| I know even less go but it sounds like it prevents
| accidentally writing past the end of the array by making a
| copy and appending to that.
| jjice wrote:
| Tangential, but if OP is the owner of the site, could you talk a
| little bit about your book writing process?
|
| - How you write
|
| - How you render the PDF
|
| - How you develop your plans
|
| That kind of thing. I find it super interesting to self-publish
| software books and I've been slowly writing one for about a year
| now. Really curious about this stuff in general and it looks like
| you've got a solid process down.
| sacnoradhq wrote:
| 0. Only 2 of 3 values are necessary.
|
| 1. Can 1 value be unspecified?
|
| 2. Can 2 values be unspecified?
|
| 3. Can 3 values be unspecified?
|
| 4. Are conflicting values interpreted as intersection of ranges
| rather than union?
|
| Note 1-3. With 2 values, I believe x[:] is how to lift a sized
| array into a more generic slice type.
| masklinn wrote:
| Your questions are really unclear.
|
| In the 2-parameters form `a[low:high]`, both values can be left
| out, defaulting to respectively 0 and len(a). In the
| 3-parameter form, only the leading value can be left out,
| defaulting to 0.
|
| I've no idea what (4) is asking about.
| klodolph wrote:
| Regarding conflicting ranges--
|
| In a[i:j:k], 0 <= i, i <= j, j <= k, and k <= cap(a). If not,
| then the operation will panic.
| [deleted]
| operator-name wrote:
| So this is the same as a[low:min(high, max)]
| ?
___________________________________________________________________
(page generated 2023-03-18 23:00 UTC)