[HN Gopher] Making a Go program faster with a one-character change
___________________________________________________________________
Making a Go program faster with a one-character change
Author : hcm
Score : 358 points
Date : 2022-11-14 14:47 UTC (8 hours ago)
(HTM) web link (hmarr.com)
(TXT) w3m dump (hmarr.com)
| sakras wrote:
| A while ago at my company we switched from GCC to Clang, and
| noticed a couple of massive regressions (on the order of 50%?) in
| performance having to do with floating point.
|
| After profiling for a bit, I discovered that suddenly a lot of
| time was spent in isinf on Clang and no time in GCC... Clang was
| emitting a function call where GCC wasn't. I happened to randomly
| change isinf to std::isinf (it's a random habit of mine to put
| std:: in front of these C functions). Suddenly the regression
| disappeared! I guess on Clang only std::isinf was a compiler
| intrinsic while GCC recognized both? Anyway, that's my small-
| change optimization story.
| 10000truths wrote:
| C defines isinf as a macro, whereas C++'s std::isinf is a
| function. Perhaps the discrepancy has to do with differences in
| how they're evaluated?
| ludiludi wrote:
| > If you read the title and thought "well, you were probably just
| doing something silly beforehand", you're right!
|
| Don't feel too silly. Russ Cox, one of the technical leads on the
| Go language, made the same mistake in the regexp package of the
| standard library.
|
| https://go-review.googlesource.com/c/go/+/355789
| lxe wrote:
| This is the kind of stuff that the compiler needs to really
| understand. If all this dereferencing and referencing magic is
| under the control of the user, it needs to have a meaningful
| effect on what the code does. Otherwise we might as well just
| write C.
| silisili wrote:
| The compiler does understand it and did what was asked - it was
| just written rather poorly.
|
| There are valid use cases for wanting to take a copy, and then
| pass along a pointer of the copy. Perhaps to go through a
| series of modification methods that don't touch the original.
| I'd sure hate it if the compiler tried to outsmart me on that
| and changed the behavior away from what I'd written.
| enedil wrote:
| Went from 4.139s to 2.413s. I fail to see how it is 70%. I think
| it is explained as 4.139/2.413 = 1.7 which of course doesn't make
| sense here.
| CodesInChaos wrote:
| I do think saying it's 71% faster makes sense here, since "x%
| faster" and "speed increased by x%" mean the same thing. This
| reduces the runtime by 42%, but that doesn't mean it's just 42%
| faster.
| bmicraft wrote:
| It would probably be more accurate to say it can do 70% more
| stuff in the same time, or that it takes 42% less runtime.
| xnorswap wrote:
| But 70% more stuff in the same time is 70% faster.
| OJFord wrote:
| Let's say you do 10 things in 100 time (100/10 =10 time per
| thing) to start with:
|
| 70% more stuff in the same time is 17 things in 100 time
| (100/17 =5.9 time per thing);
|
| 70% faster is 10 things in 30 time (30/10 =3 time per
| thing) - or, I would argue incorrectly, perhaps said to
| mean 10 things in 70 time (70/10 =7 time per thing).
| morelisp wrote:
| This is an extremely common mistake in reporting performance
| numbers. That the old version is 70% slower does not make the
| new version 70% faster.
| wizofaus wrote:
| 70% slower is a bit ambiguous though - it could mean 70%
| extra runtime or it could mean 30% of the new speed. Whereas
| 70% faster would always suggest to me that it can do 70% more
| work in the same amount of time, i.e. a 1.7x increase in
| speed.
| enedil wrote:
| I do not agree. When you benchmark you usually measure the
| differences between times needed to complete. This is
| because it is highly non-obvious that if you double the
| workload, the time also doubles. Perhaps the algorithm is not
| linear. Perhaps if you have more data, you suddenly need to
| swap memory. Perhaps something (like parallel disk access)
| means that it actually takes less than 2x the time. This
| means that the concept of speed per unit of work is
| undefined. So the only reasonable interpretation of "70%
| faster" is "spends 30% of the original time".
| yongjik wrote:
| But under your definition, if something becomes "three
| times as fast" (i.e., 200% faster), it will have to
| finish its task in negative time!
| wizofaus wrote:
| I can categorically state I've never thought of or
| understood 70% faster as meaning that, and certainly not
| 100% faster as meaning "completes instantly".
|
| I see the OP has solved the problem by removing any
| references to how much faster from the article title!
|
| You're right about non-linear algorithms though. If an
| O(n^2) algorithm is 2x / 100% faster, it can't process
| 100% more items in the same time, but I'd understand it
| to mean taking half the time for the same n.
| barbegal wrote:
| It's gone from 14.5 operations per minute to 24.9 operations
| per minute so a 70% speedup.
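For anyone following along, the two framings can be checked against the article's timings (4.139s before, 2.413s after); a quick sketch:

```go
package main

import "fmt"

func main() {
	before := 4.139 // seconds per run, before the one-character fix
	after := 2.413  // seconds per run, after it

	// Throughput, as in the parent comment: runs per minute.
	fmt.Printf("before: %.1f ops/min\n", 60/before) // ~14.5
	fmt.Printf("after:  %.1f ops/min\n", 60/after)  // ~24.9

	// "x% faster": the increase in work done per unit time.
	fmt.Printf("faster by: %.1f%%\n", (before/after-1)*100) // ~71.5%

	// "x% less time": the reduction in runtime for the same work.
	fmt.Printf("time saved: %.1f%%\n", (1-after/before)*100) // ~41.7%
}
```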
| hcm wrote:
| Doh! Thanks for pointing out another silly mistake - I'll fix
| that.
| xnorswap wrote:
| You had it right the first time, 1.7x speed _is_ 70% faster.
|
| If something previously took 4s now takes 2s then it's 100%
| faster.
|
| Think of driving 10miles. If you drive at 20mph then it takes
| 30 minutes. If you drive twice as fast, 40mph, it takes 15
| minutes.
|
| 40mph is 100% faster than 20mph.
|
| Half the time is twice as fast!
| [deleted]
| wizofaus wrote:
| Glad it wasn't just me thinking this!
| markoman wrote:
| @hcm: Would have loved to see the 'after' flamegraph just for
| comparison purposes! I'm still trying to get used to grokking
| flamegraphs when optimizing. They're a somewhat new tool,
| IMO.
| wizofaus wrote:
| I disagree, I think the 70% is right, and matches what you
| still describe as a 1.7x speed increase. If it originally
| took 4 seconds and now takes 2, I'd call that a 100% speed
| increase, i.e. twice as fast.
| t3estabc wrote:
| cbsmith wrote:
| As an old C/C++ programmer, I'm always surprised by how often
| software developers are surprised by the performance costs of
| inopportune value semantics (C and C++, even more so, punish you
| severely for using value semantics when you shouldn't). I
| increasingly see the wisdom of languages with implicit reference
| semantics.
|
| It's not that value semantics can't be better (they most
| assuredly can be), or that reference semantics don't cause their
| own complexity problems, but rather that so often we
| thoughtlessly imply/impose value semantics through interfaces in
| ways that negatively impact performance; getting interfaces wrong
| is a much tougher bell to unring.
|
| The vast majority of my mental energy when I define an interface
| in C++ is carefully thinking through a combination of ownership
| contracts and value vs. reference semantics that I can mostly
| ignore in languages with implicit reference semantics. While
| occasionally ignoring those contracts while developing in
| Java/Python/whatever comes back to bite me, the problem isn't
| nearly as common or problematic as when I unintentionally impose
| value semantics in a language that allows me to.
| dleslie wrote:
| I would also lay a chunk of blame on the use of type inference
| which causes the information about the behaviour to be hidden
| from view.
|
| Had the author been looking at the type information within the
| syntax of the code, the profile output might not have been a
| surprise. Perhaps the problem would never have existed in the
| first place.
| adrianmonk wrote:
| Yeah. Often, writing out the type explicitly is just
| busywork. But it seems it would have paid off here.
|
| If you were forced to stop and think what type to declare, I
| bet you'd write "var rule *Rule". Even if you don't think
| deeply and just look at the return type.
|
| And then if you assigned "r[i]" to "rule", you'd get a type
| error.
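A minimal sketch of that (with a hypothetical Rule type; in the article the copy came from a range loop, but the indexing idea is the same):

```go
package main

import "fmt"

type Rule struct{ Name string }

func main() {
	rules := []Rule{{"a"}, {"b"}}

	// With := the copy is easy to miss: rule is a Rule, not a *Rule.
	rule := rules[0] // copies the struct
	p1 := &rule      // pointer to the copy, which may escape to the heap

	// Writing the type out forces the question. This compiles only
	// because the right-hand side really is a *Rule:
	var p2 *Rule = &rules[0] // indexing instead of copying

	// var p3 *Rule = rules[0] // compile error: cannot use Rule as *Rule

	p1.Name = "copy"
	p2.Name = "original"
	fmt.Println(rules[0].Name) // "original" - p2 aliases the element
}
```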
| bitexploder wrote:
| Becoming good at Go is mostly knowing all the sharp edges with
| the built in types IMO. Of course go routines and concurrency
| primitives are difficult to master, but that is a different
| beast and if you understand concurrency from some other
| languages Go just makes that easy. But knowing all the
| behaviors of slices, and really intuiting how they work makes
| your life a lot easier and a lot less prone to bugs. And slice
| behavior almost all comes down to what is a slice internally
| and how and where am I copying things as I wrote my code.
| Generally the copying is fine, but in some cases it is not or
| it is in a tight performance critical section where you need to
| be thinking about it.
|
| Edit: Also, pointers into slices will probably leave you sad.
| You get a pointer into the slice's backing storage, not a
| stable pointer to the thing. And if the slice resizes, your
| pointer now references a dead backing array. Basically,
| pointers and slices are not friends.
| Unless you have a slice of pointers, which idiomatic go avoids
| unless there is a decent reason for it :)
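The resize hazard is easy to demonstrate; a minimal sketch:

```go
package main

import "fmt"

func main() {
	s := make([]int, 1, 1) // len 1, cap 1: any append must reallocate
	p := &s[0]             // pointer into the current backing array

	s = append(s, 2) // capacity exceeded: append copies to a new array

	*p = 99               // writes to the old, abandoned backing array
	fmt.Println(s[0], *p) // 0 99 - the slice no longer sees the write
}
```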
| [deleted]
| tylerhou wrote:
| The performance issue here is not value semantics, it's the
| overhead of automatic lifetime management. The copy is cheap.
| The lifetime tracking is not because it forces a heap
| allocation and creates additional GC pressure. In fact,
| assuming Rule is small, if Match returned by _value_ , the code
| would be similarly as fast.
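A sketch of the two shapes (hypothetical Rule type; with `go build -gcflags=-m` the compiler reports the local in the pointer-returning version as escaping to the heap):

```go
package main

import "fmt"

type Rule struct{ ID int }

// Returning a pointer to a local forces it onto the heap: the value
// must outlive the function, so escape analysis moves it off the
// stack and the GC has to track it.
func matchPtr(id int) *Rule {
	r := Rule{ID: id} // reported as "moved to heap" by -gcflags=-m
	return &r
}

// Returning a small struct by value copies it back to the caller:
// no allocation, no GC pressure.
func matchVal(id int) (Rule, bool) {
	return Rule{ID: id}, true
}

func main() {
	p := matchPtr(1)
	v, ok := matchVal(1)
	fmt.Println(p.ID, v.ID, ok) // 1 1 true
}
```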
| cbsmith wrote:
| > The performance issue here is not value semantics
|
| There's no performance cost to value semantics, so of course
| not.
|
| > The copy is cheap. The lifetime tracking is not because it
| forces a heap allocation and creates additional GC pressure.
| In fact, assuming Rule is small, if Match returned by value,
| the code would be similarly as fast.
|
| I'm referring more to how this stuff seeps in without the
| programmer realizing it. It's the implicit nature of all this
| behaviour that is the problem.
| tylerhou wrote:
| Sorry, _because_ of value semantics. (Also, you wrote
| "performance costs of inopportune value semantics". So the
| correction is annoying; you know what I meant.)
|
| This stuff seeps in without the programmer realizing it
| because Go made a deliberate design decision to have
| automatic lifetime management. In other words, this is a
| feature of the language and not a bug. The only way for
| this to not seep in is if Go forced programmers to specify
| most lifetimes, which would make the language much more
| cumbersome to use.
|
| I.e., this is not a value-vs-reference semantics issue.
| It's a manual-lifetime-management vs automatic-lifetime-
| management issue. The solution is to either 1) write in a
| language with more explicit lifetimes if performance is
| that important, or 2) profile your code and reduce heap
| allocations, which is what the person who wrote the article
| did.
| cbsmith wrote:
| Sorry for being annoying. I wasn't trying to be.
|
| I would disagree about this stuff seeping in because of
| automatic lifetime management. The equivalent code in
| Java, Smalltalk, etc., would either not have the
| performance problem or it would be much more obvious that
| the code wasn't as efficient as it could be.
| tylerhou wrote:
| > Sorry for being annoying. I wasn't trying to be.
|
| Thanks, no worries.
|
| You're right that Java doesn't have this performance
| problem because it (apart from primitives) has no
| implicit copies. (So the code wouldn't copy, it would
| return a reference, hence no allocation.)
|
| I think what you're trying to say is: programs written in
| languages with value semantics can implicitly create more
| objects, which (in some cases) have additional
| allocation/lifetime tracking in languages with automatic
| lifetime management. So maybe one solution is to prevent
| copies in most circumstances, and when you do need to
| copy, make them explicit.
|
| But I think this conflicts with Go's philosophy as well.
| First, in highly concurrent programs, copies are
| necessary. A mutex/atomic will become a bottleneck;
| sharing is bad for performance. This means that copies
| should be part of most types in the language. (If they
| aren't, you'll run into issues like not being able to
| pickle when using multiprocessing in Python [1].)
|
| OK, so we need copies. But clearly, I shouldn't have to
| write `a = b.Copy()` if b is an integer. That's too
| verbose. And if everything has a `.Copy()` call, that
| leads to visual noise and it's not clear when copies are
| actually expensive. So what set of "blessed" types should
| be implicitly copyable? Java picks primitives, but this
| means it's basically impossible to create an int-like
| object with value semantics. I think this is bad. C++ is
| at the other extreme with "(almost) all objects are
| implicitly copyable" but some people dislike how
| forgetting a & on a std::vector type might lead to a huge
| copy.
|
| Go has chosen a reasonable middle ground -- dynamically
| sized types like slices and maps are not deep-copied
| implicitly, but "cheap" objects like structs are implicitly
| copyable. This means that when you see a Copy call, it's
| meaningful. But this means you run into these situations
| when an implicit copy interacting with lifetimes/garbage
| collection causes a slowdown. But I don't see any other
| alternative. Getting rid of implicit copies for cheap
| types will cause visual noise. Getting rid of GC makes
| your language harder to use. Both of these alternatives
| seem worse to me.
|
| To be clear, I'm not a huge fan of Go. Especially around
| the lack of monadic error handling. But I think they have
| done a decent job wrt. their design goals.
|
| [1] https://stackoverflow.com/questions/8804830/python-
| multiproc...
| [deleted]
| Jemaclus wrote:
| I'm a dumb dumb. Can you define "implicit reference semantics"
| and "value semantics"? You use the phrases several times in
| your post, but I don't really understand what you mean. If it
| helps, I'm not a C++ programmer, but I am familiar with higher
| level languages like Go, Python, Ruby, PHP and Javascript.
| scubbo wrote:
| Thank you for asking - I, too, was a dumb dumb.
| gdwatson wrote:
| "Implicit reference semantics" means that variables
| ordinarily refer to objects rather than containing them.
| "Value semantics" means that variables contain values rather
| than references, though there are pointer types that let you
| explicitly store references when you want them. (Often this
| is discussed in terms of parameter passing, for historical
| reasons, but the same ideas apply to variables more
| generally.)
|
| If your language has implicit reference semantics, "x = y"
| will cause x and y to refer to the same object. If it has
| value semantics, x will be a copy of y.
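In Go terms, the distinction looks like this (a minimal sketch):

```go
package main

import "fmt"

type Point struct{ X, Y int }

func main() {
	// Value semantics: x is an independent copy of y.
	y := Point{1, 2}
	x := y
	x.X = 99
	fmt.Println(y.X) // 1 - y is unaffected

	// Explicit reference: px and py refer to the same Point.
	py := &Point{1, 2}
	px := py
	px.X = 99
	fmt.Println(py.X) // 99 - same object
}
```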
| Jemaclus wrote:
| Thank you
| masklinn wrote:
| "Implicit reference semantics" = everything (or near enough)
| is implicitly a reference. That's what you get in Python,
| Ruby, or Javascript: when you pass something around, you're
| really passing a reference to it, a pointer. Everyone ends up
| referring to the same thing, and sharing it.
|
| "Value semantics": you pass the value itself around, which
| means you're shallow-copying it all the time. That's what you
| get when you pass or return a bare non-interface type in Go.
|
| PHP is a bit of a mess, objects are passed by reference (=
| implicit reference semantics) but arrays are passed by value,
| and you can opt them into pass-by-reference. You can also
| pass non-arrays by reference, though I don't think that's
| very common.
| strifey wrote:
| If you're familiar with Python and Go, you'll likely be able
| to quickly spot the differences in how they handle parameter
| passing. Python uses references and Go uses values.
| erik_seaberg wrote:
| Go maps and channels have reference semantics. Slices are
| passed and returned by value, but backing arrays are often
| shared (not copied) in an error-prone way, so it's safest
| to pretend they have move semantics and treat only one copy
| as valid.
|
| IMHO they should have used pointers everywhere they didn't
| want value semantics with deep copies.
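A short sketch of those three behaviours:

```go
package main

import "fmt"

func main() {
	// Maps behave like references: the callee mutates the caller's map.
	m := map[string]int{"a": 1}
	func(m map[string]int) { m["a"] = 2 }(m)
	fmt.Println(m["a"]) // 2

	// Slices are passed by value, but the header points at a shared
	// backing array: element writes are visible to the caller...
	s := []int{1, 2, 3}
	func(s []int) { s[0] = 99 }(s)
	fmt.Println(s[0]) // 99

	// ...while an append in the callee may or may not be, depending
	// on capacity - the error-prone sharing the comment describes.
	func(s []int) { _ = append(s, 4) }(s)
	fmt.Println(len(s)) // still 3: the caller's header is unchanged
}
```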
| klodolph wrote:
| Languages like Python, Java, and C# have implicit reference
| semantics. When you create an object, you get a pointer to
| that object (usually, or commonly).
|
| In languages like C, C++, Go, Rust... references are more
| explicit. If you want a pointer to an object, you have to &,
| or something similar.
|
| It gets a bit fuzzy.
| Jemaclus wrote:
| Thank you
| TheRealPomax wrote:
| It's the difference between assigning/passing around "copies
| of the data" vs. assigning/passing around "the memory address
| for that data" under the hood.
|
| PHP, for example, has explicit references. If you have an
| `$arr1=array(1,2,3)` and an `$arr2 = $arr1`, that second
| array is a full copy of the first array, and updating $arr1
| does nothing to $arr2. Similarly, `function
| update_array($arr) { $arr[0] = 'cake'; }` called with `$arr1`
| will create a _copy_ of $arr1 for use inside that function.
| Any changes to `$arr` inside the function only apply to that
| copy, not to $arr1. _Unless_ you explicitly tell PHP you want
| to work with references, by using `function update_array(
| &$arr) { ... }` instead. PHP uses value semantics.
|
| JS, on the other hand, uses implicit reference semantics. If
| you have a `const arr1 = [1,2,3];` then `arr1` is an _alias_
| , simply pointing to a memory location, and declaring a
| `const arr2 = arr1`, makes arr2 _also_ be an alias for that
| same memory location. They both reference the same thing, but
| neither "is" the thing. Similarly, if you have a `function
| updateArray(arr) { arr[0] = 'cake'; }` then that `arr` is
| _also_ just an alias pointing to the memory location of
| whatever you passed in, and changes to `arr` change the
| underlying data, so whether you try to access that through
| `arr1`, `arr2` and `arr`, it's the same thing. JS uses
| implicit reference semantics.
|
| (But note that many languages with implicit reference
| semantics make exceptions for immutable primitives like
| numbers or strings)
| datalopers wrote:
| That's not technically correct with regards to PHP. Your
| statement that any changes to $arr1 or $arr2 only impact
| the one in question, however, is accurate. If no changes
| are made they still refer to the same data in memory. It's
| copy-on-write semantics.
|
| $arr1 = [1,2,3];
| // $arr1 is a pointer to a zval array [1,2,3], refcount 1
|
| $arr2 = $arr1;
| // $arr1 and $arr2 point to the same zval array; the refcount
| // is incremented to 2
|
| $arr2[] = 4;
| // at this point, $arr2 creates a new zval array, copies over
| // the data from the first, sets its refcount to 1, and appends
| // int(4). The [1,2,3] array has its refcount decremented to 1,
| // as it's still referenced by $arr1.
|
| [1] http://hengrui-li.blogspot.com/2011/08/php-copy-on-
| write-how...
| TheRealPomax wrote:
| true, good point.
| morelisp wrote:
| This isn't semantics, though, but implementation. As far
| as I'm aware there's no way to "observe" the copy short
| of digging into runtime debugging tools, so I think it's
| still fair to say the language has pass-by-value
| semantics. In C++ we still refer to things returned-by-
| value as returned by value despite the reality of NRVO,
| etc.
| johannes1234321 wrote:
| Actually that is outdated since at least PHP 7. In modern
| PHP the engine uses copy on write only for "larger"
| things, but "small" things like integers or booleans live
| on the stack and are copied around, avoiding heap
| allocations etc.
| Jemaclus wrote:
| Thank you
| unfunco wrote:
| PHP has implicit references too, objects are passed by
| reference by default (primitives by value, like your note)
| marcosdumay wrote:
| a = [1, 2]
|
| b = a
|
| b[0] = 3
|
| print(a)
|
| What does the above print? If the language implements
| reference semantics, it prints [3, 2]. If it implements value
| semantics, it prints [1, 2].
| Jemaclus wrote:
| Thank you
| brundolf wrote:
| Having used C++ relatively little, and a long time ago (and
| never used Go), I don't think I ever realized that value
| semantics could be used to implicitly clone _heap_ objects
|
| Yeesh
| masklinn wrote:
| Normally they're not, that's really specific to C++ nonsense.
|
| In a normal value-semantics system if you have a pointer you
| just copy the pointer. Obviously if the langage doesn't have
| your back it also means you've now fucked up your ownership,
| but you're in C so that was to be expected.
| brundolf wrote:
| Yeah, this was my assumption when I read the original
| comment, so I was confused about what it was saying (I
| thought it was suggesting you should pass references
| instead of doing a bitwise-copy most of the time, which is
| just plausible enough that I believed it for a minute)
| until I read further down in the thread and realized they
| meant implicit heap-cloning
| titzer wrote:
| I think the main issue in C++ isn't value semantics, it's deep
| copy semantics. E.g. in a functional language ADTs are
| immutable and don't have identity. They can be freely copied,
| or not, passed by reference or by value, but they are never
| _deep copied_. Comparison may be deep, but not passing them.
|
| That is to say, I think I mostly am agreeing with you. In Java,
| objects are always passed by reference, never by value, and
| never implicitly copied. But Java doesn't have _any_ value
| types other than primitives. When I added ADTs to Virgil, I
| wanted to get the best of both worlds; objects are pass by
| reference, like Java, and ADTs are immutable, identity-less, so
| they are deep compared but never deep copied. (ADTs in Virgil
| can also sometimes be flattened, so they don't necessarily
| result in heap allocation).
| nyanpasu64 wrote:
| I still don't see why value structs need to be immutable;
| ints are mutable in all languages, and structs are mutable in
| C, C++, and Rust (if you `let mut`) and it's a feature of the
| language.
| cbsmith wrote:
| In a functional language, you don't have to worry about bits
| of code mutating your data. ;-) On the flip side, there's a
| lot of cognitive load that comes with functional languages,
| so while they do address the problem neatly...
|
| I'd have to take a look at Virgil to appreciate your
| approach, but I'm always leery of implicit value vs.
| reference semantics tied to types (aside from the whole array
| fiasco, easily the ugliest part of Java's type system). So
| often the particular semantics you want are driven by the
| context of what you're doing, rather than the what you're
| doing it with.
| thomascgalvin wrote:
| > I increasingly see the wisdom of languages with implicit
| reference semantics.
|
| I spend most of my time in a JVM language of one flavor or
| another, and when I was learning Go, the first thing that stuck
| out at me was, "why would I _ever_ want the compiler to
| invisibly copy a data structure for me? "
|
| I suppose the primary reason is to prevent the callee from
| modifying the caller's data out from under them; unless you
| pass a reference value, you _know_ the callee cannot modify
| your data.
|
| But, as someone who leans heavily into "everything should be as
| immutable as possible," the second thing that stuck out at me
| was "wait, a struct can't have const fields?"
|
| When I write code, it's common to have references to immutable
| classes thrown around with wild abandon, heedless of ownership,
| threads, or good taste, because the data just can't change. But
| that's a paradigm that Go simply doesn't support.
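Go can approximate that style by convention, though nothing enforces it at the language level; a sketch with a hypothetical point type (immutable only from outside its package):

```go
package main

import "fmt"

// point approximates an immutable type: its fields are unexported, so
// code outside this package can only read them through accessors, and
// "mutating" operations return a new value instead of changing this one.
type point struct {
	x, y int
}

func NewPoint(x, y int) point { return point{x, y} }

func (p point) X() int { return p.x }
func (p point) Y() int { return p.y }

// Translate returns a fresh point; the receiver is a copy, so the
// original is never touched.
func (p point) Translate(dx, dy int) point {
	return point{p.x + dx, p.y + dy}
}

func main() {
	a := NewPoint(1, 1)
	b := a.Translate(2, 0)
	fmt.Println(a.X(), b.X()) // 1 3
}
```

Within the defining package the fields remain mutable, so this is a discipline rather than the compiler-checked immutability the comment is asking for.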
| cbsmith wrote:
| > When I write code, it's common to have references to
| immutable classes thrown around with wild abandon, heedless
| of ownership, threads, or good taste, because the data just
| can't change.
|
| If there's anything I wish languages with implicit reference
| semantics would adopt, it's implicit immutability. I wish
| Java would be so much nicer with keyword that is half way
| between "final" and "volatile" that means, "yes, you can
| actually mutate this" and then make final semantics the
| default for fields & variables.
| cogman10 wrote:
| Agreed.
|
| Could you Imagine a Java where you have a `Map` and a
| `MutableMap` and that's what you put at your API? I'd make
| it SO much clearer how safe any individual API is to call.
| spockz wrote:
| Scala has had this for ages. You can have it today. Even
| in Java either through the Google collection library or
| through a library that mimics fp style programming. The
| name eludes me for the moment.
| gowld wrote:
| Guava, but it's not fully typesafe.
|
| Mutable is not part of the Java meta-language used for
| typing.
|
| ImmutableMap implements Map
|
| https://guava.dev/releases/23.0/api/docs/com/google/commo
| n/c...
|
| and throws exceptions on mutation.
|
| https://guava.dev/releases/23.0/api/docs/src-
| html/com/google...
| foverzar wrote:
| Kotlin has this
| thomascgalvin wrote:
| Kotlin has this, but the Map is (usually) a MutableMap
| under the covers, because it's Java bytecode at the lower
| levels. You have to go out of your way to footgun
| yourself, but it's still possible.
| morelisp wrote:
| In general this doesn't work, the history rule says
| mutable types are not proper subtypes of immutable ones
| (and the converse is obvious). If you want to capture
| mutability in your type system, it needs to be orthogonal
| to subtyping (like C/C++ const).
| cogman10 wrote:
| > the history rule
|
| I'm unfamiliar with this rule (and not finding anything
| good to google). Can you elaborate?
|
| I can't really think of a scenario where an immutable
| datastructure isn't a subset of actions against a mutable
| datastructure.
| kapep wrote:
| I had to look it up too, it apparently is a constraint
| for subtypes defined in Liskovs substitution principle
| [1]. From Wikipedia:
|
| > History constraint (the "history rule"). Objects are
| regarded as being modifiable only through their methods
| (encapsulation). Because subtypes may introduce methods
| that are not present in the supertype, the introduction
| of these methods may allow state changes in the subtype
| that are not permissible in the supertype. The history
| constraint prohibits this. It was the novel element
| introduced by Liskov and Wing. A violation of this
| constraint can be exemplified by defining a mutable point
| as a subtype of an immutable point. This is a violation
| of the history constraint, because in the history of the
| immutable point, the state is always the same after
| creation, so it cannot include the history of a mutable
| point in general. Fields added to the subtype may however
| be safely modified because they are not observable
| through the supertype methods. Thus, one can define a
| circle with immutable center and mutable radius as a
| subtype of an immutable point without violating the
| history constraint.
|
| [1]: https://en.wikipedia.org/wiki/Liskov_substitution_pr
| inciple#...
| gowld wrote:
| [deleted]
| a_t48 wrote:
| C++ still has this problem -
| std::unordered_map<std::string, std::string>` and
| `std::unordered_map<std::string, const std::string>` are
| basically unrelated types - you can't const-cast the
| templated const away. (I may be misunderstanding here)
| masklinn wrote:
| > you can't const-cast the templated const away.
|
| That seems like a good thing. If you're handed a map to
| const values you can't just go "imma gunna mutate them
| anyway".
| a_t48 wrote:
| Yup, it's definitely a bit of a code smell if you do. The
| issue is more the reverse, though - I can't make a
| mutable map, then hand it by pointer/reference to
| something that says it wants an immutable map later.
| masklinn wrote:
| Not necessarily a bad thing either, things can get odd if
| you're not the only owner and it's mutated under you
| unexpectedly.
| [deleted]
| a_t48 wrote:
| That only matters if you're storing it. The big issue is
| 99% of code out there uses mutable template types for
| containers, and if you ever declare a container that
| doesn't have a mutable template type, you stub your toe
| as your new container isn't compatible with anyone else's
| code. You can't even easily copy your way out.
|     std::unordered_map<std::string, const std::string> m;
|     m["foo"] = "bar";
|     std::unordered_map<std::string, std::string> m2 = m;
|
| doesn't compile.
| morelisp wrote:
| Normally (when containers are not involved) this is
| exactly the point of a const cast.
| titzer wrote:
| > When I write code, it's common to have references to
| immutable classes thrown around with wild abandon, heedless
| of ownership, threads, or good taste, because the data just
| can't change. But that's a paradigm that Go simply doesn't
| support.
|
| You might get a kick out of Virgil. It's easy (and terse!) to
| define immutable classes and you can have immutable ADTs too.
| (plus tuples, generics with separate typechecking, etc).
| throwaway894345 wrote:
| I also have a background in C/C++, etc and I've only ever found
| myself missing value semantics when I use languages with
| implicit reference semantics. I guess I always figured the
| solution was "value semantics with better education / tooling".
| Education: people should understand value semantics. Tooling:
| imagine an IDE that highlights allocation points automatically
| (or perhaps the problem is implicit allocations rather than
| value semantics?).
| cbsmith wrote:
| > I've only ever found myself missing value semantics when I
| use languages with implicit reference semantics.
|
| Oh, I miss it every time. ;-)
|
| I will say though that some newer languages seem to have a
| confused idea about how to offer mixed semantics. A bunch of
| them tie semantics to types. The ideal interface can vary by
| usage context. It's hard enough getting the semantics right
| as the callee (as opposed to caller), let alone when you're
| defining a type that will be used by who knows how many
| interfaces.
|
| > I guess I always figured the solution was "value semantics
| with better education / tooling".
|
| I've always thought much the same, but I have slowly come to
| appreciate that it's more than just education & tooling. Even
| with good education & tooling, there's a cognitive load that
| comes with getting interfaces right that for the general case
| is just not worth it.
| adgjlsfhk1 wrote:
| I think this is half right. For anything 64 bits or
| smaller, value semantics are pretty much always going to be
| better. That said, being able to choose between value and
| reference semantics for larger objects per object is a
| pretty useful feature.
| cbsmith wrote:
| > For anything 64 bits or smaller, value semantics are
| pretty much always going to be better.
|
| That's assuming a 64-bit CPU (which admittedly seems like
| a reasonable assumption). The nice thing about the
| abstraction though is that there's nothing preventing the
| runtime from applying value semantics for those trivial
| small-object cases where they're obviously more
| efficient.
| throwaway894345 wrote:
| > I will say though that some newer languages seem to have
| a confused idea about how to offer mixed semantics. A bunch
| of them tie semantics to types.
|
| Curious about what you mean here. This sounds like C#'s
| class/struct distinction to me.
| cbsmith wrote:
| That's exactly the example I was thinking of.
| Serow225 wrote:
| The JetBrains IDEs can do this, at least for .NET
| ok123456 wrote:
| > Tooling: imagine an IDE that highlights allocation points
| automatically
|
| Rider does this already for C#.
| TremendousJudge wrote:
| >or perhaps the problem is implicit allocations rather than
| value semantics
|
| To me, this sounds like this is it. _Explicit is better than
| implicit_ is a very useful truism
| cbsmith wrote:
| The counter argument to the "explicit is better than
| implicit" is that abstraction & encapsulation are such
| significant force multipliers. If done properly, implicit
| is good. It's just that in case of copying, doing it
| "properly" is well nigh impossible.
| kazinator wrote:
|                    explicit     implicit
|         good          *     <      *
|                       v     v      v
|         bad           *     >      *
|
| Good implicit is better than good explicit. (If all is
| good, go for implicit.)
|
| Bad explicit is better than bad implicit. (If all is bad,
| go for explicit; don't hide bad explicit with bad
| implicit.)
|
| Good explicit or implicit is better than bad explicit or
| implicit.
| codeflo wrote:
| > perhaps the problem is implicit allocations rather than
| value semantics?
|
| I think that's true. Expensive copies should never have been
| implicit. There was a story some time ago about a single
| keypress in the address bar of Chrome causing thousands of
| memory allocations. The culprit: lots of std::string
| arguments up and down the call stack.
|
| Rust gets this right, with the hindsight of C++'s example: "a
| = b" is a move operation by default and clone() is always
| explicit, except for plain data types where copying is
| literally memcpy -- and those are clearly marked as such by
| the type system.
| cbsmith wrote:
| IMHO, implicit allocations are a bit of a red herring. Yes,
| in C/C++ heap allocations are proportionately pretty
| expensive, but I've seen Java programs with just ridiculous
| amounts of implicit allocations where there really isn't much
| of a problem.
|
| But _allocations_ aren't the same as copies, and the
| argument for reference semantics has always been that
| implicit copies are problematic. In your std::string
| example, having that many String copies in a Java program
| would be similarly terrible (and this sometimes happens by
| accident because of abstraction layers that hide all the
| copying going on under the covers).
|
| I do think Rust gets a lot of stuff right, but Rust's
| cognitive load is broadly recognized. I tend to see it as
| C++ with a lot fewer foot guns. ;-)
| codeflo wrote:
| I'm not sure I get what you mean. You wouldn't have that
| many String copies in Java by passing an unchanged String
| down the call stack. My point is that it's too easy to
| make this mistake in C++.
| cbsmith wrote:
| In Java, the mistake happens only when there's an
| abstraction that hides the copying from you, so it isn't
| implicit in the same way, but it's still implicit.
| morelisp wrote:
| Allocations are as damaging as your free function is
| slow.
|
| Java has a tremendously good GC, so can cope with lots of
| allocations. Go has an OK one, so needs some help (but
| mollifying it often pays dividends elsewhere in locality
| and memory usage too). C++ has your default system heap,
| good luck.
| throwaway894345 wrote:
| Historically Java has traded long pause times for fast
| allocations, although I'm of the impression that it has
| recently found a way to have its cake and eat it.
| klodolph wrote:
| Java has been tunable for a long time. Periodically, the
| recommended tuning changes, or new GC algorithms become
| available, etc. But it has long been possible to get
| short pause times with various combinations of choosing
| the right algorithm and writing your program the right
| way.
|
| I think what _really_ throws people off here is that
| getting good performance out of a Java application
| involves some skills which are alien to C++ programmers,
| and vice versa. You take an experienced C++ programmer
| and drop them into a Java codebase, they may have a very
| poor sense of what is expensive and what is cheap. Vice
| versa... experienced Java programmers don't do well in
| C++ either.
|
| The result is that you have precious few people with any
| significant, real-world experience fixing performance
| issues in both languages.
| throwaway894345 wrote:
| Agreed, but usually tuning for short pause times involves
| trading off throughput or allocation performance. But at
| the end of the day, if you aren't allocating a bunch of
| garbage in the first place, then you don't need to be as
| concerned about the costs of allocating or cleaning up
| the garbage. I wish Go did more to make allocations
| explicit so they could be more easily recognized and
| avoided; I dislike Java's approach of making allocations
| even more implicit/idiomatic while trying to fix the cost
| problem in the runtime (although I admire the ambition).
| lazide wrote:
| What do you consider a 'long' pause time?
|
| I've had no issues with Java 17+ under heavy
| allocation/garbage collection (data encryption pipeline I
| haven't tuned to reuse buffers yet), and its pause times
| are on the order of a handful of milliseconds, without
| meaningful tuning. I think it's doing something like a
| GB/s of garbage collection.
|
| And the jvm in question is doing a LOT more than just
| this, so it's coping with millions of allocated objects
| at the same time.
| throwaway894345 wrote:
| > Yes, in C/C++ heap allocations are proportionately
| pretty expensive, but I've seen Java programs have just
| ridiculous amounts of implicit allocations but there
| really isn't much of a problem.
|
| Java programs make "ridiculous amounts of implicit
| allocations" because allocations are cheap in Java. And
| they need to be cheap because Java doesn't have value
| semantics so it leans hard on escape analysis + cheap
| allocations.
|
| I agree with the rest of your comment, although I think
| most of Rust's "cognitive load" amounts to borrow-
| checker-vs-garbage-collection. You could envision a Rust
| with explicit allocations and a GC, and that language
| would have a "cognitive load" approaching that of Go
| while also being a fair bit more performant insofar as
| people can much more easily reason about allocations and
| thus performance.
| jhoechtl wrote:
| A long long time ago Rust was a GC language.
| cbsmith wrote:
| > Java programs make "ridiculous amounts of implicit
| allocations" because allocations are cheap in Java. And
| they need to be cheap because Java doesn't have value
| semantics so it leans hard on escape analysis + cheap
| allocations.
|
| Yes, but that's kind of the point, right? Implicit
| allocation isn't really a problem because a runtime that
| optimizes the allocations magically for you is a lot
| easier to build than a runtime that optimizes whether you
| really need to be copying objects as much as you do.
| throwaway894345 wrote:
| > Implicit allocation isn't really a problem because a
| runtime that optimizes the allocations magically for you
| is a lot easier to build
|
| As far as I know, Java's (default) runtime gives cheap
| allocations at the cost of long GC pause times.
|
| > than a runtime that optimizes whether you really need
| to be copying objects as much as you do
|
| It's not "copying", it's "allocating", and avoiding
| allocations isn't that much work (and frankly I'm
| surprised it's such a minor problem that no one has
| bothered to build an IDE plugin that highlights these
| allocation points automatically--or at least I haven't
| heard of such a thing). Anyway, "a runtime that minimizes
| allocations" is just an escape analyzer and Java has one
| of these too, and IIRC it's a lot more sophisticated than
| Go's (but it's also a lot harder to reason about as a
| consequence).
| cbsmith wrote:
| > As far as I know, Java's (default) runtime gives cheap
| allocations at the cost of long GC pause times.
|
| "long GC pause times" is kind of vague, so I guess you
| could be correct, but in practice there's a LOT of
| different ways the memory management can be handled, many
| of which are deemed "pauseless GC" (though the term is
| somewhat misleading).
|
| My statement was considering that reality though. While
| not true for some use cases, in the vast majority of
| cases, the runtime optimizes the allocations more than
| sufficiently.
|
| > It's not "copying", it's "allocating"
|
| Allocators can do a pretty good job of minimizing the
| overhead of allocation, to the point the amortized cost
| isn't much more than a single machine instruction.
| Allocating gigabytes of memory quickly is possible.
| Copying the data can be a lot more work, and often
| objects have copy semantics that add a lot more
| additional work.
|
| > Anyway, "a runtime that minimizes allocations" is just
| an escape analyzer and Java has one of these too, and
| IIRC it's a lot more sophisticated than Go's (but it's
| also a lot harder to reason about as a consequence).
|
| I think you're implicitly saying "a runtime that
| minimizes _heap_ allocations" there, in which case I'd
| agree.
| throwaway894345 wrote:
| > in practice there's a LOT of different ways the memory
| management can be handled, many of which are deemed
| "pauseless GC" (though the term is somewhat misleading).
|
| Yes, but I'm pretty sure those "pauseless GC" schemes
| impose other tradeoffs.
|
| > My statement was considering that reality though. While
| not true for some use cases, in the vast majority of
| cases, the runtime optimizes the allocations more than
| sufficiently.
|
| I'm not sure I follow. The same could be said for Go--in
| the vast majority of cases, Go's tradeoffs (slow
| allocations, low latency / non-moving GC) are also
| suitable.
|
| > Allocators can do a pretty good job of minimizing the
| overhead of allocation, to the point the amortized cost
| isn't much more than a single machine instruction.
|
| As far as I know, speeding up allocations to this degree
| _requires_ a moving GC which imposes a bunch of other
| constraints (including copying a bunch of memory around).
|
| > Allocating gigabytes of memory quickly is possible.
| Copying the data can be a lot more work, and often
| objects have copy semantics that add a lot more
| additional work.
|
| Yes, but the bottleneck here wasn't the copying, it was
| the allocations. And if you optimized away allocation
| cost entirely such that only the copy cost remained, that
| cost would be so small that the OP would never have
| bothered to profile because copying small objects like
| this is so cheap compared to everything else (even if it
| is expensive compared to bump allocating).
|
| > I think you're implicitly saying "a runtime that
| minimizes heap allocations" there, in which case I'd
| agree.
|
| Yes, the allocator and GC are concerned with heap
| allocations and not stack allocations. I'm using
| "allocations" as a shorthand for "heap allocations".
| cbsmith wrote:
| In hindsight, I think I chose how to present this poorly,
| because yes, in this case, the allocation is what is
| killing the performance. I look at it, and I just see
| unnecessary implied behaviour creating a performance
| problem. Usually it isn't the allocations themselves that
| kill you, but it certainly is the case here.
| throwaway894345 wrote:
| Allocating isn't "an expensive copy"; it's not analogous to
| clone() in Rust. The copy isn't the problem, it's the
| allocation.
| cbsmith wrote:
| I'd argue quite the reverse. Allocation can be quite
| efficient if done properly, but copying involves a lot of
| other work.
| throwaway894345 wrote:
| I disagree--the bottleneck here is entirely the
| allocation. The copying is just a memcpy and it's very
| fast for small structs like this; like I said, it's not
| the same as a clone() in Rust, which is a deep copy. If
| you optimized the allocation away entirely (leaving only
| the copy cost), there wouldn't have been a significant
| performance problem and this blog would never have been
| written.
| xdavidliu wrote:
| > except for plain data types where copying is literally
| memcpy
|
| what do you mean by this? If I say `let x = 5; let y = x;`
| in rust, that's a "plain data type copy" of a stack value,
| but memcpy is usually used to copy heap memory. What
| connection between copying of primitive simple stack values
| and memcpy are you suggesting here?
| kccqzy wrote:
| The compiler can optimize memcpy with a known size into a
| small number of move instructions so they are identical
| to copying stack values.
|
| Try playing with memcpy on Godbolt and you'll find that
| the compiler will compile the memcpy to a single mov
| instruction when the size is small, and some
| movdqu/movups when the size is slightly large, and only a
| function call when the size is huge.
|
| > memcpy is usually used to copy heap memory
|
| memcpy is often used in low-level serialization /
| deserialization code since you can't just cast a buffer
| pointer to a uint32_t pointer and dereference that; the
| solution is memcpy between variables that are often both
| on the stack.
| orra wrote:
| > What connection between copying of primitive simple
| stack values and memcpy are you suggesting here?
|
| They're just using 'memcpy' as a shorthand for saying the
| bitpattern is blitted. Semantically, that's like a
| memcpy. The point is, there are no expensive allocations,
| nor do any embedded pointer fields need adjusted, etc.
| dimitrios1 wrote:
| As a current Go programmer, hunting down escaping pointers via
| escape analysis is a common part of the routine now.
| AtNightWeCode wrote:
| So, this is very basic Go design. You could write something
| about how it works in C and Go, and why an older language like C
| doesn't have this problem, but at the end of the day the Go fan
| club will downvote the hell out of you no matter what.
| AtNightWeCode wrote:
| The Go compiler is garbage by design. A 20-year-old C compiler
| does not have this problem. This is also why Go has declined so
| much during the last couple of years. The benefits of Go have
| not increased and most of the quirks are still there, like the
| error handling, the naive compiler, and the syntactic sugar that
| somewhat hides the difference between pointers and direct heap
| allocations.
|
| -1
| kosherhurricane wrote:
| I work on a code base that is a mixture of Go and C.
|
| It's IO, CPU and Memory hungry, and it's distributed.
|
| C is fast because it's close to how CPU and memory actually
| work. Go gives you 95+% of that plus easy to learn, easy to
| use language. A new person could start contributing useful
| features and bug fixes immediately. A senior person could get
| C-level performance.
|
| More and more of our code is moved from C to Go, with very
| little performance penalty, but with a lot more safety and
| ease of use.
|
| Our customers benefit, and our company makes more money.
|
| In the end, that's what software is about.
| dist1ll wrote:
| > C is fast because it's close to how CPU and memory
| actually work.
|
| Out-of-order execution, cache hierarchies, branch
| prediction, virtual memory, pipelining, vector
| instructions, ILP, NUMA are all pretty transparent to the C
| spec.
|
| Trying to accommodate hardware quirks with C feels like
| black-box engineering. It's certainly better than with
| managed languages, but still...
| fear91 wrote:
| Can you show any examples of "go compiler being garbage"? In
| my experience, it often generates much smarter code than C#
| or Java.
| asim wrote:
| If you want to have a solid understanding and need to do it in
| just a few hours here's a few things to review.
|
| - The Go programming language spec https://go.dev/ref/spec
|
| - Effective Go https://go.dev/doc/effective_go
|
| - Advanced Go concurrency patterns
| https://go.dev/talks/2013/advconc.slide#1
|
| - Plus many more talks/slides https://go.dev/talks/
| coder543 wrote:
| There is potentially another option: use the midstack inliner to
| move the allocation from the heap to the stack of the calling
| function: https://words.filippo.io/efficient-go-apis-with-the-
| inliner/
|
| As long as the global slice is never mutated, the current
| approach is probably fine, but it is definitely a semantic change
| to the code.
| ploxiln wrote:
| That seems like overkill for this particular case, but it's a
| very interesting technique, thanks for the link!
| morelisp wrote:
| Packages should be exposing an API with destination slices more
| often to begin with. The stdlib is pretty good about this
| (there's a few missing though 1.19 closed the most obvious
| absences), but most third-party code is awful. Or worse, it
| only takes strings.
| throwaway232iuu wrote:
| Why doesn't go use RVO like C++ and Rust?
|
| https://en.wikipedia.org/wiki/Copy_elision#Background
| coder543 wrote:
| I don't think we're on the same page about what midstack
| inlining is being used for in my suggestion. This discussion
| is about eliminating a heap allocation, which as far as I
| understand, RVO _never_ does. Please read the article I
| linked if you want to discuss this further. I don't want to
| repeat the article pointlessly.
|
| I'm also fairly sure Go uses RVO here too, which cuts down on
| the number of times the object is copied around, but again,
| it's irrelevant to the discussion of heap allocations.
| Copying the object isn't the performance problem here,
| needlessly allocating a very short-lived object on the heap
| over and over is.
| amluto wrote:
| Somewhat off topic, but I find a different part of this to be
| quite ugly:
|
|           if match || err != nil {
|               return rule, err
|           }
|
|
| Translating this code to actual logic takes too much thought and
| is too fragile. Is that an error path or a success path? It's
| both! The logic is "if we found a rule _or_ if there was an error
| then return a tuple that hopefully indicates the outcome". If any
| further code were to be added in this block, it would have to be
| validated for the success and the error case.
|
| But this only makes any sense at all if one is okay with reading
| Go result returns in their full generality. A canonical Go
| function returns either Success(value) or Error(err not nil,
| meaningless auxiliary value). And this code has "meaningless
| auxiliary value" != nil! In fact, it's a pointer that likely
| escapes further into unrelated error handling code and thus
| complicates any kind of lifetime or escape analysis.
|
| I don't use Go, but if I did, I think this part of the language
| would be my biggest peeve. Go has very little explicit error
| handling; fine, that's a reasonable design decision. But Go's
| error handing is incorrectly typed, and that is IMO _not_ a
| reasonable design.
| ericbarrett wrote:
| I write a lot of Go, and I agree that this is a big wart in its
| error handling that would be served by a proper Result type.
|
| Nevertheless, the convention is that if a function returns
| (value, err), and err != nil, the value is discarded (I think
| of it as "undefined"). So the code is conventional.
| za3faran wrote:
| What I found frustrating is that there is nothing stopping a
| function from returning both a non-nil value and a non-nil
| error. I've seen weird code that did that and it could have
| easily resulted in incorrect behavior.
| klodolph wrote:
| The one place where you'll actually see this is in Read().
|
| If you try to read 1000 bytes, you might get "800 bytes
| read, EOF" as a result. The worst part is that os.File
| won't ever do this!
| ok_dad wrote:
| I really hate when people use "errors" for signals. Or,
| alternately, use the signals mechanisms for actual
| errors. An error should be "something went wrong", and
| reading a file to the EOF is not "going wrong", that's
| just what the "read()" should do if you tell it so!
|
| I like Go's multiple returns and error checking by
| default, but it definitely should have been implemented
| with some sort of "Result/Error" type union instead. Go
| made so many good decisions, that I am frankly amazed
| they made this one really bad one.
| robotresearcher wrote:
| If you take the position that the semantics of read() are
| 'read data from this endless stream', then EOF is an
| error indicating that this model no longer applies.
|
| Doesn't bother me at all.
| nemo1618 wrote:
| Presumably this is for performance reasons: if a `Read`
| call involves, say, a network request, then returning
| "800, EOF" in one roundtrip is certainly nicer than "800,
| nil" -> "0, EOF" in two roundtrips. In practice... I
| suspect this happens rarely enough that it's not worth
| the cost. The performance edge case could be handled with
| an alternative API, e.g. an optional EOF() method.
| amluto wrote:
| In C, "discarding" a pointer in a way that leaves the value
| visible is quite common. At least if one doesn't accidentally
| use the pointer, it's harmless. (In the way that all manner
| of unsafeness is harmless in C as long as no actual UB
| occurs, which is to say it's not great.)
|
| But Go is a garbage collected language, and there is no such
| thing as "discarding" a pointer. Either it's there or it
| isn't, and this kind of leak has side effects. I find it
| baffling that the language designers and the community
| consider this acceptable.
|
| (One thing I really like about Rust is that you can't play
| fast and loose with lifetimes like this. If you have function
| taking &'a Vec<T> and you return &'a T, you can't arbitrarily
| "discard" and leak that reference up the call chain. You need
| to genuinely get rid of it by the time the vector is gone.)
| slcjordan wrote:
| They should probably replace that with two if statements to make
| the error path and the non-error path obvious:
|
|           if err != nil {
|               return nil, err
|           }
|           if match {
|               return rule, nil
|           }
| foldr wrote:
| The regular return value doesn't have to be meaningless just
| because the error is non-nil. In this case the function returns
| the rule that triggered the error, which is potentially useful
| information. I don't think it is an official Go convention that
| err being non-nil entails that the other return value should be
| meaningless.
| za3faran wrote:
| It's quite error-prone; most golang code is
|
|           if err != nil {
|               return nil, err
|           }
|
| Now of course it's important to read the documentation, but a
| language with sum types (or exceptions) would have used a
| separate type to indicate an error condition plus useful
| information on that error.
| chubot wrote:
| FWIW, to prevent the bug where a = b is slow for big types,
| Google's C++ style guide used to mandate DISALLOW_COPY_AND_ASSIGN
| (which used to be DISALLOW_EVIL_CONSTRUCTORS I think) on all
| types (most types?)
|
| Looks like that's been gone for awhile in favor of C++ 11 stuff,
| which I don't really like:
|
| https://google.github.io/styleguide/cppguide.html#Copyable_M...
|
| A lot of good software was written in that style, but it has
| grown bureaucratic over time, and as the C++ language evolved
| renewiltord wrote:
| Clear tutorial of how to go about identifying this. Good blog
| post. Since the problem was constrained and real, it helps
| someone know when to use these tools. Thank you for sharing.
| karmakaze wrote:
| > I did consider two other approaches:
| >
| > Changing Ruleset from being []Rule to []*Rule, which would mean
| > we no longer need to explicitly take a reference to the rule.
| >
| > Returning a Rule rather than a *Rule. This would still copy the
| > Rule, but it should stay on the stack instead of moving to the
| > heap.
|
| > However, both of these would have resulted in a breaking change
| as this method is part of the public API.
|
| The problem with heap allocated objects could be due to the
| incorrect public API.
|
| The change that improves performance also gives out pointers to
| the actual elements of Ruleset itself, permitting the caller to
| change the contents of Ruleset, which wasn't possible before the
| speed-up. Perhaps you're already aware, since the change to
| []*Rule was being considered.
| ok_dad wrote:
| Sometimes it doesn't matter if a public API is incorrect,
| because it's set in stone for whatever reason, and you just
| need to fix the problem internally.
| spockz wrote:
| This is why I like https://github.com/openrewrite so much.
| One gets to tell users how to rewrite code automatically. It
| makes refactoring almost as easy as in a mono repo.
| bspammer wrote:
| This looks super interesting, but as far as I can tell they
| don't go into how to inform downstream users to add your
| recipes to their maven/gradle configuration when you make a
| breaking change.
|
| In my head, the ideal flow would be an annotation on your
| library function/class which triggers an IntelliJ
| suggestion for downstream users affected by your breaking
| change to run the automated refactor for them. Kinda like a
| more helpful @Deprecated.
| tuetuopay wrote:
| Aaaaaand that's why I love Rust's decision to make copies
| explicit with `.clone()`. Annoying as hell when you're not used
| to it but overall worth it.
| lanstin wrote:
| That seems like a potential for compiler optimization. It should
| already know that the rule value is only used one time, as the
| target of a & and this must be somewhat common in managing return
| values.
| ithkuil wrote:
| The semantics change. You're now returning a pointer to the
| actual Rule in the Ruleset, while before you'd be returning a
| pointer to copy of the Rule.
|
| The optimization would only work if you had a way to tell the
| compiler that some values are constant/immutable.
| ithkuil wrote:
| BTW, I've been using both Go and Rust lately.
|
| In Rust you can write a function that returns the pointer of
| one element of a slice. You can also write a function that
| returns the pointer to a heap-allocated copy of an element of
| the slice. The two functions would have different signatures.
|
| The compiler would also prevent mutation of the slice as long
| as there are any references to individual elements of the
| slice being passed around.
| ok123456 wrote:
| >In Rust you can write a function that returns the pointer
| of one element of a slice.
|
| Have fun fighting the borrow checker on that one.
| [deleted]
| oconnor663 wrote:
| It's not always so bad :) https://play.rust-
| lang.org/?version=stable&mode=debug&editio...
| ok123456 wrote:
| Now try to actually do that in a larger program without
| static lifetimes...
| piperswe wrote:
| It shouldn't be too bad as long as you keep in mind that
| it is a reference to an item in that slice, so whatever
| that slice is pointing to needs to stick around as long
| as you're using elements from it. I don't often encounter
| borrow checker issues anymore, because once you program
| Rust long enough you know what things need to live for
| what lifetimes, and you architect your "larger programs"
| around that.
| lanstin wrote:
| Oh yeah. Too bad. Just ran the escape to heap analysis on my
| current project, not looking too promising. Mostly it is
| allocating structures and saving them in a huge in-memory
| hash.
| [deleted]
| kgeist wrote:
| I don't think it can be optimized without altering semantics.
| If it's a pointer to a value in the slice, changing the slice's
| values (ruleset[i] = ...) will be reflected in all Rule values
| returned from the function, because they all point to the same
| memory. In the same way, changing the returned value's fields
| will change the data in the original slice. The author's code
| is prone to this behavior after the change.
|
| When it's a pointer to a copy, no such implicit dependencies
| occur.
| derefr wrote:
| You're not wrong in general, but one interesting thing about
| Go as an ecosystem (rather than as a language) is that golang
| programs are mostly statically compiled -- all sources, one
| pass, one code unit, one object-code output -- so they're
| (theoretically) very amenable to compile-time (rather than
| link-time) Whole-Program Optimization techniques.
|
| In this specific case, that technique would be whole-program
| dataflow analysis. Given a Golang function that passes out
| references-to-copies-of owned data, you _could_ actually
| determine for certain -- at least in the default static-
| binary linkage mode -- whether these two properties hold
| universally within the resulting binary:
|
| 1. whether _no_ caller of the function _will ever try_ to do
| anything that would cause data within their copy of the
| struct to be modified;
|
| 2. whether the owner of the data _will never_ modify the data
| of the original struct in such a way that, if the copy were
| elided, the changes would be "seen" by any reads done in any
| of the callers. (The owner could still modify internal
| metadata within the struct for its own use, as long as such
| internal metadata is 1. in private fields, 2. where all
| callers live outside the package defining the struct, making
| those fields inaccessible; and 3. the fields are never
| accessed by any struct methods called by borrowers of the
| struct -- keeping in mind that such methods can be defined
| outside the package by caller code.)
|
| If you could prove both of these properties (using dataflow
| analysis), then you could safely elide the copy within the
| function, turning the return of a reference-to-a-copy-of-X
| into a return of a reference-to-X.
|
| (And, in fact, if you can only prove the second property
| universally, and the first property in specific instances,
| then you can still elide the copy from the function itself;
| but you'd also generate a wrapper function that calls said
| function [receiving a reference-to-X], copies, and so returns
| a reference-to-a-copy-of-X; and then, for any call-site where
| the first property _doesn't_ hold -- i.e. callers whose
| transitive call-graph will ever modify the data -- you'd
| replace the call to the original function with a call to the
| wrapper. So "safe" connected caller sub-graphs would receive
| references, while "unsafe" connected caller sub-graphs would
| receive copies.)
| oconnor663 wrote:
| I think the optimization is only valid if we know that nothing
| is ever going to use the returned pointer to do mutation.
| gp wrote:
| I was trying to debug and improve the performance of some
| parallelized C++ code over the weekend for parsing CSV files.
| What would happen was parsing each file (~24k lines, 8 columns)
| would take 100ms with one execution context, but when split
| across many threads, the execution time of each thread would slow
| down proportionally and the throughput of the whole program would
| strictly decrease as thread count increased.
|
| I tried all of the obvious things, but the offender ended up
| being a call to allocate and fill a `struct tm` object from a
| string representation of a date. This doesn't have any obvious
| reasons (to me) that it would cause cache invalidation, etc, so
| I'm a little in the dark.
|
| Still, replacing this four line block improved single threaded
| performance by 5x, and fixed the threaded behavior, so on the
| whole it is now ~70x faster and parses about 400mb of csv per
| second.
| Quekid5 wrote:
| False sharing, maybe?
| gp wrote:
| Not a bad suggestion - thanks for the idea
| infamousclyde wrote:
| Thank you for sharing. I'm curious if you would recommend any
| good resources for profiling with Go. I enjoyed your code
| snippets and methodology.
| assbuttbuttass wrote:
| Returning a pointer to a local variable is convenient, but can be
| a source of hidden allocations.
|
| It's best to treat each struct as a "value" or "pointer" type,
| and use one or the other consistently for each type. This mostly
| avoids the need to use & in the first place
| hoosieree wrote:
| > You can see these decisions being made by passing -gcflags=-m
| to go build:
|
| That's a very nice feature! I wonder if compilers for other
| languages have something similar.
| throwaway894345 wrote:
| Does anyone with VS Code or vim plugin development experience
| know how hard it would be to call this behind the scenes and
| highlight allocation points in the editor?
| _old_dude_ wrote:
| In Java -XX:+PrintCompilation
| -XX:+UnlockDiagnosticVMOptions -XX:+PrintInlining
|
| and -XX:+UnlockDiagnosticVMOptions
| -XX:+LogCompilation
|
| The generated logs can be seen with JITWatch [1].
|
| [1] https://github.com/AdoptOpenJDK/jitwatch
| masklinn wrote:
| You'd have to see if all compilers support it, but LLVM has a
| "remarks" system, which should provide similar information
| (though likely a lot more of it) for optimization passes which
| are traced: https://llvm.org/docs/Remarks.html#introduction-to-
| the-llvm-...
|
| The frontend may or may not have its own optimizations and logs
| though, e.g. rustc now has MIR optimizations
| (https://rustc-dev-guide.rust-lang.org/mir/optimizations.html)
| but while you can dump MIR data
| (https://rustc-dev-guide.rust-lang.org/mir/debugging.html)
| I don't remember seeing an optimisation log.
|
| At the end of the day, I think it's more likely that you take a
| look at the assembly and infer problems from there if the
| profiler doesn't tell you straight. Another difference is the
| _kind_ of decisions the compiler makes, e.g. while a compiler
| can optimize away allocations in "manual allocation" languages
| (https://godbolt.org/z/5nEo7xjEr) the allocations are plainly
| visible, so if they're trivially avoidable... you'll just avoid
| them.
|
| Using Rust as an example, you'd have something like this:
|
|     pub fn match_(&self, path: &str) -> Result<&Rule, Error> {
|         for rule in self.0.iter() {
|             if rule.match_(path)? {
|                 return Ok(rule);
|             }
|         }
|         Err(Error)
|     }
|
| You couldn't miss an allocation, because the return type would
| have to change, and you'd need to perform the copy out:
|
|     pub fn match_(&self, path: &str) -> Result<Box<Rule>, Error> {
|         for rule in self.0.iter() {
|             if rule.match_(path)? {
|                 return Ok(Box::new(rule.clone()));
|             }
|         }
|         Err(Error)
|     }
| amtamt wrote:
| It falls into that critical 3% of code one should still think
| about, even while avoiding premature optimization.
| [deleted]
| is_taken wrote:
| Would be interesting to see the performance difference if you
| undo that move-&-change and change the function signature from:
|
|     func (r Ruleset) Match(path string) (*Rule, error)
|
| to:
|
|     func (r *Ruleset) Match(path string) (*Rule, error)
| masklinn wrote:
| Likely none: Ruleset is
|
|     type Ruleset []Rule
|
| The original code creates a local copy of a rule and explicitly
| returns a pointer to _that_. Taking the ruleset by address
| wouldn't change that issue.
| Beltalowda wrote:
| The deeper lesson here is "don't use pointers unless you're sure
| you need them". I've seen quite a few people use pointers for no
| reason in particular, or there's simply the _assumption_ it's
| faster (and have done this myself, too), but it puts a lot more
| pressure on the GC than simple local stack variables.
|
| Of course sometimes pointers are faster, or much more convenient.
| But as a rule of thumb: don't use pointers unless you've got a
| specific reason for them. This applies even more so if you're
| creating a lot of pointers (like in a loop, or a function that
| gets called very frequently).
| throwaway894345 wrote:
| Eh, I've waffled a couple of times between "pass values by
| default" and "pass pointers by default". Ultimately, I don't
| think there's a really good answer except to understand escape
| analysis and get comfortable with the escape analyzer. Notably,
| "using pointers" doesn't inherently put pressure on the GC, but
| rather _allocations_ put pressure on the GC, and there are
| plenty of instances where pointers don't put pressure on the
| GC at all (notably, if you're passing data _into_ a function by
| pointer, it's not going to allocate, but it may if you're
| returning a pointer to a stack-allocated value).
|
| Notably, if you're defaulting to values, you may still have a
| bunch of allocations when you try to implement interfaces,
| which usually (always?) requires implicitly boxing values;
| however, if you pass a pointer into a function that takes an
| interface, I _don't think_ it gets moved to the heap (but I'm
| not sure, which is why Go programmers need to be comfortable
| with profiling the escape analyzer and also why it would be
| great if Go actually had explicit semantics for allocations).
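The interface-boxing cost mentioned above can actually be measured with `testing.AllocsPerRun` from the standard library. A sketch, with hypothetical `Shape`/`Square` types (`//go:noinline` keeps the compiler from optimizing the call away):

```go
package main

import (
	"fmt"
	"testing"
)

type Shape interface{ Area() float64 }

type Square struct{ side float64 }

func (s Square) Area() float64 { return s.side * s.side }

//go:noinline
func total(s Shape) float64 { return s.Area() }

func main() {
	sq := Square{side: 3}

	// Passing the value through an interface boxes it: the value
	// is copied to the heap so the interface can reference it.
	byValue := testing.AllocsPerRun(100, func() {
		_ = total(sq)
	})

	// Passing an existing pointer stores the pointer in the
	// interface directly, so the call itself shouldn't allocate
	// (though the pointee may be heap-allocated for other reasons).
	p := &sq
	byPointer := testing.AllocsPerRun(100, func() {
		_ = total(p)
	})

	fmt.Println("allocs by value:", byValue, "by pointer:", byPointer)
}
```

On recent Go versions I'd expect the by-value case to show one allocation per call and the by-pointer case zero, but that is compiler-dependent, which is exactly the parent's point about needing to measure.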
| bitwise101 wrote:
| Most of the time I use a pointer just because I want to return
| nil in case of error. Is this a valid reason?
| kccqzy wrote:
| That's not a deeper lesson. A deeper lesson is to understand
| _where_ the pointer points to and then decide accordingly.
| Beltalowda wrote:
| That is pretty much what I said, just phrased differently.
| throwaway894345 wrote:
| It sounded to me like you were saying "avoid pointers by
| default" (bad advice IMO) rather than "if it matters,
| verify whether the pointer escapes" (good advice IMO).
| dist1ll wrote:
| > but it puts a lot more pressure on the GC than simple local
| stack variables.
|
| Do you have evidence for this claim? AFAIK the Go compiler does
| escape analysis, and allocates pointers that don't escape on
| the stack.
| throwaway894345 wrote:
| This is true, but it's hard to tell if a pointer escapes or
| not without actually profiling. That said, I don't think the
| answer is to avoid pointers, but rather to get comfortable
| with profiling the escape analyzer. By default, I just stick
| to the subset of Go which I know _won't_ escape--functions
| can take pointer parameters, but I'm very careful about
| returning pointers to data that would otherwise be stack-
| allocated (even though it's not especially idiomatic, I'll
| often prefer mutating an `out *T` parameter rather than
| returning a `*T` because I _know_ the former will not
| allocate).
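The out-parameter pattern described above might look like this in Go (a sketch with a hypothetical `Rule` type): the caller owns the storage, the callee only writes through the pointer, so escape analysis can keep everything on the caller's stack.

```go
package main

import "fmt"

type Rule struct{ pattern string }

// fill writes the result through an out-parameter. Since fill never
// stores the pointer anywhere, the pointee need not escape.
func fill(out *Rule, pattern string) bool {
	if pattern == "" {
		return false
	}
	out.pattern = pattern
	return true
}

// build returns a pointer to a local, which forces a heap allocation:
// the local must outlive the function.
func build(pattern string) *Rule {
	r := Rule{pattern: pattern}
	return &r // escapes to heap
}

func main() {
	var r Rule // storage lives in the caller's frame
	if fill(&r, "docs/**") {
		fmt.Println(r.pattern)
	}
	fmt.Println(build("images/**").pattern)
}
```

`go build -gcflags=-m` should confirm that `r` in `build` is moved to the heap while `fill`'s parameter "does not escape".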
| tonymet wrote:
| Overall a good review of profiling tactics. But there's nothing
| egregious about Golang here. Pass by value vs. by reference is
| a common performance issue.
| masklinn wrote:
| > But there's nothing egregious about Golang here. Pass by
| value vs. by reference is a common performance issue.
|
| The trap here is that everything is passed by reference
| (pointer), but the intermediate _local_ value is, well, a value
| (a copy).
|
| Rule is not a gigantic monster struct (it's 72 bytes), chances
| are returning it by value would not have been an issue.
|
| Anyway I would say there _is_ an issue with Go here: it's way
| too easy to copy out of a slice.
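Copying out of a slice is, as far as I can tell, exactly the article's bug. A sketch with a hypothetical padded `Rule` type: the value form of `range` copies every element, while indexing and taking `&rules[i]` points into the slice's backing array, which is essentially the article's one-character change.

```go
package main

import "fmt"

type Rule struct {
	pattern string
	_       [64]byte // padding so copies are visibly non-trivial
}

func main() {
	rules := []Rule{{pattern: "a.txt"}, {pattern: "b.txt"}}

	// A value loop variable copies each element out of the slice:
	for _, r := range rules { // r is an ~80-byte copy per iteration
		_ = r.pattern
	}

	// Indexing and taking the element's address avoids the copy;
	// r points directly into the slice's backing array.
	for i := range rules {
		r := &rules[i]
		fmt.Println(r.pattern)
	}
}
```

A related footgun: mutating the `range` copy silently does nothing to the slice, whereas mutating through `&rules[i]` does, so the two forms differ in semantics as well as speed.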
| sendfoods wrote:
| 1 character, in 2 places ;) I did not know profiling support for
| go was so seamless, thank you!
|
| May I ask, is that theme custom or available somewhere? I really
| enjoyed it
| blowski wrote:
| > is that theme custom or available somewhere
|
| Looks a bit like https://newcss.net/ or Water CSS
| hcm wrote:
| Thanks! It's just a few dozen lines of CSS. The body font is
| Inter and the monospaced font is JetBrains Mono.
| gwd wrote:
| > 1 character, in 2 places ;)
|
| Moving a single character from one place to another. :-)
|
| A good explanation of why "fire the developers with the lowest
| 50% of lines added" is an idiotic thing to do: this sort of
| deep analysis takes a lot of time and expertise, and frequently
| results in tiny changes.
| jimsmart wrote:
| From the headline alone, I guessed this was to do with
| pointers/references to values vs values themselves.
|
| Yep, with values that take a lot of memory, it's faster to pass
| pointers/references around than to pass the values themselves,
| because there are fewer bytes to copy.
|
| Of course there is more to such a decision than just performance,
| because if the code makes changes to the value which are not
| meant to be persisted, then one wants to be working with a copy
| of the value, not a pointer to the value. So one should take care
| if simply switching some code from values to pointers-to-values.
|
| These are all things that coders with more experience of
| languages with such semantics know almost as second nature,
| ever since the first day they got caught out by them. But
| everyone is learning, to various degrees, and we all have to
| start somewhere (i.e. knowing little to nothing).
___________________________________________________________________
(page generated 2022-11-14 23:00 UTC)