[HN Gopher] Classes vs. structs in .NET: how not to teach about ...
___________________________________________________________________
Classes vs. structs in .NET: how not to teach about performance
Author : GOPbIHbI4
Score : 86 points
Date : 2023-11-04 03:28 UTC (19 hours ago)
(HTM) web link (sergeyteplyakov.github.io)
(TXT) w3m dump (sergeyteplyakov.github.io)
| GOPbIHbI4 wrote:
| It is quite sad that paid content like Pluralsight's could give
| such bad advice on how to measure performance or how to use a
| given language feature.
|
| Spoiler: the benchmarking code had quadratic complexity instead
| of linear. Yes, in the course on performance.
| karmakaze wrote:
| > It seems that the complexity is O(N^2) rather than O(2*N^2)
| as I expected. This is interesting! Obviously, my understanding
| of LINQ was incorrect.
|
| Perhaps just a wording problem, but big-O notation doesn't care
| about constant factors.
| klysm wrote:
| Not only does it not care, it's literally by definition that
| those two are identical
| klysm wrote:
| In practice though the factor of two can be relevant
| beart wrote:
| The author explicitly stated O(2*N^2) is the same as O(N^2),
| but maybe that was a later edit?
| nemetroid wrote:
| What the author means is that they thought that the
| complexity was O(n^2) for two different reasons, but it
| turned out that only one of those reasons was valid. But
| abusing big-O notation is not a good way to express that.
| SideburnsOfDoom wrote:
| This is where a well-placed .ToList() is called for, to reify the
| sequence and avoid re-evaluating the enumeration.
|
| e.g.
|
| var classes = Names.Select(x => new PersonStruct { Name = x
| }).ToList();
|
| for (var i = 0; i < classes.Count; i++)
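|
| A minimal sketch of the difference, assuming a hypothetical
| PersonStruct and a local names array (neither is the benchmark's
| actual code). It counts how often the selector runs when the same
| query is enumerated twice, with and without .ToList():

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class LazyEvaluationDemo
{
    // Hypothetical stand-in for the PersonStruct type mentioned in the thread.
    public struct PersonStruct
    {
        public string Name;
    }

    // Returns how many times the selector ran after enumerating the query twice.
    public static int CountSelectorCalls(bool materialize)
    {
        var names = new[] { "Ann", "Bob", "Cy" };
        var calls = 0;

        // Select is deferred: nothing runs until the sequence is enumerated.
        IEnumerable<PersonStruct> people = names.Select(x =>
        {
            calls++; // counts every invocation of the selector
            return new PersonStruct { Name = x };
        });

        if (materialize)
        {
            // Runs the selector once per element and stores the results.
            people = people.ToList();
        }

        foreach (var person in people) { } // first pass
        foreach (var person in people) { } // second pass

        return calls;
    }
}
```

| Under these assumptions the lazy query runs the selector on every
| pass, while the materialized list runs it once per element.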
| mxz3000 wrote:
| I also believe that an IDE like Rider will complain that the
| IEnumerable returned by `.Select` is consumed multiple times.
| diarrhea wrote:
| Good point! I remember that being a lint in Visual Studio. A
| very valid one.
|
| In contrast, in Python, initial use exhausts generators.
| Subsequent iterations turn up empty. A gotcha, but also a
| way to highlight misuse, as it should show up in testing.
| starburst wrote:
| The first giveaway was the use of LINQ in a performance post
| phillipcarter wrote:
| This is a strange comment. LINQ is an extremely common way to
| write code in C#, and the performance of code that uses it is
| certainly relevant. Additionally, this is a performance
| _comparison_ post. If the baseline uses LINQ but compares
| something else, the other tests should also use LINQ.
| starburst wrote:
| LINQ is pretty much frowned upon when programming games in
| C#, also when doing performance comparison you want to get as
| close as possible to the actual code without the extra
| overhead.
|
| I would very much verify anything and not take it at face
| value when a C# performance post uses LINQ.
| neonsunset wrote:
| The relationship between LINQ and performance is not
| trivial; it pretty much depends on what you do (a more
| complex LINQ chain means more overhead).
|
| It does have a base cost (allocating iterator objects), but
| it's less than you might think. I have seen enough game code
| that does intermediate list allocations when it doesn't
| need to, which are far costlier than LINQ.
|
| In addition, the benchmarks that do other positive work
| alongside the benchmarked aspect can sometimes be more
| illustrative and overall better because it is much more
| important how a particular approach works together with
| surrounding code, matching more closely real world
| scenarios.
|
| And last but not least: in this case using structs yields an
| additional advantage with LINQ, since monomorphization of
| methods whose generic arguments are structs has additional
| codegen quality benefits.
| starburst wrote:
| It is certainly possible to write slow code without LINQ.
| All I'm saying is that I wouldn't blindly trust a blog
| post that talks about performance and uses LINQ.
| neonsunset wrote:
| The article's contents suggest a deep understanding of the
| topic.
|
| This type of thinking ("LINQ bad" or "SOLID good") is one
| reason among many why bad patterns proliferate through
| the projects e.g. "hey you should rewrite this code with
| SOLID principles in mind" (without accounting for the
| context) or "This code calculates the sum using LINQ, you
| should rewrite it" (LINQ's Sum implementation uses SIMD
| and is hard to beat).
| starburst wrote:
| The article is fine, I was referencing the original
| article the article is referencing.
| bob1029 wrote:
| LINQ codepaths are only getting faster. A literal army of
| engineers is focused on this stuff full time.
|
| https://devblogs.microsoft.com/dotnet/performance_improveme
| n...
|
| > dotnet/runtime#64470 is the result of analyzing various
| real-world code bases for use of Enumerable.Min and
| Enumerable.Max, and seeing that it's very common to use
| these with arrays, often ones that are quite large. This PR
| updates the Min<T>(IEnumerable<T>) and
| Max<T>(IEnumerable<T>) overloads when the input is an int[]
| or long[] to vectorize the processing, using Vector<T>. The
| net effect of this is significantly faster execution time
| for larger arrays, but still improved performance even for
| short arrays (because the implementation is now able to
| access the array directly rather than going through the
| enumerable, leading to less allocation and interface
| dispatch and more applicable optimizations like inlining).
|
| What are the chances that you'd have patience to write a
| competitive bug-free SIMD implementation?
| fabian2k wrote:
| Games in C# might be written in Unity, and a lot of those
| improvements wouldn't apply there. So in that context
| this might be accurate because it's an entirely different
| runtime.
| whoknowsidont wrote:
| >LINQ is an extremely common way to write code in C#
|
| It is extremely uncommon in performance contexts. It is
| actively discouraged and removed when writing performant C#
| code.
|
| It is incredibly common in your "run of the mill" enterprise
| apps, or contexts where performance can slow down a bit for
| the sake of programmer happiness.
| throwaheyy wrote:
| It isn't uncommon when you know what it is doing. Wholesale
| removal or discouragement of LINQ is a sign of fake cargo-
| cult performance "optimization".
|
| It's perfectly fine to use if you learn about how it works
| and how to use it properly.
| whoknowsidont wrote:
| LINQ is going to add overhead, regardless of "properly"
| using it or "cargo-culting" things; save the platitudes
| for the Monday zoom meeting.
|
| LINQ adding overhead is a _technical reality_, it's how
| it works and that is fine. It's a fine tool in many
| different contexts, but when we talk about performant
| code the context is obviously one in which every cycle
| matters.
|
| And those of us with enough experience know that LINQ
| performance and implementation details vary over time
| in the runtime, and those shifts aren't always positive.
|
| So when writing code where performance is fundamental to
| the success of the application, avoid LINQ since it WILL
| add overhead and it will remove implementation control
| from your team. It is a risk without much benefit when
| you're in the performance arena. That doesn't mean it's
| not useful in many other contexts.
| fabian2k wrote:
| There are cases where using the straightforward LINQ code
| would be a lot faster than a lower level alternative. For
| example when the code can be vectorized and use AVX
| instructions, which is implemented for quite a few LINQ
| methods. A straightforward non-LINQ version of the code
| would likely be slower as most developers would not or
| can't write the low-level AVX version.
|
| I'd certainly be careful about LINQ in certain
| performance-sensitive code, e.g. about creating
| unnecessary copies of the data and allocating too much.
| But I would not trust myself without measuring to really
| know whether it actually makes a difference or if my
| "optimized" code might be even slower.
| ttymck wrote:
| Wouldn't it be _more_ informative to see a "realworld" project,
| possibly built using LINQ, and the performance comparison done
| within the context of that project?
| starburst wrote:
| If what you want to measure is LINQ performance, sure, but in
| the context of measuring a language fundamental like class
| versus struct, it is unnecessary overhead.
|
| The article itself says it:
|
| > However, the main reason why the benchmarks are not correct
| is because of LINQ and lazy evaluation.
| sixothree wrote:
| Honestly I wouldn't mind knowing that linq is fast when using
| one versus the other.
| chacham15 wrote:
| The n^2 argument the author points at is a red herring. The
| reason this is an invalid criticism of the benchmark is that both
| benchmarks are using the same query structure, so they're both
| n^2. The author himself admits later on that the real issue is
| allocations. However, they posit that allocation of structs is
| done using a different allocator than classes. I don't know enough
| about C# to know if this is true, but even so, the advice that
| "structs are more performant overall" still holds...so this
| article seems to be mostly clickbait.
| neonsunset wrote:
| The criticism aims at poorly written benchmarking code that
| fails to evaluate its own claim and uses an anti-pattern. You
| may want to read the article in full.
|
| Also, structs do not use an allocator. This is basic to many
| programming languages: they simply represent a structure in
| memory, which by default is placed on the stack. Think of an
| integer variable in a local method scope.
| astrange wrote:
| Allocation with a GC is typically not any more expensive than
| being "on the stack", so I don't think something being "on
| the stack" is a useful distinction.
|
| (And in a language with stackful closures the stack itself is
| GCed.)
| neonsunset wrote:
| This is incorrect in multiple ways. C# stack is completely
| native, identical to C++ or Rust.
|
| The following factors contribute to "structs being faster":
|
| - Heap allocations have to go through allocation calls,
| which need to find free memory, possibly zero it out, and
| then return pointer (reference) to it, both in managed and
| unmanaged languages, with C# being much faster at small
| object allocation (tlv read, pointer bump, and object
| header write) while unmanaged wins for large allocations
| instead (you don't have to go through LOH and extra cost
| associated with it). In comparison, stack is already zeroed
| for structs that are written to it, and those are just movs
| (or ldr/ldp's and str/stp's in case of arm64), and even
| then, only when spilled to stack at all (see below)
|
| - Stack may not be the best way to describe it - think
| "local exclusively owned memory" which means that
| compilers, no matter how strict, can reason about the exact
| lifetimes of local values and the changes that happen to
| them. This means that all struct values can be promoted to
| CPU registers and never touch memory unlike with heap
| allocations, where multiple reads of the same property may
| require repeated dereferencing to account for the fact that
| object memory may be globally observable. This in turn
| applies to optimizations like CSE which can elide multiple
| identical checks against struct values knowing they won't
| change between operations.
|
| - In .NET, generic method bodies for class-based generic
| arguments are shared (closest example in Rust - Box<dyn
| Trait>-based dispatch but with less overhead). However,
| struct generic arguments force method body monomorphization,
| i.e. emitting a specialized version for the exact generic
| type, which allows writing code with zero-cost
| abstractions the same way one would do in Rust with
| generics or in C++ with templates.
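|
| A small sketch of the last point, using hypothetical IShape and
| Square types that are not from the article. The generic method is
| written once; the claim above is that for a struct type argument the
| runtime compiles a dedicated body, so the interface call needs no
| boxing. The code only shows the shape of such an API, it does not
| measure anything:

```csharp
using System;

public static class GenericStructDemo
{
    public interface IShape
    {
        double Area();
    }

    // Hypothetical value type implementing the interface.
    public struct Square : IShape
    {
        public double Side;
        public double Area() => Side * Side;
    }

    // Constraining T to IShape (instead of taking IShape[]) keeps struct
    // elements unboxed: T is the concrete struct type at the call site.
    public static double TotalArea<T>(T[] shapes) where T : IShape
    {
        double total = 0;
        foreach (var shape in shapes)
        {
            total += shape.Area();
        }
        return total;
    }

    public static double Run()
    {
        var squares = new[] { new Square { Side = 2 }, new Square { Side = 3 } };
        return TotalArea(squares); // T is inferred as Square; 4 + 9
    }
}
```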
| whoknowsidont wrote:
| >C# stack is completely native, identical to C++ or Rust.
|
| This is absolutely not true. Where are you getting this
| from? Pray tell, what you think this is:
| https://github.com/dotnet/runtime
| nullzzz wrote:
| To me the point seems that the benchmark is so misguided and
| missed the obvious error in the usage of LINQ that the results
| are not relevant. You should not take perf advice from the
| authors of that course.
| throwaheyy wrote:
| > I don't know enough about C# to know if this is true
|
| Do you know enough about C# to realize that .Select() by itself
| doesn't materialize the collections, making the benchmark
| completely nonsensical?
|
| The query structure is the same because it was a failed attempt
| to evaluate classes vs. structs, not one query vs. another.
| quietbritishjim wrote:
| The comment you're replying to is talking about the second
| benchmark, which does access the enumeration so does
| materialize the select().
| mjr00 wrote:
| The original benchmark author is just really unclear what
| they're trying to benchmark. "Class vs struct performance" is
| meaningless, because performance at _what_? The first benchmark
| does nothing, as the article points out. The second benchmark
| tests the performance of large numbers of object creations; but
| why is ElementAt even being called in a loop there? It's
| confusing since it's irrelevant to what actually ends up taking
| time. If you're benchmarking the time it takes to create the
| struct/class objects, just write a benchmark that does that and
| don't include random other code!
|
| And "structs are more performant" isn't even a correct
| conclusion; they're pass-by-value, so you could construct
| benchmarks where the copy time outweighs heap memory allocation
| time, eg constructing a large object once and passing it to a
| function many times.
| quietbritishjim wrote:
| I don't think you've really refuted the parent comment here.
| But, what you have done, is written a much better blog post
| than the original article :-)
| mjr00 wrote:
| I wasn't really trying to refute the parent! I agree that
| the article author focusing on LINQ making things secretly
| n^2 is a red herring (though important to know). But the
| more fundamental problem is the benchmark being very
| unclear in what it's benchmarking.
| ablob wrote:
| If struct copies are your problem you can always pass them by
| ref, so even that isn't an easy claim to make.
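|
| A minimal sketch of what that looks like, assuming a hypothetical
| 64-byte BigStruct. It only demonstrates the semantics of by-value,
| `ref` and `in` parameters; whether avoiding the copy is actually
| faster in a given case still has to be measured:

```csharp
using System;

public static class StructPassingDemo
{
    // Hypothetical 64-byte value type, large enough that copies are not free.
    public struct BigStruct
    {
        public long A, B, C, D, E, F, G, H;
    }

    // By value: the callee receives its own copy of all 64 bytes.
    public static long SumByValue(BigStruct s) => s.A + s.H;

    // By readonly reference: a reference is passed instead of a copy,
    // and the callee cannot modify the caller's value.
    public static long SumByIn(in BigStruct s) => s.A + s.H;

    // By value: the increment changes only the local copy.
    public static void BumpByValue(BigStruct s) => s.A++;

    // By reference: the increment is visible to the caller.
    public static void BumpByRef(ref BigStruct s) => s.A++;

    public static long Run()
    {
        var big = new BigStruct { A = 1, H = 2 };
        BumpByValue(big);   // big.A is still 1
        BumpByRef(ref big); // big.A is now 2
        return SumByValue(big) + SumByIn(in big); // (2 + 2) + (2 + 2)
    }
}
```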
| sitharus wrote:
| The odd thing is this behaviour is documented. Classes are
| reference types: always heap allocated and passed by
| reference. Structs are value types, allocated inline (that is
| inline in the containing object/array, or on the stack for
| local variables), and passed by value.
|
| Allocation time is going to be around the same for both types
| due to the GC, but there are performance implications depending
| on what you do later. In particular you can avoid garbage
| collection with appropriate use of structs, but pass by value
| can cancel out those improvements.
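|
| The documented semantics can be sketched as follows, with
| hypothetical PersonClass and PersonStruct types standing in for the
| ones in the benchmark. Assigning a class variable copies the
| reference; assigning a struct variable copies the value:

```csharp
using System;

public static class ValueVsReferenceDemo
{
    public class PersonClass { public string Name = ""; }
    public struct PersonStruct { public string Name; }

    public static (string ClassName, string StructName) Run()
    {
        var c1 = new PersonClass { Name = "Ann" };
        var c2 = c1;     // copies the reference; c1 and c2 are the same object
        c2.Name = "Bob"; // also visible through c1

        var s1 = new PersonStruct { Name = "Ann" };
        var s2 = s1;     // copies the whole value
        s2.Name = "Bob"; // s1 is unaffected

        // An array of structs stores its elements inline in one allocation;
        // an array of classes stores references, initially all null.
        var structs = new PersonStruct[4];
        var classes = new PersonClass[4];
        if (structs[0].Name != null || classes[0] != null)
        {
            throw new InvalidOperationException("unexpected default values");
        }

        return (c1.Name, s1.Name);
    }
}
```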
| agent281 wrote:
| I disagree. Using an n^2 algorithm will exaggerate the
| difference between the two data types at higher values of n.
| Using a linear algorithm would give a more consistent
| perspective on the relative performance of the two data types.
| Hixon10 wrote:
| Does dotnet have any tool/profiler that can count the number of
| bytes copied (not allocations)? For example, if I want to
| benchmark two different functions and find out which one
| copies more (and so does more work).
| buybackoff wrote:
| It's so appalling that paid content could be that bad. For me
| the first benchmark that was doing nothing in both cases is as
| shocking as the second.
|
| But the thing I wanted to highlight is that the blog author is a
| real authority on structs vs classes at a very very low level.
| His series on struct performance is a must-read for any .NET dev
| who cares about performance at a low level, e.g. when to use
| readonly structs or when to use a mutable one to avoid excessive
| copying. That kind of thing.
| https://devblogs.microsoft.com/premier-developer/author/sete...
|
| He has published two analyzers on NuGet and both are must-haves.
| One, ErrorProne.NET.Structs, is focused on struct usage and,
| for example, highlights cases of defensive copies and can
| suggest when to make a struct readonly.
|
| https://devblogs.microsoft.com/premier-developer/avoiding-st...
|
| https://www.nuget.org/packages?q=errorprone
| mike_hock wrote:
| Can we just go ahead and call this straight up fraud? If you
| advertise a _paid_ course on C# and don't have the first clue
| about the language, that can't be assumed to be in good faith.
| PreInternet01 wrote:
| I've seen LINQ lazy evaluation causing problems before, but
| mostly in the context of unit tests (where some code under test
| was simply never invoked, despite code coverage statistics
| looking OK).
|
| It's clear that lambdas can be confusing to both humans and
| tooling, and fixing the latter seems the most viable. Visual
| Studio greying out LINQ lambda code that isn't reachable given
| current invocation patterns would be a nice start, and doesn't
| seem unfeasible to me given the kind of code analysis already
| done...
___________________________________________________________________
(page generated 2023-11-04 23:00 UTC)