[HN Gopher] In C++, is empty() faster than comparing the size wi...
___________________________________________________________________
In C++, is empty() faster than comparing the size with zero?
Author : ibobev
Score : 93 points
Date : 2021-10-27 08:44 UTC (1 days ago)
(HTM) web link (lemire.me)
(TXT) w3m dump (lemire.me)
| sillycross wrote:
| > Amazingly, we find that the GCC compilers is able to compile
| Travis' is_empty C++ function to constant-time code.
|
| It's actually an interesting example where undefined behavior
| allowed compiler optimization:
|
| (1) dereferencing an invalid pointer is UD
|
| (2) signed integer overflow is UD.
|
| This allows the compiler to assume that the program never crashes
| and the counter never overflows. The loop is then optimized out
| knowing that it is read-only thus has no side-effects.
| em3rgent0rdr wrote:
| I suppose it is a safe assumption that the counter will never
| overlow _if_ memory space is not large enough to possibly hold
| the maximum positive integer number of data structs. Though if
| say was using a smaller int for size than what the hypothetical
| size that virtual memory could handle, such as a short int in a
| 32-bit memory system, then that assumption may not be true. But
| on a 64-bit linux system which only allows up to 128 TiB of
| virtual address space for individual processes, a 64-bit signed
| int could be as large as 2^63, which would be larger than the
| hypothetical maximum size that a 128 TiB virtual memory could
| reference, so the assumption that the size counter could never
| overflow would be safe.
| joosters wrote:
| Maybe I'm splitting hairs, but it's not specifically the
| _presence_ of 'undefined behavior' that allows the compiler
| optimization. Instead, it's the language specification. The C
| spec says that integers cannot be relied upon to overflow. The
| result of this is that compilers are then free to assume that
| the program they are compiling has _NO_ undefined behavior in
| it, and so the optimization is possible.
|
| EDIT: To make it clearer, you could imagine an alternate
| version of C that aborted the program if an integer overflowed.
| Then there would be no undefined behavior at all - but the
| optimization is still possible. It's not the UB that helps us
| here, it's the language spec telling us what behavior is
| reliable and what is not.
| adrian_b wrote:
| You need not imagine an alternate version of C, such a
| version of C is provided by any decent C compiler.
|
| For example, with gcc you can use either the option "-ftrapv"
| or the better option "-fsanitize=undefined -fsanitize-
| undefined-trap-on-error" and the program will abort on
| integer overflows (with the second option it will also abort
| for many other error conditions, e.g. access out of bounds).
| joosters wrote:
| Sure, but I understand why the C spec does not define this
| - because not all processors can trap these events, at all
| times and in all situations, without added costs. And C is
| very much of the opinion that it doesn't want added costs -
| but it lets you pick and choose what you want, hence the
| compiler options.
|
| I don't think there's an obvious 'good' and 'bad' choice
| here.
| admax88qqq wrote:
| I think the "good" choice is to remove undefined
| behaviour and accept that C will be slighlty slower on
| esoteric hardware.
| bregma wrote:
| > You need not imagine an alternate version of C, such a
| version of C is provided by any decent C compiler.
|
| Classic misunderstanding of undefined behaviour. In this
| case it's still undefined behaviour, but the vendor has
| said "this is what my compiler will do with this particular
| undefined behaviour under these circumstances". Vendors are
| allowed to do anything they want when code containing
| undefined behaviour is submitted to their compiler,
| including doing something you might consider sane.
| joosters wrote:
| Saying 'this is what I will do when an int can't hold the
| assigned value' is _defining_ the behavior, ergo it is no
| longer undefined!
|
| _Vendors are allowed to do anything they want when code
| containing undefined behaviour is submitted to their
| compiler_
|
| Yes, and his compiler is saying that it will trap these
| events. Are you suggesting that the compiler will
| secretly ignore this setting and then go on to do all
| kinds of insane things?
| bregma wrote:
| > Saying 'this is what I will do when an int can't hold
| the assigned value' is defining the behavior, ergo it is
| no longer undefined!
|
| Defining the behaviour of a particular compiler under
| specific circumstances does not equate to defining the
| behaviour of the C language.
| joosters wrote:
| You are reading words that nobody wrote. No-one claimed
| that they were redefining the 'C spec' by passing flags
| to their compiler - the standards board's job is still
| safe!
| adrian_b wrote:
| I agree that leaving such important behavior as undefined
| is a failure for a programming language standard, because
| the standard fails to guarantee that a program will do
| the right things independently of what compiler has been
| used and with what compiler options.
|
| Nevertheless, in such cases it is the responsibility of
| the programmer to either write programs in which it is
| guaranteed that events for which the program behavior is
| undefined will never happen or to choose compiler options
| to handle such events.
|
| If the programmers do their job right, then the compiler
| is free to optimize like when the undefined cases do not
| exist.
|
| Unlike the case with integer overflow, there are cases
| where the programming language standards correctly leave
| some behavior as undefined, e.g. the order of evaluation
| for function arguments. For that kind of undefined
| behavior the programmer must take care to write only
| programs whose effects do not depend on it.
|
| Data dependent events like integer overflow cannot always
| be avoided, so compilers must have options to generate
| exceptions when they happen.
| umanwizard wrote:
| The behavior is defined, but by the compiler
| documentation, not the language standard. Different
| documents can define different things.
| IshKebab wrote:
| > To make it clearer, you could imagine an alternate version
| of C that aborted the program if an integer overflowed. Then
| there would be no undefined behavior at all - but the
| optimization is still possible.
|
| The optimisation wouldn't be possible in that case because
| then the program wouldn't abort when the integer overflowed.
| It would break the defined behaviour that overflow=abort.
| minitech wrote:
| > Then there would be no undefined behavior at all - but the
| optimization is still possible.
|
| But then an optimization could change the defined behavior of
| aborting to not aborting, which is essentially what undefined
| behavior means and really bad if you don't treat it exactly
| like undefined behavior.
| Noughmad wrote:
| > It's actually an interesting example where undefined behavior
| allowed compiler optimization
|
| That is literally reason why any behavior is considered
| undefined. So that the compiler can skip checks to produce
| better optimized code.
| secondcoming wrote:
| And so you get an optimal, but broken, program. This helps
| nobody.
| thamer wrote:
| It's interesting that size() _has_ to be constant-time. From what
| I understand of the standard linked from the blog post, it looks
| like size() being constant-time is a property of Container and
| not each individual method (SS 23.2.1, page 747).
|
| What does this mean for implementations that want to have these
| properties but have internal structures that make this inherently
| difficult? Do they have to recompute a cached size value during
| other operations, and amortize the cost of size() on insert, for
| example?
|
| I can see it being tricky in some situations: imagine a
| collection with expiring data, for example. You might want to
| prune the expired items in the background, but still have size()
| return the number of elements currently not expired. How would
| someone do this while maintaining an exact size() value when the
| background clean-up will inherently take some time?
| plorkyeran wrote:
| The standard's requirements for the standard library types
| don't imply any corresponding requirement for types you define
| yourself. If you want to write a container which can't
| reasonably do O(1) size() then you just don't do O(1) size().
| If such a container was added to the standard library, the
| standard library's Container requirements will be adjusted to
| permit that.
| chakkepolja wrote:
| Exactly, most STL algorithms, for example, can be applied
| using iterators as well. But anyway if GP has specific
| requirements iterator invalidation might be a problem too,
| they wouldn't probably want to constrain their interface to
| that of STL containers.
| tialaramex wrote:
| Wait, isn't the _point_ of the STL algorithms that they use
| iterators?
|
| For sure the iterator over a vector may be very cheap to
| use, but it is still an iterator, even if you tacitly know
| it's just pointer arithmetic under the hood.
| brandmeyer wrote:
| Once upon a time, size() wasn't required to be constant-time.
| This showed up in some standard library implementations of
| std::list::size which actually did count all the things.
|
| The painful ABI break at C++11 is a big part of why
| implementors were so opposed to allowing another ABI break in
| C++20.
| jeffbee wrote:
| It's also why other people in the C++ community are opposed
| to ABIs themselves.
| brandmeyer wrote:
| The live-at-head model really only works well in a cloud
| computing or JIT environments. It really shouldn't have
| been a surprise to find that this model is unpopular
| outside of those narrow use-cases.
| jcelerier wrote:
| in my personal experience it also works great for desktop
| software
| fhrow4484 wrote:
| > What does this mean for implementations that want to have
| these properties but have internal structures that make this
| inherently difficult
|
| I'm not sure I can think of an example of such a data
| structure?
|
| my understanding is that whatever your data structure is, you
| can keep a "size_t current_size;" private counter which you
| atomically increment/decrement on every insertion/deletion.
| size() is simply: size_t size() { return
| current_size; }
|
| In your background clean-up example structure, if the
| background thread is removing N elements, it must decrease
| current_size by N. Atomically! (i.e. remove the element and
| decrease the current_size using a mutex::lock/unlock)
| thamer wrote:
| My point was that the background thread would remove elements
| periodically, by checking something like:
| if (it->expiry_time < now) erase(it);
|
| Where erase() could certainly remove the item and update the
| size, but it would remove elements _after_ they have expired.
| So if size() must return the actual number of unexpired
| elements _present right now_ , the cached counter you mention
| would not be up to date.
|
| This is what I meant by "have size() return the number of
| elements currently not expired". What you're describing is
| "size returns the number of elements currently not expired OR
| expired but not yet cleaned up", which is different.
| klyrs wrote:
| > I'm not sure I can think of an example of such a data
| structure?
|
| C style strings are the classical example. Computing the size
| (strlen) takes linear time. Not saying that this is a good
| idea or anything, but it's extremely common.
|
| One could imagine a binary tree implemented in a similar
| manner. It costs a little time and memory to keep track of
| the current size, but it almosy always pays off.
| johntb86 wrote:
| Some std::list::splice(const_iterator pos,
| list& other, const_iterator first, const_iterator last)
|
| implementations used to be constant-time, since they could
| just change some pointers in the spliced nodes. Nowadays they
| take linear time, since they need to count how many nodes are
| transferred so they can update the sizes correctly.
| ape4 wrote:
| Interesting, a good case for another std::list<> like
| std::non_counting_list<>
| jfrunyon wrote:
| Wouldn't this still be constant-time today? It doesn't need
| to count the nodes because the other list already has a
| count of its nodes.
| hermitdev wrote:
| No, because you'd still need book keeping around the
| length of [first, last), even if the implementation does
| under-the-hood O(1) movement of nodes in [first, last)
| from other to *this. Computing the length of [first,
| last) is still an O(n) operation for forward iterators.
|
| Maybe you're assuming [first, last) is the entirety of
| other? For that case, there are overloads for splicing an
| entire list in O(1) time: void splice(
| const_iterator pos, list& other ); void splice(
| const_iterator pos, list&& other );
| jfrunyon wrote:
| This seems like an extremely niche use-case, compared to
| the prevalence of use-cases needing to check the size of
| a list.
| MauranKilom wrote:
| No, because you can splice in the middle. Given just the
| iterators you simply have no other way than counting.
|
| (The reason why it used to be constant time is that it
| didn't have to count because it didn't have to compute
| the new size).
| tialaramex wrote:
| > What does this mean for implementations that want to have
| these properties but have internal structures that make this
| inherently difficult?
|
| Such implementations are de facto forbidden in the STL.
|
| Although its containers have vague names like "forward_list"
| today the C++ STL's containers are in practice very specific
| data structures that have the specified properties because
| other data structures would miss one or more of the
| requirements.
|
| This suits "stability at all costs" C++ programmers fine.
| Better that every C++ program is 10% slower than that their
| buggy library from 1995 has to be fixed to actually do what the
| standard says rather than whatever worked in 1995.
|
| If you care about this but you like C++ you just don't use the
| STL. Third parties are free to make containers that use data
| structures that aren't old enough to stand for President of the
| United States of America, some of them aren't even old enough
| to legally _drive_ in the United States of America.
| loup-vaillant wrote:
| If you're _actually_ at the point where you care about that kind
| of performance, you may want to shy away from the STL, and use
| (or write) something simpler instead.
| pansa2 wrote:
| On the other hand, if you're _not_ at the point where you care
| about maximum performance, you may want to shy away from C++
| altogether.
|
| This implies that there are very few situations in which C++
| with STL is the right choice - which seems contrary to how
| widely it's actually used.
| TillE wrote:
| I've always hated this bit of conventional wisdom, which has
| become less and less applicable over the past couple decades.
| STL implementations are now quite highly optimized within their
| spec. With the one little caveat that they're generally
| optimized for CPU cycles rather than per object memory usage.
|
| But you should _understand_ the performance characteristics of
| the STL. A std::vector is super fast for basically everything
| except for dynamic resizing. If performance is a concern, your
| first approach should be to see if you can avoid reallocation.
| Often it 's very simple to just preallocate a reasonable
| maximum, or calculate an exact maximum based on your input.
|
| Finally, if you truly need to do weird stuff with resizing, you
| can look at other containers or write your own.
| glandium wrote:
| I thought the conventional wisdom was to avoid the STL if you
| need consistent performance across platforms, because they
| all have different pitfalls. Things might be different with
| libc++, now, though, since you can use it in most cases.
| jb1991 wrote:
| This was just discussed yesterday also:
| https://news.ycombinator.com/item?id=29007821
| ufo wrote:
| The most impressive example I've ever seen of this kind of
| compiler cleverness is the following function from the
| "convoluted isEven" competition: bool
| isEven(int n) { return n == 0 || !isEven(n + 1);
| }
|
| Clang somehow optimizes it to a constant-time implementation,
| even if we use "unsigned"! https://gcc.godbolt.org/z/nGbn1T
| whimsicalism wrote:
| Is this code not relying on UB? Seems like it could/should be
| optimized to `return true`
| poizan42 wrote:
| It's UB for positive integers, so the compiler can assume it
| never happens. It's correct for negative integers, so the
| compiler just optimizes it for negative integers and ignores
| the positive ones - in this case with the end result that it
| works for positive integers as well.
|
| If the argument is changed to unsigned int then it seems to
| be correct and portable because the C standard[1] requires
| UINT_MAX to be of the form 2^n - 1.
|
| So the one with signed argument is UB for positive integers
| but happens to work by accident (but the nasal demons may
| still lurk just around the corner, e.g. the compiler may also
| assume that it's never called with a positive argument which
| may affect the compilation of callers). The one with unsigned
| argument is correct because unsigned overflow is well-defined
| to wrap around to zero, combined with the restrictions on max
| unsigned int.
|
| [1]: From C17 6.2.6.2:
|
| For unsigned integer types other than unsigned char, the bits
| of the object representation shall be divided into two
| groups: value bits and padding bits (there need not be any of
| the latter). If there are N value bits, each bit shall
| represent a different power of 2 between 1 and 2 _(N-1), so
| that objects of that type shall be capable of representing
| values from 0 to 2_ (N-1) using a pure binary representation;
| this shall be known as the value representation.
| carnitine wrote:
| What's the UB?
| [deleted]
| whimsicalism wrote:
| Calling it on any positive int causes signed integer
| overflow, which is UB.
| nawgz wrote:
| signed integer overflow for positive N
| edflsafoiewq wrote:
| It's only UB for positive n. For negative n, it is an is-even
| test, so it can't be optimized to return true.
| whimsicalism wrote:
| Oh yeah - negative numbers exist, doh!
| joosters wrote:
| Isn't the unsigned version the correct one? Presumably the int
| version will hit the same problem as in the original article,
| in that the C spec says that ints are not treated as
| overflowing? (i.e. the compiler cannot assume that adding one
| will eventually turn negative and then later on reach 0)
| BeeOnRope wrote:
| Well the int version returns the correct answer for "empty()"
| unlike unsigned. The size of a list of 2^32-1 elements should
| not be reported as zero, nor as "empty".
|
| In this case, the compiler has to enforce wrap semantics for
| unsigned, which is an applicaton-level level bug if it
| occurs, and doesn't need to enforce it for int, which happens
| to lead to the right answer for empty (i.e., it checks if it
| has at least one element, which continues to return the
| correct "not empty" answer for 2^32-1 or 2^31-1 elements).
|
| Both get the wrong answer for size() (over different
| domains), if you allow lists that large: they can't even
| represent very large lists.
| DSMan195276 wrote:
| Yes that's right - unsigned is defined to wrap, but ints have
| undefined behavior if you add to the max value (or do other
| silly things with them). Thus the int version given could be
| optimized to do whatever when a positive integer is passed,
| since it only doesn't have UB when negatives or zero are
| passed.
| TheCoelacanth wrote:
| > the compiler cannot assume that adding one will eventually
| turn negative and then later on reach 0
|
| Signed overflow is undefined behavior, so it is allowed to
| assume that. It's also allowed to assume anything else that
| it wants to.
|
| Once you cause a signed overflow, all guarantees are out the
| window and the compiler can do anything it wants to.
| spatulon wrote:
| Is the LLVM optimisation pass that does this hyper-specialised
| for this one particular trick, and, if so, is it valuable in
| practice?
|
| Here's an example where (I think) the two functions are
| equivalent, so I'm a little disappointed it couldn't make the
| same optimisation on the second one. (At least it still
| optimised the tail call to a jump.)
|
| https://godbolt.org/z/K9rMajsjK
| Someone wrote:
| No, it is very generic. See
| https://kristerw.blogspot.com/2019/04/how-llvm-optimizes-
| geo..., https://blog.matthieud.me/2020/exploring-clang-llvm-
| optimiza... (discussed at
| https://news.ycombinator.com/item?id=28207207)
|
| (I remember reading a page with more examples, but cannot
| find it)
| papito wrote:
| In a Wooooorld, where newer generation of developers does not
| grasp the concept of database indexes, and network latency is
| assumed to be zero, one man.....
___________________________________________________________________
(page generated 2021-10-28 23:00 UTC)