[HN Gopher] Mysterious Moving Pointers
___________________________________________________________________
Mysterious Moving Pointers
Author : camblomquist
Score : 37 points
Date : 2024-04-14 18:50 UTC (4 hours ago)
(HTM) web link (blomqu.ist)
(TXT) w3m dump (blomqu.ist)
| olliej wrote:
| I know C++ the language but not the STL (the overwhelming
| abundance of UB and total lack of safety make it anathema), so
| my question is: why does the STL allow/require copying instead
| of moving here, depending on whether an object has a no-throw
| move constructor?
|
| Note I'm not asking about move constructor vs memmove/cpy but
| rather the use of copy constructor vs move depending on exception
| behavior? Is it something like prefer no throw copy constructor
| over "throwing" move?
| wffurr wrote:
| What do you use instead of std::vector, map, unique_ptr, etc?
|
| I have a hard time thinking of C++ and the STL as separate.
| Even our internal utilities and such tend to be STL-like
| although often with safer defaults.
| tialaramex wrote:
| Lots of the C++ standard library, including the STL
| containers isn't provided in freestanding C++. Now, in
| reality freestanding C++ has been kind of a joke - the
| committee for years barely bothered to keep it working -
| especially compared to freestanding C (which is well defined
| and used all over the place) and say Rust no_std (likewise) -
| and so many embedded systems may have the entire standard
| library notionally available even though parts of it are
| definitely nonsense for them and they've got local rules
| saying not to use the parts that would definitely explode in
| their environment... but many C++ programmers who have worked
| under such rules just reflexively avoid the STL's containers
| and maybe its algorithms even in an environment where those
| would work.
| olliej wrote:
| Essentially the same things but reimplemented safely - see
| WTF in webkit.
|
| There are still issues (the iterator API used by for(:) is
| very hard to make safe without terrible perf issues, though I
| was looking at this recently and the compilers are doing much
| better than they used to).
|
| Things like unique_ptr and shared_ptr do not meaningfully
| improve the security of c++ despite being presented as if
| they did (all serious c++ projects already had smart pointers
| before the stl finally built them in so presenting them as a
| security improvement is disingenuous), and because of the
| desire to have shared_ptr be noninvasive it's strictly worse
| than most other shared ownership smart pointers I've used.
| tialaramex wrote:
| The overwhelming C++ priority beating both safety and
| performance, often to the consternation of the performance
| people, is backwards compatibility with dusty archaic code. If
| it was written by somebody whose funeral was last century, WG21
| thinks it's important that it still compiles in your C++ 23
| compiler whenever you get one of those. Not _crucial_. Not so
| that they actually defined the language sensibly to avoid
| compatibility problems, but _important_ enough to trump mere
| performance or safety concerns.
|
| _Last century_ move didn't exist. The terrible C++ move
| (which is basically an actual "destructive move" plus a default
| create step) was invented for C++ 11 which was, as the name
| suggests, standardised only in 2011.
|
| So back then everybody is using _copy_ assignment semantics.
| Your compiler _might_ be smart enough, especially for trivial
| cases, to spot the cheap way to deliver the required semantics,
| but it might not especially as things get tricky (e.g. a
| std::vector of std::list) and semantically it's definitely a
| copy, not a move.
|
| As a result the "non-move" that you're astonished by is how
| _all C++ code last century_ was written, the semantics you're
| just assuming as necessary didn't even exist in ISO C++ 98 and
| it is considered important that such code still works.
| chaboud wrote:
| That's a bit like saying you know C++ but not streams or
| templates, or C but not floating point operations. It's
| probably worth learning STL.
|
| Anyway, the reason to use move instead of copy is for
| performance. Move constructors are faster because they can
| leave the source object modified (e.g., take over control of a
| pointer to deep contents). This falls apart when the move
| constructor can throw, because the container _might_ be part
| way through a resize when this happens, leaving the object
| before the exception modified and the code in an unrecoverable
| state.
|
| Basically, unless we can be super duper 100% certain we're
| going to make it through the operation without throwing an
| exception, we're going to copy, leaving the objects in question
| in an unaltered state, and holding to the promises of the
| standard.
| inetknght wrote:
| > _leaving the object before the exception modified and the
| code in an unrecoverable state_
|
| It isn't likely to leave the _code_ in an unrecoverable state
| even if recovery is calling std::terminate (or worse).
|
| It is likely to leave the _data_ in an unrecoverable state.
| Imagine that a vector of 4 items was resized -- the first two
| objects move successfully, but the third one throws an
| exception. Then in your move function, you catch that and
| decide to undo your changes before propagating the error.
| Then when you're undoing the changes the first object throws
| an exception when it's being moved back. Oof! At best you've
| got multiple active exceptions (legal if you're in the catch
| handler, but should be rare and definitely should be avoided)
| and at worst your data is indeed unrecoverable (thus one of
| many reasons why std::terminate is the default option when
| multiple exceptions are alive on the same stack).
| olliej wrote:
| I phrased that badly - it should have been "I don't know
| every edge case in the STL, and so I don't know why this
| would have different behavior".
|
| However thanks for explaining the issue. This one is obvious
| and I just completely failed to think about how you ensure
| the source object is in a safe state if an exception occurs
| part way through moving the source data. It seems to imply
| the old MSVC behaviour was incorrect in such a scenario, but
| I hadn't considered that possibility so assumed it was
| correct and therefore didn't think of why this behaviour is
| required.
|
| My solution is of course to simply not allow exceptions
| because the c++ model of everything implicitly throwing is
| just as annoying as Java's "let's explicitly annotate
| everything" model albeit with different paths to sadness.
| inetknght wrote:
| Throw specifications do not change function call binding
| behavior.
|
| Move constructor and move operator will bind to an R-value
| reference if the move constructor or move operators are
| available. Conversely, if those functions which declare to not
| throw anything do end up throwing something then the result is
| std::terminate.
|
| The only things that determine whether to use a move or copy is
| whether the reference is an R-value and whether the source or
| destination is const.
|
| You _can_ declare a {const R-value move operator (not a
| constructor) for the left-hand} and/or {const R-value move
| operator or constructor for the right-hand side} of the
| argument. But you won't be able to modify anything not marked
| mutable. You shouldn't do that though: that's a sure way to
| summon nasal demons. That said, I see it fairly often from less
| experienced engineers, particularly when copy-pasting a copy
| operator intending to modify it for a move operator.
| kentonv wrote:
| I think the other replies may have misunderstood your question.
| I think you are asking:
|
| Why does std::vector<T> require T's move constructor to be
| noexcept (or else it falls back to copying instead)?
|
| The reason goes something like this:
|
| When std::vector<T> grows, it needs to move or copy all of its
| elements into a new, larger-capacity array. It would prefer to
| move them, since that's a lot more efficient than copying (for
| non-trivial types). But what happens if it moves N elements,
| and then the move constructor for element N+1 throws an
| exception? Elements 0-N have been moved away already, so the
| vector is no longer valid as-is. Should it try to move those
| elements _back_ to the original array? But what if one of
| _those_ moves fails?
|
| The C++ standards body decided to sidestep this whole problem
| by saying that std::vector<T> will refuse to use T's move
| constructor unless it is declared noexcept, so the above
| problem can't happen.
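|
| A minimal sketch of that rule in action, assuming C++17. The
| types `ThrowingMove` and `NoexceptMove` and the helper
| `copies_during_growth` are hypothetical names made up for this
| illustration; they only count which constructor the vector
| picks when it has to relocate its elements.

```cpp
#include <utility>
#include <vector>

// Hypothetical element type: the move constructor is NOT declared noexcept.
struct ThrowingMove {
    static inline int copies = 0;
    static inline int moves = 0;
    ThrowingMove() = default;
    ThrowingMove(const ThrowingMove&) { ++copies; }
    ThrowingMove(ThrowingMove&&) { ++moves; }  // may throw, as far as the compiler knows
};

// Same shape, but the move constructor promises not to throw.
struct NoexceptMove {
    static inline int copies = 0;
    static inline int moves = 0;
    NoexceptMove() = default;
    NoexceptMove(const NoexceptMove&) { ++copies; }
    NoexceptMove(NoexceptMove&&) noexcept { ++moves; }
};

// Builds a vector of four elements, forces one reallocation, and returns how
// many copy constructions that reallocation performed.
template <typename T>
int copies_during_growth() {
    std::vector<T> v;
    v.reserve(4);
    for (int i = 0; i < 4; ++i) v.emplace_back();  // constructed in place, no copies
    T::copies = 0;
    T::moves = 0;
    v.reserve(v.capacity() + 1);  // must allocate a new buffer and relocate all four
    return T::copies;
}
```

| Calling `copies_during_growth<ThrowingMove>()` should report
| four copies, while the `NoexceptMove` version should report
| none (its four relocations show up in `NoexceptMove::moves`
| instead).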
|
| In my opinion, this was a huge mistake. Intuitively, everyone
| expects that when an std::vector<T> grows, it's going to move
| the elements, not make a ton of copies. Often, these copies
| result in hidden performance problems. Arguably the author of
| this post is lucky that in their case, the copies resulted in
| outright failure, thus revealing the problem.
|
| There seem to be two other possibilities:
|
| * std::vector<T> could simply refuse to compile if the move
| constructor was not `noexcept`. I think this could have been
| done in a way that wouldn't have broken existing code, if it
| had been introduced before move constructors existed in the
| wild -- unfortunately, that ship has now sailed and this cannot
| be done now without breaking people.
|
| * std::vector<T> could always use move constructors, even if
| they are not declared `noexcept`, and simply crash
| (std::terminate()) in the case that one actually throws. IMO
| this would be fine and is the best solution. Move constructors
| almost never actually throw in practice, regardless of whether
| they are declared as such, because move constructors are almost
| always just "copy pointer, null out the original". You don't
| put complex logic in your move constructor. And anyway, C++
| already has plenty of precedent for turning poorly-timed
| exceptions into terminations; why not add another case? But I
| think it's unlikely the standards committee would change this
| now.
| Joker_vD wrote:
| Honestly, trying to move the elements back and calling
| std::abort if that fails seems fine. It is indeed an
| exceptional happenstance, and how quickly you can recover
| from it is probably not as important as being able to recover
| correctly. And who catches exceptions around
| resize()/push_back() anyway?
| quuxplusone wrote:
| I have a blog post on the topic here:
| https://quuxplusone.github.io/blog/2022/08/26/vector-pessimi...
|
| The TLDR is: Using `move_if_noexcept` instead of plain old
| `move` can help you provide the "strong exception guarantee."
| For what _that_ is, see cppreference:
| https://en.cppreference.com/w/cpp/language/exceptions#Except...
|
| and/or the paper by Dave Abrahams that introduced the term,
| "Exception-Safety in Generic Components: Lessons Learned from
| Specifying Exception-Safety for the C++ Standard Library."
| https://www.boost.org/community/exception_safety.html
|
| > Is it something like prefer no throw copy constructor over
| "throwing" move?
|
| Almost. If move won't throw (or if copy isn't possible), we'll
| move. But given a choice between a throwing move and any kind
| of copy, we'll prefer copy, because copy is non-destructive of
| the original data: if something goes wrong, we can roll back to
| the original data. If the original data's been moved-from, we
| can't.
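|
| The selection rule above can be checked at compile time. This
| is only a sketch, assuming C++17; the three element types and
| the `yields_rvalue` helper are hypothetical names for
| illustration, while `std::move_if_noexcept` itself is the
| standard facility from <utility>.

```cpp
#include <type_traits>
#include <utility>

// Copyable, but the move constructor might throw.
struct CopyableThrowingMove {
    CopyableThrowingMove() = default;
    CopyableThrowingMove(const CopyableThrowingMove&) = default;
    CopyableThrowingMove(CopyableThrowingMove&&) {}
};

// Copyable, and the move constructor is noexcept.
struct CopyableNoexceptMove {
    CopyableNoexceptMove() = default;
    CopyableNoexceptMove(const CopyableNoexceptMove&) = default;
    CopyableNoexceptMove(CopyableNoexceptMove&&) noexcept {}
};

// Not copyable at all, and the move constructor might throw.
struct MoveOnlyThrowingMove {
    MoveOnlyThrowingMove() = default;
    MoveOnlyThrowingMove(const MoveOnlyThrowingMove&) = delete;
    MoveOnlyThrowingMove(MoveOnlyThrowingMove&&) {}
};

// True when std::move_if_noexcept hands back an rvalue (so a move happens),
// false when it hands back a const lvalue reference (so a copy happens).
template <typename T>
inline constexpr bool yields_rvalue = std::is_rvalue_reference_v<
    decltype(std::move_if_noexcept(std::declval<T&>()))>;

static_assert(!yields_rvalue<CopyableThrowingMove>, "throwing move loses to copy");
static_assert(yields_rvalue<CopyableNoexceptMove>, "noexcept move is preferred");
static_assert(yields_rvalue<MoveOnlyThrowingMove>, "no copy available, so move anyway");
```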
| mgaunard wrote:
| UB is a feature; people who keep on fighting it are such a
| pain.
|
| Regarding your question, nothrow operations are essential to
| maintaining invariants. And maintaining invariants is how you
| make code correct in a world where UB exists.
| TillE wrote:
| Storing a pointer to memory that you did not explicitly allocate
| is always a red flag, I think. You really need to understand how
| everything works, and be very careful.
|
| I would default to just using std::unique_ptr<Node> in a
| situation like this, especially since using std::list suggests
| performance isn't critical here, so the additional indirection
| probably doesn't matter.
| kentonv wrote:
| unique_ptr seems inappropriate here since the pointers aren't
| unique. shared_ptr doesn't even work because it looks like this
| data structure is representing a graph and would expect to have
| cycles. Perhaps you could use some sort of weak pointer that
| gets nulled out when the target object is destroyed, but that
| would not have fixed the bug here, just replaced segfaults with
| some more controlled exception or panic.
|
| In fact, full-on garbage collection wouldn't have prevented the
| bug here, and could arguably have made it worse. The problem is
| that nodes were unexpectedly being copied and then the
| originals deleted. With GC, you'd still have the copy, but
| never delete the originals, so you end up with split brain
| where there are multiple copies of each node and various
| pointers point to the wrong ones. That'd be pretty painful to
| debug!
|
| IMO the language-level problem, if there is one, is that C++ is
| too willing to copy in cases where you would expect it to move
| instead. This is, of course, for backwards compatibility with
| the before-times when there was no move and copying was the
| only logical thing to do. But I think life would be better
| these days if all non-trivial copies had to be requested
| explicitly.
| inetknght wrote:
| > _Perhaps you could use some sort of weak pointer that gets
| nulled out when the target object is destroyed_
|
| std::shared_ptr comes with std::weak_ptr. Reference
| counting is a rather ham-fisted approach but is certainly a
| solution.
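|
| A small sketch of the weak-pointer behaviour being discussed.
| `Node` and `id_or_missing` are hypothetical names for this
| example; `std::weak_ptr::lock()` is the standard call that
| yields an empty `std::shared_ptr` once the target is gone.

```cpp
#include <memory>

struct Node {
    int id = 0;
};

// Returns the node's id while it is still alive, or -1 once every owning
// shared_ptr has released it.
int id_or_missing(const std::weak_ptr<Node>& link) {
    if (std::shared_ptr<Node> alive = link.lock()) {
        return alive->id;
    }
    return -1;
}
```

| As noted upthread, this turns a dangling access into a
| detectable "missing" case; it would not have stopped the nodes
| from being copied in the first place.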
|
| > _IMO the language-level problem, if there is one, is that
| C++ is too willing to copy in cases where you would expect it
| to move instead._
|
| IMO that's not a problem in the language but a problem with
| the engineer (misunderstanding when std::move is necessary)
| and the tooling (linter/static analyzer not clearly
| identifying that something should be moved instead, and
| raising a linter warning for it).
|
| For that matter, the places where I see std::list used aren't
| places where "performance isn't important" but rather places
| where an inexperienced engineer was put in charge of
| implementation and a senior engineer accepted it. I can't
| remember the last time I accepted someone using std::list in
| a code review because there has _always_ been a better design
| available even if it necessitated some teaching. If a stable
| pointer address is needed then indeed a smart pointer is the
| correct solution (perhaps std::vector<std::unique_ptr>).
| There are other reasons I've had coworkers cite for using
| std::list (eg constant allocation time) but that's generally
| resolved with std::vector.reserve(upper bound to size) or eg
| a slab allocator (unfortunately, I'm not aware of a standard-
| provided slab allocator, though to be fair I'm not very
| familiar with C++ standard allocators in general).
|
| > _I think life would be better these days if all non-trivial
| copies had to be requested explicitly_
|
| While I don't agree superficially (smells like bringing along
| deep-copy problems), I think the idea merits some thought
| experiments.
|
| It would be fairly trivial to do that for non-plain-old-data
| types by deleting the copy constructor/operator (so it cannot
| happen implicitly) and providing a `make_copy(...)` function
| instead.
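|
| One way that could look, as a rough sketch assuming C++17.
| `Document` is a hypothetical type invented for the example,
| and `make_copy` is written here as a friend found through
| argument-dependent lookup; other designs are possible.

```cpp
#include <string>
#include <type_traits>
#include <utility>

// Hypothetical type whose copies are never implicit.
class Document {
public:
    explicit Document(std::string text) : text_(std::move(text)) {}

    Document(const Document&) = delete;             // no implicit copies
    Document& operator=(const Document&) = delete;
    Document(Document&&) noexcept = default;        // moves stay implicit and cheap
    Document& operator=(Document&&) noexcept = default;

    const std::string& text() const { return text_; }

    // The only way to duplicate a Document: ask for it by name.
    friend Document make_copy(const Document& src) { return Document(src.text_); }

private:
    std::string text_;
};

static_assert(!std::is_copy_constructible_v<Document>, "copies must be explicit");
static_assert(std::is_nothrow_move_constructible_v<Document>, "vector growth can move");
```

| With this, `Document b = a;` fails to compile, while
| `Document b = make_copy(a);` states the intent at the call
| site.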
| kentonv wrote:
| Agreed that std::list is never the best solution. In every
| case where I want linked list semantics, it's because I
| want to be able to dynamically add and remove objects from
| the list without any allocations at all. The only way to
| achieve that is an intrusive linked list design, which
| std::list is not...
| inetknght wrote:
| > _I want to be able to dynamically add and remove
| objects from the list without any allocations at all. The
| only way to achieve that is an intrusive linked list
| design_
|
| No it isn't.
| std::array<std::optional<std::pair<T,
| std::pair<std::size_t /* prev index */, std::size_t /*
| next index */>>>, 256U /* or whatever your maximum size
| is */>.
|
| Indexing does come with a wart: you'd need a sentinel value
| for a "no prev" or "no next" value, I'd just use
| std::numeric_limits<size_t>::max() for that. And of
| course when you move objects within the container you'd
| need to update the indices.
|
| If you don't know your upper size bound at compile time,
| then replace `array` with `vector`, and reserve your
| runtime-known upper bound. As long as you never violate
| your upper bound then no (re-)allocations occur (unless
| T's constructor allocates).
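|
| A rough sketch of that idea, assuming C++17. `IndexList` and
| its members are hypothetical names, and a small `Slot` struct
| stands in for the nested `std::pair`s above to keep it
| readable. Free slots are found by a linear scan, which is
| simple rather than fast.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <limits>
#include <optional>

// Hypothetical fixed-capacity doubly linked list whose links are indices.
template <typename T, std::size_t Capacity>
class IndexList {
public:
    // Sentinel meaning "no prev" / "no next" / "list is full".
    static constexpr std::size_t npos = std::numeric_limits<std::size_t>::max();

    // Appends value and returns its slot index, or npos if storage is full.
    std::size_t push_back(const T& value) {
        for (std::size_t i = 0; i < Capacity; ++i) {
            if (!slots_[i]) {
                slots_[i] = Slot{value, tail_, npos};
                if (tail_ != npos) {
                    slots_[tail_]->next = i;
                } else {
                    head_ = i;
                }
                tail_ = i;
                return i;
            }
        }
        return npos;
    }

    // Unlinks the element stored at index i and frees its slot.
    void erase(std::size_t i) {
        assert(i < Capacity && slots_[i].has_value());
        const Slot s = *slots_[i];
        if (s.prev != npos) { slots_[s.prev]->next = s.next; } else { head_ = s.next; }
        if (s.next != npos) { slots_[s.next]->prev = s.prev; } else { tail_ = s.prev; }
        slots_[i].reset();
    }

    std::size_t head() const { return head_; }
    std::size_t next(std::size_t i) const { return slots_[i]->next; }
    const T& at(std::size_t i) const { return slots_[i]->value; }

private:
    struct Slot {
        T value;
        std::size_t prev;
        std::size_t next;
    };
    std::array<std::optional<Slot>, Capacity> slots_{};
    std::size_t head_ = npos;
    std::size_t tail_ = npos;
};
```

| Nothing here touches the heap unless T's own constructor does,
| but the capacity has to be known up front.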
| kentonv wrote:
| In most (quite possibly _all_) cases I don't know the
| upper bound even at runtime. So, this ends up requiring
| allocation.
| o11c wrote:
| The good news is that since C++ containers aren't special to
| the language, you can just implement your own wrapper classes
| that disable the copy ctor (and provide an explicit
| `.clone()` instead). Coupled with `#pragma GCC poison` it is
| pretty easy to blacklist legacy footguns in source files at
| least (though not in headers without some aggressive work).
|
| ... and yet, almost all vulnerabilities in C++ code are still
| written in C style, not even legacy C++.
| kentonv wrote:
| Yeah I pretty much only use my own alternate container
| implementations (from KJ[0]), which avoid these footguns,
| but the result is everyone complains our project is written
| in Kenton-Language rather than C++ and there's no Stack
| Overflow for it and we can't hire engineers who know how to
| write it... oops.
|
| [0] https://github.com/capnproto/capnproto/blob/v2/kjdoc/tour.md
| jnwatson wrote:
| This is a great reminder of the pox that was Microsoft of the
| early part of the millennium. Besides an allergy to investing in
| web standards, they were woefully behind in their language
| support. Their non-adoption of modern C++ standards held client
| security back for a decade, and arguably held language standards
| development back.
| chaboud wrote:
| That hardly seems fair.
|
| (Microsoft was doing this to C++ well before the early part of
| the millennium...)
|
| Edit: and in non-joke fairness, Microsoft has really come a
| long way in this regard.
| pjmlp wrote:
| There is a certain irony complaining about Microsoft, while
| praising everyone else in regards to C and C++ compilers, as if
| outside the beloved GCC, in an age where clang did not exist,
| the other proprietary compilers were an example of perfection.
|
| Apparently the folks didn't learn their lesson with Web
| standards, given the power they gave Google to transform the
| Web into ChromeOS.
| userbinator wrote:
| IMHO this is another case where C++'s hidden layers of complexity
| hide bugs that would've been obvious in plain C. In fact for
| this particular use-case I'd probably use indices instead of
| pointers.
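|
| A sketch of the index-based layout, with hypothetical `Graph`,
| `Node` and `Connection` types loosely modelled on the ones in
| the test program elsewhere in this thread. The assumption is
| that nodes are only ever appended, so an index keeps meaning
| the same node even after the vector reallocates.

```cpp
#include <cstddef>
#include <vector>

// Connections name their endpoints by position in Graph::nodes, so a
// reallocation (or a copy) of the node storage cannot leave them dangling.
struct Connection {
    std::size_t from;
    std::size_t to;
};

struct Node {
    int value = 0;
    std::vector<Connection> connections;
};

struct Graph {
    std::vector<Node> nodes;

    // Appends a node and returns its index.
    std::size_t add_node(int value) {
        nodes.push_back(Node{value, {}});
        return nodes.size() - 1;
    }

    // Records a directed connection on the source node.
    void connect(std::size_t from, std::size_t to) {
        nodes[from].connections.push_back(Connection{from, to});
    }
};
```

| The trade-off is that removing a node from the middle would
| shift later indices, so this fits append-only or
| tombstone-style graphs best.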
| limaoscarjuliet wrote:
| I remember the C++ ca. 1994 year when I started my career. It
| was C with Objects back then. And it was great! C++ was a better
| C, and it was easy to convince any C dev to jump to it.
|
| I recently had to work with C++ code and... it is not a happy
| story anymore:
|
| - Lots of magic like described here
|
| - F**ing templates - for those who like them, did you ever see
| a C++ core file? Or tried to understand a single symbol?
|
| - Standardization that feels like pulling more Boost into the
| language, which means more templates. Which makes core files
| incomprehensible.
|
| It used to be that the average dev who knew C could read and work
| with simple C++. This is no longer true. C++ is no longer a
| better C.
|
| P.S. Example from recent core file, one line in the stack
| trace:
|
| boost::asio::asio_handler_invoke<boost::asio::detail::binder2<core::AsyncSignalCatcher<2, 15, 17>::waitForSignal<XServer::m_accept_loop(tsr::Data&)::<lambda(spawn::yield_context)>::<lambda(int)> >(XServer::m_accept_loop(tsr::Data&)::<lambda(spawn::yield_context)>::<lambda(int)>&&)::<lambda(const boost::system::error_code&, int)>, boost::system::error_code, int> > (function=...)
| tialaramex wrote:
| > I remember the C++ ca. 1994 year when I started my career.
| It was C with Objects back then
|
| It is possible, and perhaps even likely that the C++ you
| _wrote_ in 1994 was indeed this "better C" and maybe even
| the C++ you read, the language Stroustrup wrote about in 1985
| although "C with Objects" is much older still. The ISO
| document (C++ 98 aka ISO 14882:1998) was four years into your
| future in 1994, but the committee to _write_ that document
| had existed for quite some time. Stroustrup's 1991 Second
| Edition of his appropriately already big book "The C++
| Programming Language" explains that the committee have
| accepted Templates, although I believe at that point they
| didn't realise they'd inadvertently thus added an entirely
| new meta-language which is programmed differently than the
| rest of C++
|
| Boost comes _much_ later, it's only about as old as the
| actual ISO document.
| limaoscarjuliet wrote:
| I started with Stroustrup's C++, it already had templates
| back then.
| thriftwy wrote:
| I believe that not many developers really wanted C++ and
| craved Object Pascal instead.
|
| I don't like Pascal but I have to admit that Object Pascal
| is a succinct and successful addition of OO to a manually
| memalloc language, whereas C++ is neither.
| forrestthewoods wrote:
| boost is just awful. Don't use boost.
|
| templates are totally fine for basic containers. Much better
| than C macro shenanigans.
|
| The world is still looking for a "better C". Zig, Odin, Jai
| and probably more are all trying. Of the three I think Jai is
| the closest. But it'll be awhile.
| inetknght wrote:
| > _- F*ing templates - for those who like them, did you ever
| see a C++ core file? Or tried to understand a single symbol?_
|
| I have 20+ years of experience writing C++.
|
| Yes, I've looked at "core" C++ headers and source. The most
| annoying part to me is style (mixed tabs and spaces, curly
| braces at wrong places, parenthesis or not, the usual style
| complaints). But other than that they're very readable to a
| seasoned C++ engineer.
|
| I've also tried to understand symbols. You're right, they're
| difficult. But there's also tooling available to do it
| automatically. Even if you don't want to use the tools, there
| is a method to the madness and it's documented...
|
| Let me ask ChatGPT:
|
| > _What tool lets me translate an exported symbol name to a
| C++ name?_
|
| c++filt
|
| It's categorized as a demangler. That's your search term to
| look for (I had to remember what it was).
|
| Then I asked:
|
| > _Is there a function in the standard library which allows
| to mangle or demangle a given name or symbol?_
|
| It tells me about `__cxa_demangle` for GCC. While I had
| forgotten about that, I'm pretty sure it (or perhaps
| something similar) is in the standard library.
|
| It also suggests using a library function such as
| `abi::__cxa_demangle`. Hah, that's what I was looking for.
| It's an implementation-specific (eg, compiler-specific) API
| used as an example. It was mentioned on the
| `std::type_info::name()` page here:
|
| https://en.cppreference.com/w/cpp/types/type_info/name
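|
| For reference, a minimal sketch of calling it, assuming a
| GCC- or Clang-style toolchain that ships <cxxabi.h>. That
| header belongs to the Itanium C++ ABI support library rather
| than ISO C++, so this will not build with MSVC. The
| `demangle` wrapper and `FreeDeleter` are hypothetical helper
| names.

```cpp
#include <cstdlib>
#include <cxxabi.h>  // GCC/Clang only: Itanium C++ ABI support, not ISO C++
#include <memory>
#include <string>

// __cxa_demangle returns a malloc'd buffer, so release it with free().
struct FreeDeleter {
    void operator()(char* p) const { std::free(p); }
};

// Returns the demangled form of `mangled`, or the input unchanged if the
// name could not be demangled.
std::string demangle(const char* mangled) {
    int status = 0;
    std::unique_ptr<char, FreeDeleter> out(
        abi::__cxa_demangle(mangled, nullptr, nullptr, &status));
    return (status == 0 && out) ? std::string(out.get()) : std::string(mangled);
}
```

| For example, `demangle("_Z3foov")` should give `foo()`, which
| is the same answer `c++filt` prints on the command line.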
|
| So, to continue replying to you: yes, it's annoying but it's
| solvable with tools that you can absolutely integrate into
| your IDE or command-line workflow.
|
| > _- Standardization that feels like pulling more Boost into
| the language, which means more templates._
|
| The boost libraries are open source and their mailing lists
| are active. If you don't like a given library because it has
| too many templates then you could make one with fewer
| templates.
|
| And, as standardization goes, it's also quite open source.
| The C++ committee is very open and receptive to improvements.
| The committee are volunteers (so their time is limited) and
| (usually) have their own improvements to the standard that
| they want. So you have to drive the changes you want (eg,
| actively seek feedback and engagement).
|
| > _P.S. Example from recent core file, one line in the stack
| trace:_
|
| I've seen much longer -- I've seen templates entirely fill a
| terminal buffer for a single line. That's extremely rare,
| definitely not fun, and debuggability is absolutely a valid
| reason to refactor the application design (or contribute
| library changes).
|
| I find it useful to copy the template vomit into a temporary
| file and then run a formatter (eg clang-format), or
| search/replace `s/(<\\({[)/\1\n/g` and manually indent. Then
| the compiled type is easier to read.
|
| Some debuggers also understand type aliases. They'll replace
| the aliased type with the name you actually used, and then
| separately emit a message (eg, on another line) indicating
| the type alias definition (eg, so you can see it if you don't
| have a copy of the source)
| quotemstr wrote:
| In far more cases, zero-cost abstractions make bugs obvious or
| impossible that would be hard to spot in C programs, e.g.
| memory lifetime rule violations. And you could make a similar
| argument that C obscures bugs that would be obvious in
| assembly. High level languages are a blessing, and programmers
| who avoid them are only decreasing their own productivity and
| that of those around them.
|
| The problem the article highlights appears to be an
| implementation defect: in my libstdc++ test just now, we do, in
| fact, mark the list as nothrow move constructible. The standard
| should mandate that std::list be infallibly moveable.
|
| Are we going to indict a whole programming model based on an
| isolated implementation bug? If so, well, isn't doing that from
| a "C is better" perspective the galactic black hole calling the
| kettle black?
|
|     #include <cstdio>
|     #include <list>
|     #include <type_traits>
|     #include <vector>
|
|     struct Node;
|     struct Connection { Node *from, *to; };
|     struct Node { std::vector<Connection> connections; };
|
|     int main() {
|         printf("%d\n",
|                std::is_nothrow_move_constructible_v<std::list<Node>>);
|         return 0;
|     }
| camblomquist wrote:
| I mentioned it in a side note that I trimmed because there
| were so many that it spilled into the footer (faster to trim
| the article than to fix the CSS,) but Microsoft is the only
| implementation of the big three that doesn't mark the move
| constructor here as nothrow. The standard doesn't require it
| so it's valid for MSVC to do things the way they do, it just
| creates problems like this that would arguably be harder to
| find the cause of if one had to build code for multiple
| platforms.
| quotemstr wrote:
| Right. My point is that 1) this is a quality-of-
| implementation issue in MSVC, 2) the standard should be
| phrased such that the MSVC implementation is illegal, and
| 3) the C++ standard library solves a lot more problems than
| it creates despite having warts like this and C++ having
| some unfortunate defaults (e.g. mutability by default).
| wakawaka28 wrote:
| This is a noob mistake, not a huge mystery. It's not always wrong
| to store raw pointers to STL container elements, but if you do
| then you must take care of reallocations.
|
| If you find storing pointers to elements too perilous, you should
| probably just make a container of pointers instead.
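|
| A sketch of the container-of-pointers approach, with a
| hypothetical `Node` type and helper name. The vector may
| reallocate its array of `std::unique_ptr`s as often as it
| likes; the nodes themselves live in their own heap
| allocations and stay where they are.

```cpp
#include <memory>
#include <vector>

struct Node {
    int id = 0;
};

// Appends `count` heap-allocated nodes and reports whether the first node's
// address is still the same after all the growth of the vector itself.
bool pointee_survives_growth(int count) {
    std::vector<std::unique_ptr<Node>> nodes;
    nodes.push_back(std::make_unique<Node>());
    Node* first = nodes.front().get();
    for (int i = 1; i < count; ++i) {
        nodes.push_back(std::make_unique<Node>());
    }
    return first == nodes.front().get() && first->id == 0;
}
```

| Since `std::unique_ptr` is move-only with a noexcept move,
| the vector has no option of copying the elements on growth.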
| gsliepen wrote:
| There is a lesser known cousin to std::vector that doesn't have
| to move nor copy its elements when adding new elements, and that
| is std::deque.
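|
| A quick sketch of that property; the helper name is
| hypothetical. The standard guarantees that inserting at
| either end of a `std::deque` leaves references and pointers
| to existing elements valid (iterators are still invalidated,
| and inserting or erasing in the middle invalidates
| references too).

```cpp
#include <deque>

// Starts with one element, appends `extra` more at the back, and reports
// whether the first element is still at its original address.
bool deque_front_stays_put(int extra) {
    std::deque<int> values;
    values.push_back(0);
    const int* before = &values.front();
    for (int i = 1; i <= extra; ++i) {
        values.push_back(i);
    }
    return before == &values.front() && *before == 0;
}
```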
___________________________________________________________________
(page generated 2024-04-14 23:00 UTC)