[HN Gopher] A curiously recurring lifetime issue
___________________________________________________________________
A curiously recurring lifetime issue
Author : irsagent
Score : 62 points
Date : 2023-12-16 17:37 UTC
(HTM) web link (blog.dureuill.net)
| jimberlage wrote:
| Is there a good tutorial on Valgrind for beginners? It's a tool
| I've only ever seen praised so I'm curious to play with it a bit.
| gavinhoward wrote:
| As an avid Valgrind user, I don't know of one.
|
| But if you would like, I could post one to my blog.
|
| If you would like me to, contact me privately. [1]
|
| [1]: https://gavinhoward.com/contact/
| kentonv wrote:
| In my experience all you really have to do is:
|
| 1. Write a test program that exercises your segfault.
|
| 2. Build it in debug mode.
|
| 3. Run `valgrind <my-program>`
|
| And it just tells you where your problem is. Not much more to
| it.
| phendrenad2 wrote:
| Referential transparency be damned, I guess. This feels like an
| inherent downside to languages where you have to manage
| lifetimes.
| mungaihaha wrote:
| Nice read
| dataflow wrote:
| I think [[clang::lifetimebound]] would let the compiler detect
| this at compile time?
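A minimal sketch of what dataflow describes, assuming Clang. The `Response` and `ListView` types are hypothetical (not Cap'n Proto's API); the getter's implicit object parameter is marked with `[[clang::lifetimebound]]`, telling Clang that the returned view refers into `*this`. The `LIFETIMEBOUND` macro expands to nothing on compilers without the attribute. Whether Clang actually flags the commented-out line for a user-defined view type may depend on the Clang version, so treat that comment as an expectation rather than a guarantee.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Expands to [[clang::lifetimebound]] where the attribute exists, else nothing.
#if defined(__has_cpp_attribute)
#  if __has_cpp_attribute(clang::lifetimebound)
#    define LIFETIMEBOUND [[clang::lifetimebound]]
#  endif
#endif
#ifndef LIFETIMEBOUND
#  define LIFETIMEBOUND
#endif

// Hypothetical non-owning view over a list of ints.
class ListView {
public:
    ListView(const int* data, std::size_t size) : data_(data), size_(size) {}
    std::size_t size() const { return size_; }
    int operator[](std::size_t i) const {
        assert(i < size_);
        return data_[i];
    }

private:
    const int* data_;  // points into storage owned by someone else
    std::size_t size_;
};

// Hypothetical owner of the backing storage.
class Response {
public:
    explicit Response(std::vector<int> values) : values_(std::move(values)) {}
    // The annotation says: the returned view must not outlive *this.
    ListView getList() const LIFETIMEBOUND {
        return ListView(values_.data(), values_.size());
    }

private:
    std::vector<int> values_;
};

Response makeResponse() { return Response(std::vector<int>{1, 2, 3}); }

// Correct usage: the owner is a named variable that outlives the view.
int sumKeptAlive() {
    Response response = makeResponse();
    ListView list = response.getList();
    int sum = 0;
    for (std::size_t i = 0; i < list.size(); ++i) sum += list[i];
    return sum;
}

// ListView dangling = makeResponse().getList();
// ^ With Clang this may be reported by a -Wdangling-family warning: the
//   temporary Response is destroyed at the end of the full-expression.
```

Same-statement uses such as `makeResponse().getList().size()` should stay warning-free, since the temporary lives until the end of the full-expression; only keeping the view past that point is the problem.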
| IshKebab wrote:
| > Is this on Cap'n Proto? Honestly, I don't know.
|
| It absolutely is. It's a fairly basic principle that APIs should
| be difficult to misuse, and the fact that you made the same
| mistake 2/3 times shows that it is _very_ easy to misuse. In
| other words it is a badly designed API. At the very least it
| should be called ListView.
|
| I am not a big fan of Cap'n Proto. It has some neat features but
| it's very complicated, full of footguns like this, and the API is
| extremely unergonomic.
| dmeybohm wrote:
| Yeah I agree, the ListView naming is more appropriate.
|
| I haven't used non-owning types like string_view or span very
| much, because I haven't needed that level of performance or
| memory optimization yet; without those needs they just seem like
| footguns compared to a plain reference. I do like to use a
| technique in classes that hand out non-owning references, which
| would work for those types too and prevents this particular
| problem.
|
| For that, there are two methods with the same name but different
| ref-qualifiers: an lvalue version and an rvalue version. Then,
| you delete the rvalue method like this:
|
|     class Response {
|     public:
|         auto getListView() & -> ListView {
|             return ListView(m_List);
|         }
|         void getListView() && = delete;
|     };
|
| Then you get a compile error, much as in Rust, when you try to
| call getListView() on a temporary object, but calling the method
| on an lvalue still works, and the view stays valid as long as the
| object is in scope.
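A self-contained version of dmeybohm's technique, as a sketch with hypothetical `Response` and `ListView` types. Unlike the original snippet, both overloads are `const`-qualified here (an assumption) so const lvalues can also get a view. A small C++17 detection trait confirms at compile time that the deleted rvalue overload rejects temporaries.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>
#include <utility>
#include <vector>

// Hypothetical non-owning view.
class ListView {
public:
    ListView(const int* data, std::size_t size) : data_(data), size_(size) {}
    std::size_t size() const { return size_; }
    int operator[](std::size_t i) const {
        assert(i < size_);
        return data_[i];
    }

private:
    const int* data_;  // non-owning
    std::size_t size_;
};

class Response {
public:
    explicit Response(std::vector<int> list) : m_List(std::move(list)) {}
    // Lvalue-only getter: the Response outlives the call, so the view can be kept.
    auto getListView() const & -> ListView { return ListView(m_List.data(), m_List.size()); }
    // Rvalues prefer this overload, and it is deleted: calling on a temporary won't compile.
    void getListView() const && = delete;

private:
    std::vector<int> m_List;
};

// Detects whether getListView() can be called on an expression of type T.
template <typename T, typename = void>
struct can_get_list_view : std::false_type {};
template <typename T>
struct can_get_list_view<T, std::void_t<decltype(std::declval<T>().getListView())>>
    : std::true_type {};

static_assert(can_get_list_view<Response&>::value, "lvalues may get a view");
static_assert(!can_get_list_view<Response>::value, "temporaries are rejected");
```

The trade-off is that any chain starting from a temporary `Response`, even one that finishes using the view within the same statement, is rejected as well.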
| bsder wrote:
| > In other words it is a badly designed API.
|
| I don't agree. The API is what it is _because_ it is
| specifically a zero-copy API for performance. If you don't
| care about performance, why are you using C++ (stupid) and a
| zero-copy API (doubly stupid)?
|
| I absolutely do _NOT_ expect a zero copy API to own things. If
| I drop the underlying reference that is really an alias, how on
| earth is that the fault of the zero copy API?
|
| The combination of aliasing and lifetimes is a _C++_ footgun,
| full stop. This is aptly demonstrated by how quickly Rust kills
| this cold.
|
| If you use sharp knives, sometimes you cut your fingers. People
| like you would claim the knife is the problem.
| plagiarist wrote:
| My takeaway is just the last paragraph, it sounds like Cap'n
| Proto is a footgun and I should use anything else.
| nyanpasu64 wrote:
| I'd say that method chaining (referential transparency, etc.) and
| implicit destructor calls with side effects don't mix.
|
| I have a general rule that "resource" types which own a heap
| allocation should usually be given a variable name with explicit
| scope (and likely even an explicit type, rather than `auto
| response` like in this post). This is a general guideline to
| avoid holding a reference to a temporary that gets destroyed, but
| doesn't protect against returning a dangling reference into a
| resource type from a function.
|
| In other places, where languages make the opposite decision (from
| this blog post) to _extend_ the lifetime of a temporary variable
| with a destructor when you call methods on it, you get things
| like C++'s temporary lifetime extension (not a bug, note that I
| don't understand it well), and footguns like Rust's `match
| lock.lock().something {}` (https://github.com/rust-lang/lang-
| team/blob/master/design-me...).
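A small sketch of the C++ lifetime-extension rule nyanpasu64 mentions, using a hypothetical `Tracker` type that counts its live instances (C++17, for `static inline`). Binding a temporary directly to a `const&` keeps it alive for the reference's scope; binding through a member function that returns a reference does not, which is the same shape as the dangling view in the post. The dangling reference is never read.

```cpp
#include <cassert>

// Counts how many Tracker objects are currently alive.
struct Tracker {
    static inline int live = 0;
    Tracker() { ++live; }
    Tracker(const Tracker&) { ++live; }
    ~Tracker() {
        assert(live > 0);  // guards against unbalanced destruction
        --live;
    }
    // Returns a reference to *this; binding its result does NOT extend lifetime.
    const Tracker& self() const { return *this; }
};

Tracker make() { return Tracker{}; }

// Direct binding: the temporary lives as long as `t`.
int liveWithDirectBinding() {
    [[maybe_unused]] const Tracker& t = make();
    return Tracker::live;  // 1
}

// Indirect binding: the temporary dies at the end of the full-expression,
// leaving `r` dangling (so it must never be used).
int liveWithIndirectBinding() {
    [[maybe_unused]] const Tracker& r = make().self();
    return Tracker::live;  // 0
}
```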
| HarHarVeryFunny wrote:
| Seems like someone trying to be too clever to me, and perhaps a
| case of premature optimization. Non-owning references are a
| problem waiting to happen. Even if your language/api allows you
| to check if the reference is still valid before use, you can
| obviously forget to do so.
|
| Rather than use a non-owning reference I'd rather use a design
| that didn't need it, or just use a std::shared_ptr owning
| reference instead. I realize there are potential cases (i.e. one
| can choose to design such cases) of circular references where a
| non-owning reference might be needed to break the circular chain,
| or where one wants a non-owning view of a data structure, but
| without very careful API design and code review these are easy to
| mess up.
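For comparison, a sketch of the owning alternative HarHarVeryFunny suggests, with hypothetical names: each view holds a `std::shared_ptr` to the backing buffer, so keeping only the view (the pattern that dangles in the post) becomes safe. The costs are the ones raised elsewhere in the thread: every view copy touches an atomic reference count, and any single view keeps the entire buffer alive.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical view that co-owns the whole backing buffer.
class SharedListView {
public:
    SharedListView(std::shared_ptr<const std::vector<int>> buf, std::size_t begin, std::size_t size)
        : buf_(std::move(buf)), begin_(begin), size_(size) {}
    std::size_t size() const { return size_; }
    int operator[](std::size_t i) const {
        assert(i < size_);
        return (*buf_)[begin_ + i];
    }

private:
    std::shared_ptr<const std::vector<int>> buf_;  // copying a view bumps an atomic count
    std::size_t begin_;
    std::size_t size_;
};

// Hypothetical owner; it shares its buffer with every view it hands out.
class Response {
public:
    explicit Response(std::vector<int> data)
        : buf_(std::make_shared<std::vector<int>>(std::move(data))) {}
    SharedListView getList() const { return SharedListView(buf_, 0, buf_->size()); }

private:
    std::shared_ptr<const std::vector<int>> buf_;
};

Response makeResponse() { return Response(std::vector<int>{1, 2, 3}); }
```

Here `SharedListView list = makeResponse().getList();` is fine even though the `Response` itself is a temporary.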
| kentonv wrote:
| This sounds nice but it just isn't realistic. If you try to
| write a complex system in C++ without non-owning references,
| you're basically heap-allocating every single object and using
| slow atomic refcounting everywhere. Performance will likely be
| much worse than just using a garbage collected language to
| start with.
| GuB-42 wrote:
| The problem I see here is that one of the functions returns a
| pointer and it doesn't use the usual pointer syntax.
|
| I see no *, no & and no -> in the code. So I would assume
| everything to behave as if it was owned or even copied. Had it
| returned actual pointers, or pointer-like objects like iterators,
| it would have been more obvious.
| amluto wrote:
| This is C++ we're talking about.
|
|     auto x = y();
|
| Is x a pointer or reference? There's no way to tell. _Maybe_ if
| you then do
|
|     x->foo();
|
| you have some idea that x is pointer-ish, but unique_ptr works
| like this and isn't very pointer-ish.
| kentonv wrote:
| Author of Cap'n Proto here.
|
| The main innovation of Cap'n Proto serialization compared to
| Protobuf is that it doesn't copy anything, it generates a nice
| API where all the accessor methods are directly backed by the
| underlying buffer. Hence the generated classes that you use all
| act as "views" into the single buffer.
|
| C++, meanwhile, is famously an RAII language, not
| garbage-collected. In such languages, you have to keep track of which
| things own which other things so that everyone knows who is
| responsible for freeing memory.
|
| Thus in an RAII language, you generally don't expect view types
| to own the underlying data -- you must separately ensure that
| whatever does own the backing data structure stays alive. C++
| infamously doesn't really help you with this job -- unlike Rust,
| which has a type system capable of catching mistakes at compile
| time.
|
| You might argue that backing buffers should be reference counted
| and all of Cap'n Proto's view types should hold a refcount on the
| buffer. However, this creates new footguns. Should the
| refcounting be atomic? If so, it's really slow. If not, then
| using a message from multiple threads (even without modifying it)
| may occasionally blow up. Also, refcounting would have to keep
| the _entire_ backing buffer alive if any one object is pointing
| at it. This can lead to hard-to-understand memory bloat.
|
| In short, the design of Cap'n Proto's C++ API is a natural result
| of what it implements, and the language it is implemented in. It
| is well-documented that all these types are "pointer-like",
| behaving as views. This kind of API is very common in C++,
| especially high-performing C++. New projects should absolutely
| choose Rust instead of C++ to avoid these kinds of footguns.
|
| In my experience each new developer makes this mistake once,
| figures it out, and doesn't have much trouble using the API after
| that.
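To make the "views into the single buffer" idea concrete, here is a sketch of a reader whose accessors decode fields directly from a byte buffer instead of copying them into a separate object. The `PointReader` type, its layout (two 32-bit fields), and native byte order are assumptions for illustration; this is not Cap'n Proto's wire format or generated code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical reader for a struct stored as two 32-bit fields (x, then y).
// It holds only a pointer and a size: nothing is copied or owned.
class PointReader {
public:
    PointReader(const unsigned char* data, std::size_t size) : data_(data), size_(size) {}
    std::int32_t getX() const { return load(0); }
    std::int32_t getY() const { return load(sizeof(std::int32_t)); }

private:
    std::int32_t load(std::size_t offset) const {
        assert(offset + sizeof(std::int32_t) <= size_);
        std::int32_t value;
        std::memcpy(&value, data_ + offset, sizeof value);  // decode in place
        return value;
    }
    const unsigned char* data_;  // non-owning: the buffer must outlive the reader
    std::size_t size_;
};

// Builds a buffer in the assumed layout (native byte order).
std::vector<unsigned char> encodePoint(std::int32_t x, std::int32_t y) {
    std::vector<unsigned char> buf(2 * sizeof(std::int32_t));
    std::memcpy(buf.data(), &x, sizeof x);
    std::memcpy(buf.data() + sizeof x, &y, sizeof y);
    return buf;
}
```

If the buffer passed to `PointReader` were a temporary, the reader would dangle in exactly the way the list does in the post.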
| foxhill wrote:
| apologies, perhaps i'm missing something here, having not used
| cap'n proto in any context at all before.
|
| is it not possible to delete the rvalue reference overload of
| 'getList'?
|
| as far as i can tell, the error-producing code would then have
| failed to build in the first place, rather than producing a
| runtime diagnostic, like the rust case?
| kentonv wrote:
| That would catch some legitimate use cases, where you get the
| list and immediately use it on the same line. Admittedly this
| is not so common for lists, but very common for struct
| readers, e.g.:
|
|     int i = call.send().getSomeStruct().getValue();
|
| Here, even though `send()` returns a response that is not
| saved anywhere, and a struct reader is constructed from it,
| the struct reader is used immediately in the same line, so
| there's no use-after-free.
|
| Someone else mentioned using lifetimebound annotations. This
| will probably work a lot better, avoiding the false
| positives. It just hadn't been done because the annotations
| didn't exist at the time that most of Cap'n Proto was
| originally written.
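A sketch of why kentonv's `call.send().getSomeStruct().getValue()` pattern is safe, using hypothetical `Call`, `Response`, and `StructReader` types plus a global counter of live responses (all invented for illustration): the temporary returned by `send()` survives until the end of the full-expression, so the reader is used while its backing data still exists.

```cpp
#include <cassert>
#include <utility>
#include <vector>

int g_liveResponses = 0;  // hypothetical instrumentation

// Hypothetical non-owning reader into a Response's data.
class StructReader {
public:
    explicit StructReader(const std::vector<int>& v) : v_(&v) {}
    int getValue() const {
        assert(g_liveResponses > 0);  // sanity check: some Response is still alive
        return v_->front();
    }

private:
    const std::vector<int>* v_;  // non-owning
};

class Response {
public:
    explicit Response(std::vector<int> v) : values_(std::move(v)) { ++g_liveResponses; }
    Response(const Response& o) : values_(o.values_) { ++g_liveResponses; }
    ~Response() { --g_liveResponses; }
    StructReader getSomeStruct() const { return StructReader(values_); }

private:
    std::vector<int> values_;
};

struct Call {
    Response send() const { return Response(std::vector<int>{42}); }
};

int readInOneStatement() {
    Call call;
    // The temporary Response lives until the end of this full-expression,
    // so getValue() reads valid data.
    return call.send().getSomeStruct().getValue();
}
```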
| foxhill wrote:
| i could be wrong, but i'm reasonably confident that this is
| UB for even trivial types? someone more knowledgeable with
| the language lawyering would need to opine one way or the
| other.
|
| regardless of that outcome, i think i'd prefer to require a
| value preserving the lifetime of the reader/view. in the
| cases that it may not be necessary, i'd prefer to lean on
| the optimiser to take care of it..!
| kentonv wrote:
| What's UB about it? Any temporary objects constructed
| during the evaluation of a statement live until the end
| of the statement. The standard is clear on that.
|
| > i think i'd prefer to require a value preserving the
| lifetime of the reader/view. in the cases that it may not
| be necessary, i'd prefer to lean on the optimiser to take
| care of it..!
|
| We'd all prefer APIs that cannot be used unsafely but
| realistically there's no magic the optimizer can do to
| make the problems with refcounting go away. You need to
| use a language like Rust to solve this.
| foxhill wrote:
| ah, sorry, i didn't read that correctly.
|
| perhaps for values like this you're fine. i think my
| point still stands about the reader of a built-in
| list/sequence type, surely?
|
| and, not to sound facetious, that's exactly what
| optimisers do :)
|
| the c++ type system is more than capable of reasoning
| about lifetimes, the issue is that, with c++, it's an
| optional part of the language. also, the lack of destructive
| moves. but to require both of those things in
| the language would require, essentially, the borrow
| checker in rust.
| kentonv wrote:
| Oh actually there's a much more obvious case where
| prohibiting getters on rvalues would be a problem. It would
| prevent you from doing this in general:
|
|     myReader.getFoo().getBar()
|
| Here, `myReader` is already a view type; ownership of the
| backing buffer lives elsewhere. `getFoo()` returns a reader
| for some sub-struct, and `getBar()` returns a member of
| that struct. If we say getters are not permitted to be
| called on rvalues, this expression is illegal, but there's
| no actual problem with it and in practice we write code
| like this all the time.
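A sketch of the distinction kentonv draws, with hypothetical names and a made-up layout (word 0 holds the offset of a two-field Foo struct): only the owning `Message` deletes its rvalue getter, while the reader types, being cheap views into a buffer owned elsewhere, can be chained freely on temporaries. This is one possible compromise, not Cap'n Proto's actual design, and it would still reject a same-statement chain that starts from a temporary owner.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Reader for a Foo struct whose second field is "bar".
class FooReader {
public:
    explicit FooReader(const int* fields) : fields_(fields) { assert(fields != nullptr); }
    int getBar() const { return fields_[1]; }  // fine to call on a temporary reader

private:
    const int* fields_;  // non-owning
};

// Reader for the root struct; word 0 holds the offset of Foo.
class RootReader {
public:
    explicit RootReader(const int* words) : words_(words) {}
    FooReader getFoo() const { return FooReader(words_ + words_[0]); }

private:
    const int* words_;  // non-owning
};

// The owner of the backing buffer.
class Message {
public:
    explicit Message(std::vector<int> words) : words_(std::move(words)) {}
    RootReader getRoot() const & { return RootReader(words_.data()); }
    RootReader getRoot() const && = delete;  // no views into a dying owner

private:
    std::vector<int> words_;
};

// Chained getters on reader temporaries: the buffer is owned by `m`.
int readBar(const Message& m) { return m.getRoot().getFoo().getBar(); }
```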
___________________________________________________________________
(page generated 2023-12-17 23:01 UTC)