[HN Gopher] Move semantics in Rust, C++, and Hylo
___________________________________________________________________
Move semantics in Rust, C++, and Hylo
Author : tajpulo
Score : 97 points
Date : 2024-11-29 16:15 UTC (6 days ago)
(HTM) web link (lukas-prokop.at)
(TXT) w3m dump (lukas-prokop.at)
| einpoklum wrote:
| Q: What's "Hylo"? Should I have heard of it?
|
| A: It's a niche programming language the author is involved with.
| It's not widely-used enough to get its own Wikipedia page. It
| used to be called "Val". See: https://www.hylo-lang.org/
| Gualdrapo wrote:
| Maybe it's just me, but I'm no fan of them using the keyword
| 'fun' to define a function. Nor Rust's 'fn'.
|
| Also, isn't it a bit strange that they wrote "rust" throughout the
| article instead of "Rust"?
| einpoklum wrote:
| Well, you're no fun :-(
|
| Anyway, that's pure bikeshedding. "function" is a full word
| in English, but almost 3x the length.
| consp wrote:
| And thus 3x more readable than fn. And otherwise it's bkshd
| for a 150% reduction.
| zozbot234 wrote:
| To be fair, "fun" is also a full word in English. Also,
| it's just plain fun.
| diggan wrote:
| Personally I prefer `defn` for defining functions. `fn` is
| just a function that hasn't been declared or defined,
| obviously.
| amaurose wrote:
| It's the brainchild of Dave Abrahams, who is rather big in C++.
|
| https://www.youtube.com/watch?v=5lecIqUhEl4
| bluetomcat wrote:
| > So apparently, move does not prevent generation of a copy, but
| the empty string instead of expected text "Dave" is very
| interesting. Apparently, after termination of show after the
| move, the object is invalidated. This does not affect the Person
| object, but only the string object.
|
| This is a shallow understanding of C++. It happens because the
| Person object is a POD type that doesn't define a move
| constructor, and the compiler creates a default one that calls
| the move constructors of the members. The string member has a
| well-defined move constructor, but the primitive uint8_t type
| doesn't.
| flohofwoe wrote:
| A move constructor/operator for POD or primitive types doesn't
| make any sense in the first place though (also AFAIK an object
| that contains a std::string - like Person - is definitely _not_
| a POD?). Even if Person had a manually provided move-
| constructor and move-assignment-operator, a move would still
| perform a flat copy from the source to the destination object.
| gpderetta wrote:
| Correct on all counts. It is definitely not a POD nor a
| standard layout type (the modern version of POD).
| mort96 wrote:
| Person has an implicitly generated constructor and destructor
| which calls std::string's constructor and destructor. It's non-
| POD.
| bluetomcat wrote:
| > It's non-POD.
|
| For a stricter definition of POD which requires that byte-by-
| byte copies are possible. More informally, it's a POD because
| it only defines members and all the constructors and
| destructors are implicitly generated.
| flohofwoe wrote:
| I've never seen this definition of 'POD' tbh, 'Plain Old
| Data' kinda implies that it behaves the same as a C struct
| when copying and destructing (e.g. the compiler is able to
| use a memcpy for copying, and destruction is a no-op - neither
| is the case when there's an embedded std::string
| object).
| mort96 wrote:
| I haven't heard your personal informal definition of POD
| before. I've only concerned myself with the standard's
| definition of POD. If you were using a different definition
| of POD than the standard, you should have specified that.
| Or better yet, not used the term "POD", since it is widely
| understood to mean what the standard refers to as "POD".
|
| EDIT: It seems I've had a slightly incorrect impression of
| "POD": what makes 'Person' non-POD isn't that it has an
| implicitly defined constructor but simply that it contains
| a non-POD type. The requirements for POD classes[1]
| includes "has no non-static data members of type non-POD
| class (or array of such types)". std::string is certainly a
| non-POD class, which makes discussion about Person's
| constructors and destructors moot. Not that it changes
| anything, but I don't wanna spread misinformation.
|
| [1] https://en.cppreference.com/w/cpp/language/classes#POD_class
| gpderetta wrote:
| you are probably confusing POD with aggregate.
| jcranmer wrote:
| The historical notion of POD is that it's a class type that
| has no C++ shenanigans going on, and thus works like it
| does in C. As a result, while there are a few slightly
| different definitions of POD, all of them share the
| commonality that having a non-POD member makes the class
| non-POD; in other words, POD-ness has a recursive quality.
|
| It doesn't make a lot of sense to _not_ have this recursive
| quality to POD-ness, because the fact that C++ shenanigans
| are involved doesn't go away just because it's implicitly
| handled for you by the compiler.
| elteto wrote:
| POD means you can memcpy without incurring undefined
| behavior, same as you would in C to copy a struct.
| bluescarni wrote:
| > So apparently, move does not prevent generation of a copy, but
| the empty string instead of expected text "Dave" is very
| interesting. Apparently, after termination of show after the
| move, the object is invalidated. This does not affect the Person
| object, but only the string object. Recognize that I speak about
| a factual behavior on the hardware. I think we have undefined
| behavior here. And no compilation error.
|
| There is a lot wrong in this paragraph:
|
| - a "copy" was not generated, at least not in the sense that the
| actual content of the string was copied anywhere;
|
| - there's no undefined behaviour here and no invalidation of the
| string. Standard library types are required to be left in an
| unspecified but valid state after move. "Valid" here means that
| you can go on and inspect the state of the string after move, so
| you can query whether it is empty or not, count the number of
| characters, etc. etc. "Unspecified" means that the implementation
| gets to decide what is the status of the string after move. For
| long enough strings, typical implementation strategy is to set
| the moved-from string in an empty state.
| flohofwoe wrote:
| > at least not in the sense that the actual content of the
| string was copied anywhere
|
| ...unless it's a short string within the limits of the small-
| string-optimization capacity.
|
| I think what confuses many people is that a C++ move assignment
| still can copy a significant amount of bytes since it's just a
| flat copy plus 'giving up' ownership of dangling data in the
| source object.
|
| For a POD struct, 'move assignment' and 'copy assignment' are
| identical in terms of cost.
| mort96 wrote:
| I mean it'll copy 3 pointers worth of data in all cases. It's
| just that for short strings, those 3 pointers worth of data
| contains the text of the string.
| fluoridation wrote:
| I feel like that's a pedantic detail. True, yes, but
| irrelevant. You may as well also point out that the return
| address is going to be copied to the instruction pointer when
| the constructor returns.
| jvanderbot wrote:
| It should be, but it's very much not in the real world at
| least as far as I've seen.
|
| Using std::move for anything other than "unique ownership
| without pointers" really messes things up. People put
| std::move everywhere expecting performance gains, just like
| we used to put "&" everywhere expecting performance gains.
| It's a bit of cargo cultism that can be nicely dispelled by
| realizing std::move is just std::copy with a compiler-
| defined constructor invocation potentially run to determine
| the old value. With that phrasing, it's hard to hallucinate
| performance gains that might come automatically.
| colejohnson66 wrote:
| In fact, using std::move everywhere can actually make
| your performance _worse_!
|
| https://devblogs.microsoft.com/oldnewthing/20231124-00/?p=10...
| gpderetta wrote:
| > std::move is just std::copy with a compiler-defined
| constructor invocation potentially run to determine the
| old value
|
| I have no idea what that means.
|
| std::move is a cast to an rvalue reference. That can
| potentially trigger a specific overloaded function to be
| selected and possibly, ultimately, a move constructor or
| assignment operator to be called.
|
| For an explicit move to be profitable, an expression
| would have otherwise chosen a copy constructor for a type
| with an expensive copy constructor and a cheap move
| constructor.
|
| std::copy is a range algorithm, not sure what's the
| relevance.
| jvanderbot wrote:
| Yes, typed too fast. I meant the explicit copy
| constructor. Luckily, HN will hide my garbage text quickly
| enough. Thanks for the correction!
| Asraelite wrote:
| I think it's a worthwhile distinction to bring up because
| it highlights a common misconception people have about
| strings and vectors. A string value is not the string
| content itself, just a small struct containing a pointer
| and other metadata. If we're talking about the in-depth
| semantics of a language then it's important to point out
| that this struct _is_ the string, and the array of UTF-8
| characters it points to is not. C++ obfuscates this
| distinction because of how it automatically deep copies
| vectors and strings for you in many cases.
| epcoa wrote:
| > then it's important to point out that this struct is
| the string, and the array of UTF-8 characters it points
| to is not.
|
| So then under this model, what's the difference between a
| string and a string_view?
| Asraelite wrote:
| ...one is a string and one is a string view?
|
| I'm not sure what you're getting at. They're both small
| structs holding pointers to char data, they just operate
| on that data differently.
| Maxatar wrote:
| Exactly, thinking about things in terms of their
| implementations is usually not a good way to actually
| understand what that thing is. By arguing that
| std::string is just the struct itself, which consists of
| who knows what... you fail to appreciate the actual
| semantics of std::string and how those semantics are
| really what defines the std::string.
|
| std::string_view also has implementation details that in
| principle could be similar to std::string, it's a pointer
| with a size, but the semantics of std::string_view are
| very different from the semantics of std::string.
|
| And that's the crux of the issue, it's better to
| understand classes in terms of their semantics, how they
| operate, rather than their implementations.
| Implementations can change, and two very separate things
| can have the same or very similar implementations.
|
| A std::string is not just some pointers and some record
| keeping data; a std::string is best understood as a class
| used to own and manage a sequence of characters with the
| various operations that one would expect for such
| management. A std::string_view is non-owning, read-only
| variation of such a class that operates on an existing
| sequence of characters.
|
| How these are implemented and their structural details is
| not really what's important, it's how someone is expected
| to use them and what can be done with them that counts.
| Asraelite wrote:
| My original comment was just saying that it's useful to
| point out to people that the concrete representation of a
| string in memory is a struct when relevant, since some
| people might not realize that. I'm not claiming anything
| about the best way to think about it overall.
|
| > How these are implemented and their structural details
| is not really what's important
|
| Usually this isn't important, unless you're talking about
| low level details impacting performance, which is
| _exactly what the article is about_.
| epcoa wrote:
| > Usually this isn't important, unless you're talking
| about low level details impacting performance,
|
| And if you're going down that path, the string may not
| have a pointer at all.
|
| "A string value is not the string content itself", but in
| most cases it is if the string is short enough,
| implementation dependent disclaimer and all that.
| epcoa wrote:
| That I think the description "the array is not the
| string" isn't very elucidating for someone that doesn't
| understand the nuance of the ownership/lifetime and move
| semantics (the topic of the article).
|
| "C++ obfuscates this distinction because of how it
| automatically deep copies vectors and strings"
|
| It does this because it _has_ to, to guarantee its
| interface invariants. That "array" (if there is one)
| really is the string. Just because there might be an
| indirection doesn't change that.
|
| > they just operate on that data differently.
|
| Well they operate on the _memory_ "array" of the char
| data differently (well in the latter not at all).
|
| Also a nitpick: std::string unlike String in Rust or
| other languages is not married to an encoding. And C++
| managed to fuck that one up even more so recently.
| quietbritishjim wrote:
| It's a real semantic difference, not a pedantic detail: It
| means that there is a practical reason that the moved-from
| object could be non-empty.
|
| A few standard library types _do_ guarantee that the moved-
| from object is empty (e.g., the smart pointer types).
|
| For some others (basically, all containers except string),
| it is not explicitly stated that this is the case but it is
| hard to imagine an implementation that doesn't (due to time
| complexity and iterator invalidation rules). Arguably, this
| represents a bigger risk than string's behaviour, but it's
| still interesting.
| fluoridation wrote:
| >It's a real semantic difference, not a pedantic detail
|
| What's the semantic difference? Of course moving a class
| will involve _some_ amount of copying. How could it be
| any other way? If you have something like struct { int
| a[1000]; }, how are you supposed to move the contents of
| the struct without copying anything? What, you take a
| pair of really tiny scissors and cut a teeny tiny piece
| of the RAM, then glue the capacitors somewhere else?
| Joker_vD wrote:
| > how are you supposed to move the contents of the struct
| without copying anything?
|
| By taking the physical page this one struct resides in,
| and mapping it into the virtual address space the second
| time. This approach is usually used in the kernel-level
| development, but there has been _a lot_ of research done
| since the seventies on how to use it in runtimes for
| high-level programming languages.
|
| Now, it does involve copying an address of this struct
| from one place to another, that I cede.
| fluoridation wrote:
| Sure. At the cost of needing >=4K per object, since
| otherwise "moving" an object involves also moving the
| other objects sharing the same page.
| gpderetta wrote:
| You can think of a c++ move as a shallow copy that takes
| ownership of all objects originally owned by the source.
| jvanderbot wrote:
| The real gem of the article is the interlude. E.g., reaching
| back to C days and pointing out that "It's either copy, or
| pointer". Once someone has that mental model solidly in hand,
| all the syntax sugar in the world cannot harm you.
|
| Also "It was an ergonomic advancement." hides a lot of the
| overwrought syntax sugar in C++ that causes it to be such a
| _weird_ language if you come from elsewhere. But still an
| excellent insight into the state of affairs.
|
| I think the "Apparently" language makes it seem like this is
| some kind of accident that nobody would know about, when
| really the author was probably just being a creative writer,
| and the example was fundamental to the post.
| nemetroid wrote:
| The same is true of Rust. I have no idea why the author
| decided to print addresses only for C++ and not for Rust.
|
|     // (1)
|     struct Person {
|         name: String,
|         age: u8,
|     }
|
|     fn show(person: Person) {
|         println!("Person record is at address {:p}", &person);
|         println!("{} is {} years old", person.name, person.age);
|     }
|
|     fn main() {
|         let p = Person { name: "Dave".to_string(), age: 42 }; // (2)
|         println!("Person record is at address {:p}", &p);
|         show(p); // (3)
|     }
|
| Its output is:
|
|     Person record is at address 0x7ffcfb2b4e40
|     Person record is at address 0x7ffcfb2b4ec0
|     Dave is 42 years old
| bluGill wrote:
| There is a lot wrong, but your analysis misses the elephant: the
| function takes a copy and so a copy must be generated.
| std::move will move if possible but in this case move isn't
| possible and so a copy will be made.
|
| Move is allowed to not move because in generic code you don't
| want to have to check whether a move is possible for the type in
| question.
| littlestymaar wrote:
| C++ making the most inscrutable semantic possible, speedrun
| any %.
| GrantMoyer wrote:
| In the case of the example, there is a move, and std::move
| works in the example.
|
| The function, show, doesn't take a copy, it takes a Person
| object. Persons can be copy constructed or move constructed
| (both constructors are implicit, since there's no user-
| defined constructors). std::move returns an r-value reference
| to main's p, so Person's implicit move constructor is called,
| and show's p argument is move constructed from main's p. The
| reported address changes because moving creates a new object
| in C++, but the moved-to object may take ownership of the
| heap allocated memory and other resources from the moved-from
| object.
|
| In this case, the moved-to Person takes ownership of the heap
| allocation from the moved-from Person's string member and
| sets the moved-from Person's string member to an empty
| string. Without std::move, show's p is copy constructed,
| including its string member.
| virtualritz wrote:
| > "Unspecified" means that the implementation gets to decide
| what is the status of the string after move. For long enough
| strings, typical implementation strategy is to set the moved-
| from string in an empty state.
|
| Thusly, what happens in code that accesses the string after the
| move is UB.
|
| In the implementation of C++ the article uses the string was
| just empty. But for all we know it may still contain a 1:1 copy
| of the original or 20 copies or a gobbledygook of bytes.
|
| Any code that relies on the string being something (even empty)
| may behave different if it isn't. That's the very definition of
| UB.
|
| "A typical implementation strategy" is meaningless for someone
| writing code against a language specification.
|
| You're then writing code against a specific compiler/std lib
| and that's fine. But let's be honest about it.
| UncleMeat wrote:
| That's not what UB means. "This will behave differently on
| different implementations" is implementation defined
| behavior. Compilers are not allowed to assume that
| implementation defined behavior never occurs or reject your
| program if they can prove that it happens.
|
| Undefined behavior is a stronger statement and says that if
| the behavior occurs then the entire program is simply not
| valid. This allows the compiler to make vastly more
| aggressive changes to your program.
| Maxatar wrote:
| There is nothing in the standard or definition of C++ that
| states that undefined behavior renders a program invalid.
|
| On the contrary the actual C++ standard explicitly states
| that permissible undefined behavior includes, and I quote
| "behaving during translation or program execution in a
| documented manner characteristic of the environment".
|
| It's also worth noting that numerous well known and used
| C++ libraries explicitly make use of undefined behavior,
| including boost, Folly, Qt. Furthermore, as weird and
| ironic as this sounds, implementing cryptographic libraries
| is not possible without undefined behavior.
| gpderetta wrote:
| "valid program" is not really a term that is used in the
| standard (I only count one normative usage). What the
| standard does say is:
|
| "A conforming implementation executing a well-formed
| program shall produce the same observable behavior as one
| of the possible executions of the corresponding instance
| of the abstract machine with the same program and the
| same input. However, if any such execution contains an
| undefined operation, this document places no requirement
| on the implementation executing that program with that
| input (not even with regard to operations preceding the
| first undefined operation)."
|
| I.e. a program that contains UB is undefined.
|
| Of course, as you observe, an implementation can go
| beyond the standard and extend the abstract machine to
| give defined semantics to those undefined operations.
|
| That's still different from implementation defined
| behaviour, where a conforming implementation must give
| defined semantics.
| bluescarni wrote:
| > Thusly, what happens in code that accesses the string after
| the move is UB.
|
| No, it is implementation-defined behaviour.
|
| > In the implementation of C++ the article uses the string
| was just empty. But for all we know it may still contain a
| 1:1 copy of the original or 20 copies or a gobbledygook of
| bytes.
|
| Yes, and if you want to make sure that the string is empty
| before you do something else with it, you just use a clear()
| (which will be optimised away by the compiler anyway).
|
| Or, if you prefer, you can assign another string to it, or
| anything else really.
|
| > Any code that relies on the string being something (even
| empty) may behave different if it isn't. That's the very
| definition of UB.
|
| No it is not.
|
| > "A typical implementation strategy" is meaningless for
| someone writing code against a language specification.
|
| Then don't rely on that specific implementation detail and
| make sure that the string is in the state you want or, even
| better, don't touch the moved-from string ever again.
| einpoklum wrote:
| Not sure why the author compares Rust's:
|
|     println!("{} is {} years old", person.name, person.age);
|
| with C++:
|
|     cout << person.name << " is " << unsigned(person.age) << " years old" << endl;
|
| ... while C++ actually has:
|
|     println("{} is {} years old", person.name, person.age);
|
| essentially identical to Rust. See:
| https://en.cppreference.com/w/cpp/io/println
| glandium wrote:
| Probably because it's very new (C++23)
| vlovich123 wrote:
| Well C++23 is fairly new so they probably just didn't know
| about it?
| gpderetta wrote:
| std::cout << std::format(....) ;
|
| has been available since C++20. Still not really the point of
| the article.
| 0xffff2 wrote:
| C++20 is still fairly new. There are places where C++98 is
| still in use as C++11 is considered too cutting edge.
| cjfd wrote:
| Some people are noticing that println is very new. But there
| already is https://github.com/fmtlib/fmt and it has been there
| quite a long time.
| Philpax wrote:
| That would require introducing a dependency, which is a
| digression from the point of the article and would complicate
| reproduction for the reader.
| bangaladore wrote:
| I can assure you that using a new language is a
| substantially greater task than introducing a dependency
| (or using -std=c++23). So you might as well show off the
| latest and greatest for all the competitors.
| account42 wrote:
| Using random libraries in example code isn't good practice
| though.
|
| Still, even (C) printf would have been better than the
| iostreams monstrosity.
| tovej wrote:
| fmt is not a random library, it's the inspiration and
| reference implementation for std::format
| Aurelius108 wrote:
| It's very new to the standard library (latest version of GCC
| this year was the first version to support it). Additionally,
| I've found that println adds 30+ seconds to my compile time
| even for hello world so I'll be avoiding it unless I need it
| einpoklum wrote:
| > It's very new
|
| True, but Hylo is so new that it's not even an established
| language. Plus using this should serve to highlight the
| differences the author actually cares about between the
| languages.
| bangaladore wrote:
| https://godbolt.org/z/MTo11voes > println takes 9 seconds
| https://godbolt.org/z/he6Phr7nG > cout takes 6 seconds
|
| What machine / compiler are you on where the difference
| between these is 30 seconds? GCC is also quite a bit faster
| based on a quick test in godbolt.
| nicce wrote:
| > https://godbolt.org/z/MTo11voes > println takes 9 seconds
| https://godbolt.org/z/he6Phr7nG > cout takes 6 seconds
|
| That is a 50% increase.
| bangaladore wrote:
| I don't believe I claimed anywhere it is not a 50%
| increase. The OC said 30 second difference.
| nicce wrote:
| I missed the "Hello, world!" mention, but otherwise you
| only need to have 10 prints in your whole project to have
| the 30 second increase. That is pretty significant.
| bangaladore wrote:
| It is not linear in the number of prints. 1 vs 2 prints will
| likely have zero noticeable effect.
| eterevsky wrote:
| In C++ you can force the move of the parameter by wrapping it
| with std::move(); this should take care of unnecessarily cloning
| the argument in the example.
| masklinn wrote:
| std::move does not force anything, it is a cast to an rvalue
| reference (a movable-from).
|
| Whether the object is moved depends on whether the target /
| destination / sink cares.
| fluoridation wrote:
| >Apparently, after termination of show after the move, the object
| is invalidated. This does not affect the Person object, but only
| the string object. Recognize that I speak about a factual
| behavior on the hardware. I think we have undefined behavior
| here. And no compilation error.
|
| The std::string is not invalidated, it's reset to its empty state
| (i.e. null pointer and zero length). Standard classes are all in
| defined, valid states after being moved, such that using them
| again is safe. User-defined classes may be coded to be left in
| either valid or invalid states after being moved. It's the
| responsibility of the programmer to decide which is appropriate
| according to the situation. There are valid reasons to want to
| reuse a moved object. For example, you might want to force the
| release of an object's internal memory:
|
| std::string() = std::move(s);
|
| It's somewhat unfortunate that there's no way to signal to the
| compiler that an object is not safe for reuse, though.
| account42 wrote:
| While the language doesn't forbid use after move, occurrences of
| it are most likely a programmer error. Which is why clang-tidy
| has the bugprone-use-after-move check.
| alkonaut wrote:
| This sounds like an enormous footgun (but as I understand it
| there are warnings that will tell you). An object isn't "valid"
| in any reasonable business logic sense just because the fields
| are initialized to anything at all, such as their default
| state? If the valid state of a Person is "the name is not empty"
| and this is enforced by a constructor then I don't want the
| program to ever have Person object floating around with a blank
| name? I either want a compiler error (good) or an immediate
| crash at runtime (bad), but at least I don't want an invalid
| object in a still running program (worse). Maybe I
| misunderstand what the reset was or how big this risk is
| though.
| fluoridation wrote:
| >An object isn't "valid" in any reasonable business logic
| sense just because the fields are initialized to anything at
| all, such as their default state
|
| That very much depends on your use case.
|
| >If the valid state of a Person is "the name is not empty"
| and this is enforced by a constructor then I don't want the
| program to ever have Person object floating around with a
| blank name
|
| If you have such strict requirements then you shouldn't be
| moving around Persons to begin with. You should just be using
| std::make_unique() and then moving the pointer. Person should
| not even have a move constructor defined. If you code your
| class such that it's possible to let it reach an invalid
| state, that's no one's fault but your own.
| Joker_vD wrote:
| Even if the std::string was guaranteed to hold a "SORRY,
| THIS STRING HAS BEEN MOVED FROM, PLEASE CONTACT YOUR LOCAL
| STRING SUPPLIER" string in it after being moved from, I
| doubt this would actually help that much with the overall
| correctness of the application.
|
| There are very, very few cases where it is "sensible" to do
| anything with such an "arbitrarily conjured" state other
| than disposing of/overwriting it. In fact, the only example
| I can vaguely remember (and can't for the life of me
| google) is that one scheme of storing some sort of lookup
| index in two arrays that store indices into each other, and
| it's not necessary to zero out those arrays before using
| them because the access algorithm is cleverly arranged in
| such a way that no matter what numbers are stored in the
| unused parts of the arrays, it will still work correctly.
| fluoridation wrote:
| >Even if the std::string was guaranteed to hold a "SORRY,
| THIS STRING HAS BEEN MOVED FROM, PLEASE CONTACT YOUR
| LOCAL STRING SUPPLIER" string in it after being moved
| from
|
| That's a rather weak "even if", given most
| implementations just reset to the empty string after
| moving.
|
| >I doubt this would actually help that much with the
| overall correctness of the application.
|
| Like I said, it depends on your use case. A pattern I use
| frequently when processing input is to have an
| accumulator that I build up progressively, and then when
| ready I move it into a result container, and since that
| resets the accumulator I can simply keep using it. If my
| algorithm required the initial state "SORRY, THIS STRING
| HAS BEEN MOVED FROM, PLEASE CONTACT YOUR LOCAL STRING
| SUPPLIER" rather than the empty string, such an
| idiosyncratic post-move value would be rather convenient.
| gpderetta wrote:
| The lack of so called destructive moves in C++ is not great.
| You either add a proper empty state to your type and make it
| properly part of the invariant, which is not always possible
| or meaningful, or you need a special moved from state for
| which your object invariant doesn't hold, which is "less than
| ideal" to say the least.
| quietbritishjim wrote:
| > >Apparently, after termination of show after the move, the
| object is invalidated. This does not affect the Person object,
| but only the string object. Recognize that I speak about a
| factual behavior on the hardware. I think we have undefined
| behavior here. And no compilation error.
|
| You're right to pick up on this. The author of the article is
| confused here, or at least using incorrect terminology. There's
| certainly no "undefined behaviour" going on.
|
| But your corrections aren't quite right either, or at least use
| slightly odd definitions.
|
| > User-defined classes may be coded to be left in either valid
| or invalid states after being moved.
|
| No, even user defined classes have to be valid after a move,
| because their destructor will still be run. If you had your own
| vector-like class that points to invalid memory (or the same
| memory as the moved-to object) then you will get corruption
| when its destructor tries to free that memory.
|
| Ok, it's true that you could manually define an "invalid" state
| in your class, perhaps by adding an internal Boolean flag which
| you set when the object is moved from. Then you could throw an
| exception or abort or whatever when any method (except the
| destructor) is called with this flag set. But you'd have to go
| out of your way to do this and I've never seen it done. I don't
| think this is what most people would understand your statement
| to mean.
|
| > The std::string is not invalidated, it's reset to its empty
| state (i.e. null pointer and zero length).
|
| I'm not sure whether you're implying this is a strict
| requirement or just happens to be what happened in this case.
| In fact, the standard does not require this: the string could
| be left in any (valid, whatever that means) state. It could be
| empty, unchanged, or anything else. As other comments have
| noted, if the string's length is below the short string
| optimisation threshold then it's quite likely the original
| string will retain its value unchanged. Only a few specific
| types in the standard library have the guarantee that they will
| be empty after being moved from, and string isn't one of those.
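|
| A minimal C++ sketch of that distinction. The function names are
| made up for illustration. It relies only on two things: a
| moved-from std::string is valid but unspecified, while a
| moved-from std::unique_ptr is guaranteed to be null.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// Hypothetical helper: touches a moved-from string only in ways that are
// always allowed, whatever state the move left it in.
bool moved_from_string_is_reusable() {
    const std::string text =
        "a string long enough that it is unlikely to use the small buffer";
    std::string source = text;
    std::string target = std::move(source);

    // `source` is now valid but unspecified: it may be empty, unchanged, or
    // anything else, so its contents are deliberately not inspected here.
    // Operations without preconditions (clear, assignment) are fine.
    source.clear();
    source = "reassigned";

    return target == text && source == "reassigned";
}

// Hypothetical helper: unique_ptr is one of the few standard types that
// documents its moved-from state (it compares equal to nullptr).
bool moved_from_unique_ptr_is_null() {
    std::unique_ptr<int> first(new int(42));
    std::unique_ptr<int> second = std::move(first);
    assert(second != nullptr);
    return first == nullptr && *second == 42;
}
```

| Both helpers return true. The string version never reads the
| moved-from value, because the standard does not say what it holds.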
| stonemetal12 wrote:
| >No, even user defined classes have to be valid after a move,
| because their destructor will still be run.
|
| So the compiler will complain and not compile your program??
| Nope. It should be if you want a program that functions
| correctly, but have to? No, C++ doesn't force that on you.
| fluoridation wrote:
| >No, even user defined classes have to be valid after a move,
| because their destructor will still be run.
|
| By "valid" I mean that you can use the object like normal;
| being able to destruct the object is not enough. If the
| destructor is unsafe to run (for example because the object
| ends up owning a dangling pointer) you just have an outright
| bug. An invalid state would be one where any further use of
| the object (other than destroying it) is an error.
|
| >I'm not sure whether you're implying this is a strict
| requirement or just happens to be what happened in this case.
|
| Yes, I'm saying that's what happened in that case. The string
| was not invalidated, it was reset.
| saghm wrote:
| > I think before rust, language designers mixed up the various
| properties these values can have. As a result, many
| incomprehensible designs were the result. rust models the most
| important memory-related properties through its two call
| conventions (passing or borrowing). And Hylo moves even more
| properties into the call conventions. Namely, Hylo uses the
| keywords let, set, sink, and inout. This way Hylo additionally
| represents e.g. initialization (rust models this with a separate
| type).
|
| Is anyone able to clarify what's meant by "initialization" here
| and what "separate type" Rust uses for this (e.g. something
| defined specifically for each type getting passed this way, or a
| generic wrapper type in the standard library)? Offhand, my
| understanding is that three of the Hylo keywords listed
| correspond to passing by ownership, shared reference, or mutable
| reference in Rust, and whichever doesn't correspond to one of
| those is something that a separate type is used for in Rust, but
| I'm not confident that my understanding is correct because the
| only thing I can think of that might be related to
| "initialization" is constructors, which Rust notably does _not_
| have any formal concept of in the language, since functions that
| return types are just like any other function implemented on a
| type without a self parameter.
|
| I'm also not completely sure what distinction is intended
| between whatever the separate type is and references in Rust,
| since a reference is also a separate type from the type of the
| value it refers to. I could imagine someone might think that
| references are different than user-defined types in a way that
| other standard library types like Box and Arc aren't, but I'd
| argue that the unique syntax that references have is actually not
| that significant, and semantically being located inside std makes
| them far closer to references in terms of potentially behaving in
| special ways due to them having access to certain unstable APIs
| around things like allocations and the fact that std is developed in
| tandem with the compiler, which leaves the door open for those
| types to take advantage of any additional internal APIs that get
| added in the future.
| hmry wrote:
| My best guess is they're referring to writing functions that
| initialize something using an "out" parameter in Hylo, which
| would be equivalent to a "&mut MaybeUninit<...>" parameter in
| Rust.
| Measter wrote:
| They mean whether the value is properly initialized, as in all
| the bytes that make up that value have set values that are
| valid for that type. For example, in Rust the only valid values
| a boolean can have are 0 and 1, anything else is invalid.
| Notably, in the abstract machine, bytes actually have 257
| values: 0-255 and uninitialized. Uninitialized means that an
| initialized value was never written to it. Reading a value that
| is not properly initialized is undefined behaviour, and
| optimization passes can result in unpredictable changes in
| behaviour of the code.
|
| The type they mentioned is MaybeUninit
| (https://doc.rust-lang.org/std/mem/union.MaybeUninit.html), which
| is used to represent values that are not fully initialized. It's worth
| reading the documentation for that type.
| fuhsnn wrote:
| Copy or move for C++ is just choosing which
| constructor/assignment overload to call. I believe it's possible
| to make C++ move-by-default if one goes through the trouble of
| overloading every class you use with custom move procedures.
| quietbritishjim wrote:
| Most explanations of C++'s std::move fail because they don't
| focus on its actual effect: controlling function overloading.
|
| Most developers have no trouble getting the idea of C++'s
| function overloading for parameter types that are totally
| different, e.g. it's clear what foo("xyz") will call if you have:
|
|     void foo(int x);
|     void foo(std::string x);
|
| It's also not too hard to get the idea with const and mutable
| references:
|
|     void foo(std::string& x);
|     void foo(const std::string& x);
|
| Rvalue references allow another possibility:
|
|     void foo(std::string&& x);
|     void foo(const std::string& x);
|
| (Technically it's also possible to overload with rvalue and non-
| const regular references, or even all three, but this is rarely
| done in practice).
|
| In this pairing, the first option would be chosen for a temporary
| object (e.g. foo(std::string("xyz")) or just foo("xyz")), while
| the second would be chosen if passing in a named variable
| (std::string x; foo(x)). In practice, the reason you bother to do
| this is so that the first overload can pilfer memory resources
| from its argument (whereas, presumably, the second will need to
| do a copy).
|
| The point of std::move() is to choose the first overload. This
| has the consequence that its argument will probably end up being
| modified (by foo()) even though std::move() itself does not
| contain any substantial code.
|
| All of the above applies to constructors, since they are
| functions and they can also be overloaded. Therefore, the
| following function is very similar in most practical situations
| since std::string has overloaded copy and move constructors:
| void foo(std::string x);
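|
| A small sketch of the overload selection described above. `which`
| is a hypothetical stand-in for foo that only reports which
| overload ran. It does not pilfer anything, which also shows that
| std::move by itself leaves its argument untouched.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical overload pair: each one just reports which overload was picked.
std::string which(const std::string&) { return "const&"; }
std::string which(std::string&&) { return "&&"; }

bool overloads_behave_as_described() {
    std::string named = "xyz";

    const bool temporary_picks_rvalue = which(std::string("xyz")) == "&&";
    const bool literal_picks_rvalue = which("xyz") == "&&";
    const bool named_picks_const_ref = which(named) == "const&";
    const bool moved_picks_rvalue = which(std::move(named)) == "&&";

    // Neither overload modifies its argument, so `named` is still intact:
    // std::move only selected the overload, it did not move anything.
    assert(named == "xyz");

    return temporary_picks_rvalue && literal_picks_rvalue &&
           named_picks_const_ref && moved_picks_rvalue;
}
```

| Any change to the argument would have to come from the body of
| the && overload, not from the std::move call.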
| rocqua wrote:
| To clarify, you are saying the point of std::move is that it
| returns an rvalue reference, allowing the called function to
| pick the overload variant that is allowed to trample and
| destroy its argument?
|
| Specifically, what you did not make clear is the return type of
| std::move.
| ryanianian wrote:
| std::move is just a cast operation. A better name might be
| std::cast_as_rvalue to force the overload that allows it to
| forward to move constructors/etc that intentionally "destroy"
| the argument (leave it in a moved-from state).
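|
| To make the "just a cast" point concrete, here is a sketch of
| what such a function could look like. The name cast_as_rvalue is
| the hypothetical one suggested above, not a standard library
| function; the static_asserts check that it and std::move have the
| same return type for a std::string lvalue.

```cpp
#include <cassert>
#include <string>
#include <type_traits>
#include <utility>

// Hypothetical re-implementation of std::move under the suggested name.
// It contains nothing but a static_cast to an rvalue reference.
template <typename T>
constexpr typename std::remove_reference<T>::type&& cast_as_rvalue(T&& value) noexcept {
    return static_cast<typename std::remove_reference<T>::type&&>(value);
}

// Applied to an lvalue std::string, both yield std::string&&.
static_assert(std::is_same<decltype(std::move(std::declval<std::string&>())),
                           std::string&&>::value,
              "std::move returns an rvalue reference");
static_assert(std::is_same<decltype(cast_as_rvalue(std::declval<std::string&>())),
                           std::string&&>::value,
              "the hand-written cast has the same return type");

bool cast_selects_move_constructor() {
    const std::string text = "long enough that the characters probably live on the heap";
    std::string source = text;
    // The cast makes overload resolution pick std::string's move constructor.
    std::string target = cast_as_rvalue(source);
    assert(target == text);
    return target == text;
}
```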
| tialaramex wrote:
| They don't destroy the argument - this is of course a big
| problem because the semantic programmers actually wanted
| (even when C++ 98 didn't _have_ move and papers were
| proposing this new feature) was what C++ programmers now
| call "destructive move" ie the move Rust has. This is
| sometimes now portrayed as some sort of modern idea, but it
| actually was clearly what everybody wanted 15-20 years ago,
| it's just that C++ didn't deliver that.
|
| What they got was this awful compromise: it's not destroyed,
| C++ promises that it will only finally be destroyed when
| the scope ends, and always then, so instead some "hollowed
| out" state is created which is some state (usually
| unspecified but predictable) in which it is safe to destroy
| it.
|
| Creating the "hollowed out" new state for the moved-from
| object so that it can later be destroyed is not zero work,
| it's usually trivial, but given that we're not gaining any
| benefit by doing this work it's pure waste.
|
| This constitutes one of several unavoidable performance
| leaks in modern C++. They're not huge, but they're a
| problem when you still have people who mistake C++ for a
| performance language rather than a language like COBOL
| focused intently on compatibility with piles of archaic
| legacy code.
| Maxatar wrote:
| Thanks for pointing this out. It's an absolute myth that
| C++ move semantics are due to backwards compatibility.
| The original paper on move semantics dating back to 2002
| explicitly mentions destructive move semantics by name:
|
| https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2002/n13...
|
| It does bring up an issue involving how to handle
| destructive moves in a class hierarchy, and while that's
| an issue, it's a local issue that would need careful
| consideration only in a few corner cases as opposed to
| the move semantics we have today which sprinkle the
| potential for misuse all over the codebase.
| shortrounddev2 wrote:
| You can trample and destroy a regular lvalue reference as
| well. The point of casting to an rvalue reference (and
| invoking the rvalue reference constructor) is to copy a
| pointer to the underlying data of one container to a new
| container and then delete the pointer on the original
| container (set it to null, not destroy the data). This has
| the effect of transferring ownership of the underlying data
| from one container to the other. You can do this with an
| lvalue reference as well, but the semantics are different.
|
| This is useful for copying the data of a temporary string to
| another string without actually copying each byte of the
| data. Since the underlying characters live in the heap,
| there's no point in copying each byte to a new area in the
| heap. Instead, use move semantics to transfer ownership of
| the pointer to a new string container.
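|
| A sketch of the pointer hand-over described above, using a
| hypothetical Buffer class rather than std::string (whose
| internals are implementation-specific). The move constructor
| copies the pointer and nulls the source, so the source's
| destructor, which still runs, has nothing to free.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <utility>

// Hypothetical owning container, for illustration only.
class Buffer {
public:
    explicit Buffer(const char* text)
        : size_(std::strlen(text)), data_(new char[size_ + 1]) {
        std::memcpy(data_, text, size_ + 1);
    }

    // Move constructor: take over the heap block, then leave `other` in a
    // "hollowed out" state that is still safe to destroy.
    Buffer(Buffer&& other) noexcept : size_(other.size_), data_(other.data_) {
        other.size_ = 0;
        other.data_ = nullptr;
    }

    Buffer(const Buffer&) = delete;
    Buffer& operator=(const Buffer&) = delete;
    Buffer& operator=(Buffer&&) = delete;

    ~Buffer() { delete[] data_; }  // delete[] on nullptr is a no-op

    const char* data() const { return data_; }
    std::size_t size() const { return size_; }

private:
    std::size_t size_;  // declared first so it is initialized before data_
    char* data_;
};

bool move_transfers_pointer() {
    Buffer source("hello");
    const char* original = source.data();

    Buffer target(std::move(source));

    assert(std::strcmp(target.data(), "hello") == 0);
    // Same heap block, new owner; the source no longer points at it.
    return target.data() == original && source.data() == nullptr &&
           source.size() == 0 && target.size() == 5;
}
```

| Resetting the two fields of `other` is the small amount of extra
| work a non-destructive move has to do.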
| khold_stare wrote:
| I see some confusion in the comments about C++ moves. I wrote an
| article in 2013 after it clicked for me:
| https://kholdstare.github.io/technical/2013/11/23/moves-demy... .
| It goes over motivation, how it works under the hood etc, has
| diagrams if you are a more visual learner.
| pjmlp wrote:
| > We learned that working on pointers directly often leads to
| memory bugs. So we introduced references.
|
| Minor pedantic correction, references predate having pointers all
| over the place, in most systems languages.
|
| C adopting pointers for all use cases isn't as great as they
| thought.
| Thorrez wrote:
| >I compiled the C++ examples with godbolt with "x86-64 gcc
| (trunk)" and "-Wall -Wextra -Wno-pessimizing-move
| -Wno-redundant-move".
|
| Edit: everything below is incorrect.
|
| -Wno-pessimizing-move is automatically enabled by -Wall, so
| doesn't need to be specified manually. -Wno-redundant-move is
| automatically enabled by -Wextra, so doesn't need to be specified
| manually.
| quuxplusone wrote:
| -Wno-foo is turning _off_ those warnings, not turning them on.
| Thorrez wrote:
| Wow, thanks. The gcc documentation appears to have a problem.
|
| It lists -Wreorder as a warning, and says it's enabled by
| -Wall . It lists -Wno-pessimizing-move as a warning, and says
| it's enabled by -Wall .
|
| I think the documentation should be edited to not list
| -Wno-pessimizing-move , and instead list -Wpessimizing-move .
|
| https://gcc.gnu.org/onlinedocs/gcc-9.1.0/gcc/C_002b_002b-Dia...
| cpp_noob wrote:
|     struct Person {
|         string name;
|         uint8_t age;
|     };
|
| isn't this missing a move constructor?
|
|     Person::Person(Person&& p) : name(std::move(p.name)), age(p.age) {}
|
| or is C++ able to make these implicitly now?
| Maxatar wrote:
| The move and copy constructors are implicit.
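|
| A quick way to check this, sketched with the struct from the
| question (std:: qualifiers added). The nothrow trait is only
| indirect evidence: it holds because the implicitly generated move
| constructor calls std::string's noexcept move constructor,
| whereas copying a string may throw.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <type_traits>
#include <utility>

struct Person {
    std::string name;
    std::uint8_t age;
};  // no user-declared constructors, destructor, or assignment operators

static_assert(std::is_move_constructible<Person>::value,
              "Person can be constructed from an rvalue");
static_assert(std::is_nothrow_move_constructible<Person>::value,
              "the implicit move constructor moves each member");

bool implicit_move_is_memberwise() {
    const std::string text = "Dave, with padding so the name is not a short string";
    Person dave{text, 42};

    Person moved = std::move(dave);  // implicit Person(Person&&)

    // The string member was moved, so dave.name is valid but unspecified and
    // is not inspected. "Moving" a uint8_t is a plain copy, so dave.age keeps
    // its value.
    assert(moved.name == text);
    return moved.age == 42 && dave.age == 42;
}
```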
| nayuki wrote:
| Some basic things in the article appear to be factually wrong.
|
| > Then we ask us the following questions:
|
| > 1. When we passed Dave to show, did we create a copy?
|
| > 2. If so, how do we avoid creating a copy?
|
| > C++ example
|
| > 1. Yes. You can insert cout << "Person record is at address "
| << &p << endl; before the call of show as well as the beginning
| of show. This reveals different memory addresses of the record.
|
| Judging copies by the object's address is incorrect methodology.
| In both C++ and Rust, "moving" an object will still copy the
| struct fields, but will avoid copying any of the pointees (such
| as the variable-size array that the string owns).
|
| > 2. Replace void show(Person person) with void show(Person&
| person). So only the function needs to change. The caller does
| not have to adapt to it.
|
| Passing by reference is a different concept to moving. While the
| author used this approach for C++, they did not use the same
| approach for Rust. This is comparing apples to oranges.
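|
| A sketch of why the address test is misleading. Record is a
| hypothetical struct, and std::vector is used instead of
| std::string because vector move construction guarantees that
| pointers to the elements stay valid and refer to the new owner.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Hypothetical record with one heap-owning field and one plain field.
struct Record {
    std::vector<int> values;
    int id;
};

bool move_copies_fields_but_not_pointees() {
    Record source{std::vector<int>(1000, 7), 1};
    const int* heap_block = source.values.data();

    Record target = std::move(source);

    // Two distinct objects, so printing &source and &target shows two
    // different addresses even though no element was copied.
    const bool distinct_objects = (&source != &target);

    // The 1000 ints were not copied: the target now owns the same heap block.
    const bool same_heap_block = (target.values.data() == heap_block);

    assert(target.values.size() == 1000);
    return distinct_objects && same_heap_block && target.id == 1;
}
```

| Comparing what the heap pointer refers to, rather than the
| address of the struct, is what separates a move from a deep copy
| here.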
| ajross wrote:
| > In both C++ and Rust, "moving" an object will still copy the
| struct fields, but
|
| Most people consider a shallow copy a "copy", certainly a
| shallow copy isn't a "reference"! One of the big problems in
| this space is in fact the divergence of terminology that leads
| to arguments like this.
|
| The introduction of move semantics to C++ was a terrible,
| terrible mistake; not because it doesn't solve a real problem
| but because the language is objectively much worse now as a
| routine tool for general developers. People used to hack on
| code to implement features, now they get confused over and
| argue about how many "&" characters they need in a function
| signature.
|
| It was a problem that was best left unsolved, basically.
| Night_Thastus wrote:
| I can't say examples like this sell me on Rust, coming from C++.
| I need to manually call to_string() every single time I want to use
| strings?
|
| And that bizarre scoping of Person p feels very un-intuitive. How
| would you work around that if you need to keep using it after
| show()? (Which is an extremely common use case)
| winrid wrote:
| to_string() gives you an owned string (like std::string) vs a
| borrowed string slice (kind of like char*). If you already have
| an owned string you don't need to do that obviously
|
| If you need to keep using Person after calling show() then
| don't pass ownership to show() - you can pass a reference or a
| mutable reference, or use Rc<> etc
| aseipp wrote:
| A raw string literal gets embedded into the binary's data
| section at compile time, just like it would in C or C++. What
| this means is that the type of the string literal is actually a
| reference (to an underlying memory address). And so it has type
| '&str' which reflects the fact you are using a reference to a
| value that exists somewhere else.
|
| The type 'String' is instead an "owned" type, which means that
| it is not a reference, and instead a complete value and has a
| copy of the data. to_string() will create a String (owned
| value) from a &str (reference) by copying it. This is no
| different than if you had a global static compile-time string
| in C and you wanted to modify or update it: you would memcpy
| the global (statically allocated) string into a local buffer of
| the appropriate size and then modify it and pass it onward to
| other things that need it. You would not modify the static
| string in place.
|
| In short, no, you do not need to_string() every time you want
| to work with a string. You need it to convert a reference type
| to an owned type. Rust's type system is just used here to
| codify the more implicit parts of C or C++'s behavior that you
| are already familiar with, but the underlying bits and bytes
| behave as you would expect coming from C++.
|
| > And that bizarre scoping of Person p feels very un-intuitive.
| How would you work around that if you need to keep using it
| after show()
|
| You take a reference just like you would in C++. Possibly a
| mutable reference if you want to modify the thing and then use
| it afterwards. This is in the article as the "Advanced rust
| example" at the end, it's right there and not hidden or
| anything.
|
| It isn't really bizarre honestly; it's a matter of defaults.
| The difference is that Rust uses move-by-default, not
| copy-by-default or ref-by-default. Every time you write `x = y`
| for a given owned type, you are doing a move of `y` into `x`
| and thus making `y` invalid.
|
|     let g: &str = "Austin";         // statically allocated string
|     let x: String = g.to_string();  // do a copy
|     let y: String = x;              // no copy, x is moved
|
| Once you internalize this a lot more stuff will make sense, or
| at least it did for me.
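|
| For readers coming from C++, a rough counterpart of the three
| Rust lines above. This is a sketch, not an exact equivalence: in
| C++ the copy is the default and the move has to be requested, and
| the moved-from `x` can still be named afterwards. The function
| name is made up for illustration.

```cpp
#include <cassert>
#include <string>
#include <utility>

std::string copy_then_move() {
    const char* g = "Austin";      // string literal with static storage
    std::string x = g;             // copies the characters into an owned string
    std::string y = std::move(x);  // move construction; x is valid but unspecified

    // Unlike in Rust, the compiler still lets `x` be used here. Assigning a
    // new value is always fine; reading the old contents is not meaningful.
    x = "reused";
    assert(x == "reused");

    return y;
}
```

| The call returns "Austin". Writing `std::string y = x;` instead
| would compile just as well and silently copy.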
| w10-1 wrote:
| Hasn't a language feature failed if even experts disagree on it?
| How would lay developers ever use it? This is not an algorithmic
| nicety; it's supposed to be second nature to write and automatic
| to read.
|
| And it seems weird to omit Swift from this comparison, since
| Swift seems to have the most user-friendly (but incomplete?)
| implementation of move-only types.
| Maxatar wrote:
| Not even the people who implement C++ compilers can agree on
| how certain C++ features are supposed to work.
| enugu wrote:
| In this discussion of a specific point in the post, the promise
| of Hylo language and mutable value semantics can be overlooked.
|
| Namely, we get a lot of the convenience of functional programming
| (mutating one variable doesn't change any other variable) with
| the performance of imperative languages (purely functional data
| structures have higher costs relative to in-place mutation and
| are more gc-intensive).
|
| https://docs.hylo-lang.org/language-tour/bindings
___________________________________________________________________
(page generated 2024-12-05 23:01 UTC)