hngopher.com

       [HN Gopher] Move semantics in Rust, C++, and Hylo
       ___________________________________________________________________
        
       Move semantics in Rust, C++, and Hylo
        
       Author : tajpulo
       Score  : 97 points
       Date   : 2024-11-29 16:15 UTC (6 days ago)
        
 (HTM) web link (lukas-prokop.at)
 (TXT) w3m dump (lukas-prokop.at)
        
       | einpoklum wrote:
       | Q: What's "Hylo"? Should I have heard of it?
       | 
       | A: It's a niche programming language the author is involved with.
       | It's not widely-used enough to get its own Wikipedia page. It
       | used to be called "Val". See: https://www.hylo-lang.org/
        
         | Gualdrapo wrote:
         | Maybe it's just me, but I'm no fan of them using the keyword
         | 'fun' to define a function. Nor Rust's 'fn'.
         | 
         | Also, isn't it a bit strange that they wrote "rust" throughout
         | the article instead of "Rust"?
        
           | einpoklum wrote:
           | Well, you're no fun :-(
           | 
           | Anyway, that's pure bikeshedding. "function" is a full word
           | in English, but almost 3x the length.
        
             | consp wrote:
             | And thus 3x more readable than fn. And otherwise it's bkshd
             | for a 150% reduction.
        
             | zozbot234 wrote:
             | To be fair, "fun" is also a full word in English. Also,
             | it's just plain fun.
        
           | diggan wrote:
           | Personally I prefer `defn` for defining functions. `fn` is
           | just a function that hasn't been declared or defined,
           | obviously.
        
         | amaurose wrote:
         | It's the brainchild of Dave Abrahams, who is rather big in C++.
         | 
         | https://www.youtube.com/watch?v=5lecIqUhEl4
        
       | bluetomcat wrote:
       | > So apparently, move does not prevent generation of a copy, but
       | the empty string instead of expected text "Dave" is very
       | interesting. Apparently, after termination of show after the
       | move, the object is invalidated. This does not affect the Person
       | object, but only the string object.
       | 
       | This is a shallow understanding of C++. It happens because the
       | Person object is a POD type that doesn't define a move
       | constructor, and the compiler creates a default one that calls
       | the move constructors of the members. The string member has a
       | well-defined move constructor, but the primitive uint8_t type
       | doesn't.
        
         | flohofwoe wrote:
         | A move constructor/operator for POD or primitive types doesn't
         | make any sense in the first place though (also AFAIK an object
         | that contains a std::string - like Person - is definitely _not_
         | a POD?). Even if Person had a manually provided move-
         | constructor and move-assignment-operator, a move would still
         | perform a flat copy from the source to the destination object.
        
           | gpderetta wrote:
           | Correct on all counts. It is definitely not a POD nor a
           | standard layout type (the modern version of POD).
        
         | mort96 wrote:
         | Person has an implicitly generated constructor and destructor
         | which calls std::string's constructor and destructor. It's non-
         | POD.
        
           | bluetomcat wrote:
           | > It's non-POD.
           | 
           | For a stricter definition of POD which requires that byte-by-
           | byte copies are possible. More informally, it's a POD because
           | it only defines members and all the constructors and
           | destructors are implicitly generated.
        
             | flohofwoe wrote:
             | I've never seen this definition of 'POD' tbh, 'Plain Old
             | Data' kinda implies that it behaves the same as a C struct
             | when copying and destructing (e.g. the compiler is able to
             | use a memcpy for copying, and destruction is a no-op;
             | neither is the case when there's an embedded std::string
             | object).
        
             | mort96 wrote:
             | I haven't heard your personal informal definition of POD
             | before. I've only concerned myself with the standard's
             | definition of POD. If you were using a different definition
             | of POD than the standard, you should have specified that.
             | Or better yet, not used the term "POD", since it is widely
             | understood to mean what the standard refers to as "POD".
             | 
             | EDIT: It seems I've had a slightly incorrect impression of
             | "POD": what makes 'Person' non-POD isn't that it has an
             | implicitly defined constructor but simply that it contains
             | a non-POD type. The requirements for POD classes[1]
             | includes "has no non-static data members of type non-POD
             | class (or array of such types)". std::string is certainly a
             | non-POD class, which makes discussion about Person's
             | constructors and destructors moot. Not that it changes
             | anything, but I don't wanna spread misinformation.
             | 
             | [1] https://en.cppreference.com/w/cpp/language/classes#POD_class
        
             | gpderetta wrote:
             | you are probably confusing POD with aggregate.
        
             | jcranmer wrote:
             | The historical notion of POD is that it's a class type that
             | has no C++ shenanigans going on, and thus works like it
             | does in C. As a result, while there are a few slightly
             | different definitions of POD, all of them share the
             | commonality that having a non-POD member makes the class
             | non-POD; in other words, POD-ness has a recursive quality.
             | 
             | It doesn't make a lot of sense to _not_ have this recursive
             | quality to POD-ness, because the fact that C++ shenanigans
             | are involved doesn't go away just because it's implicitly
             | handled for you by the compiler.
        
             | elteto wrote:
             | POD means you can memcpy without incurring undefined
             | behavior, same as you would in C to copy a struct.
        
       | bluescarni wrote:
       | > So apparently, move does not prevent generation of a copy, but
       | the empty string instead of expected text "Dave" is very
       | interesting. Apparently, after termination of show after the
       | move, the object is invalidated. This does not affect the Person
       | object, but only the string object. Recognize that I speak about
       | a factual behavior on the hardware. I think we have undefined
       | behavior here. And no compilation error.
       | 
       | There is a lot wrong in this paragraph:
       | 
       | - a "copy" was not generated, at least not in the sense that the
       | actual content of the string was copied anywhere;
       | 
       | - there's no undefined behaviour here and no invalidation of the
       | string. Standard library types are required to be left in an
       | unspecified but valid state after move. "Valid" here means that
       | you can go on and inspect the state of the string after move, so
       | you can query whether it is empty or not, count the number of
       | characters, etc. etc. "Unspecified" means that the implementation
       | gets to decide what is the status of the string after move. For
       | long enough strings, typical implementation strategy is to set
       | the moved-from string in an empty state.
        
         | flohofwoe wrote:
         | > at least not in the sense that the actual content of the
         | string was copied anywhere
         | 
         | ...unless it's a short string within the limits of the small-
         | string-optimization capacity.
         | 
         | I think what confuses many people is that a C++ move assignment
         | still can copy a significant amount of bytes since it's just a
         | flat copy plus 'giving up' ownership of dangling data in the
         | source object.
         | 
         | For a POD struct, 'move assignment' and 'copy assignment' are
         | identical in terms of cost.
        
           | mort96 wrote:
           | I mean it'll copy 3 pointers worth of data in all cases. It's
           | just that for short strings, those 3 pointers worth of data
           | contains the text of the string.
        
           | fluoridation wrote:
           | I feel like that's a pedantic detail. True, yes, but
           | irrelevant. You may as well also point out that the return
           | address is going to be copied to the instruction pointer when
           | the constructor returns.
        
             | jvanderbot wrote:
             | It should be, but it's very much not in the real world at
             | least as far as I've seen.
             | 
             | Using std::move for anything other than "unique ownership
             | without pointers" really messes things up. People put
             | std::move everywhere expecting performance gains, just like
             | we used to put "&" everywhere expecting performance gains.
             | It's a bit of cargo cultism that can be nicely dispelled by
             | realizing std::move is just std::copy with a compiler-
             | defined constructor invocation potentially run to determine
             | the old value. With that phrasing, it's hard to hallucinate
             | performance gains that might come automatically.
        
               | colejohnson66 wrote:
               | In fact, using std::move everywhere can actually make
               | your performance _worse_!
               | 
               | https://devblogs.microsoft.com/oldnewthing/20231124-00/?p
               | =10...
        
               | gpderetta wrote:
               | > std::move is just std::copy with a compiler-defined
               | constructor invocation potentially run to determine the
               | old value
               | 
               | I have no idea what that means.
               | 
               | std::move is a cast to an rvalue reference. That can
               | potentially trigger a specific overloaded function to be
               | selected and possibly, ultimately, a move constructor or
               | assignment operator to be called.
               | 
               | For an explicit move to be profitable, an expression
               | would have otherwise chosen a copy constructor for a type
               | with an expensive copy constructor and a cheap move
               | constructor.
               | 
               | std::copy is a range algorithm, not sure what's the
               | relevance.
        
               | jvanderbot wrote:
               | Yes, typed too fast. I meant the explicit copy
               | constructor. Luckily, HN will hide my garbage text quickly
               | enough. Thanks for the correction!
        
             | Asraelite wrote:
             | I think it's a worthwhile distinction to bring up because
             | it highlights a common misconception people have about
             | strings and vectors. A string value is not the string
             | content itself, just a small struct containing a pointer
             | and other metadata. If we're talking about the in-depth
             | semantics of a language then it's important to point out
             | that this struct _is_ the string, and the array of UTF-8
             | characters it points to is not. C++ obfuscates this
             | distinction because of how it automatically deep copies
             | vectors and strings for you in many cases.
        
               | epcoa wrote:
               | > then it's important to point out that this struct is
               | the string, and the array of UTF-8 characters it points
               | to is not.
               | 
               | So then under this model, what's the difference between a
               | string and a string_view?
        
               | Asraelite wrote:
               | ...one is a string and one is a string view?
               | 
               | I'm not sure what you're getting at. They're both small
               | structs holding pointers to char data, they just operate
               | on that data differently.
        
               | Maxatar wrote:
               | Exactly, thinking about things in terms of their
               | implementations is usually not a good way to actually
               | understand what that thing is. By arguing that
               | std::string is just the struct itself, which consists of
               | who knows what... you fail to appreciate the actual
               | semantics of std::string and how those semantics are
               | really what defines the std::string.
               | 
               | std::string_view also has implementation details that in
               | principle could be similar to std::string, it's a pointer
               | with a size, but the semantics of std::string_view are
               | very different from the semantics of std::string.
               | 
               | And that's the crux of the issue, it's better to
               | understand classes in terms of their semantics, how they
               | operate, rather than their implementations.
               | Implementations can change, and two very separate things
               | can have the same or very similar implementations.
               | 
               | A std::string is not just some pointers and some record
               | keeping data; a std::string is best understood as a class
               | used to own and manage a sequence of characters with the
               | various operations that one would expect for such
               | management. A std::string_view is a non-owning, read-only
               | variation of such a class that operates on an existing
               | sequence of characters.
               | 
               | How these are implemented and their structural details is
               | not really what's important, it's how someone is expected
               | to use them and what can be done with them that counts.
        
               | Asraelite wrote:
               | My original comment was just saying that it's useful to
               | point out to people that the concrete representation of a
               | string in memory is a struct when relevant, since some
               | people might not realize that. I'm not claiming anything
               | about the best way to think about it overall.
               | 
               | > How these are implemented and their structural details
               | is not really what's important
               | 
               | Usually this isn't important, unless you're talking about
               | low level details impacting performance, which is
               | _exactly what the article is about_.
        
               | epcoa wrote:
               | > Usually this isn't important, unless you're talking
               | about low level details impacting performance,
               | 
               | And if you're going down that path, the string may not
               | have a pointer at all.
               | 
               | "A string value is not the string content itself", but in
               | most cases it is if the string is short enough,
               | implementation dependent disclaimer and all that.
        
               | epcoa wrote:
               | What I'm getting at is that the description "the array is
               | not the string" isn't very elucidating for someone who
               | doesn't understand the nuance of ownership/lifetime and
               | move semantics (the topic of the article).
               | 
               | "C++ obfuscates this distinction because of how it
               | automatically deep copies vectors and strings"
               | 
               | It does this because it _has_ to, to guarantee its
               | interface invariants. That "array" (if there is one)
               | really is the string. Just because there might be an
               | indirection doesn't change that.
               | 
               | > they just operate on that data differently.
               | 
               | Well they operate on the _memory_ "array" of the char
               | data differently (well in the latter not at all).
               | 
               | Also a nitpick: std::string unlike String in Rust or
               | other languages is not married to an encoding. And C++
               | managed to fuck that one up even more so recently.
        
             | quietbritishjim wrote:
             | It's a real semantic difference, not a pedantic detail: It
             | means that there is a practical reason that the moved-from
             | object could be non-empty.
             | 
             | A few standard library types _do_ guarantee that the moved-
             | from object is empty (e.g., the smart pointer types).
             | 
             | For some others (basically, all containers except string),
             | it is not explicitly stated that this is the case but it is
             | hard to imagine an implementation that doesn't (due to time
             | complexity and iterator invalidation rules). Arguably, this
             | represents a bigger risk than string's behaviour, but it's
             | still interesting.
        
               | fluoridation wrote:
               | >It's a real semantic difference, not a pedantic detail
               | 
               | What's the semantic difference? Of course moving a class
               | will involve _some_ amount of copying. How could it be
               | any other way? If you have something like struct { int
               | a[1000]; }, how are you supposed to move the contents of
               | the struct without copying anything? What, you take a
               | pair of really tiny scissors and cut a teeny tiny piece
               | of the RAM, then glue the capacitors somewhere else?
        
               | Joker_vD wrote:
               | > how are you supposed to move the contents of the struct
               | without copying anything?
               | 
               | By taking the physical page this one struct resides in,
               | and mapping it into the virtual address space the second
               | time. This approach is usually used in the kernel-level
               | development, but there has been _a lot_ of research done
               | since the seventies on how to use it in runtimes for
               | high-level programming languages.
               | 
               | Now, it does involve copying an address of this struct
               | from one place to another, that I cede.
        
               | fluoridation wrote:
               | Sure. At the cost of needing >=4K per object, since
               | otherwise "moving" an object involves also moving the
               | other objects sharing the same page.
        
           | gpderetta wrote:
           | You can think of a c++ move as a shallow copy that takes
           | ownership of all objects originally owned by the source.
        
           | jvanderbot wrote:
           | The real gem of the article is the interlude. E.g., reaching
           | back to C days and pointing out that "It's either copy, or
           | pointer". Once someone has that mental model solidly in hand,
           | all the syntax sugar in the world cannot harm you.
           | 
           | Also "It was an ergonomic advancement." hides a lot of the
           | overwrought syntax sugar in C++ that causes it to be such a
           | _weird_ language if you come from elsewhere. But still an
           | excellent insight into the state of affairs.
           | 
           | I think the "Apparently" language makes it seem like this is
           | some kind of accident that nobody would know about, when
           | really the author was probably just being a creative writer,
           | and the example was fundamental to the post.
        
           | nemetroid wrote:
           | The same is true of Rust. I have no idea why the author
           | decided to print addresses only for C++ and not for Rust.
           | 
           |     // (1)
           |     struct Person {
           |         name: String,
           |         age: u8,
           |     }
           | 
           |     fn show(person: Person) {
           |         println!("Person record is at address  {:p}", &person);
           |         println!("{} is {} years old", person.name, person.age);
           |     }
           | 
           |     fn main() {
           |         let p = Person { name: "Dave".to_string(), age: 42 }; // (2)
           |         println!("Person record is at address  {:p}", &p);
           |         show(p); // (3)
           |     }
           | 
           | Its output is:
           | 
           |     Person record is at address  0x7ffcfb2b4e40
           |     Person record is at address  0x7ffcfb2b4ec0
           |     Dave is 42 years old
        
         | bluGill wrote:
         | There is a lot wrong, but your analysis misses the elephant: the
         | function takes a copy and so a copy must be generated.
         | std::move will move if possible, but in this case a move isn't
         | possible and so a copy will be made.
         | 
         | Move is allowed to not move because in generic code you don't
         | want to have to check whether a move is possible for the type
         | in question.
        
           | littlestymaar wrote:
           | C++ making the most inscrutable semantic possible, speedrun
           | any %.
        
           | GrantMoyer wrote:
           | In the case of the example, there is a move, and std::move
           | works in the example.
           | 
           | The function, show, doesn't take a copy, it takes a Person
           | object. Persons can be copy constructed or move constructed
           | (both constructors are implicit, since there's no user-
           | defined constructors). std::move returns an r-value reference
           | to main's p, so Person's implicit move constructor is called,
           | and show's p argument is move constructed from main's p. The
           | reported address changes because moving creates a new object
           | in C++, but the moved-to object may take ownership of the
           | heap allocated memory and other resources from the moved-from
           | object.
           | 
           | In this case, the moved-to Person takes ownership of the heap
           | allocation from the moved-from Person's string member and
           | sets the moved-from Person's string member to an empty
           | string. Without std::move, show's p is copy constructed,
           | including its string member.
        
         | virtualritz wrote:
         | > "Unspecified" means that the implementation gets to decide
         | what is the status of the string after move. For long enough
         | strings, typical implementation strategy is to set the moved-
         | from string in an empty state.
         | 
         | Thusly, what happens in code that accesses the string after the
         | move is UB.
         | 
         | In the implementation of C++ the article uses the string was
         | just empty. But for all we know it may still contain a 1:1 copy
         | of the original or 20 copies or a gobbledygook of bytes.
         | 
         | Any code that relies on the string being something (even empty)
         | may behave different if it isn't. That's the very definition of
         | UB.
         | 
         | "A typical implementation strategy" is meaningless for someone
         | writing code against a language specification.
         | 
         | You're then writing code against a specific compiler/std lib
         | and that's fine. But let's be honest about it.
        
           | UncleMeat wrote:
           | That's not what UB means. "This will behave differently on
           | different implementations" is implementation defined
           | behavior. Compilers are not allowed to assume that
           | implementation defined behavior never occurs or reject your
           | program if they can prove that it happens.
           | 
           | Undefined behavior is a stronger statement and says that if
           | the behavior occurs then the entire program is simply not
           | valid. This allows the compiler to make vastly more
           | aggressive changes to your program.
        
             | Maxatar wrote:
             | There is nothing in the standard or definition of C++ that
             | states that undefined behavior renders a program invalid.
             | 
             | On the contrary the actual C++ standard explicitly states
             | that permissible undefined behavior includes, and I quote
             | "behaving during translation or program execution in a
             | documented manner characteristic of the environment".
             | 
             | It's also worth noting that numerous well known and used
             | C++ libraries explicitly make use of undefined behavior,
             | including boost, Folly, Qt. Furthermore, as weird and
             | ironic as this sounds, implementing cryptographic libraries
             | is not possible without undefined behavior.
        
               | gpderetta wrote:
               | "valid program" is not really a term that is used in the
               | standard (I only count one normative usage). What the
               | standard does say is:
               | 
               | "A conforming implementation executing a well-formed
               | program shall produce the same observable behavior as one
               | of the possible executions of the corresponding instance
               | of the abstract machine with the same program and the
               | same input. However, if any such execution contains an
               | undefined operation, this document places no requirement
               | on the implementation executing that program with that
               | input (not even with regard to operations preceding the
               | first undefined operation)."
               | 
               | I.e. a program that contains UB is undefined.
               | 
               | Of course, as you observe, an implementation can go
               | beyond the standard and extend the abstract machine to
               | give defined semantics to those undefined operations.
               | 
               | That's still different from implementation defined
               | behaviour, where a conforming implementation must give
               | defined semantics.
        
           | bluescarni wrote:
           | > Thusly, what happens in code that accesses the string after
           | the move is UB.
           | 
           | No, it is implementation-defined behaviour.
           | 
           | > In the implementation of C++ the article uses the string
           | was just empty. But for all we know it may still contain a
           | 1:1 copy of the original or 20 copies or a gobbledygook of
           | bytes.
           | 
           | Yes, and if you want to make sure that the string is empty
           | before you do something else with it, you just use a clear()
           | (which will be optimised away by the compiler anyway).
           | 
           | Or, if you prefer, you can assign another string to it, or
           | anything else really.
           | 
           | > Any code that relies on the string being something (even
           | empty) may behave different if it isn't. That's the very
           | definition of UB.
           | 
           | No it is not.
           | 
           | > "A typical implementation strategy" is meaningless for
           | someone writing code against a language specification.
           | 
           | Then don't rely on that specific implementation detail and
           | make sure that the string is in the state you want or, even
           | better, don't touch the moved-from string ever again.
        
       | einpoklum wrote:
       | Not sure why the author compares Rust's:
       | 
       |     println!("{} is {} years old", person.name, person.age);
       | 
       | with C++:
       | 
       |     cout << person.name << " is " << unsigned(person.age)
       |          << " years old" << endl;
       | 
       | ... while C++ actually has:
       | 
       |     println("{} is {} years old", person.name, person.age);
       | 
       | essentially identical to Rust. See:
       | https://en.cppreference.com/w/cpp/io/println
        
         | glandium wrote:
         | Probably because it's very new (C++23)
        
         | vlovich123 wrote:
         | Well C++23 is fairly new so they probably just didn't know
         | about it?
        
           | gpderetta wrote:
           | std::cout << std::format(....) ;
           | 
           | has been available since C++20. Still not really the point of
           | the article.
        
             | 0xffff2 wrote:
             | C++20 is still fairly new. There are places where C++98 is
             | still in use as C++11 is considered too cutting edge.
        
         | cjfd wrote:
         | Some people are noticing that println is very new. But there
         | already is https://github.com/fmtlib/fmt and it has been there
         | quite a long time.
        
           | Philpax wrote:
           | That would require introducing a dependency, which is a
           | digression from the point of the article and would complicate
           | reproduction for the reader.
        
             | bangaladore wrote:
             | I can assure you that using a new language is a
             | substantially greater task than introducing a dependency
             | (or using -std=c++23). So you might as well show off the
             | latest and greatest for all the competitors.
        
           | account42 wrote:
           | Using random libraries in example code isn't good practice
           | though.
           | 
           | Still, even (C) printf would have been better than the
           | iostreams monstrosity.
        
             | tovej wrote:
             | fmt is not a random library, it's the inspiration and
             | reference implementation for std::format
        
         | Aurelius108 wrote:
         | It's very new to the standard library (latest version of GCC
         | this year was the first version to support it). Additionally,
         | I've found that println adds 30+ seconds to my compile time
         | even for hello world so I'll be avoiding it unless I need it
        
           | einpoklum wrote:
           | > It's very new
           | 
           | True, but Hylo is so new that it's not even an established
           | language. Plus using this should serve to highlight the
           | differences the author actually cares about between the
           | languages.
        
           | bangaladore wrote:
           | https://godbolt.org/z/MTo11voes > println takes 9 seconds
           | https://godbolt.org/z/he6Phr7nG > cout takes 6 seconds
           | 
           | What machine / compiler are you on where the difference
           | between these are 30 seconds? GCC is also quite a bit faster
           | based on a quick test in godbolt.
        
             | nicce wrote:
             | > https://godbolt.org/z/MTo11voes > println takes 9 seconds
             | https://godbolt.org/z/he6Phr7nG > cout takes 6 seconds
             | 
             | That is a 50% increase.
        
               | bangaladore wrote:
               | I don't believe I claimed anywhere it is not a 50%
               | increase. The OC said 30 second difference.
        
               | nicce wrote:
               | I missed the "Hello, world!" mention, but otherwise you
               | only need to have 10 prints in your whole project to have
               | the 30 second increase. That is pretty significant.
        
               | bangaladore wrote:
               | It is not linear in the number of prints. 1 vs 2 prints
               | will likely have zero noticeable effect.
        
       | eterevsky wrote:
       | In C++ you can force the move of the parameter by wrapping it
       | with std::move(). This should take care of unnecessarily
       | cloning the argument in the example.
        
         | masklinn wrote:
         | std::move does not force anything, it is a cast to an rvalue
         | reference (a movable-from).
         | 
         | Whether the object is moved depends on whether the target /
         | destination / sink cares.
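         | 
         | A minimal sketch of this point, assuming C++17. The two
         | sink functions are hypothetical names, not taken from the
         | article:

```cpp
#include <cstddef>
#include <string>
#include <type_traits>
#include <utility>

// std::move(x) only changes the value category of the expression:
// the result type is an rvalue reference, and no data is touched.
static_assert(std::is_same_v<decltype(std::move(std::declval<std::string&>())),
                             std::string&&>);

// Hypothetical sink that only reads its argument. An rvalue binds to a
// const reference, so nothing is moved out of the caller's string.
std::size_t read_only_sink(const std::string& s) { return s.size(); }

// Hypothetical sink that takes ownership by value. Passing an rvalue
// selects std::string's move constructor to initialize the parameter.
std::size_t owning_sink(std::string s) { return s.size(); }

// Returns true if the source string is unchanged after being passed
// through std::move to the read-only sink.
bool source_survives_read_only_sink() {
    std::string name = "Dave, with a name long enough to live on the heap";
    const std::string before = name;
    read_only_sink(std::move(name));
    return name == before;
}
```

         | With read_only_sink the caller's string keeps its value even
         | though it went through std::move. With owning_sink the
         | parameter is move-constructed, and the source is left in a
         | valid but unspecified state.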
        
       | fluoridation wrote:
       | >Apparently, after termination of show after the move, the object
       | is invalidated. This does not affect the Person object, but only
       | the string object. Recognize that I speak about a factual
       | behavior on the hardware. I think we have undefined behavior
       | here. And no compilation error.
       | 
       | The std::string is not invalidated, it's reset to its empty state
       | (i.e. null pointer and zero length). Standard classes are all in
       | defined, valid states after being moved, such that using them
       | again is safe. User-defined classes may be coded to be left in
       | either valid or invalid states after being moved. It's the
       | responsibility of the programmer to decide which is appropriate
       | according to the situation. There are valid reasons to want to
       | reuse a moved object. For example, you might want to force the
       | release of an object's internal memory:
       | 
       | std::string() = std::move(s);
       | 
       | It's somewhat unfortunate that there's no way to signal to the
       | compiler that an object is not safe for reuse, though.
        
         | account42 wrote:
         | While the language doesn't forbid use after move, occurrences of
         | it are most likely a programmer error. Which is why clang-tidy
         | has the bugprone-use-after-move check.
        
         | alkonaut wrote:
         | This sounds like an enormous footgun (but as I understand it
         | there are warnings that will tell you). An object isn't "valid"
         | in any reasonable business logic sense just because the fields
         | are initialized to anything at all, such as their default
         | state? If the valid state of a Person is "the name is not
         | empty" and this is enforced by a constructor then I don't want
         | the program to ever have a Person object floating around with
         | a blank name. I either want a compiler error (good) or an
         | immediate crash at runtime (bad), but at least I don't want an
         | invalid object in a still-running program (worse). Maybe I
         | misunderstand what the reset was or how big this risk is
         | though.
        
           | fluoridation wrote:
           | >An object isn't "valid" in any reasonable business logic
           | sense just because the fields are initialized to anything at
           | all, such as their default state
           | 
           | That very much depends on your use case.
           | 
           | >If the valid state of a Person is "the name is not empty"
           | and this is enforced by a constructor then I don't want the
           | program to ever have a Person object floating around with a
           | blank name
           | 
           | If you have such strict requirements then you shouldn't be
           | moving around Persons to begin with. You should just be using
           | std::make_unique() and then moving the pointer. Person should
           | not even have a move constructor defined. If you code your
           | class such that it's possible to let it reach an invalid
           | state, that's no one's fault but your own.
        
             | Joker_vD wrote:
             | Even if the std::string was guaranteed to hold a "SORRY,
             | THIS STRING HAS BEEN MOVED FROM, PLEASE CONTACT YOUR LOCAL
             | STRING SUPPLIER" string in it after being moved from, I
             | doubt this would actually help that much with the overall
             | correctness of the application.
             | 
             | There are very, very few cases where it is "sensible" to do
             | anything with such an "arbitrarily conjured" state other than
             | disposing of/overwriting it. In fact, the only example I can
             | vaguely remember (and can't for the life of me find on
             | Google) is that one scheme of storing some sort of lookup
             | index in two arrays that store indices into each other, and
             | it's not necessary to zero out those arrays before using
             | them because the access algorithm is cleverly arranged in
             | such a way that no matter what numbers are stored in the
             | unused parts of the arrays, it will still work correctly.
        
               | fluoridation wrote:
               | >Even if the std::string was guaranteed to hold a "SORRY,
               | THIS STRING HAS BEEN MOVED FROM, PLEASE CONTACT YOUR
               | LOCAL STRING SUPPLIER" string in it after being moved
               | from
               | 
               | That's a rather weak "even if", given most
               | implementations just reset to the empty string after
               | moving.
               | 
               | >I doubt this would actually help that much with the
               | overall correctness of the application.
               | 
               | Like I said, it depends on your use case. A pattern I use
               | frequently when processing input is to have an
               | accumulator that I build up progressively, and then when
               | ready I move it into a result container, and since that
               | resets the accumulator I can simply keep using it. If my
               | algorithm required the initial state "SORRY, THIS STRING
               | HAS BEEN MOVED FROM, PLEASE CONTACT YOUR LOCAL STRING
               | SUPPLIER" rather than the empty string, such an
               | idiosyncratic post-move value would be rather convenient.
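               | 
               | A rough sketch of that accumulator pattern. The
               | split_words helper is hypothetical, and the explicit
               | clear() is an added assumption so the code does not
               | depend on what a moved-from std::string contains:

```cpp
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: splits input on single spaces, reusing one
// accumulator string for every word.
std::vector<std::string> split_words(const std::string& input) {
    std::vector<std::string> result;
    std::string acc;  // built up character by character
    for (char c : input) {
        if (c != ' ') {
            acc += c;
        } else if (!acc.empty()) {
            result.push_back(std::move(acc));  // hand the buffer to the container
            acc.clear();  // assumption: added so the moved-from value is never relied on
        }
    }
    if (!acc.empty()) {
        result.push_back(std::move(acc));
    }
    return result;
}
```

               | clear() has no preconditions, so it is fine to call on
               | a moved-from string whatever state the move left it in.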
        
           | gpderetta wrote:
           | The lack of so-called destructive moves in C++ is not great.
           | You either add a proper empty state to your type and make it
           | properly part of the invariant, which is not always possible
           | or meaningful, or you need a special moved from state for
           | which your object invariant doesn't hold, which is "less than
           | ideal" to say the least.
        
         | quietbritishjim wrote:
         | > >Apparently, after termination of show after the move, the
         | object is invalidated. This does not affect the Person object,
         | but only the string object. Recognize that I speak about a
         | factual behavior on the hardware. I think we have undefined
         | behavior here. And no compilation error.
         | 
         | You're right to pick up on this. The author of the article is
         | confused here, or at least using incorrect terminology. There's
         | certainly no "undefined behaviour" going on.
         | 
         | But your corrections aren't quite right either, or at least use
         | slightly odd definitions.
         | 
         | > User-defined classes may be coded to be left in either valid
         | or invalid states after being moved.
         | 
         | No, even user defined classes have to be valid after a move,
         | because their destructor will still be run. If you had your own
         | vector-like class that points to invalid memory (or the same
         | memory as the moved-to object) then you will get corruption
         | when its destructor tries to free that memory.
         | 
         | Ok, it's true that you could manually define an "invalid" state
         | in your class, perhaps by adding an internal Boolean flag which
         | you set when the object is moved from. Then you could throw an
         | exception or abort or whatever when any method (except the
         | destructor) is called with this flag set. But you'd have to go
         | out of your way to do this and I've never seen it done. I don't
         | think this is what most people would understand your statement
         | to mean.
         | 
         | > The std::string is not invalidated, it's reset to its empty
         | state (i.e. null pointer and zero length).
         | 
         | I'm not sure whether you're implying this is a strict
         | requirement or just happens to be what happened in this case.
         | In fact, the standard does not require this: the string could
         | be left in any (valid, whatever that means) state. It could be
         | empty, unchanged, or anything else. As other comments have
         | noted, if the string's length is below the short string
         | optimisation threshold then it's quite likely the original
         | string will retain its value unchanged. Only a few specific
         | types in the standard library have the guarantee that they will
         | be empty after being moved from, and string isn't one of those.
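         | 
         | A small sketch of the difference, assuming C++14 or later.
         | The function names are made up for illustration:

```cpp
#include <memory>
#include <string>
#include <utility>

// std::unique_ptr is one of the types with a documented moved-from
// state: the source compares equal to nullptr afterwards.
bool unique_ptr_is_null_after_move() {
    auto p = std::make_unique<int>(1);
    auto q = std::move(p);
    return p == nullptr && q != nullptr && *q == 1;
}

// For std::string the moved-from value is unspecified, so this sketch
// never inspects it and instead assigns a fresh value before reading.
std::string reuse_after_move() {
    std::string s = "a string long enough to be heap-allocated on most implementations";
    std::string taken = std::move(s);
    s = "fresh";  // assignment has no precondition on the old value
    return s;
}
```

         | The string half deliberately avoids asserting anything about
         | s right after the move, since that value may be empty,
         | unchanged, or something else.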
        
           | stonemetal12 wrote:
           | >No, even user defined classes have to be valid after a move,
           | because their destructor will still be run.
           | 
           | So the compiler will complain and not compile your program??
           | Nope. It should be if you want a program that functions
           | correctly, but have to? No, C++ doesn't force that on you.
        
           | fluoridation wrote:
           | >No, even user defined classes have to be valid after a move,
           | because their destructor will still be run.
           | 
           | By "valid" I mean that you can use the object like normal;
           | being able to destruct the object is not enough. If the
           | destructor is unsafe to run (for example because the object
           | ends up owning a dangling pointer) you just have an outright
           | bug. An invalid state would be one where any further use of
           | the object (other than destroying it) is an error.
           | 
           | >I'm not sure whether you're implying this is a strict
           | requirement or just happens to be what happened in this case.
           | 
           | Yes, I'm saying that's what happened in that case. The string
           | was not invalidated, it was reset.
        
       | saghm wrote:
       | > I think before rust, language designers mixed up the various
       | properties these values can have. As a result, many
       | incomprehensible designs were the result. rust models the most
       | important memory-related properties through its two call
       | conventions (passing or borrowing). And Hylo moves even more
       | properties into the call conventions. Namely, Hylo uses the
       | keywords let, set, sink, and inout. This way Hylo additionally
       | represents e.g. initialization (rust models this with a separate
       | type).
       | 
       | Is anyone able to clarify what's meant by "initialization" here
       | and what "separate type" Rust uses for this (e.g. something
       | defined specifically for each type getting passed this way, or a
       | generic wrapper type in the standard library)? Offhand, my
       | understanding is that three of the Hylo keywords listed
       | correspond to passing by ownership, shared reference, or mutable
       | reference in Rust, and whichever doesn't correspond to one of
       | those is something that a separate type is used for in Rust, but
       | I'm not confident that my understanding is correct because the
       | only thing I can think of that might be related to
       | "initialization" is constructors, which Rust notably does _not_
       | have any formal concept of in the language, since functions that
       | return types are just like any other function implemented on a
       | type without a self parameter.
       | 
       | I'm also not completely sure what distinction is intended
       | between whatever the separate type is and references in Rust,
       | since a reference is also a separate type from the type of the
       | value it refers to. I could imagine someone might think that
       | references are different from user-defined types in a way that
       | other standard library types like Box and Arc aren't, but I'd
       | argue that the unique syntax that references have is actually not
       | that significant, and semantically being located inside std makes
       | Box and Arc far closer to references in terms of potentially
       | behaving in special ways, since they have access to certain
       | unstable APIs around things like allocation and std is developed
       | in tandem with the compiler, which leaves the door open for those
       | types to take advantage of any additional internal APIs that get
       | added in the future.
        
         | hmry wrote:
         | My best guess is they're referring to writing functions that
         | initialize something using an "out" parameter in Hylo, which
         | would be equivalent to a "&mut MaybeUninit<...>" parameter in
         | Rust.
        
         | Measter wrote:
         | They mean whether the value is properly initialized, as in all
         | the bytes that make up that value have set values that are
         | valid for that type. For example, in Rust the only valid values
         | a boolean can have are 0 and 1, anything else is invalid.
         | Notably, in the abstract machine, bytes actually have 257
         | values: 0-255 and uninitialized. Uninitialized means that an
         | initialized value was never written to it. Reading a value that
         | is not properly initialized is undefined behaviour, and
         | optimization passes can result in unpredictable changes in
         | behaviour of the code.
         | 
         | The type they mentioned is MaybeUninit (https://doc.rust-
         | lang.org/std/mem/union.MaybeUninit.html), which is used to
         | represent values that are not fully initialized. It's worth
         | reading the documentation for that type.
        
       | fuhsnn wrote:
       | Copy or move for C++ is just choosing which
       | constructor/assignment overload to call. I believe it's possible
       | to make C++ move-by-default if one goes through the trouble of
       | overloading every class you use with custom move procedures.
        
       | quietbritishjim wrote:
       | Most explanations of C++'s std::move fail because they don't
       | focus on its actual effect: controlling function overloading.
       | 
       | Most developers have no trouble getting the idea of C++'s
       | function overloading for parameter types that are totally
       | different, e.g. it's clear what foo("xyz") will call if you have:
       | 
       |     void foo(int x);
       |     void foo(std::string x);
       | 
       | It's also not too hard to get the idea with const and mutable
       | references:
       | 
       |     void foo(std::string& x);
       |     void foo(const std::string& x);
       | 
       | Rvalue references allow another possibility:
       | 
       |     void foo(std::string&& x);
       |     void foo(const std::string& x);
       | 
       | (Technically it's also possible to overload with rvalue and non-
       | const regular references, or even all three, but this is rarely
       | done in practice).
       | 
       | In this pairing, the first option would be chosen for a temporary
       | object (e.g. foo(std::string("xyz")) or just foo("xyz")), while
       | the second would be chosen if passing in a named variable
       | (std::string x; foo(x)). In practice, the reason you bother to do
       | this is so the first overload can pilfer memory resources
       | from its argument (whereas, presumably, the second will need to
       | do a copy).
       | 
       | The point of std::move() is to choose the first overload. This
       | has the consequence that its argument will probably end up being
       | modified (by foo()) even though std::move() itself does not
       | contain any substantial code.
       | 
       | All of the above applies to constructors, since they are
       | functions and they can also be overloaded. Therefore, the
       | following function is very similar in most practical situations
       | since std::string has overloaded copy and move constructors:
       | void foo(std::string x);
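       | 
       | The overload selection described above can be sketched like
       | this. The string labels returned by foo are an assumption added
       | only to make the chosen overload observable:

```cpp
#include <string>
#include <utility>

// Hypothetical overload pair; each returns a label naming itself.
const char* foo(std::string&&)      { return "rvalue overload"; }
const char* foo(const std::string&) { return "const lvalue overload"; }

// A temporary is an rvalue, so the first overload is chosen.
const char* call_with_temporary() {
    return foo(std::string("xyz"));
}

// A named variable is an lvalue, so the second overload is chosen.
const char* call_with_named() {
    std::string x = "xyz";
    return foo(x);
}

// std::move casts the named variable to an rvalue, which steers
// overload resolution back to the first overload.
const char* call_with_moved() {
    std::string x = "xyz";
    return foo(std::move(x));
}
```

       | Neither foo here touches its argument, so x is unchanged in
       | all three calls. Only a body that actually steals from the
       | rvalue parameter would modify it.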
        
         | rocqua wrote:
         | To clarify, you are saying the point of std::move is that it
         | returns an rvalue reference, allowing the called function to
         | pick the overload variant that is allowed to trample and
         | destroy its argument?
         | 
         | Specifically, what you did not make clear is the return type of
         | std::move.
        
           | ryanianian wrote:
           | std::move is just a cast operation. A better name might be
           | std::cast_as_rvalue to force the overload that allows it to
           | forward to move constructors/etc that intentionally "destroy"
           | the argument (leave it in a moved-from state).
        
             | tialaramex wrote:
             | They don't destroy the argument - this is of course a big
             | problem because the semantic programmers actually wanted
             | (even when C++ 98 didn't _have_ move and papers were
             | proposing this new feature) was what C++ programmers now
             | call "destructive move", i.e. the move Rust has. This is
             | sometimes now portrayed as some sort of modern idea, but it
             | actually was clearly what everybody wanted 15-20 years ago,
             | it's just that C++ didn't deliver that.
             | 
             | What they got was this awful compromise, it's not destroyed,
             | C++ promises that it will only finally be destroyed when
             | the scope ends, and always then, so instead some "hollowed
             | out" state is created which is some state (usually
             | unspecified but predictable) in which it is safe to destroy
             | it.
             | 
             | Creating the "hollowed out" new state for the moved-from
             | object so that it can later be destroyed is not zero work,
             | it's usually trivial, but given that we're not gaining any
             | benefit by doing this work it's pure waste.
             | 
             | This constitutes one of several unavoidable performance
             | leaks in modern C++. They're not huge, but they're a
             | problem when you still have people who mistake C++ for a
             | performance language rather than a language like COBOL
             | focused intently on compatibility with piles of archaic
             | legacy code.
        
               | Maxatar wrote:
               | Thanks for pointing this out. It's an absolute myth that
               | C++ move semantics are due to backwards compatibility.
               | The original paper on move semantics dating back to 2002
               | explicitly mentions destructive move semantics by name:
               | 
               | https://www.open-
               | std.org/jtc1/sc22/wg21/docs/papers/2002/n13...
               | 
               | It does bring up an issue involving how to handle
               | destructive moves in a class hierarchy, and while that's
               | an issue, it's a local issue that would need careful
               | consideration only in a few corner cases as opposed to
               | the move semantics we have today which sprinkle the
               | potential for misuse all over the codebase.
        
           | shortrounddev2 wrote:
           | You can trample and destroy a regular lvalue reference as
           | well. The point of casting to an rvalue reference (and
           | invoking the rvalue reference constructor) is to copy a
           | pointer to the underlying data of one container to a new
           | container and then delete the pointer on the original
           | container (set it to null, not destroy the data). This has
           | the effect of transferring ownership of the underlying data
           | from one container to the other. You can do this with an
           | lvalue reference as well, but the semantics are different.
           | 
           | This is useful for copying the data of a temporary string to
           | another string without actually copying each byte of the
           | data. Since the underlying characters live in the heap,
           | there's no point in copying each byte to a new area in the
           | heap. Instead, use move semantics to transfer ownership of
           | the pointer to a new string container.
        
       | khold_stare wrote:
       | I see some confusion in the comments about C++ moves. I wrote an
       | article in 2013 after it clicked for me:
       | https://kholdstare.github.io/technical/2013/11/23/moves-demy... .
       | It goes over the motivation, how it works under the hood, etc.,
       | and has diagrams if you are a more visual learner.
        
       | pjmlp wrote:
       | > We learned that working on pointers directly often leads to
       | memory bugs. So we introduced references.
       | 
       | Minor pedantic correction, references predate having pointers all
       | over the place, in most systems languages.
       | 
       | C adopting pointers for all use cases isn't as great as they
       | thought.
        
       | Thorrez wrote:
       | >I compiled the C++ examples with godbolt with "x86-64 gcc
       | (trunk)" and "-Wall -Wextra -Wno-pessimizing-move -Wno-redundant-
       | move".
       | 
       | Edit: everything below is incorrect.
       | 
       | -Wno-pessimizing-move is automatically enabled by -Wall, so
       | doesn't need to be specified manually. -Wno-redundant-move is
       | automatically enabled by -Wextra, so doesn't need to be specified
       | manually.
        
         | quuxplusone wrote:
         | -Wno-foo is turning _off_ those warnings, not turning them on.
        
           | Thorrez wrote:
           | Wow, thanks. The gcc documentation appears to have a problem.
           | 
           | It lists -Wreorder as a warning, and says it's enabled by
           | -Wall . It lists -Wno-pessimizing-move as a warning, and says
           | it's enabled by -Wall .
           | 
           | I think the documentation should be edited to not list -Wno-
           | pessimizing-move , and instead list -Wpessimizing-move .
           | 
           | https://gcc.gnu.org/onlinedocs/gcc-9.1.0/gcc/C_002b_002b-Dia.
           | ..
        
       | cpp_noob wrote:
       |     struct Person {
       |         string name;
       |         uint8_t age;
       |     };
       | 
       | isn't this missing a move constructor?
       | 
       |     Person::Person(Person&& p)
       |         : name(std::move(p.name)), age(p.age) {}
       | 
       | or is C++ able to make these implicitly now?
        
         | Maxatar wrote:
         | The move and copy constructors are implicit.
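         | 
         | A short check of this, assuming C++17. The Owner struct is
         | a hypothetical addition that shows the implicit move
         | constructor is a real move and not a copy in disguise:

```cpp
#include <cstdint>
#include <memory>
#include <string>
#include <type_traits>
#include <utility>

// Same shape as the article's struct: no user-declared constructors.
struct Person {
    std::string name;
    std::uint8_t age;
};

// Hypothetical struct with a move-only member. It can be moved but not
// copied, so its implicit move constructor cannot be a copy fallback.
struct Owner {
    std::unique_ptr<int> resource;
};

static_assert(std::is_move_constructible_v<Person>);
static_assert(std::is_move_constructible_v<Owner>);
static_assert(!std::is_copy_constructible_v<Owner>);

// Moves a Person using the implicitly generated move constructor and
// checks that the destination holds the original field values.
bool moved_person_keeps_fields() {
    Person a{"Dave", 42};
    Person b = std::move(a);
    return b.name == "Dave" && b.age == 42;
}
```

         | The implicit move constructor is only generated when the
         | class declares no copy constructor, copy assignment, move
         | assignment, or destructor of its own.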
        
       | nayuki wrote:
       | Some basic things in the article appear to be factually wrong.
       | 
       | > Then we ask us the following questions:
       | 
       | > 1. When we passed Dave to show, did we create a copy?
       | 
       | > 2. If so, how do we avoid creating a copy?
       | 
       | > C++ example
       | 
       | > 1. Yes. You can insert cout << "Person record is at address "
       | << &p << endl; before the call of show as well as the beginning
       | of show. This reveals different memory addresses of the record.
       | 
       | Judging copies by the object's address is incorrect methodology.
       | In both C++ and Rust, "moving" an object will still copy the
       | struct fields, but will avoid copying any of the pointees (such
       | as the variable-size array that the string owns).
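       | 
       | A minimal sketch of why the address test is misleading. Record
       | and observe_move are hypothetical names, and std::vector is
       | used instead of std::string so that small-string optimisation
       | does not get in the way:

```cpp
#include <utility>
#include <vector>

// Hypothetical record that owns a heap buffer through std::vector.
struct Record {
    std::vector<int> samples;
    int id;
};

struct MoveObservation {
    bool different_object_address;  // the two Record objects are distinct
    bool same_heap_buffer;          // the element storage was handed over
};

// Moves a Record and reports what changed and what did not.
MoveObservation observe_move() {
    Record a{std::vector<int>(1000, 7), 1};
    const int* buffer_before = a.samples.data();
    Record b = std::move(a);  // copies the small struct fields, steals the buffer
    return {&a != &b, b.samples.data() == buffer_before};
}
```

       | Both flags come out true: the struct itself lives at a new
       | address, yet the 1000 elements were never copied.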
       | 
       | > 2. Replace void show(Person person) with void show(Person&
       | person). So only the function needs to change. The caller does
       | not have to adapt to it.
       | 
       | Passing by reference is a different concept to moving. While the
       | author used this approach for C++, they did not use the same
       | approach for Rust. This is comparing apples to oranges.
        
         | ajross wrote:
         | > In both C++ and Rust, "moving" an object will still copy the
         | struct fields, but
         | 
         | Most people consider a shallow copy a "copy", certainly a
         | shallow copy isn't a "reference"! One of the big problems in
         | this space is in fact the divergence of terminology that leads
         | to arguments like this.
         | 
         | The introduction of move semantics to C++ was a terrible,
         | terrible mistake; not because it doesn't solve a real problem
         | but because the language is objectively much worse now as a
         | routine tool for general developers. People used to hack on
         | code to implement features, now they get confused over and
         | argue about how many "&" characters they need in a function
         | signature.
         | 
         | It was a problem that was best left unsolved, basically.
        
       | Night_Thastus wrote:
       | I can't say examples like this sell me on Rust, coming from C++.
       | I need to manually call to_string() every single time I want to
       | use strings?
       | 
       | And that bizarre scoping of Person p feels very un-intuitive. How
       | would you work around that if you need to keep using it after
       | show()? (Which is an extremely common use case)
        
         | winrid wrote:
         | to_string() gives you an owned string (like std::string) vs a
         | borrowed string slice (kind of like char*). If you already have
         | an owned string you don't need to do that obviously
         | 
         | If you need to keep using Person after calling show() then
         | don't pass ownership to show() - you can pass a reference or a
         | mutable reference, or use Rc<> etc
        
         | aseipp wrote:
         | A raw string literal gets embedded into the binary's data
         | section at compile time, just like it would in C or C++. What
         | this means is that the type of the string literal is actually a
         | reference (to an underlying memory address). And so it has type
         | '&str' which reflects the fact you are using a reference to a
         | value that exists somewhere else.
         | 
         | The type 'String' is instead an "owned" type, which means that
         | it is not a reference, and instead a complete value and has a
         | copy of the data. to_string() will create a String (owned
         | value) from a &str (reference) by copying it. This is no
         | different than if you had a global static compile-time string
         | in C and you wanted to modify or update it: you would memcpy
         | the global (statically allocated) string into a local buffer of
         | the appropriate size and then modify it and pass it onward to
         | other things that need it. You would not modify the static
         | string in place.
         | 
         | In short, no, you do not need to_string() every time you want
         | to work with a string. You need it to convert a reference type
         | to an owned type. Rust's type system is just used here to
         | codify the more implicit parts of C or C++'s behavior that you
         | are already familiar with, but the underlying bits and bytes
         | behave as you would expect coming from C++.
         | 
         | > And that bizarre scoping of Person p feels very un-intuitive.
         | How would you work around that if you need to keep using it
         | after show()
         | 
         | You take a reference just like you would in C++. Possibly a
         | mutable reference if you want to modify the thing and then use
         | it afterwards. This is in the article as the "Advanced rust
         | example" at the end, it's right there and not hidden or
         | anything.
         | 
         | It isn't really bizarre honestly; it's a matter of defaults.
         | The difference is that Rust uses move-by-default, not
         | copy-by-default or ref-by-default. Every time you write `x = y`
         | for a given owned type, you are doing a move of `y` into `x`
         | and thus making `y` invalid.
         | 
         |     let g: &str   = "Austin";      // statically allocated string
         |     let x: String = g.to_string(); // do a copy
         |     let y: String = x;             // no copy, x is moved
         | 
         | Once you internalize this a lot more stuff will make sense, or
         | at least it did for me.
        
       | w10-1 wrote:
       | Hasn't a language feature failed if even experts disagree on it?
       | How would lay developers ever use it? This is not an algorithmic
       | nicety; it's supposed to be second nature to write and automatic
       | to read.
       | 
       | And it seems weird to omit Swift from this comparison, since
       | Swift seems to have the most user-friendly (but incomplete?)
       | implementation of move-only types.
        
         | Maxatar wrote:
         | Not even the people who implement C++ compilers can agree on
         | how certain C++ features are supposed to work.
        
       | enugu wrote:
       | In this discussion of a specific point in the post, the promise
       | of Hylo language and mutable value semantics can be overlooked.
       | 
       | Namely, we get a lot of the convenience of functional programming
       | (mutating one variable doesn't change any other variable) with
       | the performance of imperative languages (purely functional data
       | structures have higher costs relative to in-place mutation and
       | are more gc-intensive).
       | 
       | https://docs.hylo-lang.org/language-tour/bindings
        
       ___________________________________________________________________
       (page generated 2024-12-05 23:01 UTC)