[HN Gopher] What is `Box<str>` and how is it different fro...
___________________________________________________________________
What is `Box<str>` and how is it different from `String` in
Rust?
Author : asimpletune
Score : 103 points
Date : 2022-06-24 09:54 UTC (1 days ago)
(HTM) web link (mahdi.blog)
(TXT) w3m dump (mahdi.blog)
| sirwhinesalot wrote:
| It's unfortunate that strings are badly named in rust. They got
| that better with Path and PathBuf.
|
| str is fixed size, like a Java String
|
| String is growable, like a Java StringBuilder
|
| After that, we get into memory ownership, with &str not owning
| memory, and Box<str> owning memory, but you rarely need the
| latter, so it's really &str vs String that you need to care
| about.
|
| EDIT: changed immutable to fixed and mutable to growable to
| better reflect the real difference, though in practice you almost
| always use immutable &str and &mut String. I thank the commenters
| below for pointing it out; I don't want to make the problem even
| more confusing than it already is.
| Arnavion wrote:
| String used to be StrBuf first. The rename to String was
| intentional because String was the more commonly known name in
| other languages.
|
| https://rust-lang.github.io/rfcs/0060-rename-strbuf.html
| lifthrasiir wrote:
| Note that this is a very old RFC and doesn't have much
| context and discussion compared to later RFCs. It is
| worthwhile to read the actual discussion that happened [1].
|
| [1] https://github.com/rust-lang/rfcs/pull/60
| howinteresting wrote:
| This was a mistake. Having str and StrBuf would have been
| significantly less confusing than str and String.
| steveklabnik wrote:
| I often joke that this is the only change I'd desire for a
| Rust 2.0.
| OJFord wrote:
| What about aliasing it, marking String as deprecated in
| docs, 'please use StrBuf'? (Clippy warning, etc.)
| steveklabnik wrote:
| In theory you could do something like this, but it would
| be a _lot_ of churn for a questionable amount of gain. I
| probably wouldn't support it today; Rust is past being
| able to make these sorts of changes imho.
| sirwhinesalot wrote:
| Unfortunately, judging by the fact so many people are still
| confused about it, it was a mistake. Having a shorthand for
| something (str) and that thing (String) be different things
| was dumb, and someone brought that up in the discussion at
| the time but I guess hindsight is 20/20.
|
| C++ has std::string and std::string_view, which make a lot
| more sense.
|
| Java and C# have StringBuilder and String.
|
| Go has strings.Builder and string.
|
| Objective-C/Cocoa has NSMutableString and NSString.
|
| Ada has Unbounded_String, Bounded_String and Fixed_String for
| different use cases.
|
| Rust has by far the worst naming.
| kzrdude wrote:
| I guess C++ has the best names after all, Rust should have
| emulated those (except it couldn't - string_view came after
| Rust and maybe even was inspired by Rust.)
| cmrdporcupine wrote:
| Chromium's C++ StringPiece dates back to at least 2012,
| and pretty sure Google had something similar (I forget
| that name) to it in Google3's C++ base library (which
| became abseil's string_view) before that even.
|
| I seem to recall Boost may have had a string_view pretty
| far back, too.
|
| https://chromium.googlesource.com/chromium/src/base/+/mas
| ter...
|
| https://github.com/abseil/abseil-
| cpp/blob/master/absl/string...
| nicoburns wrote:
| Personally I'd prefer String/StringView (and potentially Path
| and PathView), but I guess that ship has sailed.
| Blikkentrekker wrote:
| I find that this explanation does not do it justice.
|
| The important part is that `str` is a dynamically sized type as
| it's called. What it is is simply a region of memory, of any
| size, containing UTF8. Since it is dynamically sized various
| constraints are placed onto it which in practice come down to
| that it can only really be passed around at runtime by being
| behind a pointer and is hard to directly put on the stack.
|
| `String` is three words: two words are equivalent to a "fat
| pointer" to a `str`, as in one word for the address, and the
| other for the size, which is how Rust deals with dynamically
| sized types in general, and the third word denotes the capacity
| of memory allocated to the `String` which it uses to know when
| to reallocate.
|
| `str` is neither mutable nor immutable, since that isn't part of its
| type: `&str` is immutable, and `&mut str` is mutable. It's
| perfectly possible in Rust to mutate a `str` if one obtains a
| mutable, or perhaps better called exclusive reference to it
| somehow, but the mutations that can be performed are very
| limited since the size cannot easily grow.
|
| This is where `String` comes in: it guarantees that the
| space after the `str` it points to, up to its "capacity"
| (the third word), is not used by anything else, and thus it
| can grow more easily.
|
| There are some limited mutation methods on `&mut str` in Rust,
| such as `make_ascii_uppercase`, which converts all lowercase
| ASCII letters to uppercase, which is perfectly fine, since this
| operation is guaranteed never to increase the size of the
| `str`, but with Unicode such a guarantee no longer applies and
| one needs a `String`.
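A minimal sketch of the distinction above, using only std APIs (the helper names `shout_ascii` and `shout_unicode` are hypothetical): ASCII uppercasing can run in place through `&mut str`, while full Unicode uppercasing returns a new `String`, because the result can differ in length ("ß" becomes "SS", "ı" becomes "I").

```rust
/// Uppercases ASCII letters in place. Works on any `&mut str`,
/// because ASCII case mapping never changes the byte length.
fn shout_ascii(s: &mut str) {
    s.make_ascii_uppercase();
}

/// Full Unicode uppercasing can change the length, so it
/// allocates and returns a new `String` instead.
fn shout_unicode(s: &str) -> String {
    s.to_uppercase()
}

fn main() {
    let mut owned = String::from("straße");
    // `&mut String` derefs to `&mut str`, so the in-place version applies.
    shout_ascii(&mut owned);
    println!("{owned}"); // "ß" is not ASCII, so it is left as-is

    println!("{}", shout_unicode("straße"));
}
```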
|
| That being said, yes, I would have favored for `String` to be
| called `StrBuf`, and `Vec` `SliceBuf` instead.
| sirwhinesalot wrote:
| Sure, if you want to be truly specific about it and not do a
| Java analogy ;)
| aliceryhl wrote:
| The difference has to do with ownership, and it has nothing to
| do with mutability. For _both_ types, you can mutate them given
| a mutable reference, and you can't given an immutable reference.
|
| For an example, an `&mut str` can be modified via various
| methods such as make_ascii_uppercase.
| sirwhinesalot wrote:
| Nope, not ownership either, Box<str> and String both own
| their memory, the difference is fixed size vs growable :)
|
| But you're right, I edited my post to reflect this, the Java
| analogy is pretty strained as it is.
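To make the fixed-size vs growable point concrete, here is a small sketch (the function `append_boxed` is hypothetical): `Box<str>` has no `push_str`, so appending means converting to `String`, growing it, and converting back.

```rust
/// Appends to a boxed string by round-tripping through `String`.
/// `Box<str>` has no spare capacity, so it has no `push_str` of its own.
fn append_boxed(b: Box<str>, suffix: &str) -> Box<str> {
    let mut s: String = b.into_string(); // takes over the allocation, no copy
    s.push_str(suffix);                  // may reallocate to grow
    s.into_boxed_str()                   // may reallocate to drop spare capacity
}

fn main() {
    let b: Box<str> = "hello".into();
    let b = append_boxed(b, ", world");
    println!("{b}");
}
```

`into_string` hands over the existing allocation without copying, but `into_boxed_str` may reallocate to shed spare capacity, so doing this on every append is what makes `Box<str>` a poor fit for building text.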
| Macha wrote:
| I believe the parent poster was comparing &str and String,
| not Box<str> and String.
| marcosdumay wrote:
| > but you rarely need the latter
|
| AFAIK, it's because people go with String when what they
| actually mean is Box<str>. Since they have similar costs,
| nobody ever sees the need to change it, and the String type
| does have a much better name.
|
| But the need is there all the time. People just satisfy it
| differently.
| sirwhinesalot wrote:
| I think it's mainly because unlike Java, where a
| StringBuilder is effectively an optimisation over
| concatenating Strings, in Rust managing that memory would be
| a total pain, so you tend to keep the mutable thing around.
|
| Once that happens, Box<str> becomes kinda unnecessary. There
| are many cases where it would be the correct type, for
| example reading from a file in a read-only manner, but most
| of the time you're going to be doing _something_ to that
| text, so it makes more sense to just load it up as a String
| already and avoid the unnecessary copy.
|
| Either way, it's mostly a naming problem. &str/String sucks
| :(
| fpoling wrote:
| String in Rust is very similar to std::string in C++, while str
| is std::string_view except it is safe to use.
|
| StringBuffer in Java is not like String in Rust. In particular,
| one cannot pass StringBuffer in Java to a function taking
| String, while both Rust and C++ implicitly convert the
| string backed by a heap into the corresponding read-only view.
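A small illustration of that implicit conversion on the Rust side, using nothing beyond std (`count_words` is a hypothetical helper): a function taking `&str` accepts `&String` and `&Box<str>` through deref coercion, as well as string literals.

```rust
/// Takes a borrowed string view; any owned string type can lend one.
fn count_words(text: &str) -> usize {
    text.split_whitespace().count()
}

fn main() {
    let owned: String = String::from("one two three");
    let boxed: Box<str> = "four five".into();
    // `&String` and `&Box<str>` coerce to `&str` via `Deref`.
    println!("{}", count_words(&owned));
    println!("{}", count_words(&boxed));
    // Literals are already `&'static str`.
    println!("{}", count_words("six"));
}
```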
| sirwhinesalot wrote:
| Strings in Java own their memory, they aren't views, they're
| closer to Box<str>. That's why you can't implicitly convert a
| StringBuilder into one.
|
| I know this; I'm not the one you need to explain it to, it's
| Rust newbies. So many problems would have been avoided with
| Str/StrBuf or StrView/Str, but now the ship has sailed.
| rrobukef wrote:
| Strings in Java share their memory with other substrings of
| the same allocation. They are views.
| cesarb wrote:
| IIRC, that used to be the case, but recent Java releases
| changed it so that memory is no longer shared with
| substrings. The former behavior could cause some extreme
| memory leaks (unless you were very careful to always
| manually duplicate each substring); a one-character
| substring could keep a multi-megabyte memory allocation
| alive. See for instance
| https://stackoverflow.com/questions/33893655/string-
| substrin... which discusses this issue.
| OJFord wrote:
| If OP is here, then in this listing:
|
|     let boxed_str: Box<str> = "hello".into();
|     println!("size of boxed_str on stack: {}", std::mem::size_of_val(&boxed_str));
|
|     let s = String::from("hello!");
|     println!("size of string on stack: {}", std::mem::size_of_val(&s));
|
| I know it's not the point and doesn't make a difference, but you
| might want to make the two 'strings' the same (not with & without
| '!'), just to be clearer.
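A runnable version of that listing with identical contents, as suggested (`stack_sizes` is a hypothetical wrapper). On current targets `Box<str>` is two words (pointer, length) and `String` three (plus capacity); the `String` layout is an implementation detail rather than a documented guarantee, so treat the exact numbers as illustrative.

```rust
use std::mem::size_of_val;

/// Returns the stack footprint of a `Box<str>` and a `String` holding the same text.
fn stack_sizes(text: &str) -> (usize, usize) {
    let boxed_str: Box<str> = text.into();
    let s: String = String::from(text);
    (size_of_val(&boxed_str), size_of_val(&s))
}

fn main() {
    let (boxed, string) = stack_sizes("hello");
    // Box<str> is (pointer, length); String also carries a capacity word.
    println!("size of boxed_str on stack: {boxed}");
    println!("size of string on stack: {string}");
}
```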
| umanwizard wrote:
| This might clarify the situation, for C or C++ folks:
|
|     // heap-allocated, fixed-size
|     struct BoxStr {
|         unsigned length;
|         // INVARIANT: this points to a heap allocation of length bytes, and is valid utf8
|         unsigned char *data;
|     };
|
|     // heap-allocated, resizable
|     struct String {
|         unsigned length;
|         unsigned capacity;
|         // INVARIANT: heap allocation of capacity bytes, the first length of which are valid utf8
|         unsigned char *data;
|     };
|
| Of course you _could_ resize BoxStr, but only by reallocating
| `data` to the exact desired length every time, which will kill
| your asymptotic complexity.
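The same length/capacity split is visible from safe Rust. A sketch, with `grow_then_freeze` as a hypothetical helper: reserve once, push without reallocating, then convert to `Box<str>`, which keeps only the pointer and length.

```rust
/// Builds a string with a single up-front allocation, then freezes it.
fn grow_then_freeze(parts: &[&str]) -> Box<str> {
    let total: usize = parts.iter().map(|p| p.len()).sum();
    // `len` is bytes in use; `capacity` is bytes allocated (at least `total`).
    let mut s = String::with_capacity(total);
    let start = s.as_ptr();
    for p in parts {
        s.push_str(p); // fits in the reserved capacity, so no reallocation
    }
    debug_assert_eq!(s.as_ptr(), start);
    debug_assert!(s.capacity() >= s.len());
    // `Box<str>` keeps only (pointer, length); spare capacity is released.
    s.into_boxed_str()
}

fn main() {
    let b = grow_then_freeze(&["fixed", "-", "size"]);
    println!("{b} ({} bytes)", b.len());
}
```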
| tylerhou wrote:
| Is your first example really equivalent to Box<str>? I would
| have expected something like
|
|     using BoxStr = std::unique_ptr<Str>;
|
| where Str is defined as
|
|     struct Str {
|         size_t len;
|         char data[];
|     };
|
| The difference is that the len is stored on the heap, and the
| data is stored inline with the length. Unfortunately C++ does
| not support flexible array members so this syntax is not
| actually valid.
|
| Edit: Never mind, after reading the article Rust does use the
| above representation because Box holds a "fat" pointer to str,
| which stores its length on the stack. So BoxStr is the correct
| equivalent, because &[u8] is not equivalent to u8*, it's
| equivalent to std::span<u8>.
| steveklabnik wrote:
| Your parent is correct, the length is stored alongside the
| pointer, not on the heap with its data. This is true for any
| "dynamically sized type," not just Box<str>. &str is also a
| (pointer, length) pair, for example.
| the__alchemist wrote:
| I'm working on a PC-based configuration for a drone flight
| controller. PC-side is std Rust with a stack available. Firmware
| is `no-std`, running on a microcontroller. It has waypoints you
| can program when connected to a PC using USB. They have names
| that need to be represented as some sort of string.
|
| I'm using `u8` arrays for the strings on both sides; seems the
| easiest to serialize, and Rust has `str::from_utf8` etc to handle
| conversion to/from the UI.
|
| `String` is unsupported on the MCU side since there's no
| allocation. I find this low-level approach ergonomic given it's
| easy to [de]serialize over USB.
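A sketch of that approach, assuming a hypothetical fixed 16-byte, zero-padded name field (`NAME_LEN`, `encode_name` and `decode_name` are made up for illustration). It only uses `core` APIs, so the same functions should work under `no_std`; truncation backs off to a char boundary so the stored bytes remain valid UTF-8.

```rust
/// Hypothetical fixed size for a waypoint name, in bytes.
const NAME_LEN: usize = 16;

/// Copies `name` into a zero-padded byte array, truncating at a
/// character boundary so the stored bytes stay valid UTF-8.
fn encode_name(name: &str) -> [u8; NAME_LEN] {
    let mut end = name.len().min(NAME_LEN);
    while !name.is_char_boundary(end) {
        end -= 1;
    }
    let mut buf = [0u8; NAME_LEN];
    buf[..end].copy_from_slice(&name.as_bytes()[..end]);
    buf
}

/// Borrows the name back out of the buffer without allocating.
fn decode_name(buf: &[u8; NAME_LEN]) -> Result<&str, core::str::Utf8Error> {
    // Stop at the first zero byte (the padding), if any.
    let len = buf.iter().position(|&b| b == 0).unwrap_or(NAME_LEN);
    core::str::from_utf8(&buf[..len])
}

fn main() {
    let buf = encode_name("Waypoint Ω");
    println!("{:?}", decode_name(&buf));
}
```

One limitation of this padding scheme: a name containing `'\0'` would be cut short when decoded.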
| sampo wrote:
| Title is: What is Box<str> and how is it different from String in
| Rust?
| dang wrote:
| Fixed now. Thanks!
| codedokode wrote:
| Is there official documentation about what `str` (without an
| ampersand) is? For example, documentation [1] says that `str` is
| a "string slice" (without explaining what "string slice" means),
| and then goes on with description of &str.
|
| And a book on Rust [2] says:
|
| > A string slice is a reference to part of a String
|
| This seems wrong, because &str can reference static strings which
| are not String. And if str, or "string slice" is a "reference",
| then &str is a reference to a reference?
|
| And later:
|
| > The type that signifies "string slice" is written as &str
|
| But the documentation said that "string slice" is str, not &str.
|
| Also, I wonder, what do square brackets mean when they are used
| without an ampersand (as s[0..2] instead of &s[0..2])?
|
| Also, is an ampersand in &str the same as an ampersand in &u8
| (meaning an immutable reference to u8) or does it have other
| meaning?
|
| [1] https://doc.rust-lang.org/std/primitive.str.html
|
| [2] https://doc.rust-lang.org/book/ch04-03-slices.html#string-
| sl...
| [deleted]
| LegionMammal978 wrote:
| > Is there official documentation about what `str` (without an
| ampersand) is? For example, documentation [1] says that `str`
| is a "string slice" (without explaining what "string slice"
| means), and then goes on with description of &str.
|
| A `str` is really just a `[u8]` with extra semantics. Thus, a
| `&str` is really a `&[u8]`, a `&mut str` is a `&mut [u8]`, a
| `Box<str>` is a `Box<[u8]>`, etc. So we call it a "string
| slice", since it mostly acts like a regular `[T]` slice.
|
| In general, the term "slice" can either refer to the unsized
| type `[T]` or the reference `&[T]`/`&mut [T]` interchangeably.
| You could also call the latter a "slice reference" where the
| distinction is important; e.g., a `Box<[T]>` would be a "boxed
| slice", while `Box<&[T]>` would be a "boxed slice reference" or
| "boxed reference to a slice". But most of the time, the correct
| meaning can be inferred from context.
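A quick illustration of "`str` is `[u8]` plus the UTF-8 invariant" (the `as_utf8` wrapper is hypothetical): viewing a `&str` as bytes is free, while the reverse direction has to validate.

```rust
/// Checked conversion from bytes to a string view.
fn as_utf8(bytes: &[u8]) -> Option<&str> {
    std::str::from_utf8(bytes).ok()
}

fn main() {
    let s: &str = "héllo";
    // Viewing a str as its UTF-8 bytes is free: same pointer, same length.
    let bytes: &[u8] = s.as_bytes();
    println!("{} chars in {} bytes", s.chars().count(), bytes.len());
    // The reverse direction must validate, since not every [u8] is UTF-8.
    println!("{:?} {:?}", as_utf8(bytes), as_utf8(&[0xff, 0xfe]));
}
```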
|
| > Also, I wonder, what do square brackets mean when they are
| used without an ampersand (as s[0..2] instead of &s[0..2])?
|
| `s[0..2]` is a place expression that refers to the raw `str`
| subslice. But since `str` is an unsized type [0], it cannot
| appear on its own; it must appear behind some reference type.
| Thus, `&s[0..2]` creates a `&str`, and `&mut s[0..2]` creates a
| `&mut str`. However, the ampersand isn't always necessary: you
| can write `s[0..2].to_owned()` to use the `str` as a method
| receiver, which implicitly creates a reference.
|
| [0] https://doc.rust-lang.org/book/ch19-04-advanced-
| types.html#d...
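A sketch of the slicing rules described above (`prefix` is a hypothetical helper). One detail worth adding: the range is in bytes and must fall on char boundaries, so `&s[0..2]` panics when the second character is multi-byte, whereas `str::get` returns `None` instead.

```rust
/// First `n` bytes of `s`, or `None` if `n` is past the end
/// or would split a multi-byte character.
fn prefix(s: &str, n: usize) -> Option<&str> {
    s.get(..n)
}

fn main() {
    let s = String::from("héllo");
    // `s[0..1]` is an unsized `str` place; borrow it to get a `&str`...
    let first: &str = &s[0..1];
    // ...or call a method on it, which borrows implicitly.
    let owned: String = s[0..1].to_owned();
    println!("{first} {owned}");
    // 'é' occupies bytes 1..3, so `&s[0..2]` would panic; `get` does not.
    println!("{:?} {:?}", prefix(&s, 2), prefix(&s, 3));
}
```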
| ruuda wrote:
| The & in &str is like the & in &[u8], str is like [u8] (an
| unsized type), not like u8. A &str is a "fat pointer" (pointer
| + length), unlike &u8 which is a regular "thin" pointer.
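One way to see the thin/fat distinction is to compare reference sizes (the generic `pointer_words` helper is hypothetical). The Rust Reference states that pointers to dynamically sized types are twice the size of pointers to sized types.

```rust
use std::mem::size_of;

/// How many machine words a shared reference to `T` occupies.
fn pointer_words<T: ?Sized>() -> usize {
    size_of::<&T>() / size_of::<usize>()
}

fn main() {
    // Thin pointer: just an address.
    println!("&u8:   {} word(s)", pointer_words::<u8>());
    // Fat pointers to unsized types: address + length.
    println!("&[u8]: {} word(s)", pointer_words::<[u8]>());
    println!("&str:  {} word(s)", pointer_words::<str>());
}
```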
| FullyFunctional wrote:
| This is missing a conversation about
| https://lib.rs/crates/compact_str (and a few alternatives like
| it). TL;DR: String takes the space of three pointers, that is, 24
| bytes on 64-bit archs. compact_str fits up to 24 byte strings in
| the same space and reverts to String for longer strings.
|
| ADD: that is, avoids heap allocation for those, unlike both
| Box<str> and String.
| tialaramex wrote:
| Box<str> is still going to be smaller _if_ you know how big the
| text is because (unlike CompactString and String) it doesn't
| need to carry a capacity value. In exchange of course you can't
| append things to it (without re-allocating)
|
| CompactString is a very clever+ SSO implementation, and I'll
| remember it is there if I run into a situation where it might
| help but I firmly agree with Rust's choice _not_ to implement
| the SSO optimisation in the standard library's String type.
|
| + Storing 23 bytes of UTF-8 as one of several representations
| in a 24 byte data structure makes sense, you can see how to
| write a fairly safe SSO optimisation for Rust which does that,
| but the CompactString scheme relies on the fact Rust's strings
| are by definition UTF-8 encoded to squeeze the discriminant
| into the same space as the last possible byte of an actual
| UTF-8 string, so it can store a 24 byte value like
| "ABCDEFGHIJKLMNOPQRSTUVWX" inline despite also distinguishing
| the case where it needs a heap pointer for larger strings.
| That's very clever.
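A minimal sketch of the property that footnote relies on; this is not compact_str's code, and the tag values mentioned in the comments are hypothetical. It checks every Unicode scalar value and confirms that the last byte of any UTF-8 encoding is below 0xC0, which leaves 0xC0..=0xFF free for a discriminant.

```rust
/// True if `b` can be the final byte of some valid UTF-8 string.
/// A string ends with either an ASCII byte (0x00..=0x7F) or a
/// continuation byte (0x80..=0xBF); lead bytes (0xC0..=0xFF) never end one.
fn can_end_utf8(b: u8) -> bool {
    b < 0xC0
}

/// Final byte of the UTF-8 encoding of `c`.
fn last_byte_of(c: char) -> u8 {
    let mut buf = [0u8; 4];
    let encoded = c.encode_utf8(&mut buf);
    *encoded.as_bytes().last().expect("a char encodes to 1..=4 bytes")
}

fn main() {
    // Exhaustively check every Unicode scalar value.
    let all_ok = (0..=0x10FFFFu32)
        .filter_map(char::from_u32)
        .all(|c| can_end_utf8(last_byte_of(c)));
    assert!(all_ok);
    // So a 24-byte inline buffer could reserve final-byte values >= 0xC0 as
    // hypothetical tags (say, "short string of length n" or "heap pointer")
    // while a full 24-byte string still fits, since its last byte is < 0xC0.
    println!("every non-empty UTF-8 string ends in a byte below 0xC0");
}
```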
| rtfeldman wrote:
| > I firmly agree with Rust's choice not to implement the SSO
| optimisation in the standard library's String type.
|
| Out of curiosity, why is that?
|
| I don't know much about how or why that decision was made,
| but I'm curious.
| lifthrasiir wrote:
| SSO means that pretty much every string operation has multiple
| code paths, which can be highly unpredictable. Basically it
| is a trade-off between memory usage and performance, and
| the standard library is not really a good place to make
| that trade-off. By comparison, a lot of C++ code (still) copies
| strings all over the place for no good reason, so SSO in
| the standard library has a much greater appeal.
| pornel wrote:
| A nice thing is that all string types have &str as the lowest
| common denominator, so even if you use SSO or on-stack or any
| other fancy string type, it's automatically compatible with
| almost everything.
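A sketch of that lowest-common-denominator point using std types only (`len_of` is a hypothetical function): one `&str` parameter accepts `String`, `Box<str>`, `Rc<str>`, `Arc<str>` and `Cow<str>` alike, because each derefs to `str`.

```rust
use std::borrow::Cow;
use std::rc::Rc;
use std::sync::Arc;

/// Accepts a borrowed view; the caller's string type doesn't matter.
fn len_of(s: &str) -> usize {
    s.len()
}

fn main() {
    let a: String = String::from("string");
    let b: Box<str> = "box".into();
    let c: Rc<str> = Rc::from("rc");
    let d: Arc<str> = Arc::from("arc");
    let e: Cow<str> = Cow::Borrowed("cow");
    // Each of these derefs to `str`, so `&x` coerces to `&str`.
    let total = len_of(&a) + len_of(&b) + len_of(&c) + len_of(&d) + len_of(&e);
    println!("{total}");
}
```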
| terhechte wrote:
| I recently gave a Rust workshop to Kotlin and Swift developers.
| Strings in Rust are a really, really difficult topic for complete
| newcomers because they're understood as a basic type whereas in
| Rust they require having read half the Rust book to grasp.
|
| Consider: I can teach a lot of Rust basics with `usize`. Defining
| functions, calling functions, enums because they're `Copy` and
| because there's only one type. String requires knowing about &str
| which requires knowing about deref which requires knowing about
| (&String -> &str), it also requires understanding lifetimes,
| moving, heap and stack, cloning. Then, if you want to work with
| the file system you also need to understand Paths, OsString and
| AsRef.
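For the filesystem part, a sketch of the usual `AsRef<Path>` pattern (`extension_of` is a hypothetical helper): the generic bound lets callers pass `&str`, `String` or `PathBuf`, and the `to_str` step is where `OsStr` forces you to handle names that might not be valid UTF-8.

```rust
use std::path::{Path, PathBuf};

/// Accepts anything that can be viewed as a path: &str, String, &Path, PathBuf, ...
fn extension_of<P: AsRef<Path>>(p: P) -> Option<String> {
    p.as_ref()
        .extension()              // Option<&OsStr>
        .and_then(|e| e.to_str()) // None if the OsStr is not valid UTF-8
        .map(str::to_owned)
}

fn main() {
    println!("{:?}", extension_of("notes.txt"));
    println!("{:?}", extension_of(String::from("archive.tar.gz")));
    println!("{:?}", extension_of(PathBuf::from("/tmp/README")));
}
```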
|
| With Kotlin and Swift, for all these things, you really just need
| one type, String, and you handle it just like usize.
|
| It is really a bit of a hurdle for new developers coming from
| higher level languages (especially if they just give it a quick
| try).
| klabb3 wrote:
| Don't worry. As soon as you explain to them that appending to a
| PathBuf is O(1) amortized they'll come around, and it will
| scale much better for all their GB-sized file paths.
|
| I guess this adds a prerequisite on complexity theory but
| nobody should go anywhere near advanced data structures like
| strings with less than a bachelor in CS.
| lijogdfljk wrote:
| Makes me wonder if there could be room for a SimpleString
| library.
|
| I love/use Rust. I don't think any of this is complicated. BUT,
| I'm a big fan of just "clone your problems away" for beginner
| Rust users. Going knee deep into techniques which merely reduce
| memory usage when people likely don't actually care - at all -
| about it just feels wrong to me.
|
| So yea, maybe a cursed library where SimpleString is just some
| niceties around some Cow + Arc thing which is also Copy. Hell,
| you could probably just apply it to Vec and who knows what else.
|
| Anyway, clearly not something I'm advocating anyone _really_
| use. But it seems a nice way to make stuff "Just Work" in the
| beginning.
| kzrdude wrote:
| Some weird construction around Cow + Arc that is also Copy is
| not really possible in Rust, I'm sorry to report. No way to
| implement it and even if you could (you technically "can" by
| reimplementing most of Cow and Arc) - the result is not
| useful, the destructor of it doesn't work.
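The closest std approximation of "cheap to duplicate" is a shared immutable string such as `Arc<str>`. A sketch (the `SharedStr` alias is hypothetical): cloning only bumps a reference count, but as noted above it still cannot be `Copy`, because its destructor has to run.

```rust
use std::sync::Arc;

/// Hypothetical alias: a shared, immutable, cheap-to-clone string.
type SharedStr = Arc<str>;

fn main() {
    let a: SharedStr = Arc::from("hello");
    let b = a.clone(); // bumps a reference count; the text is not copied
    assert!(Arc::ptr_eq(&a, &b));
    assert_eq!(Arc::strong_count(&a), 2);
    // Still not `Copy`: `let c = a;` moves `a`, because dropping each
    // handle must run Arc's destructor to decrement the count.
    println!("{a} {b}");
}
```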
| codedokode wrote:
| But Rust is designed to write high-performance code. If you
| don't care about performance, you don't really need Rust.
| Swift or Go seem more readable and easier to use.
| pjmlp wrote:
| Swift is pretty much about performance, as replacement for
| C, C++ and Objective-C in the Apple ecosystem, it is even
| on Apple's official sites.
|
| What Apple isn't willing to do is sacrifice productivity
| while achieving that goal.
| howinteresting wrote:
| Swift is well-designed but is virtually non-existent
| outside of Apple platforms, so it doesn't have nearly the
| third-party ecosystem that Rust does. Go has the third-
| party ecosystem but is poorly designed and doesn't have
| basic language features like sum types.
|
| Rust is likely the best combination of thought-out design
| and ecosystem support that exists in a programming language
| today.
| pjmlp wrote:
| Rust is also pretty much focused on Linux workloads,
| mostly.
|
| Also the Apple ecosystem has plenty of third parties,
| including commercial libraries.
| jeroenhd wrote:
| Interestingly, Microsoft is also pushing Rust quite hard
| with special API packages, tutorials, and even some IDE
| integration. Windows tools are often closed source,
| though, so you'll probably never notice it if your
| favourite tool uses Rust or not.
| agumonkey wrote:
| Rust has one uphill battle in mainstream adoption: a
| lot of things make sense if you've written bare-metal code. If not,
| it can be very confusing.
| tialaramex wrote:
| I think I'd recommend teaching Move semantics not Copy
| semantics from the outset, because Move semantics work fine
| everywhere in Rust and the Copy semantics are just an
| optimisation. As you've found, if you teach Copy then for types
| which aren't Copy you now need to teach Move.
|
| Languages like Kotlin and Swift are doing a _lot_ of lifting to
| deliver this behaviour for String, and of course they can't
| keep it up, so students who've done more than a little Kotlin
| or Swift will be aware of the idea of "reference semantics" in
| those languages where most of the objects they use do not have
| the behaviour they've seen in String which is instead
| pretending to be a value type like an integer.
|
| Again, if you only teach Move, you're fine. After not very long
| a student will wonder how they can duplicate things (since they
| didn't know Copy), and you can show them Clone. Clone works
| everywhere. Is cloning a usize idiomatic Rust? No it is not.
| Does it work just fine anyway? Of course it does! And of course
| Clone is implemented for String, and for most types beginners
| will ever see.
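A sketch of the Move-first progression (`take` is a hypothetical function): a `String` is moved into a call, `clone` makes an explicit duplicate, and `clone` also works on a `usize`, even though it is not idiomatic there.

```rust
/// Takes ownership of its argument; the String is freed when this returns.
fn take(s: String) -> usize {
    s.len()
}

fn main() {
    let a = String::from("hello");
    let b = a.clone(); // explicit, independent duplicate
    let n = take(a);   // `a` is moved; using `a` after this line won't compile
    println!("{n} {b}");

    let x: usize = 42;
    let y = x.clone(); // works, though plain `let y = x;` is idiomatic
    println!("{x} {y}");
}
```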
| hgomersall wrote:
| Are copy semantics always used in place of move semantics for
| a Copy type? I didn't know that.
| [deleted]
| tialaramex wrote:
| Literally all that Copy does is it says after assignment
| the moved-from variable can still be used. So in this
| sense, sure, these semantics are "always used". But if you
| don't use the variable after assigning from it, you could
| also say the semantics aren't used in this case. Does that
| help? Copy does a _lot_ less than many people think it
| does.
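A small illustration of what `Copy` adds, with a hypothetical `Point` type: after `let q = p;` the source is still usable. A struct with a `String` field could not derive `Copy`, since `String` is not `Copy`.

```rust
/// Plain data: copying the bits copies the meaning, so `Copy` is allowed.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Point {
    x: i32,
    y: i32,
}

fn main() {
    let p = Point { x: 1, y: 2 };
    let q = p; // a copy, so `p` remains usable
    println!("{p:?} {q:?}");
}
```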
|
| If you're a low level person it's apparent this is because
| Copy types are just some bits and their meaning is
| literally in those bits, _Copy_ the bits and you've copied
| the meaning. Thus, this "it still works after assignment"
| Copy behaviour is just how things would work naturally for
| such types. But Rust doesn't require programmers (and
| especially beginners) to grok that.
|
| It's possible to explain Copy semantics first in a way
| that's easier to grasp for people coming from, say, Java,
| but that's only half the picture because your students will
| soon need Move semantics which are different. Thus I
| recommend instead explaining Move semantics from the outset
| (which will be harder) and only introducing Copy as an
| optimisation.
|
| I think this might even be better for students coming from
| C++, because C++ move semantics are a horrible mess, so
| underscoring that Move is the default in Rust and it's fine
| to think of every assignment as Move in Rust will avoid
| them getting the idea that there must be secret magic
| somewhere, there isn't, C++ hacked these semantics in to a
| finished language which didn't previously have Move and
| that's why it's a mess.
|
| I'm less sure for people coming from low-level C. I can
| imagine if you're going to work with no_std on bare metal
| you might actually do just fine working almost entirely
| with Copy types and you probably need actual bona fide
| pointers (not just references) and so you end up needing to
| know what's "really" going on anyway. If you're no_std you
| don't have a String type anyway, nor do you have Box, and
| thus you can't write Box<str> either, although &str still
| works fine if you've burned some strings into your firmware
| or whatever.
| afdbcreid wrote:
| This isn't really something you usually encounter, but I
| have to share this cute example:
|
|     pub fn foo() -> impl FnOnce() {
|         let non_copy: String = String::new();
|         let copy: i32 = 123;
|         || {
|             drop(non_copy); // Works
|             drop(copy); // error[E0373]
|         }
|     }
|
| https://play.rust-
| lang.org/?version=stable&mode=debug&editio...
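Why this happens: without `move`, a closure captures each variable in the least restrictive way its body allows. `drop(non_copy)` needs ownership, so the `String` is moved in, but `drop(copy)` only needs to copy the `i32` out, so `copy` is captured by reference, and that reference cannot outlive `foo`. A sketch of the usual fix (renamed to the hypothetical `foo_fixed`, and returning values so the result is observable):

```rust
/// `move` forces every captured variable into the closure,
/// including the `Copy` one, so nothing borrows from this stack frame.
pub fn foo_fixed() -> impl FnOnce() -> (usize, i32) {
    let non_copy: String = String::from("abc");
    let copy: i32 = 123;
    move || {
        let len = non_copy.len();
        drop(non_copy); // consumes the String, so the closure is FnOnce
        (len, copy)     // `copy` was copied into the closure, not borrowed
    }
}

fn main() {
    let f = foo_fixed();
    println!("{:?}", f());
}
```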
| lumost wrote:
| Rust strings are difficult for others coming from statically
| typed and low level languages as well.
|
| It's one of the types programmers will most often encounter,
| and yet it's one of the most obtuse topics within rust.
| k__ wrote:
| I remember strings being "not so easy" in C/C++ too.
| oconnor663 wrote:
| I think the big differences are that copying and reference
| taking are automatic and invisible in C++. So a lot of APIs
| taking string or string& will "just work" for the
| beginners, and you can delay the part where you talk about
| how different those things are.
|
| This sounds like a minor difference, but I've met lots of
| developers who do meaningful work in C++ but who don't know
| what a copy constructor is. I get the impression that
| there's an enormous difference between being a C++ "user"
| vs a "library writer", because there's so much automatic
| stuff happening under the covers.
|
| Rust tends to have a bit less invisible complexity, I
| think, but some of that difference is just making the
| complexity visible (like reference taking), which
| effectively frontloads it onto beginners. It's a tough
| tradeoff.
| jokethrowaway wrote:
| After haskell strings, rust strings actually felt reasonable
| nicoburns wrote:
| On the plus side, String makes a really good example to explain
| ownership, moving, stack vs heap, etc. All of which you need at
| least a basic understanding of to do anything non-trivial in
| Rust.
|
| I kind of feel like it goes without saying that Rust isn't
| ideal for beginners. For developers who already have a good
| knowledge of other languages I feel like learning about these
| things shouldn't be a problem, as becoming familiar with these
| concepts is one of the main benefits of learning Rust.
| smaddox wrote:
| > I kind of feel like it goes without saying that Rust isn't
| ideal for beginners.
|
| I think that depends on, first, what the goal is, and second,
| what you're comparing to. I think Rust is easier on
| beginners, in many ways, than C. And C is easier on
| beginners, in many ways, than assembly or machine code. But
| if you want to really understand computer programming,
| starting at machine code or at least assembly isn't a crazy
| way to start.
| tialaramex wrote:
| Beginning with machine code for some simple architecture
| (maybe RISC-V these days?) might be one good route in.
|
| I can also see (having experienced it myself, albeit I
| already knew C etc. these were not requirements and many of
| my classmates did not) beginning with a pure functional
| language where all the practicalities are abstracted
| entirely.
|
| Today the University where I learned this begins with Java,
| which I am confident is the wrong choice, but the person
| who part-designed their curriculum, and is a friend,
| disagrees with me and he's the one getting paid to teach
| them.
| msla wrote:
| > But if you want to really understand computer
| programming, starting at machine code or at least assembly
| isn't a crazy way to start.
|
| I've long suspected that the CS field was founded on two
| approaches: The people who started from EE and worked their
| way up, and the people who started from Math and worked
| their way down. The former people think assembly is the
| "real" way to approach software, and probably view C++ as
| "very high-level", whereas the latter people think everyone
| should start with a course on the lambda calculus and type
| systems and gradually ease into Haskell, work down to Lisp,
| and then maybe deign to learn Python for _shudder_
| numerical work.
| nicoburns wrote:
| I'd argue there's also a 3rd foundation of CS: language.
| Programming languages really are languages in the general
| sense of the word, and their purpose is to allow humans
| to effectively communicate with machines. Focussing on
| optimising that communication is the 3rd approach.
| nicoburns wrote:
| > I think Rust is easier on beginners, in many ways, than
| C. And C is easier on beginners, in many ways, than
| assembly or machine code. But if you want to really
| understand computer programming, starting at machine code
| or at least assembly isn't a crazy way to start.
|
| I mean sure. But equally, starting with Python isn't a
| crazy way to start. And Python is a much easier language to
| learn than any of those (esp. if you want to actually
| create something practical with it).
| hgomersall wrote:
| Sure, but if your objective is systems programming,
| you'll probably quickly get to the point of realising
| python is not the right choice.
| pjmlp wrote:
| Depends, if writing a compiler is still considered
| systems programming in modern times.
|
| https://www.amazon.com/Writing-Interpreters-Compilers-
| Raspbe...
| less_less wrote:
| Compilers are their own beast -- I wouldn't put them with
| systems code. They're pretty different from an OS, BLAS,
| machine learning kernel, game engine, network stack,
| database or what have you. There's not as much buffer
| management, speed and memory aren't usually as critical,
| you don't make direct syscalls, many structures are
| graphs rather than arrays, etc. They often aren't even
| multithreaded.
|
| It's also popular to write compilers in distinctly
| non-"systems-y" languages, most notably Standard ML but
| also eg Haskell, and lots of languages are self-hosted.
| nicoburns wrote:
| If your objective is specifically systems programming
| then you'll quickly outgrow python, but I'm not convinced
| that makes it the wrong starting point. For systems
| programming you'll likely need _both_ high-level and low-
| level programming concepts. Learning low-level first is
| absolutely a valid path, but my point is that going high-
| level first is equally valid. People on the internet like
| to make out like someone who starts out by learning
| Python is incapable of later learning low-level
| concepts, but if anything they're at an advantage
| compared with someone with no programming experience at
| all.
| nvrspyx wrote:
| This is just my opinion, but I can't imagine systems
| programming being the objective of any beginner. A
| beginner probably wouldn't even be able to differentiate
| systems programming from applications programming.
| jez wrote:
| Do any of the string types in the Rust standard library implement
| the same sort of small string optimization that C++ libraries
| implement for std::string? (explained here[1])
|
| Some quick searching turned up a few rust-lang internals posts
| and GitHub issues, but it was hard to see whether anything came
| of them.
|
| I understand that it's probably possible to implement a
| comparable String API in a crate that uses small string
| optimizations, but being able to avoid a dedicated crate makes
| interoperability with other libraries much easier.
|
| [1] https://tc-imba.github.io/posts/cpp-sso/
| aaaaaaaaaaab wrote:
| https://github.com/rust-lang/rust/issues/20198
| edflsafoiewq wrote:
| Not in std, no.
| steveklabnik wrote:
| Rust's standard library strings cannot because of a specific
| API, as_mut_vec, which is incompatible with the internal
| representation necessary to do SSO.
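A sketch of why that API pins down the representation (`push_ascii` is a hypothetical helper): `as_mut_vec` is an `unsafe` method that returns `&mut Vec<u8>`, so a `String` must always contain a real `Vec<u8>`; an inline small-string buffer could not be handed out that way.

```rust
/// Appends ASCII bytes directly to the `Vec<u8>` inside a `String`.
fn push_ascii(s: &mut String, bytes: &[u8]) {
    assert!(bytes.is_ascii(), "only ASCII keeps the UTF-8 invariant here");
    // SAFETY: appending ASCII bytes to valid UTF-8 yields valid UTF-8.
    unsafe { s.as_mut_vec().extend_from_slice(bytes) };
}

fn main() {
    let mut s = String::from("box");
    push_ascii(&mut s, b"ed");
    println!("{s}");
}
```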
| 24bytes wrote:
| https://github.com/ParkMyCar/compact_str
|
| https://old.reddit.com/r/rust/comments/t33hxp/announcing_com...
| dochtman wrote:
| The tl;dr doesn't quite make sense to me. To me the core
| difference is that a Box<str> takes one less word on the stack,
| because by virtue of the str being immutable it doesn't need to
| track the capacity of the allocation as distinct from the length.
| This is analogous to Box<[u8]> vs Vec<u8> (and in fact those are
| the same data types except for the guarantee of valid UTF-8).
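A sketch of that parallel, using std conversions only (`roundtrip` is a hypothetical helper): each `str`-side type has the same size as its `[u8]` counterpart, and only the bytes-to-string direction has to check UTF-8.

```rust
use std::mem::size_of;

/// Vec<u8> -> String -> Box<str> -> Box<[u8]>; only the first step can fail.
fn roundtrip(v: Vec<u8>) -> Option<Box<[u8]>> {
    let s: String = String::from_utf8(v).ok()?; // checked: may fail
    let boxed: Box<str> = s.into_boxed_str();     // drops spare capacity
    Some(boxed.into_boxed_bytes())                // no check: str bytes are valid [u8]
}

fn main() {
    // Growable pair (pointer, capacity, length) and fixed pair (pointer, length).
    assert_eq!(size_of::<Vec<u8>>(), size_of::<String>());
    assert_eq!(size_of::<Box<[u8]>>(), size_of::<Box<str>>());
    println!("{:?}", roundtrip(b"hello".to_vec()));
    println!("{:?}", roundtrip(vec![0xff]));
}
```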
| tialaramex wrote:
| One notable difference is that ToOwned for &str gives you a
| String, whereas ToOwned for &[u8] gives you a [u8] by cloning
| the slice you have.
|
| In fact all four standard library types that are ToOwned
| without invoking Clone are more or less strings (str, CStr,
| OsStr, Path)
| tines wrote:
| C++ programmer here: which one guarantees valid utf8, and why
| would a primitive container make guarantees about the values
| it's storing?
| lifthrasiir wrote:
| Everything labelled as "string" is a valid UTF-8 string in
| Rust, and to my knowledge this decision was made very early
| in the history of Rust (before 0.1). Many "modern" languages
| (including modern enough C++) have a distinction between
| Unicode strings and byte strings however they are called and
| Rust just followed the suit.
| Animats wrote:
| "str" and "String" guarantee UTF-8. To make a String from an
| array of bytes, call
|
|     pub fn from_utf8(vec: Vec<u8, Global>) -> Result<String, FromUtf8Error>
|
| which consumes the input Vec and returns it unmodified, if
| it's valid UTF-8, or reports an error, if it's not. There
| are a number of related functions in this family, such as
|
|     pub fn from_utf8_lossy(v: &[u8]) -> Cow<'_, str>
|
| which takes in a slice of bytes and checks if it's a UTF-8
| string. If it is, it returns the original str. Otherwise it
| makes a copy with any errors replaced with the Unicode error
| character.
|
| Vec<u8> and array slices such as &[u8] are primitive
| containers - they can store any sequence of u8 values. String
| is more like an object with access methods.
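A sketch of both functions in use (`describe` is a hypothetical helper): `from_utf8` reuses the `Vec` on success and hands it back inside the error on failure, and `from_utf8_lossy` returns `Cow::Borrowed` for valid input, allocating only when it has to insert U+FFFD replacement characters.

```rust
use std::borrow::Cow;

/// Reports whether lossy decoding could borrow, plus the decoded text.
fn describe(bytes: &[u8]) -> (bool, String) {
    let cow = String::from_utf8_lossy(bytes);
    let borrowed = matches!(cow, Cow::Borrowed(_));
    (borrowed, cow.into_owned())
}

fn main() {
    // Valid input: the Vec becomes the String's buffer.
    let ok = String::from_utf8(vec![b'h', b'i']).expect("valid UTF-8");
    println!("{ok}");

    // Invalid input: the error gives the original bytes back.
    let err = String::from_utf8(vec![b'h', 0xff]).unwrap_err();
    println!("{:?}", err.into_bytes());

    println!("{:?}", describe(b"hi"));
    println!("{:?}", describe(b"h\xffi"));
}
```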
| pornel wrote:
| The guarantee exists to speed up UTF-8 processing, so that it
| can safely assume working with whole codepoints/sequences
| (without extra out of bounds checks for every byte) and to
| ensure you can always losslessly roundtrip every string to
| and from other Unicode encodings without introducing any
| special notion of a broken character. There's also a security
| angle in this: text-processing algorithms may have different
| strategies for recovering from broken UTF-8, which could be
| exploited to fool parsers (e.g. if a 4-byte UTF-8 sequence
| has only 3 bytes matching, do you advance by 3 or 4 bytes?).
|
| Having the "valid UTF-8" state being part of the type system
| means it needs to be checked only once when the instance is
| created (which can be compile-time for constants), and
| doesn't have to be re-checked later, even if the string is
| mutated. Unlike a generic bag of bytes, the public interface
| on string won't allow making it invalid UTF-8.
| ntoskrnl wrote:
| > why would a primitive container make guarantees about the
| values it's storing
|
| If you know you have valid UTF-8, you can safely skip bounds
| checks when decoding a codepoint that spans multiple bytes.
___________________________________________________________________
(page generated 2022-06-25 23:00 UTC)