[HN Gopher] Making C++ safe without borrow checking, reference c...
___________________________________________________________________
Making C++ safe without borrow checking, reference counting, or
tracing GC
Author : jandeboevrie
Score : 173 points
Date : 2023-06-23 16:04 UTC (6 hours ago)
(HTM) web link (verdagon.dev)
(TXT) w3m dump (verdagon.dev)
| diabllicseagull wrote:
| lost me at the unordered map
| oleganza wrote:
| The reason I use Rust is because I can bypass all this messy
| business altogether and have my sensible patterns wrapped in a
| usable syntax and enforced by the compiler out of the box.
|
| Whenever people say "just follow these rules" I read "just add
| this extra mental burden and do not slip up". Computers were
| invented to automate things. Rust automates ownership and
| borrowing rules. Suggestions like "do not forget to initialize
| unique_ptr with something" are not intelligent solutions.
| kubb wrote:
| it's not about making C++ memory safe, but about describing a
| safe subset of C++
| pjmlp wrote:
| Ideally we would have -fsafe and [[unsafe]], but it will take
| years for something like that.
| derefr wrote:
| Presuming syntax for "unsafe" that gracefully degrades in
| non-aware compilers, why couldn't a particular compiler start
| doing it right now, starting with a very trivial safety
         | checker that can be iteratively improved upon once the
| framework is in place?
| eslaught wrote:
| I feel like D has gone this route of incrementally adding
| features (like borrow checking) to the language that, in
| principle, improve safety.
|
| I wonder if anyone here has more experience to know how
| well it has worked?
|
| One massive advantage of Rust is that they started with
| borrow checking from the beginning. I think one thing that
| often gets understated in these discussions is how much it
| matters to have your entire ecosystem using a set of safe
| abstractions. This is a major drag for C++, and I suspect
| that even if the language went a route like D they'd still
| have gaping safety holes in practical, everyday usage.
| pjmlp wrote:
             | It still hasn't. That has unfortunately been a common
             | theme in D's evolution: chasing the next big idea that
             | will finally bring folks into D, while leaving the
             | previous ones half implemented and buggy.
|
| So now there is GC and @nogc, lifetimes but not quite,
| scoped pointers, scoped references,... while Phobos and
| ecosystem aren't in a state to fully work across all
| those variations.
| pjmlp wrote:
| You can have it today on Circle, but its relationship with
| some C++ folks is complicated.
| bluGill wrote:
| It is easy to say add unsafe. However the details are very
| complex. I've read a few of the papers proposing something
| like this, and they spend a lot of time discussing some
| nasty details that are important to get right.
| rdtsc wrote:
       | In Rule 3:
       |
       |     struct Ship { int fuel; };
       |     void print(Ship* ship) { cout << ship.fuel << endl; }
|
| Should that be "ship->fuel" instead?
| MagicMoonlight wrote:
| Deleting and re-adding each item from an array every time you use
| something seems like a massive pain
| winrid wrote:
| > "We'll instead take and return the vector directly"
|
| Won't this clone it?
| rbancroft wrote:
| Not necessarily, although it's a bit complicated to understand
| in C++.
|
| Starting with C++17, there is a feature called guaranteed copy
| elision that works for many/most scenarios that you would want.
| You need to read through the following resources to understand
| it fully:
|
| https://en.cppreference.com/w/cpp/language/copy_elision
| https://en.cppreference.com/w/cpp/language/value_category
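         |
         | To make it concrete, here is a minimal sketch of the pattern
         | the article describes, with hypothetical names: a named
         | parameter returned by value is not elided, but it is
         | implicitly moved, so no element-wise copy of the vector is
         | made (and a returned prvalue is elided outright since C++17).
         |
         |     #include <utility>
         |     #include <vector>
         |
         |     std::vector<int> addShip(std::vector<int> ships, int fuel) {
         |         ships.push_back(fuel);
         |         return ships;  // implicit move, not a deep copy
         |     }
         |
         |     int main() {
         |         std::vector<int> ships;
         |         ships = addShip(std::move(ships), 42);  // move in, move out
         |     }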
| spoiler wrote:
| > Not necessarily, although it's a bit complicated to
| understand in C++.
|
| One could say this statement applies to most lines of C++
| code. Lol
| masklinn wrote:
| Copy elision exists, the author might just assume (or know)
| it'll trigger. The rules are way too arcane for me so I could
| not tell.
| azakai wrote:
| There is also Type-After-Type:
|
| https://dl.acm.org/doi/10.1145/3274694.3274705
|
| (though maybe that's covered by what the author meant by
| "arenas").
| imtringued wrote:
| RIP all the modern languages that haven't made any improvements
| in memory management at all.
|
| There is so much low hanging fruit in programming language design
| and nobody is picking it up and instead everyone produces
| marginal improvements over existing languages.
| antonvs wrote:
| Because implementing a new language and getting it to wide
| adoption is an enormously challenging task, with a much lower
| success rate than e.g. SV startups.
|
| Languages that try to implement one new bright idea don't go
| anywhere, because that's not enough to cause people to switch.
| At best they serve as examples for feature adoption in other
| languages.
|
| Look at Rust for example: it seems to be succeeding and gaining
| adoption, but right now it's still relatively niche (check the
| number of Rust job postings), and it's taken 17 years to get to
| this point, with sponsorship from major organizations like
| Mozilla.
|
| Given this, the idea that there's much low-hanging fruit that's
| being ignored, that could easily be exploited, seems dubious.
| What's an example of what you have in mind?
| kbenson wrote:
| > it's taken 17 years to get to this point
|
           | Yes and no. Rust went through quite a few changes early
           | on, to the point that it's not really that similar a
           | language, and 1.0 was released in May 2015.
|
| That's still quite a while (8 years), but IMO doesn't quite
| mean the same thing as a language that's been around for 17
| years with a similar level of adoption. My impression (from
| the outside) is that Rust usage is still increasing, at least
| in specific areas, and has not leveled off or tapered. It
| doesn't seem to be exploding into lots of teams and places,
| but it does seem to be getting footholds still, like at
| Azure.
| tcmart14 wrote:
             | While that is true about Rust, most new languages are
             | gonna have the same thing. It'll be years before they
             | get to 1.0. Look at Zig, or just about any new language.
             | So I don't think it is valid to discount the pre-1.0
             | days, because all languages are gonna need a while to
             | get to the 1.0 day. It still took 17 years of time
             | investment to get Rust to where it is today.
| imachine1980_ wrote:
         | This is because programming languages have network effects,
         | and they are costly to move to and test in real-world cases.
         | You can use Pony, but good luck finding SDKs, databases,
         | performant compilers, and maintained libraries. The community
         | aspect of programming language ecosystems means that no
         | matter how great a language is, if it isn't popular you will
         | have a hard time being a developer in it. That's why most
         | languages that succeed start in a niche: great for scripting,
         | good for data analysis, great for concurrent programming
         | (Scala), and some of them, like Python, then scale while
         | others, like Scala or Julia, don't.
| pie_flavor wrote:
| > Borrow checking is incompatible with some useful patterns and
| optimizations (described later on), and its infectious
| constraints can have trouble coexisting with non-borrow-checked
| code.
|
| Not that this isn't true, but the rest of the article introduces
| a system with a superset of those limitations, gradually
| decreasing over time but never becoming a subset. In fact the
| pattern described in the article is a common pattern in Rust and
| I make use of it all the time; the library for making use of it
| is `slotmap`.
| [deleted]
| dxhdr wrote:
| > In fact the pattern described in the article is a common
| pattern in Rust and I make use of it all the time; the library
| for making use of it is `slotmap`.
|
| Slotmap uses unsafe everywhere, it's a memory usage pattern not
| supported by the borrow checker. It's basically hand-
| implementing use-after-free and double-free checks, which is
| what the borrow checker is supposed to do. Is that really a
| common pattern in Rust?
| dralley wrote:
| > Slotmap uses unsafe everywhere, it's a memory usage pattern
| not supported by the borrow checker. Is disabling the borrow
| checker really a common pattern in Rust?
|
| Wrapping "unsafe" code in a safe interface is a common
| pattern in Rust, yes. There is absolutely nothing wrong with
| using "unsafe" so long as you are diligent about checking
| invariants, and keep it contained as much as possible.
| Obviously the standard library uses some "unsafe" as well,
| for instance.
|
| "unsafe" just means "safe but the compiler cannot verify it".
|
| Unsafe does not disable the borrow checker, though. All of
| the restrictions of safe Rust still apply. All "unsafe" does
| is unlock the ability to use raw pointers and a few other
| constructs.
|
| https://doc.rust-lang.org/book/ch19-01-unsafe-
| rust.html#unsa...
| dxhdr wrote:
             | It's essentially a "user-space" memory allocator with its
             | own use-after-free and double-free checks, apparently
             | because the language implementation isn't adequate. If
             | anything it just reinforces the article's point that
             | "borrow checking is incompatible with some useful
             | patterns and optimizations."
| junon wrote:
| Eh? This is a wild take. How do you draw the conclusion
| the default implementation is inadequate?
| dymk wrote:
| Because something like slotmap has to use `unsafe` to get
| around the inadequacies of the borrow checker...
| burntsushi wrote:
| A downside for sure, but one that, at least in this
| specific example, has limited downsides. If you can
| button it up into a safe abstraction that you can share
| with others, then I don't really see what the huge
| problem is. The fact that you might need to write
| `unsafe` inside of a well optimized data structure isn't
| a weakness of Rust, it's the entire point: you use it to
| encapsulate an unsafe core within a safe interface. The
| standard library is full of these things.
|
| Now if you're trying to do something that you can't
| button up into a safe abstraction for others to use, then
| that's a different story.
| mr_00ff00 wrote:
| If unsafe means "safe but the compiler cannot verify" then
| I guess just consider .cpp to mean "safe but the compiler
| cannot verify" and we have suddenly made C++ memory safe
| ammar2 wrote:
| Sure but you're missing the
|
| > so long as you are diligent about checking invariants
|
| part. Could you go through and check all the parts of a
| huge C++ codebase to make sure invariants are held as
| opposed to a few hundred lines of unsafe Rust code?
| mr_00ff00 wrote:
| Sure, but I think the point here is the degree.
|
| Presumably if it takes a lot of unsafe rust lines to
| build something, it won't matter if it's 30% safe or
| whatever.
|
| I just see the point of "unsafe is fine" a lot when the
| whole point of rust is that memory safety issues are
| never worth the cost.
| ammar2 wrote:
| Right, I guess the question is what will that proportion
| be when Rust is used for things like operating systems
| and web browsers. 30% would be untenable but a few
| hundred/thousand lines of unsafe code is fairly easy to
| put under a microscope.
|
| For some current day research into this, there is the
| paper "How Do Programmers Use Unsafe Rust?"[1] which I'll
| drop a quote from here:
|
| > The majority of crates (76.4%) contain no unsafe
| features at all. Even in most crates that do contain
| unsafe blocks or functions, only a small fraction of the
| code is unsafe: for 92.3% of all crates, the unsafe
| statement ratio is at most 10%, i.e., up to 10% of the
| codebase consists of unsafe blocks and unsafe functions
|
| That paper is definitely worth reading and goes into why
| programmers use unsafe. e.g 5% of the crates at that time
| were using it to perform FFI.
|
| In writing "RUDRA: Finding Memory Safety Bugs in Rust at
| the Ecosystem Scale" [2], I recreated this data and year-
| by-year the % of crates using unsafe is going down. And
| for what it's worth, crates are probably a bad data-set
| for this. crates tend to be libraries which are exactly
| where we would expect to find unsafe code encapsulated to
| be used safely. There's also plenty of experimental and
| hobby crates. A large dataset of actual binaries would be
| way more interesting to look at.
|
| [1] https://dl.acm.org/doi/10.1145/3428204
|
| [2] https://taesoo.kim/pubs/2021/bae:rudra.pdf
| mr_00ff00 wrote:
| Ahh that is quite interesting, I'll check those links out
| jjnoakes wrote:
| Sure, and if a typical Rust program that I write has no
| unsafe in it directly, and 5% of its dependencies' code
| have unsafe in them, that's also the same as writing a
| program in the "not c++" language directly, and using
| "not c++" dependencies for all but 5% of the dependency
| code.
|
| Seems like a silly analogy to me, though.
| mr_00ff00 wrote:
| Right but it's that 5% the origin comment is talking
| about. The times when rust has to use unsafe for the type
| of program.
| Ygg2 wrote:
| It's not what unsafe means. Unsafe means this might cause
| UB for some invocations (accessing raw pointers, calling
| into another language, etc.). Safe means it will not
| cause UB for any invocations (it may panic or abort).
| mr_00ff00 wrote:
| I would really love a definitive answer on whether the borrow
| checker and rust's rules do really limit optimizations and
| such.
|
| It seems like I see this opinion often and every time there are
| tons of people on both sides who seem sure they are correct.
|
| What are the limitations for optimization? Does unsafe rust
| really force those?
| amelius wrote:
| Difficult to answer.
|
           | However, what you can say is that the borrow checker works
           | like a straitjacket for the programmer, making them less
           | able to focus on other things like performance issues,
           | high-level data leaks (e.g. a map that is filled with
           | values without eventually removing them), or high-level
           | safety issues.
| steveklabnik wrote:
| You can also say that the borrow checker works like a
| helpful editor, double checking your work, so that you can
| focus on the important details of performance issues,
| safety issues, and such, without needing to waste brain
| power on the low-level details.
| amelius wrote:
| This would be true if code using the borrow checker was
| easier to read than to write.
| SubjectToChange wrote:
| I think it's generally accepted that writing code is
| nearly universally easier than reading code, in any
| language. That aside, getting a mechanical check on
| memory safety for the price of some extra language
| verbosity is obviously worth it IMO.
|
| By the same token, it is common to see criticisms of the
| complexity of templates in C++, but templates are the
| cornerstone of "Modern C++" and many libraries could not
| exist without them.
| steveklabnik wrote:
| The point is that the compiler helps you "read" it. This
| takes mental effort off of you.
|
| I agree that not everyone thinks this is true, but this
| is my experience. I do not relate to the compiler as a
| straight jacket. I relate to it as a helpful assistant.
| jjnoakes wrote:
| This is my experience as well. I find it much easier to
| work faster when the compiler is helping me, and I don't
| consider it a "straitjacket" at all.
| bluGill wrote:
           | There can be no answer. Research is ongoing and smart
           | people are actively trying to make the optimizer better, so
           | even if I gave a 100% correct answer now (which would be
           | pages long), a new commit 1 minute later would change the
           | rules. Sometimes someone discovers that what we thought was
           | safe isn't safe in some obscure case, and so we are forced
           | to stop applying some optimization. Sometimes an
           | optimization is a compromise and we decide that using a
           | couple of extra CPU cycles is worth it because of some
           | other gain (a CPU cycle is often impossible to measure in
           | the real world, as things like caches tend to dominate
           | benchmarks, so you can make this compromise many times
           | before the total suddenly adds up to something you can
           | measure).
|
| The short answer for those who don't want details: it is
| unlikely you can measure a difference in real world code
| assuming good clean code with the right algorithm.
| verdagon wrote:
| I'd say it mostly applies to manual optimization, when we're
| restructuring our program.
|
| If the situation calls for a B-tree, the borrow checker loves
| that. If the situation calls for some sort of intrusive or
| self-referential data structure (like in
| https://lwn.net/Articles/907876/), then you might have to
| retreat to a different data structure which could incur more
| bounds checking, hasher costs, or expansion costs.
|
           | It's probably not worth worrying about most of the time,
           | unless you're in a _very_ performance-sensitive situation.
| apendleton wrote:
| Without directly answering your question, it's worth noting
| that there are also additional optimizations made available
| by Rust that are not easily accessible in C/C++ (mostly
| around stronger guarantees the Rust compiler is able to make
| about aliasing).
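           |
           | As a rough illustration (not from the article): with plain
           | pointers the compiler has to assume the two ranges may
           | alias, while the non-standard but widely supported
           | __restrict extension expresses roughly the no-aliasing
           | guarantee that Rust's &mut references carry by default.
           |
           |     // May alias: the optimizer must be conservative about
           |     // keeping loads from 'src' cached across stores to 'dst'.
           |     void scale_add(float* dst, const float* src, int n) {
           |         for (int i = 0; i < n; ++i) dst[i] += 2.0f * src[i];
           |     }
           |
           |     // Promises no overlap, which tends to vectorize better.
           |     void scale_add_restrict(float* __restrict dst,
           |                             const float* __restrict src, int n) {
           |         for (int i = 0; i < n; ++i) dst[i] += 2.0f * src[i];
           |     }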
| steveklabnik wrote:
| The question is far too broad, and contextual. You're never
| going to get an answer to that question.
|
| Sometimes, the rules add more optimization potential. (like
| how restrict technically exists in C but is on every (okay
| _almost every_ ) reference in Rust) Sometimes, the rules let
| you be more confident that a trickier and faster design will
| be maintainable over time, so even if it is possible without
| these rules, you may not be able to do that in practice.
| (Stylo)
|
| Sometimes, they may result in slower things. Maybe while you
           | _could_ use Rust's type system to help you with a design,
| it's too tough for you, or simply not worth the effort, so
| you make a copy instead of using a reference. Maybe the
| compiler isn't fantastic at compiling away an abstraction,
| and you end up with slower code than you otherwise would.
|
| And that's before you get into complexities like "I see
| Rc<RefCell<T>> all the time in Rust code" "that doesn't make
| sense, I never see that pattern in code".
| coliveira wrote:
| The reason why safety in C++ is difficult to achieve is due to
| the memory model used by C and C++. The memory model is a flat
| space provided by the OS that can be addressed by pointers. In
| this sense, C++ is similar to assembly code. A language like
| Java, on the other hand, assumes a different model where you can
| only access objects with well defined behavior. To change this,
| one needs to disallow the use of native pointers in C++ or make
| them less powerful, like Java did.
| [deleted]
| josefx wrote:
| > The memory model is a flat space provided by the OS that can
| be addressed by pointers
|
| From what I understand this is not true. Pointers cease to be
| valid the moment you try to leave a single allocation. You get
| to play around within a single continuous allocation and one
| past the end, everything further out is playing with fire.
|
| Even comparing the "addresses" of two separate allocations is
| undefined if done with "<" . The comparison function std::less
| is basically magic to get well defined behavior out of a
| language that doesn't guarantee it.
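         |
         | A small sketch of that corner (hypothetical code): '<' on
         | pointers into two unrelated objects has no portable meaning,
         | while std::less is required to provide a total order.
         |
         |     #include <functional>
         |     #include <iostream>
         |
         |     int main() {
         |         int a = 0;
         |         int b = 0;
         |         // '&a < &b' compares pointers to unrelated objects;
         |         // std::less<int*> must impose a strict total order,
         |         // so this line is well defined.
         |         std::cout << std::boolalpha
         |                   << std::less<int*>{}(&a, &b) << '\n';
         |     }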
|
| > C++ is similar to assembly code
|
| Only if you use a compiler that does not optimize anything.
| LoganDark wrote:
| > Pointers cease to be valid the moment you try to leave a
| single allocation.
|
| For the other readers who might not know what this is
| referring to, it's pointer provenance. For an introduction to
| the topic, I always recommend Ralf Jung's blog series,
| "Pointers Are Complicated":
|
| https://www.ralfj.de/blog/2018/07/24/pointers-and-bytes.html
| shadowgovt wrote:
| It is, for all practical purposes, a flat space in the sense
| that for bare pointers, operator++ is defined (increments to
| the next whatever, defined based on type of pointer).
|
| There is no operator++ equivalent in Java to apply to object
| references (unless you go unsafe); you can't immediately
| shoot yourself in the foot without the compiler noticing by
| asking for "the next object after this one" when no such
| thing exists.
|
| (handwave a bit: of course, you can ask for an object past
| the last object in any container. That's (a) not the same
| thing and (b) results in an immediate runtime error in Java,
| instead of undefined behavior)
| antonvs wrote:
| > everything further out is playing with fire.
|
           | That's the point. C and C++ don't prevent you from playing
           | with that fire. Memory-safe languages do.
| kimixa wrote:
         | One issue is that the memory model _isn't_ just a flat space
         | that can be addressed by any pointer value - it may look
| similar to one if your compiler and OS let you, but doing
| things like accessing memory allocated as a different type or
| outside (an array of) objects is invalid, and the compiler is
| perfectly allowed by the standard to assume that never happens
| and happily "optimize" everything that may be a result of that
| away.
|
| A lot of bugs have been caused by programmers assuming any
| access to the 'linear address space' is fine, but that has
         | never been reliable as it's not allowed by the standard. The
         | worst thing is when it looks like it works for a while, but
         | you're relying on stuff not allowed by the standard, so it may
| change at any time (like a compiler version or option change,
| or even a change to a different part of the code that happens
| to tickle the compiler's analysis stages a slightly different
         | way). See the "Time traveling NULL-check removal" - as the
         | compiler "knows" that no pointer can ever have the value of
         | NULL during a dereference, any path that does that can be
         | completely removed - even if there's something like a NULL
         | check and a logging output before said dereference. If the
         | compiler decides that the dereference will eventually happen
         | in that path unconditionally, then that path and the logging
         | _before_ the dereference Can Never Happen, so it can all be
         | removed.
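         |
         | A minimal sketch of that "time traveling" case (hypothetical
         | code, not from the article):
         |
         |     #include <cstdio>
         |
         |     int read_value(int* p) {
         |         if (p == nullptr) {
         |             std::printf("p was null\n");  // no early return...
         |         }
         |         return *p;  // ...so this dereference is reached on every
         |                     // path, letting the compiler assume p is
         |                     // non-null and drop the check and the log
         |     }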
|
| Or type punning and pointer aliasing - objects are created with
| a type, and so the compiler Knows if you convert a pointer type
| to another type that isn't compatible with the first type, they
| somehow magically point to different memory, and all the
| assumptions that implies for the following code.
|
         | A lot of these restrictions are pretty similar to what
         | languages like Java have - the difference is that the JVM
         | checks and flags
| violations and/or straight up disallows them when compiling -
| not just allowing the compiler to (silently) optimize based on
| those assumptions, and throwing the result at hardware to see
| what happens.
|
         | There may be a bit of platform/compiler-specific behavior
         | used to implement super low-level stuff like OSes, but that's
         | platform-specific stuff outside the C++ (or C) spec itself.
| tsimionescu wrote:
| It depends what you mean exactly. The C and C++ official memory
| model is very much not a flat space, but exactly what you
| describe for Java - you can only (validly) access objects. For
| example, the operation x < y is only defined if x and y are
| both pointers into the same object or array of objects (or one
| past the end of an array of objects). Otherwise, the operation
| is entirely undefined in both the C and the C++ memory models.
         | The following program has no defined C or C++ semantics, and
         | neither the C nor the C++ standards can tell you anything
         | about what it could do:
         |
         |     int x = 0; int y = 0;
         |     if(&x < &y) { printf("???"); }
|
| Now of course the implementation of C and C++ actually assumes
| without checking that you only access objects and not raw
| memory, and thus will happily read raw memory directly.
| leni536 wrote:
| The result of the pointer comparison is unspecified, this is
| not undefined behavior in C++.
|
| I don't know about C.
| shadowgovt wrote:
| I really feel like it's a hell of a definitions dodge to say
| "This is what the model is" when no compiler implements
| constraints to require the user to treat the model like that
| (i.e. I can always just increment the pointer, or typecast it
| to numeric type, do math on it, and typecast back to a
| pointer, without having to pull any big red levers like using
| "unsafe" methods).
|
| If it's undefined but it compiles to _something_ , is it
| _really_ undefined, or is the definition merely not
| standardized?
| [deleted]
| epcoa wrote:
| Yes it's _really_ undefined. There is a distinction from
| "implementation defined behavior" which you seem to be
| confusing it with. You are practically wrong in your
| assumptions. Since undefined behavior is undefined the
| compiler is free to do anything with compilation, it may
| compile to something but you have no guarantee what that
| something is. And in real life this often actually bites
| you when the optimizer comes into play - modern optimizing
| compilers can and do optimize undefined behavior into noops
| or other weird stuff.
|
| Read this and don't come back on this topic until you
| clearly understand it:
| https://en.cppreference.com/w/cpp/language/ub
| shadowgovt wrote:
| No; this is a common misconception I see from people who
| swallowed the "it's allowed to format your hard drive and
| blow up your monitor" dodge vs. the electrical engineers
| who know where terminology like 'undefined behavior'
| originated in engineering. In practice, it tends to do
| something _subtle and usually right but probably wrong_
| for the simple, practical reason that if it did anything
| as obviously wrong as "format your hard drive and blow
| up your monitor," _someone would have tripped over it
| testing the compiler and changed the compiler._
|
| This is why I actually hate using this programming
| language, because when you hit undefined behavior (which
| the language makes trivial to do; incrementing a pointer
| past the allocated memory is a one-line operation that
| throws no errors) the end-result is usually _subtle,
               | wrong, and hard to find later_ if it isn't actually
| "close enough to right" because the compiler desperately
| tries to make a useful program because that's what
| compilers are for. Hell, if it formatted my hard drive
| and blew up my monitor, it'd be much easier to figure out
| where the problem was! Hand-waving this flaw in the
| design of the programming tool with "oh, it's undefined
| behavior; you should never have relied on that in the
| first place" when so many valid statements in the
| language _compile to_ undefined behavior, as if that is
| _good enough,_ is building a house on sand.
|
| ... and quite frankly, our industry is full of sand
| houses and we could stand to respond to the amount of
| undefined behavior in C++ by ceasing to build on that
| shaky foundation.
| adamnemecek wrote:
| You just need an unsafe keyword.
| ajross wrote:
| That's pretty much what the article says though. "Don't use
| traditional pointers" is a fairly trivial rule to enforce via
| static analysis, and constructs like unique_ptr are
| syntactically identical anyway.
|
| The bit that has me confused is that it's inventing a new term,
| "borrowing affine style", to describe a longstanding paradigm
| that has traditionally been called "RAII". Now, neither term is
| very clear, but surely it's better to use the existing
| confusing jargon instead of inventing new terms.
| bluGill wrote:
           | Borrowing affine style is more than RAII. Borrowing affine
           | style means that there are no pointers and always exactly
           | one owner. In borrowing affine style your functions take a
           | unique_ptr for everything; if the data needs to live beyond
           | the function, then the function returns a unique_ptr to
           | that data back to the caller.
           |
           |     std::unique_ptr<foo> var;
           |     // init and use var
           |     var = SomeFunction(std::move(var));
           |     // use var again.
           |
           | Note that while inside SomeFunction you lose access to var,
           | since SomeFunction returns it again you don't really lose
           | anything. Of course SomeFunction can also return some other
           | unique_ptr<foo> that isn't var, and you can't control that.
           |
           | It is an interesting idea, though I'm not sure if I like it
           | for real world code or not.
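           |
           | A self-contained sketch of that shape, with a hypothetical
           | Foo and SomeFunction, just to show both sides of the move:
           |
           |     #include <iostream>
           |     #include <memory>
           |     #include <utility>
           |
           |     struct Foo { int value = 0; };
           |
           |     // Takes ownership, works on the object, hands it back.
           |     std::unique_ptr<Foo> SomeFunction(std::unique_ptr<Foo> foo) {
           |         foo->value += 1;
           |         return foo;
           |     }
           |
           |     int main() {
           |         auto var = std::make_unique<Foo>();
           |         var = SomeFunction(std::move(var));  // moved out and back
           |         std::cout << var->value << "\n";
           |     }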
| gpderetta wrote:
| The significant difference is a static guarantee of no reuse
| after move, hence the 'affine' qualifier (which is not new).
| gavinray wrote:
| See also:
|
| Thomas Neumann's current proposal for memory safe C++ using
| dependency tracking:
|
| - https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2023/p27...
|
| Google's proposal for memory safety using Rust-like lifetime
| analysis:
|
| - https://discourse.llvm.org/t/rfc-lifetime-annotations-for-c/...
|
| - https://github.com/google/crubit/tree/main/lifetime_analysis
| pjmlp wrote:
         | And Microsoft's work on the Visual C++ lifetime checker and
         | SAL, as well.
|
| It will never be perfect, but every little improvement helps.
| Voultapher wrote:
| > It will never be perfect, but every little improvement
| helps.
|
| Or it might convince people to stay longer on a plane with a
| provably [0] terrible safety record.
|
| [0] https://alexgaynor.net/2020/may/27/science-on-memory-
| unsafet...
| pjmlp wrote:
| To put matters into perspective, Rust reference
| implementations depend on C++ toolchains.
|
| Same applies to all major Ada, Java, .NET, Swift, Ocaml and
| Haskell implementations. And any GPGPU toolchain.
|
| Which kind of shows it isn't going anywhere and those
| planes have to be improved no matter what.
| SubjectToChange wrote:
| As an addendum, the same goes for many C toolchains.
| Anything requiring GCC 4.8 or later is depending on a C++
| compiler. And projects like LLVM's libc, Fuchsia's Zircon
| kernel, the bareflank hypervisor, etc, demonstrate that
| C++ really can be used anywhere C is used.
|
| C++ is the new C in the sense that it's the language
| everything else is built on and I expect it will be even
| _more difficult_ to displace than C. For instance, the
| complexity of C++ makes it next to impossible to
| incrementally rewrite in another language, simply writing
| a production quality C++ implementation is a gargantuan
| investment so a superset language is questionable, and
| the C++ community is committed to evolving and improving
| their language whereas C has largely ossified. Perhaps C
| will outlive everyone reading this thread, but C++ will
| outlive C.
| Voultapher wrote:
| I agree these planes are important and deserve care. At
| the same time pretty much all suggestions on how to
| meaningfully improve the safety of those planes boil down
| to successor languages Cpp2, Carbon etc. or require some
| other complex manual rewrite of components of said plane.
               | There is an argument to be made for having good out-of-
               | the-box interoperability; however, even some of the most
               | complex and important code-bases in existence, namely
               | the browsers Firefox and Chrome, have demonstrated that
               | you can do that part replacement in Rust. I'm not
| saying there is no other way. But these suggested and yet
| unproven improvements to C++ will not automatically make
| those planes safer. They will require replacing parts
| with new code, and if we are writing new code there is a
| serious question we should ask ourselves, building on
| what foundation do we want to improve those "planes".
| pjmlp wrote:
| Rust in Firefox is a very tiny portion of it and now they
| are using some WASM sandbox tricks, because they aren't
| going to rewrite everything in Rust, given the effort.
|
                 | Chrome has only now started to consider allowing
                 | Rust, and it is baby steps, not coming close to V8,
                 | the graphics engine and such.
| gavinray wrote:
| The second someone makes a successor language that
| seamlessly/directly interops with C++ _AND_ has the level
| of build/IDE tooling that C++/Rust have, I'm on board.
|
| The closest thing right now is Sean Baxter's "Circle"
| compiler in "Carbon" mode IMO:
|
| https://github.com/seanbaxter/circle/blob/master/new-
| circle/...
|
| Unfortunately, Circle is closed-source and there's no LSP
| or other tooling to make the authoring experience nice.
| pjmlp wrote:
                 | I also see Circle as the most promising C++ wannabe
                 | of all the contenders, and as for it being
                 | closed-source, once upon a time all major compilers
                 | were, so let's see.
| freeone3000 wrote:
| Rust has been bootstrapped for nearly a decade. The rust
| reference toolchain is built in rust.
| detaro wrote:
| if you pretend that LLVM and friends are not part of the
| toolchain
| pjmlp wrote:
| So no need for LLVM and GCC, Great news!
|
| Where can we download it?
| Voultapher wrote:
                   | Assuming you are serious, there is
                   | https://github.com/bytecodealliance/wasmtime/tree/main/crane...
                   | which is written in Rust and is targeted to become
                   | the default debug backend in rustc. LLVM has
                   | accumulated _a lot_ of optimizations contributed by
                   | various groups and people over more than a decade.
                   | It's hard to catch up to that by virtue of resource
                   | limits.
| mr_00ff00 wrote:
| Is there a reason to replace LLVM? Are there still memory
| bugs that are popping up and causing issues?
| steveklabnik wrote:
| https://github.com/bytecodealliance/wasmtime/blob/main/cr
| ane...
| pjmlp wrote:
| I was being sarcastic, when Cranelift becomes the
| official reference implementation then I shut up.
| [deleted]
| Voultapher wrote:
| > Tracing GC is the simplest model for the user, and helps with
| time management and development velocity, two very important
| aspects of software engineering.
|
| > Borrow checking is very fast, and helps avoid data races.
|
       | One thing many people seem to assume is that not having to care
       | about memory means you can program faster and get to your goal
       | faster, as the author here seems to do. However, as it turns
       | out, if your program is more complex than ~100-1000 lines of
       | code, explaining in an explicit way who owns what and who gets
       | to change state when is a very useful way to avoid bugs.
|
| Saoirse Shipwreckt aka withoutboats mentioned this a while ago in
| https://without.boats/hire-me/
|
| > Rust works because it enables users to write in an imperative
| programming style, which is the mainstream style of programming
| that most users are familiar with, while avoiding to an
| impressive degree the kinds of bugs that imperative programming
| is notorious for. As I said once, pure functional programming is
| an ingenious trick to show you can code without mutation, but
| Rust is an even cleverer trick to show you can just have
| mutation.
|
| and later follows up on this in
| https://without.boats/blog/revisiting-a-smaller-rust/
|
| > I still think this is Rust's "secret sauce" and it does mean
| what I said: the language would have to have ownership and
| borrowing. But what I've realized since is that there's a very
| important distinction between the cases in which users want these
| semantics and the cases where they largely get in the way. This
| distinction is between types which represent resources and types
| which represent data.
| 634636346 wrote:
| [flagged]
| cmrdporcupine wrote:
| Two things, full time Rust dev here:
|
         | a) Rust's borrow checker is good and its type system is good,
         | but IMHO it's not really doing what you say it is as well as
         | you're implying: _"explaining in an explicit way who owns
         | what"_. While ownership _is_ explicit and static (apart from
         | RefCell and friends), the description of that ownership is
         | scattered all over, program state flows are _not_ modelled in
         | the type system at all, and on the whole Rust is far from
         | being the kind of explicit "I can reason about the whole
         | program" declarative system with the kind of clarity you're
         | implying. Or maybe I'm taking your claims too strongly.
|
         | b) Rust's borrow checker is good. But it's not perfect and
         | fails to pass things that in fact should be legal borrows. In
         | particular there are edge cases around where things are
         | grabbed in if/let/else or matches, like this failure (from my
         | own code):
         |
         |     {
         |         let local_version = self.seek_local(tx);
         |         if local_version.is_some() {
         |             return match &local_version.unwrap().value {
         |                 Entry::Value(v) => Some(v), // reference to value
         |                 Entry::Tombstone => None,
         |             };
         |         }
         |     } // note that 'local' has gone out of scope here and so
         |       // self should not be borrowed
         |
         |     ... code later in the func complains 'self' is still borrowed,
         |
         | but the same thing done this way (but less efficiently)
         | passes:
         |
         |     if self.seek_local(tx).is_some() {
         |         let local_version = self.seek_local(tx).unwrap();
         |         return match &local_version.value {
         |             Entry::Value(v) => Some(v),
         |             Entry::Tombstone => None,
         |         };
         |     }
         |
         |     ... same other code that uses 'self' compiles fine
|
         | In neither case is 'local_version' being used outside of the
         | lexical scope, and 'self' cannot be borrowed in either case,
         | but the borrow checker is convinced in version #1 that it is,
         | and that code below that lexical scope cannot proceed because
         | 'self' is borrowed. They're basically logically equivalent in
         | terms of program flow and state management, but the second
         | passes while the first fails. Rust 1.7.0 stable.
|
| (Before you ask, I did have if/let to take apart local_version
| instead of using unwrap, and the compiler griped about that
| even more)
|
| Having the burden of how to fix that fall on the programmer
| sucks. This is all a step in the right direction, but I run
| into this kind of thing here and there and I shouldn't have to.
| ziml77 wrote:
| The limitations of the borrow checker when it comes to
| borrowing self are annoying. I've had cases where I just said
| "screw it" and copied the body of a function inline in the 1
| or 2 places it was being called just to make the borrow
| checker happy.
| liuliu wrote:
| I don't write Rust.
|
| But here is what you said and what the author said don't
| conflict with each other, and it has been on my mind for a
| while.
|
         | People who write similar code, or work on things for decades,
         | usually don't really think through what "sketch out some
         | code" looks like. They spend most of their time refactoring
         | things that have clear use-cases but not well-defined API
         | boundaries within a component, or between components. So
         | ownership, nullability checks, and data race checks all come
         | very naturally as a starting point.
         |
         | But there is another side of the world, where people are
         | constantly sketching something out, for things like creative
         | arts, high-level game logic, data analysis, machine learning,
         | etc. Putting yourself in that position, the syntax noise
         | actively gets in the way of this type of programming.
         | Ownership, even nullability checks, are not helpful if you
         | just want to have partial code running to check whether it
         | draws part of the graph. This is a world where Python excels,
         | and people constantly complain about why this piece of Python
         | code doesn't have type annotations.
         |
         | We may never be at peace between these two worlds, and this
         | manifests itself somewhat in the "two-language problem". But
         | that, to me, is what someone means by "development velocity
         | is faster".
| marcosdumay wrote:
| > Ownerships, even nullability checks are not helpful
|
           | Memory management does get in the way. But you are wrong
           | about algebraic data types; they will help you sketch
           | something.
           |
           | Ideally, if you don't know what you want, you will want
           | extendable [1] algebraic types, more like TypeScript than
           | Rust, but what you call a "nullability check" is a benefit
           | from the beginning.
           |
           | [1] Where you can say "here comes a record with those
           | columns" instead of "here comes this record". You _can_
           | write this in Rust, but it's easier to simply define
           | everything completely.
| convolvatron wrote:
| I really love parts of rust and kinda hate other parts.
|
| but this is what really ruins it for me. I want to play. I
| want to knock something together and work with it and see
| what kind of shape it is.
|
| rust demands that I cross every last t before I can run it at
| all. which is great if you already have a crystal notion of
| what you are building
| cmrdporcupine wrote:
| This is definitely true, but I also don't know what a
| reasonable alternative is at this point for systems dev
| (aka places where a GC is a Bad Idea). I wouldn't unleash C
| or C++ onto a new project like that? I'd just feel icky.
| And Zig's type system IMHO isn't good enough, I'd really
| miss pattern matching for one.
|
| I _do_ think many people are using Rust in the Wrong
| Places(tm). It seems like torture to me to be applying it
| for general application development (though because I
| basically now "think" in it, I can see I myself would be
| tempted to do so).
|
| And for things with complicated ownership graphs or nested
| interrelated data? It's just... no. Dear god, _Iterator_ in
| Rust is an ownership and type traits nightmare, let alone
| anything more complicated
|
| So I think people should just use a hybrid approach and
| keep Rust where it belongs down in the guts and use
| something higher level and garbage collected higher up.
|
| Here's another thing about Rust that's driving me batty: it
| is nominally positioned as a "systems" programming
| language, but key things that would make it more useful
| there are being neglected, while things that I would
| consider webdev/server programming aspects are being highly
| emphasized.
|
| Examples I would give that have driven _me_ nuts recently:
| allocator_api / pluggable per-object allocators ... stuck
| in nightly since _2016_ (!). Full set of SIMD intrinsics
| and broader SIMD support generally ... also stuck.
| const_generics_expr ... still not there.
|
| Meanwhile async this and async that and things more useful
| to the microservice crowd proliferate and prosper
| Yoric wrote:
| I think I agree with most of what you write, but note
| that async has lots of applications beyond microservices.
| In particular, writing anything that uses the network
| (e.g. a web browser), which definitely feels system-y to
| me.
| sroussey wrote:
             | This is the nice thing about TypeScript--you can type
             | what you want. As you iterate you can ramp your type
             | checking up or down. This is outside the realm of memory
             | management, of course.
             |
             | And new to JS/TS land is the separation of pure data
             | structures from resources. Something a sibling commenter
             | brought up.
| snek_case wrote:
| > rust demands that I cross every last t before I can run
| it at all.
|
| It's worse than that IMO. Rust makes it very
| awkward/impractical to have cyclic data structures, which
| are necessary to write a lot of useful programs. The Rust
| fans will quickly jump in and tell you that if you need
| cycles, your program is wrong and you're just not a good
             | enough programmer, but maybe it's just that the Rust borrow
| checker is too limited and primitive, and it really just
| gets in the way sometimes.
|
| Some of the restrictions of the Rust borrow checker and
| type system are arbitrary. They're there because Rust
| currently can't do better. They're not the gospel, they
| aren't necessarily inherent property that must always be
| satisfied for a program to be bug free. The Rust notion of
| safety is not an absolute. It's a compromise, and a really
| annoying, tiresome drain on motivation and productivity
| sometimes.
| nsajko wrote:
| > currently can't do better
|
| The limitations are an inherent consequence of basic
| tenets of Rust's design. Rust wouldn't be Rust anymore if
| you fixed them.
|
| > Some of the restrictions of the Rust borrow checker and
| type system are arbitrary. They're there because Rust
| currently can't do better. They're not the gospel, they
| aren't necessarily inherent property that must always be
| satisfied for a program to be bug free. The Rust notion
| of safety is not an absolute. It's a compromise, and a
| really annoying, tiresome drain on motivation and
| productivity sometimes.
|
| Yeah, but this actually seems consistent with the
| philosophy behind Rust: to take away the tools a
               | programmer needs for creativity, so they can't make
               | potentially costly mistakes, as is applicable to big teams
| in huge corporations. Another commenter in this thread
| put it nicely: the borrow checker is a straitjacket for
| the programmer.
|
| It's not meant to foster creativity, it's meant to be
| safe for big business and novice employees.
| Yoric wrote:
| > It's not meant to foster creativity, it's meant to be
| safe for big business and novice employees.
|
| Interestingly, my experience is the opposite.
|
| I find that the "straightjacket" is extremely precious
| during refactorings - in particular, the type of
| refactorings that I perform constantly when I'm
| prototyping.
|
| Compared to this, I'm currently writing Python code, and
| every time I attempt a refactoring, I waste considerable
| amounts of time before I can test the interesting new
| codepath, because I end up breaking hundreds of other
| codepaths that get in the way and I need to go through
| the testsuite (and pray that it contains a sufficient
                 | number of tests) hundreds of times until the code is kinda
| stable.
|
| Which is not to say that Rust matches every scenario. We
| agree that it doesn't, by design. But I don't think that
| the scenarios you sketch out are the best representation
| of what Rust can/should be used for and can't/shouldn't
| be used for.
| yipyip wrote:
| [dead]
| ordu wrote:
               | Cyclic data structures are implemented easily with
               | unsafe, like non-cyclical ones (Vec, for example). The
               | difficult part is to make a safe API for that. These
               | difficulties are not of a syntactic nature but design
               | difficulties. You need to think through your use cases
               | for such a struct and devise an API that supports them.
               |
               | This is more difficult than the C++ way of "just do
               | it". With C++ you will solve the same problems, but on
               | a case by case basis as they come into view. With Rust
               | you need to solve these problems upfront or do a lot of
               | refactoring later. There are upsides and downsides to
               | both approaches, but it is clear that Rust is not good
               | for sketching some code quickly to see how it will do.
               |
               | It is still possible to do it quickly with Rust in a
               | C++ way by leaking unsafety everywhere and passing raw
               | pointers, but I think it is still easier to do it with
               | C++, which was designed for this style of coding.
| jackmott42 wrote:
| I would never tell you that you are wrong to have cyclic
| data structures. But there are reasonable workarounds
| like using handles into an array to do it, which of
| course re-creates some of the same problems as pointers,
| but not the worst ones, and is often a positive for
| performance on modern hardware due to improved data
| locality.
|
| Or you can use reference counted types and take a small
| performance hit.
|
| Or use unsafe and git gud.
| jcranmer wrote:
| The basic model of Rust is to move use-after-free from a
| dynamic, runtime check to a static, compile-time check.
| But to keep the static checks from being Turing-complete,
| you need to prohibit arbitrary cycles while something
| like a tree (or other boundable recursion) is doable. So
| Rust not being able to check cyclic data structures isn't
| a "Rust currently can't do better" situation, it's a
| "Rust just can't do better" situation.
|
| What Rust's intended solution for that is that you add in
| data structures that do the dynamic checking for you in
| those cases. But the Rust library doesn't provide
| anything here that's useful (RefCell is the closest
| alternative, and that's pretty close to a this-is-never-
| what-you-want datatype), which means your options are
| either to use integers, roll your own with unsafe, or try
| hard to rewrite your code to not use cycles (which is
| usually a euphemism for use integers anyways). The
| problem here, I think, is that there is a missing data
| structure helper that can sit in between integers and
| references, namely something akin to handles (with a
| corresponding allocator that allows concurrent
| creation/deletion of elements).
| cmrdporcupine wrote:
                 | _missing data structure helper_ -- didn't you already
                 | just name-check that though, since that's basically
                 | RefCell... or if you're willing to roll the dice...
                 | UnsafeCell (aka "trust me I know what I'm doing")?
| jcranmer wrote:
| What you essentially want for the user to not write any
                   | unsafe code is this kind of interface:
                   |
                   |     trait Allocator {
                   |         fn allocate<'a, T>(&'a self, init: T) -> Handle<'a, T>;
                   |         fn deallocate<'a, T>(&'a self, handle: Handle<'a, T>);
                   |         fn read<T>(&self, handle: Handle<'_, T>) -> impl Deref<T>;
                   |         fn write<T>(&self, handle: Handle<'_, T>) -> impl DerefMut<T>;
                   |     }
|
| &'a RefCell<T> is pretty close to a definition of
| Handle<'a, T>, except that Rust provides no
| implementations of allocate and deallocate that take a
| const instead of a mut reference for self. Trying to make
| an allocator that lets you safely deallocate something
| requires a completely different implementation of
| Handle<'a, T> than what RefCell can provide, and even if
| you're fine without deallocation, allocation with a const
| ref still requires unsafe to get the lifetime parameter
| right.
| db48x wrote:
| Yea, different languages for different purposes. Rust is
| for finished products, not so much for experimentation.
| When you want to play or experiment you should use Lisp.
| adamc wrote:
| That makes it expensive to move from experimentation to
| "fairly usable", though.
| db48x wrote:
| Your Lisp program will be entirely usable once you have
| experimented and found the right way to do it. Lisp
| compilers are really good, and they support gradual
| typing: you can write your program with no explicit type
| information, and then speed it up by adding type
| information in the hot spots. You can deploy that to
| production and it will serve you well.
|
| At some point your Lisp program will be mature, you will
| have implemented most of the features you know you will
| need, and you will know that any new features you add in
| the future will not alter the architecture. Once you
| understand the problem and have established the best
| architecture for the program, you can consider rewriting
| it in Rust. Lisp's GC does have a run-time cost, and you
| can measure it to figure out how much money you will save
| by eliminating it. If you will save more money than the
| cost of the rewrite, then go for it. Otherwise you can go
| on to work on something more cost-effective.
|
| Note that you might not need to rewrite the whole
| program; it might be more effective to rewrite the most
| performance-critical portion in Rust, and then call it
| from your existing Lisp program. This can give you the
| best of both worlds.
| jjnoakes wrote:
| > rust demands that I cross every last t before I can run
| it at all. which is great if you already have a crystal
| notion of what you are building
|
| Maybe I'm a weirdo, but I don't find this to be the case
| for me.
|
| When I'm knocking things together in Rust I use a ton of
| unwrap() and todo!() and panic!() so I can figure out what
| I'm really doing and what shape it needs to have.
|
| And then when I have a design solidified, I can easily go
| in and finish the todo!() code, remove the panic!() and
| unwrap() and use proper error types, etc.
| IshKebab wrote:
| In my experience _even in those "sketching" areas_ static
| types and strict checking is the better trade-off.
|
| I think the real criteria for "will static types and stricter
| checks help?" is "how long will this thing last for?".
|
           | E.g. for a shell _REPL_ you definitely don't want to have
           | to write out types, but for a shell _script_ you definitely
           | do.
|
| Something like using MATLAB for exploratory research is
| probably another decent example. Or maybe hackathon games.
|
| But for most games, data analysis, machine learning etc. then
| being stricter pays for itself almost immediately.
| Karrot_Kream wrote:
| In your framing there's a sort of implicit _downplaying_ of
| the frequency of exploratory work and an implicit
| _promotion_ of stricter work.
|
| > Something like using MATLAB for exploratory research is
| probably another decent example. Or maybe hackathon games.
| But for _most_ games, data analysis, machine learning etc.
| then being stricter pays for itself almost immediately.
|
| (Emphasis mine)
|
| This is where the viewpoints differ. Some people spend a
| lot more time on the exploratory aspect of coding. Others
| prefer seeing a program or a system to completion. It
| largely depends on what you work on and where your
| preferences lie.
|
| Years ago I wrote a script that grabs a bunch of stuff from
| the HN API, does some aggregation and processing, and makes
| a visualization out of them. I wrote it because the idea
| hit me on a whim while intoxicated, and I wrote the whole
| thing while intoxicated. The script works and I still use
| it frequently. I haven't made any changes to it because it
| just does what it needs to. It has no types. It's written
| decently because I've been coding for a long time but I was
             | intoxicated when I wrote it. The important thing is _it's
             | still providing value_.
|
| There's a surprising amount of automation and glue code
| that doesn't need the correctness of a type system. I've
| written lots of stuff like this over the years that I use
| weekly, sometimes daily, that I've never had to revisit
| because they just work. I suspect it's a matter of personal
| preference how much time a person spends on that kind of
| work vs building out large, correct systems. I suspect
             | there's a long tail of quality-of-life tooling that is
             | simple and exploratory in nature, just as large, strict
             | systems are much bigger than most people expect at first
             | blush because of how many cases they handle.
|
| I think trying to say that one is more common than the
| other without anything approaching the rigor of at least a
| computing survey is really just to use your gut to make
| generalizations. Which is what the strict vs loose typing
| online debates really are. A popularity contest of what
| kind of software people like to write given the forum the
| question is being discussed on.
| verdagon wrote:
| I would love a language (or C++ subset!) where we could get the
| benefits of that secret sauce, while mitigating or avoiding
| some of its downsides.
|
| Like Boats said, the borrow checker works really well with
| data, but not so well with resources. I'd also opine that it
| works well with data transformation but struggles with
| abstraction (both the good and bad kinds), works well with
| tree-shaped data but struggles with programs where the data has
| more intra-relationships (like GUIs and more complex games),
| and works well for imposing/upholding constraints but can
| struggle with prototyping and iterating.
|
| These are a nice tradeoff already, but if we can design some
| paradigms that can harness the benefits without its particular
| struggles, that would be pretty stellar.
|
| One promising meta-direction is to find ways to compose
| borrowing with mutable aliasing. Some promising approaches off
| the top of my head:
|
| * Vale-style "region borrowing" [0] layered on top of a more
| flexible mutably-aliasing model, either involving single-
| threaded RC (like in Nim) or generational references (like in
| Vale).
|
| * Forty2 [1] or Verona [2] isolation, which let us choose
| between arenas and GC for isolated subgraphs. Combining that
| with some annotations could be a real home run. I think Cone
| [3] was going in this direction for a while.
|
| * Val's simplified borrowing (mutable value semantics [4])
| combined with some form of mutable aliasing (like in the
| article!).
|
| * Rust does this with its Rc/RefCell, though it doesn't compose
| with the borrow checker and RAII as well as it could, IMO.
|
| [0] https://verdagon.dev/blog/zero-cost-borrowing-regions-
| part-1... (am author)
|
| [1] http://forty2.is/
|
| [2] https://github.com/microsoft/verona
|
| [3] https://cone.jondgoodwin.com/
|
| [4] https://www.jot.fm/issues/issue_2022_02/article2.pdf
| latenightcoding wrote:
| "Rule 4: When you want a raw pointer as a field, use an index or
| an ID instead."
|
| literally just woke up but: wouldn't it be simpler to use a
| pointer to a pointer, or am I missing something
| [deleted]
| corysama wrote:
| You might like: "Handles are the better pointers (2018)
| (floooh.github.io)"
|
| https://news.ycombinator.com/item?id=36419739
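         |
         | A rough C++ sketch of that handle idea (hypothetical names,
         | no free-list reuse): each slot carries a generation, so a
         | stale handle is caught at lookup time instead of silently
         | dereferencing reused memory.
         |
         |     #include <cstdint>
         |     #include <optional>
         |     #include <vector>
         |
         |     struct Handle { uint32_t index; uint32_t generation; };
         |
         |     template <typename T>
         |     class SlotPool {
         |         struct Slot {
         |             uint32_t generation = 0;
         |             std::optional<T> value;
         |         };
         |         std::vector<Slot> slots_;
         |     public:
         |         Handle add(T value) {
         |             slots_.push_back({0, std::move(value)});
         |             return {uint32_t(slots_.size() - 1), 0};
         |         }
         |         T* get(Handle h) {  // nullptr for stale/bogus handles
         |             if (h.index >= slots_.size()) return nullptr;
         |             Slot& s = slots_[h.index];
         |             if (s.generation != h.generation || !s.value)
         |                 return nullptr;
         |             return &*s.value;
         |         }
         |         void remove(Handle h) {
         |             if (get(h)) {
         |                 slots_[h.index].value.reset();
         |                 ++slots_[h.index].generation;  // invalidate handles
         |             }
         |         }
         |     };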
| floor_ wrote:
| Use memory arenas and never think about any of this again.
| spacechild1 wrote:
| How do arenas prevent out-of-bound access, double free or stale
| pointers?
| estebank wrote:
           | Out-of-bounds access is avoided because you use handles
           | that the arena has given you, and creating an invalid
           | handle is restricted. You avoid double free because of
           | Rust's ownership semantics, which make the arena itself
           | responsible for "deallocation" (which is just blanking the
           | value and letting Drop do its thing). You avoid stale
           | pointers because every access is checked at runtime if
           | you're using a generational arena.
| spacechild1 wrote:
| We are talking about C++ ;-)
| shadowgovt wrote:
| Sadly, untrue. Source: I use memory arenas, and it's still
| pretty trivial to copy (instead of reference) an object onto a
| stack and then try to save a pointer to that object. All you
| need is to leave out one `&` and the compiler won't tell you
| anything went wrong: it'll cheerfully let you retain a pointer
| to a stack-based object that is going to die because explicit
| lifetime analysis isn't a part of the language spec.
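         |
         | For instance, something like this (a hypothetical sketch)
         | typically compiles without complaint:
         |
         |     #include <vector>
         |
         |     struct Obj { int id; };
         |
         |     Obj* g_found = nullptr;
         |
         |     void remember(const std::vector<Obj>& arena) {
         |         for (Obj o : arena) {      // missing '&': 'o' is a copy
         |             if (o.id == 42) {
         |                 g_found = &o;      // points at a stack object
         |             }                      // that dies each iteration
         |         }
         |     }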
| LoganDark wrote:
| Absolutely love to see CHERI mentioned here <3
| nraynaud wrote:
| very nice array of ideas to open the debate for us mere mortals.
| [deleted]
| pizlonator wrote:
| You could also just isoheap according to type, where the type is
| whatever you come up with to make C++ casts sound. It could
| literally be C++ types or something looser (like if you want to
       | say that bitcasting an int ptr to a float ptr is ok).
|
| Then you don't need any language changes to make UAF type safe.
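       |
       | A very rough sketch of the isoheap idea as I read it
       | (hypothetical code): memory freed for a T is only ever handed
       | out again as a T, so a dangling pointer can only alias another
       | object of the same type.
       |
       |     #include <new>
       |     #include <vector>
       |
       |     template <typename T>
       |     class IsoHeap {
       |         std::vector<void*> free_list_;  // never returned to the
       |                                         // general-purpose heap
       |     public:
       |         void* allocate() {
       |             if (!free_list_.empty()) {
       |                 void* p = free_list_.back();
       |                 free_list_.pop_back();
       |                 return p;
       |             }
       |             return ::operator new(sizeof(T));
       |         }
       |         void deallocate(void* p) { free_list_.push_back(p); }
       |     };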
| kbenson wrote:
| Methods such as these for C and C++ are interesting, and needed,
| but only solve a part of the problem.
|
| As others have noted before, they do little good because they're
| opt-in. I think there's a bit of nuance to that which needs to be
| explored though, as I think it's less a problem that the extra
| checks are opt in, and more a problem of how we use and
| categorize libraries.
|
| As long as we encourage dynamic and static library inclusion (and
| why wouldn't we, it's how we build upon the work of others),
| every language has a problem similar to how C and C++ are opt-in
| and you can't easily control the code you include or link. If you
| load openssl from Java or Rust or Go, you might have some benefit
| from a well defined API layer, but ultimately you are still
| beholden to the code openssl provides in their library.
|
| Just as one of the real benefits of Rust or Java or Go is not
| necessarily that the code is completely safe, but that weird
| unsafe behavior usually requires special escape hatches which are
| easier to audit, what we need are ways to categorize the code we
| include, no matter the language it comes from, with appropriate
| labels that denote how strong the safeguard guarantees it was
| compiled with are and of which type, so we can make easier and
| better informed decisions on what to include and how to audit it
| easily when we do.
|
| This applies to including something written in Rust as well. If
| someone is writing something in C++ and wants to include a
| library written in Rust, that it's written in Rust is only part
| of the picture. It's equally important to how often (as a total
| and as a percentage of code) the safety checks that language
| required (or that the developers opted into) where escaped in
| that library.
|
| If the choice is a Rust library with 95% of the code in unsafe
| blocks or a C++ library that opted into multiple different safety
| checker systems and has almost no escapes from those
| requirements, Rust is not providing any real safety benefits in
| that situation, is it? What we need is better information exposed
| at a higher level to developers about what they're opting into
| when they use third party code, because we can all control what
| safety mechanisms we use ourselves, so that's mostly a solved
| problem.
| jjnoakes wrote:
| I feel like a few languages are better than others in a related
| but not quite identical area:
|
| Languages like Java and Go, while they CAN escape to native
| libraries, have cultures that tend to avoid that kind of thing.
| At least, in my projects, I have quite an easy time using zero
| native dependencies with those languages (except for the
| underlying kernel of course), and so I feel like there is a
| much lower chance of escape-hatch issues sneaking in.
|
| They aren't built on a foundation of legacy C and C++ libraries
| - not even the crypto - and I find that to be an advantage.
| verdagon wrote:
| This is a great point, and one that doesn't get enough
| attention. The article talks about using a static analysis
| tool, but usage of that tool is indeed opt-in, like you say.
|
| I suspect a language could mitigate this with the ability to
| sandbox a library's code. That could be pretty slow though, but
| we could compile it to wasm and then use wasm2c to convert it
| back into native code. I wrote a bit about this idea in [0],
| but I'd love to see someone make this work for C++.
|
| [0] https://verdagon.dev/blog/fearless-ffi
| jackmott42 wrote:
| If you were starting a new project you could put lints in place
| to make these things enforced. But at some point you have all
| these lints and customizations in place, and you can't use old
| or 3rd party C++ code any more because of them, so you begin to
| ask, why not just use a new language where this stuff isn't
         | pasted together with glue and baling wire?
| kbenson wrote:
| My point is really not about the code you write yourself, but
| the code you need to include in your project. Rare is the
| professional programmer that always gets to finish their
| project using only code they wrote themselves, and for many
           | projects that's _highly inadvisable_ (don't roll your own
           | crypto unless you have a very good reason).
|
| So, given that at times we will have to use external
| libraries, and given that even very safe languages often have
| escape hatches meaning you can't be _sure_ the code of one
| language has more constraints than another, it would be great
| to have other indicators than the language it was written in
           | that indicate what safety checks it uses.
|
| If next year you're writing a new program in a language that
| hasn't even been invented as of now, and is viewed as safer
| than every language out today, what does that actually get
| you if one of your constraints is that you need to include
| and use openssl or one of a few forks for compatibility
| reasons? Wouldn't you rather be able to look at the available
| options and see that come opt into specific safety
| constraints, and have been good about not them circumventing
| them, and do so _extremely easily_? Network effects and
| existing known projects seem to have an inordinate amount of
| staying power, so we might as well deal with that as a fact.
|
| The world is a messy place, but the more information we have
| the better our chances of making order out of it, even if
| temporarily.
___________________________________________________________________
(page generated 2023-06-23 23:00 UTC)