[HN Gopher] Linus Torvalds on Rust support in kernel
___________________________________________________________________
Linus Torvalds on Rust support in kernel
Author : EvgeniyZh
Score : 569 points
Date : 2021-04-16 09:59 UTC (12 hours ago)
(HTM) web link (lkml.org)
(TXT) w3m dump (lkml.org)
| [deleted]
| bashinator wrote:
| This seems to be a non-issue? Torvalds has valid concerns, the
| patch submitter acknowledges those concerns and describes how
| they can and will be fixed before anything is merged.
| davidgerard wrote:
| It's interesting tech stuff that's relevant to HN, it certainly
| caught my interest. There doesn't have to be a conflict for
| something to be a good HN submission.
| diegocg wrote:
| We are lucky that this is a link to the mailing list, and you
| can read the answers. Tech "journalists" read all mails from
| Linus and try to create click-baity articles from them,
| completely ignoring the context.
| ludamad wrote:
| Linux Declares War On Rust Community
| qzw wrote:
| Don't forget to use the picture of Linus giving the middle
| finger along with that headline.
| [deleted]
| leugim wrote:
| Is this excitement in Linus-lang?
| 1_player wrote:
| More like cautious optimism on his part.
| teknopaul wrote:
| Coming from C it's a striking missing feature of Rust. I still
| find it unnatural, and most of the code I write is Java,
| where the "what if" is replaced by allocation of project time
| for gc tuning. Rust's "don't think about allocs" when
| everything you do is dominated by memory ownership is weird.
| Rust eradicates a whole class of memory related issues except
| "not enough". Panic is not necessarily safe/secure, that proc
| might be doing something important. Nofixilla take this
| approach too much, the assumption that no action is secure,
| what dies might be your burglar alarm. Fortunately Linus
| prioritises working. I feel like his input will help the Rust
| community.
| icy wrote:
| Hardly. He's reserving judgement.
| GordonS wrote:
| Hmm, I would have thought that Rust produces standard ELF
| binaries that will run on Linux as-is.
|
| Could someone ELI5 why the kernel needs changes to support Rust?
| pta2002 wrote:
| This is for writing kernel modules, which ideally need to be
| integrated in the kernel build system.
| ajb wrote:
| Because this is about allowing parts of the kernel _itself_ to
| be written in rust.
| [deleted]
| dijit wrote:
| Mostly this is surrounding (as others have mentioned) writing
| kernel components, such as drivers, in Rust.
|
| For context, what Linus says in the post is: "If rust cannot report an
| error without aborting when it runs out of memory, then we
| can't use it". I have never written any rust code which does
| not abort when it runs out of memory, but I have never written
| for hardware (only with userland in mind) and my knowledge
| isn't that great anyway, so I'm not sure if it's possible or
| not.
| GordonS wrote:
| Ah, I didn't realise it was about writing bits of the kernel
| _itself_ in Rust, now it makes sense, thanks!
| tgdn wrote:
| I've got "Body for this message unavailable"
| busymom0 wrote:
| Try this archived version:
|
| https://archive.is/VpcHT
| ilaksh wrote:
| They will probably solve the panic issue. However, it is strange
| to me that no one is explicitly mentioning the fundamental
| underlying tension here between C and Rust. Rust is a reaction to
| the lack of safety inherent in languages like C.
|
| Rust is an opportunity to really evolve operating systems
| forward. That's why projects like Redox OS have more promise in
| the long term for me than dragging the Linux community to Rust.
| [deleted]
| alexnewman wrote:
| How can he just be discovering this? Does he know Rust at all?
| zibzab wrote:
| Maybe he likes to get his facts right before presenting it to
| the world?
|
| Edit: according to other threads, this is part of an ongoing
| discussion on Rust feasibility for the kernel.
| teknopaul wrote:
| shock someone who doesn't know rust. :)
|
| n.b. He states he does not.
| gpm wrote:
| This also isn't a feature of rust, thankfully, just of the
| default libraries that the rust-kernel people are still using.
| coldtea wrote:
| Why would he "know rust"?
|
| Did it become a requirement for being a dev that he somehow
| missed?
| aetherspawn wrote:
| It's a little haphazard to say that Rust is a requirement for
| being a [kernel] dev, when it is, as of yet, not actually
| merged into the main kernel due to conceptual flaws. Nor is
| it actually a major component in any kernel that makes up >1%
| market share AFAIK.
| coldtea wrote:
| That's my point too...
| darthrupert wrote:
| Does he even HN?!
| jedimastert wrote:
| Probably not. Why would he?
| gigatexal wrote:
| Linus's proposal for Result<T, E> seems a lot like the Golang
| approach to things where you almost always return a value and an
| error.
|
| I'm not a rust developer and a very weak go dev so take the above
| as you will.
| coldtea wrote:
| The Result<T, E> is actually Rust's approach, which is a formal
| (better) version of Golang's approach.
|
| Better as in the result is standardized, and not ad-hoc like in
| Golang, and must be handled to use the value.
| pornel wrote:
| In Rust it's guaranteed that you never get both Ok and Err
| values at the same time (Result is a tagged union, not a
| tuple).
|
| In golang both are independent, and it's unclear whether you
| can or should use the ok value when err != nil. There are
| some interfaces that actually return both at the same time.
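|
| To make the tagged-union point concrete, here is a minimal Rust
| sketch. The names `parse_port` and `describe` are made up for
| illustration; only `Result`, `str::parse` and `ParseIntError` come
| from the standard library.
|
```rust
use std::num::ParseIntError;

/// Parses a decimal port number. The caller receives either `Ok(value)`
/// or `Err(error)`, never both and never neither.
fn parse_port(s: &str) -> Result<u16, ParseIntError> {
    s.trim().parse::<u16>()
}

/// Turns the outcome into text. The `match` has to cover both variants,
/// and the value is only reachable inside the `Ok` arm.
fn describe(s: &str) -> String {
    match parse_port(s) {
        Ok(port) => format!("port {}", port),
        Err(e) => format!("invalid port: {}", e),
    }
}
```
|
| Calling `describe("8080")` takes the `Ok` arm, while a string
| such as "abc" takes the `Err` arm; there is no state in which both
| a port and an error are available.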
| coldtea wrote:
| > _In Rust it's guaranteed that you never get both Ok and
| Err values at the same time (Result is a tagged union, not
| a tuple)._
|
| Yeah, that's the standardized part -- it's a tagged union.
| In Golang the second value might not even be an error, it's
| just a convention built on multiple return values...
| joseluisq wrote:
| Yeah, I see `Result<T, E>` in Rust as more solid than Go's
| deliberate error approach, as you already mentioned, and even more
| concise than `Exceptions` in other languages.
|
| BTW there are already interesting discussions around the
| topic if someone wants to have a look.
|
| https://news.ycombinator.com/item?id=25254737
| scoutt wrote:
| Aside from Linus' reaction, there are some really interesting
| pearls in that thread, for example:
|
| Regarding code style[1]:
|
| _> The more you make it look like (Kernel) C, the easier it is
| for us C people to actually read. My eyes have been reading C for
| almost 30 years by now, they have a lexer built in the optical
| nerve; reading something that looks vaguely like C but is
| definitely not C is an utterly painful experience.
|
| > You're asking to join us, not the other way around. I'm fine in
| a world without Rust._
|
| CoC was already brought into battle[2]:
|
| _> > I could be mistaken but you seem angry. Perhaps it wouldn't
| be a bad idea to read your own code of conduct, I don't think you
| need a browser for that either.
|
| > Welcome to LKML. CoC does not forbid human emotions just yet.
| Deal with it._
|
| These ([3] [4]) messages have an interesting perspective about
| maintenance.
|
| _> I'm sure about one thing, the C bugs we have today will be
| fixable in 20 years. I'm not even sure the Rust code we'll merge
| today will still be compilable in 10 years nor will support the
| relevant architectures available by then, and probably this code
| will have to be rewritten in C to become maintained again._
|
| Linus wants to see a real, working kernel driver instead of
| Android Binder[5]:
|
| _> Would there be some kind of real driver or something that
| people could use as a example of a real piece of code that
| actually does something meaningful?_
|
| [1] https://lkml.org/lkml/2021/4/16/118
|
| [2] https://lkml.org/lkml/2021/4/16/143
|
| [3] https://lkml.org/lkml/2021/4/16/181
|
| [4] https://lkml.org/lkml/2021/4/16/283
|
| [5] https://lkml.org/lkml/2021/4/14/1091
| finnthehuman wrote:
| >>> I could be mistaken but you seem angry. Perhaps it wouldn't
| be a bad idea to read your own code of conduct
|
| A passive-aggressive "you mad bro?" followed by namechecking
| the coc, all in service of doubling down on antagonizing
| someone over their choice in workflow? Good grief.
| Blikkentrekker wrote:
| "CoC" seems to suffer from the same problem that "bad cops"
| and "zero tolerance" politicians seem to suffer from, so
| that's why it's often brought up.
|
| It seems to be a common thing that those that demand the
| strictest morality rules also seem to have the most
| aggression problems and often overstep their own rules, but
| typically have an excuse ready why in their case it's
| different.
|
| Though, perhaps it simply stands out more if it's one to
| chant "code of conduct" or "zero tolerance", but as far as
| statistics in the Dutch parliament goes, it's often pointed
| out that all the parties that are in favor of lighter
| punishments and rehabilitation tend to be spot free, whereas
| politicians of parties that advocate harsh punishments and
| zero tolerance tend to very often have past criminal records
| themselves.
| hlpq wrote:
| Not sure why this is down-voted. At the very least the
| people who invoke the CoC are among the most pushy and
| dominant ones in projects, even when they manage to cloak
| the dominance so as not to be _perceived to be aggressive_.
|
| U.S. people are better at that game, since it is expected
| and rewarded in work life. This is also why they push for
| CoCs, which have nothing to do with manners or niceness,
| but are just another power tool.
|
| I have never seen a genuinely nice person push for a CoC.
| Not once.
| Blikkentrekker wrote:
| > _Not sure why this is down-voted._
|
| What I just considered due to your post is that it's
| quite likely that those who support C.o.C.s are also more
| likely to cast votes.
|
| I think it quite likely that those with a libertarian
| life philosophy are far less likely to cast votes on
| websites in general, especially to voice disagreement.
|
| I concur that those who want enforced niceness seldom are
| nice themselves and tend to often have their reasons and
| excuses of why they are justified when they are not so
| nice.
| jabedude wrote:
| The person he responded to had all but told him to go to hell
| in the previous email. Seems like a reasonable response to an
| unreasonable email
| bitwize wrote:
| The CoC is working as designed.
| bitwize wrote:
| Rust's strengths lie where Rust is written like Rust. If Linus
| wants C, he knows where to find it. Rust isn't for "C people"
| -- it's for their replacements.
| sanity31415 wrote:
| Winning friends and influencing people
| toyg wrote:
| From one of those posts:
|
| _> I don't see how the two languages might coexist peacefully
| without rust toolchain being necessary for building any kernel
| useful in practice and anyone seriously involved in kernel
| development having to be proficient in both languages._
|
| I can empathise with that. Just last week I butted heads with an
| issue in a python package that requires Rust internally. The
| lib compiles fine on its own, but something gets screwed when
| running in a virtualenv. Opened a bug in github, and nobody has
| any idea about how to get even _a detailed log_ out of the rust
| toolchain.
|
| I'm sympathetic about Rust, I really am. But sprinkling it
| mindlessly everywhere is a big risk.
| wycy wrote:
| > The more you make it look like (Kernel) C, the easier it is
| for us C people to actually read. My eyes have been reading C
| for almost 30 years by now, they have a lexer built in the
| optical nerve; reading something that looks vaguely like C but
| is definitely not C is an utterly painful experience.
|
| I think he makes a good point about the fact that it's
| certainly possible the Rust code written today won't still
| compile in 10 years, but writing Rust in C-style seems like a
| terrible approach. Write using the idioms of the language used.
| alfonsodev wrote:
| I agree, but if it helps creating more and better drivers, it
| might be a good thing considering the lifetime of the target
| hardware matches Rust lifetime.
| mcguire wrote:
| Personal pet peeve: new people who come into a project
| without any context and use their own idiosyncratic code
| style. :-)
| [deleted]
| Hackbraten wrote:
| > I think he makes a good point about the fact that it's
| certainly possible the Rust code written today won't still
| compile in 10 years
|
| It's possible but unlikely. The Editions feature [1] has been
| specifically designed to provide longevity.
|
| [1]: https://doc.rust-lang.org/edition-guide/editions/index.html
| HelloNurse wrote:
| Section 3.16 ("Platform and target support") of the linked
| document is so inadequate for Linux kernel related
| questions, such as "what compilation targets will be
| supported by this Rust Edition in 10 years", that there is
| nothing to quote to show it's inadequate. It doesn't even
| tell what compilation targets are supported _right now_.
| steveklabnik wrote:
| The canonical link for platform support is here:
| https://doc.rust-lang.org/stable/rustc/platform-support.html
| seodisparate wrote:
| Redox-OS has a similar rule that kernel code should never panic:
| https://gitlab.redox-os.org/redox-os/redox/blob/master/CONTR...
| No possible
| panics should ever exist in kernel space, because then the whole
| OS would just stop working.
|
| Note that Redox-OS is written completely in Rust.
| mixedCase wrote:
| I see in that document the mention of a libredox as a libstd
| replacement, I'm guessing that means they have their own
| primitives with constructors that handle allocation failures?
| nemetroid wrote:
| Previously discussed in
| https://news.ycombinator.com/item?id=26812047.
| [deleted]
| quietbritishjim wrote:
| Previous post about the RFC suggesting to add it is here [1].
| This is Linus's reaction to that RFC. The top comment there also
| links to this email from Linus, and there's some interesting
| discussion there.
|
| [1] https://news.ycombinator.com/item?id=26812047
| brunoluiz wrote:
| Every time something like "Linus Torvalds on ..." pops up, I get
| curious if it will be an interesting explanation or a full Linus
| rage TM. This post is defo the former though (perhaps he is not
| raging anymore hehe).
|
| If you never saw one of his rage moments, take a look at:
| https://lkml.org/lkml/2012/12/23/75 and
| https://lkml.org/lkml/2013/7/13/132
| chem83 wrote:
| There's enough rage from other members to cover Linus and then
| some, just follow the thread:
| https://lore.kernel.org/lkml/YHiMyE4E1ViDcVPi@hirez.programm...
| ambentzen wrote:
| The last half is basically his own opinions stated as truth
| from God, which, to me, makes him look like a complete tool.
| Personally I'm not a fan of parens around the conditional in
| ifs, but I don't go around spouting that it is the only true
| way.
|
| It basically makes me write off the first half, even if it is
| correct (I'm not a linux dev, so I don't know).
|
| Added: From the reply on lkml I'm not the only one.
| foperator wrote:
| It's a style that has been recognized as humorous for
| decades before the corporate CoC-wielding bureaucrats moved
| in. Much more refreshing than passive aggressive replies
| like this one:
|
| https://lore.kernel.org/lkml/YHkaaTQ2KQML2iqt@google.com/
|
| I think Rust is not a culture fit for the Linux kernel and
| its adoption will further undermine Linux.
| KptMarchewa wrote:
| Humorous? I don't see how anyone could think stuff like
| "I cannot view with less or vim. Therefore it looks not
| at all." is even remotely funny.
|
| Sounds just like "neckbeard yells at cloud" style to
| justify that the world does not do exactly what he wants.
| marsven_422 wrote:
| Those neckbeards make computing go around!
|
| The Rust pushing SJW CoC toting lib-bags add nothing to
| the universe.
| bennysomething wrote:
| Thought he decided he was gonna try to be polite? I prefer the
| rage version. Less corporate email style
| carlhjerpe wrote:
| I like the Linus that shit-talks NVIDIA, though I guess that's
| free NVIDIA marketing, which I really do not want them to
| have.
| noir_lord wrote:
| Both those posts are 8 or more years old.
|
| He has toned it down somewhat in recent years - still blunt
| but that's an admirable trait for a techie (note blunt !==
| rude).
| brunoluiz wrote:
| Indeed, people change after so many years. It is good to
| see that now he is providing insightful explanations,
| without raging or anything. After all, he is the creator of
| Linux and his thoughts about its directions are always
| super valuable.
| azernik wrote:
| IMO his rage posts usually still had great insightful
| points (cf "we do not break userspace"). The rage part
| was maybe unnecessary, but it wasn't the entire content
| of the emails.
| phekunde wrote:
| You have to understand that the Linux Foundation is using his
| "rage" as a PR exercise. He was outspoken from the start, even
| as a kid. His reply to Tanenbaum when he first released the
| Linux kernel is well known. But the Linux Foundation realised
| that they can capitalise on this "rage" by making it more
| visible. Any publicity is good publicity, as they say in PR
| world. Negative news makes more impact than positive news. How
| many people outside the BSD community know who the lead
| developers are for BSD OSes? How many of them are interviewed
| at TED talks[0]? It is all about publicity.
|
| [0]
| https://www.ted.com/talks/linus_torvalds_the_mind_behind_lin...
| foperator wrote:
| The Linux Foundation does nothing like that. Linus had been
| famous before its existence.
|
| On the contrary, they made an example of him and had him
| recant to further the new corporate substitute religion that
| is used as a worker suppression tool and that solidifies the
| power of idle bureaucrats.
| qalmakka wrote:
| You may argue that OpenBSD's founder Theo de Raadt is not a
| shy, calm person either, but that hasn't brought his project
| the same amount of mindshare and attention Linux has.
| loloquwowndueo wrote:
| I know about Theo :) ( though the rage pattern also applies
| there)
| phekunde wrote:
| I know about Theo as well even though I am not a user of
| any of the BSD distributions. And that is my point. I knew
| about him because there was a discussion about him similar
| to the discussions that happen about Linus's rage. "Squeaky
| wheel gets the oil" as they say! In this case oil is
| publicity.
| [deleted]
| de6u99er wrote:
| This was already shared yesterday. See top comment.
|
| https://news.ycombinator.com/item?id=26812047
| azernik wrote:
| From a later email in the thread:
|
|     > There's a philosophical point to be discussed here which
|     > you're skating right over! Should rust-in-the-linux-kernel
|     > provide the same memory allocation APIs as the
|     > rust-standard-library, or should it provide a Rusty API to
|     > the standard-linux-memory-allocation APIs?
|
|     Yeah, I think that the standard Rust API may simply not be
|     acceptable inside the kernel, if it has similar behavior to
|     the (completely broken) C++ "new" operator.
|
| Having done C++-in-the-kernel work, this is precisely right. That
| work required not only abandoning the C++ memory model, but also
| implementing a separate standard library that complied with
| kernel-space limitations/requirements.
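|
| As a rough illustration of what "a Rusty API to a C-style
| allocator" could look like, here is a minimal sketch. `RawBuf` and
| `OutOfMemory` are hypothetical names, not kernel or standard
| library types, and `std::alloc::alloc` merely stands in for a
| kernel allocator that returns a null pointer on failure.
|
```rust
use std::alloc::{alloc, dealloc, Layout};
use std::ptr::NonNull;

/// Hypothetical error type; real kernel code would more likely map
/// this to an errno-style value.
#[derive(Debug, PartialEq)]
struct OutOfMemory;

/// Hypothetical owning byte buffer backed by a raw allocation.
struct RawBuf {
    ptr: NonNull<u8>,
    layout: Layout,
}

impl RawBuf {
    /// Allocates `size` bytes and reports failure as `Err`, never a panic.
    fn try_new(size: usize) -> Result<RawBuf, OutOfMemory> {
        // `alloc` requires a non-zero size, so this sketch rejects zero.
        if size == 0 {
            return Err(OutOfMemory);
        }
        let layout = Layout::from_size_align(size, 1).map_err(|_| OutOfMemory)?;
        // SAFETY: `layout` has a non-zero size (checked above).
        let raw = unsafe { alloc(layout) };
        // A null pointer means the allocator failed; turn it into `Err`.
        NonNull::new(raw)
            .map(|ptr| RawBuf { ptr, layout })
            .ok_or(OutOfMemory)
    }

    /// Number of bytes owned by this buffer.
    fn len(&self) -> usize {
        self.layout.size()
    }
}

impl Drop for RawBuf {
    fn drop(&mut self) {
        // SAFETY: `ptr` came from `alloc` with exactly this `layout`.
        unsafe { dealloc(self.ptr.as_ptr(), self.layout) }
    }
}
```
|
| The point of the design is that the only way to obtain a buffer
| is through a constructor whose signature forces the caller to deal
| with the failure case.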
| dralley wrote:
| You're skipping over all the emails that explain that it's not
| a fundamental limitation, many of the fallible allocation APIs
| already exist and many others have been in planning for a long
| time.
| azernik wrote:
| I was referring to the C++ description as "precisely right".
|
| Though note from those emails that Rust-in-the-kernel will
| also require a rework of a lot of stdlib stuff, especially if
| they want to use the kernel allocator; it'll just be a LOT
| neater and more idiomatically-Rusty than the stuff I had to
| deal with, and will be much more compatible with off-the-
| shelf Rust libraries.
|
| And the developers of this Rust port repeatedly talk about
| removing panicking API calls from the library for kernel use,
| and adding extra non-panicking versions to the standard
| library. It's only that latter work that will likely make it
| into Rust upstream.
| ncmncm wrote:
| You can write bad code in any language.
|
| You can invent BS about any language.
|
| Linux kernel does not use glibc, why assume it would use the
| default generic application-level std lib for Rust, or for C++?
|
| FUD does not enlighten.
| KirillPanov wrote:
| Agreed. The whole panic situation is an awful mess that the Rust
| designers refuse to deal with. It's the sort of thing you put
| into a language when it's a toy, but Rust isn't a toy anymore and
| they refuse to do any of these:
|
| * remove it
|
| * make panics reliably "catch"-able 100% of the time like thrown
| exceptions (in languages with exceptions)
|
| * make it possible to statically check panic-safety, the way we
| check unsafe {}
|
| The dodge I always hear is "yeah but your code could always go
| off into an infinite loop, and we'll never be able to prevent or
| detect that". Okay, but _nobody uses "while(1){}" as a deliberate
| response to an error condition_. The Rust standard library is
| jammed full of code that will deliberately panic if tickled the
| wrong way.
|
| Look, I love Rust, but the denial around panic is just
| ridiculous. We need "might_panic" annotations, enforced exactly
| the way "unsafe" annotations are enforced, and the standard
| library needs to be updated to provide them. Bite the bullet.
| Linus is right. This should have happened years ago.
| salicideblock wrote:
| Hear hear!
|
| Here's to hoping that inclusion in the kernel will provide the
| motivation for addressing this in Rust.
|
| This is a point that may benefit from the new backing of Rust
| by the foundation - non-userspace was a side thing for Mozilla,
| but in Google and Amazon they have some core stuff depending on
| kernel.
| bitlevel wrote:
| Here's to hoping it goes nowhere near the kernel until the
| Rust developers address it first, not the other way round.
| littlestymaar wrote:
| This comment is strongly misinformed:
|
| 1- panicking allocations are here to stay, because in lots of
| cases, it's the most convenient behavior. BUT Rust is adding
| fallible allocation methods (prefixed with _try__) which
| return a result instead of panicking on allocation failure.
|
| 2- panics are catch-able as long as you don't compile your
| binary with the _panic=abort_ setting (and as long as you don't
| panic in your panic handler itself (including in your types'
| destructors, which are called during stack unwinding))
|
| 3- panics can only occur in specific places (array indexing,
| allocations, utf-8 validation, unwrap, etc.) which are by
| definition known at compile-time, and there's tooling to catch
| these up [1].
|
| In practice, a _might_panic_ annotation would add a lot of
| noise for pretty much everybody, because most of us mortals use
| panicking functions all day and it's not a big deal. Obviously
| it is critical for Linux, but because it's relevant only to the
| minority of Rust users, it doesn't make sense to include it in
| rustc itself: it's exactly the kind of situation where external
| tooling is the right option.
|
| [1] https://github.com/Technolution/rustig
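|
| A small sketch of points 2 and 3, assuming the default
| panic=unwind build. The helper names (`index_caught`,
| `index_checked`, `as_text`) are invented for this example;
| `catch_unwind`, `slice::get` and `str::from_utf8` are standard
| library calls.
|
```rust
use std::panic;

/// Indexing with `[]` panics when `i` is out of bounds. With unwinding
/// enabled, `catch_unwind` converts that panic into an `Err`.
fn index_caught(data: Vec<u8>, i: usize) -> Result<u8, ()> {
    panic::catch_unwind(move || data[i]).map_err(|_| ())
}

/// `get` reports an out-of-bounds index as `None`, so there is no
/// panic path to catch in the first place.
fn index_checked(data: &[u8], i: usize) -> Option<u8> {
    data.get(i).copied()
}

/// UTF-8 validation that hands back a `Result` instead of unwrapping.
fn as_text(bytes: &[u8]) -> Result<&str, std::str::Utf8Error> {
    std::str::from_utf8(bytes)
}
```
|
| The first helper only works when unwinding is available; the
| other two avoid the known panic sites altogether, which is closer
| to what panic-free code has to do.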
| KirillPanov wrote:
| This comment is strongly confused about typechecking.
|
| > [1] https://github.com/Technolution/rustig
|
| That's a binary analysis tool (for x86_64 only!).
|
| It is only approximate, and does not claim to be an accurate
| analysis like typechecks (such as the unsafe check) are:
|
| https://github.com/Technolution/rustig#limitations
|
| > All paths leading to panic! from one of those functions
| (whether actually used or not) will be reported.
|
| Crude binary analyses like these lack the critical
| _efficiency_ and _predictability_ properties of type checks:
|
| * Typechecks like the unsafe-check can be done on each module
| in isolation, using the code from that module and only the
| types (not the code) from any modules that it calls. This is
| the essential property that makes typechecking _efficient_
| enough to run on every compilation.
|
| * If each module separately passes the check, then a program
| using both of them passes the check. This is the essential
| property that makes typechecking _predictable_ enough to
| produce useful error messages. This binary analysis tool
| operates on the whole program. Turning optimizations on and
| off will cause the tool to switch from reporting "ok" to
| reporting false positives.
|
| Pretending that a tool like this is a replacement for a
| typechecker amounts to grossly negligent software
| engineering.
|
| Panics are an ugly leftover from the bad old days before Rust
| had nice monad-like "?" syntax for Result<T,E> error-
| handling. It's time for unrestricted panic to sunset. The
| Rust community has waited too long to do this, which is why
| it will be painful. Waiting longer will only make it even
| more painful.
|
| > panics are catch-able
|
| False. Unwinding panics runs drop handlers; if one of these
| drop handlers itself panics, Rust just sort of throws its
| hands up in the air and aborts the process no matter what:
|
| > Given that a panic! will call drop() as it unwinds, any
| panic! in a drop() implementation will likely abort.
|
| https://doc.rust-lang.org/std/ops/trait.Drop.html#panics
|
| The panic mess is full of nasty ill-defined edge cases like
| this. It's gotta go.
| TickleSteve wrote:
| It's quite common to use "while(true);" in the embedded world in
| order to trigger the hardware watchdog and cause a reset in a
| panic situation.
|
| It indicates that the only sensible option is to reset the
| processor to get back to a known-good state.
|
| This is pretty much how you handle unrecoverable errors in the
| embedded world where you must always take account of what
| happens in a reset condition for all situations and maintain
| 100% uptime because no one is going to reset it manually.
| KirillPanov wrote:
| No, it is not common at all.
|
| You simply jump to the watchdog interrupt handler, or execute
| the "reset the CPU" instruction.
|
| No need for such convoluted foolery with loops and timeouts
| to achieve exactly the same effect. Remember, embedded
| processors have no MMU -- there's no context switch when
| entering an interrupt handler.
| andy_ppp wrote:
| I'm not au fait enough with Rust, memory management, or systems
| code to understand the issue here. Does anyone have a good
| explanation?
| steveklabnik wrote:
| The short explanation is, this code wasn't in its final form
| yet, but good enough to ask for a high level review of the
| code, the review came back "hey this looks okay overall but I
| have some questions about <details>" and the reply was "great!
| <details> is the work we haven't done yet; we'll get on that."
| andy_ppp wrote:
| Ah I understand that bit! I don't understand why panicking if
| you run out of memory is a thing and or why it's especially
| bad in kernel code...
| steveklabnik wrote:
| Ah okay.
|
| Rust's panics are used as the "I cannot recover from this
| error" mechanism. Most applications don't really properly
| handle OOM, and so this kind of error falls into the "I
| want my current thread to just die, thanks" style of error
| handling, hence a panic.
|
| That's bad in the kernel because you don't want the kernel
| to die, you _do_ want to handle it and do something.
|
| The semi-ironic part here is that this behavior is the way
| that it is largely because of how the Linux userland works,
| where it's often tough to even tell that your system is
| in a near-OOM or OOM state. Which is why applications
| rarely handle it, even if they theoretically could.
|
| Now, Rust itself has good support for these environments; I
| work on an embedded kernel, and we do no allocations at
| all. But what the kernel wants is where Rust is currently
| weakest in support; we have good support for "no
| allocations" and "panic on OOM or return Result on OOM" but
| not great support for "return Result on OOM and do not
| panic on OOM", and reasonably, the kernel would like the
| behavior they don't want to be impossible. That's the work
| that needs to be done.
| utxaa wrote:
| > we have good support for "no allocations" and "panic on
| OOM or return Result on OOM"
|
| so if i understand it correctly, the ability is there by
| just "returning a result on oom" - linus is just asking
| for complete assurance that a panic will never happen?
| like say from a library?
| steveklabnik wrote:
| It is not 100% clear to me if he's asking for no panic in
| any possible situation. (AFAIK, the BUG macro in the
| kernel does something very similar, so I would find it
| _slightly_ surprising, though he may only want it to
| happen "explicitly" or something, we'll just have to
| see.)
|
| It is clear to me that he is asking for no panics for
| OOM.
| utxaa wrote:
| > It is clear to me that he is asking for no panics for
| OOM.
|
| agreed. which is why i'm confused. maybe this is a non-
| issue that has exploded into an issue :)
|
| thanks.
| zidoo wrote:
| I really miss old Linus - This email is way too long :). However,
| it is 100% on point.
| drummer wrote:
| This whole effort to bring Rust into the Linux kernel should be
| killed with fire asap before it's too late.
| j10c wrote:
| Agree, the kernel is responsible for managing all the low-level
| stuff that we mostly take for granted; it should not have an extra
| level of abstraction.
|
| I have read many mails from Linus where he specifically rants about
| the need to avoid kernel panics, and he specifically points out what
| not to do in those mails. Certainly worth reading to understand this
| mail.
| joseluisq wrote:
| Yeah, worth reading, even though I'm not a systems developer
| stricto sensu, just a Rust dev BTW.
|
| There are valuable pro/con standpoints about the Rust RFC for
| the kernel.
|
| And honestly I have learned many things. Hours well spent :).
| Recommended whether you like Rust or not.
| flazzarino wrote:
| The ultimate memory safety PL is rejected because it cannot
| malloc without blowing up?
| steveklabnik wrote:
| It has not yet been rejected. The review is vaguely positive.
| Like any pull request, there has been some feedback, which will
| now be addressed.
| The_rationalist wrote:
| Is there any discussion about Java/Kotlin kernel support through
| Graal Native? That would also have a great value proposition.
| KingOfCoders wrote:
| A Rust noob. But isn't allocation done with e.g.
| String::from("abc")? I wouldn't want to have an API there.
| TheCraiggers wrote:
| This is the first time I've ever seen this usage of NAK. I'm used
| to the ACK/NAK jargon, but this one is new to me. I can figure
| this one out by context I think, but I find the new usage
| interesting. Can anyone shed light on how exactly he's using it
| here, and the reasons why you think that?
| mixedCase wrote:
| As a synonym of "rejected as invalid", for that is the
| information conveyed by a NAK response.
| siscia wrote:
| The point is extremely valid.
|
| It has never been an issue in my use case of Rust, but the lack
| of an interface for when the system is out of memory is
| problematic.
|
| Not only in kernel, but in all system-level software.
|
| Then, it happens, less and less frequently, but still, I want to
| know and I want to be able to handle it.
| skohan wrote:
| Having this feature is one of the USP's of Zig isn't it?
| [deleted]
| dathinab wrote:
| > but the lack of an interface
|
| It's not lacking, the standard library just doesn't use it by
| default because handling OOM is a conceptual mess (yes,
| the OOM handling features C/C++ has are included in this).
|
| But even in the standard library you have methods (some not
| stable) like `try_reserve` which return an error if allocation
| fails.
|
| Anyway all allocation parts are part of the standard library
| (or the alloc library), i.e. they are not a core part of the
| language itself.
|
| I don't think anyone ever planned to use Rust's standard library
| as-it-is in the kernel (it's just not designed for this, e.g.
| see panics) or pull in external dependencies from
| cargo/crates.io without vendoring them.
|
| So it's _totally_ possible to run rust without alloc caused
| panics.
|
| Now besides that there is the question about panics outside of
| allocations. Like e.g. if code realizes it ran into a violated
| invariant, e.g. it knows we have a bug which isn't explicitly
| handled.
|
| And guess what the kernel already has a handler for it: `BUG()`
| so basically any panic will call `BUG()` (and we don't panic on
| memory allocations).
|
| Now maybe some tweaks to the panic system are necessary to make
| sure this all works well (panic=BUG(), is like panic=abort and
| on abort call BUG()!).
|
| Now wrt. the integer and float parts, there are two things.
| First, the kernel will likely be built without debug assertions
| like overflow checks (which would call BUG()!), so many of the
| panics are gone. But there are some special cases around floats
| and 128-bit integers on platforms which don't support them,
| where calling BUG() is inappropriate, but I have to look into
| this, tbh.
|
| In short:
|
| - panic on mem-alloc failure is a (lib)std/alloc thing, kernel
| code would have used something else anyway
|
| - panic in the kernel can basically be made into BUG()
|
| - issues around float and 128-bit integers should be fixable,
| at worst with a compiler flag. But embedded code is affected by
| it, too. So there is a good chance there is already a fix.
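|
| A minimal sketch of the `try_reserve` route mentioned above, in
| Rust. The helper name `zeroed_buffer` is made up for illustration;
| `Vec::try_reserve_exact` and `TryReserveError` come from std. On a
| Linux system with overcommit the reservation may still succeed
| without real memory behind it, so treat this as a sketch of the
| API shape only.

```rust
use std::collections::TryReserveError;

/// Hypothetical helper: builds a zero-filled buffer of `len` bytes,
/// reporting allocation failure as an error instead of aborting.
fn zeroed_buffer(len: usize) -> Result<Vec<u8>, TryReserveError> {
    let mut buf = Vec::new();
    // Ask for the memory up front; this is the only fallible step.
    buf.try_reserve_exact(len)?;
    // Capacity is already reserved, so this resize does not reallocate.
    buf.resize(len, 0);
    Ok(buf)
}
```

| A caller can then decide what to do with the `Err` case (reject
| the request, shrink the workload) rather than losing the process.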
| volta83 wrote:
| The problems are that:
|
| - most such interfaces do have a constant overhead that all
| programmers using the language have to pay.
|
| - some major operating systems (like Linux, ehem) make these
| interfaces useless for all their user space apps by enabling
| overcommit by default, which makes it hard/impossible to write
| portable code that can handle OOM
|
| Explicit OOM handling would mean that most users end up paying
| a relatively high upfront cost for something that in practice
| for them (e.g. if they are Linux programmers) delivers no
| value.
|
| For example, on Linux with overcommit (the default), even if
| the system is out of memory, malloc won't return null. It
| returns a pointer that's not null, such that your if(ptr ==
| nullptr) will act as if everything is "ok", but then, when you
| try to read/write that memory, that will trigger a hardware
| exception, that the kernel will catch, and then the kernel will
| tell the OOM-killer to "make space" by killing "some process",
| and maybe your app is just killed, or some other app that your
| app is working with is killed leading to a race condition,
| or... or....
|
| When your app is killed by the OOM killer, the signal that your
| app gets is "irrecoverable", which means that your app will
| die, you can at best try to do some cleanup before it does, but
| it will die nevertheless.
|
| So I find it extremely ironic for Linus to argue that practical
| programming languages are hard to use for environments that
| must handle OOM errors, when they are championing one of the
| major platforms that makes handling OOM useless by default.
|
| I keep saying this, but the obvious fix is for the Linux kernel
| to use overcommit internally just like they expect user space
| to do. When the kernel then runs out of memory, it should then
| start killing drivers at random to make some space. If they
| think that's such a great default behavior, they should commit
| to it. /s
| teknopaul wrote:
| You can't solve memory allocation problems by turning off
| overcommit and handling alloc errors in rust. You don't get
| Null, you get panic. That's a problem for the kernel, even if
| you argue it's not a problem elsewhere. Start killing drivers
| is nuts, oh look my screen went blank, and three HDs stopped
| working. Damn where is my swap.
| volta83 wrote:
| > Start killing drivers is nuts,
|
| I don't disagree, just saying that it's as nuts as killing
| random user-space programs on OOM.
|
| "The web browser misbehaved, let's kill the pulseaudio
| daemon, and if that doesn't fix it, let's kill WiFi, and if
| that doesn't fix it, let's kill vim, and... "
|
| FFS if some app tried to allocate too much memory just let
| the app know, or kill that app, but don't start randomly
| killing user-space processes.
| simiones wrote:
| While I agree that OOMKiller is pretty mad, it's also
| important to note that the shared nature of memory means
| that the app which will die when the system is out of
| memory is anyway random. Even without overcommit, you can
| get into the situation of "the web browser is occupying
| 99% of RAM to open YouTube, and now pulseaudio tried to
| allocate another 50 bytes that the system just doesn't
| have, so pulseaudio should handle this, and then the WiFi
| manager will be the next to need 100 bytes and be refused
| etc".
|
| Even worse, Linux by design stalls to a complete crawl
| when memory almost runs out, even with 0 swap and a fast
| disk, as it will first start swapping out code pages
| before giving up. Which means that every time a process
| is context switched, its code ends up first needing to be
| read from disk, the worst possible kind of cache
| thrashing imaginable.
| sai_c wrote:
| So, basically you are saying "Rust is mostly used as an
| application programming language anyways, so let's not pester
| those users with the overhead of OOM handling"?
|
| Fair enough. But this would mean that Rust is (factually) a
| systems language in the same sense that Go was initially
| declared to be a systems language.
|
| If I look at the Rust community (I give it a try from time to
| time), I would totally agree with your point, that most users
| would be pestered by this overhead. I see mostly CLI tools,
| "web apps" and the kind.
|
| Furthermore, I just can't shake off the feeling that the
| embedded (bare metal, 16 or 32 bit architectures) or (in-
| house) kernel crowd will always be 2nd class citizens.
|
| > I keep saying this, but the obvious fix is for the Linux
| > kernel to use overcommit internally just like they
| > expect user space to do.
|
| I'm not a kernel developer, but aren't you (maybe) asking for
| a bit too much? You ask the Linux kernel devs to change the
| kernel in a significant way, just so you can use Rust for
| writing drivers in a way you write user space applications.
| volta83 wrote:
| > So, basically you are saying "Rust is mostly used as an
| application programming language anyways, so let's not
| pester those users with the overhead of OOM handling"?
|
| No. What I am saying is that Rust intended to be a systems
| programming language that could handle OOM properly
| everywhere, but that got a lot of opposition because some
| operating systems, mainly Linux, make it impossible for
| programs to handle OOM _at all_, so adding proper OOM
| support to Rust would mean that every Rust Linux programmer
| would be paying for a feature (in ergonomics, etc.) that
| they cannot use _at all_.
|
| That's the irony.
|
| Linux design has made it not worth it for new programming
| languages to be designed to handle OOM properly, because
| that's impossible to do in OSes like Linux.
|
| People have been complaining about this for user space
| programs for the last 20 years, and the Linux kernel stance
| was "that's a feature, not a bug".
|
| But now that they are confronted with it, you see Linus
| writing stuff like "that's not acceptable".
|
| That's a huge double standard IMO.
| ssokolow wrote:
| I don't think you can use "double standard" as a
| derogatory term when you're comparing the needs of
| kernelspace code and userspace code.
|
| ...plus, they're already planning to write their own
| `alloc` replacement if for no other reason than that they need
| to support API features of the kernel allocator that are
| absent from the userspace allocator, like GFP flags:
|
| https://github.com/Rust-for-Linux/linux/issues/2#issuecommen...
| dathinab wrote:
| No it's about "by default" not pestering users.
|
| Rust has all the tools needed to gracefully handle memory
| allocation failure.
|
| Yes, the tools are a bit limited wrt. usage in the standard
| library, where you mainly can use them to handle "known to
| potentially fail big" allocations but not every single
| small allocation.
|
| But enter no_std and it's basically all your choice, which
| is also the default thing to use for embedded/bare-metal, and
| I count the kernel as embedded/bare-metal. (It's the default
| because the std library types are tuned for the most common
| use-cases, including web-server and user-space system
| programming, in which you can always have panic on OOM and an
| error kernel as a _much_ more ergonomic pattern than
| explicitly returning a result on everything which
| potentially allocates).
|
| Though without question the discussion around it is a
| mess.
| gjulianm wrote:
| > For example, on Linux with overcommit (the default), even
| if the system is out of memory, malloc won't return null. It
| returns a pointer that's not null, such that your if(ptr ==
| nullptr) will act as if everything is "ok",
|
| Not always. The default overcommit policy will reject certain
| allocations that are too big. Also, you can change the
| overcommit policy if you need it.
|
| Also, overcommit policy not rejecting allocations by default
| tends to be more useful than harmful. One example is forking
| processes: the memory space gets duplicated with copy-on-
| write, so if you didn't allow overcommit you could end up
| with processes that couldn't be forked due to that behavior.
| Not to mention that most programs don't use all the memory
| they have assigned, so without overcommit you'd have a lot of
| problems with RAM underusage.
|
| > So I find it extremely ironic for Linus to argue that
| practical programming languages are hard to use for
| environments that must handle OOM errors, when they are
| championing one of the major platforms that makes handling
| OOM useless by default.
|
| Kernel and user space programming are very, very different. A
| lot of developers use languages with garbage collection
| without issues, but that's not acceptable in kernel
| programming. The fact that the platform provides X feature
| does not mean it has to be developed including X feature. In
| this case it's very clear. Even if they used overcommit in
| kernel code, Linux does not always overcommit memory, so
| you'd still need to manage the OOM case in kernel properly.
| nabla9 wrote:
| You are talking about userspace. Linus is talking about
| kernel.
|
| You can change the Linux memory overcommit policy using
| sysctls if you need it.
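|
| For reference, a small Rust sketch of reading that policy. The
| three values (0, 1, 2) are the ones documented for the
| `vm.overcommit_memory` sysctl; the type and function names here
| are invented for illustration.

```rust
/// The three values documented for the `vm.overcommit_memory` sysctl.
#[derive(Debug, PartialEq)]
enum OvercommitMode {
    Heuristic, // 0: default, refuses only obviously excessive requests
    Always,    // 1: never refuse an allocation
    Never,     // 2: strict accounting against a commit limit
}

/// Parses the text found in /proc/sys/vm/overcommit_memory.
fn parse_overcommit_mode(raw: &str) -> Option<OvercommitMode> {
    match raw.trim() {
        "0" => Some(OvercommitMode::Heuristic),
        "1" => Some(OvercommitMode::Always),
        "2" => Some(OvercommitMode::Never),
        _ => None,
    }
}

/// Reads the current mode; `None` if the file is missing or unreadable
/// (for example on a non-Linux system).
fn current_overcommit_mode() -> Option<OvercommitMode> {
    let raw = std::fs::read_to_string("/proc/sys/vm/overcommit_memory").ok()?;
    parse_overcommit_mode(&raw)
}
```

| Switching to strict accounting is then a matter of
| `sysctl vm.overcommit_memory=2`, after which a failed allocation
| is reported to the caller instead of being deferred to first use.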
| teknopaul wrote:
| die on run out of memory is a bad policy for a userspace
| server. Seen some hairy production incidents with that,
| basically solution is piss off customers until they stop
| coming back.
|
| action on run out of memory should be finish up currently
| running work and hold off new stuff until you have
| resources, not die and start to reprocess the thundering
| herd. When you run out of memory last thing you want to do
| is empty your hot caches.
| nabla9 wrote:
| If that were important you would disable overcommit
| in Linux. If you didn't do that, it's probably because
| it's a good tradeoff that saves resources.
| bronson wrote:
| You're out of memory. Every possible cache has already
| been freed.
|
| Unless your app is faking its own caching, in which case
| it might be contributing to the problem.
| angry_octet wrote:
| It's quite possible that an app is holding onto data that
| can be flushed, at the expense of future database
| lookups, session negotiations, etc. Without a proper
| mechanism in the kernel it is hard to manage memory
| pressure.
| pornel wrote:
| The "but the OOM killer!" argument has been a disaster for
| Rust, and keeps derailing all design discussions.
|
| * Linux is not the only OS in the world.
|
| * Even on Linux you have containers/cgroups that can impose
| hard limits.
|
| * Platforms without virtual memory are also an important
| target for Rust.
|
| * On 32-bit platforms you can run out of address space before
| you run out of RAM.
|
| * Regardless of what the OS does, the application may still
| want to impose its own internal limit (e.g.
| https://lib.rs/cap) to avoid being OOM-killed or swap death.
|
| People keep telling me how Linux never runs out of memory,
| while I'm currently firefighting a torrent of coredumps
| caused by Rust's self-own on OOM that actually happens.
|
| I generally love Rust, but its OOM handling is awful, and the
| "but the OOM killer" nonsense is to blame for stalling the
| absolutely critical fixes it urgently needs.
| teknopaul wrote:
| word
| simias wrote:
| I agree that the OOM killer is frankly irrelevant in this
| discussion (and a bad idea in the first place IMO, cue the
| "airplane company deciding which passenger to throw out the
| plane" argument).
|
| But on the other hand in my experience with C and C++,
| languages which do allow explicit OOM handling, most
| applications either just crash when that happens (the
| `xmalloc` route) or attempt to recover but often really
| can't do much.
|
| It goes even beyond programming languages. Very few
| environments deal nicely with near-OOM conditions. My Linux
| desktop becomes effectively unusable if I start
| aggressively swapping. And technically at this point as
| long as I have swap I'm not literally out of memory.
|
| My general point is not that I don't think Rust would
| benefit from allowing the developer to explicitly handle
| OOM conditions, it's more that the vast majority of
| applications are not written to gracefully degrade in these
| conditions anyway, and if the only handling you do is
| effectively `alert("out of memory!"); exit(1);` it's really
| not worth bothering with it.
|
| On the other hand for the minority of applications that
| want to do something more meaningful on OOM you probably
| want to take the time to design some ergonomic system that
| won't litter the code with boilerplate, i.e. you probably
| don't want every single Vec and String and basically every
| single facet of the standard API that can allocate behind
| the scenes to return a `Result<_>`.
|
| I'm sure there's something to do, but I think the Rust devs
| are right not to rush it given that there are so many other
| features left to implement.
| ff317 wrote:
| > if the only handling you do is effectively `alert("out
| of memory!"); exit(1);` it's really not worth bothering
| with it.
|
| I'd argue that even this is worth it, for most software.
| At least then you get a clean exit and an obvious
| problem. The alternative is to not detect it and let the
| code walk off into undefined behaviors and potentially
| subtle bugs that can harm data and/or break security.
| loeg wrote:
| Yeah, explicit OOM exit beats random NULL deref any day.
| The former can be analyzed by an L2 tech out of a
| logfile. The latter requires spinning up GDB to figure
| out where NULL came from.
| drbawb wrote:
| In my experience the problem with the latter isn't even
| that it requires a skilled analyst & a debugger. The
| reality is that, in a multi-threaded program, by the time
| your process has crashed on a null dereference the
| offending call stack is probably long gone.
|
| The faster you fail the more readily apparent the root
| cause will be from your logs/core dump/etc. For this to
| work though reboots (of the application/environment) need
| to be inexpensive, and you also need supervision. That
| supervision can either be at the OS level, like Linux's
| systemd, Solaris' SMF, Apple's launchd, etc., or at the
| application level like Erlang/OTP.
| simias wrote:
| To be clear Rust will give you a clean exit in this
| situation (or at least, I assume that to be true, it
| would be a glaring issue if it didn't). On Linux however
| since malloc usually can't fail you won't ever see that
| since the kernel will realize the issue on access and not
| on alloc, then the OOM killer will play Russian roulette
| with your processes.
| varajelle wrote:
| The overcommit is not the only argument.
|
| The second argument is that complex programs that need to
| handle OOM properly will be full of complex untested error
| paths.
|
| How should one handle an OOM error if one can't allocate
| some temporary node during an algorithm and how to recover
| without leaving a corrupted state? That's not a question I
| want to answer for almost every function call.
|
| That's why the rust standard library data structures don't
| handle OOM. To make it easier to use for the most common
| case. And for the cases where the OOM handling is required,
| one can simply use data structures for it, which is what
| Linux will do (problem solved)
|
| Also compare that to the case of mutex poisoning which can
| return an error on each lock, and which is now considered
| a mistake because in practice, nobody wants to handle a
| poisoned mutex error.
| pornel wrote:
| I argue it's a rare-enough problem:
|
| * Statistically, likelihood of failure is proportional to
| the allocation size. You'll most often fail to allocate a
| very large Vec, and then only need to allocate a few
| bytes for a string that says "error!". I'm using
| https://lib.rs/fallible_collections and just a few
| strategically placed try_reserve has brought down my
| coredumps by 99%.
|
| * Panic on OOM would unwind, and unwinding tends to
| _free_ memory rather than allocate anything. The standard
| library should reserve/preallocate enough memory to be
| able to start unwinding.
|
| * Rust is good at avoiding temporary allocations and
| relies on the stack a lot. You don't have to use fancy
| error-handling libraries that collect backtrace on every
| error. Error handling with enums doesn't touch the heap
| at all.
|
| * Even if everything fails, and you get OOM during OOM
| handling, that's fair, and still better than abort every
| time.
|
| Note that Rust currently doesn't handle OOM by panicking.
| It unconditionally aborts the whole process. Libstd could
| switch to panics without API changes, and that would make
| OOM handling possible via catch_unwind.
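|
| To illustrate the `catch_unwind` idea, here is a hedged Rust
| sketch. Since libstd aborts on allocation failure today, the
| failure is simulated with an ordinary panic; `risky_task` and
| `run_isolated` are made-up names, and the approach only works
| with the default panic=unwind setting.

```rust
use std::panic;

/// Hypothetical stand-in for a task that panics when it cannot get
/// memory. The panic is simulated; no real allocation failure occurs.
fn risky_task(fail: bool) -> usize {
    if fail {
        panic!("simulated allocation failure");
    }
    42
}

/// Runs the task and turns a panic into `None`, so the caller can drop
/// this one unit of work and keep the rest of the process alive.
fn run_isolated(fail: bool) -> Option<usize> {
    panic::catch_unwind(move || risky_task(fail)).ok()
}
```

| The default panic hook still prints the message to stderr; the
| point is only that the process survives and can report an error.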
| varajelle wrote:
| As an example of the problem that the API would have if it
| wanted to handle OOM: Box, String or Vec, the most basic
| types in the rust alloc crate, couldn't implement Clone,
| because clone() can't fail, and you need to allocate
| memory to clone these data structures. As a result, lots
| of generic algorithms that rely on their type being
| clonable wouldn't work, and most user types could not
| #[derive(Clone)] anymore.
| astrobe_ wrote:
| So, if I get this right, this means that the "just don't
| use the alloc crate" argument given somewhere else here
| is not practical ?
| pixel_fcker wrote:
| No not at all.
| astrobe_ wrote:
| Because ...? I mean, from the perspective of an outsider
| reading that you "can't use strings" if there's a risk of
| allocation failure is a bit worrying.
| volta83 wrote:
| How's that a problem?
|
| Provide `TryClone`, there, problem solved.
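|
| A sketch of what such a trait could look like in Rust. `TryClone`
| is not part of std; the trait and its `Vec` impl below are an
| assumption about the shape being proposed, built on the existing
| `try_reserve_exact`.

```rust
use std::collections::TryReserveError;

/// Hypothetical trait, not part of std: a `Clone` that can report failure.
trait TryClone: Sized {
    fn try_clone(&self) -> Result<Self, TryReserveError>;
}

impl<T: Clone> TryClone for Vec<T> {
    fn try_clone(&self) -> Result<Self, TryReserveError> {
        let mut out = Vec::new();
        // The buffer allocation is the fallible step.
        out.try_reserve_exact(self.len())?;
        // Capacity is reserved, so the Vec itself will not reallocate here.
        // Caveat: `T::clone` may still allocate (and abort) internally.
        out.extend_from_slice(self);
        Ok(out)
    }
}
```

| The caveat in the comment is the generic-code problem raised
| above: element types would need a fallible clone as well for
| this to be airtight.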
| loeg wrote:
| I agree, it doesn't seem like these types can actually
| implement Clone if allocation can fail.
| silon42 wrote:
| It would be really helpful if the OOM killer were adjustable
| to kill non-overcommitting programs last (kill
| any with overcommit first). Then programs could be slowly
| fixed/improved.
| dathinab wrote:
| > "but the OOM killer!" [...] disaster for Rust [...]
|
| Only on HN discussions (the disaster part) ;=)
|
| (Though it is a broken discussion, OOM on Linux is
| broken, but it becomes increasingly "un-"broken and like
| you said other platforms exist, too).
|
| Anyway there is no disaster as:
|
| Rust _does support_ handling memory allocations gracefully,
| the allocator API defaults to this!!
|
| Just various types defined in [lib]std (or, to be precise,
| [lib]alloc) default in their default methods to calling the
| (replaceable!) allocation error hook.
|
| This is due to ergonomics, there is no sane/ergonomic way
| for "common non embedded/kernel programming or non special
| purpose use-cases" to make literally every method which
| potentially allocates return a result. And _outside of such
| special purpose cases you can always recover from panics_
| (i.e. outside of special purpose cases panic=abort is an
| _anti-pattern_).
|
| Still, even in std, methods like `try_reserve` exist and
| are used, e.g. by serde deserializers to make sure to
| gracefully fail if the user tries to allocate a 10GiB array
| (because compressed data formats ;-) ).
|
| BUT for the kernel all this doesn't matter as it is very
| unlikely it will use [lib]std or [lib]alloc. So they can
| default to just always implement their types in a way which
| doesn't panic on memory allocation.
|
| And wrt. panics in other cases (basically guaranteed to be
| bugs), panic=BUG() (the kernel BUG() macro) can be
| done.
|
| I'm more worried about the float/128bit integer thing he
| mentioned, as I don't know anything about it. I assume it's
| not just a misunderstanding of overflow checks and other
| debug assertions the kernel likely will disable for non-
| debug builds?
| littlestymaar wrote:
| > I'm more worried about the float/128bit integer thing
| he mentioned, as I don't know anything about it. I assume
| it's not just a misunderstanding of overflow checks and
| other debug assertions the kernel likely will disable for
| non-debug builds?
|
| Idk about 128bits integer, but the floating point
| question has been discussed on Rust's subreddit[1]:
|
| > Normally, the kernel leaves the floating-point state of
| the CPU in whatever state the userspace process left it
| in, so that you can do a quick system call without having
| to save and restore all that state. It's possible to use
| floating point in the kernel with a great deal of care,
| but you have to notify the kernel that you're doing so,
| so that it can save the userspace floating-point state.
|
| > So, it'd be helpful if by default Rust didn't allow use
| of floating-point, and then you could opt in to allowing
| it for specific code.
|
| And from Linus[2]
|
| > In other words: it's still very much a special case,
| and if the question was "can I just use FP in the kernel"
| then the answer is still a resounding NO, since other
| architectures may not support it AT ALL.
|
| [1]: https://www.reddit.com/r/rust/comments/mqxr1a/rfc_rust_suppo...
|
| [2]: (from the same reddit thread)
| https://ipfs.io/ipfs/QmdA5WkDNALetBn4iFeSepHjdLGJdxPBwZyY47i...
| dathinab wrote:
| Well, if that is all, this could be solved by Rust
| having a "floating_point_usage" (or similar) lint which
| normally defaults to "allow" but for kernel devs defaults
| to "error" (which could be done through an option in the
| target spec, which as a side note is also what defines if
| you have hard, soft or no fp support). (There are probably
| better solutions but this one should be the easiest.)
| ssokolow wrote:
| There's actually a lint specifically intended for that
| already. `float_arithmetic`:
|
| https://rust-lang.github.io/rust-clippy/master/#float_arithm...
|
| # What it does
|
| Checks for float arithmetic.
|
| # Why is this bad
|
| For some embedded systems or kernel development, it can
| be useful to rule out floating-point numbers.
| dathinab wrote:
| Thanks, nice, but for the kernel use case it would need
| to be part of rustc.
| zaarn wrote:
| Simply install clippy and call "clippy-driver" instead of
| "rustc", which will handle this. Clippy is largely the
| lints that haven't either made it into rustc or are
| fairly edge-case-y.
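|
| A short Rust sketch of how that lint might be applied. The lint
| name `clippy::float_arithmetic` is the one linked above; the
| `fixed_point` module and its functions are invented to show an
| integer-only alternative. Plain rustc accepts the attribute but
| only Clippy actually enforces it.

```rust
// Under Clippy, any float arithmetic inside this module becomes a hard
// error. Plain rustc accepts the attribute but does not run the check.
#[deny(clippy::float_arithmetic)]
mod fixed_point {
    /// Converts a whole number to Q16.16 fixed point.
    pub fn from_int(n: i32) -> i32 {
        n << 16
    }

    /// Multiplies two Q16.16 fixed-point numbers using integers only.
    pub fn mul_q16(a: i32, b: i32) -> i32 {
        // Widen to 64 bits so the intermediate product cannot overflow.
        ((a as i64 * b as i64) >> 16) as i32
    }
}
```

| Adding something like `a as f32 * 0.5` inside the module would
| then be rejected when the code is checked with `cargo clippy`.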
| [deleted]
| prussian wrote:
| It honestly isn't.
|
| Out of memory means, your system is simply not designed for the
| task at hand. The kernel returning -ENOMEM only masks the fact
| that eventually Linux will have to OOM Panic if it can't OOM
| Kill. Hell imagine the swapping and the I/O spike because your
| VFS cache has been or is currently being purged. I honestly
| think the best case is to just fail when a fundamental resource
| is simply not there.
| [deleted]
| dTal wrote:
| Is it really too much to ask, in 2021, that our computers be
| capable of saying "I'm sorry Dave, I can't do that", instead
| of "Halt and Catch Fire"?
| magicalhippo wrote:
| > I honestly think the best case is to just fail when a
| fundamental resource is simply not there.
|
| Indeed. I'm using Firefox in a VM. It's a pain, because
| invariably at some point Firefox uses enough memory that the
| VM starts to swap. And then the whole thing grinds to a halt,
| and I usually just end up with the VM equivalent of power-
| off.
|
| Instead I'd be perfectly fine with the kernel telling Firefox
| "computer says no" when it tries to malloc, _before the
| system runs out of memory_ , and then Firefox can do whatever
| it wants with that.
|
| Yes I know there's some cgroups magic or whatever I can do,
| but man, why does it have to be so painful?
| sfink wrote:
| Firefox is a bit of an interesting case.
|
| We have essentially 3 categories of allocations: (1) those
| that are large and/or the size is user-controlled, which
| are (mostly) handled; (2) most of those that happen within
| the JS engine, which are handled but we're constantly
| debating whether it's worth the cost; and (3) all the rest,
| which includes all other allocations outside the JS engine
| as well as the ones within the JS engine that are expected
| to be rare and are too hard to handle in any sensible way.
| For (3), we crash on OOM.
|
| So if you _do_ use cgroups or ulimit or whatever, Firefox
| may do something reasonable with OOMs. Or it may not,
| depending on what code sees the OOM. It's still an open
| question how often the JS engine handling an OOM is
| worthwhile (as in, it won't just continue to OOM until it
| finds something that will choose to crash.) OOM telemetry
| is a little iffy, so I don't trust statistics based on it.
|
| The cost of (2) is not just code size and code complexity.
| It's also a larger vulnerability surface that is rarely
| exercised. Within the JS engine, we have ways to synthesize
| OOM events to at least get _some_ level of testing. (We'll
| run a chunk of code repeatedly, OOMing on the 1st, 2nd,
| 3rd, ... allocation, and make sure we either handle it
| properly or do a controlled crash.)
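|
| The "fail the 1st, 2nd, 3rd, ... allocation" technique can be
| sketched in Rust roughly like this. It is not Firefox's actual
| harness; `FailAfter`, `build_record` and `sweep` are hypothetical
| names for a toy allocator with a countdown and a function under
| test.

```rust
use std::cell::Cell;

#[derive(Debug, PartialEq)]
struct Oom;

/// Hypothetical test allocator: succeeds `budget` times, then fails.
struct FailAfter {
    budget: Cell<usize>,
}

impl FailAfter {
    fn new(budget: usize) -> Self {
        FailAfter { budget: Cell::new(budget) }
    }

    fn alloc(&self, len: usize) -> Result<Vec<u8>, Oom> {
        let left = self.budget.get();
        if left == 0 {
            return Err(Oom);
        }
        self.budget.set(left - 1);
        Ok(vec![0; len])
    }
}

/// Code under test: needs three allocations and must propagate failure.
fn build_record(a: &FailAfter) -> Result<usize, Oom> {
    let header = a.alloc(8)?;
    let body = a.alloc(64)?;
    let footer = a.alloc(4)?;
    Ok(header.len() + body.len() + footer.len())
}

/// Fails the 1st, 2nd, 3rd, ... allocation in turn until the operation
/// succeeds; returns how many allocations a successful run needed.
fn sweep() -> usize {
    let mut n = 0;
    loop {
        match build_record(&FailAfter::new(n)) {
            Ok(_) => return n,
            Err(Oom) => n += 1,
        }
    }
}
```

| Each iteration exercises one more error path; a real harness
| would also check invariants after every injected failure.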
| magicalhippo wrote:
| Personally I'd be fine with crashing a tab or five.
|
| For example Slack uses several gigabytes of memory if I
| forget to reload the tab every hour or so. It has a DOM
| live-leak or something. Perfectly fine to let that thing
| crash and burn.
|
| Letting my PC grind to a halt is far worse for me.
|
| Now my point is, I don't want the kernel to try to fluff
| Firefox and try to limp it along. Sure for a few
| applications it's good that the kernel tries its utmost.
|
| But in most cases, I don't want one application to
| dictate my PC's performance. Gone are the days where I
| use my PC for one thing at a time.
|
| And if the kernel was more strict with applications, then
| hopefully sane OOM handling would force its way into more
| applications.
| lxgr wrote:
| > Out of memory means, your system is simply not designed for
| the task at hand.
|
| Speaking as somebody who's never built such a system,
| wouldn't maintaining some opportunistic cache (e.g. for some
| space-time-tradeoff) be a valid use case of asking for more
| memory and gracefully degrading in case it's not available?
|
| Error handling would simply consist of not expanding the
| cache size (if it happens during an allocation related to the
| cache) or freeing up some cache memory (if it happens for an
| essential allocation).
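|
| As a rough Rust sketch of that degradation strategy (all names
| here are invented; only `Vec::try_reserve` is a real std call):
| a failed cache allocation skips caching, and the cache can be
| dropped when something essential needs the memory.

```rust
/// Hypothetical opportunistic cache: a miss costs time, never correctness.
struct ScratchCache {
    bytes: Vec<u8>,
}

impl ScratchCache {
    fn new() -> Self {
        ScratchCache { bytes: Vec::new() }
    }

    /// Tries to cache `data`; returns false (cache unchanged) if the
    /// extra memory cannot be reserved.
    fn try_store(&mut self, data: &[u8]) -> bool {
        if self.bytes.try_reserve(data.len()).is_err() {
            return false; // degrade: simply skip caching
        }
        self.bytes.extend_from_slice(data);
        true
    }

    /// Tries to reserve `extra` spare bytes, e.g. ahead of a bulk fill.
    fn try_grow(&mut self, extra: usize) -> bool {
        self.bytes.try_reserve(extra).is_ok()
    }

    /// Gives the memory back when an essential allocation elsewhere failed.
    fn shed(&mut self) {
        self.bytes = Vec::new();
    }
}
```

| Whether the freed memory actually helps the failing allocation
| depends on the allocator and the OS, so this is a best-effort
| pattern rather than a guarantee.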
| marcosdumay wrote:
| You can free opportunistic caches, undo some calculations
| (and optionally put a descriptor of them into the disk),
| freeze some internal services into disk, not start some
| large task (and keep the small but time critical ones
| running)... There are all sorts of things you could do on
| out of memory errors.
|
| But in practice programs ask the users to decide those
| things, and just fail if the user's choice is invalid. It's
| very rare that some program makes those decisions by
| itself, and it's usually frowned upon, because each single
| program can not assume that it owns the entire system.
| prussian wrote:
| I mean, it sounds like you're describing overcommit to me.
| Ask whatever large amount you want. Maybe even be like
| webkit and use overcommit for heap isolation. It works out
| great for the userspace case, until the limits are actually
| reached and you still have a failure problem, one probably
| harder to deal with than without overcommit.
| cies wrote:
| Zig (another language fit for low level programming, like C
| and Rust) begs to differ.
|
| [1]: https://ziglang.org/learn/why_zig_rust_d_cpp/#no-hidden-allo...
| littlestymaar wrote:
| I love how the section about Rust links to a github issue
| that is more than five years old, when the fallible
| allocation story has evolved so much in the past year.
| cies wrote:
| So how has it? I was not aware it has changed (also I
| don't need the changes, as I use Rust for more high-level
| stuff)
| littlestymaar wrote:
| There are new methods returning a _Result_ instead of
| panicking, slowly being implemented for all allocating
| collections in the standard library.
|
| see https://github.com/rust-lang/rust/issues/48043 and
| https://github.com/rust-lang/rust/pull/80310
| cies wrote:
| Like what was mentioned in the linked thread, the "try_*"
| functions.
|
| This is what Zig has as default behavior.
| prussian wrote:
| What does this have to do with my comment? If you're out of
| memory, how can zig know you can just continue on? What if
| your memory is held in tasks that are effectively dead-
| locked because a dependent task is incapable of allocating?
| There are many things that can be happening once memory is
| effectively maxed out. The more common outcome near the limit
| is higher I/O while the system crawls.
|
| I'm sure Zig is great, but I don't see from what you linked
| how that changes what I said.
| patrec wrote:
| > Out of memory means, your system is simply not designed for
| the task at hand.
|
| You seem not very familiar with the memory "management"
| behavior of Linux and the wonderful ecosystem it has
| engendered. For example no matter how much physical memory
| you have, Chrome for example will just crash all the time if
| you turn off Linux's insane "lie about memory allocation
| succeeding" default.
| CJefferson wrote:
| I agree Rust should have it, but it is VERY hard to do
| correctly.
|
| I've worked on some systems which claimed they were dealing
| with it, but when I purposefully pushed it (by making a malloc
| which would occasionally fail), I quickly uncovered dozens of
| bugs, and the solution was to stop pretending we could
| sensibly handle low-memory situations. SQLite famously does
| handle this situation correctly, but it is a huge amount of
| work.
| Fronzie wrote:
| There are counter examples. For example, data processing as
| part of a measurement application: If allocating the data
| fails, the application should abort that, but keep running to
| allow further control and let the user reduce e.g. the
| sampling size.
|
| I know of one application with a worldwide customer base that
| supports this.
| darthrupert wrote:
| Check out Zig. That language will work much better as a
| kernel/driver language because it is simple by design.
| zlynx wrote:
| Yes, if allocation failure is not tested, it is not likely to
| work correctly. SQLite is also famous for its strict and
| complete test sets.
| kzrdude wrote:
| Rust std::alloc has the same standard interface as everything
| else: return null on failure to allocate. It's just that the
| development in std collections to use this is in progress. And
| it's being retrofitted to collections that assume they can just
| abort or panic on alloc failure.
| steveklabnik wrote:
| (Some of) Those collections live in std::alloc as well, which
| is the issue here.
| mcguire wrote:
| Linus' comments are surprising to me.
|
| First, I would have thought that Rust is far too much like C++
| for him. :-)
|
| Second, he seems remarkably calm about the very idea of a
| compiler inserting memory allocations in kernel code. He's only
| talking about panics. It's been a long time since I was involved
| in any kernel code, but invisible allocations would have set me
| (and any kernel programmers I knew) off worse than the worst rant
| anyone has ever seen from Linus. _Any_ kind of invisible things
| (destructors, anyone?) would.
| paavohtl wrote:
| Rust doesn't have invisible allocations. The compiler doesn't
| even know what an allocation is.
| gpm wrote:
| > but invisible allocations would have set me (and any kernel
| programmers I knew) off worse than the worst rant anyone has
| ever seen from Linus
|
| Reading tea leaves here, but I think he's calm about this
| because he's pretty sure it's not happening (Which it isn't)
| flakiness wrote:
| We should pay more attention to this rust-on-linux project than
| this specific thread. It's a treasure trove. The first email of
| the thread is good introduction:
| https://lkml.org/lkml/2021/4/14/1023
|
| The code: https://github.com/Rust-for-
| Linux/linux/tree/rust/rust/kerne...
|
| Very cool!
| Communitivity wrote:
| What I would love to see is Rust in the kernel, with a caveat.
| That caveat is something akin to Erlang's OTP supervision trees.
| Erlang has a similar philosophy to Rust in that if something
| erroneous happens the process fails. The difference is that
| Erlang OTP is designed to have supervisor processes of the worker
| processes. The only thing the supervisors do is monitor the
| processes under them and control when, if, and how those
| processes are restarted on failure or other conditions. The
| supervisors themselves have supervisors, right up to the root
| supervisor for each Erlang application.
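|
| A minimal sketch of the supervision idea in plain Rust, assuming
| OS threads stand in for Erlang processes and a panic stands in for
| a process crash. The `supervise` function and its restart limit
| are hypothetical illustrations, not part of OTP or any existing
| crate, and it covers a single worker rather than a tree:

```rust
use std::thread;

/// Runs `worker` on its own thread and restarts it when it panics.
/// Returns Ok(restarts_used) once a run finishes normally, or
/// Err(max_restarts) when the restart budget is exhausted.
fn supervise<F>(worker: F, max_restarts: u32) -> Result<u32, u32>
where
    F: Fn(u32) + Clone + Send + 'static,
{
    for attempt in 0..=max_restarts {
        let w = worker.clone();
        // `join` returns Err if the worker thread panicked.
        let outcome = thread::spawn(move || w(attempt)).join();
        if outcome.is_ok() {
            return Ok(attempt);
        }
        // Otherwise fall through and restart the worker.
    }
    Err(max_restarts) // give up, like a supervisor hitting its threshold
}

fn main() {
    // This worker crashes on its first two runs and then succeeds.
    let result = supervise(
        |attempt| {
            if attempt < 2 {
                panic!("worker crashed on attempt {attempt}");
            }
        },
        5,
    );
    println!("supervisor result: {result:?}");
}
```

| A real supervisor would also need restart strategies and a way to
| stop workers from outside, which this sketch leaves out.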
|
| Rust got the language and the tooling perfect, but Erlang got the
| services and service infrastructure perfect. The more I think
| about it the more I think I should shut up and put up. In other
| words, apply my knowledge of Erlang and OTP to create a
| gen_server in Rust as a jumping off point for an OTP-like Rust
| framework, perhaps called OARS (Open Advanced Rust Services).
| This is definitely bigger than one person. If you'd like to join
| me on this journey then reply to this comment and I'll send you
| project details by the end of the weekend.
| ta988 wrote:
| I don't know enough Rust and Erlang for helping with that, but
| that definitely sounds great. What would you use for IPC?
| tylerscott wrote:
| I love the idea of OARS and would like to help any way I can!
| steveklabnik wrote:
| Doing this well requires a heavy enough runtime that it would
| disqualify it for this kind of work. Not on a technical level,
| but on a social one. While there have been operating systems
| created with this sort of runtime, they're not as well known or
| successful as ones that haven't. I would imagine this email
| would be a flat "no" if this were a core part of Rust, sadly.
| acomjean wrote:
| From what I remember the Linux kernel doesn't even use
| std.lib. It's pretty straight up C with no dependencies (the
| os being low level). Makes kernel module programming
| difficult.
| SubjectToChange wrote:
| The Linux kernel is freestanding. But that's hardly the
| reason why kernel programming is difficult.
| staticassertion wrote:
| > a heavy enough runtime
|
| Hey, asking out of pure ignorance - Why do you think this? My
| naive point of view is that the Linux kernel already has
| enough of a runtime to support such a thing - a scheduler,
| kernel threads, and interrupts.
|
| An actor system shouldn't be particularly heavy - you can
| probably implement an actor in just a couple of bytes.
|
| Agreed that it couldn't be a core part of Rust though.
|
| Maybe it's too off topic or whatever, just curious.
| steveklabnik wrote:
| Erlang makes heavy use of green threads to do this kind of
| work. You spin up thousands or hundreds of thousands of
| these. Kernel threads are too heavy weight to do so. And
| making them lighter weight has tradeoffs too. Erlang makes
| use of a GC and immutability to make tasks more restart-
| able; maybe Rust's memory safety features would let you do
| this sorta kinda, but I don't think it's been really
| demonstrated fully yet.
|
| Like sure, you could build out some of these features, but
| they only truly work really well if you extremely commit to
| the architecture, in my opinion. And the kernel isn't about
| to do that.
| staticassertion wrote:
| Oh yeah, I 100% do not believe this would be something
| that ships to the kernel :)
| gpm wrote:
| I don't really know much about erlang, but I think this may be
| along the lines of what you are thinking of:
| https://github.com/bastion-rs/bastion
|
| (I also don't really think the linux kernel people would be
| interested...)
| nahuel0x wrote:
| You want a microkernel.
| scrubs wrote:
| I don't see how one can run and hide from Linus' point.
| Exception/panic based work is problematic in libraries, and has
| no place in kernels. Erlang's OTP supervision trees are, as
| others have pointed out, a runtime issue that app-devs build
| on. Therefore it's an abstraction that's above the kernel and
| out of scope w.r.t. kernel work.
| Communitivity wrote:
| This is a fair and valid point. On reflection, what I am
| looking for is not something that would be in the kernel.
| It's more something that is an added layer on Rust, quite
| possibly with compiler support, to provide for OTP style
| services and supervisors.
| fortran77 wrote:
| Why not write an entire OS in a message-passing VM based
| language with garbage collection? No crashes, ever!
|
| In fact, you can run Erlang directly on raw metal, or write an
| OS in Erlang: http://www.erlang-
| factory.com/static/upload/media/1498583896...
| bitwize wrote:
| Because borrow checking > GC. When you have a GC, you need
| several times more memory to run the same program with the
| same performance as without -- and usually, you have to say
| goodbye to any sort of determinism in execution time, as
| well.
|
| What Rust brings to the table is _guaranteed_ memory safety
| _without_ GC, and all memory is released in strictly
| deterministic time.
|
| So for an OS, it's much better to bring the good bits of
| Erlang to Rust.
| weakfish wrote:
| If you don't mind a student rust enthusiast, I would love to
| take a crack at this.
|
| Email is john123allison AT gmail DOT com
| zozbot234 wrote:
| You may want to write a proper user story for this and submit
| it to https://blog.rust-lang.org/2021/04/14/async-vision-doc-
| shiny... (mentioned in the latest "This Week in Rust"
| development summary).
| fortran77 wrote:
| If it's based on OTP we should give it proper credit and call
| it OTP/OARS.
| asabil wrote:
| I would love to see the same thing, I always told people around
| me that Erlang/OTP is more akin to an OS than a traditional
| programming language. That being said, the key feature to
| enable what Erlang supports is asynchronous termination, which
| as far as I know is not possible in regular Rust.
| oconnor663 wrote:
| I think the current options for killing a generic task in
| Rust are either 1) make it an async task, which can be
| cancelled as long as it doesn't accidentally block a thread,
| or 2) make it a separate process, and have the OS kill it. Do
| either of those fit this use case?
| jb3689 wrote:
| I don't understand what you're getting from this. Crash looping
| can still happen in OTP, the root supervisor can still die if
| the crash threshold is met in a small window. This would also
| be very heavy weight. IIUC the issue is not that errors occur,
| but that errors occur (and panic) and cannot be handled
| [deleted]
| jerf wrote:
| That is a good idea, but one thing I would advise, having both
| seen several attempts made at this sort of thing and having
| made one myself [1], try very hard to separate the _accidental_
| things Erlang brings to the idea from the _fundamental_ things
| Erlang brings to the idea. Most attempts I've seen made at
| this flounder on this pretty hard by trying to port too
| directly the exact Erlang supervisor tree idea while grinding
| hard against the rest of the language, rather than porting the
| core functionality in in a way that integrates natively with
| the language in question as much as possible.
|
| For instance, one thing I found when I was writing my library
| that will probably apply to most other languages (probably
| including Rust) is that Erlang has a somewhat complicated setup
| step for running a gen_server, with an explicit setup call, a
| separate execution call, several bits and pieces for
| 'officially' communicating with a gen_server, etc. But a lot of
| these things are for dealing with the exact ways that Erlang
| interacts with processes, and you probably don't need most of
| them. Simply asking for a process that makes the subprocess
| "start" from scratch is probably enough, and letting that
| process use existing communication mechanisms already in the
| language rather than trying to directly port the Erlang stuff.
| Similarly, I found no value in trying to provide direct ports
| of all the different types of gen_server, which aren't so much
| about the supervision trees (even if that's where they seem to
| be located) as a set of standard APIs for working with those
| various things. They're superfluous in a language that already
| has other solutions for those problems.
|
| In addition to keeping an eye out for features you don't need
| from Erlang, keep an eye out for features in the host language
| that may be useful; e.g., the most recent suture integrates
| with the Go ecosystem's ever-increasing use of context.Contexts
| as a way to manage termination, which hasn't got a clear Erlang
| equivalent. (Linking to processes has some overlapping
| functionality but isn't exactly the same, both offering some
| additional functionality contexts don't have as well as missing
| some functionality contexts do have.)
|
| Erlang has a lot of good ideas that I'd love to see ported into
| more languages. But a lot of attempts to do so flounder on
| these issues, creating libraries so foreign to the host
| language that they have zero chance of uptake.
|
| The other thing I'd point out is that even in Go, to say
| nothing of Rust, _crashing_ is actually fairly uncommon by
| Erlang standards. Many things that crash in Erlang are
| statically prevented at compile time in Go, and Rust statically
| precludes even more of them. However, I have found OTP-esque
| supervision trees to be a very nice _organizational structure_
| to my code; I use suture in nearly every non-trivial Go program
| I write because it makes for a really nice modular approach for
| the question of "how do I start and stop persistent
| services?". I _have_ seen it hold together runtime services
| that would otherwise be failing, the way it is supposed to, and
| that's nice, but the organization structure is still probably
| the larger benefit.
|
| (There is deep reason for the way Erlang is doing it the way it
| does, which is that a lot of Erlang's type system, or lack
| thereof, is for communicating between nodes, so even if you
| perfectly program Erlang, if two nodes running different
| versions of code try to communicate with each other and they've
| changed the protocol you might get a pattern matching fail on
| the messages flowing between versions. The Erlang way of doing
| cross-machine communication with this sort of automatic
| serialization at the language level has not caught on, and all
| modern languages have a relatively distinct serialization step
| where this sort of error is better handled, as you try to
| deserialize the remote message into your internal data
| structure.)
|
| Anyhow, the upshot is, you want to _translate_ the
| functionality out of Erlang into other languages, not
| _transliterate_ it.
|
| [1]: https://github.com/thejerf/suture
| phoe-krk wrote:
| The title is slightly misleading, and so is Linus' response here.
| This RFC never claimed to be in shape to be immediately mergeable
| into the mainline kernel as-is. Miguel (the author of the patch)
| has replied to this mail (and in other places in the thread) that
| the Rust alloc() is currently called only as a temporary measure
| to speed up development, and all panic() calls from allocation
| failures are just as temporary. This is all because the Rust code
| that hooks into the kernel memory allocation functions is not yet
| usable.
|
| The main point of this RFC is that "the [in-kernel Rust] support
| is good enough that prototyping modules can start today." There's
| no point in making long arguments about alpha-quality design
| shortcuts on an alpha-quality prototype that are also explicitly
| mentioned by the patch authors to be of alpha quality.
|
| See e.g. https://lkml.org/lkml/2021/4/14/1130 and
| https://lkml.org/lkml/2021/4/14/1023
|
| EDIT: Thanks for the child comments; it seems that Linus is
| simply not aware of all the specifics and is asking for more
| information and/or decided to look at the code before reading the
| full mail thread.
| caust1c wrote:
| I don't really see anything misleading in the post. Linus says
| that the comments/responses are from a position of ignorance
| and it seems like he's just seeking understanding.
|
| If anything is misleading it's linking to random emails in lkml
| without context. :-P (Maybe that's what you were getting at).
|
| Personally, I'm very excited about Rust in the kernel.
| Ceezy wrote:
| He doesn't mention that it's an alpha feature
| CodeWriter23 wrote:
| There are literally millions of things he _didn't_ say. He
| did however express a requirement for acceptance. That's
| the message, if you want this in the kernel, it has to
| never call panic() at run time. Why? Because kernel crashes
| are unacceptable for the types of deployment Linux is used
| for.
| UtherII wrote:
| I'm not sure he means _never_ panic at runtime. I think
| there are good reasons to panic at runtime if you care
| about safety; a buffer overflow is a good reason, for
| instance.
|
| But a memory allocation failure is clearly not a good
| reason.
| dnautics wrote:
| Out of memory allocation panic is absolutely not
| acceptable in a kernel, and even less so when you're
| linux (which does some sneaky memory overcommit things)
| CodeWriter23 wrote:
| I think the gist of what he is saying is typical
| application programming patterns like crash on exception,
| gc, etc. are not a good fit for kernel programming. And I
| agree. Handle the request or return an error and let
| higher level code handle the error processing.
| OskarS wrote:
| It would be hard for him to be clearer:
|
| > With the main point of Rust being safety, there is no
| way I will ever accept "panic dynamically" (whether due
| to out-of-memory or due to anything else - I also reacted
| to the "floating point use causes dynamic panics") as a
| feature in the Rust model.
| darthrupert wrote:
| > Personally, I'm very excited about Rust in the kernel.
|
| Why?
| gpm wrote:
| Speaking for myself and not the person you replied to
|
| - It would make me 10x more likely to work on the kernel,
| just because I enjoy programming in rust more than I enjoy
| programming in C. (At least given my current employment,
| any kernel contributions would be on my own time)
|
| - It would give me more faith in the security of other code
| people are contributing, things like binder in C strike me
| as pretty scary components of android from a security
| perspective, I'm much more comfortable running the same
| thing written in Rust.
|
| - I think it would generally reduce the number of kernel
| bugs in components written in it. Kernel bugs are rare, but
| very frustrating when you encounter them.
|
| - I think it would increase the general productivity in
| kernel development. A better kernel helps everyone out.
|
| - It acts as validation for rust as a language. Obviously
| the kernel people shouldn't care about this in the
| slightest and it's not an argument for including it.
| However if it is included for other reasons (see above), it
| does help me argue that "rust would be a good fit for x" in
| other situations.
| caslon wrote:
| >- I think it would increase the general productivity in
| kernel development. A better kernel helps everyone out.
|
| Don't you think the massive increase in compilation time
| would negate any productivity gains and probably decrease
| productivity overall?
| gpm wrote:
| No, because
|
| - I don't think it will be massive.
|
| - I especially don't think it will be that large for
| incremental builds, which is the main thing that matters.
| (But I'm not involved in this project, so I don't
| actually know how incremental the builds are...)
|
| - My experience is that the vast majority of programming
| time is spent fixing mistakes, not compiling. Rust
| reduces the amount of time spent fixing mistakes a lot
| more than it increases time spent compiling.
|
| - Rust moves many errors early in the compilation process
| (instead of when you try and test your code), which
| reduces iteration time instead of increasing it.
|
| I'm not involved in this project, but I imagine it's at a
| point where you could get some numbers for the fixed
| overhead that adding rust adds to the compile times. I'd
| be interested in seeing those numbers.
| staticassertion wrote:
| Have you compiled the Linux kernel before? I doubt Rust
| will be the bottleneck, it's a massive project - a tiny
| fraction being in Rust will be a blip.
| caslon wrote:
| Pretty frequently, yes. It doesn't take that long on
| modern, moderately-powered devices. Even when I was using
| a mid-tier device from a decade ago, compilation time was
| still around half an hour, which was less than most Rust
| projects I've encountered are on modern and reasonably
| high-end hardware, despite doing much more and being much
| larger.
| staticassertion wrote:
| Yep, that sounds exactly right to me - about 30 minutes
| on older hardware. That's very odd to me that you have 30
| minute rust build times - as someone who works on a rust
| project professionally, with 10KLOC, that isn't my
| experience at all. If Rust gets into the kernel I would
| expect it to account for <1% of the code, so even if it
| were 100x slower to compile, which it isn't, I don't see
| it having an impact.
| bluecalm wrote:
| Man, 10KLOC is a very small project. Obviously it
| compiles quickly. Linux kernel is almost 30 million lines
| of code.
| staticassertion wrote:
| Yes... exactly. The Linux Kernel is 30 million lines of
| code - so how exactly will some minuscule-by-comparison
| amount of Rust code slow down compile times considerably?
| bluecalm wrote:
| If it's always going to be minuscule then why bother? If
| it has potential to grow to something not minuscule then
| it's important that it compiles quickly.
| himujjal wrote:
| I wrote 10kLOC last week. Most of the code didn't require
| much extra thinking, it was mostly a translation of an
| old TypeScript project to Zig. But measuring compile
| times with 10kLOC is really not a good argument.
| staticassertion wrote:
| Well, 15KLOC, and of course not including dependencies.
| But my point was that there will be a tiny, tiny amount
| of Rust in the kernel by comparison to the 10s of
| millions of lines of C code. Rust would have to compile
| _radically_ slower than C, like hundreds or thousands of
| times slower, in order to be a limiting factor.
| steveklabnik wrote:
| "Productivity" is notoriously hard to measure in
| software.
|
| While slow compile times may slow you down, and hence
| reduce your productivity, if the compiler prevents hard
| to fix bugs, it still may _increase_ your overall
| productivity. Consider things like
| https://hacks.mozilla.org/2021/04/eliminating-data-races-
| in-...
|
| > Overall Rust appears to be fulfilling one of its
| original design goals: allowing us to write more
| concurrent code safely. Both WebRender and Stylo are very
| large and pervasively multi-threaded, but have had
| minimal threading issues. What issues we did find were
| mistakes in the implementations of low-level and
| explicitly unsafe multithreading abstractions -- and
| those mistakes were simple to fix.
|
| >
|
| > This is in contrast to many of our C++ races, which
| often involved things being randomly accessed on
| different threads with unclear semantics, necessitating
| non-trivial refactorings of the code.
|
| Maybe the C++ was faster to compile, but in the end,
| fixing these issues took more time. There were more of
| them, and they were harder to track down.
|
| Nobody truly knows the answers to these questions in the
| general case yet, of course. My point is just that "slow
| compile == bad productivity" is not _inherently_ true.
|
| Faster compile times are, of course, always desired no
| matter what.
| diragon wrote:
| > "Productivity" is notoriously hard to measure in
| software.
|
| We can look at history and results. In these terms, C
| (and perhaps C++ to some extent) is, I believe, the only
| productive programming language for making the low level
| parts of non-experimental kernels.
|
| As far as it can be publicly measured, Rust so far has
| proven itself for application programming in a certain
| niche and not much more. It would be cool if it could
| prove itself in kernel space too -- we certainly need
| fewer system crashes caused by bad kernel-level code.
| Curiously though, it has been a very long time since I've
| last bumped into such a thing in Linux. This makes me
| suspect that Rust is trying to fix a problem here that's
| already been fixed in another way.
|
| As for ease of coding, C seems like a massively easier
| language to learn than Rust. But I might be wrong there.
| Any data about that, I wonder?
| tene wrote:
| I've got some vague opinions about "easier to learn" that
| I'd like to hear some disagreement on, to help me work
| out my thoughts more. Please forgive me if I'm not very
| clear here.
|
| I don't know if this is what you mean or not, but I've
| seen a lot of claims that a language is "easier to learn"
| that seem to be considering "learning a language" as a
| valuable topic on its own, separate from "learning to
| write and maintain correct nontrivial programs in the
| language", and that seems wrong to me.
|
| There's a part of this idea that does seem valuable to
| me, in that at the beginning of your learning process,
| there are a lot of benefits from being able to quickly
| get to a point where you can successfully write small
| programs that do something. It helps your motivation. It
| helps you reach some amount of productivity faster. When
| a language is better at early onboarding, it's more-
| useful to people who have smaller needs and more-
| constrained use-cases. Python being so easy to learn to
| glue together some libraries makes it a fantastic,
| valuable tool for many people.
|
| The part of this idea that I really disagree with is how
| it applies to non-trivial, non-beginner use-cases. There
| are topics and skills that languages vary in their
| coverage of, but that you still need to learn about and
| deal with anyway for many types of programs. Memory
| management, resource handling, ownership and sharing,
| concurrency, nullability, error handling, composition,
| organization, abstraction, refactoring, testing,
| debugging, etc. Whether a language includes more or fewer
| features that directly address these topics, you will
| probably still need to learn them.
|
| To me, the relevant question isn't "Which language is
| easier to learn in isolation?", but instead "Which
| language is easier to learn to implement safe,
| performant, efficient, reliable, concurrent code with?".
|
| If you take a new engineer who has "learned C", how easy
| is it to train them to get their rate of memory safety
| errors, thread safety errors, missed error checking, etc.
| down to the same rate as you'd get from a new engineer
| who has "learned Rust"?
|
| Without tooling support like you get from Rust, you
| instead need to learn safe idioms, learn strategies to
| minimize your exposure to errors, train yourself to
| always always check everything at all times, learn how to
| write tests to discover mistakes you've made, learn how
| to use a collection of third-party tools you can use to
| approximate some of the benefits of Rust's compile-time
| checking, and train yourself to always use it. That's not
| "learning C", but it's still required in order to
| implement something like Linux.
|
| Rust's bet is that there are ways to reduce the overall
| complexity of everything involved in implementing high-
| reliability high-performance systems by moving some of
| that complexity into the language. If you don't think
| it's accomplishing that goal, that's fine, but make that
| case directly.
|
| I agree that the C programming language is smaller and
| easier to learn in isolation. It's not so obvious to me
| that something like "C + Valgrind + ASan + TSan + UBSan +
| ..." is easier to learn than Rust.
|
| On the other hand, for many classes of errors that C
| offers no help with, Rust's compiler will directly point
| out where you've made a mistake, why it's wrong, and
| often offers advice on how to fix it. When learning a new
| language, having that kind of tooling support universally
| available is extremely helpful.
|
| To be clear, Rust doesn't handle everything, and there's
| still a lot of benefit you can get from dynamic analysis
| tools, fuzzing, etc. There are also levels of reliability
| and assurance that aren't currently feasible with Rust.
| Rust has a long way to go.
|
| The point I think I'm trying to make is that Rust really
| raises the bar in a meaningful way. There's some nonsense
| and awkward bits in Rust, but a lot of what you need to
| learn to be effective with Rust are things that you'd
| need to learn anyway to be effective at this level with
| C, and I think it's easier to learn those with Rust's
| help, and it's significantly easier to build systems with
| a much lower rate of these problems by using Rust.
|
| Sorry for the length, and lack of organization. This has
| been rattling around in my head for a while, and I wanted
| to get some thoughts out in writing.
| steveklabnik wrote:
| Yes, that is true. But the only way to get that data is
| to do it. This work is one part of doing that. Someone
| has to be first :) (There are of course a ton of kernel-
| level things in Rust that don't pass the "non-
| experimental" bar for various people. As always, depends
| on exactly what you mean.)
|
| > Any data about that, I wonder?
|
| Possibly one of the only things harder to measure than
| productivity is ease of learning, haha! I had programmed
| in C for decades before Rust even existed. We do have a
| lot of people who say that they think Rust was easier to
| learn for them than C was. And of course many who believe
| the opposite. Not sure anything is conclusive in any
| direction. For example, it's quite possible that some
| people find C easier, and some people find Rust easier,
| and there will never be a clear winner.
|
| Time will tell.
| alfiedotwtf wrote:
| > It would make me 10x more likely to work on the kernel,
| just because I enjoy programming in rust more than I
| enjoy programming in C
|
| That sound was a thousand people nodding in unison
| guenthert wrote:
| And it would be a million people if it were Python. That
| doesn't make it a good idea to encourage submissions to
| the kernel in Python. There is much more to kernel
| programming than the programming language. C might be
| inconvenient and lacking expressive power, but if people
| can't handle pointers and goto, what mess are they going
| to make with different address spaces and interrupts?
| tialaramex wrote:
| > C might be inconvenient and lacking expressive power,
| but if people can't handle pointers and goto, what mess
| are they going to make with different address spaces and
| interrupts?
|
| I have _terrible_ news for you. People (well, humans, and
| I don't see anybody else signing up to maintain Linux)
| cannot in fact correctly handle pointers and goto. That's
| why they keep making mistakes.
|
| It's actually to be hoped that Rust can usefully extend the
| constraints it can express today to things like address
| spaces, preventing some of those problems. It'd be great
| if, say, driver code which confuses a virtual address
| with a physical one just _won't compile_ rather than
| compiling and then mysteriously not working as expected
| or occasionally causing BUG() reports.
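|
| A minimal sketch of how distinct address types could turn that
| confusion into a compile error, assuming hypothetical `PhysAddr`
| and `VirtAddr` newtypes and an invented `DIRECT_MAP_BASE` offset
| (none of these are the kernel's actual definitions):

```rust
/// A physical address. A distinct type from `VirtAddr`, so the two
/// cannot be mixed up silently.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct PhysAddr(u64);

/// A virtual address.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct VirtAddr(u64);

/// Hypothetical base of a linear "direct map" of physical memory.
const DIRECT_MAP_BASE: u64 = 0xffff_8000_0000_0000;

/// Translates a direct-mapped virtual address; None if `v` lies
/// below the (assumed) direct-map region.
fn virt_to_phys(v: VirtAddr) -> Option<PhysAddr> {
    v.0.checked_sub(DIRECT_MAP_BASE).map(PhysAddr)
}

/// Stand-in for a driver routine that must receive a physical address.
/// It only returns the raw value that would be written to the device.
fn program_dma(target: PhysAddr) -> u64 {
    target.0
}

fn main() {
    let v = VirtAddr(DIRECT_MAP_BASE + 0x1000);

    // program_dma(v); // rejected by the compiler: expected `PhysAddr`

    match virt_to_phys(v) {
        Some(p) => println!("DMA target: {:#x}", program_dma(p)),
        None => println!("{v:?} is not in the direct map"),
    }
}
```

| The wrappers compile down to plain integers, so the check costs
| nothing at run time.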
| utxaa wrote:
| what about interacting with the existing kernel code
| base? would data coming back from the kernel into rust
| space need to be wrapped to provide safety guarantees? or
| would it be necessary to turn safety features off?
|
| a bit confused about this.
| gpm wrote:
| So... it doesn't really impact this discussion other than
| "shouldn't be an issue". I'll try and give a summary of
| what's happening technically, but HN is frankly the wrong
| form for a "how to use the C ffi in rust" tutorial. Also
| a big disclaimer that I don't know what precisely this
| project is doing, so I'm talking about C/Rust projects in
| general.
|
| Rust talks the C ffi really well. It can call external
| functions that follow the C abi the same way it can call
| native unsafe functions [1]. You can tell it to layout a
| struct the same way that C does, etc.
|
| Because calling the C abi requires unsafe code, it's
| common to provide wrappers around the C abi that are safe
| against misuse. I.e. that make it so that the only way
| to call C functions is the correct way. This is doing
| things like making it so the only way to get a `struct`
| is to call a (safe) `new` function that calls the
| (unsafe) C initializer internally, and exposing the C
| "methods" on that struct (that expect it to be
| initialized) as safe methods on the rust struct that
| internally call the unsafe C functions (and they can do
| so because they know the struct has been initialized).
| Obviously for any particular C api you have to look at
| what it requires to be called safely, and then figure out
| how to encode it in the type system, but that's usually
| surprisingly easy.
|
| Calling rust from C doesn't really require any "unsafe"
| code (other than the fact that C is basically a giant
| unsafe block by nature), because the assertion that
| you're calling it correctly happens on the C side of
| things, not the rust side of things. Just like rust can
| call C abi functions, rust can make its functions follow
| the C abi by simply saying
|
|     extern "C" fn foo()
|
| instead of
|
|     fn foo()
|
| But many of the data structures you might pass from C to
| rust will need a wrapper to use "safely". E.g. if I pass
| a doubly linked list, it's going to need raw pointers
| more or less by nature (at least if rust wants to be able
| to mutate it), and someone is going to need to do a
| similar wrapping thing where they expose some functions
| that correctly work with the list, that internally use
| unsafe, but expose a safe api.
|
| [1] So what unsafe here means is that the compiler doesn't
| know that the function is safe to call, so you have to
| tell it "I checked and how I'm using it is fine" by
| putting the call inside an unsafe block. This looks like
| the following. Note that you can also have unsafe native
| rust functions (e.g. if you want to index an array
| without checking the array bounds that's an unsafe
| function implemented in rust)
|
|     unsafe { c_function_here(arg1, arg2) }
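|
| A minimal sketch of the wrapping pattern described above,
| not taken from the kernel patches. `RawCounter`,
| `raw_counter_init`, `raw_counter_increment`, `Counter` and
| `add_for_c` are hypothetical names; the two "C" functions
| are written in Rust here only so the example runs on its
| own, whereas a real binding would declare them in an
| `extern "C"` block.

```rust
use std::mem::MaybeUninit;

// Stand-in for a C struct; #[repr(C)] asks for C-compatible layout.
#[repr(C)]
pub struct RawCounter {
    value: u32,
}

// Stand-ins for the C side (hypothetical). Both are unsafe to call
// because they trust the caller to pass a valid pointer.
unsafe extern "C" fn raw_counter_init(c: *mut RawCounter) {
    unsafe { c.write(RawCounter { value: 0 }) }
}

unsafe extern "C" fn raw_counter_increment(c: *mut RawCounter) -> u32 {
    unsafe {
        (*c).value = (*c).value.wrapping_add(1);
        (*c).value
    }
}

// Safe wrapper: the only way to obtain a Counter is `new`, which
// always runs the initializer first.
pub struct Counter {
    raw: RawCounter,
}

impl Counter {
    pub fn new() -> Counter {
        let mut raw = MaybeUninit::<RawCounter>::uninit();
        // SAFETY: the pointer is valid for writes, and the initializer
        // fully initializes the struct before assume_init is called.
        unsafe {
            raw_counter_init(raw.as_mut_ptr());
            Counter { raw: raw.assume_init() }
        }
    }

    pub fn increment(&mut self) -> u32 {
        // SAFETY: self.raw was initialized in `new`, and `&mut self`
        // guarantees exclusive access for the duration of the call.
        unsafe { raw_counter_increment(&mut self.raw) }
    }
}

// The other direction: a Rust function that follows the C ABI.
// (Exporting it under a stable symbol name needs an extra attribute,
// which is left out of this sketch.)
pub extern "C" fn add_for_c(a: i32, b: i32) -> i32 {
    a.wrapping_add(b)
}

fn main() {
    let mut counter = Counter::new();
    counter.increment();
    println!("counter = {}", counter.increment());
    println!("sum = {}", add_for_c(2, 3));
}
```

| Callers of `Counter` never write `unsafe` themselves; the
| two unsafe blocks inside the wrapper are the only places
| that need to be audited.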
| utxaa wrote:
| thank you for this. this helps.
|
| but let's say one is writing a filesystem in rust, so
| you're implementing most of the functions in "struct
| file_operations", and moreover you are passing "struct
| inode" , "struct page" etc ... back and forth between c
| and rust. with such heavy handed interaction, aren't we
| basically doing c in rust by necessity of the interface?
| by which i mean "unsafe" the way you defined it?
|
| are there examples where you see a clear win?
| cies wrote:
| Networking, especially wireless (as it's more complex and
| potentially more dangerous: attacker needs not even a
| wire).
|
| Google is developing a bluetooth stack in Rust.
|
| [1] https://blog.desdelinux.net/en/google-desarrolla-una-
| nueva-p...
| gpm wrote:
| You'll have to excuse a bit of unfamiliarity with linux
| internals here, I'm taking a guess, but I expect that
| filesystems are an example where you would see a clear
| win.
|
| My assumption would be that a file system is calling the
| same methods on a few different objects repeatedly. E.g.
| "read me some bytes from this page" or "get the id of
| this inode". For each of these APIs you _once_ write a
| small amount of unsafe code that encodes into the type
| system "and this is how you can call it safely", and
| then you repeatedly get to make use of that code with
| guarantees that you aren't making any mistakes that are
| too terrible (logic bugs still exist obviously, which on
| a file system could delete or corrupt files, but you
| aren't going to corrupt some random kernel memory by
| accident). That's a pretty big win in my mind.
|
| Meanwhile file systems probably include a lot of non-ffi
| things I think rust is substantially better for too. Like
| handling of a ton of different errors (oh no, the disk
| failed to give me bytes. Oh no, these bytes make no
| sense. etc) in the code's "happy"(ish) path. And like
| parsing data structures out of bytes (correctly).
| Tracking exclusive access to various resources.
| Implementing compression algorithms. Etc.
|
| The case where you would see the sort of issue you're
| discussing is where all the code is doing basically
| unique ffi calls, so you don't get any reuse out of safe
| abstractions. I don't know of any great examples of this,
| maybe things like boot sequence code where you're running
| a lot of unique things exactly once to initialize the
| hardware?
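|
| To make the "parsing data structures out of bytes" point
| concrete, here is a small sketch under invented
| assumptions: the `Header` layout, the `HYPO` magic value
| and the `ParseError` variants are all hypothetical and do
| not correspond to any real file system. Every failure
| comes back as a `Result` the caller has to handle, and no
| slice is indexed without a bounds check.

```rust
#[derive(Debug, PartialEq)]
enum ParseError {
    TooShort,
    BadMagic,
}

#[derive(Debug, PartialEq)]
struct Header {
    block_size: u32,
}

// Hypothetical on-disk format: 4 magic bytes, then a little-endian u32.
const MAGIC: [u8; 4] = *b"HYPO";

fn parse_header(bytes: &[u8]) -> Result<Header, ParseError> {
    // `get` returns None instead of panicking when the range is out of bounds.
    let magic = bytes.get(0..4).ok_or(ParseError::TooShort)?;
    if magic != &MAGIC[..] {
        return Err(ParseError::BadMagic);
    }
    let size = bytes.get(4..8).ok_or(ParseError::TooShort)?;
    let mut buf = [0u8; 4];
    buf.copy_from_slice(size); // both sides are exactly 4 bytes long
    Ok(Header {
        block_size: u32::from_le_bytes(buf),
    })
}

fn main() {
    match parse_header(b"HYPO\x00\x10\x00\x00") {
        Ok(header) => println!("block size: {}", header.block_size),
        Err(e) => println!("bad header: {:?}", e),
    }
}
```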
| utxaa wrote:
| thanks gpm for taking the time. let's see how it pans
| out. rust is definitely interesting.
|
| now let me not impose on your kindness further and go
| learn a little rust.
| steveklabnik wrote:
| That is what this patch series is about. It will require
| using unsafe code to some degree, yes.
| darkwater wrote:
| But looks like Linus doesn't know the specifics, he is asking
| for more info while at the same time already making it clear
| enough that if his concern cannot be addressed then there is no
| point working on the Rust integration until it is fixed.
| karmakaze wrote:
| Anyone thinking that Rust is ready to be used in the kernel,
| just read the post first. It is very short. Then read these
| comments to look for what you need answered.
|
| For me, the main point is Can Rust be written to guarantee that
| no oom (or other hard fail 128-bit math) panic occurs that is
| not under control of the written code? I want the answer to be
| yes and also want to see how (which isn't far off from what
| Linus is asking).
| steveklabnik wrote:
| You got two good answers, but there's also now some
| pressure to make these scenarios better (say, "compile error
| if you use an i128" rather than "just don't use an i128"),
| which is nice. I'm glad to see the kernel's needs putting
| some pressure on Rust to improve.
| gpm wrote:
| > Can Rust be written to guarantee that no oom
|
| Yes, easily, OOM is entirely a library created concept in
| rust, just don't use the standard library (or the `alloc`
| subset of the standard library) and you don't have OOMs...
|
| > (or other hard fail 128-bit math)
|
| 128 bit math, and floats, need to be avoided by just "not
| using them", the same as for floats in C code in the
| kernel...
| mywittyname wrote:
| What's the issue with floats? Everyone is casually talking
| about this issue like it's common knowledge. Online
| searches for floating point panics don't really bring up
| any information besides bug reports.
|
| Is it something about floating point operations not
| actually being fixed-sized?
| tialaramex wrote:
| In a modern pre-emptive multitasking operating system,
| the kernel needs to be able to stop what your user task
| was doing, do its own thing for a while or even run a
| different task entirely - and then put everything back
| apparently as it was and allow your task to carry on. The
| CPU affords this capability by providing a way to bottle
| up all its internal state, store that somewhere, and then
| put it back later.
|
| Operating system kernels don't generally need floating
| point math (some of them use it anyway, lots don't).
|
| So _if_ your CPU has a way to say "Bottle your state,
| but er, don't worry about the floating point stuff" and
| it's faster, or uses less memory, or both, which are
| common, all the kernels (like Linux) which do not use
| floating point know they aren't touching that anyway and
| needn't bottle it up. This is potentially an important
| performance win.
|
| If you try to take this performance win, but then you
| actually do use floating point in the kernel, the world
| suddenly changes beneath the feet of user tasks. A
| program is adding up some floating point numbers, and
| then, huh, suddenly the total is now negative? Wait, now
| it's zero? Nope, negative again? What's happening! If the
| program was aware of being interrupted this makes sense,
| but the whole point of pre-emptive multi-tasking is not
| to need to custom design every program to be interrupted
| everywhere.
|
| So Linux (mostly) never uses floating point.
| crocarneiro wrote:
| I would like to know about this too. Did you find any
| interesting article?
| nickez wrote:
| Yes, don't use the std lib (or more specifically the alloc
| crate) and don't use types like floats and u128.
| dncornholio wrote:
| I get great value in a lot of comments from Linus. He can explain
| himself really well, even be an asshole at times, but I always
| feel he makes himself very understandable in a way that doesn't
| make me think he actually is one.
|
| He seems to have learned a lot in how to respond though, because
| this one is pretty tame :)
| davidhyde wrote:
| In the embedded Rust world it is common to explicitly define a
| custom global allocator and panic handler. The global allocator
| is even optional although you then won't be able to use heap
| allocated structs like String and Vec. Therefore the Rust
| compiler does not force you to use the built in allocator or
| panic handler and you are free to implement them how you choose.
|
| See for alloc:
| https://docs.rust-embedded.org/book/collections/index.html
| See for panicking:
| https://docs.rust-embedded.org/book/start/panicking.html
| finnthehuman wrote:
| I don't have a problem with the idea of panic, but the way it's
| used seems like a wart. In a language with a result type, why
| are there any cases where the system isn't too broken to
| return a result, but defaults to panicking anyway?
|
| Thinking about it from the embedded side, my panic is going to
| tell the failsafe circuitry that the processor can't be
| trusted. That's too aggressive of an action for any case where
| we still believe that the processor can return control to a
| caller that could have a more-graceful error path.
| steveklabnik wrote:
| This code is using that allocator, that's part of the issue. He
| wants there to be no panics, not to have the panics handled in
| a special way.
| davidhyde wrote:
| I see, so anything that can possibly fail should return a
| result instead of "maybe" panicking as a side effect? So no
| panics at all in the language? That may adversely affect the
| ergonomics of the language for user space applications.
| Perhaps it would be better to build a completely fallible
| version of the standard library and use linting to enforce no
| panics.
| steveklabnik wrote:
| At the very least, no panics for allocations. Methods that
| return Result exist, the issue is that while they do, the
| panic-ing ones also exist. There are plans to address this,
| they just haven't been implemented yet.
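|
| As a hedged illustration of a Result-returning allocation
| method: on today's stable Rust, `Vec::try_reserve` reports
| allocation failure as an `Err` instead of panicking. This
| is a user-space sketch, not code from the patch series,
| and `copy_fallible` is a hypothetical helper name.

```rust
use std::collections::TryReserveError;

// Copies `src` into a new Vec, surfacing allocation failure as an Err.
fn copy_fallible(src: &[u8]) -> Result<Vec<u8>, TryReserveError> {
    let mut out = Vec::new();
    // Fallible step: ask for the capacity up front.
    out.try_reserve(src.len())?;
    // The capacity is already reserved, so this does not need to reallocate.
    out.extend_from_slice(src);
    Ok(out)
}

fn main() {
    match copy_fallible(b"hello") {
        Ok(copy) => println!("copied {} bytes", copy.len()),
        Err(e) => println!("allocation failed: {}", e),
    }
}
```

| The catch described above still applies: nothing stops
| other code from calling the panicking `push` or `reserve`
| on the same type.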
| davidhyde wrote:
| Thanks, interesting to hear!
| jononor wrote:
| Is a deny(panics) lint-rule feasible, now or in the short
| term? Something that would make a project fail to compile if
| anything used in it could panic?
| steveklabnik wrote:
| I don't know enough about the specifics to really say. I do
| know there's interest.
| scoutt wrote:
| As a C developer for embedded, considering Rust for a long time
| now, the _panic_ thing is something that bothers me. I don't
| want/I can't panic. I want to be returned _false_, _null_ or
| whatever.
|
| Do I have to check if an external crate that I am using would
| _panic_? If so, how do I prevent the crate from panicking?
|
| From my perspective, Rust is kind-of designed to support code
| like this (non real Rust code follows):
|
|     my_struct.do_something()
|         .get_this()
|         .get_that()
|         .as_ref()
|         .as_paper_airplane()
|         .unwrap()
|
| What if some of those calls fail? How do I detect an error?
| Alright, it might panic, but what if I have to keep going
| forward, even in case of an error, in my embedded application?
| Should I split the different function calls and check for
| errors?
|
| Kernel development intersects with embedded development in many
| points. I'm sure I am not the only one with these doubts.
| whb07 wrote:
| It would really panic if it ran into an unhandled condition
| that you explicitly sent it to or left as is. What I mean is
| pretend you have some external reading/data coming in and you
| parse it to have the field "name" to always have a value of
| "scoutt".
|
|     match data.name {
|         "scoutt" => return true,
|         _ => panic!(),
|     }
|
| In that case, because you "know" that the name field cannot
| be anything but your specific value, you don't really mind
| the matching pattern and Rust is satisfied. But then you get
| a runtime panic! because all of a sudden the value changed
| to "joe".
|
| Long story short is as long as you handle and declare ahead
| of time the proper fail conditions and paths to take at
| compile, you really shouldn't come across a panic.
| loeg wrote:
| Calls that fail return `Result<OkType, ErrorType>`. Here's
| something like what `unwrap()` does (actual implementation
| may have some nuances I didn't capture here):
|
|     impl<OkType, ErrorType> Result<OkType, ErrorType> {
|         fn unwrap(self) -> OkType {
|             if let Ok(x) = self {
|                 return x;
|             }
|             panic!("not ok");
|         }
|     }
|
| So, if you don't accept panic, don't call `unwrap()` unless
| you're very sure that the returned value couldn't fail in the
| way you're using the API. You could think of `unwrap()` in
| Rust as semantically similar to `assert(error == 0)` in C.
| Avoid or use in the same places you would avoid or use that
| assertion in C.
|
| That means you must handle the Err case, often by just
| returning it to the caller (like C). The ? operator is syntax
| sugar for this: return if err, or give me the ok value if
| non-err.
|
|     fn my_fn() -> Result<(), Error> {
|         // if this failed, returns some Err()
|         let my_x = my_struct.do_something_that_can_fail()?;
|         // ditto
|         let my_y = my_x.something_else_that_can_fail()?;
|         // no failures? Ok
|         Ok(my_y)
|     }
|
| Of course, these can be chained with the exact same
| characteristics:
|
|     fn my_fn() -> Result<(), Error> {
|         // Equivalent
|         Ok(my_struct.do_something_that_can_fail()?
|             .something_else_that_can_fail()?)
|     }
| scoutt wrote:
| Thank you! Yes, the last 2 examples are much clearer to my
| eyes. And it's on the tutorial too:
| File::open("hello.txt")?.read_to_string(&mut s)?;
|
| I should pay more attention :)
|
| But I also deduce 2 things:
|
| - That _unwrap_ should not be used in the kernel, correct?
|
| - That I should be very careful about the external
| libraries I use and look for the presence of _panic!_
| and/or _unwrap_, or use
| https://docs.rs/no-panic/0.1.13/no_panic/ as _jononor_
| commented here below. But I guess this might rule out some
| useful libraries from a project.
| gpm wrote:
| > - That unwrap should not be used in the kernel,
| correct?
|
| Yes.
|
| I guess the exception is if it will only panic on a
| kernel bug and you're ok with that (that's what the bug
| macro is for, no)? E.g. if I have `let x = Some(2); let y
| = x.unwrap();`. The second statement panics if x is None,
| but that would only happen in the event of a bug, so you
| might be ok with it.
|
| This does happen sometimes in real code, `if
| some_list.len() > 2 { let x = some_list.pop().unwrap() }`
| will never panic, because pop() only returns None (think
| null) if the list is empty, but we just checked it has
| more than two elements. I'm not sure what the kernel's stance
| on this sort of code will be.
|
| > - That I should be very careful about the external
| libraries I use and look for the presence of panic!
| and/or unwrap,
|
| In the context of the kernel, I imagine you just don't
| use external libraries at all. In a more general context,
| yes, you need to understand when any libraries you use
| might panic. The general convention that is honestly not
| very well followed is to document panics that result from
| usage errors, and to allow for panics caused by library
| bugs.
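|
| One way to sidestep the "checked the length, then unwrap"
| situation is to let a pattern do both the check and the
| access, so there is no panicking call left in the code. A
| minimal sketch follows; `last_two` is a hypothetical
| helper, and it borrows the list rather than popping from
| it.

```rust
// Returns the last two elements, or None if the slice is too short.
// The slice pattern performs the length check and the access together.
fn last_two(list: &[u32]) -> Option<(u32, u32)> {
    match list {
        [.., a, b] => Some((*a, *b)),
        _ => None,
    }
}

fn main() {
    let some_list = vec![10, 20, 30];
    if let Some((a, b)) = last_two(&some_list) {
        println!("last two: {} {}", a, b);
    }
}
```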
| loeg wrote:
| No problem!
|
| > That unwrap should not be used in the kernel, correct?
|
| More or less. There could be situations where you know
| the error is impossible and it is reasonable to unwrap.
| But it's a good rule of thumb.
|
| > That I should be very careful about the external
| libraries I use and look for the presence of panic!
| and/or unwrap, or use https://docs.rs/no-panic/0.1.13/no_panic/
| as jononor commented here below.
| But I guess this might rule-out some useful libraries
| from a project.
|
| Absolutely! Use of unsafe is another potential watch-point.
| jononor wrote:
| Maybe something like this?
| https://docs.rs/no-panic/0.1.13/no_panic/
| davidhyde wrote:
| If do_something(), get_this(), get_that() and
| as_paper_airplane() all returned a Result then you could
| write your code as follows:
|
|     my_struct.do_something()?
|         .get_this()?
|         .get_that()?
|         .as_ref()
|         .as_paper_airplane()?;
|
| If the error type of the result differs you can use map_err
| or write a converter by implementing the From trait for the
| custom error struct you want to map to. The one thing you
| can't find out is if the calling function will panic in its
| function or a function that it calls. When you write embedded
| libraries you are never supposed to panic but there is
| nothing to enforce this which is, admittedly, not ideal.
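|
| A small runnable sketch of the two conversion options
| mentioned above, `From` and `map_err`. The `ConfigError`
| type and the "port,protocol" line format are hypothetical,
| chosen only to give `?` two different error types to
| reconcile.

```rust
use std::num::ParseIntError;

#[derive(Debug, PartialEq)]
enum ConfigError {
    Missing,
    BadNumber,
}

// With this impl, `?` converts a ParseIntError into ConfigError automatically.
impl From<ParseIntError> for ConfigError {
    fn from(_: ParseIntError) -> Self {
        ConfigError::BadNumber
    }
}

// Returns the first comma-separated field, or Missing if it is empty.
fn first_field(line: &str) -> Result<&str, ConfigError> {
    line.split(',')
        .next()
        .filter(|field| !field.is_empty())
        .ok_or(ConfigError::Missing)
}

// Variant 1: rely on the From impl.
fn parse_port(line: &str) -> Result<u16, ConfigError> {
    let port = first_field(line)?.trim().parse::<u16>()?;
    Ok(port)
}

// Variant 2: convert explicitly at the call site with map_err.
fn parse_port_map_err(line: &str) -> Result<u16, ConfigError> {
    first_field(line)?
        .trim()
        .parse::<u16>()
        .map_err(|_| ConfigError::BadNumber)
}

fn main() {
    println!("{:?}", parse_port("8080,tcp"));
    println!("{:?}", parse_port_map_err("abc,tcp"));
}
```

| Neither variant says anything about whether `first_field`
| or `parse` could panic internally, which is the gap
| described above.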
| iknowstuff wrote:
| https://doc.rust-lang.org/std/result/
| angry_octet wrote:
| It seems a serious shortcoming if you can't allocate String or
| Vec using a custom allocator? Presumably you can pass an arena
| parameter to create the custom allocator?
|
| There's plenty of room between <unlimited heap> and
| <preallocated>, I don't really understand why, e.g., you
| couldn't have a pool for a particular usage of String, or fixed
| size objects?
|
| Also, introspection to ask which pool an object is allocated
| on, how much space is available (units for fixed size, largest
| hole for variable size)?
|
| Is it just a syntax choice or is there a rusty reason not to
| have a thread-local list of allocators?
| gpm wrote:
| There's nothing special about String other than it's in the
| standard library. It doesn't even have special cased syntax
| (&str does have special syntax, it doesn't allocate).
|
| String does need to know how to deallocate itself, so the
| compiler has to know what allocated it. It can be moved
| between threads, so a thread local allocator doesn't work.
|
| If you want a String type which is allocated differently, or
| which stores a pointer to its deallocator, you can easily
| implement that as a new type by hand.
|
| (The standard library types might some day become generic
| over allocator, which would allow statically using a
| different allocator with the same api. The compiler would
| then force you to keep track of which allocator your string
| was allocated with).
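|
| As a rough sketch of "a new type by hand", here is a
| string type that never touches the heap: it stores its
| bytes inline and reports "out of space" as an `Err`.
| `FixedString` and `CapacityError` are hypothetical names;
| this is one possible design among many, not an existing
| library API.

```rust
// A string with a fixed inline capacity of N bytes and no heap allocation.
pub struct FixedString<const N: usize> {
    buf: [u8; N],
    len: usize,
}

#[derive(Debug, PartialEq)]
pub struct CapacityError;

impl<const N: usize> FixedString<N> {
    pub fn new() -> Self {
        FixedString { buf: [0; N], len: 0 }
    }

    // Appends `s`, or returns Err and leaves the contents unchanged
    // when there is not enough room.
    pub fn try_push_str(&mut self, s: &str) -> Result<(), CapacityError> {
        let end = self.len.checked_add(s.len()).ok_or(CapacityError)?;
        if end > N {
            return Err(CapacityError);
        }
        self.buf[self.len..end].copy_from_slice(s.as_bytes());
        self.len = end;
        Ok(())
    }

    pub fn as_str(&self) -> &str {
        // Only whole &str values are copied in, so the bytes are valid
        // UTF-8; fall back to "" rather than panic if that ever broke.
        std::str::from_utf8(&self.buf[..self.len]).unwrap_or("")
    }
}

fn main() {
    let mut name = FixedString::<8>::new();
    let first = name.try_push_str("abc");
    let second = name.try_push_str("too long for this");
    println!("{:?} {:?} {}", first, second, name.as_str());
}
```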
| steveklabnik wrote:
| It's just time; we have stabilized the interface for the
| single global allocator, but the more general allocation API
| is still being worked on.
|
| You _can_ do all of these things for your own data
| structures, but the ones in liballoc shipped without that
| support, because we had to get Rust 1.0 out the door. Now
| support is being retrofitted, and that's part of why doing
| so was okay before; the plan to do so in a reasonable way
| existed at stabilization time.
___________________________________________________________________
(page generated 2021-04-16 22:02 UTC)