[HN Gopher] The real realtime preemption end game
       ___________________________________________________________________
        
       The real realtime preemption end game
        
       Author : chmaynard
       Score  : 446 points
        Date   : 2023-11-16 14:47 UTC (1 day ago)
        
 (HTM) web link (lwn.net)
 (TXT) w3m dump (lwn.net)
        
       | andy_ppp wrote:
       | What do other realtime OS kernels do when printing from various
       | places? It almost seems like this should be done in hardware
       | because it's such a difficult problem to not lose messages but
       | also have them on a different OS thread in most cases.
        
         | EdSchouten wrote:
         | Another option is simply to print less, but expose more events
         | in the form of counters.
         | 
         | Unfortunately, within a kernel that's as big as Linux, that
         | would leave you with many, many, many counters. All of which
         | need to be exported and monitored somehow.
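          | 
          | As a rough illustration of the cost difference, a counter
          | can be just a relaxed atomic increment that something else
          | exports later. A minimal C11 sketch (the event names here
          | are made up, not taken from any kernel):
          | 
          |   #include <stdatomic.h>
          |   #include <stdio.h>
          | 
          |   /* One relaxed atomic counter per event of interest. */
          |   static _Atomic unsigned long rx_overrun_events;
          |   static _Atomic unsigned long tx_retry_events;
          | 
          |   static void count_rx_overrun(void)
          |   {
          |       atomic_fetch_add_explicit(&rx_overrun_events, 1,
          |                                 memory_order_relaxed);
          |   }
          | 
          |   /* A monitoring path periodically exports the totals. */
          |   static void dump_counters(void)
          |   {
          |       printf("rx_overruns %lu\ntx_retries %lu\n",
          |              atomic_load_explicit(&rx_overrun_events,
          |                                   memory_order_relaxed),
          |              atomic_load_explicit(&tx_retry_events,
          |                                   memory_order_relaxed));
          |   }
          | 
          |   int main(void)
          |   {
          |       count_rx_overrun();
          |       dump_counters();
          |       return 0;
          |   }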
        
           | taeric wrote:
           | This seems to imply you would have more counters than
           | messages? Why would that be?
           | 
            | That is, I would expect moving to counters to be less
            | information, period. Is that not the case?
        
             | nraynaud wrote:
              | My guess is that each counter would need to have a
              | discovery point, a regular update mechanism, and
              | documentation, while you can send obscure messages
              | willy-nilly in the log? And they also become an
              | Application Interface with a life cycle, while
              | (hopefully) not too many people will go parse the log
              | as an API.
        
               | taeric wrote:
               | I think that makes sense, though I would still expect
               | counters to be more dense than logs. I'm definitely
               | interested in any case studies on this.
        
         | ajross wrote:
         | It's just hard, and there's no single answer.
         | 
          | In Zephyr, we have a synchronous printk() too, as for low-
          | level debugging and platform bringup that's usually
          | _desirable_ (i.e. I'd like to see the dump from just before
          | the panic, please!).
         | 
          | For production logging use, though, there is a fancier log
          | system[1] designed around latency boundaries that
          | essentially logs a minimally processed stream to a buffer
          | that then gets flushed from a low-priority thread. And this
          | works, and avoids the kinds of problems detailed in the
          | linked article. But it's fiddly to configure, expensive in
          | an RTOS environment (you need RAM for that thread stack and
          | the buffer), depends on having an I/O backend that is
          | itself async/low-latency, and has the mentioned misfeature
          | that when things blow up, it's usually failed to flush the
          | information you need out of its buffer.
         | 
          | [1] Somewhat but not completely orthogonal to printk. Both
          | can be implemented in terms of each other, mostly.
          | Sometimes.
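          | 
          | For a feel of the shape only (this is not the Zephyr
          | implementation, and a real logger would avoid a plain
          | mutex on the hot path), a sketch assuming POSIX threads
          | and a made-up 64-slot ring:
          | 
          |   #include <pthread.h>
          |   #include <stdio.h>
          |   #include <string.h>
          |   #include <unistd.h>
          | 
          |   #define SLOTS 64            /* power of two */
          |   #define MSG_LEN 96
          | 
          |   static char ring[SLOTS][MSG_LEN];
          |   static unsigned head, tail; /* head: write, tail: read */
          |   static unsigned dropped;    /* could be reported later */
          |   static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
          | 
          |   /* Hot path: copy into the ring or drop, never do I/O. */
          |   static void log_deferred(const char *msg)
          |   {
          |       pthread_mutex_lock(&lock);
          |       if (head - tail == SLOTS)
          |           dropped++;
          |       else
          |           strncpy(ring[head++ % SLOTS], msg, MSG_LEN - 1);
          |       pthread_mutex_unlock(&lock);
          |   }
          | 
          |   /* Low-priority thread: drain and do the slow I/O. */
          |   static void *flusher(void *arg)
          |   {
          |       char out[MSG_LEN];
          |       (void)arg;
          |       for (;;) {
          |           int have = 0;
          |           pthread_mutex_lock(&lock);
          |           if (tail != head) {
          |               memcpy(out, ring[tail++ % SLOTS], MSG_LEN);
          |               have = 1;
          |           }
          |           pthread_mutex_unlock(&lock);
          |           if (have)
          |               fprintf(stderr, "%s\n", out);
          |           else
          |               usleep(10000);  /* nothing pending */
          |       }
          |       return NULL;
          |   }
          | 
          |   int main(void)
          |   {
          |       pthread_t t;
          |       pthread_create(&t, NULL, flusher, NULL);
          |       log_deferred("bringup: device probed");
          |       sleep(1);               /* let the flusher run */
          |       return 0;
          |   }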
        
           | vlovich123 wrote:
           | What if the lower priority thread is starved and the buffer
           | is full? Do you start dropping messages? Or overwrite the
           | oldest ones and skip messages?
        
             | ajross wrote:
             | It drops messages. That's almost always the desired
             | behavior: you never want your logging system to be doing
             | work when the system is productively tasked with other
             | things.
             | 
             | I know there was some level of argument about whether it's
             | best to overwrite older content (ring-buffer-style,
             | probably keeps the most important stuff) or drop messages
             | at input time (faster, probably fewer messages dropped
             | overall). But logging isn't my area of expertise and I
             | forget the details.
             | 
             | But again, the general point being that this is a
             | complicated problem with tradeoffs, where most developers
             | up the stack tend to think of it as a fixed facility that
             | shouldn't ever fail or require developer bandwidth. And
             | it's not, it's hard.
        
         | xenadu02 wrote:
         | In many problem spaces you can optimize for the common success
         | and failure paths if you accept certain losses on long-tail
         | failure scenarios.
         | 
         | A common logging strategy is to use a ring buffer with a
         | separate isolated process reading from the ring. The vast
         | majority of the time the ring buffer handles temporary
         | disruptions (eg slow disk I/O to write messages to disk) but in
         | the rare failure scenarios you simply overwrite events in the
         | buffer and increment an atomic overwritten event counter.
         | Events do not get silently dropped but you prioritize forward
         | progress at the cost of data loss in rare scenarios.
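          | 
          | The writer side of that policy can be sketched like this
          | (illustrative only: a single writer, with the separate
          | reader process and the shared-memory plumbing omitted):
          | 
          |   #include <stdatomic.h>
          |   #include <stdio.h>
          |   #include <string.h>
          | 
          |   #define SLOTS 8      /* tiny, to show overwrites */
          |   #define MSG_LEN 64
          | 
          |   static char ring[SLOTS][MSG_LEN];
          |   static unsigned long head, tail;  /* reader owns tail */
          |   static _Atomic unsigned long overwritten;
          | 
          |   /* Never blocks: when full, overwrite the oldest entry
          |    * and count the loss instead of hiding it.            */
          |   static void ring_put(const char *msg)
          |   {
          |       if (head - tail == SLOTS) {
          |           tail++;
          |           atomic_fetch_add_explicit(&overwritten, 1,
          |                                     memory_order_relaxed);
          |       }
          |       strncpy(ring[head++ % SLOTS], msg, MSG_LEN - 1);
          |   }
          | 
          |   int main(void)
          |   {
          |       char msg[MSG_LEN];
          |       for (int i = 0; i < 20; i++) {
          |           snprintf(msg, sizeof msg, "event %d", i);
          |           ring_put(msg);
          |       }
          |       /* 20 events into 8 slots: prints 12 */
          |       printf("overwritten: %lu\n",
          |              atomic_load_explicit(&overwritten,
          |                                   memory_order_relaxed));
          |       return 0;
          |   }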
         | 
         | Microkernels and pushing everything to userspace just moves the
         | tradeoffs around. If your driver is in userspace and blocks
         | writing a log message because the log daemon is blocked or the
         | I/O device it is writing the log to is overloaded it does the
         | same thing. Your realtime thread won't get what it needs from
         | the driver within your time limit.
         | 
         | It all comes down to CAP theorem stuff. If you always want the
         | kernel (or any other software) to be able to make forward
         | progress within specific time limits then you must be willing
         | to tolerate some data loss in failure scenarios. How much and
         | how often it happens depends on specific design factors, memory
         | usage, etc.
        
       | TeeMassive wrote:
        | It's kind of crazy that a feature required 20 years of active
        | development to be called somewhat complete.
        | 
        | I hope it will be ready soon. I'm working on a project that
        | has strict serial communication requirements, and it has
        | caused us a lot of headaches.
        
         | eisbaw wrote:
         | Zephyr RTOS.
        
         | worthless-trash wrote:
          | Can you expand on this? I'm a little naive in this area:
          | say you isolated the CPUs (isolcpus parameter) and then
          | used taskset to pin your task onto the isolated CPU; would
          | the scheduler no longer be involved, and would your task be
          | the only thing serviced by that CPU?
          | 
          | Is it other interrupts on the CPU that break your process
          | out of the "real time" requirement? I find this all so
          | interesting.
        
           | TeeMassive wrote:
           | It's an embedded system with two logical cores with at least
           | 4 other critical processes running. Doing that will only
           | displace the problem.
        
             | worthless-trash wrote:
              | I (incorrectly) assumed that serial port control was
              | the highly time-sensitive problem that was being dealt
              | with here.
        
       | loeg wrote:
        | Synchronous logging strikes again! We ran into this a bit at
        | work with GLOG (Google's logging library), which can, e.g.,
        | block on disk IO if stdout is a file or whatever. GLOG was
        | the culprit in something like 90-99% of the cases where our
        | service stalled for over 100ms.
        
         | cduzz wrote:
         | I have discussions with cow-orkers around logging;
         | 
         | "We have Best-Effort and Guaranteed-Delivery APIs"
         | 
         | "I want Guaranteed Delivery!!!"
         | 
         | "If the GD logging interface is offline or slow, you'll take
         | downtime; is that okay?"
         | 
         | "NO NO Must not take downtime!"
         | 
         | "If you need it logged, and can't log it, what do you do?"
         | 
         | These days I just point to the CAP theorem and suggest that
         | logging is the same as any other distributed system. Because
         | there's a wikipedia article with a triangle and the word
         | "theorem" people seem to accept that.
         | 
         | [edit: added "GD" to clarify that I was referring to the
         | guaranteed delivery logging api, not the best effort logging
         | API]
        
           | loeg wrote:
           | I read GD as "god damn," which also seems to fit.
        
             | rezonant wrote:
             | aw you beat me to it
        
           | msm_ wrote:
            | Interesting, I'd think logging is one of the clearest
            | situations where you want best effort. Logging is, almost
            | by definition, not the "core" of your application, so
            | failure to log properly should not prevent the core of
            | the program from working. Killing the whole program
            | because the logging server is down is clearly throwing
            | the baby out with the bathwater.
            | 
            | What people probably mean is "logging is important, let's
            | avoid losing log messages if possible", which is what
            | "best" in "best effort" stands for. For example, it's
            | often a good idea to have a local log queue, to avoid
            | data loss in case of temporary log server downtime.
        
             | cduzz wrote:
             | People use logging (appropriately or inappropriately; not
             | my bucket of monkeys) for a variety of things including
             | audit and billing records, which are likely a good case for
             | a guaranteed delivery API.
             | 
             | People often don't think precisely about what they say or
             | want, and also often don't think through corner cases such
             | as "what if XYZ breaks or gets slow?"
             | 
              | And don't get me started on "log" messages that are
              | 300MB events. Per log. Sigh.
        
             | insanitybit wrote:
             | If you lose logs when your service crashes you're losing
             | logs at the time they are most important.
        
               | tux1968 wrote:
               | That's unavoidable if the logging service is down when
               | your server crashes.
               | 
               | Having a local queue doesn't mean logging to the service
               | is delayed, it can be sent immediately. All the local
               | queue does is give you some resiliency, by being able to
               | retry if the first logging attempt fails.
        
               | insanitybit wrote:
               | If your logging service is down all bets are off. But by
               | buffering logs you're now accepting that problems _not_
               | related to the logging service will also cause you to
               | drop logs - as I mentioned, your service crashing, or
               | being OOM 'd, would be one example.
        
               | tux1968 wrote:
               | What's more likely? An intermittent network issue, the
               | logging service being momentarily down, or a local crash
               | that only affects your buffering queue?
               | 
               | If an OOM happens, all bets are off anyway, since it has
               | as much likelihood of taking out your application as it
               | does your buffering code. The local buffering code might
               | very well be part of the application in the first place,
               | so the fate of the buffering code is the same as the
               | application anyway.
               | 
               | It seems you're trying very hard to contrive a situation
               | where doing nothing is better than taking reasonable
               | steps to counter occasional network hiccups.
        
               | insanitybit wrote:
               | > It seems you're trying very hard to contrive a
               | situation where doing nothing is better than taking
               | reasonable steps to counter occasional network hiccups.
               | 
               | I think you've completely misunderstood me then. I
               | haven't taken a stance at all on what should be done. I'm
               | only trying to agree with the grandparent poster about
               | logging ultimately reflecting CAP Theorem.
        
               | andreasmetsala wrote:
               | No, you're losing client logs when your logging service
               | crashes. Your logging service should probably not be
               | logging through calls to itself.
        
               | tremon wrote:
               | But if your service has downtime because the logs could
               | not be written, that seems strictly inferior. As someone
               | else wrote upthread, you only want guaranteed delivery
               | for logs if they're required under a strict audit regime
               | and the cost of noncompliance is higher than the cost of
               | a service outage.
        
               | insanitybit wrote:
               | FWIW I agree, I'm just trying to be clear that you are
               | choosing one or the other, as the grandparent was
               | stating.
        
             | linuxdude314 wrote:
             | It's not the core of the application, but it can be the
             | core of the business.
             | 
              | For companies that sell API access, logs in one form or
              | another are how bills are reconciled and usage is
              | metered.
        
             | wolverine876 wrote:
             | Logging can be essential to security (to auditing). It's
             | your record of what happened. If an attacker can cause
             | logging to fail, they can cover their tracks more easily.
        
               | deathanatos wrote:
               | To me audit logs aren't "logs" (in the normal sense),
               | despite the name. They tend to have different
               | requirements; e.g., in my industry, they must be
               | retained, by law, and for far longer than our normal
               | logs.
               | 
               | To me, those different requirements imply that they
               | _should_ be treated differently by the code, probably
               | even under distinct flows: synchronously, and ideally to
               | somewhere that I can later compress like hell and store
               | in some very cheap long term storage.
               | 
               | Whereas the debug logs that I use for debugging? Rotate
               | out after 30 to 90d, ... and yeah, best effort is fine.
               | 
               | (The audit logs might also end up in one's normal logs
               | too, for convenience.)
        
               | wolverine876 wrote:
               | While I generally agree, I'll add that the debug logs can
               | be useful in security incidents.
        
             | fnordpiglet wrote:
             | It depends.
             | 
              | In some systems the logs are journaled records for the
              | business or are discoverable artifacts for compliance.
              | In highly secure environments logs are not only durable
              | but measures are taken to fingerprint them and their
              | ordering (like ratchet hashing) to ensure integrity is
              | invariant.
              | 
              | I would note that using disk-based logging is generally
              | harmful in these situations IMO. Network-based logging
              | is less likely to cause blocking at some OS level or
              | other sorts of jitter that's harder to mask. Typically
              | I develop logging as an in-memory thing that offloads
              | to a remote service over the network. The durability of
              | the memory store can be an issue in highly sensitive
              | workloads, and you'll want to do synchronous disk IO in
              | that case to ensure durability and consistent time
              | budgets, but for almost all applications diskless
              | logging is preferable.
        
               | shawnz wrote:
               | If you're not waiting for the remote log server to write
               | the messages to its disk before proceeding, then it seems
               | like that's not guaranteed to me? And if you are, then
               | you suffer all the problems of local disk logging but
               | also all the extra failure modes introduced by the
               | network, too
        
               | fnordpiglet wrote:
                | The difference is that network IO can be more easily
                | masked by the operating system than block device IO.
                | When you offload your logging to another thread the
                | story isn't over, because your disk logging can
                | interfere at a system level. Network IO isn't as
                | noisy. If durability is important you might still
                | need to wait for an ACK before freeing the buffer for
                | the message, which might lead to more overall memory
                | use, but all the operations play nicely in a
                | preemptible scheduling system.
                | 
                | Also, the failure modes of _systems_ are very tied to
                | durable storage devices attached to the system and
                | very rarely to network devices. By reducing the
                | number of things that need a disk (ideally to zero)
                | you can remove disks from the system and its
                | availability story. Once you get to fully diskless
                | systems, the system failure modes are actually almost
                | nothing. But even with disks attached, reducing the
                | times you interact with the disk (especially for
                | chatty things like logs!) reduces the likelihood that
                | the entire system fails due to a disk issue.
        
               | lmm wrote:
               | > If you're not waiting for the remote log server to
               | write the messages to its disk before proceeding, then it
               | seems like that's not guaranteed to me?
               | 
               | Depends on your failure model. I'd consider e.g.
               | "received in memory by at least 3/5 remote servers in
               | separate datacenters" to be safer than "committed to
               | local disk".
        
               | cduzz wrote:
               | You're still on one side or another of the CAP triangle.
               | 
               | In a network partition, you are either offline or your
               | data is not consistent.
               | 
               | If you're writing local to your system, you're losing
               | data if there's a single device failure.
               | 
               | https://en.wikipedia.org/wiki/CAP_theorem
        
               | fnordpiglet wrote:
               | For logs, which are immutable time series journals, any
               | copy is entirely sufficient. The first write is a quorum.
               | Also from a systems POV reads are not a feature of logs.
        
               | lmm wrote:
               | CAP is irrelevant, consistency does not matter for logs.
        
               | cduzz wrote:
               | Consistency is a synonym for "guaranteed", and means
               | "written to 2 remote, reliable, append-only storage
               | endpoints" (for any reasonable definition of reliability)
               | 
               | So -- a single system collecting a log event -- it is not
               | reliable (guaranteed) if written just to some device on
               | that system. Instances can be de-provisioned (and logs
               | lost), filesystems or databases can be scrambled, badguys
               | can encrypt your data, etc.
               | 
               | In this context, a "network partition" prevents
               | consistency (data not written to reliable media) or
               | prevents availability (won't accept new requests until
               | their activity can be logged reliably).
               | 
               | If you define "reliably" differently, you may have a
               | different interpretation of log consistency.
        
               | fnordpiglet wrote:
                | I'm not sure I understand the way you're using the
                | vocabulary. Consistency is a read-operation concept,
                | not a write one. There are no online reads for logs.
                | 
                | Availability is achieved if at least one writer
                | acknowledges a write. In a partition, the problem is
                | multiple parts of the system disagreeing about the
                | write contents because of the network split. But
                | because logs are immutable and write-only, this
                | doesn't happen in any situation. The only situation
                | where it might occur is if you're maintaining a
                | distributed ratchet with in-delivery-order semantics
                | rather than eventually consistent temporal semantics,
                | in which case you will never have CAP. But that's an
                | insanely rare edge case.
                | 
                | Note CAP doesn't ensure perfect durability. I feel
                | like you're confusing consistency with durability.
                | Consistency means that after I've durably written
                | something, all nodes agree on read that it's been
                | written. Since logs don't support reads on the online
                | data plane, this is trivially not an issue. Any write
                | acknowledgment is sufficient.
        
               | lmm wrote:
               | > Consistency is a synonym for "guaranteed", and means
               | "written to 2 remote, reliable, append-only storage
               | endpoints" (for any reasonable definition of reliability)
               | 
               | No it doesn't. Read your own wiki link.
               | 
               | > In this context, a "network partition" prevents
               | consistency (data not written to reliable media) or
               | prevents availability (won't accept new requests until
               | their activity can be logged reliably).
               | 
               | A network partition doesn't matter for a log system
               | because there is no way to have consistency issues with
               | logs. Even a single partitioned-off instance can accept
               | writes without causing any problem.
               | 
               | Of course if you cannot connect to any instance of your
               | log service then you cannot write logs. But that's got
               | nothing to do with the CAP theorem.
        
               | ReactiveJelly wrote:
               | If it's a journaled record for the business then I think
               | I'd write it to SQLite or something with good
               | transactions and not mix it in the debug logs
        
               | fnordpiglet wrote:
               | There are more logs than debug logs, and using SQLite as
               | the encoding store for your logs doesn't make it not
               | logging.
        
           | supriyo-biswas wrote:
           | The better way to do this is to write the logs to a file or
           | an in-memory ring buffer and have a separate thread/process
           | push logs from the file/ring-buffer to the logging service,
           | allowing for retries if the logging service is down or slow
           | (for moderately short values of down/slow).
           | 
           | Promtail[1] can do this if you're using Loki for logging.
           | 
           | [1] https://grafana.com/docs/loki/latest/send-data/promtail/
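            | 
            | The push side of that pattern is roughly the loop below
            | (a sketch; send_batch() is a made-up stand-in for
            | shipping buffered lines to whatever service you use, and
            | Promtail's real behaviour is more elaborate):
            | 
            |   #include <stdio.h>
            |   #include <unistd.h>
            | 
            |   /* Stand-in: ship one batch of buffered log lines.
            |    * Returns 0 on success, -1 if the service is down. */
            |   static int send_batch(void)
            |   {
            |       return 0;
            |   }
            | 
            |   int main(void)
            |   {
            |       unsigned delay = 1;             /* seconds */
            |       for (int i = 0; i < 10; i++) {  /* bounded demo */
            |           if (send_batch() == 0) {
            |               delay = 1;              /* reset backoff */
            |           } else {
            |               sleep(delay);           /* retry later */
            |               if (delay < 64)
            |                   delay *= 2;         /* capped backoff */
            |           }
            |       }
            |       puts("drained");
            |       return 0;
            |   }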
        
             | insanitybit wrote:
             | But that's still not guaranteed delivery. You're doing what
             | the OP presented - choosing to drop logs under some
             | circumstances when the system is down.
             | 
             | a) If your service crashes and it's in-memory, you lose
             | logs
             | 
             | b) If your service can't push logs off (upstream service is
             | down or slow) you either drop logs, run out of memory, or
             | block
        
               | hgfghui7 wrote:
                | You are thinking too much in terms of the stated
                | requirements instead of what people actually want:
                | good uptime and good debuggability. Falling back to
                | local logging means a blip in logging availability
                | doesn't turn into an all-hands-on-deck, everything-
                | is-on-fire situation. And it means that logs will
                | very likely be available for any failures.
                | 
                | In other words, it's good enough.
        
               | mort96 wrote:
               | "Good uptime and good reliability but no guarantees" is
               | just a good best effort system.
        
               | insanitybit wrote:
               | Good enough is literally "best effort delivery", you're
               | just agreeing with them that this is ultimately a
               | distributed systems problem and you either choose CP or
               | AP.
        
               | kbenson wrote:
               | Yeah, what the "best effort" actually means in practice
               | is usually a result of how much resources you want to
               | throw at the problem. Those give you runway on how much
               | of a problem you can withstand and perhaps recover from
               | without any loss of data (logs), but in the end you're
               | usually still just buying time. That's usually enough
               | though.
        
               | o11c wrote:
               | Logging to `mmap`ed files is resilient to service
               | crashes, just not hardware crashes.
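                | 
                | Roughly like this (a sketch, error handling
                | trimmed): the bytes land in the page cache, so the
                | kernel still writes them out even if the process
                | dies right after the memcpy, though not across a
                | power loss unless msync() is used.
                | 
                |   #include <fcntl.h>
                |   #include <string.h>
                |   #include <sys/mman.h>
                |   #include <unistd.h>
                | 
                |   #define LOG_SIZE (1 << 20)
                | 
                |   int main(void)
                |   {
                |       int fd = open("app.log",
                |                     O_RDWR | O_CREAT, 0644);
                |       if (fd < 0 || ftruncate(fd, LOG_SIZE) < 0)
                |           return 1;
                | 
                |       char *log = mmap(NULL, LOG_SIZE,
                |                        PROT_READ | PROT_WRITE,
                |                        MAP_SHARED, fd, 0);
                |       if (log == MAP_FAILED)
                |           return 1;
                | 
                |       /* A write is just a memcpy into the mapping
                |        * (a real logger tracks an append offset). */
                |       const char *msg = "service started\n";
                |       memcpy(log, msg, strlen(msg));
                | 
                |       munmap(log, LOG_SIZE);
                |       close(fd);
                |       return 0;
                |   }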
        
             | sroussey wrote:
              | We did something like this at Weebly for stats. The app
              | sent the stats to a local service via UDP, fire and
              | forget. That service aggregated for 1s and then sent
              | them off-server.
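              | 
              | On the sending side that's basically the below (a
              | sketch; the port and the payload format are made up).
              | A dead or slow collector just means lost datagrams;
              | the sender never waits.
              | 
              |   #include <arpa/inet.h>
              |   #include <netinet/in.h>
              |   #include <string.h>
              |   #include <sys/socket.h>
              |   #include <unistd.h>
              | 
              |   int main(void)
              |   {
              |       int fd = socket(AF_INET, SOCK_DGRAM, 0);
              |       if (fd < 0)
              |           return 1;
              | 
              |       struct sockaddr_in dst = { 0 };
              |       dst.sin_family = AF_INET;
              |       dst.sin_port = htons(8125);  /* local daemon */
              |       inet_pton(AF_INET, "127.0.0.1", &dst.sin_addr);
              | 
              |       /* No connection, no ack: send and move on. */
              |       const char *stat = "page_views:1";
              |       sendto(fd, stat, strlen(stat), 0,
              |              (struct sockaddr *)&dst, sizeof dst);
              | 
              |       close(fd);
              |       return 0;
              |   }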
        
               | laurencerowe wrote:
               | Why UDP for a local service rather than a unix socket?
        
               | sroussey wrote:
               | Send and forget. Did not want to wait on an ack from a
               | broken process.
        
           | rezonant wrote:
           | > "If the GD logging interface is offline or slow, you'll
           | take downtime; is that okay?"
           | 
           | > [edit: added "GD" to clarify that I was referring to the
           | guaranteed delivery logging api, not the best effort logging
           | API]
           | 
           | i read GD as god-damned :-)
        
             | salamanderman wrote:
             | me too [EDIT: and I totally empathized]
        
           | Zondartul wrote:
            | I have some wishful thinking ideas on this, but it should
            | be possible to have both, at least in an imaginary,
            | theoretical scenario.
            | 
            | You can have both guaranteed delivery and no downtime if
            | your whole system is so deterministic that anything that
            | would normally result in blocking just will not, cannot
            | happen. In other words, it should be a hard real-time
            | system that is formally verified top to bottom, down to
            | the last transistor. Does anyone actually do that? Verify
            | the program and the hardware to prove that it will never
            | run out of memory for logs and such?
            | 
            | Continuing this thought, logs are probably generated
            | endlessly, so either whoever wants them has to also
            | guarantee that they are processed and disposed of right
            | after being logged... or there is a finite amount of log
            | messages that can be stored (arbitrary number like
            | 10 000), but the user (of logs) has to guarantee that
            | they will take the "mail" out of the box sooner than it
            | overfills (at some predictable, deterministic rate). So
            | really that means even if OUR system is mathematically
            | perfect, we're just making the downtime someone else's
            | problem - namely, the consumer of the infinite logs.
            | 
            | That, or we guarantee that the finite resources of our
            | self-contained, verified system will last longer than the
            | finite shelf life of the system as a whole (like maybe 5
            | years, for another arbitrary number).
        
             | morelisp wrote:
             | PACELC says you get blocking or unavailability or
             | inconsistency.
        
             | ElectricalUnion wrote:
             | From a hardware point of view, this system is unlikely to
             | exist, because you need a system with components that never
             | have any reliability issues ever to have a totally
             | deterministic system.
             | 
              | From a software point of view, this system is unlikely
              | to exist, as it doesn't matter that the cause of your
              | downtime is "something else that isn't our system". As
              | a result, you're gonna end up requiring infinite
              | reliable storage to keep your promises.
        
         | tuetuopay wrote:
          | We had prod halt once when the syslog server hung. Logs
          | were pushed over TCP, which propagated the blocking to the
          | whole of prod. We've switched to UDP transport since then:
          | better to lose some logs than the whole of prod.
        
           | tetha wrote:
           | Especially if some system is unhappy enough to log enough
           | volume to blow up the local log disk... you'll usually have
           | enough messages and clues in the bazillion other messages
           | that have been logged.
        
           | deathanatos wrote:
           | TCP vs. UDP and async best-effort vs. synchronous are
           | _completely_ orthogonal...
           | 
            | E.g., a service I wrote sent logs to an ELK setup; we
            | logged over TCP. But the logging was async: we didn't
            | wait for logs to make it to ELK, and if the logging
            | services went down, we just queued up logs locally. (To a
            | point; at some point the buffer fills up, and logs were
            | discarded. The process would make a note of this locally
            | if it happened.)
        
             | tuetuopay wrote:
             | > TCP vs. UDP and async best-effort vs. synchronous are
             | completely orthogonal...
             | 
              | I agree, when stuff is properly written. I don't
              | remember the exact details, but at least with UDP the
              | asyncness is built in: there is no backpressure
              | whatsoever. So poorly written software can just send
              | UDP to its heart's content.
        
         | lopkeny12ko wrote:
         | I would posit that if your product's availability hinges on +/-
         | 100ms, you are doing something deeply wrong, and it's not your
         | logging library's fault. Users are not going to care if a
         | button press takes 100 more ms to complete.
        
           | fnordpiglet wrote:
            | 100ms for something like, say, API authorization on a
            | high-volume data plane service would be unacceptable.
            | Exceeding latencies like that can degrade bandwidth and
            | cause workers to exhaust connection counts. Likewise,
            | even in human response space, 100ms is an enormous part
            | of a budget for responsiveness. Taking authorization
            | again: if you spend 100ms, you're exhausting the
            | perceptible threshold for a human's sense of
            | responsiveness to do something that's of no practical
            | value but is entirely necessary. Your UI developers will
            | be literally camped outside your zoom room with virtual
            | pitchforks night and day.
        
             | loeg wrote:
             | Yes, and in fact the service I am talking about is a high
             | volume data plane service.
        
           | hamandcheese wrote:
           | Not every API is a simple CRUD app with a user at the other
           | end.
        
           | kccqzy wrote:
           | Add some fan out and 100ms could suddenly become 1s, 10s...
        
           | saagarjha wrote:
           | Core libraries at, say, Google, are supposed to be reliable
           | to several nines. If they go down for long enough for a human
           | to notice, they're failing SLA.
        
           | loeg wrote:
           | Our service is expected to respond to small reads at under
           | 1ms at the 99th percentile. >100ms stalls (which can go into
           | many seconds) are absolutely unacceptable.
        
         | oneepic wrote:
         | Oh, we had this type of issue ("logging lib breaks everything")
         | with a $MSFT logging library. Imagine having 100 threads each
         | with their own logging buffer of 300MB. Needless to say it
         | _annihilated_ our memory and our server crashed, even on the
          | most expensive SKU of Azure App Service.
        
           | pests wrote:
           | Brilliant strategy.
           | 
            | Reminds me a little of the old-timers' trick of adding a
            | sleep(1000) somewhere so they could come back later and
            | have some resources in reserve, or if they needed a quick
            | win with the client.
            | 
            | Now cloud companies are using malloc(300000000) to fake
            | resource usage. /s
        
         | RobertVDB wrote:
         | Ah, the classic GLOG-induced service stall - brings back
         | memories! I've seen similar scenarios where logging, meant to
         | be a safety net, turns into a trap. Your 90-99% figure
         | resonates with my experience. It's like opening a small window
         | for fresh air and having a storm barrel in. We eventually had
         | to balance between logging verbosity and system performance,
         | kind of like a tightrope walk over a sea of unpredictable IO
         | delays. Makes one appreciate the delicate art of designing
         | logging systems that don't end up hogging the spotlight (and
         | resources) themselves, doesn't it?
        
       | tyingq wrote:
       | I wonder if this being fixed will result in it displacing some
       | notable amount of made-for-realtime hardware/software combos.
       | Especially since there's now lots of cheap, relatively low power,
       | and high clock rate ARM and x86 chips to choose from. With the
       | clock rates so high, perfect real-time becomes less important as
       | you would often have many cycles to spare for misses.
       | 
       | I understand it's less elegant, efficient, etc. But sometimes
       | commodity wins over correctness.
        
         | foobarian wrote:
         | Ethernet nods in vigorous agreement
        
         | tuetuopay wrote:
          | The thing is, stuff that requires hard realtime cannot be
          | satisfied with "many cycles to spare for misses". And CPU
          | cycles are not the whole story. A badly made task could
          | lock down the kernel without doing anything useful. The
          | point of hard realtime is "nothing can prevent this
          | critical task from running".
          | 
          | For automotive and aerospace, you really want the control
          | systems to be able to run no matter what.
        
           | tyingq wrote:
           | Yes, there are parts of the space that can't be displaced
           | with this.
           | 
           | I'm unclear on why you put "many cycles to spare for misses"
           | in quotes, as if it's unimportant. If a linux/arm (or x86)
           | solution is displacing a much lower speed "real real time"
           | solution, that's the situation...the extra cycles mean you
           | can tolerate some misses while still being as granular as
           | what you're replacing. Not for every use case, but for many.
        
             | bee_rider wrote:
             | It is sort of funny that language has changed to the point
             | where quotes are assumed to be dismissive or sarcastic.
             | 
             | Maybe they used the quotes because they were quoting you,
             | haha.
        
               | tuetuopay wrote:
               | it's precisely why I quoted the text, to quote :)
        
             | archgoon wrote:
             | I'm pretty sure they were just putting it in quotes because
             | it was the expression you used, and they thus were
             | referencing it.
        
             | tuetuopay wrote:
              | You won't be saved from two tasks deadlocking by
              | cycles/second. _This_ is what hard realtime systems are
              | about. However, I do agree that not all systems have
              | real hard realtime requirements. But those usually can
              | handle a non-rt kernel.
             | 
             | As for the quotes, it was a direct citation, not a way to
             | dismiss what you said.
        
               | tremon wrote:
               | I don't think realtime anything has much to do with mutex
               | deadlocks, those are pretty much orthogonal concepts. In
               | fact, I would make a stronger claim: if your "realtime"
               | system can deadlock, it's either not really realtime or
               | it has a design flaw and should be sent back to the
               | drawing board. It's not like you can say "oh, we have a
               | realtime kernel now, so deadlocks are the kernel's
               | problem".
               | 
               | Actual realtime systems are about workload scheduling
               | that takes into account processing deadlines. Hard
               | realtime systems can make guarantees about processing
               | latencies, and can preemptively kill or skip tasks if the
                | result would arrive too late. But this is not
                | something that the Linux kernel can provide, because
                | it is a system property rather than a property of the
                | kernel alone: you can't
               | provide any hard guarantees if you have no time bounds
               | for your data processing workload. So any discussion
               | about -rt in the context of the Linux kernel will always
               | be about soft realtime only.
        
               | tuetuopay wrote:
               | much agreed. I used deadlocks as an extreme example
               | that's easy to reason about and straight to the point of
               | "something independent of cpu cycles". something more
               | realistic would be IO operations taking more time than
               | expected. you would not want this to be blocking
               | execution for hard rt tasks.
               | 
               | In the case of the kernel, it is indeed too large to be
               | considered hard realtime. Best case we can make it into a
               | firmer realtime than it currently is. But I would place
               | it nowhere near avionics flight calculators (like fly-by-
               | wire systems).
        
               | hamilyon2 wrote:
                | I had an introductory course on OS and learned about
                | hard real-time systems. I had the impression hard
                | real-time is about memory, deadlocks, livelocks,
                | starvation, and so on. And in general about how to
                | design a system that moves forward even in the
                | presence of serious bugs and unplanned-for
                | circumstances.
        
               | syntheweave wrote:
               | Bugs related to concurrency - which is where you get race
               | conditions and deadlocks - tend to pop up wherever
               | there's an implied sequence of dependencies to complete
               | the computation, and the sequence is determined
               | dynamically by an algorithm.
               | 
               | For example, if I have a video game where there's
               | collision against the walls, I can understand this as
               | potentially colliding against "multiple things
               | simultaneously", since I'm likely to describe the scene
               | as a composite of bounding boxes, polygons, etc.
               | 
               | But to get an answer for what to do in response when I
               | contact a wall, I have to come up with an algorithm that
               | tests all the relevant shapes or volumes.
               | 
               | The concurrency bug that appears when doing this in a
               | naive way is that I test one, give an answer to that,
               | then modify the answer when testing the others. That can
               | lead to losing information and "popping through" a wall.
               | And the direction in which I pop through depends on which
               | one is tested first.
               | 
               | The conventional gamedev solution to that is to define
               | down the solution set so that it no longer matters which
               | order I test the walls in: with axis aligned boxes, I can
               | say "move only the X axis first, then move only the Y
               | axis". Now there is a fixed order, and a built-in bias to
               | favor one or the other axis. But this is enough for the
               | gameplay of your average platforming game.
               | 
               | The generalization on that is to describe it as a
               | constraint optimization problem: there are some number of
               | potential solutions, and they can be ranked relative to
               | the "unimpeded movement" heuristic, which is usually
               | desirable when clipping around walls. That solution set
               | is then filtered down through the collision tests, and
               | the top ranked one becomes the answer for that timestep.
               | 
               | Problems of this nature come up with resource allocation,
               | scheduling, etc. Some kind of coordinating mechanism is
               | needed, and OS kernels tend to shoulder a lot of the
               | burden for this.
               | 
               | It's different from real-time in that real-time is a
               | specification of what kind of performance constraint you
               | are solving for, vs allowing any kind of performance
               | outcome that returns acceptable concurrent answers.
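                | 
                | The move-one-axis-at-a-time trick, as a sketch
                | (AABB overlap only, arbitrary numbers, and assuming
                | per-step movement small enough that nothing tunnels
                | through a wall in a single step):
                | 
                |   #include <stdbool.h>
                |   #include <stdio.h>
                | 
                |   typedef struct { float x, y, w, h; } Box;
                | 
                |   static bool overlaps(Box a, Box b)
                |   {
                |       return a.x < b.x + b.w && b.x < a.x + a.w &&
                |              a.y < b.y + b.h && b.y < a.y + a.h;
                |   }
                | 
                |   /* Resolve against the walls one axis at a time,
                |    * so the result no longer depends on which wall
                |    * happens to be tested first.                   */
                |   static void move(Box *p, float dx, float dy,
                |                    const Box *walls, int n)
                |   {
                |       p->x += dx;                  /* X axis first */
                |       for (int i = 0; i < n; i++)
                |           if (overlaps(*p, walls[i]))
                |               p->x = dx > 0
                |                   ? walls[i].x - p->w
                |                   : walls[i].x + walls[i].w;
                | 
                |       p->y += dy;                  /* then Y axis */
                |       for (int i = 0; i < n; i++)
                |           if (overlaps(*p, walls[i]))
                |               p->y = dy > 0
                |                   ? walls[i].y - p->h
                |                   : walls[i].y + walls[i].h;
                |   }
                | 
                |   int main(void)
                |   {
                |       Box player = { 0, 0, 1, 1 };
                |       Box walls[] = { { 2, 0, 1, 4 } };
                |       move(&player, 1.5f, 0.5f, walls, 1);
                |       /* stops at the wall in X, slides in Y:
                |        * prints 1.00 0.50                        */
                |       printf("%.2f %.2f\n", player.x, player.y);
                |       return 0;
                |   }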
        
             | nine_k wrote:
              | How much more expensive and power-hungry would an ARM
              | core be if it displaces a lower-specced core?
             | 
             | I bet there are hard-realtime (commercial) OSes running on
             | ARM, and the ability to use a lower-specced (cheaper,
             | simpler, consuming less power) core may be seen as an
             | advantage enough to pay for the OS license.
        
               | lmm wrote:
                | > How much more expensive and power-hungry would an
                | ARM core be if it displaces a lower-specced core?
               | 
               | The power issue is real, but it might well be the same
               | price or cheaper - a standard ARM that gets stamped out
               | by the million can cost less than a "simpler"
               | microcontroller with a smaller production run.
        
           | zmgsabst wrote:
           | What's an example of a system that requires hard real time
           | and couldn't cope with soft real time on a 3GHz system having
           | 1000 cycle misses costing 0.3us?
        
             | lelanthran wrote:
             | > What's an example of a system that requires hard real
             | time and couldn't cope with soft real time on a 3GHz system
             | having 1000 cycle misses costing 0.3us?
             | 
             | Any system that deadlocks.
        
             | LeifCarrotson wrote:
             | We've successfully used a Delta Tau real-time Linux motion
             | controller to run a 24 kHz laser galvo system. It's
             | ostensibly good for 25 microsecond loop rates, and pretty
             | intolerant of jitter (you could delay a measurement by a
             | full loop period if you're early). And the processor is a
             | fixed frequency Arm industrial deal that only runs at 1.2
             | GHz.
             | 
             | Perhaps even that's not an example of such a system, 0.3
             | microseconds is close to the allowable real-time budget,
             | and QC would probably not scrap a $20k part if you were off
             | by that much once.
             | 
             | But in practice, every time I've heard "soft real time"
             | suggested, the failure mode is not a sub-microsecond miss
             | but a 100 millisecond plus deadlock, where a hardware
             | watchdog would be needed to drop the whole system offline
             | and probably crash the tool (hopefully fusing at the tool
             | instead of destroying spindle bearings, axis ball screws,
             | or motors and gearboxes) and scrap the part.
        
               | zmgsabst wrote:
               | Thanks for the detailed reply!
               | 
                | I'm trying to understand where an rPi + small FPGA
                | hybrid board for $50 hits a roadblock on the task...
                | and it sounds like the OS/firmware doesn't suffice.
                | (Or a SoC, like a Zynq.)
                | 
                | E.g., if we could guarantee that the 1.5GHz core won't
                | "be off" by more than 1us in responding, and the FPGA
                | can manage IO directly to buffer out (some of) the
                | jitter, then the cost of many hobby systems with
                | "(still not quite) hard" real time requirements would
                | come down to something reasonable.
        
               | rmu09 wrote:
               | You can get pretty far nowadays with preempt rt and an
               | FPGA. Maybe you even can get near 1us max jitter. One
               | problem with the older RPis was unpredictable (to me)
               | behaviour of the hardware, i.e. "randomly" changing SPI
               | clocks, and limited bandwidth.
               | 
               | Hobby systems like a small CNC mill or lathe usually
               | don't need anything near 1us (or better) max jitter.
               | LinuxCNC (derived from NIST's Enhanced Machine
               | Controller, name changed due to legal threats) runs fine
               | on preempt-rt with control loops around 1kHz, with some
               | systems you can also run a "fast" thread with say 20kHz
               | and more to generate stepper motor signals, but that job
               | is best left for the FPGA or an additional uC IMHO.
        
             | krylon wrote:
             | I suspect a fair amount of hard real time applications are
             | not running on 3GHz CPUs. A 100MHz CPU (or lower) without
             | an MMU or FPU is probably more representative.
             | 
             | But it's not really so much about being fast, it's about
             | being able to _guarantee_ that your system can respond to
             | an event within a given amount of time _every time_. (At
              | least that is how a friend who works in embedded/real time
             | explained it to me.)
        
         | imtringued wrote:
         | Sure, but this won't magically remove the need for dedicated
         | cores. What will probably happen is that people will tell the
          | scheduler to exclusively put non-preemptible real time tasks on
         | one of the LITTLE cores.
        
         | binary132 wrote:
         | I get the sense that applications with true realtime
         | requirements generally have hard enough requirements that they
         | cannot allow even the remote possibility of failure. Think
         | avionics, medical devices, automotive, military applications.
         | 
         | If you really need realtime, then you _really need_ it and
         | "close enough" doesn't really exist.
         | 
         | This is just my perception as an outsider though.
        
           | calvinmorrison wrote:
           | If you really need realtime, and you really actually need it,
           | should you be using a system like Linux at all?
        
             | refulgentis wrote:
             | ...yes, after realtime support lands
        
               | lumb63 wrote:
               | A lot of realtime systems don't have sufficient resources
               | to run Linux. Their hardware is much less powerful than
               | Linux requires.
               | 
               | Even if a system can run (RT-)Linux, it doesn't mean it's
               | suitable for real-time. Hardware for real-time projects
               | needs much lower interrupt latency than a lot of hardware
               | provides. Preemption isn't the only thing necessary to
               | support real-time requirements.
        
               | skyde wrote:
                | What kind of hardware is considered to have "lower
                | interrupt latency"? Is there some kind of Arduino
                | board I could get that fits the lower interrupt
                | latency required for real-time but still supports
                | things like Bluetooth?
        
               | lumb63 wrote:
               | Take a look at the Cortex R series. The Cortex M series
               | still has lower interrupt latency than the A series, but
               | lower processing power. I imagine for something like
               | Bluetooth that an M is more than sufficient.
        
               | refulgentis wrote:
               | Sure but that was already mentioned before the comment I
               | was replying to. Standard hardware not being great for
               | realtime has nothing to do with hypothetical realtime
               | Linux.
        
               | rcxdude wrote:
               | realtime just means execution time is bounded. It doesn't
               | necessarily mean the latency is low. Though, in this
               | sense RT-linux should probably be mostly thought of as
               | low-latency linux, and the improvement in realtime
               | guarantees is mostly in reducing the amount of things
               | that can cause you to miss a deadline as opposed to
               | allowing you to guarantee any particular deadline, even a
               | long one.
        
             | tyingq wrote:
             | I'm guessing it's not that technical experts will be
             | choosing this path, but rather companies. Once it's "good
             | enough", and much easier to hire for, etc...you hire non-
              | experts because it works _most_ of the time. I'm not
             | saying it's good, just that it's a somewhat likely outcome.
             | And not for everything, just the places where they can get
             | away with it.
        
               | froh wrote:
                | nah. when functional safety enters the room (as it
                | does for hard real time) then engineers go to jail if
                | they sign off on something unsafe and people die
                | because of that. since the Challenger disaster there
                | is an awareness that not listening to engineers can
                | be expensive and cost lives.
        
             | synergy20 wrote:
              | no you don't, you use a true RTOS instead.
              | 
              | linux RT is at microsecond granularity but it still
              | cannot 100% guarantee it; anything cache-related (L2
              | cache, TLB misses) is hard for hard real time.
              | 
              | a dual kernel with xenomai could improve it, but it is
              | somehow not widely used, only in industrial controls I
              | think.
              | 
              | linux RT is great for audio, multimedia etc. as well,
              | where real-time is crucial but not a MUST.
        
               | froh wrote:
               | > anything in cache nature (L2 cache, TLB miss) are hard
               | for hard real time
               | 
               | yup that's why you'd pin the memory and the core for the
               | critical task. which, alas, will affect performance of
               | the other cores and all other tasks. and whoosh there
               | goes the BOM...
               | 
               | which again as we both probably are familiar with leads
               | to the SoC designs with a real time core microcontroller
               | and a HPC microprocessor on the same package. which leads
               | to the question how to architect the combined system of
               | real-time microcontroller and compute power but soft real
               | time microprocessor such that the overall system remains
               | sufficiently reliable...
               | 
               | oh joy and fun!
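                | 
                | concretely, the pinning looks roughly like this from
                | userspace on Linux (a sketch: the CPU number and
                | priority are arbitrary, error handling is minimal,
                | and it needs the right privileges, e.g.
                | CAP_SYS_NICE):
                | 
                |   #define _GNU_SOURCE
                |   #include <sched.h>
                |   #include <stdio.h>
                |   #include <sys/mman.h>
                | 
                |   int main(void)
                |   {
                |       /* Keep pages resident: no page faults in
                |        * the hot path.                            */
                |       if (mlockall(MCL_CURRENT | MCL_FUTURE) != 0)
                |           perror("mlockall");
                | 
                |       /* Pin to one (ideally isolated) core. */
                |       cpu_set_t set;
                |       CPU_ZERO(&set);
                |       CPU_SET(3, &set);      /* arbitrary CPU */
                |       if (sched_setaffinity(0, sizeof set, &set))
                |           perror("sched_setaffinity");
                | 
                |       /* Realtime FIFO scheduling class. */
                |       struct sched_param sp = { .sched_priority = 80 };
                |       if (sched_setscheduler(0, SCHED_FIFO, &sp))
                |           perror("sched_setscheduler");
                | 
                |       /* ... time-critical loop goes here ... */
                |       return 0;
                |   }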
        
               | synergy20 wrote:
                | that's indeed the trend, i.e. put a small RTOS core
                | alongside a normal CPU for non-real-time tasks. in
                | the past it was done on two boards: one an MCU, the
                | other a typical CPU; now it's on one, which is very
                | important where an RTOS is a must, e.g. robotics.
                | 
                | How the CPU and MCU communicate is a good question to
                | tackle; typically chip vendors provide some
                | solutions, I think OpenAMP is for this.
        
             | snickerbockers wrote:
             | Pretty sure most people who think they need a real-time
             | thread actually don't tbh.
        
             | rcxdude wrote:
             | really depends on your paranoia level and the consequences
             | for failure. soft to hard realtime is a bit of a spectrum
             | in terms of how hard of a failure missing a deadline
             | actually is, and therefore how much you try to verify that
             | you will actually meet that deadline.
        
             | eschneider wrote:
             | The beauty of multicore/multi-cpu systems is that you can
             | dedicate cores to running realtime OSs and leave the non-
              | hard realtime stuff to an embedded linux on its own core.
        
             | snvzz wrote:
             | This is why the distinction between soft and hard realtime
             | exists.
             | 
              | Linux-rt makes linux actually decent at soft realtime.
              | PREEMPT_RT usually results in measured peak latency for
              | realtime tasks (SCHED_RR/SCHED_FIFO) on the order of a
              | few hundred usec.
              | 
              | Standard Linux lets latency go to tens of milliseconds,
              | easily verifiable by running cyclictest from rt-tests
              | for a few hours while using the computer. Needless to
              | say, this is unacceptable for many use cases, including
              | pro audio, videoconferencing and even gaming.
             | 
             | In contrast, AmigaOS's exec.library had no trouble yielding
             | solid sub-millisecond behaviour in 1985, on a relatively
             | slow 7MHz 68000.
             | 
             | No amount of patching Linux can give you hard realtime, as
             | it is about hard guarantees, backed up by proofs built from
             | formal verification, which Linux is excluded from due to
             | its sheer size.
             | 
             | There's a few RTOSs that are formally verified, but I only
             | know one that provides process isolation via the usual
             | supervisor vs user CPU modes virtualization model: seL4.
        
           | cptaj wrote:
           | Unless it's just music
        
             | itishappy wrote:
             | It may not be safety critical, but remember that people can
             | and will purchase $14k power chords to (ostensibly) improve
             | the experience of listening to "just music".
             | 
             | https://www.audioadvice.com/audioquest-nrg-dragon-high-
             | curre...
        
               | cwillu wrote:
               | FWIW, a power chord is a _very_ different thing than a
               | power cord.
        
               | itishappy wrote:
               | LOL, what a typo! Good catch!
        
             | binary132 wrote:
             | what if your analog sampler ruins the only good take you
             | can get? What if it's recording a historically important
             | speech? Starting to get philosophical here...
        
             | duped wrote:
             | Unless that music is being played through a multi kW
             | amplifier into a stadium and an xrun causes damage to the
             | drivers and/or audience (although, they should have hearing
             | protection anyway).
        
               | beiller wrote:
               | Your talk of xrun is giving me anxiety. When I was
               | younger I dreamed of having a linux audio effects stack
               | with cheap hardware on stage and xruns brought my dreams
               | crashing down.
        
               | robocat wrote:
               | xrun definition:
               | https://unix.stackexchange.com/questions/199498/what-are-
               | xru...
               | 
               | (I didn't know the term, trying to be helpful if others
               | don't)
        
               | tinix wrote:
               | just a buffer under/overrun
        
               | snvzz wrote:
               | it's cross-run aka xrun because these buffers are
               | circular.
               | 
               | Depending on the implementation, it will either pause or
               | play the old sample where the new one should be but isn't
               | there yet.
        
               | spacechild1 wrote:
               | > and an xrun causes damage to the drivers and/or
               | audience
               | 
               | An xrun typically manifests itself as a (very short)
               | discontinuity or gap in the audio signal. It might sound
               | unpleasant, but there's nothing dangerous about it.
        
           | dripton wrote:
           | You can divide realtime applications into safety-critical and
           | non-safety-critical ones. For safety-critical apps, you're
           | totally right. For non-critical apps, if it's late and
           | therefore buggy once in a while, that sucks but nobody dies.
           | 
           | Examples of the latter include audio and video playback and
           | video games. Nobody wants pauses or glitches, but if you get
           | one once in a while, nobody dies. So people deliver these on
           | non-RT operating systems for cost reasons.
        
             | binary132 wrote:
             | This kind of makes the same point I made though -- apps
             | without hard realtime requirements aren't "really realtime"
             | applications
        
               | duped wrote:
               | The traditional language is "hard" vs "soft" realtime
        
               | binary132 wrote:
               | RTOS means hard realtime.
        
               | pluto_modadic wrote:
               | I sense that people will insist on their requirements
               | being hard unnecessarily... and will blame a bug on it
               | running on a near-realtime system when the code would be
               | faulty even on a true realtime one.
        
               | tremon wrote:
               | No -- soft realtime applications are things like video
               | conferencing, where you care mostly about low latency in
               | the audio/video stream but it's ok to drop the occasional
               | frame. These are still realtime requirements, different
               | from what your typical browser does (for example): who
               | cares if a webpage is rendered in 100ms or 2s? Hard
               | realtime is more like professional audio/video recording
               | where you want hard guarantees that each captured frame
               | is stored and processed within the allotted time.
        
               | atq2119 wrote:
               | > who cares if a webpage is rendered in 100ms or 2s?
               | 
               | Do you really stand by the statement of this rhetorical
               | question? Because if yes: this attitude is a big reason
               | for why web apps are so unpleasant to work with compared
               | to locally running applications. Depending on the
               | application, even 16ms vs 32ms can make a big difference.
        
               | tremon wrote:
               | Yes I do, because I don't think the attitude is the
               | reason, the choice of technology is the reason. If you
               | want to control for UI latency, you don't use a generic
               | kitchen-sink layout engine, you write a custom interface.
               | You can't eat your cake and have it too, even though most
               | web developers want to disagree.
        
             | lll-o-lll wrote:
             | > You can divide realtime applications into safety-critical
             | and non-safety-critical ones.
             | 
             | No. This is a common misconception. The distinction between
             | a hard realtime system and a soft realtime system is simply
             | whether missing a timing deadline leads to a) failure of
             | the system or b) degradation of the system (but the system
             | continues to operate). Safety is not part of it.
             | 
             | Interacting with the real physical world often imposes
             | "hard realtime" constraints (think signal processing).
             | Whether this has safety implications simply depends on the
             | application.
        
             | jancsika wrote:
             | Your division puts audio _performance_ applications in a
             | grey area.
             | 
             | On the one hand they aren't safety critical.
             | 
             | On the other, I can imagine someone getting chewed out or
             | even fired for a pause or a glitch in a professional
             | performance.
             | 
             | Probably the same with live commercial video compositing.
        
               | eschneider wrote:
               | Audio is definitely hard realtime. The slightest delays
               | are VERY noticeable.
        
               | jancsika wrote:
               | I mean, it should be.
               | 
               | But there are plenty of performers who apparently rely on
               | Linux boxes and gumption.
        
           | wongarsu wrote:
           | There is some amount of realtime in factory control where
           | infrequent misses will just increase your reject rate in QA.
        
           | abe_m wrote:
           | Having worked on a number of "real time" machine control
           | applications:
           | 
           | 1) There is always a possibility that something fails to run
           | by its due date. Planes crash sometimes. Cars won't start
           | sometimes. Factory machinery makes scrap parts sometimes. In
           | a great many applications, missing a real time deadline
           | results in degraded quality, not loss of life or regional
           | catastrophe. The care that must be taken to lower the
           | probability of failure needs to be in proportion to the
           | consequence of the failure. Airplanes have redundant systems
           | to reduce (but not eliminate) possibility of failure, while
           | cars and trucks generally don't.
           | 
           | 2) Even in properly working real time systems, there is a
           | tolerance window on execution time. As machines change modes
           | of operation, the amount of calculation effort to complete a
           | cycle changes. If the machine is in a warm up phase, it may
           | be doing minimal calculations, and the scan cycle is fast.
           | Later it may be doing a quality control function that needs
           | to do calculations on inputs from numerous sensors, and the
           | scan cycle slows down. So long as the scan cycle doesn't
           | exceed the limit for the process, the variation doesn't cause
           | problems.
        
             | mlsu wrote:
             | That is true, but generally not acceptable to a regulating
             | body for these critical applications. You would need to
             | design and implement a validation test to prove timing in
             | your system.
             | 
             | Much easier to just use an RTOS and save the expensive
             | testing.
        
               | vlovich123 wrote:
               | But you still need to implement the validation test to
               | prove that the RTOS meets these requirements...
        
               | mlsu wrote:
               | You do not, if you use an RTOS that is already certified
               | by the vendor. This saves not only a lot of time and
               | effort for verification and validation, but also a lot of
               | risk, since validation is unpredictable and extremely
               | expensive.
               | 
               | Therefore it'd be remarkable not to see a certified RTOS
               | in such industries and applications where that validation
               | is required, like aerospace or medical.
        
             | blt wrote:
             | How is your point 2) a response to any of the earlier
             | points? Hard realtime systems don't care about variation,
             | only the worst case. If your code does a single multiply-
             | add most of the time but calls `log` every now and then,
             | hard realtime requirement is perfectly satisfied if the
             | bound on the worst-case runtime of `log` is small enough.
        
               | abe_m wrote:
               | I suppose it isn't, but I bristle when I see someone
               | tossing around statements like "close enough doesn't
               | really exist". In my experience when statements like that
               | start up, there are people involved that don't understand
               | variation is a part of every real process. My point is
               | that if you're going to get into safety critical systems,
               | there is always going to be some amount of variation, and
               | there is always a "close enough", as there is never an
               | "exact" in real systems.
        
               | jancsika wrote:
               | The point is to care about the _worst case_ within that
               | variation.
               | 
               | Most software cares about the average case, or, in the
               | case of the Windows 10/11 start menu animation, the
               | average across all supported machines apparently going 20
               | years into the future.
        
           | moffkalast wrote:
           | I feel like at this point we have enough cores (or will soon,
           | anyway) that you could dedicate one entirely to one process
           | and have it run realtime.
        
             | KWxIUElW8Xt0tD9 wrote:
             | That's one way to run DPDK processes under LINUX -- you get
             | the whole processor for doing whatever network processing
             | you want to do -- no interruptions from anything.
        
           | ajross wrote:
           | > Think avionics, medical devices, automotive, military
           | applications.
           | 
           | FWIW by-device/by-transistor-count, the bulk of "hard
           | realtime systems" with millisecond-scale latency requirements
           | are just audio.
           | 
           | The sexy stuff is all real applications too. But mostly we
           | need this just so we don't hear pops and echos in our video
           | calls.
        
             | binary132 wrote:
             | Nobody thinks Teams is a realtime application
        
               | ajross wrote:
               | No[1], but the people writing the audio drivers and DSP
               | firmware absolutely do. Kernel preemption isn't a feature
               | for top-level apps.
               | 
               | [1] Actually even that's wrong: for sure there are teams
               | of people within MS (and Apple, and anyone else in this
               | space) measuring latency behavior at the top-level app
               | layer and doing tuning all the way through the stack. App
               | latency excursions can impact streams too, though ideally
               | you have some layer of insulation there.
        
           | lmm wrote:
           | Like many binary distinctions, when you zoom in on the
           | details hard-versus-soft realtime is really more of a
           | spectrum. There's "people will die if it's late". "The line
           | will have to stop for a day if it's late". "If it's late,
           | it'll wreck the part currently being made". Etc.
           | 
           | Even hard-realtime systems have a failure rate, in practice
           | if not in theory - even a formally verified system might
           | encounter a hardware bug. So it's always a case of tradeoffs
           | between failure rate and other factors (like cost). If
           | commodity operating systems can push their failure rate down
           | a few orders of magnitude, that moves the needle, at least
           | for some applications.
        
         | JohnFen wrote:
         | When I'm doing realtime applications using cheap, low-power,
         | high-clockrate ARM chips (I don't consider x86 chips for those
         | sorts of applications), I'm not using an operating system at
         | all. An OS interferes too much, even an RTOS. I don't see how
         | this changes anything.
         | 
         | But it all depends on what your application is. There are a lot
         | of applications that are "almost real-time" in need. For those,
         | this might be useful.
        
         | PaulDavisThe1st wrote:
         | CPU speed and clock rate have absolutely nothing to do with
         | realtime anything.
        
       | eisbaw wrote:
       | Great to hear. However, even if Linux the kernel is real-time,
       | the hardware likely won't be, due to caches and internal magic
       | CPU trickery.
       | 
       | Big complex hardware is a no-no for true real-time.
       | 
       | That's why AbsInt and other WCET tools mainly target simple
       | CPU architectures. 8051 will truly live forever.
       | 
       | btw, Zephyr RTOS.
        
         | wholesomepotato wrote:
         | Features of modern CPUs don't really prevent them from real
         | time usage, afaik. As long as something is bounded and can be
         | reasoned about, it can be used to build a real time system. You
         | can always assume no cache hits and the like, maximum load etc,
         | and as long as you can put a bound on the time it will take,
         | you're good to go.
        
           | synergy20 wrote:
           | mlock your memory, and testing with cache-miss and cache-
           | invalidation scenarios will help, as will using no heap for
           | memory allocation, but it's a bit hard
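           | 
           | A minimal sketch of that kind of setup (illustrative only,
           | sizes arbitrary): lock everything and touch the pages you
           | will need up front so the realtime path never faults.
           | 
           |     #include <stddef.h>
           |     #include <string.h>
           |     #include <sys/mman.h>
           |     
           |     #define POOL_SIZE   (1 << 20)   /* preallocated pool, no heap in the RT path */
           |     #define STACK_DEPTH (64 * 1024)
           |     
           |     static unsigned char pool[POOL_SIZE];
           |     
           |     static void prefault_stack(void)
           |     {
           |         volatile unsigned char dummy[STACK_DEPTH];
           |         for (size_t i = 0; i < sizeof dummy; i += 4096)
           |             dummy[i] = 0;            /* touch each stack page up front */
           |     }
           |     
           |     void rt_setup(void)
           |     {
           |         mlockall(MCL_CURRENT | MCL_FUTURE);  /* pin current and future mappings */
           |         memset(pool, 0, sizeof pool);        /* fault the pool in now, not later */
           |         prefault_stack();
           |     }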
        
             | jeffreygoesto wrote:
             | Right. But still possible.
             | 
             | https://www.etas.com/en/applications/etas-middleware-
             | solutio...
        
             | eschneider wrote:
             | Does anyone use paged memory in hard realtime systems?
        
           | SAI_Peregrinus wrote:
           | Exactly. "Real-time" is a misnomer, it should be called
           | "bounded-time". As long as the bound is deterministic, known
           | in advance, and guaranteed, it's "real-time". For it to be
           | useful it also must be under some application-specific
           | duration.
           | 
           | The bounds are usually in CPU cycles, so a faster CPU can
           | sometimes be used even if it takes more cycles. CPUs capable
           | of running Linux usually have higher latency (in cycles) than
           | microcontrollers, but as long as that can be kept under the
           | (wall clock) duration limits with bounded-time it's fine.
           | There will still be cases where the worst-case latency to
           | fetch from DRAM in an RT-Linux system will be higher than a
           | slower MCU fetching from internal SRAM, so RT-Linux won't
           | take over all these systems.
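           | 
           | To put rough, invented numbers on that: a 2,000-cycle worst
           | case on a 1 GHz application core is 2 usec, while a 200-cycle
           | worst case on a 48 MHz microcontroller is about 4.2 usec, so
           | whether the "faster" chip actually wins depends entirely on
           | where its worst case comes from (e.g. a DRAM miss vs SRAM).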
        
           | bloak wrote:
           | So the things that might prevent you are:
           | 
           | 1. Suppliers have not given you sufficient information for
           | you to be able to prove an upper bound on the time taken.
           | (That must happen a lot.)
           | 
           | 2. The system is so complicated that you are not totally
           | confident of the correctness of your proof of the upper
           | bound.
           | 
           | 3. The only upper bound that you can prove with reasonable
           | confidence is so amazingly bad that you'd be better off with
           | cheaper, simpler hardware.
           | 
           | 4. There really isn't a worst case. There might, for example,
           | be a situation equivalent to "roll the dice until you don't
           | get snake eyes". In networking, for example, sometimes after
           | a collision both parties try again after a random delay so
           | the situation is resolved eventually with probability one but
           | there's no actual upper bound. A complex CPU and memory
           | system might have something like that? Perhaps you'd be happy
           | with "the probability of this operation taking more than 2000
           | clock cycles is less than 10^-13" but perhaps not.
        
             | formerly_proven wrote:
             | You're probably thinking about bus arbiters in 4.), which
             | are generally fast but have no bounded settling time.
        
           | dooglius wrote:
           | System management mode is one example of a feature on modern
           | CPUs that prevents real-time usage https://wiki.linuxfoundati
           | on.org/realtime/documentation/howt...
        
         | nraynaud wrote:
         | I think it's really useful on 'big' MCU, like the raspberry pi.
         | There exists an entire real time spirit there, where you don't
         | really use the CPU to do any bit banging but everything is on
         | time as seen from the outside. You have timers that receive the
         | quadrature encoder inputs, and they just send an interrupt when
         | they wrap, the GPIO system can be plugged into the DMA, so you
         | can stream memory to the output pins without involving the
         | CPU (again, interrupts at mid-buffer and empty buffer). You can
         | stream to a DAC, or stream from an ADC to memory with the DMA. A
         | lot of that stuff bypasses the caches to get a predictable
         | latency.
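         | 
         | A minimal C sketch of that memory-to-GPIO-via-DMA pattern
         | (every register name and address below is a hypothetical
         | stand-in; real names and layouts are vendor-specific):
         | 
         |     #include <stdint.h>
         |     
         |     /* Hypothetical memory-mapped registers, for illustration only. */
         |     #define DMA_SRC   (*(volatile uint32_t *)0x40001000u) /* source address       */
         |     #define DMA_DST   (*(volatile uint32_t *)0x40001004u) /* destination address  */
         |     #define DMA_COUNT (*(volatile uint32_t *)0x40001008u) /* transfer count       */
         |     #define DMA_CTRL  (*(volatile uint32_t *)0x4000100Cu) /* enable, timer pacing */
         |     #define GPIO_ODR  0x40002014u                         /* GPIO output register */
         |     
         |     static uint32_t pattern[256];  /* waveform prepared once by the CPU */
         |     
         |     void start_streaming(void)
         |     {
         |         DMA_SRC   = (uint32_t)(uintptr_t)pattern;
         |         DMA_DST   = GPIO_ODR;   /* each transfer lands on the output pins */
         |         DMA_COUNT = 256;
         |         DMA_CTRL  = 1u;         /* circular mode, paced by a hardware timer;
         |                                    the CPU only sees half/full interrupts */
         |     }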
        
           | stefan_ wrote:
           | Nice idea but big chip design strikes again: on the latest
           | Raspberry Pi, GPIO pins are handled by the separate IO chip
           | connected over PCI Express. So now all your GPIO stuff needs
           | to traverse a shared serial bus (that is also doing bulk
           | stuff like say raw camera images).
           | 
           | And already on many bigger MCUs, GPIOs are just separate
           | blocks on a shared internal bus like AHB/APB that connects
           | together all the chip IP, causing unpredictable latencies.
        
         | 0xDEF wrote:
         | >Big complex hardware is a no-no for true real-time.
         | 
         | SpaceX uses x86 processors for their rockets. That small drone
         | copter NASA put on Mars uses "big-ish" ARM cores that can
         | probably run older versions of Android.
        
           | ska wrote:
           | Does everything run on those CPUs though? Hard realtime
           | control is often done on a much simpler MCU at the lowest
           | level, with oversight/planning from a higher level system....
        
             | zokier wrote:
             | In short, no. For Ingenuity (the Mars2020 helicopter) the
             | flight computer runs on a pair of hard-realtime Cortex-R5
             | MCUs paired with an FPGA. The non-realtime Snapdragon SoC
             | handles navigation/image processing duties.
             | 
             | https://news.ycombinator.com/item?id=26907669
        
               | ska wrote:
               | That's basically what I expected, thanks.
        
         | SubjectToChange wrote:
         | _Big complex hardware is a no-no for true real-time._
         | 
         | There are advanced real time cores like the Arm Cortex-R82. In
         | fact many real time systems are becoming quite powerful due to
         | the need to process and aggregate ever increasing amounts of
         | sensor data.
        
         | snvzz wrote:
         | >8051 will truly live forever.
         | 
         | 68000 is the true king of realtime.
        
       | Aaargh20318 wrote:
       | What does this mean for the common user? Is this something you
       | would only enable in very specific circumstances or can it also
       | bring a more responsive system to the general public?
        
         | stavros wrote:
         | As far as I can understand, this is for Linux becoming an
         | option when you need an RTOS, so for critical things like
         | aviation, medical devices, and other such systems. It doesn't
         | do anything for the common user.
        
           | SubjectToChange wrote:
           | The Linux kernel, real-time or not, is simply too large and
           | too complex to realistically certify for anything safety
           | critical.
        
           | ska wrote:
           | For the parts of such systems that you would need an RTOS for
           | this isn't really a likely replacement because the OS is way
           | too complex.
           | 
           | The sort of thing it could help with is servicing hardware
           | that _does_ run hard realtime. For example, you have an RTOS
           | doing direct control of a robot or medical device or
           | whatever, and you have a UI pendant or the like that a user
           | is interacting with. If linux on that pendant can make some
           | realtime latency guarantees, you may be able to simplify
           | communication between the two without risking dropping bits
           | on the floor.
           | 
           | Conversely, for the common user it could improve things like
           | audio/video streaming, in theory, but I haven't looked into
           | the details or how much trouble there is currently.
        
             | elcritch wrote:
             | It depends on the field. I know of one robot control
             | software company planning to switch to an RT Linux stack.
             | Their current one is a *BSD derived rtos that runs, kid you
             | not, alongside windows.
             | 
             | RT Linux might not pass on some certifications, but there's
             | likely many systems where it would be sufficient.
        
               | ska wrote:
               | With robots a lot depends on the scope of movement and
               | speed, and whether or not it interacts with
               | environment/people. For some applications the controller
               | is already dedicated hardware on the joint module anyway
               | with some sophistication, connected to a CAN (or
               | EtherCAT) bus or something like that - so no OS is in the
               | tightest loop - I could see the high level control
               | working on an RT linux or whatever if you wanted to, lots
               | of tradeoffs. Mainly though it's the same argument, you
               | probably don't want a complex OS involved in the lowest
               | level/finest time tick updates. Hell some of the encoders
               | are spewing enough data you probably end up with the
               | first thing it hits being an ASIC anyway, then a MCU
               | dealing with control updates/fusion etc., then a higher
               | level system for planning.
        
         | ravingraven wrote:
         | If by "common" user you mean the desktop user, not much. But
         | this is a huge deal for embedded devices like industrial
         | control and communication equipment, as their devs will be able
         | to use the latest mainline kernel if they need real-time
         | scheduling.
        
         | fbdab103 wrote:
         | My understanding is that real-time makes a system _slower_. To
         | be real-time, you have to put a time allocation on everything.
         | Each operation is allowed X budget, and will not deviate. This
         | means if the best-case operation is fast, but the worst case is
         | slow, the system has to always assume worst case.
        
           | sesm wrote:
           | It's a classic latency-throughput trade-off: smaller latency
           | - lower throughput. Doing certain operations in bulk (like
           | GC) increases latency, but is also more efficient and
           | increases throughput.
        
           | snvzz wrote:
           | >real-time makes a system slower.
           | 
           | linux-rt's PREEMPT_RT has a negligible impact. It is there,
           | but it is negligible. It does, however, enable a lot of use
           | cases where Linux fails otherwise, such as pro audio.
           | 
           | In modern usage, it even helps reduce input jitter with
           | videogames and enables lower latency videoconferencing.
           | 
           | I am hopeful most distributions will turn it on by default,
           | as it benefits most users, and causes negligible impact on
           | throughput-centric workloads.
        
         | dist-epoch wrote:
         | It could allow very low latency audio (1-2 ms). Not a huge
         | thing, but nice for some audio people.
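         | 
         | For scale: at a 48 kHz sample rate, a 64-sample period is
         | about 1.3 ms and a 32-sample period about 0.67 ms, so hitting
         | 1-2 ms means the whole stack has to wake up and finish its
         | work reliably inside every one of those windows.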
        
           | snvzz wrote:
           | s/nice/needed/g
        
         | andrewaylett wrote:
         | RT doesn't necessarily improve latency, it gives it a fixed
         | upper bound for _some_ operations. But the work needed to allow
         | RT can definitely improve latency in the general case -- the
         | example of avoiding synchronous printk() calls is a case in
         | point. It should improve latency under load even when RT isn't
         | even enabled.
         | 
         | I think I'm right in asserting that a fully-upstreamed RT
         | kernel won't actually do anything different from a normal one
         | unless you're actually running RT processes on it. The reason
         | it's taken so long to upstream has been the trade-offs that
         | have been needed to enable RT, and (per the article) there
         | aren't many of those left.
        
         | rcxdude wrote:
         | the most common desktop end-users that might benefit from this
         | are those doing audio work: latency and especially jitter can be
         | quite a pain there.
        
       | knorker wrote:
       | I just want SCHED_IDLEPRIO to actually do what it says.
        
       | deepsquirrelnet wrote:
       | What a blast from the past. I compiled a kernel for Debian with
       | RT_PREEMPT about 17-18 years ago to use with scientific equipment
       | that needed tighter timings. I was very impressed at the
       | latencies and jitter.
       | 
       | I haven't really thought about it since then, but I can imagine
       | lots of use cases for something like an embedded application
       | with raspberry pi where you don't quite want to make the leap
       | into a microcontroller running an RTOS.
        
         | HankB99 wrote:
         | Interesting to mention the Raspberry Pi. I saw an article just
         | a day or two ago that claimed that the RpiOS was started by and
         | ran on top of an RTOS. That's particularly interesting because at
         | one time years ago, I saw suggestions that Linux could run as a
         | task on an RTOS. Things that required hard real time deadlines
         | could run on the RTOS and not be subject to the delays that a
         | virtual memory system could entail.
         | 
         | I don't recall if this was just an idea or was actually
         | implemented. I also have seen only the one mention of RpiOS on
         | an RTOS so I'm curious about that.
        
           | rsaxvc wrote:
           | >That's particularly interesting because at one time years
           | ago, I saw suggestions that Linux could run as a task on an
           | RTOS.
           | 
           | I've worked with systems that ran Linux as a task of uITRON
           | as well as threadX, both on somewhat obscure ARM hardware.
           | Linux managed the MMU but had a large carveout for the RTOS
           | code. They had some strange interrupt management so that
           | Linux could 'disable interrupts' but while Linux IRQs were
           | disabled, an RTOS IRQ could still fire and context switch
           | back to an RTOS task. I haven't seen anything like this on
           | RPi though, but it's totally doable.
        
             | HankB99 wrote:
             | Interesting to know that it was more than just an idea -
             | thanks!
        
       | 0xDEF wrote:
       | What do embedded real-time Linux people use for bootloader, init
       | system, utilities, and C standard library implementation? Even
       | Android that does not have real-time constraints ended up using
       | Toybox for utilities and rolling their own C standard library
       | (Bionic).
        
         | rcxdude wrote:
         | You aren't likely to need to change a lot of these: the whole
         | point is basically making it so that all that can run as normal
         | but won't really get in the way of your high-priority process.
         | It's just that your high-priority process needs to be careful
         | not to block on anything that might take too long due to some
         | other stuff running. In which case you may need to avoid
         | certain C standard library calls, but not replace it entirely.
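         | 
         | As a sketch of what that careful high-priority process often
         | looks like in practice (illustrative only; the 1 ms period is
         | arbitrary): a periodic loop that touches only preallocated
         | memory and sleeps on an absolute deadline, with no malloc,
         | stdio or other potentially blocking calls inside it.
         | 
         |     #include <time.h>
         |     
         |     #define PERIOD_NS 1000000L   /* 1 ms period, arbitrary */
         |     
         |     void rt_loop(void)
         |     {
         |         struct timespec next;
         |         clock_gettime(CLOCK_MONOTONIC, &next);
         |     
         |         for (;;) {
         |             /* bounded work here: preallocated buffers only */
         |     
         |             next.tv_nsec += PERIOD_NS;
         |             if (next.tv_nsec >= 1000000000L) {
         |                 next.tv_nsec -= 1000000000L;
         |                 next.tv_sec  += 1;
         |             }
         |             /* sleep until an absolute deadline so jitter
         |                doesn't accumulate across iterations */
         |             clock_nanosleep(CLOCK_MONOTONIC, TIMER_ABSTIME,
         |                             &next, NULL);
         |         }
         |     }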
        
         | jovial_cavalier wrote:
         | I use u-boot for a boot loader. As for init and libc, I just
         | use systemd and glibc.
         | 
         | Boot time is not a bottleneck for my application (however long
         | it takes, the client will take longer...), and I'm sure there's
         | some more optimal libc to use, but I'm not sure the juice is
         | worth the squeeze.
         | 
         | I'm also interested in what others are doing.
        
         | shandor wrote:
         | I guess U-boot, uclibc, and busybox are a quite common starting
         | point.
         | 
         | Of course, this varies immensely between different use cases,
         | as "embedded Linux" spans such a huge swath of different kinds
         | of systems from very cheap and simple to complex and powerful.
        
       | salamanderman wrote:
       | I had a frustrating number of job interviews in my early career
       | where the interviewers didn't know what realtime actually was.
       | That "and predictable delay" concept from the article frequently
       | seemed to be lost on many folks, who seemed to think realtime
       | just meant fast, whatever that means.
        
         | mort96 wrote:
         | I would even remove the "minimum" part altogether; the point of
         | realtime is that operations have predictable upper bounds. That
         | might even mean slower average cases than in non-realtime
         | systems. If you're controlling a car's braking system, "the
         | average delay is 50ms but might take up to 80ms" might be
         | acceptable, whereas "the average delay is 1ms but it might take
         | arbitrarily long, possibly multiple seconds" isn't.
        
         | ska wrote:
         | The old saying "real time" /= "real fast". Hard vs "soft"
         | realtime muddies things a bit, but I think it's probably the
         | majority of software developers don't really understand what
         | realtime actually is either.
        
       | NalNezumi wrote:
       | Slightly tangential, but does anyone know good learning material
       | to understand real-time (Linux) kernel more? For someone with
       | rudimentary Linux knowledge.
       | 
       | I've had to compile&install real-time kernel as a requirement for
       | a robot arm (franka) control computer. It would be nice to know a
       | bit more than just how to install the kernel.
        
         | ActorNightly wrote:
         | https://www.freertos.org/implementation/a00002.html
         | 
         | Generally, having experience with Greenhills in a previous job,
         | for personal projects like robotics or control systems I would
         | recommend programming a microcontroller directly rather than
         | dealing with an SoC with an RTOS. Modern STM32s with Cortex chips
         | have enough processing power to run pretty much anything.
        
       | alangibson wrote:
       | Very exciting news for those of us building CNC machines with
       | LinuxCNC. The end of kernel patches is nigh!
        
       | Tomte wrote:
       | OSADL runs a cool QA farm: https://www.osadl.org/OSADL-QA-Farm-
       | Real-time.linux-real-tim...
        
       | Animats wrote:
       | QNX had this right decades ago. The microkernel has upper bounds
       | on everything it does. There are only a few tens of thousands of
       | lines of microkernel code. All the microkernel does is allocate
       | memory, dispatch the CPU, and pass messages between processes.
       | Everything else, including drivers and loggers, is in user space
       | and can be preempted by higher priority threads.
       | 
       | The QNX kernel doesn't do anything with strings. No parsing, no
       | formatting, no messages.
       | 
       | Linux suffers from being too bloated for real time. Millions of
       | lines of kernel, all of which have to be made preemptable. It's
       | the wrong architecture for real time. So it took two decades to
       | try to fix this.
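       | 
       | A rough sketch of what that user-space message passing looks
       | like on the server side (from memory of the QNX Neutrino API;
       | error handling omitted, consult the QNX docs for exact usage):
       | 
       |     #include <sys/neutrino.h>
       |     
       |     void serve(void)
       |     {
       |         int chid = ChannelCreate(0);      /* create a channel */
       |         char msg[256], reply[256];
       |     
       |         for (;;) {
       |             /* block until a client MsgSend()s to us */
       |             int rcvid = MsgReceive(chid, msg, sizeof msg, NULL);
       |             /* ... handle the request, entirely in user space ... */
       |             MsgReply(rcvid, 0, reply, sizeof reply);
       |         }
       |     }
       | 
       | A client does ConnectAttach() to the channel and then
       | MsgSend(coid, out, sizeof out, in, sizeof in), blocking until
       | the reply arrives; the kernel only copies bytes and switches
       | threads, it never interprets the contents.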
        
         | vacuity wrote:
         | For a modern example, there's seL4. I believe it does no
         | dynamic memory allocation. It's also formally verified for
         | various properties. (Arguably?) its biggest contribution to
         | kernel design is the pervasive usage of capabilities to
         | securely but flexibly export control to userspace.
        
           | adastra22 wrote:
           | Capabilities are important, but I don't think that was
           | introduced by seL4. Mach (which underlies macOS) has the same
           | capability-based system.
        
             | vacuity wrote:
             | I didn't say seL4 introduced capabilities. However, to my
             | knowledge, seL4 was the first kernel to show that
             | _pervasive_ usage of capabilities is both feasible and
             | beneficial.
        
               | monocasa wrote:
               | The other L4s before it showed that caps are useful and
               | can be implemented efficiently.
        
               | vacuity wrote:
               | https://dl.acm.org/doi/pdf/10.1145/2517349.2522720
               | 
               | " We took a substantially different approach with seL4;
               | its model for managing kernel memory is seL4's main
               | contribution to OS design. Motivated by the desire to
               | reason about resource usage and isolation, we subject all
               | kernel memory to authority conveyed by capabilities
               | (except for the fixed amount used by the kernel to boot
               | up, including its strictly bounded stack). "
               | 
               | I guess I should've said seL4 took capabilities to the
               | extreme.
        
               | naasking wrote:
               | seL4 was heavily inspired by prior capability based
               | operating systems like EROS (now CapROS) and Coyotos.
               | Tying all storage to capabilities was core to those
               | designs.
        
               | adastra22 wrote:
               | There's quite a history of capabilities-based research
               | OS's that culminated in, but did not start with, L4 (of
               | which seL4 is a later variant).
        
               | vacuity wrote:
               | Yes, but I believe seL4 took it to the max. I may be
               | wrong on that count, but I think seL4 is unique in that
               | it leverages capabilities for pretty much everything
               | except the scheduler. (There was work in that area, but
               | it's incomplete.)
        
               | adastra22 wrote:
               | L4 was developed in the 90's. Operating Systems like
               | Amoeba, which were fundamentally capability-based to a
               | degree that even exceeds L4, were a hot research topic in
               | the 80's.
               | 
               | L4's contribution was speed. It was assumed that
               | microkernels, and especially capability-based
               | microkernels were fundamentally slower than monolithic
               | kernels. This is why Linux (1991) is monolithic. Yet L4
               | (1994) was the fastest operating system in existence at
               | the time, despite being a microkernel and capability
               | based. It's too bad those dates aren't reversed, or we
               | might have had a fast, capability-based, microkernel
               | Linux :(
        
               | josephg wrote:
               | How did it achieve its speed? My understanding was that
               | microkernel architectures were fundamentally slower than
               | monolithic kernels because context switching is slower
               | than function calls. How did L4 manage to be the fastest?
        
               | vacuity wrote:
               | Two factors are the small working set (fit the code and
               | data in cache) and heavily optimized IPC. The IPC in L4
               | kernels is "a context switch with benefits": in the best
               | case, the arguments are placed in registers and the
               | context is switched. Under real workloads, microkernels
               | probably will be slower, but not by much.
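               | 
               | For a feel of the "context switch with benefits" style,
               | here's roughly what a call looks like with libsel4 (from
               | memory; treat names and argument order as a sketch and
               | check the seL4 manual):
               | 
               |     #include <sel4/sel4.h>
               |     
               |     /* send one word to a server endpoint and wait for
               |        the reply; with so few message registers the
               |        payload travels in CPU registers */
               |     seL4_Word call_server(seL4_CPtr ep, seL4_Word arg)
               |     {
               |         seL4_MessageInfo_t info =
               |             seL4_MessageInfo_new(0, 0, 0, 1); /* 1 MR */
               |         seL4_SetMR(0, arg);
               |         info = seL4_Call(ep, info);
               |         return seL4_GetMR(0);   /* reply payload */
               |     }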
        
               | mananaysiempre wrote:
               | IIRC the KeyKOS/EROS/CapROS tradition used capabilities
               | for everything _including_ the scheduler. Of course,
               | pervasive persistence makes those systems somewhat
               | esoteric (barring fresh builds, they never shut down or
               | boot up, only go to sleep and wake up in new bodies;
               | compare Smalltalk, etc.).
        
               | vacuity wrote:
               | Guess I'm too ignorant. I need to read up on these. I did
               | know about the persistence feature. I think it's not
               | terrible but also not great, and systems should be
               | _designed_ for being shut down and apps being closed.
        
               | naasking wrote:
               | > I think it's not terrible but also not great, and
               | systems should be designed for being shut down and apps
               | being closed.
               | 
               | The problem with shutdowns and restarts is the secure
               | bootstrapping problem. The boot process must be within
               | the trusted computing base, so how do you minimize the
               | chance of introducing vulnerabilities? With
               | checkpointing, if you start in a secure state, you're
               | guaranteed to have a secure state after a reboot. This is
               | not the case with any other form of reboot,
               | particularly ones that are highly configurable and so
               | easy for the user to introduce an insecure configuration.
               | 
               | In any case, many apps are now designed to restore their
               | state on restart, so they are effectively checkpointing
               | themselves, so there's clearly value to checkpointing. In
               | systems with OS-provided checkpointing it's a central
               | shared service and doesn't have to be replicated in every
               | program. That's a significant reduction in overall system
               | code that can go wrong.
        
               | vacuity wrote:
               | It's fallacious to assume that the persistence model of
               | the system can't enter an invalid state and thus cause
               | issues similar to bootstrapping. The threat model also
               | doesn't make sense to me: if an attacker can manipulate
               | the boot process, I feel like they would be able to
               | attack the overall system just fine. Also, there's the
               | bandwidth usage, latency, and whatnot. I think
               | persistence is a strictly less powerful, although
               | certainly convenient, design for an OS.
        
               | naasking wrote:
               | > The threat model also doesn't make sense to me: if an
               | attacker can manipulate the boot process, I feel like
               | they would be able to attack the overall system just
               | fine.
               | 
               | That's not true actually. These capability systems have
               | the principle of least privilege right down to their
               | core. The checkpointing code is in the kernel which only
               | calls out to the disk driver in user space. The
               | checkpointing code itself is basically just "flush these
               | cached pages to their corresponding locations on disk,
               | then update a boot sector pointer to the new checkpoint",
               | and booting a system is "read these pages pointed to by
               | this disk pointer sequentially into memory and resume".
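               | 
               | A minimal sketch of that checkpoint step in C (an
               | illustration of the idea, not KeyKOS/EROS code; assumes
               | a raw disk fd and page-aligned slots):
               | 
               |     #include <stddef.h>
               |     #include <stdint.h>
               |     #include <sys/types.h>
               |     #include <unistd.h>
               |     
               |     #define PAGE 4096
               |     
               |     /* write the dirty pages to their slots, then
               |        atomically flip the root pointer on disk */
               |     int checkpoint(int fd, const void *pages,
               |                    const uint64_t *slot, size_t n,
               |                    uint64_t new_root, off_t root_off)
               |     {
               |         for (size_t i = 0; i < n; i++)
               |             if (pwrite(fd, (const char *)pages + i * PAGE,
               |                        PAGE, (off_t)slot[i] * PAGE) != PAGE)
               |                 return -1;
               |         if (fsync(fd) != 0)      /* data durable first */
               |             return -1;
               |         if (pwrite(fd, &new_root, sizeof new_root,
               |                    root_off) != sizeof new_root)
               |             return -1;
               |         return fsync(fd);        /* commit the new image */
               |     }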
               | 
               | The attack surface in this system is incomparably small
               | compared to the boot process of a typical OS, which runs
               | user-defined scripts and scripts written by completely
               | unknown people from software you downloaded from the
               | internet, often with root or other broad sets of
               | privileges.
               | 
               | I really don't think you can appreciate how this system
               | works without digging into it a little. EROS was built
               | from the design of KeyKOS that ran transactional bank
               | systems back in the 80s. KeyKOS pioneered this kind of
               | checkpointing system, so it saw real industry use in
               | secure systems for years. I recommend at least reading an
               | overview:
               | 
               | https://flint.cs.yale.edu/cs428/doc/eros-ieee.pdf
               | 
               | EROS is kind of like what you'd get if you took
               | Smalltalk and tried to push it into the hardware as an
               | operating system, while removing all sources of ambient
               | authority. It lives on as CapROS:
               | 
               | https://www.capros.org/
        
               | vacuity wrote:
               | I don't deny that bootstrapping in current systems is
               | ridiculous, but I don't see why it can't be improved.
               | It's not like EROS is a typical OS either. In any case,
               | I'll read up on those OSes.
        
               | adastra22 wrote:
               | Amoeba was my favorite, as it was a homogeneous,
               | decentralized operating system. Different CPU
               | architectures spread across different data centers, and
               | it was all homogenized together into a single system
               | image. You had a shell prompt where you typed commands
               | and the OS could decide to spawn your process on your
               | local device, in the server room rack, or in some
               | connected datacenter in Amsterdam, it didn't make a
               | difference. From the perspective of you, your program, or
               | the shell, it's just a giant many-core machine with weird
               | memory and peripheral access latencies that the OS
               | manages.
               | 
               | Oh, and anytime as needed the OS could serialize out your
               | process, pipe it across the network to another machine,
               | and resume. Useful for load balancing, or relocating a
               | program to be near the data it is accessing. Unless your
               | program pays special attention to the clock, it wouldn't
               | notice.
               | 
               | I still think about Amoeba from time to time, and imagine
               | what could have been if we had gone down that route
               | instead.
        
               | vacuity wrote:
               | Wouldn't there be issues following from distributed
               | systems and CAP? Admittedly, I know nothing about Amoeba.
               | 
               | E.g. You spawn a process on another computer and then the
               | connection drops.
        
               | adastra22 wrote:
               | There's no free lunch of course, so you would have
               | circumstances where a network partition at a bad time
               | would result in a clone instead of a copy. I don't know
               | what, if anything, Amoeba did about this.
               | 
               | In practice it might not be an issue. The reason you'd
               | typically do something like move processes across a WAN
               | is because you want it to operate next to data it is
               | making heavy use of. The copy that booted up local to the
               | data would continue operating, while the copy at the
               | point of origin would suddenly see the data source go
               | offline.
               | 
               | Now of course more complex schemes can be devised, like
               | if the data source is replicated and so both copies
               | continue operating. Maybe a metric could be devised for
               | detecting these instances when the partition is healed,
               | and one or both processes are suspended for manual
               | resolution? Or maybe programs just have to be written
               | with the expectation that their capabilities might
               | suddenly become invalid at any time, because the
               | capability sides with the partition that includes the
               | resource? Or maybe go down the route of making the entire
               | system transactional, so that partition healing can
               | occur, and only throw away transaction deltas once
               | receipts are received for all nodes ratcheting state
               | forward?
               | 
               | It'd be an interesting research area for sure.
        
               | Animats wrote:
               | No, that was KeyKOS, which was way ahead of its time.[1]
               | Norm Hardy was brilliant but had a terrible time getting
               | his ideas across.
               | 
               | [1] https://en.wikipedia.org/wiki/KeyKOS
        
           | _kb wrote:
           | And unfortunately had its funding dumped because it wasn't
           | shiny AI.
        
             | snvzz wrote:
             | Its old source of funding. And it was much more complex[0]
             | than that.
             | 
             | seL4 is now a healthy non-profit, seL4 foundation[1].
             | 
             | 0. https://microkerneldude.org/2022/02/17/a-story-of-
             | betrayal-c...
             | 
             | 1. https://microkerneldude.org/2022/03/22/ts-in-2022-were-
             | back/
        
               | Animats wrote:
               | The trouble with L4 is that it's so low-level you have to
               | put another OS on top of it to do anything. Which usually
               | means a bloated Linux. QNX offers a basic POSIX
               | interface, implemented mostly as libraries.
        
               | snvzz wrote:
               | Note that L4 and seL4 are very different kernels. They
               | represent the 2nd generation and 3rd generation of
               | microkernels respectively.
               | 
               | With that out of the way, you're right in that the
               | microkernel doesn't present a posix interface.
               | 
               | But, like QNX, there are libraries for that, seL4
               | foundation itself maintains some.
               | 
               | They have a major ongoing effort on system servers,
               | driver APIs and ways to deploy system scenarios. Some of
               | them were talked about in a recent seL4 conference.
               | 
               | And then there's third party efforts like the amazing
               | Genode[0], which supports dynamic scenarios with the same
               | drivers and userspace binaries across multiple
               | microkernels.
               | 
               | They even have a modern web browser, 3d acceleration, as
               | well as a VirtualBox port that runs inside Genode, so
               | the dogfooding developers are able to run e.g. Linux
               | inside a VirtualBox VM to bridge the gap.
               | 
               | 0. https://www.genode.org/
        
         | bregma wrote:
         | The current (SDP 8) kernel has 15331 lines of code, including
         | comments and Makefiles.
        
         | gigatexal wrote:
         | QNX is used in vehicle infotainment systems no? Where else?
         | 
         | I'm not bothered by the kernel bloat. There's a lot of dev time
         | being invested in Linux, and while the desktop is not as much
         | of a priority as, say, the server space, a performant kernel on
         | handhelds and other such devices - and the dev work to get it
         | there - will benefit desktop users like myself.
        
           | bkallus wrote:
           | I went to a conference at GE Research where I spoke to some
           | QNX reps from Blackberry for a while. Seemed like they were
           | hinting that some embedded computers in some of GE's
           | aerospace and energy stuff rely on QNX.
        
           | lmm wrote:
           | > QNX is used in vehicle infotainment systems no? Where else?
           | 
           | A bunch of similar embedded systems. And blackberry, if
           | anyone's still using them.
        
           | tyfon wrote:
           | It was used in my old toyota avensis from 2012. The
           | infotainment was so slow you could measure performance in
           | seconds per frame instead of frames per second :)
           | 
           | In the end, all I could practically use it for was as a
           | bluetooth audio connector.
        
           | notrom wrote:
           | I've worked with it in industrial automation systems in large
           | scale manufacturing plants where it was pretty rock solid.
           | And I'm aware of its use in TV production and transmission
           | systems.
        
           | Cyph0n wrote:
           | Cisco routers running IOS-XR, until relatively recently.
        
           | SubjectToChange wrote:
           | Railroads/Positive Train Control, emergency call centers,
           | etc. QNX is used all over the place. If you want an even more
           | impressive Microkernel RTOS, then Green Hills INTEGRITY is a
           | great example. It's the RTOS behind the B-2, F-{16,22,35},
           | Boeing 787, Airbus A380, Sikorsky S-92, etc.
        
             | yosefk wrote:
             | "Even more impressive" in what way? I haven't used
             | INTEGRITY but used the Green Hills compiler and debugger
             | extensively for years and they're easily the most buggy
             | development tools I've ever had the misfortune to use. To
             | me the "impressive" thing is their ability to lock safety
             | critical software developers into using this garbage.
        
           | dilyevsky wrote:
           | Routers, airplanes, satellites, nuclear power stations, lots
           | of good stuff
        
         | gigatexal wrote:
         | > QNX had this right decades ago. The microkernel has upper
         | bounds on everything it does. There are only a few tens of
         | thousands of lines of microkernel code. All the microkernel
         | does is allocate memory, dispatch the CPU, and pass messages
         | between processes. Everything else, including drivers and
         | loggers, is in user space and can be preempted by higher
         | priority threads.
         | 
         | So much like a well structured main method in a C program or
         | other C-like language, where main just orchestrates the calling
         | of other functions and such. In this case main might initialize
         | different things where the QNX kernel doesn't, but the idea or
         | general concept remains.
         | 
         | I'm no kernel dev but this sounds good to me. Keeps things
         | simple.
        
           | vacuity wrote:
           | Recently, I've been thinking that we need a microkernel
           | design in applications. You have the core and then services
           | that can integrate amongst each other and the core that
           | provide flexibility. Like the "browser as an OS" kind of
           | thing, but applied more generally.
        
             | galdosdi wrote:
             | Yes! This reminds me strongly of the core/modules
             | architecture of the apache httpd, as described by the
             | excellent O'Reilly book on it.
             | 
             | The process of serving an HTTP request is broken into a
             | large number of fine grained stages and plugin modules may
             | hook into any or all of these to modify the input and
             | output to each stage.
             | 
             | The same basic idea makes it easy to turn any application
             | concept into a modules-and-core architecture. From the day
             | I read (skimmed) that book a decade or two ago this pattern
             | has been burned into my brain
        
             | blackth0rn wrote:
             | ECS systems for the gaming world are somewhat like this.
             | There is the core ECS framework and then the systems and
             | entities integrate with each other
        
               | spookie wrote:
               | ECS is incredible. Other areas should take notice
        
               | whstl wrote:
               | Agreed. I find that we're going in this direction in many
               | areas, games just got there much faster.
               | 
               | Pretty much everywhere there is some undercurrent of "use
               | this ultra-small generic interface for everything and
               | life will be easier". With games and ECS, microkernels
               | and IPC-for-everything, with frontend frameworks and
               | components that only communicate between themselves via
               | props and events, with event sourcing and CQRS backends,
               | Actors in Erlang, with microservices only communicating
               | via the network to enforce encapsulation... Perhaps even
               | Haskell's functional-core-imperative-shell could count as
               | that?
               | 
               | I feel like OOP _tried_ to get to this point, with
               | dependency injection and interface segregation, but
               | didn't quite get there due to bad ergonomics, verbosity
               | and because it was still too easy to break the rules. But
               | it was definitely an attempt at improving things.
        
             | vbezhenar wrote:
              | COM, OSGi, service architecture, microservice
              | architecture, and countless other approaches. It seems
              | to be the right way to build applications, given that it
              | keeps getting reinvented over and over again.
        
             | elcritch wrote:
             | That's pretty much what Erlang/OTP is, and it's like a
             | whole OS. Though it lacks capabilities.
        
         | js2 wrote:
         | VxWorks is what's used on Mars and it's a monolithic kernel, so
         | there's more than one way to do it. :-)
        
           | dilyevsky wrote:
            | I think the RT build also had to disable the MMU.
        
         | signa11 wrote:
          | This feels like the Tanenbaum-Torvalds debate once again.
        
         | creshal wrote:
         | > Millions of lines of kernel, all of which have to be made
         | preemptable.
         | 
         | ~90% of those are device drivers, which you'd still need with a
          | microkernel if you want it to run on arbitrary hardware.
        
           | dontlaugh wrote:
           | But crucially, drivers in a microkernel run in user space and
           | are thus pre-emptible by default. Then the driver itself only
           | has to worry about dealing with hardware timing when pre-
           | empted.
        
             | creshal wrote:
             | Sure, but who's going to write the driver in the first
             | place? Linux's "millions of lines of code" are a really
             | underappreciated asset, there's tons of obscure hardware
             | that is no longer supported by any other actively
             | maintained OS.
        
               | dontlaugh wrote:
               | I also don't see how we could transition to a
               | microkernel, indeed.
        
               | naasking wrote:
               | The very first "hypervisors" were actually microkernels
               | that ran Linux as a guest. This was done with Mach on
               | PowerPC/Mac systems, and also the L4 microkernel. That's
               | one way.
               | 
               | The only downside of course, is that you don't get the
               | isolation benefits of the microkernel for anything
               | depending on the Linux kernel process.
        
         | matheusmoreira wrote:
         | And yet it's getting done! It's very impressive work.
        
         | AndyMcConachie wrote:
          | Linus Torvalds and Andrew Tanenbaum called. They want their
         | argument back!
        
       | the8472 wrote:
        | For an example of how far the kernel goes to get log messages out
       | even on a dying system and how that's used in real deployments:
       | 
       | https://netflixtechblog.com/kubernetes-and-kernel-panics-ed6...
        
       | rwmj wrote:
       | About printk, the backported RT implementation of printk added to
       | the RHEL 9.3 kernel has deadlocks ...
       | https://issues.redhat.com/browse/RHEL-15897 &
       | https://issues.redhat.com/browse/RHEL-9380
        
       | w10-1 wrote:
       | There is no end game until there are end users beating on the
       | system. That would put the 'real' in 'real-time'.
       | 
        | But who using an RTOS now would take the systems-integration
        | cost/risk of switching? Would this put Android closer to bare-
        | metal performance?
        
       | sesm wrote:
        | IMO if you really care about a certain process being
        | responsive, you should allocate dedicated CPU cores and a
        | contiguous region of memory to it that the rest of the OS
        | doesn't touch. Oh, and also give it direct access to a
        | separate network card. I'm not sure if Linux supports this.
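        | 
        | Linux does have most of these pieces: isolcpus/cpusets plus
        | CPU affinity for dedicated cores, mlock and hugepages for
        | pinned memory, and SR-IOV or user-space drivers (e.g. DPDK)
        | for handing a NIC to a single process. A rough sketch of the
        | core-pinning part, assuming CPU 3 has been isolated:
        | 
        |     #define _GNU_SOURCE
        |     #include <sched.h>
        |     #include <stdio.h>
        | 
        |     int main(void)
        |     {
        |         /* Pin this process to CPU 3 (ideally one that was
        |            isolated from the scheduler via isolcpus or
        |            cpusets). */
        |         cpu_set_t set;
        |         CPU_ZERO(&set);
        |         CPU_SET(3, &set);
        |         if (sched_setaffinity(0, sizeof set, &set) != 0)
        |             perror("sched_setaffinity");
        | 
        |         /* Run the time-critical work under an RT policy
        |            (needs root or CAP_SYS_NICE). */
        |         struct sched_param sp = { .sched_priority = 80 };
        |         if (sched_setscheduler(0, SCHED_FIFO, &sp) != 0)
        |             perror("sched_setscheduler");
        | 
        |         puts("pinned to CPU 3, SCHED_FIFO priority 80");
        |         return 0;
        |     }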
        
       | pardoned_turkey wrote:
       | The conversation here focuses on a distinction between "hard"
       | real-time applications, where you probably don't want a general-
       | purpose OS like Linux no matter what; and "soft" real-time
        | applications like videoconferencing or audio playback, where
        | nothing terrible happens if you get a bit of stuttering or drop a
       | couple of frames every now and then. The argument is that RT
       | Linux would be a killer solution for that.
       | 
       | But you can do all these proposed "soft" use cases with embedded
       | Linux today. It's not like low-latency software video or audio
       | playback is not possible, or wasn't possible twenty years ago.
       | You only run into problems on busy systems where non-preemptible
       | I/O could regularly get in the way. That's seldom a concern in
       | embedded environments.
       | 
        | I think there are compelling reasons for making the kernel fully
        | preemptible, giving people more control over scheduling, and so
        | forth. But these reasons have relatively little to do with
       | wanting Linux to supersede minimalistic realtime OSes or bare-
       | metal code. It's just good hygiene that will result in an OS
       | that, even in non-RT applications, behaves better under load.
        
       | jovial_cavalier wrote:
        | Does HN have any thoughts on Xenomai[1]? I've been using it for
       | years without issue.
       | 
       | On a BeagleBone Black, it typically gives jitter on the order of
       | hundreds of nanoseconds. I would consider it "hard" real-time (as
       | do they). I'm able to schedule tasks periodically on the scale of
       | tens of microseconds, and they never get missed.
       | 
       | It differs from this in that Real-Time Linux attempts to make
        | Linux itself preemptible, whereas Xenomai is essentially its own
       | kernel, running Linux as a task on top. It provides an ABI which
       | allows you to run your own tasks alongside or at higher prio than
       | Linux. This sidesteps the `printk()` issue, for instance, since
       | Xenomai doesn't care. It will gladly context switch out of printk
       | in order to run your tasks.
       | 
       | The downside is that you can't make normal syscalls while inside
       | of the Xenomai context. Well... you can, but obviously this
       | invalidates the realtime model. For example, calling `printf()`
        | or `malloc()` inside of a Xenomai task is not preemptable. The
       | Xenomai ABI does its best to replicate everything you may need as
       | far as syscalls, which works great as long as you're happy doing
       | your own heap allocations.
       | 
       | [1]: https://xenomai.org/
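        | 
        | For a flavor of the task side, here's a sketch of a periodic
        | SCHED_FIFO loop in plain POSIX; as far as I know Xenomai's
        | POSIX interface wraps calls like these, though the exact build
        | setup differs:
        | 
        |     /* Periodic real-time loop sketch; error handling
        |        trimmed for brevity. Needs RT privileges. */
        |     #include <pthread.h>
        |     #include <sched.h>
        |     #include <stdio.h>
        |     #include <time.h>
        | 
        |     #define PERIOD_NS 100000L   /* 100 us period */
        | 
        |     static void *rt_task(void *arg)
        |     {
        |         (void)arg;
        |         struct timespec next;
        |         clock_gettime(CLOCK_MONOTONIC, &next);
        | 
        |         for (int i = 0; i < 1000; i++) {
        |             /* time-critical work here; no printf/malloc */
        |             next.tv_nsec += PERIOD_NS;
        |             if (next.tv_nsec >= 1000000000L) {
        |                 next.tv_nsec -= 1000000000L;
        |                 next.tv_sec += 1;
        |             }
        |             clock_nanosleep(CLOCK_MONOTONIC, TIMER_ABSTIME,
        |                             &next, NULL);
        |         }
        |         return NULL;
        |     }
        | 
        |     int main(void)
        |     {
        |         pthread_t tid;
        |         pthread_attr_t attr;
        |         struct sched_param sp = { .sched_priority = 90 };
        | 
        |         pthread_attr_init(&attr);
        |         pthread_attr_setinheritsched(&attr,
        |                 PTHREAD_EXPLICIT_SCHED);
        |         pthread_attr_setschedpolicy(&attr, SCHED_FIFO);
        |         pthread_attr_setschedparam(&attr, &sp);
        | 
        |         pthread_create(&tid, &attr, rt_task, NULL);
        |         pthread_join(tid, NULL);
        |         return 0;
        |     }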
        
       | dataflow wrote:
       | I feel like focusing on the kernel side misses CPU level issues.
       | 
       | Is there any known upper bound on, say, how long a memory access
       | instruction takes on x86?
        
         | rsaxvc wrote:
         | I don't know for x86.
         | 
         | But for things that really matter, I've tested by configuring
         | the MMU to disable caching for the memory that the realtime
          | code lives in and uses, to emulate a 0% hit rate. And there's
         | usually still a fair amount of variance on top of that
         | depending on if the memory controller has a small cache, and
         | where the memory controller is in its refresh cycle.
        
           | dataflow wrote:
           | Yeah. And I'm not sure that even _that_ would give you the
            | worst case as far as the cache is concerned. Of course I
            | don't know how these implementations work, but it seems
           | plausible that code that directly uses memory could run
           | faster than code that encounters a cache miss beforehand (or
           | contention, if you're using multiple cores). Moreover there's
           | also the instruction cache, and I'm not sure if you can
           | disable caching for that in a meaningful way?
           | 
           | For soft real time, I don't see a problem. But for hard real
           | time, it seems a bit scary.
        
             | rsaxvc wrote:
             | You're right! I can think of two cases I've run into where
             | bypassing the cache can be faster compared to a miss.
             | 
             | On some caches the line must be filled before allowing a
              | write (ignoring any write buffer at the interface above the
             | cache) - those basically halve the memory bandwidth when
             | writing to a lot of cache lines. Some systems now have
             | instructions for filling a cache line directly to avoid
             | this. And some CPUs have bit-per-byte validity tracking to
             | avoid this too.
             | 
             | Even on caches with hit-during-fill, a direct read from an
             | address near the last-to-be-filled end of a cacheline can
             | sometimes be a little faster than a cache miss, since the
             | miss will fill the rest of the line first.
        
             | rsaxvc wrote:
             | > Moreover there's also the instruction cache, and I'm not
             | sure if you can disable caching for that in a meaningful
             | way?
             | 
             | Intels used to boot with their caches disabled, but I
             | haven't worked with them in forever, and never multicore.
             | 
             | I worked with a lot of microcontrollers, and it's not
             | uncommon to be able to disable the instruction cache there.
             | 
             | There are a few things that require the data caches too,
             | like atomic accesses on ARM. Usually we were doing
             | something fairly short though in our realtime code, so it
             | was easy enough to map just the memory it needed as
             | uncacheable.
        
         | saagarjha wrote:
         | You can continually take page faults in a Turing complete way
         | without executing any code, so I would guess this is unbounded?
        
           | dataflow wrote:
           | I almost mentioned page faults, but that's something the
           | kernel has control over. It could just make sure everything
           | is in memory so there aren't any faults. So it's not really
           | an issue I think.
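            | 
            | That's essentially what RT applications do from user space
            | as well: lock their pages and touch the stack up front so
            | the hot path never faults. A minimal sketch:
            | 
            |     #include <stdio.h>
            |     #include <sys/mman.h>
            | 
            |     #define PREFAULT_STACK (64 * 1024)
            | 
            |     static void prefault(void)
            |     {
            |         /* Lock current and future pages into RAM. */
            |         if (mlockall(MCL_CURRENT | MCL_FUTURE) != 0)
            |             perror("mlockall");
            | 
            |         /* Touch the stack so it is backed by real
            |            pages before the time-critical code runs. */
            |         volatile unsigned char dummy[PREFAULT_STACK];
            |         for (long i = 0; i < PREFAULT_STACK; i += 4096)
            |             dummy[i] = 0;
            |     }
            | 
            |     int main(void)
            |     {
            |         prefault();
            |         /* real-time work would start here */
            |         return 0;
            |     }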
        
       ___________________________________________________________________
       (page generated 2023-11-17 23:01 UTC)