[HN Gopher] The weirdest QNX bug I've encountered (2021)
       ___________________________________________________________________
        
       The weirdest QNX bug I've encountered (2021)
        
       Author : fanf2
       Score  : 106 points
       Date   : 2024-06-30 14:42 UTC (8 hours ago)
        
 (HTM) web link (mental-reverb.com)
 (TXT) w3m dump (mental-reverb.com)
        
       | ragnot wrote:
       | Every developer I know (myself included) that has worked with QNX
       | has a story about some insane bug that took significant effort to
       | uncover. At this point, I would say the only reason one should
       | look at QNX is for cost since it is pretty cheap. The low jitter
       | on context-switching to the highest priority thread is a nice
       | thing but the dev process is absolute garbage.
        
         | fargle wrote:
         | yep. it's really trash. used it 20-25 years ago when they were
         | just introducing "neutrino" vs. the classic QNX4. the former
         | had a good rep with auto and medical usage.
         | 
         | - very bad port of GCC was buggy and generated bad code. it was
         | the result of some idiot blindly haphazardly applying a ton of
         | random incoherent patches to try to get it to build instead of
         | porting it properly. (to be clear mainline GCC at that time was
         | fine; we re-ported it ourselves instead) - and, of course, they
         | used their own faulty compiler to build their libraries and
         | services ;) causing unknown carnage waiting to be discovered.
         | 
         | - malloc broken (heavy use under multiple threads causes heap
         | corruption). replaced with dlmalloc.
         | 
         | - serial port driver broken. rewrote a new one.
         | 
         | - intel network card driver crashes. replaced hardware with
         | 3com to survive.
         | 
         | - certain math library functions broken (iirc, fmod). replaced.
         | 
         | and so on and on.
         | 
         | it doesn't matter _what_ certification your RTOS (or whatever)
         | has. if you cannot examine the source, rebuild it, etc. (oss or
         | private source available) - it _cannot_ be trusted. this was
         | one of the worse examples, but it's _always_ like this with
         | "proprietary" OS/toolkits.
        
         | nubinetwork wrote:
         | When you can tell qnx to give you a root shell as an
         | unprivileged user, ps being slow doesn't surprise me...
         | https://www.juliandunn.net/2006/08/21/on-hacking-the-unisys-...
        
       | bxparks wrote:
       | I counted 417 comments on that page and scrolled through a few
       | dozen. Every one of them was spam. That's pretty much the
       | internet these days isn't it.
       | 
       | Other than that, the blog post was very interesting, I learned a
       | bit of history of QNX, and concluded that I should avoid it.
        
         | OnionBlender wrote:
         | Is it though? I don't remember the last time I saw so many spam
         | messages. Most sites I visit do a better job of preventing or
         | removing spam.
        
         | ronsor wrote:
         | The spambots are literally having conversations with each
         | other.
        
       | nrclark wrote:
       | QNX really needs to modernize if they want to survive. Their
       | tooling ecosystem is stuck in 2008, and their kernel's
       | performance is pretty low. IIRC, the kernel itself is also
       | single-threaded, and can't take advantage of multiple CPUs (even
       | if tasks can be SMP scheduled).
       | 
       | Their moat is supposedly their ASIL certification, but I see that
       | value shrinking more and more over time for the following
       | reasons:
       | 
       | 1. If your product has a software-related failure, customers
       | won't care about all of your certifications. Only the end
       | product.
       | 
       | 2. I'm not convinced that the QNX kernel is less buggy than the
       | Linux kernel. Also, most failures don't tend to be kernel
       | related.
        
         | burstmode wrote:
         | >If your product has a software-related failure, customers
         | won't care about all of your certifications. Only the end
         | product.
         | 
         | If you're in a market where an ASIL certification is needed, the
         | customers ONLY care about these certifications. It keeps them out
         | of jail.
        
           | nrclark wrote:
           | Can you point me at some more detailed rules that support
           | your assertion? Not trying to argue - I'm actually interested
           | to read more details on that.
        
             | throwaway173738 wrote:
             | It's not really a rule, but rather in some environments you
             | have to be able to say in court that you did everything you
             | could to make sure your software worked safely and
             | correctly. Sometimes you will be risking criminal charges
             | if you can't.
        
             | rwmj wrote:
             | It's the reason why some companies, like IBM [disclosure: I
             | work for Red Hat], seem to sell products even though there
             | seems to be little rational reason why customers would buy
             | them, as in they have poorer performance or quality at a
             | much greater price. Those products are certified against
             | dozens of financial, safety, security or other standards,
             | and customers in certain markets (government, military,
             | nuclear, automotive etc) simply have to buy the certified
             | products. The consequences of not doing so range from
             | products not being supported, all the way to going to jail
             | for gross negligence.
             | 
             | Edit: I wrote a rather highly rated HN comment about why
             | Red Hat makes money last year:
             | https://news.ycombinator.com/item?id=35588297
        
               | jeffrallen wrote:
               | Another example of this is FIPS-140 crypto. It is
               | objectively bad crypto in the 2020's. But it's mandated
               | in some settings for either bureaucratic reasons or due
               | to regulatory capture.
        
             | f1shy wrote:
             | The truth is, too many managers have never read the ISO
             | document, and follow the CYA methodology, and ask for
             | everything to be certified. The ISO just says (bear with me
             | with this stupid simplification) "do whatever you want, but
             | make sure p(disaster)<1e-20". You have to be able to justify
             | decisions, but it will not help to have certified frameworks,
             | OS, and tools if you did a bad FMEDA.
        
               | szundi wrote:
               | Following this logic it seems to be a good choice to buy
               | RHEL because you have no chance running linux with those
               | probability margins that you just wrote. Electronic
               | components might have those. So stay out of jail
        
           | f1shy wrote:
           | There is NO market where "ASIL" is required. Of course if
           | something happens you better have a safety case as described
           | in the ISO26262, or a good excuse. That being said, that a
           | system has a safety case according to ISO26262 ASIL D, does
           | not mean at all that all pieces must be certified.
           | 
           | Currently working in a project where ASIL D is reached by
           | having an independent microcontroller, watching over the
           | whole QM mess.
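
The supervision idea in the comment above can be sketched roughly as follows. This is only an illustration of the heartbeat logic, not anything taken from ISO 26262 or from the commenter's project: the `HeartbeatMonitor` class, its method names, and the timeout value are all hypothetical, and a real design would run the monitor on separate hardware rather than in the same process.

```python
import time


class HeartbeatMonitor:
    """Hypothetical supervisor: latches a safe state if heartbeats stop."""

    def __init__(self, timeout_s, clock=time.monotonic):
        self.timeout_s = timeout_s      # assumed deadline between heartbeats
        self.clock = clock              # injectable clock, eases testing
        self.last_beat = clock()
        self.safe_state = False

    def beat(self):
        """Called by the supervised (QM) component to show it is alive."""
        if not self.safe_state:         # a tripped monitor stays tripped
            self.last_beat = self.clock()

    def check(self):
        """Called periodically by the monitor; trips on a missed deadline."""
        if self.clock() - self.last_beat > self.timeout_s:
            self.safe_state = True
        return self.safe_state
```

The safe state is latched on purpose: once the supervised side has missed a deadline, a late heartbeat does not clear the fault, which is a common (but here assumed) policy for this kind of monitor.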
        
             | foooorsyth wrote:
             | >There is NO market where "ASIL" is required
             | 
             | Define "required". If every single legal department at
             | every single major automotive company says "we must obtain
             | ASIL-B certification for our gauge cluster software or we
             | can't sell cars", does it matter if regulators don't
             | overtly mandate it? The legal environments of all of the
             | major automotive markets make it a de facto requirement.
        
               | f1shy wrote:
               | The ISO26262 was defined by the automakers themselves
               | (almost all were represented in the committee) so yes,
               | they want to follow it. There is no legal requirement. It
               | does not especially help in case of litigation either.
        
         | cbsks wrote:
         | QNX Muon removed the global kernel lock so it has much better
         | multi-core performance.
        
           | jeffreygoesto wrote:
           | https://www.qnx.com/developers/articles/rel_7063_0.html#what.
           | ..
        
         | foundry27 wrote:
         | What kind of tooling modernization would even make a
         | difference?
         | 
         | At least SDP 8.0 overhauled the kernel to not be locked to a
         | single thread anymore, which is nice IMO
        
         | agustamir wrote:
         | > kernel's performance is pretty low
         | 
         | Can you elaborate on this? How is it "low"
        
       | lfkdev wrote:
       | What is going on with the comment section on this post?
        
         | Jolter wrote:
         | Clearly they are not implementing good bot protection. The
         | results are not very surprising IMO.
        
         | ralferoo wrote:
         | I actually thought some of the comments were funny. Especially
         | the one about the crab in the shell! No idea why they thought
         | it was related to QNX, but an insight into the mind of spammers
         | nonetheless.
        
       | torginus wrote:
       | Honestly this kinda shows me that no matter what degree of
       | robustness we design into our systems (null safety, memory
       | safety, thread safety etc.), some types of system breaking bugs
       | are unavoidable (such as DOSing the system by calling a system
       | API function in an infinite loop), and are often impossible to
       | distinguish from desired behavior.
        
         | smaudet wrote:
         | How about no infinite loops (as a start)?
         | 
         | Unless you are the kernel, and you can demonstrate that your
         | loop is "safe" via some set of static analysis.
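
A minimal sketch of the "no infinite loops" idea, assuming a runtime bound is acceptable in place of the static analysis the comment mentions. The function `poll_bounded` and its parameters are hypothetical names chosen for illustration; the point is only that a capped attempt count plus backoff keeps a failing call from turning into a tight loop that hammers a system API.

```python
import time


def poll_bounded(query, max_attempts=5, base_delay=0.01, sleep=time.sleep):
    """Call query() until it returns a non-None value, at most max_attempts times.

    Sleeps between attempts with exponential backoff, so the loop always
    terminates and never spins at full speed.
    """
    delay = base_delay
    for attempt in range(1, max_attempts + 1):
        result = query()
        if result is not None:
            return result, attempt
        if attempt < max_attempts:
            sleep(delay)        # back off before the next attempt
            delay *= 2
    raise TimeoutError(f"no result after {max_attempts} attempts")
```

A caller would pass its own query function, for example `poll_bounded(read_status)`, and handle `TimeoutError` as the "gave up" path instead of retrying forever.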
        
           | banish-m4 wrote:
           | Achievable with formal verification and soft- and hard-
           | realtime worst case timing validation. It's not impossible
           | but also not easy. It requires significant engineering
           | investment.
        
         | banish-m4 wrote:
         | That's shrugging off inferior processes and substandard work.
         | Bugs are unnecessary because there is a finite amount of code
         | and they can all be eliminated if the correct eyeballs spend
         | sufficient time reviewing, testing, and simplifying behavior to
         | focus on robust reliability and correctness. Breaking changes
         | can be allowed in dev packs with semver release notes. There
         | are no excuses for sloppy engineering.
        
       | Animats wrote:
       | _"At this point, an intermezzo with some QNX history is in
       | order. A bit more than a decade ago, the QNX source code was
       | available to the public. Back then, QNX had a vibrant open source
       | community. People would experiment with the kernel, write various
       | useful utilities and help each other in forums. QNX even had a
       | fully featured Desktop GUI, ran Firefox and was self-hosting, so
       | you could develop for QNX right on QNX itself with full IDE and
       | compiler support. It was beautiful."_
       | 
       | _"Then QNX was bought, source code access was revoked and the
       | community largely withered away. Questions were increasingly
       | asked via private support tickets directly to QNX, locked away
       | from the public. QNX know-how becomes harder and harder to
       | acquire, open source software for modern QNX releases is
       | essentially non-existent and the driver situation is a
       | catastrophe. The QNX kernel is the most beautiful and interesting
       | kernel I have ever had the pleasure of working with, but it lies
       | in the shackles of corporate ownership."_
       | 
       | It's sad.
       | 
       | QNX was originally an independent company. During that period,
       | anyone could get a free copy of QNX for personal use. It wasn't
       | open source, but it was available. It's POSIX-compatible, so it
       | was a supported target for GNU, Firefox, and Eclipse. We used QNX
       | for our DARPA Grand Challenge vehicle in 2003-2005, and all that
       | code was developed on desktop QNX.
       | 
       | Then QNX was acquired by Harman, the successor to Harman Kardon,
       | which once made home audio components and pivoted to car audio.
       | They were thinking car infotainment. Harman didn't really know
       | what to do with an operating system, especially since the big
       | market was systems for industrial control and point of sale. So
       | eventually they opened the source.
       | 
       | Then QNX was acquired by Blackberry, the early smartphone
       | company. They closed the source, very suddenly. They even killed
       | off the free version for personal and educational use. So all
       | third party open source development stopped. Blackberry
       | eventually shipped a phone that ran QNX, but they were not
       | powerful enough as a company to keep a third phone standard
       | going. So Blackberry went to Android.
       | 
       | Blackberry killed off the self-hosted desktop environment, and
       | users now had to cross-compile from Windows.
       | 
       | And QNX became more of a niche product than ever.
        
         | akira2501 wrote:
         | > but they were not powerful enough as a company to keep a
         | third phone standard going.
         | 
         | They absolutely were, which is the tragedy of the whole thing,
         | people absolutely loved their products and strongly preferred
         | them to everything else on the market.
         | 
         | Instead of recognizing the game changer that the iPhone was
         | they slept on the market and didn't do much to bring big touch
         | screens and rich internet applications to their platform.
         | 
         | It was a slow and agonizing death.
        
           | asveikau wrote:
           | > Instead of recognizing the game changer that the iPhone was
           | they slept on the market
           | 
           | But they made that error long before buying QNX. By the time
           | they bought QNX they had probably lost too much ground to
           | turn it around.
        
         | fouc wrote:
         | I've always thought QNX 4.24 w/ Photon microGUI should've been
         | fully open sourced, even a decade later. It would've been
         | competitive with linux in the desktop OS arena.
        
         | mavhc wrote:
         | The only way a second standard can survive is by being open
         | source. A 3rd? Very unlikely
        
         | kragen wrote:
         | it's unfortunate that qnx wasn't open-source; revoking source
         | code access would have been impossible
        
       | arsome wrote:
       | I actually ran across this issue myself, SIGQUIT'd the process,
       | loaded it into a debugger and found the exact same problem. I can
       | confirm the problem still exists on QNX 7.1. Fortunately we were
       | moving off it, so I didn't think much more about it, but glad
       | someone wrote it up.
        
       | banish-m4 wrote:
       | What I like about seL4 (although not a complete embedded dev
       | platform) is formal verification. QNX might have EAL4 in some
       | configurations, but like most every other operating system on the
       | planet, they haven't bothered to up their game by formally
       | verifying it for correctness. This is a shame and entirely
       | preventable with greater attention to testing and verification.
        
         | kragen wrote:
         | what sel4 shows is that it's entirely preventable with a
         | rewrite from scratch by a team of formal-methods ph.d.s over
         | many years, if they invent a design that allows the kernel to
         | only be a few thousand lines of code so that the titanic effort
         | of formal verification becomes feasible, barely. it's not
         | something you can do with a codebase of millions of lines of
         | code or a codebase that wasn't written from scratch with formal
         | verification in mind. yet, anyway
        
       ___________________________________________________________________
       (page generated 2024-06-30 23:00 UTC)