[HN Gopher] The weirdest QNX bug I've encountered (2021)
___________________________________________________________________
The weirdest QNX bug I've encountered (2021)
Author : fanf2
Score : 106 points
Date : 2024-06-30 14:42 UTC (8 hours ago)
(HTM) web link (mental-reverb.com)
(TXT) w3m dump (mental-reverb.com)
| ragnot wrote:
| Every developer I know (myself included) that has worked with QNX
| has a story about some insane bug that took significant effort to
| uncover. At this point, I would say the only reason one should
| look at QNX is for cost since it is pretty cheap. The low jitter
| on context-switching to the highest priority thread is a nice
| thing but the dev process is absolute garbage.
| fargle wrote:
| yep. it's really trash. used it 20-25 years ago when they were
| just introducing "neutrino" vs. the classic QNX4. the former
| had a good rep with auto and medical usage.
|
| - very bad port of GCC was buggy and generated bad code. it was
| the result of some idiot blindly, haphazardly applying a ton of
| random incoherent patches to try to get it to build instead of
| porting it properly. (to be clear mainline GCC at that time was
| fine; we re-ported it ourselves instead) - and, of course, they
| used their own faulty compiler to build their libraries and
| services ;) causing unknown carnage waiting to be discovered.
|
| - malloc broken (heavy use under multiple threads causes heap
| corruption). replaced with dlmalloc.
|
| - serial port driver broken. rewrote a new one.
|
| - intel network card driver crashes. replaced hardware with
| 3com to survive.
|
| - certain math library functions broken (iirc, fmod). replaced.
|
| and so on and on.
|
| it doesn't matter _what_ certification your RTOS (or whatever)
| has. if you cannot examine the source, rebuild it, etc. (oss or
| private source available) - it _cannot_ be trusted. this was
| one of the worse examples, but it's _always_ like this with
| "proprietary" OS/toolkits.
| nubinetwork wrote:
| When you can tell qnx to give you a root shell as an
| unprivileged user, ps being slow doesn't surprise me...
| https://www.juliandunn.net/2006/08/21/on-hacking-the-unisys-...
| bxparks wrote:
| I counted 417 comments on that page and scrolled through a few
| dozen. Every one of them was spam. That's pretty much the
| internet these days, isn't it?
|
| Other than that, the blog post was very interesting, I learned a
| bit of history of QNX, and concluded that I should avoid it.
| OnionBlender wrote:
| Is it though? I don't remember the last time I saw so many spam
| messages. Most sites I visit do a better job of preventing or
| removing spam.
| ronsor wrote:
| The spambots are literally having conversations with each
| other.
| nrclark wrote:
| QNX really needs to modernize if they want to survive. Their
| tooling ecosystem is stuck in 2008, and their kernel's
| performance is pretty low. IIRC, the kernel itself is also
| single-threaded, and can't take advantage of multiple CPUs (even
| if tasks can be SMP scheduled).
|
| Their moat is supposedly their ASIL certification, but I see that
| value shrinking more and more over time for the following
| reasons:
|
| 1. If your product has a software-related failure, customers
| won't care about all of your certifications. Only the end
| product.
|
| 2. I'm not convinced that the QNX kernel is less buggy than the
| Linux kernel. Also, most failures don't tend to be kernel
| related.
| burstmode wrote:
| >If your product has a software-related failure, customers
| won't care about all of your certifications. Only the end
| product.
|
| If you're in a market where an ASIL certification is needed, the
| customers ONLY care about these certifications. It keeps them out
| of jail.
| nrclark wrote:
| Can you point me at some more detailed rules that support
| your assertion? Not trying to argue - I'm actually interested
| to read more details on that.
| throwaway173738 wrote:
| It's not really a rule, but rather in some environments you
| have to be able to say in court that you did everything you
| could to make sure your software worked safely and
| correctly. Sometimes you will be risking criminal charges
| if you can't.
| rwmj wrote:
| It's the reason why some companies, like IBM [disclosure: I
| work for Red Hat], seem to sell products even though there
| seems to be little rational reason why customers would buy
| them, as in they have poorer performance or quality at a
| much greater price. Those products are certified against
| dozens of financial, safety, security or other standards,
| and customers in certain markets (government, military,
| nuclear, automotive etc) simply have to buy the certified
| products. The consequences of not doing so range from
| products not being supported, all the way to going to jail
| for gross negligence.
|
| Edit: I wrote a rather highly rated HN comment about why
| Red Hat makes money last year:
| https://news.ycombinator.com/item?id=35588297
| jeffrallen wrote:
| Another example of this is FIPS-140 crypto. It is
| objectively bad crypto in the 2020's. But it's mandated
| in some settings for either bureaucratic reasons or due
| to regulatory capture.
| f1shy wrote:
| The truth is, too many managers have never read the ISO
| document, and follow the CYA methodology, and ask for
| everything to be certified. The ISO just says (bear with me
| on this stupid simplification) "do whatever you want, but
| make sure p(disaster)<1e-20". You have to be able to justify
| decisions, but it will not help to have certified frameworks,
| OS, and tools if you did a bad FMEDA.
| szundi wrote:
| Following this logic it seems to be a good choice to buy
| RHEL because you have no chance running linux with those
| probability margins that you just wrote. Electronic
| components might have those. So stay out of jail
| f1shy wrote:
| There is NO market where "ASIL" is required. Of course if
| something happens you better have a safety case as described
| in the ISO26262, or a good excuse. That being said, that a
| system has a safety case according to ISO26262 ASIL D, does
| not mean at all that all pieces must be certified.
|
| Currently working on a project where ASIL D is reached by
| having an independent microcontroller watching over the
| whole QM mess.
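
A minimal sketch of the supervision pattern described in the comment above, assuming a heartbeat-style design: the monitored (QM) side must report in within a deadline, and an independent supervisor latches a fault if it does not. `HeartbeatMonitor`, `kick`, `check`, and the millisecond timeout are hypothetical names chosen for illustration; none of them come from the commenter's project or from ISO 26262. On real hardware this logic would run on the separate microcontroller; Python is used here only to show the latching behavior.

```python
class HeartbeatMonitor:
    """Hypothetical supervisor that trips when a heartbeat deadline is missed."""

    def __init__(self, timeout_ms: int, now_ms: int = 0) -> None:
        if timeout_ms <= 0:
            raise ValueError("timeout_ms must be positive")
        self.timeout_ms = timeout_ms
        self.last_kick_ms = now_ms
        self.tripped = False

    def kick(self, now_ms: int) -> None:
        # Heartbeat from the monitored side. Ignored once tripped, so a
        # late heartbeat cannot silently clear a detected fault.
        if not self.tripped:
            self.last_kick_ms = now_ms

    def check(self, now_ms: int) -> bool:
        # Called periodically by the supervisor. Returns True when the
        # system should be forced into its safe state.
        if now_ms - self.last_kick_ms > self.timeout_ms:
            self.tripped = True
        return self.tripped
```

Time is passed in explicitly rather than read from a clock, which keeps the sketch deterministic and easy to test. The fault is latched on purpose: recovery would be a separate, deliberate reset step.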
| foooorsyth wrote:
| >There is NO market where "ASIL" is required
|
| Define "required". If every single legal department at
| every single major automotive company says "we must obtain
| ASIL-B certification for our gauge cluster software or we
| can't sell cars", does it matter if regulators don't
| overtly mandate it? The legal environments of all of the
| major automotive markets make it a de facto requirement.
| f1shy wrote:
| The ISO26262 was defined by the automakers themselves
| (almost all were represented in the committee) so yes,
| they want to follow it. There is no legal requirement. It
| does not specially help in case of litigation either.
| cbsks wrote:
| QNX Muon removed the global kernel lock so it has much better
| multi-core performance.
| jeffreygoesto wrote:
| https://www.qnx.com/developers/articles/rel_7063_0.html#what...
| foundry27 wrote:
| What kind of tooling modernization would even make a
| difference?
|
| At least SDP 8.0 overhauled the kernel to not be locked to a
| single thread anymore, which is nice IMO
| agustamir wrote:
| > kernel's performance is pretty low
|
| Can you elaborate on this? How is it "low"?
| lfkdev wrote:
| What is going on with the comment section on this post?
| Jolter wrote:
| Clearly they are not implementing good bot protection. The
| results are not very surprising IMO.
| ralferoo wrote:
| I actually thought some of the comments were funny. Especially
| the one about the crab in the shell! No idea why they thought
| it was related to QNX, but an insight into the mind of spammers
| nonetheless.
| torginus wrote:
| Honestly this kinda shows me that no matter what degree of
| robustness we design into our systems (null safety, memory
| safety, thread safety etc.), some types of system-breaking bugs
| are unavoidable (such as DOSing the system by calling a system
| API function in an infinite loop), and are often impossible to
| distinguish from desired behavior.
| smaudet wrote:
| How about no infinite loops (as a start)?
|
| Unless you are the kernel, and you can demonstrate that your
| loop is "safe" via some set of static analysis.
| banish-m4 wrote:
| Achievable with formal verification and soft- and hard-
| realtime worst case timing validation. It's not impossible
| but also not easy. It requires significant engineering
| investment.
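
One way to read "no infinite loops" is to give every polling loop a hard attempt budget, so the worst-case iteration count is known up front. A minimal sketch under that assumption; `poll_bounded` and its parameters are hypothetical, and a bounded loop is only a starting point, not a substitute for the formal verification or worst-case timing validation mentioned above.

```python
from typing import Callable


def poll_bounded(ready: Callable[[], bool], max_attempts: int) -> bool:
    """Call ready() at most max_attempts times; report whether it succeeded."""
    if max_attempts < 0:
        raise ValueError("max_attempts must be non-negative")
    for _ in range(max_attempts):
        if ready():
            return True
    # Budget exhausted: the caller must handle the failure explicitly
    # instead of spinning forever.
    return False
```

The caller gets a plain boolean and has to decide what a timeout means, which is the point: the failure path becomes visible instead of being an endless spin.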
| banish-m4 wrote:
| That's shrugging off inferior processes and substandard work.
| Bugs are unnecessary because there is a finite amount of code
| and they can all be eliminated if the correct eyeballs spend
| sufficient time reviewing, testing, and simplifying behavior to
| focus on robust reliability and correctness. Breaking changes
| can be allowed in dev packs with semver release notes. There
| are no excuses for sloppy engineering.
| Animats wrote:
| _"At this point, an intermezzo with some QNX history is in
| order. A bit more than a decade ago, the QNX source code was
| available to the public. Back then, QNX had a vibrant open source
| community. People would experiment with the kernel, write various
| useful utilities and help each other in forums. QNX even had a
| fully featured Desktop GUI, ran Firefox and was self-hosting, so
| you could develop for QNX right on QNX itself with full IDE and
| compiler support. It was beautiful."_
|
| _"Then QNX was bought, source code access was revoked and the
| community largely withered away. Questions were increasingly
| asked via private support tickets directly to QNX, locked away
| from the public. QNX know-how becomes harder and harder to
| acquire, open source software for modern QNX releases is
| essentially non-existent and the driver situation is a
| catastrophe. The QNX kernel is the most beautiful and interesting
| kernel I have ever had the pleasure of working with, but it lies
| in the shackles of corporate ownership."_
|
| It's sad.
|
| QNX was originally an independent company. During that period,
| anyone could get a free copy of QNX for personal use. It wasn't
| open source, but it was available. It's POSIX-compatible, so it
| was a supported target for GNU, Firefox, and Eclipse. We used QNX
| for our DARPA Grand Challenge vehicle in 2003-2005, and all that
| code was developed on desktop QNX.
|
| Then QNX was acquired by Harman, the successor to Harman Kardon,
| which once made home audio components and pivoted to car audio.
| They were thinking car infotainment. Harman didn't really know
| what to do with an operating system, especially since the big
| market was systems for industrial control and point of sale. So
| eventually they opened the source.
|
| Then QNX was acquired by Blackberry, the early smartphone
| company. They closed the source, very suddenly. They even killed
| off the free version for personal and educational use. So all
| third party open source development stopped. Blackberry
| eventually shipped a phone that ran QNX, but they were not
| powerful enough as a company to keep a third phone standard
| going. So Blackberry went to Android.
|
| Blackberry killed off the self-hosted desktop environment, and
| users now had to cross-compile from Windows.
|
| And QNX became more of a niche product than ever.
| akira2501 wrote:
| > but they were not powerful enough as a company to keep a
| third phone standard going.
|
| They absolutely were, which is the tragedy of the whole thing,
| people absolutely loved their products and strongly preferred
| them to everything else on the market.
|
| Instead of recognizing the game changer that the iPhone was
| they slept on the market and didn't do much to bring big touch
| screens and rich internet applications to their platform.
|
| It was a slow and agonizing death.
| asveikau wrote:
| > Instead of recognizing the game changer that the iPhone was
| they slept on the market
|
| But they made that error long before buying QNX. By the time
| they bought QNX they had probably lost too much ground to
| turn it around.
| fouc wrote:
| I've always thought QNX 4.24 w/ Photon microGUI should've been
| fully open sourced, even a decade later. It would've been
| competitive with linux in the desktop OS arena.
| mavhc wrote:
| The only way a second standard can survive is by being open
| source. A 3rd? Very unlikely
| kragen wrote:
| it's unfortunate that qnx wasn't open-source; revoking source
| code access would have been impossible
| arsome wrote:
| I actually ran across this issue myself, SIGQUIT'd the process,
| loaded it into a debugger and found the exact same problem. I can
| confirm the problem still exists on QNX 7.1. Fortunately we were
| moving off it, so I didn't think much more about it, but glad
| someone wrote it up.
| banish-m4 wrote:
| What I like about seL4 (although not a complete embedded dev
| platform) is formal verification. QNX might have EAL4 in some
| configurations, but like most every other operating system on the
| planet, they haven't bothered to up their game by formally
| verifying it for correctness. This is a shame and entirely
| preventable with greater attention to testing and verification.
| kragen wrote:
| what sel4 shows is that it's entirely preventable with a
| rewrite from scratch by a team of formal-methods ph.d.s over
| many years, if they invent a design that allows the kernel to
| only be a few thousand lines of code so that the titanic effort
| of formal verification becomes feasible, barely. it's not
| something you can do with a codebase of millions of lines of
| code or a codebase that wasn't written from scratch with formal
| verification in mind. yet, anyway
___________________________________________________________________
(page generated 2024-06-30 23:00 UTC)