hngopher.com

       [HN Gopher] How to avoid a BSOD on your 2B dollar spacecraft
       ___________________________________________________________________
        
       How to avoid a BSOD on your 2B dollar spacecraft
        
       Author : linebeck
       Score  : 172 points
       Date   : 2024-09-25 18:40 UTC (1 days ago)
        
 (HTM) web link (clarkwakeland.com)
 (TXT) w3m dump (clarkwakeland.com)
        
       | sharpshadow wrote:
       | One must have balls of steel to run windows on a spaceship.
        
         | perching_aix wrote:
         | Personally, I wouldn't be stoked to run Linux on them either to
         | be honest. But both are being done. Practicality rules I
         | suppose.
        
           | anonzzzies wrote:
           | Having source code to everything would make me trust things
           | more as at least we could fix things without calling MS.
           | 
           | But what would you run? QNX? BSD?
        
             | withinboredom wrote:
             | Just because you have the source code, doesn't mean you
             | have the knowledge to fix it before you die.
        
               | anonzzzies wrote:
               | Sure, but without it, you stand no chance at all.
        
               | exe34 wrote:
               | windows is source-available if you have deep enough
               | pockets:
               | https://www.microsoft.com/en-us/sharedsource/enterprise-sour...
        
               | anonzzzies wrote:
               | Anyway, I was responding to someone who wouldn't run
               | either on a spaceship so I still would like to know what
               | they would want to run. I am from a formal verification
               | school of thought, so I would want something sel4.
        
               | exe34 wrote:
               | I'd want manual backup pushbuttons.
        
               | foobar1962 wrote:
               | And another box close enough so the CD tray can press
               | them when it opens.
        
               | 0cf8612b2e1e wrote:
               | Does Microsoft also provide you the tools to build it? I
               | assume there are many Microsoft internal tools,
               | libraries, etc required to compile anything of note.
               | Presumably it has been dogfooded for so long it would be
               | impossible to bootstrap without some number of binary
               | artifacts in hand.
        
               | withinboredom wrote:
               | I feel like this misses the point so much that it might
               | as well be nonsense.
               | 
               | A better way to put my argument is: could an average mom
               | build Linux on specialized hardware in space? If the
               | answer is "yes", then you may have a point.
               | 
               | I don't think the answer is yes.
        
               | johnisgood wrote:
               | No, but you can review the source code before (or at any
               | point), whereas with Windows, you cannot even do that.
        
               | withinboredom wrote:
               | I've occasionally worked on drivers for windows and
               | linux. In either case, I didn't really need to read the
               | source code; neither was it a valuable proposition. If
               | the advertised API didn't do what it was supposed to, I
               | likely wouldn't have understood enough to fix it: and
               | this is my point.
               | 
               | Just because you can read it, doesn't mean you can or
               | will be able to actually fix it; not because of
               | technicality, but because of personal knowledge.
               | 
               | In this case, they are both black boxes.
        
               | johnisgood wrote:
               | I mean, I agree, I am just saying that it is better (in
               | general) to have the source than not having it.
        
               | withinboredom wrote:
               | How so?
               | 
               | I once spent three days trying to figure out an issue,
               | stepping line by line through hadoop (after figuring out
               | the issue was in hadoop and not my own code). Yay, I
               | proved the issue was actually in Java itself. Guess what
               | happened next? We avoided the bug. Why?
               | 
               | - We couldn't update Java.
               | 
               | - We couldn't change hadoop because we were using a
               | packaged solution. So, we just filed a bug with them.
               | 
               | Had the source not been available, we would have just
               | skipped all of that, and it would have been our vendor's
               | problem 3 days earlier.
        
               | icehawk wrote:
               | For something like the spacecraft in the article, you
               | absolutely would have the ability to get access to the
               | Windows source code.
        
               | exe34 wrote:
               | windows is source-available if you have deep enough
               | pockets:
               | https://www.microsoft.com/en-us/sharedsource/enterprise-sour...
        
               | johnisgood wrote:
               | I have not heard of anyone building their own, custom
               | Windows though, how common is it? I do not see Windows
               | forks around either (I get it, it would not be legal).
        
               | pjmlp wrote:
               | What do you think some OEMs do with Windows on their
               | custom deployment devices?
               | 
               | This happens less nowadays because of stuff like Android
               | and ChromeOS: why pay Microsoft when free-beer OSes
               | exist?
        
             | GlenTheMachine wrote:
             | VxWorks, usually
        
             | redleader55 wrote:
             | If you ever count how many patches are in between stable
             | point releases - eg. 6.10.5 to 6.10.6, you'll see having
             | the source code is not enough. All of these patches are
             | fixes for something or another, not features, but fixes.
             | 
             | If you look at an LTS branch, you'll see there are hundreds
             | of point releases. Usually a point release is created once
             | every 5-10 days. I interpret that to mean bugs were not
             | found until many weeks after the LTS branch was cut.
             | Obviously, not all of them affect you, but many patches
             | apply to important subsystems which affect you.
        
             | ssrc wrote:
             | VxWorks, LynxOS or RTEMS. RTEMS is open source.
        
               | inamberclad wrote:
               | +1 for RTEMS! Great fully featured RTOS
        
               | morcheeba wrote:
               | +1 for RTEMS, another happy user!
        
               | BSDobelix wrote:
               | RTEMS is great, especially impressed with the reliability
               | and performance of the RFS Filesystem:
               | 
               | https://gedare.github.io/pdf/agarwal_comparison_2019.pdf
        
             | perching_aix wrote:
             | Maybe sel4? I'm not in embedded so maybe I'm being pretty
             | silly here, but I think being formally proven and realtime
             | are pretty good things to have in the case of an expensive,
             | long lived space project.
             | 
             | That said, this is on the scale that I'm surprised off-the-
             | shelf things are even being considered. I'd have thought
             | they'd just roll something bespoke and that's the end of
             | that.
        
           | wubrr wrote:
           | What's the practical benefit of using Windows here?
        
             | wil421 wrote:
             | The new galactic empire has an enterprise licensing
             | agreement with Microsoft.
        
             | jandrese wrote:
             | Easier to find developers who are comfortable with Windows.
        
               | wubrr wrote:
               | Probably not the case in spacecraft engineering.
               | 
               | https://news.ycombinator.com/item?id=41650534#41651310
        
             | gosub100 wrote:
             | Maybe technically you could save $89 and run pirated
             | windows because of (possibly?) no jurisdiction in space?
        
           | jmclnx wrote:
           | Really, with Linux (or a BSD), you can make tweaks for free
           | to save memory and to only allow specific tasks and hardware.
           | Plus you could publish these changes and maybe they will be
           | accepted by upstream.
           | 
           | With Windows, you need to beg and pay Microsoft for
           | customizations and hope these changes will not cause other
           | issues.
           | 
           | Plus, most space projects are on tight and limited budgets
           | where management would rather spend on hardware than
           | software.
        
         | dylan604 wrote:
         | Well, if your account is only a Home Edition, you will not get
         | the same support as if you upgrade to Universal Galactic
         | Edition which has a LTS measured in generations.
        
           | freedomben wrote:
           | Also, if you want RDP then I think you need to upgrade to
           | Universal Galactic Edition
        
             | pjmlp wrote:
             | Only if the ground control station has more than two
             | persons.
        
         | hypeatei wrote:
         | Wonder if it's LTSC or the standard image with candy crush,
         | windows store, and Xbox game app bloat?
        
           | lpribis wrote:
           | Don't know if it's publicly available, but MS makes Windows
           | IoT which is a stripped down distro for embedded systems.
        
         | geepytee wrote:
         | Genuine question, what would you use instead and why?
        
           | MPSimmons wrote:
           | Depending on the complexity of the satellite, a linux node,
           | several linux nodes, or a LOT of linux nodes. Or a very small
           | embedded SoC if your satellite is very simple or has a
           | segregated payload.
        
           | jasonwatkinspdx wrote:
           | Real Time Operating System(s) like VxWorks.
           | 
           | Commodity equipment running specialized distros of Linux is a
           | growing thing however.
        
             | MPSimmons wrote:
             | Linux just mainlined the real-time (PREEMPT_RT) patch set
             | that a lot of people have used. Including SpaceX -
             | https://ntrs.nasa.gov/api/citations/20200002390/downloads/20...
             | 
             | Linux RT merge:
             | https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/lin...
        
             | gosub100 wrote:
             | What parts would actually need realtime response? Only
             | thing I can think of would be thrusters, but wouldn't that
             | be solved by an asic? Essentially a network command where
             | you say "I don't care specifically when you do this, but
             | I'm requesting a correction of M-magnitude in this
             | 3-vector. If I don't get an ack within 500ms cancel. "
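
               A minimal sketch of the "ack within 500ms or cancel"
               command described above. The names `CorrectionCommand`
               and `resolve` are hypothetical, timestamps are passed in
               explicitly instead of read from a clock, and this only
               illustrates the request/ack/cancel idea; it is not how any
               real spacecraft bus works.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class CorrectionCommand:
    """Hypothetical thruster request: a direction 3-vector plus a magnitude."""
    direction: Tuple[float, float, float]
    magnitude: float
    sent_at_ms: int
    ack_timeout_ms: int = 500  # the 500 ms figure comes from the comment above


def resolve(cmd: CorrectionCommand, ack_at_ms: Optional[int]) -> str:
    """Return 'executing' if the ack arrived within the timeout, else 'cancelled'.

    ack_at_ms is None when no ack was ever received.
    """
    if ack_at_ms is not None and ack_at_ms - cmd.sent_at_ms <= cmd.ack_timeout_ms:
        return "executing"
    return "cancelled"


cmd = CorrectionCommand(direction=(0.0, 1.0, 0.0), magnitude=0.25, sent_at_ms=1000)
on_time = resolve(cmd, ack_at_ms=1200)   # ack after 200 ms
too_late = resolve(cmd, ack_at_ms=1501)  # ack after 501 ms
never = resolve(cmd, ack_at_ms=None)     # no ack at all
```

               Note that this only bounds how long the requester waits;
               it says nothing about when the thruster actually fires,
               which is the timing concern raised in the reply below.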
        
               | jasonwatkinspdx wrote:
               | Navigation / attitude control needs to be real time. Even
               | if there's a microcontroller at the device, the guidance
               | system receives telemetry and then sends commands. If the
               | send is arbitrarily delayed then the resulting motion
               | won't be what the system desired. The microcontroller
               | can't do the complex calculations on its own because it
               | doesn't have all the telemetry from multiple sensors.
               | 
               | Also, RTOSes tend to have a lot of verification work applied
               | so that you can be confident some facilities just won't
               | fail.
        
           | scottyah wrote:
           | I'm forgetting the name of it, but there's a special OS
           | designed for space. The main issue is that bits get flipped
           | all the time once you leave the protection of the
           | magnetosphere. On the surface of the earth, we generally
           | trust memory a lot more than you can in space, even with
           | special chips and shielding. It causes all sorts of weird
           | problems and slow-downs.
        
           | cogman10 wrote:
           | If I had barrels full of money to waste, probably a
           | microkernel architecture like fuchsia [1]. The barrels full
           | of money would be turning it into a real time OS. The benefit
           | of such an OS is if there's a bug in the drivers, the kernel
           | itself keeps on plugging along, it can dump and reload the
           | misbehaving driver without crashing.
           | 
           | [1] https://fuchsia.dev/
        
         | wongarsu wrote:
         | The article doesn't actually involve Windows. They wanted to
         | avoid a satellite going into safemode, which they describe as
         | "the satellite equivalent of a blue screen of death". That's
         | honestly not even a good analogy. The headline is just bad
        
         | hunter2_ wrote:
         | For anyone reading only headlines and comments, from TFA:
         | 
         | > If the watchdog timer has not been restarted and instead
         | times out after ~30 seconds, the satellite enters something
         | called safemode. Safemode is when all non critical functions
         | are automatically shut down and the satellite becomes entirely
         | focused on generating power by pointing its solar panels
         | towards the Sun and trying to reestablish any communication
         | that was lost. It's a state the vehicle goes into when
         | something bad happens [...] the satellite equivalent of a blue
         | screen of death.
         | 
         | If only Windows would be so kind!
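
         A minimal sketch of the watchdog-to-safemode behavior in the
         quote above, as a toy simulation. The ~30 second timeout and
         the "keep power and comms, drop everything else" behavior come
         from the quote; the class names, the function list, and the
         one-tick-per-second loop are assumptions for illustration only.

```python
class Watchdog:
    """Toy watchdog: it must be kicked before timeout_s elapses."""

    def __init__(self, timeout_s: float = 30.0):
        self.timeout_s = timeout_s
        self.last_kick = 0.0

    def kick(self, now: float) -> None:
        self.last_kick = now

    def expired(self, now: float) -> bool:
        return now - self.last_kick >= self.timeout_s


class Satellite:
    """Hypothetical vehicle with a nominal mode and a safemode."""

    def __init__(self):
        self.watchdog = Watchdog()
        self.mode = "nominal"
        self.active = {"payload", "power", "comms"}

    def tick(self, now: float, software_healthy: bool) -> None:
        if self.mode == "safemode":
            return  # stays in safemode until the ground intervenes
        if software_healthy:
            self.watchdog.kick(now)  # healthy software restarts the timer
        elif self.watchdog.expired(now):
            self.enter_safemode()

    def enter_safemode(self) -> None:
        # Shut down non-critical functions; keep power generation and comms.
        self.mode = "safemode"
        self.active = {"power", "comms"}


def simulate(hang_at_s: int, end_s: int):
    """Run one tick per second; return the time safemode was entered, or None."""
    sat = Satellite()
    for now in range(end_s):
        sat.tick(now, software_healthy=now < hang_at_s)
        if sat.mode == "safemode":
            return now
    return None


entered_at = simulate(hang_at_s=10, end_s=120)
```

         In this sketch the last kick happens at t=9, so the timer
         runs out 30 seconds later. The vehicle keeps running in a
         reduced mode rather than halting.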
        
         | neuralRiot wrote:
         | Installing update 1 of 456. Please don't turn off your
         | spaceship.
        
         | lisper wrote:
         | They're not running Windows.
         | 
         | https://news.ycombinator.com/item?id=41651715
        
         | dekhn wrote:
         | 1998: Windows 95 on Thinkpads (not running system-critical)
         | https://www.nytimes.com/1998/11/05/technology/laptops-on-the...
         | 
         | Detailed writeup by project manager:
         | https://forum.nasaspaceflight.com/index.php?topic=27043.0
         | (doesn't mention the OS)
         | 
         | Some indication the laptops on ISS run linux on HP ZBooks:
         | https://training.linuxfoundation.org/solutions/corporate-sol...
        
       | GlenTheMachine wrote:
       | There are a bunch of comments here asking why one would run
       | Windows on a spacecraft.
       | 
       | I am a spacecraft engineer. I don't see anything in the linked
       | article indicating that they are actually running Windows - the
       | BSOD claim is tongue-in-cheek, or at least that's how I read it.
       | I also don't know of anyone anywhere that runs Windows on a
       | spacecraft, with the exception of laptops used by astronauts.
       | Typically one runs vxWorks, or maybe QNX. Some experimental (high
       | risk, low cost) systems run Linux. Older spacecraft don't run any
       | OS at all, everything is running on bare metal, and that may be
       | true for a handful of current spacecraft as well.
       | 
       | Windows is used in some places by ground controllers, but these
       | days they tend to be running Linux a lot more often.
        
         | TrueDuality wrote:
         | Seconding the vxWorks and bare metal. Never seen Windows or
         | Linux on a satellite bus. Haven't really touched payloads but
         | I've seen some wonky things shipped to orbit by universities
         | and not all of them have been cubesat student projects.
        
           | nicce wrote:
           | Every Starlink runs with Linux.
           | 
           | The license list is a bit long:
           | 
           | https://www.starlink.com/assets/pdfs/Starlink-Open-Source-Co...
        
             | MichaelZuo wrote:
             | It's surprising they didn't pick a RTOS.
        
               | gosub100 wrote:
               | Why? What benefit would that give them?
        
               | karlgkk wrote:
               | I can't tell if this is a serious comment or not, but they
               | need an OS with realtime guarantees. They have claimed to
               | use linux in a rtos configuration, also probably have a
               | redundancy and voting-based failover system.
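
               A minimal sketch of what voting across redundant
               computers can look like in general. The comment above
               only speculates that such a system exists; the function
               name `majority_vote` and the three-replica setup are
               assumptions, not a description of any real flight
               software.

```python
from collections import Counter


def majority_vote(outputs):
    """Return the value reported by a strict majority of replicas, else None."""
    if not outputs:
        return None
    value, count = Counter(outputs).most_common(1)[0]
    return value if count > len(outputs) / 2 else None


# Three hypothetical flight computers compute the same command; one disagrees
# (for example after a radiation-induced bit flip).
agreed = majority_vote([42, 42, 7])
# No two replicas agree, so there is no trustworthy answer.
no_quorum = majority_vote([1, 2, 3])
```

               With three replicas, one faulty output is outvoted; when
               no majority exists, the caller has to fall back to some
               other recovery path.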
        
               | gosub100 wrote:
               | I can't tell if this is a serious answer to the question
               | or not, as you have answered it in terms of the question:
               | "why the need for RTOS? -> because they need RTOS". Stop
               | wasting time with replies like this that add no value.
        
               | karlgkk wrote:
               | They did.
        
               | MichaelZuo wrote:
               | How? That was a very recent change to linux.
        
               | wrs wrote:
               | It was merged to _mainline_ Linux last week. The patches
               | have been usable for many years if you compile a custom
               | kernel.
        
             | cobalt wrote:
             | that doesn't state what uses linux, angular.io is also on
             | there
        
             | mrpippy wrote:
             | That's likely software for the receiver/router/CPE that
             | customers use, since it's being distributed they have to
             | satisfy license obligations for it.
             | 
             | Software actually running on satellites isn't being
             | distributed, so there's no license obligations there
             | (unless it's AGPL, ha!)
        
               | nicce wrote:
               | It includes that software too, but they have claimed that
               | they use Linux, so I would assume that some software is
               | from the satellite itself as well. It is still not clear
               | "what" obligations you have for a piece in the space.
        
               | lofaszvanitt wrote:
               | Yeah, why would a satellite have angular...
        
           | inamberclad wrote:
           | I've used Linux on the payload processor computer of a
           | spacecraft so I know it happens :P
           | 
           | I've also worked with a payload running Windows Embedded.
        
           | jdiez17 wrote:
           | Linux on a satellite bus is definitely up and coming. For
           | example ESA's OPS-SAT ran the NanoSat MO Framework
           | (https://nanosat-mo-framework.github.io/), written in Java
           | and running on an ARM Linux OBC. It also makes a lot of sense
           | for payload computers.
           | 
           | I'm working on a Linux distribution for space applications
           | that addresses some of the pain points (application
           | deployment, software update & memory-safe implementations of
           | typical space protocols like CCSDS/PUS). If all goes to plan
           | it will fly on a 1U CubeSat tech demonstrator this year, a
           | cybersecurity research 1U CubeSat next year, and a 2U high
           | performance satellite ... later. :)
        
         | zanthras wrote:
         | Linux (with the realtime patch) is used very heavily in
         | spacecraft by SpaceX. So both in terms of high
         | visibility/importance/danger (Dragon 2) and high count
         | (Starlink) it is very widely used.
         | 
         | citation
         | https://old.reddit.com/r/spacex/comments/ncj4vz/we_are_the_s...
        
           | XorNot wrote:
           | I wonder how the integration of PREEMPT_RT is going to affect
           | that technology stack going forwards (I imagine slowly, but
           | it's there now).
        
             | yndoendo wrote:
             | Either save costs by integrating with the new mainline
             | feature, or keep paying the increasing cost of maintaining
             | a custom kernel branch in the long run.
        
             | chupasaurus wrote:
             | It was merged to mainline a week ago.
        
           | bboygravity wrote:
           | And apparently the astronaut !touchscreen! GUI is written in
           | Javascript (not a joke).
        
             | DrammBA wrote:
             | > And apparently the astronaut !touchscreen! GUI is written
             | in Javascript (not a joke).
             | 
             | why would that be mistaken for a joke?
        
               | vasco wrote:
               | Because they don't want people confusing the launch pad
               | with left-pad and having the spaceship slowing down
               | because of Facebook Like button embeds.
        
               | weard_beard wrote:
               | Attempt to give a serious answer: Javascript, as a
               | language, has some bizarre return types that can make the
               | kind of thorough testing required in spaceflight
               | difficult. (See: https://github.com/denysdovhan/wtfjs) It
               | also has a reputation, like PHP, as being utilized by
               | inexperienced programmers who write poorly structured,
               | poorly tested, bloated, and slow code that often crashes
               | and fails. If you want fast, lightweight, reliable, well
               | structured, testable, and ultimately very stable code
               | Javascript would seem to be a poor choice within the
               | parameters required for a space vehicle.
               | 
               | (Maybe this is a good place to ask, anyone have a
               | recommendation for static testing of JS?)
        
               | electrosphere wrote:
               | I believe it's because in JavaScript some values or
               | expressions are "truthy" or "falsey" depending on how
               | they are evaluated.
        
             | cliff wrote:
             | I think I was the person who originally proposed to
             | implement the crew control UI in a web browser, and I
             | participated in a week-long retreat in beautiful Bend,
             | Oregon where we implemented the first prototype.
             | 
             | At the time, some very good flight software engineers had
             | been working diligently on a new UI framework that was
             | written in the same code style and process as the rest of
             | our flight software. However, I noticed a classic problem -
             | we were working on the UI platform at the same time that we
             | were trying to design and prototype the actual UI.
             | 
             | I made some observations:
             | 
             | 1) We can create a prototype right now in Chrome, with its
             | incumbent versatility.
             | 
             | 2) The chip running the UI can actually reasonably run
             | Chrome.
             | 
             | 3) Web browsers are historically known for crashing, but
             | that's partly because they have to handle every page on the
             | whole Internet. A static system with the same browser
             | running a single website, heavily tested, may be reliable
             | enough for our needs.
             | 
             | 4) We can always go back and reimplement the UI on top of
             | the space-grade UI platform, and actually it'll be a lot
             | easier because we will know exactly what functionality we
             | need out of that platform.
             | 
             | The prototype was a great success; we were able to
             | implement a lot of interesting UI in just a week.
             | 
             | I left SpaceX before Crew Dragon launched, so I'm not sure
             | what ended up launching or what the state of affairs is
             | today. I remember hearing some feedback from testing
             | sessions that the astronauts were pleasantly surprised when
             | we were able to live edit a button when they commented it
             | was too hard to reliably press it with their gloved finger.
             | 
             | As for reliability, to do a fair analysis you need to
             | understand the requirements of the mission. Only then can
             | you start thinking about faults and how to mitigate them.
             | This isn't like Apollo where the astronauts had to
             | physically reconfigure the spacecraft for each phase of the
             | mission -- to an exceptionally large extent, Dragon flies
             | itself. As a minor example of systemic fault tolerance,
             | each display is individually controlled by its own
             | processor. If a display fails, whether due to Chrome or
             | cosmic radiation, an astronaut can simply use a different
             | display.
             | 
             | Also, as a side note regarding "touchscreens" -- I believe
             | some (very important) buttons did launch with Crew Dragon,
             | but buttons and wiring are heavy, and weight is the enemy.
             | If you're going to have a screen anyways, making it a
             | touchscreen adds relatively trivial weight.
        
               | tobylane wrote:
               | When would the Chrome version be frozen? Once you've
               | completed the UI?
        
               | jve wrote:
               | So the implementation speed came solely from developer
               | experience and not from someone pushing this custom UI
               | framework implementation aside?
        
               | ramesh31 wrote:
               | >I think I was the person who originally proposed to
               | implement the crew control UI in a web browser, and I
               | participated in a week-long retreat in beautiful Bend,
               | Oregon where we implemented the first prototype.
               | 
               | Please tell me you have a blog
        
               | telotortium wrote:
               | https://hnrss.org/user?id=cliff
        
               | doctorpangloss wrote:
               | Why doesn't anyone at Boeing make these observations? I
               | don't think anyone needs to be persuaded that a browser
               | is a good UI middleware.
        
               | rblatz wrote:
               | I suspect that Boeing has a lot of momentum and the
               | risk/reward for pushing an initiative like that doesn't
               | make sense in that org.
        
               | mixmastamyk wrote:
               | At a minimum it should use typescript, no? Also web pages
               | get out of sync sometimes, and need to be reloaded, which
               | doesn't sound great for mission critical reliability.
               | Compiled, typed UI lib sounds like a better fit.
        
               | bearjaws wrote:
               | Wow a real SWE showing up and explaining how you can
               | actually approach a real problem using a browser. Instead
               | of just going "well, it's not compiled so clearly it will
               | just randomly explode".
               | 
               | I am always amazed how HN doesn't realize many mission &
               | life critical systems are powered by JS - especially as a
               | front end through a browser.
        
         | lisper wrote:
         | The author of TFA clarifies here:
         | 
         | https://news.ycombinator.com/item?id=41651715
         | 
         | TL;DR: the spacecraft is indeed not running Windows. It's
         | running a custom OS written in C.
        
         | yashap wrote:
         | Indeed, and it's clearly stated in the article:
         | 
         | > Safemode is the satellite equivalent of a blue screen of
         | death.
         | 
         | It's about avoiding safemode, and more generally about the end-
         | to-end QA/testing process for satellites before they're sent up
         | into orbit. It's very clearly not about actual Windows BSODs,
         | it's just written in a tongue-in-cheek style. Those commenting
         | about "wtf windows on a spacecraft" clearly didn't read the
         | article, just read the title.
         | 
         | FWIW I found the writing style engaging and the content
         | interesting. I guess the title is a little click-bait-y, but
         | not in a way that I minded much, and I probably wouldn't have
         | read an article titled "How to avoid safemode on a satellite."
         | It's a fine line, but titles DO have to draw you in, otherwise
         | you'll never read the article.
         | 
         | Re: the article itself, I did think it was pretty wild that
         | customers have to be informed of every incident where a
         | satellite flips into safemode in TESTING! In real operations,
         | sure, but in testing, that's wild. Feels like having to report
         | bugs caught in my local dev environment, that were never
         | deployed to prod.
        
           | GlenTheMachine wrote:
           | This would be during formal testing, which is similar to what
           | you might know as "acceptance testing". The spacecraft
           | doesn't "enter safe mode" during development.
           | 
           | If you're paying two billion $ for something you become very
           | very interested in test design and test results.
           | 
           | Also, safe mode isn't really the same as a BSOD. It's a mode
           | where the spacecraft decides something is wrong and disables
           | a lot of functionality and focuses on pointing the solar
           | panels at the sun and the antennas at the ground. It does not
           | cease functioning - if that happens, you've probably lost
           | your spacecraft. It is therefore VITALLY IMPORTANT that safe
           | mode works, and a smart program manager tests the hell out of
           | it.
        
             | Brian_K_White wrote:
             | So it's a bsod that switches to safe mode instead of
             | halting.
             | 
             | We already got that it's not actually Windows and so not
             | literally identical to bsod in every detail.
             | 
             | It's not the same as a common os safe mode either because
             | it happens by itself as the last resort response to a
             | problem, like a bsod. Not just on command.
        
           | sandworm101 wrote:
           | >> customers have to be informed of every incident where a
           | satellite flips into safemode in TESTING!
           | 
           | Because the customers are almost certainly running their own
           | metrics, tracking failure rates over time. An increasing rate
           | of failures across a program is probably a sign of something
           | going wrong at a higher level. Remember too that there is
           | "testing" and _testing_. One is you playing around with the
           | software at your workstation, the other is the more formal
           | testing as monitored by the acceptance and standards people.
        
         | firecall wrote:
         | So could this finally be the year of Linux on the Moon? ;-)
        
           | Fnoord wrote:
           | Which OS do the Mars rovers run?
        
             | GlenTheMachine wrote:
             | vxWorks
        
               | garaetjjte wrote:
               | The helicopter ran Linux.
        
         | freedomben wrote:
         | I worked on UAVs in the late 00s and early 10s and it was all
         | VxWorks as well. We were playing around with some embedded
         | Linux but it wasn't used on anything "serious," despite some of
         | our dreams.
        
         | rob74 wrote:
         | Well, it's a fair question, because the article jumps right
         | into extremely detailed specifics like "Closed Loop Tests" that
         | probably only people working in this domain are familiar with,
         | without first making clear _what exactly it's talking about_.
         | "The lifecycle of most spacecraft consists of a final phase" -
         | as someone with only a cursory knowledge of spacecraft, at this
         | point I assumed he would be talking about deorbiting?
        
         | TacticalCoder wrote:
         | > There are a bunch of comments here asking why one would run
         | Windows on a spacecraft.
         | 
         | Because TFA is highly misleading
         | 
         | > I don't see anything in the linked article indicating that
         | they are actually running Windows
         | 
         | TFA literally begins with a picture of a Windows BSOD with a
         | Windows error message.
        
         | metadat wrote:
         | Is there any open-source equivalent to QNX?
        
       | farceSpherule wrote:
       | Or you can avoid contracting with Boeing.
        
       | rdist wrote:
       | And here I thought we were going to rehash Crowdstrike ;-)
        
         | TrueDuality wrote:
         | Just a tactful reference hahah
         | 
         | > the US government isn't burning taxpayer dollars on a ten
         | figure spaceship just to have us push a Crowdstrike update on
         | it.
        
         | gosub100 wrote:
         | I know you kid, but theoretically running crowdstrike-
         | susceptible windows on a spacecraft would work fine (at least I
         | claim so), because you'd need a robust backdoor / OOB into it
         | anyway. (And I'm no windows fanboy, I hate them just as much as
         | the next guy.) A Crowdstrike bug would cause an N-day loss of
         | comms just like a thousand other things they plan for in a
         | spacecraft.
        
       | linebeck wrote:
       | Author here: I should clarify the satellite is not running
       | Windows. Instead, it's running its own custom OS written in C
       | called Flight Software (FSW) specifically designed for the
       | satellite onboard computer.
       | 
       | Re-reading the post, I see how the title, my analogies, and poor
       | attempts at humor would give an incorrect impression of what's
       | happening with the satellite when it enters safemode. I'll amend
       | the post soon.
       | 
       | Thanks for the feedback, I'll be better next time.
        
         | barbegal wrote:
         | Could I ask you to clarify why avoiding safemode is so
         | important? In a non satellite system safemode means everything
         | is driven to a safe state which is fine during testing in the
         | lab.
         | 
         | Also do you not run these tests in an even more simulated
         | environment where there is only the flight computer and no real
         | hardware at all?
        
           | linebeck wrote:
           | Having discussed this same question with the more experienced
           | members of my team, the only conclusion I can draw is that
           | the customer (US Government) is incredibly risk averse. Any
           | unexpected entry into safemode would require a report,
           | multiple meetings with the customer, and them being pretty
           | angry. Their line of reasoning seems to be
           | "Safemode->Something is wrong->Why is something wrong? We're
           | not paying you to be wrong". I'm personally of the opinion
           | that safemode isn't that bad. It's fully recoverable and
           | shows the system is working properly.
           | 
           | We normally have a Functional Test Assembly (real computer
           | and some other hardware for testing) to run our tests
           | against, but we only have one setup and it is consistently
           | unreliable. This particular CLT was unable to get a clean run
           | in the lab but it was decided that the issues were related to
           | the lab setup rather than the actual test, so we moved
           | forward to run on the satellite (against our team's
           | protests).
           | 
           | This to me is the real crux of the issue: if we can't even
           | trust our own testing environment, what's the point of having
           | it at all? If the customer is so risk averse, why would we
           | take this chance? Needless to say, I don't think we'll be
           | running anything on the satellite without full FTA vetting
           | anytime in the near future.
        
             | Jtsummers wrote:
             | > Any unexpected entry into safemode would require a
             | report, multiple meetings with the customer, and them being
             | pretty angry. Their line of reasoning seems to be
             | "Safemode->Something is wrong->Why is something wrong?
             | We're not paying you to be wrong". I'm personally of the
             | opinion that safemode isn't that bad. It's fully
             | recoverable and shows the system is working properly.
             | 
             | To the last part first: Good that safe mode kicked in and
             | did the right thing, but now what? What _caused_ it to
             | enter safe mode in the first place?
             | 
             | That's why they care when it happens. If they don't know
             | why it's entering safe mode, they can't correct the actual
             | problems in the system.
        
               | axus wrote:
               | "Safemode is when all non critical functions are
               | automatically shut down and the satellite becomes
               | entirely focused on generating power by pointing its
               | solar panels towards the Sun and trying to reestablish
               | any communication that was lost."
               | 
               | The non-critical functions are all the things the
               | customer actually bought the satellite for. Cool that
               | it's still alive, but now the Space Internet / death
               | lasers / etc. are offline.
        
               | linebeck wrote:
               | There are fault IDs that trip if certain telemetry goes
               | outside of a normal range. If a safemode were to occur,
               | we would investigate which faults tripped and at what
               | time, and use those to construct a "story" of what
               | happened on the satellite before it entered safemode.
               | We're also constantly recording every telemetry that
               | comes down, so we could reference any telemetry we wanted
               | as far back as months in the past.
               | 
               | To your point, yes you're correct. The cause of the
               | safemode is much more interesting than the fact we
               | entered it.
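               | 
               | A minimal sketch of the fault-timeline idea above, not the
               | actual ground tooling. The fault IDs, channel names, and
               | limits are hypothetical, and samples are assumed to arrive
               | in time order.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FaultLimit:
    fault_id: str   # hypothetical fault identifier
    channel: str    # hypothetical telemetry channel name
    low: float      # lowest value considered normal
    high: float     # highest value considered normal


def tripped_faults(samples, limits):
    """Return (time, fault_id, value) for the first out-of-range sample
    of each fault, ordered by time. Samples must be in time order."""
    first_trip = {}
    for t, channel, value in samples:
        for lim in limits:
            if lim.channel != channel or lim.fault_id in first_trip:
                continue
            if not (lim.low <= value <= lim.high):
                first_trip[lim.fault_id] = (t, lim.fault_id, value)
    return sorted(first_trip.values())


# Hypothetical limits and recorded telemetry: (time_s, channel, value).
LIMITS = [
    FaultLimit("FLT_BUS_UNDERVOLT", "bus_voltage_v", 26.0, 34.0),
    FaultLimit("FLT_WHEEL_OVERSPEED", "wheel_rpm", -6000.0, 6000.0),
]
SAMPLES = [
    (0.0, "bus_voltage_v", 28.1),
    (1.0, "wheel_rpm", 6200.0),
    (2.0, "bus_voltage_v", 25.2),
    (3.0, "bus_voltage_v", 24.9),
]

timeline = tripped_faults(SAMPLES, LIMITS)
for t, fault_id, value in timeline:
    print(f"t={t:5.1f}s  {fault_id}  value={value}")
```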
        
             | minetest2048 wrote:
             | > We normally have a Functional Test Assembly (real
             | computer and some other hardware for testing) to run our
             | tests against, but we only have one setup and it is
             | consistently unreliable
             | 
             | It's interesting to see that someone with a 2B budget has
             | the same problem as someone with a 5 million budget... we
             | have an engineering model for our cubesats but it's flaky
        
         | topspin wrote:
         | I understood you were using an analogy. Didn't even occur to me
         | that Windows was actually being used.
         | 
         | However, I did come away thinking there are other dysfunctions
         | at play in all of this. Perhaps an excessive amount of wheel
         | re-inventing.
        
         | yashap wrote:
         | I enjoyed the humour, and the content. Personally I wouldn't
         | change it - it's kind of a click-bait title, but I never would
         | have read the article if it had a boring title, and I am glad I
         | read it.
        
         | akira2501 wrote:
         | Can you speak at all as to how the development on this software
         | is done? Is it distributed with centralized version control?
         | Does release and engineering process interact with the version
         | control at all? Are there mechanisms that link defect reports,
         | corrections, and sign offs back to version control and into the
         | build system?
         | 
         | I got lost recently in how the Shuttle software was managed,
         | mostly through IBM mainframes, and z/OS's facilities for all the
         | above. I'm curious how modern development looks in comparison.
        
           | linebeck wrote:
           | FSW development is done by a different team than mine but I
           | believe it's just managed through gitlab. Releases are done
           | through tags, and any updates that need to be made have
           | tickets created for them and are developed by the FSW team.
           | Final approval is given by certified product engineers and
           | then a new tag is created for that release. Like I said this
           | is a different team but from what I've seen the process is
           | fairly modern given how old our hardware is. I'm not sure of
           | the exact process of how it's loaded onto the satellite
           | though.
        
           | jdiez17 wrote:
           | > I got lost recently in how the Shuttle software was
           | managed, mostly through IBM mainframes, and z/OS's facilities
           | for all the above. I'm curious how modern development looks
           | in comparison.
           | 
           | Do you have any references for this? I also recently went
           | down a research rabbit hole of the history of computing on
           | Earth and in space - super interesting stuff. And the
           | parallels are quite obvious when you look at it.
        
         | wrs wrote:
         | Technical blog pro tip: Assume that many of your readers are
         | VERY literal-minded, and many of your other readers like their
         | humor obscure and as deadpan as possible. Sorry.
        
       | dangoodmanUT wrote:
       | Step 1: Use linux
        
         | ksajh wrote:
         | Step 1: Read and understand the article
        
         | imoverclocked wrote:
         | Step 2: install vxworks
        
       | jesprenj wrote:
       | Was the spacecraft from the event described in the article an
       | actual spacecraft in space or a simulation of a space mission on
       | the ground?
        
         | MadnessASAP wrote:
         | A simulation of a space mission on the ground with a satellite
         | that will eventually be in space.
         | 
         | Take your satellite, replace its navigation/communication
         | inputs with ones generated by a reasonably high fidelity real
         | time physical simulation. Feed its outputs back into said
         | simulation. Ensure the satellite does the right things at the
         | right time.
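         | 
         | A toy sketch of that loop, assuming a one-axis pointing problem.
         | The controller stands in for the unit under test; its gains, the
         | unit-inertia dynamics, and the pass threshold are all made up
         | for illustration.

```python
def flight_controller(measured_angle_deg, measured_rate_dps, target_deg=0.0):
    """Stand-in for the unit under test: a PD law returning a torque command.
    The gains are illustrative, not taken from any real system."""
    kp, kd = 0.8, 1.6
    return kp * (target_deg - measured_angle_deg) - kd * measured_rate_dps


def run_closed_loop(initial_angle_deg, steps=400, dt=0.1):
    """Simulate the physics, hand simulated sensor values to the controller,
    and feed the controller's output back into the simulation."""
    angle, rate = initial_angle_deg, 0.0
    history = []
    for _ in range(steps):
        # The simulation supplies the inputs the hardware would normally sense.
        command = flight_controller(angle, rate)
        # The unit's output drives the simulated dynamics (unit inertia).
        rate += command * dt
        angle += rate * dt
        history.append(angle)
    return history


history = run_closed_loop(initial_angle_deg=10.0)
# Pass criterion for this toy test: pointing error has settled near zero.
test_passed = abs(history[-1]) < 0.01
print("closed loop test passed:", test_passed)
```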
        
       | LorenPechtel wrote:
       | Why is it using memory-mapped stuff in the first place rather
       | than some sort of messaging system that would allow more
       | defensive programming?
        
         | gavinsyancey wrote:
         | If I were to guess -- At the lowest level, that's what the
         | hardware does. I wouldn't be surprised if they have a library
         | to wrap that with something more friendly, but something has to
         | translate that into writes to memory-mapped addresses, and
         | whatever that is was configured with the wrong addresses...
        
         | jdiez17 wrote:
         | To the shock and horror of many programming-inclined people, it
         | turns out that having "arbitrary memory read/write" commands on
         | a remote computer that *must* keep the mission going is quite
         | useful.
         | 
         | I can tell you a little first-hand account of where this helped
         | a satellite formation flight mission I worked on. The
         | communication system was working fine in terms of signal
         | strength, but many command packets were ignored (no response).
         | We were able to figure out that a message queue in the
         | processing pipeline was considered full, by strategically
         | reading certain memory locations. We then sent a memory patch
         | to the satellite to skip some of the processing steps and this
         | improved the communication system dramatically.
         | 
         | We also brought back to life the first CubeSat launched by my
         | university, BEESAT-1, by using arbitrary memory write to patch
         | some telemetry collection software to avoid a damaged section
         | of the onboard flash. Pretty cool story actually.
        
           | 0xDEAFBEAD wrote:
           | It occurs to me that writing software for spacecraft could
           | demand an entirely different paradigm than writing software
           | for traditional applications.
           | 
           | For example, you could use an OS that is deterministic down
           | to the last detail, and have a "digital twin" / virtual
           | machine of the spacecraft computer here on Earth, kept in
           | sync with all sensor and actuator activity out in space.
           | Before issuing any command to the spacecraft, you branch the
           | digital twin, issue the command on the branch, and make sure
           | everything looks good.
           | 
           | With this method, you wouldn't even need to read memory
           | locations on the spacecraft, you could just read them on the
           | digital twin. Then test the memory patch on the digital twin
           | and make very sure that it won't brick your spacecraft before
           | you transmit it.
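           | 
           | A rough sketch of the branch-then-check step, assuming the
           | twin's state fits in a plain dictionary. The commands, their
           | effects, and the health rule are hypothetical.

```python
import copy


def apply_command(state, command):
    """Hypothetical command model: mutates the given state in place."""
    if command == "HEATER_ON":
        state["battery_pct"] -= 15
    elif command == "PAYLOAD_ON":
        state["battery_pct"] -= 40
        state["payload_on"] = True
    else:
        raise ValueError(f"unknown command: {command!r}")


def is_healthy(state):
    """Hypothetical health rule: keep at least 30% battery in reserve."""
    return state["battery_pct"] >= 30


def vet_command(twin, command):
    """Run the command on a copy of the twin and report whether the
    resulting state is healthy. The twin itself is never modified."""
    branch = copy.deepcopy(twin)
    apply_command(branch, command)
    return is_healthy(branch)


twin = {"battery_pct": 60, "payload_on": False}
for cmd in ("HEATER_ON", "PAYLOAD_ON"):
    verdict = "safe to send" if vet_command(twin, cmd) else "rejected"
    print(cmd, "->", verdict)
```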
        
             | jdiez17 wrote:
             | Interesting idea. If you had control over the whole
             | software stack down to the hardware, you could organize it
             | in such a way that all meaningful mutable state is kept in
             | a contiguous region of memory - and then you could just
             | download or sync that.
             | 
             | I can imagine that these "state dumps" would be quite big
             | though. And there is definitely some hidden state in the
             | hardware blocks themselves (SPI, I2C, whatever...).
        
           | LorenPechtel wrote:
           | Sounds like a very, very useful capability to put in for
           | diagnostic capability, but it's not exactly what I would like
           | for routine operations.
           | 
           | Things happen, and the electronics has to take radiation
           | hits. I'd make the software as defensive as feasible so
           | failures have less potential to cascade.
        
       | joelkevinjones wrote:
       | As much as I hate writing "getter" functions for referencing
       | global variables, I would when I knew I didn't have the right
       | address yet. Write them first to error out loudly, then when you
       | have the actual addresses replace the error out code.
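       | 
       | A small sketch of that pattern, written in Python rather than the
       | C the flight software uses. The variable names and the address
       | are hypothetical placeholders.

```python
class AddressNotAssigned(RuntimeError):
    """Raised when code asks for an address that has not been confirmed."""


# Hypothetical table: None marks an address nobody has confirmed yet.
_ADDRESSES = {
    "wheel_speed_cmd": None,
    "heater_enable": 0x4000_0010,
}


def address_of(name):
    """Return the confirmed address for `name`, or fail loudly.
    A placeholder never silently turns into a real write target."""
    addr = _ADDRESSES[name]
    if addr is None:
        raise AddressNotAssigned(
            f"no confirmed address for {name!r}; refusing to guess"
        )
    return addr


print(hex(address_of("heater_enable")))
try:
    address_of("wheel_speed_cmd")
except AddressNotAssigned as exc:
    print("blocked:", exc)
```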
        
       | jwrallie wrote:
       | I would bet the schedule didn't allow much time for
       | subsystem-level testing with the on-board computer, so everyone
       | went to the big test praying for the best.
       | 
       | That or inexperienced programmers were involved, assuming they
       | were not scared of modifying memory addresses directly.
       | 
       | As for the safe mode, if it happened, maybe you could say you were
       | randomly injecting errors in memory during runtime and the
       | spacecraft entered safe mode as expected. That would not be far
       | off from the truth; just do not mention it was unintended :)
        
       | taspeotis wrote:
       | https://www.fastcompany.com/28121/they-write-right-stuff
        
       | PoignardAzur wrote:
       | > _I think what surprised me the most was how nonchalant the
       | response was. We had documented all of our actions, so other
       | people had read what happened and knew something had gone on. I
       | wasn't expecting any fanfare but we weren't even debriefed on
       | what happened._
       | 
       | That's... Concerning. No root cause analysis? Not even an
       | internal one?
        
       | bronlund wrote:
       | Clickbait. Unlike British missile submarines, they are not using
       | Windows.
        
       | pif wrote:
       | Very simple: just _Write the Right Stuff_!
       | 
       | https://www.eng.auburn.edu/~kchang/comp6710/readings/They%20...
        
       ___________________________________________________________________
       (page generated 2024-09-26 23:02 UTC)