[HN Gopher] How to avoid a BSOD on your 2B dollar spacecraft
___________________________________________________________________
How to avoid a BSOD on your 2B dollar spacecraft
Author : linebeck
Score : 55 points
Date : 2024-09-25 18:40 UTC (4 hours ago)
(HTM) web link (clarkwakeland.com)
(TXT) w3m dump (clarkwakeland.com)
| sharpshadow wrote:
| One must have balls of steel to run Windows on a spaceship.
| perching_aix wrote:
| Personally, I wouldn't be stoked to run Linux on them either to
| be honest. But both are being done. Practicality rules I
| suppose.
| anonzzzies wrote:
| Having source code to everything would make me trust things
| more as at least we could fix things without calling MS.
|
| But what would you run? QNX? BSD?
| withinboredom wrote:
| Just because you have the source code, doesn't mean you
| have the knowledge to fix it before you die.
| anonzzzies wrote:
| Sure, but without it, you stand no chance at all.
| exe34 wrote:
| Windows is source-available if you have deep enough pockets:
| https://www.microsoft.com/en-us/sharedsource/enterprise-sour...
| anonzzzies wrote:
| Anyway, I was responding to someone who wouldn't run
| either on a spaceship so I still would like to know what
| they would want to run. I am from a formal verification
| school of thought, so I would want something like seL4.
| exe34 wrote:
| I'd want manual backup pushbuttons.
| foobar1962 wrote:
| And another box close enough so the CD tray can press
| them when it opens.
| 0cf8612b2e1e wrote:
| Does Microsoft also provide you the tools to build it? I
| assume there are many Microsoft internal tools,
| libraries, etc required to compile anything of note.
| Presumably it has been dogfooded for so long it would be
| impossible to bootstrap without some number of binary
| artifacts in hand.
| withinboredom wrote:
| I feel like this misses the point so much that it might
| as well be nonsense.
|
| A better way to put my argument is: could an average mom
| build Linux on specialized hardware in space? If the
| answer is "yes", then you may have a point.
|
| I don't think the answer is yes.
| johnisgood wrote:
| No, but you can review the source code before (or at any
| point), whereas with Windows, you cannot even do that.
| withinboredom wrote:
| I've occasionally worked on drivers for Windows and
| Linux. In either case, I didn't really need to read the
| source code, nor would it have been a valuable use of time. If
| the advertised API didn't do what it was supposed to, I
| likely wouldn't have understood enough to fix it, and
| that is my point.
|
| Just because you can read it, doesn't mean you can or
| will be able to actually fix it; not because of
| technicality, but because of personal knowledge.
|
| In this case, they are both black boxes.
| johnisgood wrote:
| I mean, I agree, I am just saying that it is better (in
| general) to have the source than not having it.
| withinboredom wrote:
| How so?
|
| I once spent three days trying to figure out an issue,
| stepping line by line through Hadoop (after figuring out
| the issue was in Hadoop and not my own code). Yay, I
| proved the issue was actually in Java itself. Guess what
| happened next? We avoided the bug. Why?
|
| - We couldn't update Java.
|
| - We couldn't change Hadoop because we were using a
| packaged solution. So, we just filed a bug with them.
|
| Had the source not been available, we would have just
| skipped all of that, and it would have been our vendor's
| problem 3 days earlier.
| icehawk wrote:
| For something like the spacecraft in the article, you
| absolutely would have the ability to get access to the
| Windows source code.
| exe34 wrote:
| Windows is source-available if you have deep enough pockets:
| https://www.microsoft.com/en-us/sharedsource/enterprise-sour...
| johnisgood wrote:
| I have not heard of anyone building their own, custom
| Windows though, how common is it? I do not see Windows
| forks around either (I get it, it would not be legal).
| GlenTheMachine wrote:
| VxWorks, usually
| redleader55 wrote:
| If you ever count how many patches are in between stable
| point releases (e.g. 6.10.5 to 6.10.6), you'll see that having
| the source code is not enough. All of these patches are
| fixes for something or another, not features, but fixes.
|
| If you look at an LTS branch, you'll see there are hundreds
| of point releases. Usually a point release is created once
| every 5-10 days. I interpret that to mean bugs were not
| found until many weeks after the LTS branch was cut.
| Obviously, not all of them affect you, but many patches
| apply to important subsystems which affect you.
| ssrc wrote:
| VxWorks, LynxOS or RTEMS. RTEMS is open source.
| wubrr wrote:
| What's the practical benefit of using Windows here?
| wil421 wrote:
| The new galactic empire has an enterprise licensing
| agreement with Microsoft.
| jandrese wrote:
| Easier to find developers who are comfortable with Windows.
| jmclnx wrote:
| Really, with Linux (or a BSD), you can make tweaks for free
| to save memory and to only allow specific tasks and hardware.
| Plus you could publish these changes and maybe they will be
| accepted by upstream.
|
| With Windows, you need to beg and pay Microsoft for
| customizations and hope these changes will not cause other
| issues.
|
| Plus, most space projects are on tight and limited budgets
| where management would rather spend on hardware than
| software.
| dylan604 wrote:
| Well, if your account is only a Home Edition, you will not get
| the same support as if you upgrade to Universal Galactic
| Edition which has a LTS measured in generations.
| hypeatei wrote:
| Wonder if it's LTSC or the standard image with candy crush,
| windows store, and Xbox game app bloat?
| lpribis wrote:
| Don't know if it's publicly available, but MS makes Windows
| IoT which is a stripped down distro for embedded systems.
| geepytee wrote:
| Genuine question, what would you use instead and why?
| MPSimmons wrote:
| Depending on the complexity of the satellite, a linux node,
| several linux nodes, or a LOT of linux nodes. Or a very small
| embedded SoC if your satellite is very simple or has a
| segregated payload.
| jasonwatkinspdx wrote:
| Real Time Operating System(s) like VxWorks.
|
| Commodity equipment running specialized distros of Linux is a
| growing thing however.
| MPSimmons wrote:
| Linux just mainlined the real-time (PREEMPT_RT) patch set that a
| lot of people have used, including SpaceX:
| https://ntrs.nasa.gov/api/citations/20200002390/downloads/20...
|
| Linux RT merge:
| https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/lin...
| scottyah wrote:
| I'm forgetting the name of it, but there's a special OS
| designed for space. The main issue is that bits get flipped
| all the time once you leave the protection of the
| magnetosphere. On the surface of the earth, we generally
| trust memory a lot more than you can in space, even with
| special chips and shielding. It causes all sorts of weird
| problems and slow-downs.
| cogman10 wrote:
| If I had barrels full of money to waste, probably a
| microkernel architecture like Fuchsia [1]. The barrels full
| of money would go into turning it into a real-time OS. The
| benefit of such an OS is that if there's a bug in the drivers,
| the kernel itself keeps on plugging along: it can dump and
| reload the misbehaving driver without crashing.
|
| [1] https://fuchsia.dev/
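|
| A toy Python sketch of the "dump and reload the misbehaving
| driver" idea. Everything here is hypothetical (Supervisor,
| EchoDriver, the "bad" request); it is not Fuchsia's API, and a
| try/except only stands in for the process isolation a real
| microkernel would provide.

```python
class EchoDriver:
    """Hypothetical driver with a bug: it crashes on the request 'bad'."""

    def handle(self, request):
        if request == "bad":
            raise RuntimeError("driver bug")
        return f"ok:{request}"


class Supervisor:
    """Stands in for the kernel side: contains driver failures and reloads."""

    def __init__(self, driver_factory):
        self.driver_factory = driver_factory
        self.driver = driver_factory()
        self.reloads = 0

    def request(self, payload):
        try:
            return self.driver.handle(payload)
        except Exception:
            # Drop the misbehaving driver instance and load a fresh one.
            self.driver = self.driver_factory()
            self.reloads += 1
            return None  # the caller sees one failed request, not a crash


sup = Supervisor(EchoDriver)
results = [sup.request(r) for r in ["a", "bad", "b"]]
```

| The point of the sketch is only that the supervisor survives the
| failing request and keeps serving later ones.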
| wongarsu wrote:
| The article doesn't actually involve Windows. They wanted to
| avoid a satellite going into safemode, which they describe as
| "the satellite equivalent of a blue screen of death". That's
| honestly not even a good analogy. The headline is just bad.
| hunter2_ wrote:
| For anyone reading only headlines and comments, from TFA:
|
| > If the watchdog timer has not been restarted and instead
| times out after ~30 seconds, the satellite enters something
| called safemode. Safemode is when all non critical functions
| are automatically shut down and the satellite becomes entirely
| focused on generating power by pointing its solar panels
| towards the Sun and trying to reestablish any communication
| that was lost. It's a state the vehicle goes into when
| something bad happens [...] the satellite equivalent of a blue
| screen of death.
|
| If only Windows would be so kind!
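|
| For anyone who wants the watchdog behaviour from that quote in
| concrete terms, here is a minimal Python sketch. The class,
| method, and function names are all made up for illustration, the
| 30 s figure is the article's "~30 seconds", and the real flight
| software is reportedly custom C, so treat this as a model of the
| idea rather than how the satellite implements it.

```python
from dataclasses import dataclass, field

WATCHDOG_TIMEOUT_S = 30.0  # assumption: the article only says "~30 seconds"


@dataclass
class Satellite:
    """Toy model; names are hypothetical, not from the real flight software."""

    mode: str = "NOMINAL"
    active_functions: set = field(
        default_factory=lambda: {"payload", "power", "comms"}
    )
    last_kick_s: float = 0.0

    def kick_watchdog(self, now_s: float) -> None:
        """Restart the countdown; healthy software does this periodically."""
        self.last_kick_s = now_s

    def tick(self, now_s: float) -> None:
        """Check the watchdog and enter safemode if it has timed out."""
        timed_out = now_s - self.last_kick_s >= WATCHDOG_TIMEOUT_S
        if self.mode == "NOMINAL" and timed_out:
            self.enter_safemode()

    def enter_safemode(self) -> None:
        """Shut down non-critical functions; keep power and comms."""
        self.mode = "SAFEMODE"
        self.active_functions &= {"power", "comms"}


sat = Satellite()
sat.kick_watchdog(now_s=10.0)
sat.tick(now_s=35.0)          # 25 s since the last kick: still fine
mode_before_timeout = sat.mode
sat.tick(now_s=41.0)          # 31 s since the last kick: watchdog expires
```

| Time is passed in explicitly so the example is deterministic; a
| real watchdog would typically be a hardware timer.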
| neuralRiot wrote:
| Installing update 1 of 456. Please don't turn off your
| spaceship.
| lisper wrote:
| They're not running Windows.
|
| https://news.ycombinator.com/item?id=41651715
| GlenTheMachine wrote:
| There are a bunch of comments here asking why one would run
| Windows on a spacecraft.
|
| I am a spacecraft engineer. I don't see anything in the linked
| article indicating that they are actually running Windows - the
| BSOD claim is tongue-in-cheek, or at least that's how I read it.
| I also don't know of anyone anywhere that runs Windows on a
| spacecraft, with the exception of laptops used by astronauts.
| Typically one runs VxWorks, or maybe QNX. Some experimental (high
| risk, low cost) systems run Linux. Older spacecraft don't run any
| OS at all, everything is running on bare metal, and that may be
| true for a handful of current spacecraft as well.
|
| Windows is used in some places by ground controllers, but these
| days they tend to be running Linux a lot more often.
| TrueDuality wrote:
| Seconding the VxWorks and bare metal. Never seen Windows or
| Linux on a satellite bus. Haven't really touched payloads but
| I've seen some wonky things shipped to orbit by universities
| and not all of them have been cubesat student projects.
| nicce wrote:
| Every Starlink runs with Linux.
|
| The license list is a bit long:
|
| https://www.starlink.com/assets/pdfs/Starlink-Open-Source-Co...
| zanthras wrote:
| Linux (with the realtime patch) is used very heavily in spacecraft
| by SpaceX. So both in terms of high visibility/importance/danger
| (Dragon 2) and high count (Starlink), it is very widely used.
|
| Citation:
| https://old.reddit.com/r/spacex/comments/ncj4vz/we_are_the_s...
| XorNot wrote:
| I wonder how the integration of PREEMPT_RT is going to affect
| that technology stack going forwards (I imagine slowly, but
| it's there now).
| yndoendo wrote:
| It's a choice between saving costs by adopting the newly
| mainlined feature and paying more in the long run to maintain
| a custom kernel branch.
| lisper wrote:
| The author of TFA clarifies here:
|
| https://news.ycombinator.com/item?id=41651715
|
| TL;DR: the spacecraft is indeed not running Windows. It's
| running a custom OS written in C.
| aghilmort wrote:
| Wendy's tablet menus in NYC run Windows and it's like whyyyyy,
| just make them Android web browsers $$$$$$$
| farceSpherule wrote:
| Or you can avoid contracting with Boeing.
| rdist wrote:
| And here I thought we were going to rehash Crowdstrike ;-)
| TrueDuality wrote:
| Just a tactful reference hahah
|
| > the US government isn't burning taxpayer dollars on a ten
| figure spaceship just to have us push a Crowdstrike update on
| it.
| linebeck wrote:
| Author here: I should clarify the satellite is not running
| Windows. Instead, it's running its own custom OS written in C
| called Flight Software (FSW) specifically designed for the
| satellite onboard computer.
|
| Re-reading the post, I see how the title, my analogies, and poor
| attempts at humor would give an incorrect impression of what's
| happening with the satellite when it enters safemode. I'll amend
| the post soon.
|
| Thanks for the feedback, I'll be better next time.
| barbegal wrote:
| Could I ask you to clarify why avoiding safemode is so
| important? In a non-satellite system safemode means everything
| is driven to a safe state which is fine during testing in the
| lab.
|
| Also do you not run these tests in an even more simulated
| environment where there is only the flight computer and no real
| hardware at all?
| linebeck wrote:
| Having discussed this same question with the more experienced
| members of my team, the only conclusion I can draw is that
| the customer (US Government) is incredibly risk averse. Any
| unexpected entry into safemode would require a report,
| multiple meetings with the customer, and them being pretty
| angry. Their line of reasoning seems to be
| "Safemode->Something is wrong->Why is something wrong? We're
| not paying you to be wrong". I'm personally of the opinion
| that safemode isn't that bad. It's fully recoverable and
| shows the system is working properly.
|
| We normally have a Functional Test Assembly (real computer
| and some other hardware for testing) to run our tests
| against, but we only have one setup and it is consistently
| unreliable. This particular CLT was unable to get a clean run
| in the lab but it was decided that the issues were related to
| the lab setup rather than the actual test, so we moved
| forward to run on the satellite (against our team's
| protests).
|
| This to me is the real crux of the issue: if we can't even
| trust our own testing environment, what's the point of having
| it at all? If the customer is so risk averse, why would we
| take this chance? Needless to say, I don't think we'll be
| running anything on the satellite without full FTA vetting
| anytime in the near future.
| Jtsummers wrote:
| > Any unexpected entry into safemode would require a
| report, multiple meetings with the customer, and them being
| pretty angry. Their line of reasoning seems to be
| "Safemode->Something is wrong->Why is something wrong?
| We're not paying you to be wrong". I'm personally of the
| opinion that safemode isn't that bad. It's fully
| recoverable and shows the system is working properly.
|
| To the last part first: Good that safe mode kicked in and
| did the right thing, but now what? What _caused_ it to
| enter safe mode in the first place?
|
| That's why they care when it happens. If they don't know
| why it's entering safe mode, they can't correct the actual
| problems in the system.
| axus wrote:
| "Safemode is when all non critical functions are
| automatically shut down and the satellite becomes
| entirely focused on generating power by pointing its
| solar panels towards the Sun and trying to reestablish
| any communication that was lost."
|
| The non-critical functions are all the things the
| customer actually bought the satellite for. Cool that
| it's still alive, but now the Space Internet / death
| lasers / etc. are offline.
| topspin wrote:
| I understood you were using an analogy. Didn't even occur to me
| that Windows was actually being used.
|
| However, I did come away thinking there are other dysfunctions
| at play in all of this. Perhaps an excessive amount of wheel
| re-inventing.
| dangoodmanUT wrote:
| Step 1: Use Linux
| ksajh wrote:
| Step 1: Read and understand the article
| imoverclocked wrote:
| Step 2: Install VxWorks
___________________________________________________________________
(page generated 2024-09-25 23:00 UTC)