[HN Gopher] How to avoid a BSOD on your 2B dollar spacecraft
___________________________________________________________________
How to avoid a BSOD on your 2B dollar spacecraft
Author : linebeck
Score : 172 points
Date : 2024-09-25 18:40 UTC (1 day ago)
(HTM) web link (clarkwakeland.com)
(TXT) w3m dump (clarkwakeland.com)
| sharpshadow wrote:
| One must have balls of steel to run windows on a spaceship.
| perching_aix wrote:
| Personally, I wouldn't be stoked to run Linux on them either to
| be honest. But both are being done. Practicality rules I
| suppose.
| anonzzzies wrote:
| Having sourcecode to everything would make me trust things
| more as at least we could fix things without calling MS.
|
| But what would you run? QNX? BSD?
| withinboredom wrote:
| Just because you have the source code, doesn't mean you
| have the knowledge to fix it before you die.
| anonzzzies wrote:
| Sure, but without it, you stand no chance at all.
| exe34 wrote:
| windows is source-available if you have deep enough
| pockets: https://www.microsoft.com/en-
| us/sharedsource/enterprise-sour...
| anonzzzies wrote:
| Anyway, I was responding to someone who wouldn't run
| either on a spaceship so I still would like to know what
| they would want to run. I am from a formal verification
| school of thought, so I would want something sel4.
| exe34 wrote:
| I'd want manual backup pushbuttons.
| foobar1962 wrote:
| And another box close enough so the CD tray can press
| them when it opens.
| 0cf8612b2e1e wrote:
| Does Microsoft also provide you the tools to build it? I
| assume there are many Microsoft internal tools,
| libraries, etc required to compile anything of note.
| Presumably it has been dogfooded for so long it would be
| impossible to bootstrap without some number of binary
| artifacts in hand.
| withinboredom wrote:
| I feel like this misses the point so much that it might
| as well be nonsense.
|
| A better way to put my argument is: could an average mom
| build Linux on specialized hardware in space? If the
| answer is "yes", then you may have a point.
|
| I don't think the answer is yes.
| johnisgood wrote:
| No, but you can review the source code before (or at any
| point), whereas with Windows, you cannot even do that.
| withinboredom wrote:
| I've occasionally worked on drivers for windows and
| linux. In either case, I didn't really need to read the
| source code; neither was it a valuable proposition. If
| the advertised API didn't do what it was supposed to, I
| likely wouldn't have understood enough to fix it: and
| this is my point.
|
| Just because you can read it, doesn't mean you can or
| will be able to actually fix it; not because of
| technicality, but because of personal knowledge.
|
| In this case, they are both black boxes.
| johnisgood wrote:
| I mean, I agree, I am just saying that it is better (in
| general) to have the source than not having it.
| withinboredom wrote:
| How so?
|
| I once spent three days trying to figure out an issue,
| stepping line by line through hadoop (after figuring out
| the issue was in hadoop and not my own code). Yay, I
| proved the issue was actually in Java itself. Guess what
| happened next? We avoided the bug. Why?
|
| - We couldn't update Java.
|
| - We couldn't change hadoop because we were using a
| packaged solution. So, we just filed a bug with them.
|
| Had the source not been available, we would have just
| skipped all of that, and it would have been our vendor's
| problem 3 days earlier.
| icehawk wrote:
| For something like the spacecraft in the article, you
| absolutely would have the ability to get access to the
| Windows source code.
| exe34 wrote:
| windows is source-available if you have deep enough
| pockets: https://www.microsoft.com/en-
| us/sharedsource/enterprise-sour...
| johnisgood wrote:
| I have not heard of anyone building their own, custom
| Windows though, how common is it? I do not see Windows
| forks around either (I get it, it would not be legal).
| pjmlp wrote:
| What do you think some OEMs do with Windows on their
| custom deployment devices?
|
| This happens less nowadays, because stuff like Android
| and ChromeOS, so why pay Microsoft when free beer OSes
| exist.
| GlenTheMachine wrote:
| VxWorks, usually
| redleader55 wrote:
| If you ever count how many patches are in between stable
| point releases - eg. 6.10.5 to 6.10.6, you'll see having
| the source code is not enough. All of these patches are
| fixes for something or another, not features, but fixes.
|
| If you look at an LTS branch, you'll see there are hundreds
| of point releases. Usually a point release is created once
| every 5-10 days. I interpret that to mean bugs were not
| found until many weeks after the LTS branch was cut.
| Obviously, not all of them affect you, but many patches
| apply to important subsystems which affect you.
| ssrc wrote:
| VxWorks, LynxOS or RTEMS. RTEMS is open source.
| inamberclad wrote:
| +1 for RTEMS! Great fully featured RTOS
| morcheeba wrote:
| +1 for RTEMS, another happy user!
| BSDobelix wrote:
| RTEMS is great, especially impressed with the reliability
| and performance of the RFS Filesystem:
|
| https://gedare.github.io/pdf/agarwal_comparison_2019.pdf
| perching_aix wrote:
| Maybe sel4? I'm not in embedded so maybe I'm being pretty
| silly here, but I think being formally proven and realtime
| are pretty good things to have in the case of an expensive,
| long lived space project.
|
| That said, this is on the scale that I'm surprised off-the-
| shelf things are even being considered. I'd have thought
| they'd just roll something bespoke and that's the end of
| that.
| wubrr wrote:
| What's the practical benefit of using Windows here?
| wil421 wrote:
| The new galactic empire has an enterprise licensing
| agreement with Microsoft.
| jandrese wrote:
| Easier to find developers who are comfortable with Windows.
| wubrr wrote:
| Probably not the case in spacecraft engineering.
|
| https://news.ycombinator.com/item?id=41650534#41651310
| gosub100 wrote:
| Maybe technically you could save $89 and run pirated
| windows because of (possibly?) no jurisdiction in space?
| jmclnx wrote:
| Really, with Linux (or a BSD), you can make tweaks for free
| to save memory and to only allow specific tasks and hardware.
| Plus you could publish these changes and maybe they will be
| accepted by upstream.
|
| With Windows, you need to beg and pay Microsoft for
| customizations and hope these changes will not cause other
| issues.
|
| Plus, most space projects are on tight and limited budgets
| where management would rather spend on hardware than
| software.
| dylan604 wrote:
| Well, if your account is only a Home Edition, you will not get
| the same support as if you upgrade to Universal Galactic
| Edition which has a LTS measured in generations.
| freedomben wrote:
| Also, if you want RDP then I think you need to upgrade to
| Universal Galactic Edition
| pjmlp wrote:
| Only if the ground control station has more than two
| persons.
| hypeatei wrote:
| Wonder if it's LTSC or the standard image with candy crush,
| windows store, and Xbox game app bloat?
| lpribis wrote:
| Don't know if it's publicly available, but MS makes Windows
| IoT which is a stripped down distro for embedded systems.
| geepytee wrote:
| Genuine question, what would you use instead and why?
| MPSimmons wrote:
| Depending on the complexity of the satellite, a linux node,
| several linux nodes, or a LOT of linux nodes. Or a very small
| embedded SoC if your satellite is very simple or has a
| segregated payload.
| jasonwatkinspdx wrote:
| Real Time Operating System(s) like VxWorks.
|
| Commodity equipment running specialized distros of Linux is a
| growing thing however.
| MPSimmons wrote:
| Linux just mainlined the RTOS patch that a lot of people
| have used. Including SpaceX - https://ntrs.nasa.gov/api/cit
| ations/20200002390/downloads/20...
|
| Linux RT merge: https://git.kernel.org/pub/scm/linux/kernel
| /git/torvalds/lin...
| gosub100 wrote:
| What parts would actually need realtime response? Only
| thing I can think of would be thrusters, but wouldn't that
| be solved by an asic? Essentially a network command where
| you say "I don't care specifically when you do this, but
| I'm requesting a correction of M-magnitude in this
| 3-vector. If I don't get an ack within 500ms cancel. "
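A minimal sketch of the "cancel if no ack within 500 ms" idea from the comment above, seen from the sender's side. The function name, the callbacks, and the millisecond clock are all hypothetical and only illustrate the deadline logic; this is not any real spacecraft bus protocol.

```python
def send_with_ack(send, poll_ack, clock_ms, timeout_ms=500):
    """Send a command, then poll for an ack until the deadline passes.

    send, poll_ack and clock_ms are caller-supplied callables (assumptions
    for this sketch): send() transmits the command, poll_ack() returns True
    once an ack has arrived, clock_ms() returns the current time in ms.
    """
    deadline = clock_ms() + timeout_ms
    send()
    while clock_ms() < deadline:
        if poll_ack():
            return "acked"
    # No ack before the deadline: the sender gives up on this request.
    return "cancelled"
```

Note that nothing here bounds how long the command takes to be delivered or executed, which is the part a real-time system has to guarantee.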
| jasonwatkinspdx wrote:
| Navigation / attitude control needs to be real time. Even
| if there's a microcontroller at the device, the guidance
| system receives telemetry and then sends commands. If the
| send is arbitrarily delayed then the resulting motion
| won't be what the system desired. The microcontroller
| can't do the complex calculations on its own because it
| doesn't have all the telemetry from multiple sensors.
|
| Also RTOS tend to have a lot of verification work applied
| so that you can be confident some facilities just won't
| fail.
| scottyah wrote:
| I'm forgetting the name of it, but there's a special OS
| designed for space. The main issue is that bits get flipped
| all the time once you leave the protection of the
| magnetosphere. On the surface of the earth, we generally
| trust memory a lot more than you can in space, even with
| special chips and shielding. It causes all sorts of weird
| problems and slow-downs.
| cogman10 wrote:
| If I had barrels full of money to waste, probably a
| microkernel architecture like fuchsia [1]. The barrels full
| of money would be turning it into a real time OS. The benefit
| of such an OS is if there's a bug in the drivers, the kernel
| itself keeps on plugging along, it can dump and reload the
| misbehaving driver without crashing.
|
| [1] https://fuchsia.dev/
| wongarsu wrote:
| The article doesn't actually involve Windows. They wanted to
| avoid a satellite going into safemode, which they describe as
| "the satellite equivalent of a blue screen of death". That's
| honestly not even a good analogy. The headline is just bad.
| hunter2_ wrote:
| For anyone reading only headlines and comments, from TFA:
|
| > If the watchdog timer has not been restarted and instead
| times out after ~30 seconds, the satellite enters something
| called safemode. Safemode is when all non critical functions
| are automatically shut down and the satellite becomes entirely
| focused on generating power by pointing its solar panels
| towards the Sun and trying to reestablish any communication
| that was lost. It's a state the vehicle goes into when
| something bad happens [...] the satellite equivalent of a blue
| screen of death.
|
| If only Windows would be so kind!
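The watchdog behaviour quoted above can be sketched roughly as follows. This is a toy model, not the flight software: the class, the `critical` flag, and the function names are assumptions, and only the ~30 second timeout comes from the article.

```python
class WatchdogTimer:
    """Toy watchdog: it must be restarted ("kicked") before the timeout elapses."""

    def __init__(self, timeout_s=30.0):
        self.timeout_s = timeout_s
        self.last_kick = 0.0
        self.safemode = False

    def kick(self, now):
        """Restart the countdown; healthy software calls this periodically."""
        if not self.safemode:
            self.last_kick = now

    def tick(self, now):
        """Check the timer; latch into safemode once the timeout has elapsed."""
        if now - self.last_kick >= self.timeout_s:
            self.safemode = True
        return self.safemode


def enter_safemode(functions):
    """Keep only the functions flagged critical (hypothetical flag)."""
    return [name for name, critical in functions if critical]
```

In this sketch safemode is latched: once entered, further kicks are ignored, on the assumption that leaving safemode takes a deliberate recovery step.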
| neuralRiot wrote:
| Installing update 1 of 456. Please don't turn off your
| spaceship.
| lisper wrote:
| They're not running Windows.
|
| https://news.ycombinator.com/item?id=41651715
| dekhn wrote:
| 1998: Windows 95 on Thinkpads (not running system-critical)
| https://www.nytimes.com/1998/11/05/technology/laptops-on-the...
|
| Detailed writeup by project manager:
| https://forum.nasaspaceflight.com/index.php?topic=27043.0
| (doesn't mention the OS)
|
| Some indication the laptops on ISS run linux on HP ZBooks:
| https://training.linuxfoundation.org/solutions/corporate-sol...
| GlenTheMachine wrote:
| There are a bunch of comments here asking why one would run
| Windows on a spacecraft.
|
| I am a spacecraft engineer. I don't see anything in the linked
| article indicating that they are actually running Windows - the
| BSOD claim is tongue-in-cheek, or at least that's how I read it.
| I also don't know of anyone anywhere that runs Windows on a
| spacecraft, with the exception of laptops used by astronauts.
| Typically one runs vxWorks, or maybe QNX. Some experimental (high
| risk, low cost) systems run Linux. Older spacecraft don't run any
| OS at all, everything is running on bare metal, and that may be
| true for a handful of current spacecraft as well.
|
| Windows is used in some places by ground controllers, but these
| days they tend to be running Linux a lot more often.
| TrueDuality wrote:
| Seconding the vxWorks and bare metal. Never seen Windows or
| Linux on a satellite bus. Haven't really touched payloads but
| I've seen some wonky things shipped to orbit by universities
| and not all of them have been cubesat student projects.
| nicce wrote:
| Every Starlink runs with Linux.
|
| The license list is a bit long:
|
| https://www.starlink.com/assets/pdfs/Starlink-Open-Source-
| Co...
| MichaelZuo wrote:
| It's surprising they didn't pick a RTOS.
| gosub100 wrote:
| Why? What benefit would that give them?
| karlgkk wrote:
| I can't tell if this is a serious comment or not, but they
| need an OS with realtime guarantees. They have claimed to
| use linux in a rtos configuration, also probably have a
| redundancy and voting-based failover system.
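For readers unfamiliar with "voting-based failover", here is a generic majority voter over redundant computer outputs. It is an illustration of the general technique only; the function name is made up and nothing here is claimed to match SpaceX's actual design.

```python
from collections import Counter


def majority_vote(outputs):
    """Return the value a strict majority of redundant units agree on, else None."""
    if not outputs:
        return None
    value, count = Counter(outputs).most_common(1)[0]
    # Require more than half of the units to agree before trusting the value.
    return value if count > len(outputs) / 2 else None
```

With three units, one faulty output is outvoted by the other two; if all three disagree the voter returns None and the caller has to fall back to something else.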
| gosub100 wrote:
| I can't tell if this is a serious answer to the question
| or not, as you have answered it in terms of the question:
| "why the need for RTOS? -> because they need RTOS". Stop
| wasting time with replies like this that add no value.
| karlgkk wrote:
| They did.
| MichaelZuo wrote:
| How? That was a very recent change to linux.
| wrs wrote:
| It was merged to _mainline_ Linux last week. The patches
| have been usable for many years if you compile a custom
| kernel.
| cobalt wrote:
| that doesn't state what uses linux, angular.io is also on
| there
| mrpippy wrote:
| That's likely software for the receiver/router/CPE that
| customers use, since it's being distributed they have to
| satisfy license obligations for it.
|
| Software actually running on satellites isn't being
| distributed, so there's no license obligations there
| (unless it's AGPL, ha!)
| nicce wrote:
| It includes that software too, but they have claimed that
| they use Linux, so I would assume that some software is
| from the satellite itself as well. It is still not clear
| "what" obligations you have for a piece in the space.
| lofaszvanitt wrote:
| Yeah, why would a satellite have angular...
| inamberclad wrote:
| I've used Linux on the payload processor computer of a
| spacecraft so I know it happens :P
|
| I've also worked with a payload running Windows Embedded.
| jdiez17 wrote:
| Linux on a satellite bus is definitely up and coming. For
| example ESA's OPS-SAT ran the NanoSat MO Framework
| (https://nanosat-mo-framework.github.io/), written in Java
| and running on an ARM Linux OBC. It also makes a lot of sense
| for payload computers.
|
| I'm working on a Linux distribution for space applications
| that addresses some of the pain points (application
| deployment, software update & memory-safe implementations of
| typical space protocols like CCSDS/PUS). If all goes to plan
| it will fly on a 1U CubeSat tech demonstrator this year, a
| cybersecurity research 1U CubeSat next year, and a 2U high
| performance satellite ... later. :)
| zanthras wrote:
| Linux (with realtime patch) is used very heavily in spacecraft
| by SpaceX. So both in terms of high visibility/important/danger
| (dragon 2) and high count (starlink) it is very widely used.
|
| citation
| https://old.reddit.com/r/spacex/comments/ncj4vz/we_are_the_s...
| XorNot wrote:
| I wonder how the integration of PREEMPT_RT is going to affect
| that technology stack going forwards (I imagine slowly, but
| it's there now).
| yndoendo wrote:
| Save costs by integration with the new feature or
| increasing cost with maintaining a custom kernel branch in
| the long run.
| chupasaurus wrote:
| It was merged to mainline a week ago.
| bboygravity wrote:
| And apparently the astronaut !touchscreen! GUI is written in
| Javascript (not a joke).
| DrammBA wrote:
| > And apparently the astronaut !touchscreen! GUI is written
| in Javascript (not a joke).
|
| why would that be mistaken for a joke?
| vasco wrote:
| Because they don't want people confusing the launch pad
| with left-pad and having the spaceship slowing down
| because of Facebook Like button embeds.
| weard_beard wrote:
| Attempt to give a serious answer: Javascript, as a
| language, has some bizarre return types that can make the
| kind of thorough testing required in spaceflight
| difficult. (See: https://github.com/denysdovhan/wtfjs) It
| also has a reputation, like PHP, as being utilized by
| inexperienced programmers who write poorly structured,
| poorly tested, bloated, and slow code that often crashes
| and fails. If you want fast, light weight, reliable, well
| structured, testable, and ultimately very stable code
| Javascript would seem to be a poor choice within the
| parameters required for a space vehicle.
|
| (Maybe this is a good place to ask, anyone have a
| recommendation for static testing of JS?)
| electrosphere wrote:
| I believe it's because in JavaScript some values or
| expressions are "truthy" or "falsey" depending on how
| they are evaluated.
| cliff wrote:
| I think I was the person who originally proposed to
| implement the crew control UI in a web browser, and I
| participated in a week-long retreat in beautiful Bend,
| Oregon where we implemented the first prototype.
|
| At the time, some very good flight software engineers had
| been working diligently on a new UI framework that was
| written in the same code style and process as the rest of
| our flight software. However, I noticed a classic problem -
| we were working on the UI platform at the same time that we
| were trying to design and prototype the actual UI.
|
| I made some observations:
|
| 1) We can create a prototype right now in Chrome, with its
| incumbent versatility.
|
| 2) The chip running the UI can actually reasonably run
| Chrome.
|
| 3) Web browsers are historically known for crashing, but
| that's partly because they have to handle every page on the
| whole Internet. A static system with the same browser
| running a single website, heavily tested, may be reliable
| enough for our needs.
|
| 4) We can always go back and reimplement the UI on top of
| the space-grade UI platform, and actually it'll be a lot
| easier because we will know exactly what functionality we need
| out of that platform.
|
| The prototype was a great success; we were able to
| implement a lot of interesting UI in just a week.
|
| I left SpaceX before Crew Dragon launched, so I'm not sure
| what ended up launching or what the state of affairs is
| today. I remember hearing some feedback from testing
| sessions that the astronauts were pleasantly surprised when
| we were able to live edit a button when they commented it
| was too hard to reliably press it with their gloved finger.
|
| As for reliability, to do a fair analysis you need to
| understand the requirements of the mission. Only then can
| you start thinking about faults and how to mitigate them.
| This isn't like Apollo where the astronauts had to
| physically reconfigure the spacecraft for each phase of the
| mission -- to an exceptionally large extent, Dragon flies
| itself. As a minor example of systemic fault tolerance,
| each display is individually controlled by its own
| processor. If a display fails, whether due to Chrome or
| cosmic radiation, an astronaut can simply use a different
| display.
|
| Also, as a side note regarding "touchscreens" -- I believe
| some (very important) buttons did launch with Crew Dragon,
| but buttons and wiring are heavy, and weight is the enemy.
| If you're going to have a screen anyways, making it a
| touchscreen adds relatively trivial weight.
| tobylane wrote:
| When would the Chrome version be frozen? Once you've
| completed the UI?
| jve wrote:
| So the implementation speed came solely from developer
| experience and not from someone pushing this custom UI
| framework implementation aside?
| ramesh31 wrote:
| >I think I was the person who originally proposed to
| implement the crew control UI in a web browser, and I
| participated in a week-long retreat in beautiful Bend,
| Oregon where we implemented the first prototype.
|
| Please tell me you have a blog
| telotortium wrote:
| https://hnrss.org/user?id=cliff
| doctorpangloss wrote:
| Why doesn't anyone at Boeing make these observations? I
| don't think anyone needs to be persuaded that a browser
| is a good UI middleware.
| rblatz wrote:
| I suspect that Boeing has a lot of momentum and the
| risk/reward for pushing an initiative like that doesn't
| make sense in that org.
| mixmastamyk wrote:
| At a minimum it should use typescript, no? Also web pages
| get out of sync sometimes, and need to be reloaded, which
| doesn't sound great for mission critical reliability.
| Compiled, typed UI lib sounds like a better fit.
| bearjaws wrote:
| Wow a real SWE showing up and explaining how you can
| actually approach a real problem using a browser. Instead
| of just going "well it's not compiled so clearly it will
| just randomly explode".
|
| I am always amazed how HN doesn't realize many mission &
| life critical systems are powered by JS - especially as a
| front end through a browser.
| lisper wrote:
| The author of TFA clarifies here:
|
| https://news.ycombinator.com/item?id=41651715
|
| TL;DR: the spacecraft is indeed not running Windows. It's
| running a custom OS written in C.
| yashap wrote:
| Indeed, and it's clearly stated in the article:
|
| > Safemode is the satellite equivalent of a blue screen of
| death.
|
| It's about avoiding safemode, and more generally about the end-
| to-end QA/testing process for satellites before they're sent up
| into orbit. It's very clearly not about actual Windows BSODs,
| it's just written in a tongue-in-cheek style. Those commenting
| about "wtf windows on a spacecraft" clearly didn't read the
| article, just read the title.
|
| FWIW I found the writing style engaging and the content
| interesting. I guess the title is a little click-bait-y, but
| not in a way that I minded much, and I probably wouldn't have
| read an article titled "How to avoid safemode on a satellite."
| It's a fine line, but titles DO have to draw you in, otherwise
| you'll never read the article.
|
| Re: the article itself, I did think it was pretty wild that
| customers have to be informed of every incident where a
| satellite flips into safemode in TESTING! In real operations,
| sure, but in testing, that's wild. Feels like having to report
| bugs caught in my local dev environment, that were never
| deployed to prod.
| GlenTheMachine wrote:
| This would be during formal testing, which is similar to what
| you might know as "acceptance testing". The spacecraft
| doesn't "enter safe mode" during development.
|
| If you're paying two billion $ for something you become very
| very interested in test design and test results.
|
| Also, safe mode isn't really the same as a BSOD. It's a mode
| where the spacecraft decides something is wrong and disables
| a lot of functionality and focuses on pointing the solar
| panels at the sun and the antennas at the ground. It does not
| cease functioning - if that happens, you've probably lost
| your spacecraft. It is therefore VITALLY IMPORTANT that safe
| mode works, and a smart program manager tests the hell out of
| it.
| Brian_K_White wrote:
| So it's a bsod that switches to safe mode instead of
| halting.
|
| We already got that it's not actually Windows and so not
| literally identical to bsod in every detail.
|
| It's not the same as a common os safe mode either because
| it happens by itself as the last resort response to a
| problem, like a bsod. Not just on command.
| sandworm101 wrote:
| >> customers have to be informed of every incident where a
| satellite flips into safemode in TESTING!
|
| Because the customers are almost certainly running their own
| metrics, tracking failure rates over time. An increasing rate
| of failures across a program is probably a sign of something
| going wrong at a higher level. Remember too that there is
| "testing" and _testing_. One is you playing around with the
| software at your workstation, the other is the more formal
| testing as monitored by the acceptance and standards people.
| firecall wrote:
| So could this finally be the year of Linux on the Moon? ;-)
| Fnoord wrote:
| Which OS do the Mars rovers run?
| GlenTheMachine wrote:
| vxWorks
| garaetjjte wrote:
| The helicopter ran Linux.
| freedomben wrote:
| I worked on UAVs in the late 00s and early 10s and it was all
| VxWorks as well. We were playing around with some embedded
| Linux but it wasn't used on anything "serious," despite some of
| our dreams.
| rob74 wrote:
| Well, it's a fair question, because the article jumps right
| into extremely detailed specifics like "Closed Loop Tests" that
| probably just people working in this domain are familiar with,
| without first making clear _what exactly it's talking about_.
| "The lifecycle of most spacecraft consists of a final phase" -
| as someone with only a cursory knowledge of spacecraft, at this
| point I assumed he would be talking about deorbiting?
| TacticalCoder wrote:
| > There are a bunch of comments here asking why one would run
| Windows on a spacecraft.
|
| Because TFA is highly misleading
|
| > I don't see anything in the linked article indicating that
| they are actually running Windows
|
| TFA literally begins with a picture of a Windows BSOD with a
| Windows error message.
| metadat wrote:
| Is there any open-source equivalent to QNX?
| farceSpherule wrote:
| Or you can avoid contracting with Boeing.
| rdist wrote:
| And here I thought we were going to rehash Crowdstrike ;-)
| TrueDuality wrote:
| Just a tactful reference hahah
|
| > the US government isn't burning taxpayer dollars on a ten
| figure spaceship just to have us push a Crowdstrike update on
| it.
| gosub100 wrote:
| I know you kid, but theoretically running crowdstrike-
| susceptible windows on a spacecraft would work fine (at least I
| claim so), because you'd need a robust backdoor / OOB into it
| anyway (and I'm no windows fanboy, I hate them just as much as
| the next guy). Crowdstrike bug would cause an N-day loss of
| comms just like a thousand other things they plan for in a
| spacecraft.
| linebeck wrote:
| Author here: I should clarify the satellite is not running
| Windows. Instead, it's running its own custom OS written in C
| called Flight Software (FSW) specifically designed for the
| satellite onboard computer.
|
| Re-reading the post, I see how the title, my analogies, and poor
| attempts at humor would give the incorrect description of what's
| happening with the satellite when it enters safemode. I'll amend
| the post soon.
|
| Thanks for the feedback, I'll be better next time.
| barbegal wrote:
| Could I ask you to clarify why avoiding safemode is so
| important? In a non satellite system safemode means everything
| is driven to a safe state which is fine during testing in the
| lab.
|
| Also do you not run these tests in an even more simulated
| environment where there is only the flight computer and no real
| hardware at all?
| linebeck wrote:
| Having discussed this same question with the more experienced
| members of my team, the only conclusion I can draw is that
| the customer (US Government) is incredibly risk averse. Any
| unexpected entry into safemode would require a report,
| multiple meetings with the customer, and them being pretty
| angry. Their line of reasoning seems to be
| "Safemode->Something is wrong->Why is something wrong? We're
| not paying you to be wrong". I'm personally of the opinion
| that safemode isn't that bad. It's fully recoverable and
| shows the system is working properly.
|
| We normally have a Functional Test Assembly (real computer
| and some other hardware for testing) to run our tests
| against, but we only have one setup and it is consistently
| unreliable. This particular CLT was unable to get a clean run
| in the lab but it was decided that the issues were related to
| the lab setup rather than the actual test, so we moved
| forward to run on the satellite (against our team's
| protests).
|
| This to me is the real crux of the issue: if we can't even
| trust our own testing environment, what's the point of having
| it at all? If the customer is so risk averse, why would we
| take this chance? Needless to say, I don't think we'll be
| running anything on the satellite without full FTA vetting
| anytime in the near future.
| Jtsummers wrote:
| > Any unexpected entry into safemode would require a
| report, multiple meetings with the customer, and them being
| pretty angry. Their line of reasoning seems to be
| "Safemode->Something is wrong->Why is something wrong?
| We're not paying you to be wrong". I'm personally of the
| opinion that safemode isn't that bad. It's fully
| recoverable and shows the system is working properly.
|
| To the last part first: Good that safe mode kicked in and
| did the right thing, but now what? What _caused_ it to
| enter safe mode in the first place?
|
| That's why they care when it happens. If they don't know
| why it's entering safe mode, they can't correct the actual
| problems in the system.
| axus wrote:
| "Safemode is when all non critical functions are
| automatically shut down and the satellite becomes
| entirely focused on generating power by pointing its
| solar panels towards the Sun and trying to reestablish
| any communication that was lost."
|
| The non-critical functions are all the things the
| customer actually bought the satellite for. Cool that
| it's still alive, but now the Space Internet / death
| lasers / etc. are offline.
| linebeck wrote:
| There are fault IDs that trip if certain telemetry goes
| outside of a normal range. If a safemode were to occur,
| we would investigate which faults tripped and at what
| time, and use those to construct a "story" of what
| happened on the satellite before it entered safemode.
| We're also constantly recording every telemetry that
| comes down, so we could reference any telemetry we wanted
| as far back as months in the past.
|
| To your point, yes you're correct. The cause of the
| safemode is much more interesting than the fact we
| entered it.
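The fault-ID approach described above might look something like this in miniature. The fault names, telemetry channels, and limits are invented for illustration; only the idea (faults trip when telemetry leaves a normal range, and the tripped faults are ordered in time to build a "story") comes from the comment.

```python
# Hypothetical fault table: fault ID -> (telemetry channel, low limit, high limit)
FAULT_LIMITS = {
    "FLT_BATT_VOLT": ("battery_voltage", 24.0, 33.0),
    "FLT_WHEEL_TEMP": ("wheel_temp_c", -20.0, 60.0),
}


def tripped_faults(telemetry_log):
    """Return (timestamp, fault_id, value) for the first trip of each fault, in time order.

    telemetry_log is a list of (timestamp, channel, value) samples in any order.
    """
    seen = set()
    story = []
    for timestamp, channel, value in sorted(telemetry_log):
        for fault_id, (fault_channel, low, high) in FAULT_LIMITS.items():
            out_of_range = not (low <= value <= high)
            if channel == fault_channel and out_of_range and fault_id not in seen:
                seen.add(fault_id)
                story.append((timestamp, fault_id, value))
    return story
```

Reading the returned list top to bottom gives the order in which things went wrong, which is usually more informative than the safemode entry itself.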
| minetest2048 wrote:
| > We normally have a Functional Test Assembly (real
| computer and some other hardware for testing) to run our
| tests against, but we only have one setup and it is
| consistently unreliable
|
| It's interesting to see that someone with a 2B budget has
| the same problem as someone with a 5 million budget... we
| have an engineering model for our cubesats but it's flaky
| topspin wrote:
| I understood you were using an analogy. Didn't even occur to me
| that Windows was actually being used.
|
| However, I did come away thinking there are other dysfunctions
| at play in all of this. Perhaps an excessive amount of wheel
| re-inventing.
| yashap wrote:
| I enjoyed the humour, and the content. Personally I wouldn't
| change it - it's kind of a click-bait title, but I never would
| have read the article if it had a boring title, and I am glad I
| read it.
| akira2501 wrote:
| Can you speak at all as to how the development on this software
| is done? Is it distributed with centralized version control?
| Does release and engineering process interact with the version
| control at all? Are there mechanisms that link defect reports,
| corrections, and sign offs back to version control and into the
| build system?
|
| I got lost recently in how the Shuttle software was managed,
| mostly through IBM mainframes, and z/OS's facilities for all the
| above. I'm curious how modern development looks in comparison.
| linebeck wrote:
| FSW development is done by a different team than mine but I
| believe it's just managed through gitlab. Releases are done
| through tags, and any updates that need to be made have
| tickets created for them and are developed by the FSW team.
| Final approval is given by certified product engineers and
| then a new tag is created for that release. Like I said this
| is a different team but from what I've seen the process is
| fairly modern given how old our hardware is. I'm not sure of
| the exact process of how it's loaded onto the satellite
| though.
| jdiez17 wrote:
| > I got lost recently in how the Shuttle software was
| managed, mostly through IBM mainframes, and z/OS's facilities
| for all the above. I'm curious how modern development looks
| in comparison.
|
| Do you have any references for this? I also recently went
| down a research rabbit hole of the history of computing on
| Earth and in space - super interesting stuff. And the
| parallels are quite obvious when you look at it.
| wrs wrote:
| Technical blog pro tip: Assume that many of your readers are
| VERY literal-minded, and many of your other readers like their
| humor obscure and as deadpan as possible. Sorry.
| dangoodmanUT wrote:
| Step 1: Use linux
| ksajh wrote:
| Step 1: Read and understand the article
| imoverclocked wrote:
| Step 2: install vxworks
| jesprenj wrote:
| Was the spacecraft from the event described in the article an
| actual spacecraft in space or a simulation of a space mission on
| the ground?
| MadnessASAP wrote:
| A simulation of a space mission on the ground with a satellite
| that will eventually be in space.
|
| Take your satellite, replace its navigation/communication
| inputs with ones generated by a reasonably high-fidelity
| real-time physical simulation. Feed its outputs back into said
| simulation. Ensure the satellite does the right things at the
| right time.
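|
| A toy sketch of that closed loop in Python. Everything here is an
| assumption made for illustration: the 1-D "world", the PD
| controller standing in for the flight software, and the gains.
| A real rig would put the actual flight computer where
| flight_software() is.

```python
from dataclasses import dataclass


@dataclass
class SimState:
    """Hypothetical 1-D world state: position error and velocity."""
    position: float
    velocity: float


def flight_software(sensor_position: float, sensor_velocity: float) -> float:
    """Stand-in for the unit under test: a PD controller returning thrust."""
    kp, kd = 0.5, 1.0  # illustrative gains, not from any real mission
    return -kp * sensor_position - kd * sensor_velocity


def step_simulation(state: SimState, thrust: float, dt: float) -> SimState:
    """Advance the simulated physics by one tick using the commanded thrust."""
    velocity = state.velocity + thrust * dt
    position = state.position + velocity * dt
    return SimState(position, velocity)


def run_closed_loop(state: SimState, steps: int, dt: float = 0.1) -> SimState:
    """Feed simulated sensor values in, feed the actuator command back out."""
    for _ in range(steps):
        thrust = flight_software(state.position, state.velocity)
        state = step_simulation(state, thrust, dt)
    return state


# Start 10 units off target and check that the loop drives the error to zero.
final = run_closed_loop(SimState(position=10.0, velocity=0.0), steps=500)
```

| "Does the right things at the right time" then becomes assertions
| on the simulated state, here that final.position ends up near 0.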
| LorenPechtel wrote:
| Why is it using memory-mapped stuff in the first place rather
| than some sort of messaging system that would allow more
| defensive programming?
| gavinsyancey wrote:
| If I were to guess -- At the lowest level, that's what the
| hardware does. I wouldn't be surprised if they have a library
| to wrap that with something more friendly, but something has to
| translate that into writes to memory-mapped addresses, and
| whatever that is was configured with the wrong addresses...
| jdiez17 wrote:
| To the shock and horror of many programming-inclined people, it
| turns out that having "arbitrary memory read/write" commands on
| a remote computer that *must* keep the mission going is quite
| useful.
|
| I can tell you a little first-hand account of where this helped
| a satellite formation flight mission I worked on. The
| communication system was working fine in terms of signal
| strength, but many command packets were ignored (no response).
| We were able to figure out that a message queue in the
| processing pipeline was considered full, by strategically
| reading certain memory locations. We then sent a memory patch
| to the satellite to skip some of the processing steps and this
| improved the communication system dramatically.
|
| We also brought back to life the first CubeSat launched by my
| university, BEESAT-1, by using arbitrary memory write to patch
| some telemetry collection software to avoid a damaged section
| of the onboard flash. Pretty cool story actually.
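|
| A minimal sketch of what such read/write commands look like, in
| Python for readability. The memory size, the addresses, and the
| queue-depth/flag layout are all hypothetical and not taken from
| either mission; the point is only "read a few bytes to diagnose,
| write a few bytes to patch".

```python
import struct

MEMORY_SIZE = 256  # hypothetical size of the addressable region


class MemoryCommandHandler:
    """Toy handler for ground-issued memory read and write commands."""

    def __init__(self, size: int = MEMORY_SIZE) -> None:
        self._mem = bytearray(size)

    def _check_range(self, address: int, length: int) -> None:
        # Reject anything that would touch memory outside the region.
        if length <= 0 or address < 0 or address + length > len(self._mem):
            raise ValueError(f"range {address:#x}+{length} is out of bounds")

    def read(self, address: int, length: int) -> bytes:
        self._check_range(address, length)
        return bytes(self._mem[address:address + length])

    def write(self, address: int, data: bytes) -> None:
        self._check_range(address, len(data))
        self._mem[address:address + len(data)] = data


# Hypothetical layout: a little-endian uint16 queue depth at 0x10 and a
# one-byte "skip processing step" flag at 0x20.
QUEUE_DEPTH_ADDR = 0x10
SKIP_STEP_FLAG_ADDR = 0x20
QUEUE_CAPACITY = 64

handler = MemoryCommandHandler()
handler.write(QUEUE_DEPTH_ADDR, struct.pack("<H", QUEUE_CAPACITY))  # simulate a full queue

# Diagnose by reading memory, then patch by writing memory.
(queue_depth,) = struct.unpack("<H", handler.read(QUEUE_DEPTH_ADDR, 2))
if queue_depth >= QUEUE_CAPACITY:
    handler.write(SKIP_STEP_FLAG_ADDR, b"\x01")
```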
| 0xDEAFBEAD wrote:
| It occurs to me that writing software for spacecraft could
| demand an entirely different paradigm than writing software
| for traditional applications.
|
| For example, you could use an OS that is deterministic down
| to the last detail, and have a "digital twin" / virtual
| machine of the spacecraft computer here on Earth, kept in
| sync with all sensor and actuator activity out in space.
| Before issuing any command to the spacecraft, you branch the
| digital twin, issue the command on the branch, and make sure
| everything looks good.
|
| With this method, you wouldn't even need to read memory
| locations on the spacecraft, you could just read them on the
| digital twin. Then test the memory patch on the digital twin
| and make very sure that it won't brick your spacecraft before
| you transmit it.
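|
| Roughly, in Python, and with big assumptions: the twin below is
| just a byte array, and the "health check" is a made-up rule that
| byte 0 must stay 0xA5. A real twin would have to model the whole
| computer, but the branch-then-test flow would look similar.

```python
import copy


class DigitalTwin:
    """Hypothetical ground-side model of the spacecraft computer's memory."""

    def __init__(self, memory: bytearray) -> None:
        self.memory = memory

    def branch(self) -> "DigitalTwin":
        """Return an independent copy so experiments cannot touch the trunk."""
        return copy.deepcopy(self)

    def apply_patch(self, address: int, data: bytes) -> None:
        if address < 0 or address + len(data) > len(self.memory):
            raise ValueError("patch falls outside modeled memory")
        self.memory[address:address + len(data)] = data

    def is_healthy(self) -> bool:
        """Stand-in health check: byte 0 is a boot marker that must stay 0xA5."""
        return self.memory[0] == 0xA5


def safe_to_uplink(twin: DigitalTwin, address: int, data: bytes) -> bool:
    """Try the patch on a branch and report whether the branch stays healthy."""
    trial = twin.branch()
    try:
        trial.apply_patch(address, data)
    except ValueError:
        return False
    return trial.is_healthy()


twin = DigitalTwin(bytearray([0xA5, 0x00, 0x00, 0x00]))
good = safe_to_uplink(twin, 1, b"\x42")  # leaves the boot marker alone
bad = safe_to_uplink(twin, 0, b"\x00")   # would clobber the boot marker
```

| The trunk twin is never modified by a trial, so a rejected patch
| costs nothing.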
| jdiez17 wrote:
| Interesting idea. If you had control over the whole
| software stack down to the hardware, you could organize it
| in such a way that all meaningful mutable state is kept in
| a contiguous region of memory - and then you could just
| download or sync that.
|
| I can imagine that these "state dumps" would be quite big
| though. And there is definitely some hidden state in the
| hardware blocks themselves (SPI, I2C, whatever...).
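|
| As a sketch of the "one contiguous region" idea, assuming a
| made-up three-field state and a fixed little-endian layout
| (neither is from a real spacecraft), Python's struct module can
| stand in for the memory region:

```python
import struct
from dataclasses import dataclass

# Hypothetical layout: mode (uint8), queue depth (uint16), battery volts
# (float32), little-endian with no padding.
STATE_FORMAT = "<BHf"
STATE_SIZE = struct.calcsize(STATE_FORMAT)


@dataclass
class MutableState:
    mode: int
    queue_depth: int
    battery_volts: float

    def dump(self) -> bytes:
        """Serialize all mutable state into one contiguous blob."""
        return struct.pack(
            STATE_FORMAT, self.mode, self.queue_depth, self.battery_volts
        )

    @classmethod
    def load(cls, blob: bytes) -> "MutableState":
        """Rebuild the state on the ground from a downloaded blob."""
        if len(blob) != STATE_SIZE:
            raise ValueError(f"expected {STATE_SIZE} bytes, got {len(blob)}")
        return cls(*struct.unpack(STATE_FORMAT, blob))


onboard = MutableState(mode=2, queue_depth=17, battery_volts=7.5)
blob = onboard.dump()                  # what would be downlinked
ground_copy = MutableState.load(blob)  # what the ground side reconstructs
```

| This only covers state the software owns; the hidden state in
| peripheral hardware would not be captured by such a dump.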
| LorenPechtel wrote:
| Sounds like a very, very useful capability to have for
| diagnostics, but it's not exactly what I would like for routine
| operations.
|
| Things happen, and the electronics have to take radiation
| hits. I'd make the software as defensive as feasible so
| failures have less potential to cascade.
| joelkevinjones wrote:
| As much as I hate writing "getter" functions for referencing
| global variables, I would when I knew I didn't have the right
| address yet. Write them first to error out loudly, then when you
| have the actual addresses replace the error out code.
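|
| A small sketch of that pattern, in Python for brevity. The
| register names and the one filled-in address are invented for
| illustration; the idea is that an unknown address raises
| immediately instead of silently pointing somewhere wrong.

```python
from typing import Dict, Optional


class AddressNotConfigured(RuntimeError):
    """Raised when code asks for an address nobody has filled in yet."""


# Hypothetical register names; None means "real address not known yet".
_REGISTER_ADDRESSES: Dict[str, Optional[int]] = {
    "THRUSTER_ENABLE": None,
    "TELEMETRY_RATE": 0x4000_0010,
}


def get_address(name: str) -> int:
    """Return the configured address, or fail loudly if it is a placeholder."""
    if name not in _REGISTER_ADDRESSES:
        raise KeyError(f"unknown register {name!r}")
    address = _REGISTER_ADDRESSES[name]
    if address is None:
        raise AddressNotConfigured(f"{name} has no real address yet")
    return address


telemetry_addr = get_address("TELEMETRY_RATE")  # configured, returns normally

try:
    get_address("THRUSTER_ENABLE")  # placeholder, so this raises
    placeholder_failed_loudly = False
except AddressNotConfigured:
    placeholder_failed_loudly = True
```

| Once the real address is known, replacing None in the table is
| the only change needed.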
| jwrallie wrote:
| I would bet the schedule didn't allow much time for
| subsystem-level tests with the on-board computer, so everyone
| went to the big test praying for the best.
|
| That or inexperienced programmers were involved, assuming they
| were not scared of modifying memory addresses directly.
|
| As for the safe mode, if it happened, maybe you could say you were
| randomly injecting errors into memory at runtime and the
| spacecraft entered safe mode as expected. That would not be far
| off from the truth, just do not mention it was unintended :)
| taspeotis wrote:
| https://www.fastcompany.com/28121/they-write-right-stuff
| PoignardAzur wrote:
| > _I think what surprised me the most was how nonchalant the
| response was. We had documented all of our actions, so other
| people had read what happened and knew something had gone on. I
| wasn't expecting any fanfare but we weren't even debriefed on
| what happened._
|
| That's... Concerning. No root cause analysis? Not even an
| internal one?
| bronlund wrote:
| Clickbait. Unlike British missile submarines, they are not using
| Windows.
| pif wrote:
| Very simple: just _Write the Right Stuff_!
|
| https://www.eng.auburn.edu/~kchang/comp6710/readings/They%20...
___________________________________________________________________
(page generated 2024-09-26 23:02 UTC)