[HN Gopher] Air traffic failure caused by two locations 3600nm a...
___________________________________________________________________
Air traffic failure caused by two locations 3600nm apart sharing
3-letter code
Author : basilesimon
Score : 119 points
Date : 2024-11-14 15:04 UTC (4 days ago)
(HTM) web link (www.flightglobal.com)
(TXT) w3m dump (www.flightglobal.com)
| Optimal_Persona wrote:
| Well, 3600 billionths of a meter IS kinda close...just sayin'
| dh2022 wrote:
| I read it the same way....
| marky1991 wrote:
| What did they mean, if not 'nanometers'?
| abracadaniel wrote:
| Nautical miles
| bilekas wrote:
| I was thinking the same and thinking that's a super weird edge
| case to happen. I'm obviously tired.
| FateOfNations wrote:
| Good news: the system successfully detected an error and didn't
| send bad data to air traffic controllers.
|
| Bad News: the system can't recover from an error in an individual
| flight plan, bringing the whole system down with it (along with
| the backup system since it was running the same code).
| wyldfire wrote:
| > the system can't recover from an error in an individual flight
| plan, bringing the whole system down with it
|
| From the system's POV maybe this is the right way to resolve
| the problem. Could masking the failure by obscuring this
| flight's waypoint problem have resulted in a potentially
| conflicting flight not being tracked among other flights? If
| so, maybe it's truly urgent enough to bring down the system and
| force the humans to resolve the discrepancy.
|
| The systems outside of the scope of this one failed to preserve
| a uniqueness guarantee that was depended on by this system. Was
| that dependency correctly identified as one that was the job of
| System X and not System Y?
| martinald wrote:
| Yes I agree. The reason the system crashed from what I
| understand wasn't because of the duplicate code, it was
| because it had the plane time travelling, which suggests very
| serious corruption.
| kevin_thibedeau wrote:
| Waves hand... This is not the SQL injection you're looking
| for. It's just a serious corruption.
| aftbit wrote:
| It seems fundamentally unreasonable for the flight processing
| system to entirely shut itself down just because it detected
| that one flight plan had corrupt data. Some degree of
| robustness should be expected from this system IMO.
| HeyLaughingBoy wrote:
| It depends on what the potential outcomes are.
|
| I've worked on a (medical, not aviation) system where we
| tried as much as possible to recover from subsystem
| failures or at least gracefully reduce functionality until
| it was safe to shut everything down.
|
| However, there were certain classes of failure where the
| safest course of action was to shut the entire system down
| immediately. This was generally the case where continuing
| to run could have made matters worse, putting patient
| safety at risk. I suspect that the designers of this system
| ran into the same problem.
| mannykannot wrote:
| It does not seem reasonable when you put it like that, but
| when could it be said with confidence that it only affected
| just one flight plan? I get the impression that it is only
| in hindsight that this could be seen to be so. On the face
| of it, this was just an ordinary transatlantic flight like
| thousands of others.
|
| In general, the point where a problem first becomes
| apparent is not a guideline to its scope.
|
| Air traffic control is inherently a coordination problem
| dependent on common data, rules and procedures, which would
| seem to limit the degree to which subsystems can be siloed.
| Multiple implementations would not have helped in this
| case, either.
| MBCook wrote:
| I think you're on the right track, I assume it's safety.
|
| If one bad flight plan came in, what are the chances
| other unnoticed errors may be getting through?
|
| Given the huge danger involved with being wrong, shutting
| down with a "stuff doesn't add up, no confidence in safe
| operation" error may be the best approach.
| akira2501 wrote:
| > obscuring this flight's waypoint problem have resulted in a
| potentially conflicting flight not being tracked among other
| flights?
|
| Flights are tracked by radar and by transponder. The
| appropriate thing to do is just flag the flight with a
| discontinuity error but otherwise operate normally. This
| happens with other statuses like "radio failure" or
| "emergency aircraft."
|
| It's not something you'd see on a commercial flight, but on a
| private IFR flight (one with a flight plan), you can actually
| cancel your IFR plan mid flight and revert to VFR (visual
| flight rules) instead.
|
| Some flights take off without an IFR clearance as a VFR
| flight, but once airborne, they call up ATC and request an
| IFR clearance already en route.
|
| The system is vouchsafing where it does not need to.
| outworlder wrote:
| > From the system's POV maybe this is the right way to
| resolve the problem. Could masking the failure by obscuring
| this flight's waypoint problem have resulted in a potentially
| conflicting flight not being tracked among other flights? If
| so, maybe it's truly urgent enough to bring down the system
| and force the humans to resolve the discrepancy.
|
| Flagging the error is absolutely the right way to go. It
| should have rejected the flight plan, however. There could be
| issues if the flight was allowed to proceed and you now have
| an aircraft you didn't expect showing up.
|
| Crashing is not the way to handle it.
| hobs wrote:
| People posting on this forum saying "ah well software's failure
| case isn't as bad"
|
| > This forced controllers to revert to manual processing, leading
| to more than 1,500 flight cancellations and delaying hundreds of
| services which did operate.
| egypturnash wrote:
| Zero fatalities though. You could do a _lot_ worse for a
| massive air traffic control failure.
| hobs wrote:
| It's true, not saying they did a bad job here, just that even
| minor problems in your code can be exacerbated into giant net
| effects without you even considering it.
| lxgr wrote:
| Unfortunately shutting down air traffic generally does not
| result in zero excess deaths:
| https://pmc.ncbi.nlm.nih.gov/articles/PMC3233376/
| d1sxeyes wrote:
| Your source says "the fatality rate did not change
| appreciably".
| lxgr wrote:
| Injuries did increase, though, and I can't think of a
| plausible mechanism that would somehow cap expected
| outcomes at "injury but not death".
| d1sxeyes wrote:
| So we were talking about excess deaths, which means that
| supporting your argument with a paper that argues that a
| previous finding of excessive deaths was flawed is
| probably not the strongest argument you could make.
|
| Increased number of injuries but not deaths could be, for
| example, (purely making things up off the top of my head
| here) due to higher levels of distractedness among
| average drivers due to fear of terrorism, which results
| in more low-speed, surface-street collisions, while
| there's no change in high speed collisions because a
| short spell of distractedness on the highway is less
| likely to result in an accident.
| lallysingh wrote:
| The software needs a way to reject bad plans without falling
| over.
| ipunchghosts wrote:
| Title should be nmi
| jordanb wrote:
| I do a lot of navigation and have never seen nautical miles
| abbreviated as "nmi."
| lxgr wrote:
| I bet not everybody on here does, so picking the unambiguous
| unit sign would definitely avoid some double-takes.
| barbazoo wrote:
| The unit of "nm" is common among pilots but yeah technically it
| should be "NM".
| yongjik wrote:
| NGL, two locations 3600 non-maskable interrupts apart would
| have been a _much_ more interesting story.
| buildsjets wrote:
| Maybe that is true in your industry. It is not true in my
| industry. NM is the legally accepted abbreviation for nautical
| miles when used in the context of aircraft operations.
| Andys wrote:
| Non-maskable Interrupt?
| jp57 wrote:
| FYI: nm = nautical miles, not nanometers.
| barbazoo wrote:
| Given the context, I'd say NM actually
| https://en.wikipedia.org/wiki/Nautical_mile
| jp57 wrote:
| I was clarifying the post title, which uses "nm".
| pvitz wrote:
| Yes, it looks like they should have written "NM" instead of
| "nm".
| andkenneth wrote:
| No one is using nanometers in aviation navigation. Quite
| a few aviation systems are case insensitive or all caps
| only so you can't always make a distinction.
|
| In fact, if you say "miles", you mean nautical miles. You
| have to use "sm" to mean statute miles if you're using
| that unit, which is often used for measuring visibility.
| ianferrel wrote:
| Sure but I could imagine some kind of software failure
| caused by trying to divide by a distance that rounded to
| zero because the same location was listed in two
| databases that were almost but not exactly the same
| location. In fact I did when I first read the headline,
| then realized that it was probably nautical miles.
|
| That would be roughly consistent with the title and not a
| totally absurd thing to happen in the world.
| anigbrowl wrote:
| Indeed, but you can easily imagine a software glitch over
| what looks like a single location but which the computer
| sees as two separate ones.
| dietr1ch wrote:
| Thanks, from the title I was confused on why there was such a
| high resolution on positions.
| ikiris wrote:
| Nanometers would be a very short flight.
| cheschire wrote:
| I could imagine conflict arising when switching between
| single and double precision causing inequality like this.
| noqc wrote:
| man, this ruins everything.
| andyjohnson0 wrote:
| Even though I knew this was about aviation, I still read nm as
| nanometres. Now I'm wondering what this says about how my brain
| works.
| lostlogin wrote:
| It says 'metric'. Good.
| tialaramex wrote:
| Indeed. There are plenty of things in aviation where they
| care so much about compatibility that something survives
| decades after it should reasonably be obsolete and
| replaced.
|
| Inches of mercury, magnetic bearings (the magnetic poles
| _move!_ but they put up with that) and gallons of fuel, all
| just accepted.
|
| Got a safety-of-life emergency on an ocean liner, oil
| tanker or whatever? Everywhere in the entire world mandates
| GMDSS which includes Digital Selective Calling, the boring
| but complicated problems with radio communication are
| solved by a machine, you just need to know who you want to
| talk to (for Mayday calls it's everyone) and what you want
| to tell them (where you are, that you need urgent
| assistance and maybe the nature of the emergency)
|
| On a big plane? Well good luck, they only have analogue
| radio and it's your problem to cope with the extensive
| troubles as a result.
|
| I'm actually impressed that COSPAS/SARSAT wasn't obliged to
| keep the analogue plane transmitters working, despite
| obsoleting (and no longer providing rescue for) analogue
| boat or personal transmitters. But on that, at least, they
| were able to say no, if you don't want to spend a few grand
| on the upgrade for your million dollar plane we don't plan
| to spend _billions of dollars_ to maintain the satellites
| just so you can keep your worse system limping along.
| jug wrote:
| Yeah, I went into the article thinking this because I
| expected someone had created waypoints right on top of each
| other and in the process also somehow generating the same
| code for them.
| QuercusMax wrote:
| Ah! I thought this was a case where the locations were just
| BARELY different from each other, not that they're very far
| apart.
| fabrixxm wrote:
| It took me a while, tbh..
| cduzz wrote:
| I was wondering; it seemed like if the two airports were 36000
| angstroms apart (3600 nanometers), it'd be reasonable to give
| them the same airport code since they'd be pretty much on top
| of each other.
|
| I've also seen "DANGER!! 12000000 mVolts!!!" on tiny little
| model railroad signs.
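|
| The unit arithmetic in these jokes can be checked in a few lines. A
| small Python sketch, assuming only the standard definitions (1
| nautical mile = 1852 m exactly, 1 nanometre = 1e-9 m, 1 angstrom =
| 1e-10 m); the table and the convert() helper are made-up names for
| illustration:

```python
from fractions import Fraction

# Metres per unit, kept as exact fractions to avoid float rounding.
METRES_PER_UNIT = {
    "m": Fraction(1),
    "NM": Fraction(1852),             # nautical mile, exact by definition
    "nm": Fraction(1, 10**9),         # nanometre
    "angstrom": Fraction(1, 10**10),  # angstrom
}


def convert(value, from_unit, to_unit):
    """Convert a length between two units listed in METRES_PER_UNIT."""
    metres = Fraction(value) * METRES_PER_UNIT[from_unit]
    return metres / METRES_PER_UNIT[to_unit]


# The same number read with the two competing meanings of the abbreviation.
as_nanometres_in_angstroms = convert(3600, "nm", "angstrom")
as_nautical_miles_in_metres = convert(3600, "NM", "m")

# The model railroad sign: millivolts to kilovolts is a factor of 10**6.
sign_in_kilovolts = Fraction(12_000_000, 10**6)

print(as_nanometres_in_angstroms, as_nautical_miles_in_metres, sign_in_kilovolts)
```

| Read as nanometres the figure is a few micrometres; read as nautical
| miles it is several thousand kilometres, which is the whole joke.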
| atonse wrote:
| That's so adorable (for model railroads)
| jmvoodoo wrote:
| So, essentially the system has a serious denial of service flaw.
| I wonder how many variations of flight plans can cause different
| but similar errors that also force a disconnect of primary and
| secondary systems.
|
| Seems "reject individual flight plan" might be a better system
| response than "down hard to prevent corruption"
|
| Bad assumption that a failure to interpret a plan is a serious
| coding error seems to be the root cause, but hard to say for
| sure.
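|
| One way to picture the "reject individual flight plan" response is
| per-plan error isolation. A minimal Python sketch, not a description
| of how FPRSA-R actually works: FlightPlan, PlanRejected,
| extract_segment and process_all are hypothetical names, and the check
| shown (exit waypoint appearing before the entry waypoint) is only a
| stand-in for whatever inconsistency a real system would detect.

```python
from dataclasses import dataclass


class PlanRejected(Exception):
    """Raised when a single flight plan cannot be interpreted safely."""


@dataclass
class FlightPlan:
    callsign: str
    waypoints: list  # ordered waypoint identifiers along the route


def extract_segment(plan, entry, exit_):
    """Return the part of the route between entry and exit, inclusive."""
    try:
        start = plan.waypoints.index(entry)
        end = plan.waypoints.index(exit_)
    except ValueError as err:
        raise PlanRejected(f"{plan.callsign}: waypoint not on route") from err
    if end < start:
        # Exit found before entry: the route would run backwards.
        raise PlanRejected(f"{plan.callsign}: exit {exit_} precedes entry {entry}")
    return plan.waypoints[start:end + 1]


def process_all(plans, entry, exit_):
    """Process every plan; a bad plan is quarantined instead of stopping the loop."""
    accepted, quarantined = {}, []
    for plan in plans:
        try:
            accepted[plan.callsign] = extract_segment(plan, entry, exit_)
        except PlanRejected as err:
            quarantined.append(str(err))  # left for a human to review
    return accepted, quarantined


plans = [
    FlightPlan("GOOD1", ["AAA", "BBB", "CCC", "DDD"]),
    FlightPlan("BAD1", ["CCC", "BBB", "AAA"]),
]
accepted, quarantined = process_all(plans, entry="BBB", exit_="CCC")
```

| Whether quarantining a plan is safe enough for air traffic control is
| exactly what is argued elsewhere in this thread; the sketch only shows
| the mechanics of not letting one exception stop everything else.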
| mjevans wrote:
| Rejecting the flight plan would be the last resort, but that is
| where it should have gone when no other options were left, rather
| than total shutdown.
|
| CORRECT the flight plan, by first promoting the exit/entry
| points for each autonomous region along the route, validating
| the entry/exit list only, and then the arcs within, would be
| the least errant method.
| mcfedr wrote:
| Rejecting the plan surely should have come many places before
| shutting down the whole system!
| d1sxeyes wrote:
| You can't just reject or correct the flight plan, you're a
| consumer of the data. The flight plan _was_ valid, it was the
| interpretation applied by the UK system which was incorrect
| and led to the failure.
|
| There are a bunch of ways FPRSA-R can already interpret data
| like this correctly, but there were a combination of 6
| specific criteria that hadn't been foreseen (e.g. the
| duplicate waypoints, the waypoints both being outside UK
| airspace, the exit from UK airspace being implicit on the
| plan as filed, etc).
| perihelions wrote:
| Original (2023) thread with 446 comments,
|
| https://news.ycombinator.com/item?id=37461695 (_"UK air traffic
| control meltdown (jameshaydon.github.io)"_)
| Jtsummers wrote:
| There's been some prior discussion on this over the past year,
| here are a few I found (selected based on comment count, haven't
| re-read the discussions yet):
|
| From the day of:
|
| https://news.ycombinator.com/item?id=37292406 - 33 points by
| woodylondon on Aug 28, 2023 (23 comments)
|
| Discussions after:
|
| https://news.ycombinator.com/item?id=37401864 - 22 points by
| bigjump on Sept 6, 2023 (19 comments)
|
| https://news.ycombinator.com/item?id=37402766 - 24 points by
| orobinson on Sept 6, 2023 (20 comments)
|
| https://news.ycombinator.com/item?id=37430384 - 34 points by
| simonjgreen on Sept 8, 2023 (68 comments)
| perihelions wrote:
| There's also a much larger one,
|
| https://news.ycombinator.com/item?id=37461695 (_"UK air
| traffic control meltdown (jameshaydon.github.io)"_, 446
| comments)
| steeeeeve wrote:
| You know there's a software engineer somewhere that saw this as a
| potential problem, brought up a solution, and had that solution
| rejected because handling it would add 40 hours of work to a
| project.
| ryandrake wrote:
| ... or there's a software engineer somewhere who simply
| _assumed_ that three letter navaid identifiers were globally
| unique, and baked that assumption into the code.
|
| I guess we now need a "Falsehoods Programmers Believe About
| Aviation Data" site :)
| MichaelZuo wrote:
| Or even more straightforward, just don't believe anyone 100%
| knows what they are doing until they exhaustively list every
| assumption they are making.
| gregmac wrote:
| Which also means never assume the exhaustive list is 100%.
| MichaelZuo wrote:
| Bingo, without some means of credible verification, then
| assume it's incomplete.
| Filligree wrote:
| I wouldn't be able to produce such a list, even for areas
| where I totally do know everything that would be on the
| list.
| madcaptenor wrote:
| Even more straightforward, just don't believe anyone 100%
| knows what they are doing.
| metaltyphoon wrote:
| Did aviation software for 7 years. This is 100% the first
| assumption about waypoint / navaid when new devs come in.
| em-bee wrote:
| or falsehoods programmers believe about global identifiers
| CrimsonCape wrote:
| C dev: "You are telling me that the three digit codes are not
| globally unique??? And now we have to add more bits to the
| struct?? That's going to kill our perfectly optimized bit
| layout in memory! F***! This whole app is going to sh**"
| throw0101a wrote:
| > _C dev: "You are telling me that the three digit codes are
| not globally unique???_
|
| They are understood not to be. They are generally known to be
| regionally unique.
|
| The "DVL" code is unique within FAA/Transport Canada
| control, and the other "DVL" is unique within EASA space.
|
| There are pre-defined three-letter codes:
|
| * https://en.wikipedia.org/wiki/IATA_airport_code
|
| And pre-defined four-letter codes:
|
| * https://en.wikipedia.org/wiki/ICAO_airport_code
|
| There are also five-letter names for major route points:
|
| * https://data.icao.int/icads/Product/View/98
|
| * https://ruk.ca/content/icao-icard-and-5lnc-how-
| those-5-lette...
|
| If there are duplicates there is a resolution process:
|
| * https://www.icao.int/WACAF/Documents/Meetings/2014/ICARD/IC
| A...
| marcosdumay wrote:
| Hum... Somebody has a list of foreign local-codes sharing
| the same space as the local ones?
|
| I assumed IATA messed up, now I'm wondering how that even
| happens. It's not even easy to discover the local codes of
| remote aviation authorities.
| skissane wrote:
| > I assumed IATA messed up,
|
| This isn't IATA. IATA manages codes used for passenger
| and cargo bookings, which are distinct from the codes
| used by pilots and air traffic control we are talking
| about here-ultimately overseen by ICAO. These codes
| include a lot of stuff which is irrelevant to
| passengers/freight, such as navigation waypoints,
| military airbases (which normally would never accept a
| civilian flight, but still could be used for an emergency
| landing-plus civilian and military ATC coordinate with
| each other to avoid conflicts)
| CrimsonCape wrote:
| It seems like tasking a software engineer to figure this
| out when the industry at large hasn't figured this out just
| isn't fair.
|
| Best I can see (using Rust) is a hashmap on UTF-8 string
| keys and every code in existence gets inserted into the
| hash map with an enum struct based on the code type. So you
| are forced to switch over each enum case and handle each
| case no matter what region code type.
|
| It becomes apparent that the problem must be handled with
| app logic earlier in the system; to query a database of
| codes, you must also know which code and "what type" of
| code it is. Users are going to want to give the code only,
| so there's some interesting mis-direction introduced; the
| system has to somehow fuzzy match the best code for the
| itinerary. Correct me if I'm wrong, but the above seems
| like a _mandatory_ step in solving the problem which would
| have caught the exception.
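|
| A rough sketch of that lookup step, in Python rather than Rust: an
| index from bare identifier to every known candidate, disambiguated by
| distance from the previous fix on the route. Everything here is an
| assumption for illustration: the region labels, the coordinates
| (made-up positions, one in North America and one in Europe), the
| resolve() helper and the 500 NM leg limit are not taken from any real
| system or database.

```python
import math
from collections import defaultdict

# Mean Earth radius of about 6371 km, expressed in nautical miles.
EARTH_RADIUS_NM = 6371.0 / 1.852


def distance_nm(a, b):
    """Great-circle (haversine) distance between two (lat, lon) pairs in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_NM * math.asin(math.sqrt(h))


# ident -> list of (region, (lat, lon)); one ident may have several entries.
index = defaultdict(list)
index["DVL"].append(("region-A", (48.0, -99.0)))  # illustrative position
index["DVL"].append(("region-B", (49.0, 0.0)))    # illustrative position


def resolve(ident, previous_fix, max_leg_nm=500.0):
    """Pick the one candidate within max_leg_nm of the previous fix, else fail."""
    candidates = index.get(ident, [])
    in_range = [c for c in candidates
                if distance_nm(previous_fix, c[1]) <= max_leg_nm]
    if len(in_range) != 1:
        # Zero or several plausible matches: refuse to guess.
        raise LookupError(f"{ident}: {len(in_range)} plausible candidates")
    return in_range[0]


region, position = resolve("DVL", previous_fix=(50.0, -1.0))
```

| The point of raising LookupError is that an ambiguous identifier
| becomes an explicit, per-lookup failure rather than a silent pick of
| the first match.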
|
| I echo other comments that say that there's probably 60%
| more work involved than your manager realizes.
| skissane wrote:
| > They are understood not to be. They are generally known
| to be regionally unique.
|
| Then why aren't they namespaced? Attach to each code its
| issuing authority, so it is obvious to the code that
| DVL@FAA and DVL@EASA are two different things?
|
| Maybe for backward compatibility/ human factors reasons,
| the code needs to be displayed without the namespace to
| pilots and air traffic controllers, but it should be a
| field in the data formats.
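|
| The namespacing idea could look something like this. A minimal Python
| sketch, assuming a hypothetical NavaidId type; the FAA/EASA labels are
| just the examples used above, and no existing flight plan format is
| claimed to carry such a field.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NavaidId:
    """An identifier qualified by the authority that issued it."""
    ident: str
    authority: str

    def display(self):
        # What pilots and controllers see: the bare code, as today.
        return self.ident

    def __str__(self):
        # What the data format carries: the fully qualified form.
        return f"{self.ident}@{self.authority}"

    @classmethod
    def parse(cls, text):
        """Parse 'DVL@FAA' style strings; a bare code is rejected."""
        ident, sep, authority = text.partition("@")
        if not sep or not ident or not authority:
            raise ValueError(f"not a namespaced identifier: {text!r}")
        return cls(ident, authority)


a = NavaidId.parse("DVL@FAA")
b = NavaidId.parse("DVL@EASA")

# Frozen dataclasses are hashable, so both can be keys in the same table.
registry = {a: "first navaid", b: "second navaid"}
```

| The two values compare unequal and hash separately even though they
| display identically, which is the property the bare three-letter code
| lacks.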
| shagie wrote:
| CGP Grey The Maddening Mess of Airport Codes!
| https://youtu.be/jfOUVYQnuhw
|
| I'd rather deal with designing tables to properly represent
| names.
| nightowl_games wrote:
| I don't know that and I don't like this assumption that only
| 'managers' make mistakes, or that software engineers are always
| right. I think it's needlessly adversarial, biased and largely
| incorrect.
| zer8k wrote:
| Spoken like a manager.
|
| Look, when you're barking orders at the guys in the trenches
| who, understandably in fear for their jobs, do the _stupid_
| "business-smart" thing, then it is entirely the fault of
| management.
|
| I can't tell you how many times just in the last year I've
| been blamed-by-proxy for doing something that was decreed
| upon me by some moron in a corner office. Everything is an
| emergency, everything needs to be done yesterday, everything
| is changing all the time because King Shit and his merry band
| of boot-licking middle managers decide it should be.
|
| Software engineers, especially ones with significant
| experience, are almost surely more right than middle
| managers. "Shouldn't we consider this case?" is almost always
| met with some parable about "overengineering" and followed up
| by a healthy dose of "that's not AGILE". I have grown so
| tired of this and thanks to the massive crater in job
| mobility most of us just do as we are told.
|
| It's the power imbalance. In this light, all blame should
| fall on the manager unless it can be explicitly shown to be
| developer problems. The adage "those who can, do, and those
| who can't, teach" applies equally to management.
|
| When it's my f*@#*$U neck on the line and the only option
| to keep my job is do the stupid thing you can bet I'll do the
| stupid thing. Thank god there's no malpractice law in
| software.
|
| Poor you - only one of our jobs is getting shipped overseas.
| kortilla wrote:
| Your attitude is super antagonistic and your relationship
| with management is not representative of the industry. I
| recommend you consider a different job or if this pattern
| repeats at every job that you reflect on how you interact
| with managers to improve.
| elteto wrote:
| Agreed. And most of the people with these attitudes have
| never written actual safety critical code where everything is
| written to a very detailed spec. Most likely the designers of
| the system thought of this edge case and required adding a
| runtime check and fatal assertion if it was ever encountered.
| _pete_ wrote:
| The DVL really is in the details.
| spatley wrote:
| Har! should have seen that one coming :)
| jrochkind1 wrote:
| I don't know how long that failure mode has been in place or if
| this is relevant, but it makes me think of analogous times I've
| encountered similar:
|
| When automated systems are first put in place, for something high
| risk, "just shut down if you see something that may be an error"
| is a totally reasonable plan. After all, literally yesterday they
| were all functioning without the automated system, if it doesn't
| seem to be working right better switch back to the manual process
| we were all using yesterday, instead of risk a catastrophe.
|
| In that situation, switching back to yesterday's workflow is
| something that won't interrupt much.
|
| A couple decades -- or honestly even just a couple years --
| later, that same fault system, left in place without much
| consideration because it rarely is triggered -- is itself
| catastrophic, switching back to a rarely used and much more
| inefficient manual process is extremely disruptive, and even
| itself raises the risk of catastrophic mistakes.
|
| The general engineering challenge, is how we deal with little-
| used little-seen functionality (definitely thinking of fault-
| handling, but there may be other cases) that is totally
| reasonable when put in place, but has not aged well, and nobody
| has noticed or realized it, and even if they did it might be hard
| to convince anyone it's a priority to improve, and the longer you
| wait the more expensive.
| telgareith wrote:
| Dig into the OpenZFS 2.2.0 data loss bug story. There was at
| least one ticket (in FreeBSD) where it cropped up almost a year
| prior and got labeled "look into later," but it got closed.
|
| I'm aware closing tickets of "future investigation" tasks when
| it seems to not be an issue any longer is common. But, it
| shouldn't be.
| Arainach wrote:
| > it shouldn't be
|
| Software can (maybe) be perfect, or it can be relevant to a
| large user base. It cannot be both.
|
| With an enormous budget and a strictly controlled scope
| (spacecraft) it may be possible to achieve defect-free
| software.
|
| In most cases it is not. There are always finite resources,
| and almost always more ideas than it takes time to implement.
|
| If you are trying to make money, is it worth chasing down
| issues that affect a minuscule fraction of users that take
| eng time which could be spent on architectural improvements,
| features, or bugs affecting more people?
|
| If you are an open source or passion project, is it worth
| your contributors' limited hours, and will trying to insist
| people chase down everything drive your contributors away?
|
| The reality in any sufficiently large project is that the bug
| database will only grow over time. If you leave open every
| old request and report at P3, users will grow just as
| disillusioned as if you were honest and closed them as "won't
| fix". Having thousands of open issues that will never be
| worked on pollutes the database and makes it harder to keep
| track of the issues which DO matter.
| Shorel wrote:
| I'm in total disagreement with your last paragraph.
|
| In fact, I can't see how it follows from the rest.
|
| Software can have defects, true. There are finite
| resources, true. So keep the tickets open. Eventually
| someone will fix them.
|
| Closing something for spurious psychological reasons seems
| detrimental to actual engineering and it doesn't actually
| avoid any real problem.
|
| Let me repeat that: ignoring a problem doesn't make it
| disappear.
|
| Keep the tickets open.
|
| Anything else is supporting a lie.
| Arainach wrote:
| It's not "spurious psychological reasons". It is being
| honest that issues will never, ever meet the bar to be
| fixed. Pretending otherwise by leaving them open and
| ranking them in the backlog is a waste of time and
| attention.
| exe34 wrote:
| it's more fun/creative/CV-worthy to write new shiny
| features than to fix old problems.
| gbear605 wrote:
| There have been a couple times in the past where I've run
| into an issue marked as WONT FIX and then resolved it on
| my end (because it was luckily an open source project).
| If the ticket were still open, it would have been trivial
| to put up a fix, but instead it was a lot more annoying
| (and in one of the cases, I just didn't bother). Sure,
| maybe the issue is _so_ low priority that it wouldn't
| even be worth reviewing a fix, and this doesn't apply for
| closed source projects, but otherwise you're just losing
| out on other people doing free fixes for you.
| mithametacs wrote:
| Everything is finite including bugs. They aren't magic or
| spooky.
|
| If you are superstitious about bugs, it's time to triage.
| Absolutely full turn disagreement with your directions
| ronsor wrote:
| > The general engineering challenge, is how we deal with
| little-used little-seen functionality (definitely thinking of
| fault-handling, but there may be other cases) that is totally
| reasonable when put in place, but has not aged well, and nobody
| has noticed or realized it, and even if they did it might be
| hard to convince anyone it's a priority to improve, and the
| longer you wait the more expensive.
|
| The solution to this is to trigger all functionality
| periodically and randomly to ensure it remains tested. If you
| don't test your backups, you don't have any.
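|
| In code, "trigger it periodically and randomly" amounts to fault
| injection: deliberately forcing the rarely used path so it keeps
| getting exercised. A toy Python sketch under that assumption; the
| primary() and fallback() functions, the InjectedFault exception and
| the 20% rate are all invented for illustration.

```python
import random


class InjectedFault(Exception):
    """A deliberate, artificial failure used only to exercise the fallback."""


def primary(plan):
    return f"auto:{plan}"


def fallback(plan):
    return f"manual:{plan}"


def process(plan, rng, fault_rate):
    """Run the primary path, but randomly force the fallback path as a drill."""
    try:
        if rng.random() < fault_rate:
            raise InjectedFault("drill")
        return primary(plan)
    except InjectedFault:
        return fallback(plan)


rng = random.Random(42)  # seeded so a drill run can be repeated
results = [process(f"plan{i}", rng, fault_rate=0.2) for i in range(1000)]
fallback_count = sum(r.startswith("manual:") for r in results)
```

| With a non-zero fault rate the fallback runs regularly, so a fallback
| that has quietly rotted shows up during a drill and not during a real
| incident.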
| ericjmorey wrote:
| Which company deployed a chaos monkey daemon on their
| systems? Seemed to improve resiliency when I read about it.
| theolivenbaum wrote:
| Netflix did that many years ago, interesting idea even if a
| bit disruptive in the beginning
| https://netflix.github.io/chaosmonkey/
| amelius wrote:
| "Your flight has been delayed due to Chaos Monkey."
| bitwize wrote:
| The chaos monkey is there to remind you to always mount a
| scratch monkey.
| crtified wrote:
| Also, as codebases and systems get more (not less) complex over
| time, the potential for technical debt multiplies. There are
| more processing and outcome vectors, more (and different)
| branching paths. New logic maps. Every day/month/year/decade is
| a new operating environment.
| mithametacs wrote:
| I don't think it is exponential. In fact, one of the things
| that surprises me about software engineering is that it's
| possible at all.
|
| Bugs seem to scale log-linearly with code complexity. If it's
| exponential you're doing it wrong.
| sam0x17 wrote:
| I've posted this here before, but they really need globally
| unique codes for all the airports, waypoints, etc, it's crazy
| there are collisions. People always balk at this for some reason
| but look at the edge cases that can occur, it's crazy CRAZY
| crote wrote:
| Coming up with a globally unique waypoint system is trivial.
| Convincing the aviation industry to spend many hundreds of
| millions of dollars to change a core data type used in just
| about every single aviation-related system, in order to avoid
| triggering rare once-a-decade bugs? That's a _lot_ harder.
| lostlogin wrote:
| > That's a lot harder.
|
| I wonder what 1,500 cancelled flights and 700,000 disrupted
| passengers adds up to in cost? And that's just this one
| incident.
| amiga386 wrote:
| ...an incident where they didn't parse the data as other
| systems already parsed the data.
|
| It sounds like the solution is better validation and test
| suites for the existing scheme, not a new less-ambiguous
| scheme
| buildsjets wrote:
| That's not CRAZY at all. CRAZY is at 14deg 4' 50.87" N. 145deg
| 38' 16.22" E
|
| https://opennav.com/waypoint/US/CRAZY
| gadders wrote:
| If you want to, you can read the final report from the UK Civil
| Aviation Authority here:
| https://www.caa.co.uk/publication/download/23340
|
| It's pretty readable and quite interesting.
| tempodox wrote:
| When there's no global clearing house for those identifiers,
| maybe namespaces would help?
|
| Related: The editorialized HN title uses nanometers (nm) when
| they possibly mean nautical miles (nmi). What would a flight
| control system make of that?
| bigfatkitten wrote:
| The reason idents for radio navaids (VOR/NDB) are only three
| characters is because they are broadcast via morse code. They
| need to be copyable by pilots who are otherwise somewhat busy
| and not particularly proficient in Morse. For this purpose,
| they only need to be unique to that frequency within plausible
| radio range.
|
| 'nm' and 'NM' are the accepted abbreviations for nautical miles
| in the aviation industry, whether official or not.
| buildsjets wrote:
| Every aircraft I've ever flown as either Pilot in Command or
| required crewmember, and also every marine navigation system I
| have used in my life has displayed distance information as nm,
| Nm, or NM, interchangeably. I have never been confused by this,
| and I have never seen any other crew be confused. I have not
| ever seen any version of nmi used, in any variation of
| capitalization. This includes Boeing flight decks, Airbus
| flight decks, general aviation Garmin equipment, and a few MIL
| aircraft. And some boats.
| chefandy wrote:
| As an aside, that site's cookie policy sucks. You can opt out of
| some, but others, like "combine and link data from other
| sources", "identify devices based on information transmitted
| automatically", "link different devices" and others can't be
| disabled. I feel bad for people that don't have the technical
| sophistication to protect themselves against that kind of prying.
| amiga386 wrote:
| This is old news, but what's new news is that last week, the UK
| Civil Aviation Authority openly published its _Independent Review
| of NATS (En Route) Plc's Flight Planning System Failure on 28
| August 2023_ https://www.caa.co.uk/publication/download/23337
| (PDF)
|
| Let's look at point 2.28: "Several factors made the
| identification and rectification of the failure more protracted
| than it might otherwise have been. These include:
|
| * The Level 2 engineer was rostered on-call and therefore was not
| available on site at the time of the failure. Having exhausted
| remote intervention options, it took 1.5 hours for the individual
| to arrive on-site to perform the necessary full system re-start
| which was not possible remotely.
|
| * The engineer team followed escalation protocols which resulted
| in the assistance of the Level 3 engineer not being sought for
| more than 3 hours after the initial event.
|
| * The Level 3 engineer was unfamiliar with the specific fault
| message recorded in the FPRSA-R fault log and required the
| assistance of Frequentis Comsoft to interpret it.
|
| * The assistance of Frequentis Comsoft, which had a unique level
| of knowledge of the AMS-UK and FPRSA-R interface, was not sought
| for more than 4 hours after the initial event.
|
| * The joint decision-making model used by NERL for incident
| management meant there was no single post-holder with
| accountability for overall management of the incident, such as a
| senior Incident Manager.
|
| * The status of the data within the AMS-UK during the period of
| the incident was not clearly understood.
|
| * There was a lack of clear documentation identifying system
| connectivity.
|
| * The password login details of the Level 2 engineer could not be
| readily verified due to the architecture of the system."
|
| WHAT DOES "PASSWORD LOGIN DETAILS ... COULD NOT BE READILY
| VERIFIED" MEAN?
|
| EDIT: Per _NATS Major Incident Investigation Final Report -
| Flight Plan Reception Suite Automated (FPRSA-R) Sub-system
| Incident 28th August 2023_
| https://www.caa.co.uk/publication/download/23340 (PDF) ... "There
| was a 26-minute delay between the AMS-UK system being ready for
| use and FPRSA-R being enabled. This was in part caused by a
| password login issue for the Level 2 Engineer. At this point, the
| system was brought back up on one server, which did not contain
| the password database. When the engineer entered the correct
| password, it could not be verified by the server."
| mcfedr wrote:
| But no mention of this insane failure mode? If the article is
| to be believed.
| fyt2024 wrote:
| Is nm the official abbreviation for nautical miles? I assume it
| is nautical miles. For me it is nanometers.
| andkenneth wrote:
| Contextually no one is using nanometers in aviation nav
| applications. Many aviation systems are case insensitive or all
| caps only so capitalisation is rarely an important distinction.
| buildsjets wrote:
| Officially, NM is the abbreviation for nautical miles when used
| in the context of aircraft operations. It's not just a good
| idea, it's the Law. Specifically, 14 CFR § 1.2 of the United
| States Code of Federal Regulations.
|
| https://www.ecfr.gov/current/title-14/chapter-I/subchapter-A...
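|
| To put numbers on the two readings of "3600nm" in the headline,
| a small Python sketch. The only facts assumed are that the
| international nautical mile is exactly 1852 m and a nanometre is
| 1e-9 m; the function names are made up for the example.
|
```python
METRES_PER_NAUTICAL_MILE = 1852  # exact, by international definition
METRES_PER_NANOMETRE = 1e-9


def nautical_miles_to_km(value):
    """Convert nautical miles to kilometres."""
    return value * METRES_PER_NAUTICAL_MILE / 1000


def nanometres_to_m(value):
    """Convert nanometres to metres."""
    return value * METRES_PER_NANOMETRE


as_nautical_miles_km = nautical_miles_to_km(3600)  # roughly 6667 km
as_nanometres_m = nanometres_to_m(3600)            # a few millionths of a metre
print(as_nautical_miles_km, as_nanometres_m)
```
|
| The two interpretations differ by about twelve orders of magnitude,
| so context settles it even when the capitalisation does not.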
| cbhl wrote:
| Hmm, is this the same incident which happened last year? Or is
| this a new incident?
|
| From Sept 2023 (flightglobal.com):
|
| - https://archive.is/uiDvy
|
| - Comments: https://news.ycombinator.com/item?id=37430384
|
| Also some more detailed analysis:
|
| - https://jameshaydon.github.io/nats-fail/
|
| - Comments: https://news.ycombinator.com/item?id=37461695
| javawizard wrote:
| First sentence of the article:
|
| > Investigators probing the serious UK air traffic control
| system failure in August _last year_ [...]
| Joel_Mckay wrote:
| In other news, goat carts are still getting 100 furlong-firkin-
| fortnight on dandelions.
|
| =3
| convivialdingo wrote:
| I guarantee that piece of code has a comment like
|     /* This should never happen */
|     if (waypoints.matchcount > 2) {
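|
| For contrast, a hedged Python sketch of the alternative discussed
| upthread: resolve a duplicated code by proximity to the previous
| fix, and set aside only the one flight plan that cannot be
| resolved. Everything here (the data, the 1000 NM leg limit, the
| names resolve and process) is hypothetical and is not the actual
| FPRSA-R logic; whether continuing past a bad plan is even safe is
| a separate question.
|
```python
import math

EARTH_RADIUS_NM = 3440.065  # mean Earth radius expressed in nautical miles


class WaypointError(Exception):
    """Raised when a code cannot be resolved to exactly one location."""


def distance_nm(a, b):
    """Great-circle distance in nautical miles between (lat, lon) pairs in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_NM * math.asin(math.sqrt(h))


def resolve(code, previous, database, max_leg_nm):
    """Pick the single location for `code` within max_leg_nm of the previous fix."""
    nearby = [pos for pos in database.get(code, [])
              if distance_nm(previous, pos) <= max_leg_nm]
    if len(nearby) != 1:
        raise WaypointError(f"{code}: {len(nearby)} plausible matches")
    return nearby[0]


def process(plans, database, max_leg_nm=1000):
    """Resolve each plan independently; a bad plan is set aside, not fatal."""
    accepted, rejected = {}, {}
    for plan_id, (start, codes) in plans.items():
        try:
            route, previous = [], start
            for code in codes:
                previous = resolve(code, previous, database, max_leg_nm)
                route.append(previous)
            accepted[plan_id] = route
        except WaypointError as exc:
            rejected[plan_id] = str(exc)
    return accepted, rejected


# Hypothetical data: the code "XYZ" names two far-apart locations.
database = {"XYZ": [(50.0, 0.0), (10.0, 60.0)], "QRS": [(52.0, 1.0)]}
plans = {
    "PLAN1": ((49.0, 0.0), ["XYZ", "QRS"]),  # resolvable by proximity
    "PLAN2": ((49.0, 0.0), ["NOPE"]),        # unknown code
}
accepted, rejected = process(plans, database)
```
|
| Here PLAN1 resolves to the nearby "XYZ" and PLAN2 lands in the
| rejected pile for a human to look at, while the loop keeps going.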
| GnarfGnarf wrote:
| Funny airport call letters story: I once headed to Salt Lake
| City, UT (SLC) for a conference. My luggage was processed by a
| dyslexic baggage handler, who sent it to... SCL (Santiago,
| Chile).
|
| I was three days in my jeans at business meetings. My bag came
| back through Lima, Peru and Houston. My bag was having more fun
| than me.
___________________________________________________________________
(page generated 2024-11-18 23:00 UTC)