[HN Gopher] Why do systems fail? Tandem NonStop system and fault...
___________________________________________________________________
Why do systems fail? Tandem NonStop system and fault tolerance
Author : PaulHoule
Score : 108 points
Date : 2024-10-11 17:10 UTC (1 day ago)
(HTM) web link (www.erlang-solutions.com)
(TXT) w3m dump (www.erlang-solutions.com)
| 082349872349872 wrote:
| at Tandem, even the company coffee mugs had redundancy:
| https://i.etsystatic.com/33311136/r/il/08fbca/5271808290/il_...
| sillywalk wrote:
| I'm still hoping to find a more detailed article about modern
| X86-64 NonStop, complete with Mackie Diagrams.
|
| The last one I can find is for the NonStop Advanced Architecture
| (on Itanium), with ServerNet. I gather that this was replaced
| with the NonStop Multicore Architecture (also on Itanium), with
| Infiniband, and I assume x86-64 is basically the same but on
| x86-64, but in pseudo big-endian.
| hi-v-rocknroll wrote:
| A hypervisor (software) approach is one way to accomplish it
| far cheaper and much more configurable and reusable than having
| to rely on dedicated hardware. VMware's x86_64 Fault Tolerance
| feature runs 2 VMs on different hosts using the lockstep method.
| If either fails, the hypervisor moves the (V)IP over with ARP to
| the running one and spawns another to replace it. More often
| than not, it's a way to run a
| critical machine that cannot accept any downtime and cannot
| otherwise be (re)engineered in a conventional HA manner with
| other building-blocks. In general, one should never do this and
| prefer to use always consistent quorum 2 phase commit
| transactions at the cost of availability or throughput, or
| eventual consistency through gossip updates at the cost of
| inconsistency and potential data loss.
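The quorum two-phase commit mentioned above can be sketched in a few lines. This is an illustrative toy, not any real database's protocol: `Participant` and `two_phase_commit` are hypothetical names, and a real coordinator would also durably log votes and handle timeouts.

```python
# Minimal two-phase commit sketch (illustrative only): a coordinator
# asks every participant to prepare; the transaction commits only if
# all vote yes, otherwise every participant rolls back.

class Participant:
    def __init__(self, name):
        self.name = name
        self.state = "idle"

    def prepare(self):
        # A real participant would durably log its vote before replying.
        self.state = "prepared"
        return True

    def commit(self):
        self.state = "committed"

    def rollback(self):
        self.state = "rolled_back"

def two_phase_commit(participants):
    # Phase 1: collect votes from every participant.
    if all(p.prepare() for p in participants):
        # Phase 2: everyone voted yes -> commit everywhere.
        for p in participants:
            p.commit()
        return True
    # Any "no" vote (or, in a real system, a timeout) aborts the whole
    # transaction, trading availability for consistency as noted above.
    for p in participants:
        p.rollback()
    return False

nodes = [Participant("a"), Participant("b"), Participant("c")]
print(two_phase_commit(nodes))   # True
```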
| adastra22 wrote:
| What do you want to know?
| sillywalk wrote:
| What has changed since Itanium? What counts as a logical
| NonStop CPU now? As I (mis?)understand it, under Itanium a
| physical server blade was called a slice. It had multiple CPU
| sockets (called Processing Elements) and memory on the slice was
| partitioned with MMU mapping and Itanium security keys so
| each Processing Element could only access a portion of it.
| All IO on a Processing Element went out over ServerNet (or
| Infiniband) to a pair of Logical Sync Units, and was
| checked/compared with IO from another Processing Element
| running the same code on a different physical server blade.
| The 2 (or 3) processing elements combined to form a single
| logical CPU. I wonder if this is still the case? I believe
| there was a follow on (I assume when Itanium went multi-core)
| called NonStop Multicore Architecture, but I haven't found a
| paper on it.
|
| Also, I'm curious how the Disk Process fits in with Storage
| Clustered IO Modules (CLIMs)? Do CLIMs just act as a raw disk,
| with the Disk Process talking to it like it would talk to a
| locally attached disk? Or is there more integration with the
| CLIM - like a portion of the Disk Process having been ported to
| Linux, or Enscribe having been ported to run on the CLIMs?
|
| The same thing with how Networking CLIMs fit in.
| macintux wrote:
| 10 years ago I used Jim Gray's piece about Tandem fault tolerance
| in a talk about Erlang at Midwest.io (RIP, was a great
| conference).
|
| https://youtu.be/E18shi1qIHU
|
| Because it's a small world, a former Tandem employee was
| attending the talk. Unfortunately it's been long enough that I
| don't remember much of our conversation, but it was impressive to
| hear how they moved a computer between data centers; IIRC, they
| simply turned it off, and when they powered it back on, the CPU
| resumed precisely where it had been executing before.
|
| (I have no idea how they handled the system clock.)
|
| Jim Gray's paper:
|
| https://jimgray.azurewebsites.net/papers/TandemTR86.2_FaultT...
| sillywalk wrote:
| > (I have no idea how they handled the system clock.)
|
| It is or was on the Internet Archive and probably elsewhere -
|
| Tandem Systems Review, Volume 2, Number 1 (February 1986) -
| "Managing System Time Under Guardian 90"
| macintux wrote:
| Nice, thanks, will have to look that up.
| throw0101c wrote:
| > _Tandem Systems Review, Volume 2, Number 1 (February 1986)
| - "Managing System Time Under Guardian 90"_
|
| * https://vtda.org/pubs/Tadem_Systems_Review/
|
| * https://www.mrynet.com/FTP/os/DEC/www.hpl.hp.com/hpjournal/
| t...
| abrookewood wrote:
| That is crazy! I assume that all the RAM was battery backed?
| What about the CPU cache, the OS state etc? I'm struggling to
| see how this was possible.
| Animats wrote:
| Tandem was interesting. They had a lot of good ideas, many
| unusual today.
|
| * Databases reside on raw disks. There is no file system
| underneath the databases. If you want a flat file, it has to be
| in the database. Why? Because databases can be made with good
| reliability properties and made distributed and redundant.
|
| * Processes can be moved from one machine to another. Much like
| the Xen hypervisor, which was a high point in that sort of thing.
|
| * Hardware must have built in fault detection. Everything had
| ECC, parity, or duplication. It's OK to fail, but not make
| mistakes. IBM mainframes still have this, but few microprocessors
| do, even though the necessary transistors would not be a high
| cost today. (It's still hard to get ECC RAM on the desktop,
| even.)
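The single-error correction that ECC hardware performs can be sketched with a Hamming(7,4) code: 4 data bits get 3 parity bits, and any single flipped bit can be located and corrected. This is a simplified illustration (real ECC DIMMs use wider SECDED codes in hardware), with hypothetical helper names.

```python
# Hamming(7,4) sketch of single-bit error correction, the kind of
# protection ECC memory provides against random bit flips.

def encode(d):
    # d: list of 4 data bits -> 7-bit codeword [p1, p2, d1, p3, d2, d3, d4]
    d1, d2, d3, d4 = d
    p1 = d1 ^ d2 ^ d4   # parity over codeword positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4   # parity over codeword positions 2, 3, 6, 7
    p3 = d2 ^ d3 ^ d4   # parity over codeword positions 4, 5, 6, 7
    return [p1, p2, d1, p3, d2, d3, d4]

def correct(c):
    # Recompute each parity; the syndrome is the 1-based error position.
    c = c[:]
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]
    syndrome = s1 + 2 * s2 + 4 * s3
    if syndrome:
        c[syndrome - 1] ^= 1   # flip the erroneous bit back
    return c

word = encode([1, 0, 1, 1])
corrupted = word[:]
corrupted[4] ^= 1              # simulate a cosmic-ray bit flip
assert correct(corrupted) == word
```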
|
| * Most things are transactions. All persistent state is in the
| database. Think REST with CGI programs, but more efficient.
| That's what makes this work. A transaction either runs to
| successful completion, or fails and has no lasting effect.
| Database transactions roll back on failures.
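The "runs to completion or has no lasting effect" property above is easy to demonstrate with any transactional database; here is a sketch using Python's stdlib sqlite3, where the connection context manager rolls back if the block raises.

```python
# Sketch of transaction atomicity: a failed transaction leaves no trace.
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INT)")
con.execute("INSERT INTO accounts VALUES ('alice', 100), ('bob', 0)")
con.commit()

try:
    with con:  # opens a transaction; rolls back if the block raises
        con.execute(
            "UPDATE accounts SET balance = balance - 60 WHERE name = 'alice'")
        raise RuntimeError("simulated crash mid-transaction")
except RuntimeError:
    pass

# The aborted transaction had no lasting effect: alice still has 100.
print(con.execute(
    "SELECT balance FROM accounts WHERE name = 'alice'").fetchone()[0])
```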
|
| The Tandem concept lived on through several changes of ownership
| and hardware. Unfortunately, it ended up at HP in the Itanium
| era, where it seems to have died off.
|
| It's a good architecture. The back ends of banks still look much
| like that, because that's where the money is. But not many
| programmers think that way.
| spockz wrote:
| Not to take away from your main point: the only reason it is
| hard to get ECC in a desktop is because it is used for customer
| segmentation, not because it is technically hard or because it
| would drive up the actual cost of the hardware.
| sitkack wrote:
| ECC should be mandatory in consumer CPUs and memory. This
| will be seen like cars with fins and not having seatbelts in
| the future.
| Animats wrote:
| I have a desktop where CPU, OS and motherboard all support
| it. But ECC memory was hard to find. Memory with useless
| LEDs, though, is easily available.
| spockz wrote:
| That is because it doesn't make sense to produce a product
| that cannot be used at all. It just doesn't work in
| consumer boards due to lack of support for it in consumer
| CPUs. Again due to artificial customer segmentation.
| c0balt wrote:
| Most ryzen CPUs have supported some ECC RAM for multiple
| years by now. The HEDT platforms, like Threadripper, did
| too. It just hasn't really been advertised as much
| because most consumers don't appear to be willing to pay
| the higher cost.
| PhilipRoman wrote:
| Ok, I'll bite - what tangible benefit would ECC give to the
| average consumer? I'd wager in the real world 1000x more
| data loss/corruption happens due to HDD/SSD failure with no
| backups.
|
| Personally I genuinely don't care about ECC ram and I would
| not pay more than $10 additional price to get it.
| adastra22 wrote:
| Most users experience data loss due to the lack of ECC
| these days. They just might not attribute it to cosmic
| rays. It's kinda hard to tell bit-flip data loss apart
| from intermittent
| hardware failure. It can be just as catastrophic though,
| if the bit flip hits a critical bit of information and
| ends up corrupting the disk entirely.
| immibis wrote:
| My Threadripper 7000 system with ECC DDR5 and MCE logging
| reports a corrected bit error every few hours, but I've
| got no idea if that's normal. I assume it was a tradeoff
| for memory density.
| MichaelZuo wrote:
| This, memory densities are so high nowadays it's almost
| guaranteed that a new computer bought in 2024 will hard
| fault with actual consequences (crashing, corrupted data,
| etc...) at least once a year due to lack of ECC.
| sillywalk wrote:
| > Databases reside on raw disks. There is no file system
| underneath the databases.
|
| The terminology of "filesystem" here is confusing. The original
| database system was/is called Enscribe, and was/is similar to
| VMS Record Management Services - it had different types of
| structured files, in addition to unstructured
| unix/dos/windows stream-of-bytes "flat" files. Around 1987
| Tandem added NonStop SQL files. They're accessed through a
| path: Volume.SubVolume.Filename, but depending on the file
| type, there are different things you can do with them.
|
| > If you want a flat file, it has to be in the database.
|
| You could create unstructured files as well.
|
| > Processes can be moved from one machine to another
|
| Critical system processes are process-pairs, where a Primary
| process does the work, but sends checkpoint messages to a
| Backup process on another processor. If the Primary process
| fails, the Backup process transparently takes over and becomes
| the Primary. Any messages to the process-pair are automatically
| re-routed.
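The process-pair takeover described above can be sketched as follows. This is a toy model, not Tandem's actual checkpoint protocol: the `Replica` and `ProcessPair` names are hypothetical, and a real system would send checkpoint messages over the interconnect rather than call a method.

```python
# Illustrative process-pair sketch: the primary checkpoints its state
# to a backup after each operation, so the backup can take over from
# the last checkpoint if the primary fails.

class Replica:
    def __init__(self):
        self.state = {}
        self.alive = True

    def apply(self, key, value):
        self.state[key] = value

class ProcessPair:
    def __init__(self):
        self.primary = Replica()
        self.backup = Replica()

    def write(self, key, value):
        if not self.primary.alive:
            # The backup transparently becomes the new primary;
            # callers never see the failover.
            self.primary, self.backup = self.backup, Replica()
        self.primary.apply(key, value)
        # Checkpoint message: mirror the update to the backup.
        self.backup.apply(key, value)

pair = ProcessPair()
pair.write("x", 1)
pair.primary.alive = False   # simulate primary CPU failure
pair.write("y", 2)           # backup takes over with state intact
print(pair.primary.state)    # {'x': 1, 'y': 2}
```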
|
| > Unfortunately, it ended up at HP in the Itanium era, where it
| seems to have died off.
|
| It did get ported to Xeon processors around 10 years ago, and
| is still around. Unlike OpenVMS, HPE still works on it, though
| I don't think there is even a link to it on the HPE website*.
| It still runs on (standard?) HPE x86 servers connected to HPE
| servers running Linux to provide storage/networking/etc.
| Apparently it also runs supported under VMware of some kind.
|
| * Something something Greenlake?
| Animats wrote:
| > Critical system processes are process-pairs, where a
| Primary process does the work, but sends checkpoint messages
| to a Backup process on another processor. If the Primary
| process fails, the Backup process transparently takes over
| and becomes the Primary. Any messages to the process-pair are
| automatically re-routed.
|
| Right. Process migration was possible, but you're right in
| that it didn't work like Xen.
|
| > It still runs on (standard?) HPE x86 servers connected to
| HPE servers running Linux to provide storage/networking/etc.
|
| HP is apparently still selling some HPE gear. But it looks
| like all that stuff transitions to "mature support" at the
| end of 2025.[1] "Standard support for Integrity servers will
| end December 31, 2025. Beyond Standard support, HPE Services
| may provide HPE Mature Hardware Onsite Support, Service
| dependent on HW spares availability." The end is near.
|
| [1] https://www.hpe.com/psnow/doc/4aa3-9071enw?jumpid=in_hpes
| ite...
| sillywalk wrote:
| It looks like that Mature Support stuff is all for
| Integrity i.e. Itanium servers. As long as HPE still makes
| x86 servers for Linux/Windows, I assume NonStop can tag
| along.
| Animats wrote:
| Right, that's just the Itanium machines. I'm not current
| on HP buzzwords.
|
| The HP NonStop systems, Xeon versions, are here.[1] The
| not-very-informative white paper is here.[2] Not much
| about how they do it. Especially since they talk about
| running "modern" software, like Java and Apache.
|
| [1] https://www.hpe.com/us/en/compute/nonstop-
| servers.html
|
| [2] https://www.hpe.com/psnow/doc/4aa6-5326enw?jumpid=in_
| pdfview...
| lazide wrote:
| As a side point - that is some _amazing_ lock in.
| MichaelZuo wrote:
| They were pretty much the only game in town, other than
| IBM and smaller mainframe vendors, if you wanted actual
| written, binding, guarantees of performance with penalty
| clauses. (e.g. with real consequences for system failure,
| such as being credited back X millions of dollars after Y
| failure)
|
| At least from what I heard pre-HP acquisition, so it's
| not 'amazing lock in', just that, if you didn't want a
| mainframe and needed such guarantees, there was literally
| no other choice.
| lazide wrote:
| Notably, that _is_ amazing lock in. What else would it
| look like?
| MichaelZuo wrote:
| Well if just price/performance alone is enough to
| qualify... viz. IBM, Then the moment another mainframe
| vendor decided to undercut them by say 20%, the lock in
| would evaporate. Of course no mainframe vendor would
| likely do so, but the latent possibility is always there.
|
| Facebook is an example of 'amazing lock in' where it's
| not theoretically possible for any potential competitor
| to just negate it with the stroke of a pen.
| kev009 wrote:
| Yes, IBM mainframes employ analogous concepts to all of this,
| which may be one of many reasons they haven't disappeared.
| A lot of it was built up over time whereas Tandem started from
| the HA specification so the concepts and marketing are clearer.
|
| Stratus was another interesting HA vendor, particularly the
| earlier VOS systems as their modern systems are a bit more
| pedestrian. http://www.teamfoster.com/stratus-computer
| sillywalk wrote:
| I present to you "Commercial Fault Tolerance: A Tale of Two
| Systems" [2004][0] - a paper comparing the approaches to
| reliability/availability/integrity of Tandem NonStop and IBM
| Mainframe systems,
|
| and the book "Reliable Computer Systems - Design and
| Evaluation"[1] which has general info on reliability, and
| specific looks at IBM Mainframe, Tandem, and Stratus, plus
| AT&T switches and spaceflight computers.
|
| [0] https://pages.cs.wisc.edu/~remzi/Classes/838/Fall2001/Pap
| ers...
|
| [1] https://archive.org/download/reliablecomputer00siew/relia
| ble...
| mech422 wrote:
| Yeah - Stratus rocked :-) The 'big battle' used to be between
| NonStop's more 'software-based' fault tolerance vs. Stratus's
| fully hardware-level high availability. I used to love
| demo'ing our Stratus systems to clients and let them pull
| boards while the machine was running...Just don't pull 2 next
| to each other :-)
|
| Also, I think Stratus was the first (only?) computer IBM re-
| badged at the time - IBM sold Stratus's as the Model 88, IIRC
| adastra22 wrote:
| > Unfortunately, it ended up at HP in the Itanium era, where it
| seems to have died off.
|
| My dad continues to maintain NonStop systems under the umbrella
| of DXC. (Which is a spinoff of HP? Or something? Idk the
| details.) He worked at Tandem back in the day, and has stayed
| with it ever since. I think he'd love to retire, but he never
| ends up as part of the layoffs that get sweet severance
| packages, because he's literally irreplaceable.
|
| The whole stack got moved to run on top of Linux, IIRC, with
| all these features being emulated. It still exists though, for
| the handful of customers that use it.
| Sylamore wrote:
| Kinda the other way around, the NonStop kernel can present a
| Guardian personality or an OSS (Open Systems Services) linux-
| compatible personality. The OSS layer is basically
| running on top of the NSK/Guardian native layer but allows
| you to compile most linux software.
| adastra22 wrote:
| No, I meant the other way around. I don't know to what
| degree it ever got released, but he spent years getting it
| to work on "commodity" mainframe hardware running Linux, as
| HP wanted to get out of the business of maintaining special
| equipment and OS just for this customer.
| Sylamore wrote:
| Speaking of Tandem Databases, HP had released the SQL engine
| behind SQL/MX[0] as open source (Trafodion) running in front of
| Hadoop to the Apache Software Foundation, but it appears they
| have shut down the project[1].
|
| [0]: https://thenewstack.io/sql-hadoop-database-trafodion-
| bridges...
|
| [1]: https://attic.apache.org/projects/trafodion.html
| mannyv wrote:
| Oracle has had raw disk support for a long time. I'm pretty
| sure it's the last 'mainstream' database that does.
| vivzkestrel wrote:
| completely unrelated to the topic but i wanted to point
| it out. there is some accessibility issue with this page. The
| up and down arrow keys do not scroll the page on Firefox 131.0.2
| M1 Mac
| hi-v-rocknroll wrote:
| Stanford's Forsythe DC had a Tandem mainframe just inside the
| main floor area. It was a short beast standing on its own about
| 1.5m / 4' tall, and not in a 19" rack.
| redbluff wrote:
| As someone who has worked on nonstops for 35 years (and still
| counting!) it's nice to see them get a mention on here. I even
| have two at home, one a K2000 (MIPS) machine from the 90's and an
| Itanium server from the mid 10's. I am pretty sure the suburb's
| lights dim when I fire them up :).
|
| It's an interesting machine architecture to work on, especially
| the "Guardian 90" personality, and quite amazing that you can run
| late-70's programs, written for a CPU built with TTL logic, on a
| MIPS, Itanium or X86 CPU without recompilation; not all of them
| mind you, and not if they were natively compiled. The note on
| Stratus was quite interesting; for a long time the only real
| direct competitor NonStop had was Stratus. The
| other thing that makes these systems interesting is they have a
| unix like personality called "OSS" that allows you to run quite a
| bit of POSIX style unix programs.
|
| My favourite nonstop story was in the big LA earthquake (89?) a
| friend of mine was working at a POS processor. When they returned
| to the building the Tandem machine was lying on its side,
| unplugged and still operating (these machines had their own
| battery backup). They righted it, plugged everything back in and
| the machine continued operating as though nothing had happened.
| The fact that pretty much all the network comms were down kind of
| made this a moot point, but it was fascinating nonetheless.
| Pulling a CPU board, network board or disc controller or disc -
| all doable with no impact to transaction flow. The discs
| themselves were both mirrored and shadowed, which back in the day
| made these systems very expensive.
| lostemptations5 wrote:
| So if Tandem is so out of favour these days, what do people and
| organizations use? AWS availability zones, etc?
___________________________________________________________________
(page generated 2024-10-12 23:01 UTC)