[HN Gopher] Porting OpenVMS to the Itanium Processor Family (200...
___________________________________________________________________
Porting OpenVMS to the Itanium Processor Family (2003)[pdf]
Author : naves
Score : 29 points
Date : 2024-09-29 16:35 UTC (6 hours ago)
(HTM) web link (de.openvms.org)
(TXT) w3m dump (de.openvms.org)
| twoodfin wrote:
| The Apache #'s pretty much give the game away: An Itanium clocked
| 50% higher was losing to a 2yo Alpha by about 20% on throughput
| at peak.
|
| VLIW made sense when Intel wanted to win the FP-heavy workstation
| market. But while it was in development, integer-heavy web
| workloads became dominant and that was basically the ballgame.
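The arithmetic behind that Apache comparison, as a rough sketch. The 1.5x clock and 0.8x throughput ratios are read off the comment above ("50% higher", "about 20%"), so they are approximate; the variable names are illustrative and no absolute figures from the paper are assumed.

```python
# Ratios relative to the Alpha, taken from the comment (approximate).
itanium_clock = 1.5        # "clocked 50% higher"
itanium_throughput = 0.8   # "losing ... by about 20%"

# Work done per clock cycle, relative to the Alpha (Alpha = 1.0).
per_clock = itanium_throughput / itanium_clock
print(f"Itanium per-clock throughput vs Alpha: {per_clock:.2f}x")  # 0.53x
```

On those rough numbers the Itanium was doing a bit over half the Alpha's work per cycle on this workload.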
| johndoe0815 wrote:
| The world would be much nicer if we still had new Alpha CPUs.
| It was intended to be a CPU architecture that lasts 25 years
| and Digital intended the architecture to support a 1000x
| increase in performance during that time.
|
| Now we have RISC-V reinventing the wheel. Not the worst
| outcome, but we could have had it so much better...
| ahoka wrote:
| It couldn't even handle unaligned access in its original
| form. Surely an architecture to last for 25 years.
| fredoralive wrote:
| Not handling unaligned access gracefully is a classic RISC
| "feature", as part of the general simplification of a
| processor to its basics. I'm not sure if it's really an
| Alpha specific thing. Plus they added some instructions to
| ease the pain in 1996.
|
| The main issue people tend to bring up with Alpha is the
| very loose memory model, of the "things happen, but
| different processors may not really agree on the order they
| happened in" type of thing (plus, isn't it rude to want to
| know what other cores have in their cache?). Which would be
| a pain in our modern multicore world.
|
| Of course we don't know how things would've evolved over
| time. ARM (at least on big cores[1]) shifted towards the
| forgiving model for unaligned access, so it's possible that
| Alpha would've similarly moved to a more forgiving
| environment for low level programmers.
|
| [1] On embedded stuff, you're going to the Hardfault
| handler.
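To make the unaligned-access point concrete, here is a minimal Python sketch of how software can assemble an unaligned 32-bit little-endian load out of aligned 64-bit loads, which is roughly the job left to the compiler on a machine that only performs aligned memory operations. The helper names (`load_aligned_u64`, `load_unaligned_u32`) are hypothetical and a byte string stands in for memory; this is an illustration, not Alpha code.

```python
def load_aligned_u64(mem: bytes, addr: int) -> int:
    """Load a little-endian 64-bit word; addr must be 8-byte aligned."""
    if addr % 8 != 0:
        raise ValueError("unaligned access")  # a strict machine would trap here
    return int.from_bytes(mem[addr:addr + 8], "little")

def load_unaligned_u32(mem: bytes, addr: int) -> int:
    """Build a 32-bit load at any address from aligned 64-bit loads."""
    base = addr & ~7                 # aligned word containing the first byte
    shift = (addr - base) * 8        # bit offset of the value in that word
    value = load_aligned_u64(mem, base) >> shift
    if shift > 32:                   # value spills into the next aligned word
        high = load_aligned_u64(mem, base + 8)
        value |= high << (64 - shift)
    return value & 0xFFFFFFFF

mem = bytes(range(16))               # 0x00, 0x01, ..., 0x0f
print(hex(load_unaligned_u32(mem, 6)))  # 0x9080706
```

The load at address 6 straddles two aligned words, so it costs two loads plus shifting and masking instead of one instruction.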
| formerly_proven wrote:
| Alpha had a super loosey-goosey memory model because iirc
| the cache size they wanted couldn't be built with the
| performance they needed on the process they had, so they
| made it from two wholly independent cache banks, both
| serving the same core through a shared queue.
| dfox wrote:
| BWX does not help with unaligned accesses, but solves the
| fact that the original Alpha did not even have instructions
| for memory accesses smaller than a word. Which kind of
| becomes an issue when you start building systems on
| PC-like hardware (another related "feature" is that EV5
| does not have an equivalent to MTRRs, but dealing with the
| weirdness of VGA framebuffer accesses is part of the
| architecture specification by means of a hardcoded uncached
| memory region).
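The pre-BWX situation described above can be sketched as follows: without a byte store, writing one byte means loading the containing word, masking out the old byte, inserting the new one, and storing the whole word back. This is an illustrative Python model that assumes 32-bit is the smallest unit that can be loaded or stored; `store_byte` is a hypothetical helper and a list of integers stands in for memory.

```python
def store_byte(words: list[int], addr: int, byte: int) -> None:
    """Write one byte using only whole 32-bit word loads and stores."""
    index = addr // 4                        # which 32-bit word holds the byte
    shift = (addr % 4) * 8                   # little-endian bit position in the word
    word = words[index]                      # load the whole word
    word &= ~(0xFF << shift) & 0xFFFFFFFF    # mask out the old byte
    word |= (byte & 0xFF) << shift           # insert the new byte
    words[index] = word                      # store the whole word back

memory = [0x11223344, 0x55667788]
store_byte(memory, 5, 0xAB)
print([hex(w) for w in memory])  # ['0x11223344', '0x5566ab88']
```

Because this is a read-modify-write of the whole word, it also touches the neighbouring bytes, which is one reason the approach is awkward for byte-wide device registers on PC-like hardware.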
| fredoralive wrote:
| TBH, I'm not an expert on Alpha, and wow, as an embedded
| programmer by trade, that's a really wacky way of handling
| memory access. I guess it made more sense in the
| minicomputer world where you control the whole stack, but
| as a more general purpose architecture it's, well, not the
| greatest, is it?
| dfox wrote:
| There are a lot of good things to be said about Alpha and
| it is probably the most RISC of all 90's RISC ISAs, but
| the actual hardware is full of weirdness and all the real
| CPUs were deeply pipelined OoO designs (think Intel
| NetBurst) that prioritized high clock rates and huge
| straight-line throughput above all else (which is also
| why they ran really hot and could not really be scaled
| down for embedded use). Taking good ideas from that while
| discarding the "speed-demon" design is part of why AMD
| became relevant again in the 00's, with amd64 cementing that
| position (but well, AMD K7 is very much an Alpha-related
| design, to the extent that the chipsets are
| interchangeable between K7 and EV6. The interesting part
| of that is that these CPUs do not have FSB in the "bus"
| sense, but there is a point-to-point link between CPU and
| chipset).
| fredoralive wrote:
| AMD using an Alpha bus for early Athlons feels like a
| weird lost opportunity. Cheap x86-aimed motherboards that
| can also run Alpha chips with Windows 2000 + FX!32
| for compatibility, it might've had a chance to shine,
| albeit a slight chance. Sadly by then Compaq had already
| boarded the Itanic...
| formerly_proven wrote:
| DEC designed StrongARM pretty much immediately after Alpha
| shipped because Alpha chips ran hot as frick and DEC
| engineers didn't see a path to low-power Alpha.
| twoodfin wrote:
| Do you know a good paper on the development of StrongARM?
| aardvark179 wrote:
| Much as some aspects of the Alpha were great, its weak memory
| model would have resulted in even more concurrency issues
| than we have now, and way more explicit fences.
| formerly_proven wrote:
| Itanium was primarily developed by Intel, Itanium 2 primarily
| by the HP team that also was responsible for the competitive
| PA-RISC chips. (Or so they say). In any case, Itanium 2 still
| outperformed much later AMD Opterons and Intel Xeons running at
| twice the clock in numerical workloads. That's pretty
| impressive.
| twoodfin wrote:
| That's my point: If the demand for high-end compute at the
| turn of the millennium had looked the same as the demand for
| high-end compute in 1992, Itanium probably would have
| conquered the world.
|
| But Tim Berners-Lee had a NeXT and some good ideas...
| jcranmer wrote:
| The Itanium architecture is a weird weird architecture. It's
| not weird in the sense of Alpha's weirdness (e.g., the super
| weak memory model), which can be fairly easily compensated for,
| but it's weird in several ways that make me just stare at the
| manual and go "how are you supposed to write a compiler for
| this?" It's something that requires a sufficiently smart
| compiler to get good performance, while at the same time so
| designed as to make writing that sufficiently smart compiler
| effectively impossible.
|
| It wouldn't surprise me if Itanium actually had pretty
| compelling SPECint numbers. But a lot of those compelling
| numbers would have come from massive overtuning of the compiler
| to the benchmark specifically. Something that's going to be
| especially painful for I/O-heavy workloads is that the
| gargantuan register files make any context switch painfully
| slow.
| aleph_minus_one wrote:
| There exists a co-evolution between compilers, programming
| languages and CPUs (or more generally ASICs). I consider it
| very plausible that one could develop a
| programming language that makes it sufficiently easy for a
| programmer to write performant code for an Itanium, but such
| a programming language would look different from C or C++.
| aleph_minus_one wrote:
| > The Apache #'s pretty much give the game away: An Itanium
| clocked 50% higher was losing to a 2yo Alpha by about 20% on
| throughput at peak.
|
| This is not just a benchmark of the CPUs, but also of the
| compilers involved. It is well-known that it was very hard to
| write a compiler that generates programs that could harness the
| optimization potential of Itanium's instruction set.
| pdw wrote:
| Amusingly similar to the much more recent slide decks about the
| x86 port, e.g.
| https://vmssoftware.com/docs/State_of_Port_20171006.pdf
| sillywalk wrote:
| On a similar note, porting Linux to Itanium -- A System
| Implementor's Tale [PDF]
|
| https://www.usenix.org/legacy/event/usenix05/tech/general/gr...
|
| and, NonStop on Itanium [PDF]:
|
| https://www.researchgate.net/profile/David_Bernick/publicati...
___________________________________________________________________
(page generated 2024-09-29 23:01 UTC)