NVIDIA emulation journey, part 1: RIVA 128 / NV3 architecture history
and basic overview
February 25, 2025 - written by starfrost013
Editor's Note: This is the first in a series of guest posts by
starfrost013 going over the architecture of NVIDIA's first
commercially-successful product, and the ongoing effort to get it
emulated on 86Box. It gets more technical than all our previous
posts, but there is little detailed information out there on this
chip that helped launch NVIDIA into success, so we've decided to
publish this saga here.
---------------------------------------------------------------------
Note: Documents wanted
If you are in possession of any of:
* NVIDIA RIVA 128 Programmers' Reference Manual
* NVIDIA RIVA 128 Customer Evaluation Kit (we have the NV1 CEK
version 1.22)
* NVIDIA RIVA 128 Turnkey Manufacturing Package
* Source code (drivers, VBIOS, etc) related to the NVIDIA RIVA 128
* Any similar documents, excluding the well-known datasheet, with
technical information about a GPU going by the name "NV3",
"STG-3000", "RIVA 128", "NV3T", "RIVA 128 Turbo" (an early name
for the ZX) or "RIVA 128 ZX"
* Any document, code, or materials relating to a graphics card by
NVIDIA, in association with Sega, Helios Semiconductor or
SGS-Thomson (now STMicroelectronics) codenamed "Mutara", "Mutara
V08", or "NV2", or relating to a cancelled Sega console codenamed
"V08"
* Any documentation relating to RIVA TNT
* Any NVIDIA SDK version that is not 0.81 or 0.83
Please contact me @ thefrozenstar_ on Discord, via the 86Box Discord,
my email address ([email protected]) or via the linked GitHub
account. These documents would be very helpful to me in emulating
the NVIDIA RIVA 128 and other NVIDIA graphics cards.
---------------------------------------------------------------------
Introduction
The NVIDIA RIVA 128 is a graphics card released in 1997 by NVIDIA,
nowadays of AI and $2000 overpriced quad-slot GPU fame. It was a
Direct3D 5.0-capable accelerator, and one of the first to use a
standard graphics API such as DirectX as its "native" API. I have
been working on emulating this graphics card for the last several
months; currently, while VGA works and the drivers are loading
successfully on Windows 2000, they are not rendering any kind of
accelerated output yet. Many people, myself included, have asked for
and even tried to develop emulation for this graphics card and other
similar cards (such as its successor, "NV4" or RIVA TNT), but have
not succeeded yet, although many of these efforts continue. This is
the first part of a series where I explore the architecture and my
experiences in emulating this graphics card. I can't guarantee
success, but if this effort succeeds, it appears it would be the
first time a fully functional emulation of this card has been
developed, although later NVIDIA cards have been emulated to at least
some extent, such as the GeForce 3 in Cxbx-Reloaded and Xemu.
This is the first part in a series of blog posts that aims to
demystify, once and for all, NVIDIA RIVA 128. This first part will
dive into the history of NVIDIA up to the release of the RIVA 128,
and a brief overview of how the chip actually works. The second part
will dive into the architecture of NVIDIA's drivers and how they
relate to the hardware, and the third part will follow the lifetime
of a graphics object from birth to display on the screen in extreme
detail. Then, part four (and an as-yet-unknown number of parts after
it) will go into detail on the experience of developing a functional
emulation for this graphics card.
---------------------------------------------------------------------
A not so brief history
Beginnings
NVIDIA was conceived in 1992 by three LSI Logic and Sun Microsystems
engineers: Jensen Huang (now one of the world's richest men, still
the CEO and, apparently, mobbed by fans in his country of birth
Taiwan), Curtis Priem (whose boss almost convinced him to work on
Java instead of founding the company) and Chris Malachowsky (a
veteran of graphics chip development). They saw a business
opportunity in the PC graphics and audio market, which was dominated
by low-end, high-volume players such as S3 Graphics, Tseng Labs,
Cirrus Logic and Matrox^1. The company was formally founded on April
5, 1993, after all three left their jobs at LSI Logic and Sun between
December 1992 and March 1993.
After the requisite $3 million of venture capital funding was
acquired (a little nepotism owing to their reputation helped), work
immediately began on a first-generation graphics chip; NVIDIA was one
of the first in a rush of dozens of companies attempting to develop
graphics cards - both established players in the 2D graphics market
such as Number Nine and S3, and new companies, almost all of which no
longer exist - many of which failed to even release a single graphics
card. The name was initially GXNV for "GX next version", after a
graphics chip Malachowsky led the development of at Sun, but Huang
requested him to rename the chip to NV1 in order to not get sued.
This also inspired the name of the company - NVIDIA, after other
names such as "Primal Graphics" and "Huaprimal" were considered and
rejected, and their originally chosen name of "Invision" turned out
to have been trademarked by a toilet paper company.
In a perhaps ironic twist of fate, toilet paper turned out to be an
apt metaphor for the sales, if not quality, of their first product,
which Jensen Huang appears to be embarrassed to discuss when asked,
and has been quoted as saying "You don't build NV1 because you're
great". The product was released in 1995 after a two-year development
cycle and the creation of what NVIDIA dubbed a hardware simulator,
but actually appears to have been simply a set of Windows 3.x drivers
intended to emulate their architecture, called the NV0 in 1994.
The NV1
The NV1 was a combination graphics, audio, DRM (yes, really) and game
port card implementing what NVIDIA dubbed the "NV Unified Media
Architecture" (UMA); the chip was manufactured by SGS-Thomson
Microelectronics (now STMicroelectronics) on a 350-nanometer process
node, who also white-labelled NVIDIA's design (which allegedly^2
featured a DAC block designed by SGS-Thomson) as the STG-2000, a
variant without audio functionality, also called the "NV1-V32" (for
32-bit VRAM) in internal documentation as opposed to NVIDIA's
NV1-D64. The chip was designed to implement a reasonable level of 3D
graphics functionality, as well as audio, public-key encryption for
DRM purposes (ultimately never used as it would have required the
cooperation of software companies) and Sega Saturn game ports, all
within a single megabyte of RAM, as memory costs were around $50 a
megabyte when initial design began in 1993.
In order to achieve this, many techniques had to be used that
ultimately compromised the chip's 3D rendering quality, such as
forward texture mapping, where a texel (a pixel of the texture) is
directly mapped to a point on the screen, instead of the more
traditional inverse texture mapping, which iterates through pixels
and maps texels from those. While this has memory space advantages
(as you can cache the texture in the very limited amount of VRAM
NVIDIA had to work with very easily), it has many more disadvantages;
firstly, this approach does not support UV mapping (a special
coordinate system used to map textures to three-dimensional objects)
and other aspects of what would today be considered basic graphical
functionality.
Additionally, the fundamental implementation of 3D rendering used
quad patching instead of traditional triangle-based approaches; this
has advantageous implications for things like curved surfaces, and
may have been an effective design for the CAD/CAM customers
purchasing higher-end 3D products, but it turned out not to be
particularly useful for the intended target market of gaming. There
was also a total lack of Sound Blaster compatibility
(very much a requirement for half-decent audio in games back then) in
the audio engine, and VGA compatibility was very slow and partially
emulated, which led to slow performance in the games people actually
played, unless your favourite game was a crappier, slower version of
Descent, Virtua Cop or Daytona USA for some reason. NVIDIA received
another body blow when Microsoft released Direct3D in 1996 with
DirectX 2.0, which not only used triangles, but also became the
standard 3D API and deprecated all of the numerous non-OpenGL
proprietary APIs of the time, including S3's S3D and MeTaL^3, ATI's
3DCIF, and NVIDIA's own NVLIB.
The upshot of all of this was what can be understood as nothing less
than the total failure of NVIDIA to sell or convince anyone to
develop for NV1 in any way, despite its innovative silicon design.
While Diamond Multimedia purchased 250,000 chips to place into their
"Edge 3D" series of cards, and other manufacturers produced cards in
smaller quantities, barely any of them sold, and those that did sell
were often returned, leading to the chips themselves being returned
to NVIDIA and hundreds of thousands of chips sitting simply unused in
warehouses. Barely any NV1-capable software was released, with the
few pieces of software that do exist coming via a partnership with
Sega (more on that later), while most others were forced to run under
software emulators for Direct3D (or other APIs) written by Priem,
which were made possible by the software architecture NVIDIA chose
for their drivers, but were slower and worse-looking than software
rendering, buggy, and generally extremely unappealing.
NVIDIA lost $6.4 million in 1995 on a revenue of $1.1 million, and $3
million on a revenue of $3.9 million in 1996. Most of the capital
that allowed NVIDIA to continue operating came from the milestone
payments from SGS-Thomson for developing the chip, their NV2 contract
with Sega (again, more on that later), and their venture capital
funding, but not from the very few NV1 sales. The NV1 was poorly
reviewed, had very little software and ultimately almost no sales;
despite various desperate efforts to revive it, including releasing
the SDK for free (with a new proprietary NVLIB API for game
development as an alternative to direct hardware programming) and by
early 1996 straight up begging customers on their website to spam
developers with requests to add NV1 support to games, the chip was
effectively dead within a year.
The NV2
Nevertheless, NVIDIA grew to close to a hundred employees, including
sales and marketing teams. The company, and especially its
cofounders, remained confident in their architecture and overall
prospects of success. They had managed to solidify a business
relationship with Sega, to the point where they had initially won the
contract to provide the graphics hardware for the successor to the
Sega Saturn, at that time codenamed "V08". The GPU was codenamed
"Mutara" (after the nebula critical to the plot in Star Trek II: The
Wrath of Khan) and the overall architecture was the NV2. It
maintained many of the functional characteristics of the NV1 and was
essentially a more powerful successor to that chip. According to
available sources, this would have been the only NVIDIA chip
manufactured by the then newly-founded Helios Semiconductor.
However, problems started to emerge almost immediately. Game
developers, especially Sega's internal teams, were not happy with
having to use a GPU with such a heterodox design; for example,
porting games to or from the PC, which Sega did do at the time, would
be made far harder. This position was especially championed by Yu
Suzuki, head of one of Sega's most prestigious internal development
teams, Sega-AM2 (responsible for the Daytona USA, Virtua Racing,
Virtua Fighter, and Shenmue series, among others), who sent his best
graphics programmer to interface with NVIDIA and push for the company
to change the rendering method to a more traditional triangle-based
approach. At this point, the story diverges: some tellings claim that
NVIDIA simply refused to accede to Sega's request and this damaged
their relationship irreparably, leading to the NV2's cancellation;
others that the NV2 wasn't killed until it failed to produce any
video during a demonstration, and Sega still paid NVIDIA for
developing it to prevent bankruptcy, with a single engineer
apparently assigned to (and succeeding at) getting the chip working
for the sole purpose of receiving a milestone payment.
At some point, Sega, as a traditional Japanese company, couldn't
simply kill the deal, so the NV2 was officially relegated to be used
in the successor to the educational toddler-aimed Sega Pico, while in
reality, Sega of America had already been told to "not worry" about
NVIDIA anymore. NVIDIA got the hint, and the NV2 was cancelled. With
both NV1 and NV2 out of the picture, NVIDIA had no sales, no
customers, and barely any money; by late 1996, the company had $3
million in the bank and was burning through $330,000 a month, and
most of the NV2 team had been redeployed to the next-generation NV3.
No venture capital funding was going to be forthcoming due to the
failure to actually create any products people wanted to buy, at
least not without extremely unfavourable terms on things like
ownership. The company was effectively almost a complete failure and
a waste of years of the employees' time.
Near destruction of the company
By the end of 1996, things had gotten infinitely worse, with the
competition heating up extraordinarily fast; despite NV1 being the
first texture-mapped consumer GPU ever released, they had been
fundamentally outclassed by their competition. It was a one-two
punch: initially, Rendition - founded around the same time as NVIDIA
in 1993 - released its V1000 chip based on a custom RISC
architecture, and while not particularly fast, it was, for a few
months, the only chip that could run Quake (the hottest game of 1996)
in hardware accelerated mode. The V1000 was an early market leader,
alongside S3's laughably bad ViRGE (Virtual Reality Graphics
Engine) which was infamously slower than software rendering on
high-end CPUs at launch, and was reserved for high-volume OEM
bargain-bin disaster machines.
However, this was nothing compared to the body blow about to hit the
entire industry, NVIDIA included. At a conference in early 1996, an
$80,000 machine from SiliconGraphics, then the world leader in
accelerated graphics, crashed during a demo by the then-CEO Ed
McCracken. If accounts of the event are to be believed, while the
machine rebooted, people who had heard rumors left the room and
headed downstairs to another demo by a then-tiny company made up of
ex-SGI employes calling themselves "3D/fx" (later shortened to 3dfx),
claiming comparable graphics quality for $250... with demos to prove
it. As with many cases of supposed "wonder innovations" in the tech
industry, it was too good to be true, but when their card, the
"Voodoo Graphics" was first released in the form of the "Righteous
3D" by Orchid in October 1996, it turned out to be true. Despite the
fact that it was a 3D-only card and required a 2D card to be
installed, and the fact it could not accelerate graphics in a window
(which almost all other cards could do), performance was so high
relative to other products (including the NV1) that it not only had
rave reviews on its own but also kicked off a revolution in consumer
3D graphics, which especially caught fire when GLQuake was released
in January 1997.
The reasons for 3dfx being able to design such an effective GPU when
all others failed were numerous. The price of RAM plummeted by 80%
throughout 1996, allowing the Voodoo's estimated retail price to be
cut from $1000 to $300; many of their staff members came from
SiliconGraphics, perhaps the most respected and certainly the largest
company in the graphics industry of that time^4; and while the Voodoo
used a proprietary API called Glide, it also supported OpenGL and
Direct3D. Glide was designed to be very similar to OpenGL while
allowing 3dfx to approximate standard graphical techniques; this,
together with their driver design, kept the hardware itself simple:
the Voodoo only accelerates edge interpolation^5, span
interpolation^6, texture mapping and blending, and final presentation
of the rendered 3D scene, while the rest is all done in software. All
of these factors were key
in what proved to be an exceptionally high-quality product at an
exceptionally low price for the time.
Meanwhile, NVIDIA effectively had to design a graphics architecture
that could at the very least get close to 3dfx's performance, on a
shoestring budget and with very little resources, as 60% of their
staff (including the entire sales and marketing teams) had been laid
off to conserve money. They could not do a complete redesign of the
NV1 from scratch even if they felt the need to, as it would take two years
(time they simply didn't have) and any design that came out of this
effort would be immediately obsoleted by competitors, such as 3dfx's
Voodoo series, and ATI's Rage, which was initially rather pointless
but rapidly advancing in performance and driver stability. The chip
would also have to work reasonably well on the first tapeout, as
there was no capital to produce more revisions of the chip. The fact
NVIDIA were able to achieve a successful design in the form of the
NV3 under such conditions was a testament to the intelligence, skill
and luck of their designers; we will explore how they managed to
achieve this later in this write-up.
The NV3
It was with these financial, competitive and time constraints in mind
that design on the NV3 began in 1996. This chip would eventually be
commercialised as the RIVA 128, standing for "Real-time Interactive
Video and Animation accelerator" followed by a nod to its 128-bit
internal bus width, which was very large for the time. NVIDIA
retained SGS-Thomson (soon to be STMicroelectronics) as their
manufacturing partner, in exchange for SGS-Thomson cancelling their
competing STG-3001 GPU. In a similar vein to the NV1, NVIDIA was to
sell the chip as "NV3" and SGS-Thomson was to white-label it as
STG-3000, once again separated by audio functionality; however,
NVIDIA convinced SGS-Thomson to cancel their own part and stick to
manufacturing the NV3 instead, which later proved to be a terrible
decision for SGS-Thomson when NVIDIA dropped them in favor of TSMC for manufacturing
of the RIVA 128 ZX due to both yield issues and pressure from venture
capital funders. ST went on to manufacture the PowerVR Kyro series of
GPU chips before dropping out of the market entirely by 2002.
After the NV2 disaster, the company made several calls on the NV3's
design that turned out to be very good decisions. First, they
acquiesced to Sega's advice (which they might have already done to
save the Mutara V08/NV2, but it was too late) and moved to an inverse
texture mapping triangle-based model, although some remnants of the
original quad patching design remain. The unused DRM functionality
was also removed, which may have been assisted by David Kirk^7 taking
over from Curtis Priem as chief designer, as Priem insisted on
including the DRM functionality with the NV1, citing piracy issues
with the game he had written as a demo of the Malachowsky-designed GX
GPU back when he worked at Sun.
Another decision that paid very large dividends was forgoing a native
API entirely and building the card around accelerating the most
popular graphical APIs, which led to an initial
focus on Direct3D, although OpenGL drivers were first publicly
released in alpha form in December 1997 and fully in early 1998.
DirectX 3.0 was the initial target; after 4.0 was cancelled due to
lack of developer interest in its new functionality, 5.0 came out
late during the chip's development, and the chip turned out to be
mostly compliant with it, with the exception of some blending modes
such as additive blending, which Jensen Huang later claimed was due
to Microsoft not giving them the specification in time. This compliance was made much
easier by the design of their driver, which allowed (and still
allows) graphical APIs to be plugged in as "clients" to the Resource
Manager kernel; as I mentioned earlier, this will be explained in
full detail later.
The VGA core, previously so separate from the main GPU on the NV1
that it had its own PCI ID, was replaced by a new one licensed from
Weitek, which would soon exit the graphics market. The core was placed
in the chip parallel to the main GPU with its own 32-bit bus, which
massively accelerated performance in unaccelerated VESA titles such
as Doom, and provided a real advantage over the 3D-only 3dfx cards,
especially as their combination SST-96 or Voodoo Rush card used a
questionable Alliance chip and was generally considered a failure.
Finally, Huang, in his capacity as the CEO, allowed the chip to be
expanded (in terms of physical size and number of gates) from its
original specification, allowing for a more complex design with more
features.
The initial revision of the architecture appears to have been
completed in January 1997. Then, aided by hardware simulation
software (an actual hardware simulation unlike the NV0) purchased
from another almost-bankrupt company, an exhaustive test set was
completed. Bugs presented themselves almost immediately: the "C"
character in the MS-DOS codepage appeared incorrectly, Windows took
15 minutes to boot, and moving the mouse cursor required a map of the
screen so you didn't lose it by moving too far; ultimately, though,
the testing was completed. However, NVIDIA didn't have the money to
respin the silicon for a second stepping if problems appeared, so it
had to work at least reasonably well in the first stepping.
RIVA 128
Luckily for NVIDIA, the NV3 chip worked well enough to be sold to
their board partners (almost certainly thanks to that hardware
simulation package), and the company survived. Most accounts indicate
they were only three or four weeks away from bankruptcy; when 3dfx
saw the RIVA 128 at its reveal at the CGDC 1997 conference, one
founder responded with "you guys are still around?", considering 3dfx
almost bought NVIDIA effectively for the purpose of killing the
company as a theoretical competitor, but NVIDIA refused as they
assumed they would be bankrupt within months anyway. However, this
revision A of the chip was not the one NVIDIA actually
commercialised; SGS-Thomson dropped their plans for the STG-3000 at
some point, which led NVIDIA, now flush with cash^8, to create a new
revision of the chip to remove the sound functionality (although some
parts remained), fix some errata and make other minor adjustments to
the silicon.
The chip was respun, with the revision B silicon being completed in
October 1997 and presumably available a month or two later; it is
most likely that some revision A cards were sold at retail, but based
on the dates, these would have to be very early units^9, with the
earliest NVIDIA RIVA 128 drivers that I have discovered (labelled as
"Version 0.75" and also doubling as the only NV1 drivers for Windows
NT) being dated August 1997, and reviews starting to drop on websites
such as AnandTech in the first half of September 1997. There are no
known drivers available for the audio functionality in the revision A
RIVA 128, so anyone wishing to use it would have to write custom
drivers.
The RIVA 128 was generally well-reviewed at its launch and considered
the fastest graphics chip released in 1997, beating the Voodoo1 in
raw speed but not output video quality, most likely due to NVIDIA's
financial situation leading to rushed development of the chip with
shortcuts taken in the design process in order to ship on time.
Examples of this lower quality include the lack of support for some
of Direct3D 5.0's blending modes, and the use of per-polygon
mipmapping^10 instead of the more accurate per-pixel approach,
causing seams between different mipmapping layers; the dithering and
bilinear texture filtering quality were often criticised as well, and
some games exhibited seams between polygons. Furthermore, the drivers
were generally very rough at launch, especially if the graphics card
was installed as an upgrade and the previous card's drivers had not
been removed; while NVIDIA were able
to fix many driver issues by the 3.xx versions released in 1998 and
1999, going as far as writing a fairly decent OpenGL ICD, the
standards for graphical quality had risen over time and what was
considered "decent" in 1997 was considered to be "bad" and even
"awful" by 1999.
Nevertheless, over a million RIVA 128 units sold within a few months,
and NVIDIA's immediate existence as a company was secured; an
enhanced version (revision C, also called "NV3T") was released in
March 1998 as the RIVA 128 ZX, in order to compete with the Intel/
Lockheed Martin i740, a chip which was hyped as being very fast on
paper but turned out to be not very good, leading to Intel starting
their long line of sub-par integrated GPUs before finally returning
to the discrete market in recent years under the Arc brand, with the
current Battlemage line being their 13th or 16th generation product
depending on who you ask.
After all of this history and exposition, we are finally ready to
actually explore the GPU behind the RIVA 128 series. I refer to it as
NV3 as a catch-all term for all chips using this architecture,
including the RIVA 128, RIVA 128 ZX, and the hypothetical STG-3000.
Note that the 32-bit Weitek VGA core's architecture will not be
discussed at length here unless absolutely required, as it is pretty
much a standard SVGA core, and really is not that interesting
compared to the main GPU; they're not even substantially integrated,
although there are a few areas in the design that allow the main GPU
to write directly to the Weitek's registers.
---------------------------------------------------------------------
Architectural overview
NV3 is the third-generation NV architecture designed by NVIDIA in
1997, commercialised as the RIVA 128 family. It implements a
fixed-function 2D and 3D render path primarily aimed at desktop
software and video games, with hardware acceleration best described
as partial by modern standards, but one of the more complete,
fully-featured solutions for 1997. It can be attached through the
legacy PCI 2.1 bus or AGP 1X (2X on the RIVA 128 ZX), a higher-speed
superset of PCI designed for graphics which was brand new at the time
but ultimately proved successful.
The primary goals of this architecture were low manufacturing cost,
short development time (due to NVIDIA's dire financial condition at
the time), and beating the 3dfx Voodoo1 in raw pixel pushing
performance. It generally achieved these goals with caveats, with a
bulk cost of $15 per chip, a design period of around 9 months
(excluding Revision B), and performance generally better than that of
the Voodoo, in spite of 3dfx's more integrated Glide API and the fact
that NVIDIA's performance advantage was smaller with large triangles
than with small ones.
While the focus of study has been the Revision B card, efforts have
been made to understand the A and C revisions as well. Each revision
has different values for the GPU ID in the framebuffer boot
configuration register in MMIO space (at offset 0x100000) and the PCI
configuration space Revision ID register:
Revision   NV_PFB_BOOT_0 value   PCI revision ID
A          0x30100               0x00
B          0x30110               0x10
C          0x30120               0x20
There is a common misconception that the PCI ID is different on RIVA
128 ZX chips; this is partially true, but misleading. The standard
NV3 architecture uses a PCI vendor ID of 0x12D2 (identified as
"NVidia / SGS Thomson (Joint Venture)" by The PCI ID Repository)
instead of NVIDIA's own 0x10DE, with a device ID of 0x0018, or 0x0019
on a RIVA 128 ZX with ACPI enabled. However, the presence of a 0x0019
device ID is not sufficient for a RIVA 128 ZX to be detected as such;
the revision must be C, or 0x20, regardless of device ID, as
confirmed through VBIOS and driver reverse engineering. Since the
device ID can be either value, the best way to check is to use the
revision ID encoded into the board at manufacturing time, through the
NV_PFB_BOOT_0 register or PCI configuration space.
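As a concrete illustration, here is a minimal detection sketch in C
based on the values above; nv3_pci_config_read8 is a hypothetical
helper standing in for however your code reads PCI configuration
space, and the mapping to revisions follows the table.

#include <stdint.h>

/* Hypothetical read helper for illustration; stands in for whatever
   mechanism reads this device's PCI configuration space. */
extern uint8_t nv3_pci_config_read8(uint8_t offset);

#define PCI_CFG_REVISION_ID 0x08   /* standard PCI revision ID offset */

/* Classify the silicon revision; returns 'A', 'B' or 'C', or '?' if
   unrecognised. NV_PFB_BOOT_0 would read 0x30100 / 0x30110 / 0x30120
   for the same three revisions. */
char nv3_detect_revision(void)
{
    switch (nv3_pci_config_read8(PCI_CFG_REVISION_ID)) {
        case 0x00: return 'A';
        case 0x10: return 'B';
        case 0x20: return 'C';   /* revision C = RIVA 128 ZX (NV3T), regardless of device ID */
        default:   return '?';
    }
}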
The NV3 architecture incorporates accelerated triangle setup (unlike
the Voodoo, which is limited to around 2/3 of these stages), the
aforementioned span and edge interpolation, texture mapping, blending, and final
presentation. It does not accelerate the initial polygon
transformation or lighting rendering phases. It is capable of
rendering in 2D at a resolution of up to 1280x1024 (at least
1600x1200 on the ZX; the exact maximum is unclear) and 32-bit colour. 3D rendering is
only possible in 16-bit colour, and at 960x720 or lower in a 4 MB
card due to a lack of VRAM. EDID is supported for monitor
identification via an entirely software-programmed I2C bus.
While 2 MB and even 1 MB cards were planned, they were seemingly
never released. The level of pain of using them can only be imagined;
there were also low-end cards released that only used a 64-bit bus,
handled using a manufacture-time configuration mechanism (sometimes
exposed via DIP switches) known as straps, which will be explained in
Part 2. To compete with the i740, the RIVA 128 ZX had, among other
changes that will be described later, an increased amount of VRAM (8
MB), also allowing it to render 3D at higher resolutions of up to
1280x1024.
The design of the RIVA is very complex compared to other
contemporaneous video cards. I am not sure why such a complex design
was used, but it was inherited from the NV1; the only real reason I
can think of is that the overengineered design is intended to be
future-proof and easy to enhance without requiring complete rewiring
of the silicon, as many other companies had to do. The GPU is split
into around a dozen subsystems (functional hardware blocks), each
with names starting with P for some reason; some examples are PGRAPH,
PTIMER, PFIFO, PRAMDAC and PBUS. Presumably, a subsystem has a 1:1
mapping with a functional block on the GPU die, since the registers
are named after the subsystem that they are a part of.
There are several hundred different registers across the entire
graphics card, so things are necessarily simplified for brevity, at
least in Part 1. To be honest, the architecture of this graphics card
is too complicated to show in a diagram without simplifying things so
much as to be effectively pointless or complicating it to the point
of not being useful (I tried!), so a diagram has not been provided.
Fundamental concept: the scene graph
In order to begin this journey through the NVIDIA NV3 architecture,
you must understand the fundamental concept of a scene graph.
Although the architecture does not strictly implement a scene graph,
knowing the concept helps understand how graphical objects are
represented by the GPU. A scene graph is a tree-like structure in
which the nodes are graphical objects, and the properties of a parent
object cascade down to its children.
This is how almost all modern game engines (Unity, Unreal, Godot...)
represent 3D space. A very easy way to understand how a scene graph
works is to - I am not joking - install Roblox Studio, place some
objects into the scene, save it as an RBXLX file (not the default),
and open it in a text editor of your choice. You will see an XML
representation of the scene you have created as a scene graph; the
only caveat is that on Roblox, the cascading of characteristics from
parent nodes to children is optional.
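For readers who prefer code to Roblox, here is a minimal scene graph
sketch in C (not based on any NVIDIA code) showing the one property
that matters here: a parent's state cascading down to its children,
in this case just a position offset.

typedef struct SceneNode {
    float local_x, local_y;           /* position relative to the parent node */
    struct SceneNode *children[8];    /* child objects inherit our position   */
    int child_count;
} SceneNode;

/* Hypothetical draw call standing in for whatever actually renders a node. */
extern void draw_object(const SceneNode *node, float x, float y);

/* Walk the tree depth-first; each node's absolute position is its local
   position plus everything cascaded down from its ancestors. */
void scene_draw(const SceneNode *node, float parent_x, float parent_y)
{
    float x = parent_x + node->local_x;
    float y = parent_y + node->local_y;
    draw_object(node, x, y);
    for (int i = 0; i < node->child_count; i++)
        scene_draw(node->children[i], x, y);
}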
The scene graph is almost certainly the namesake for the functional
block actually implementing the 2D and 3D drawing engine that makes
the GPU, well, a GPU: PGRAPH. This part has survived from the very
first NV1 all the way to the current Blackwell (RTX 5000)
architecture; NVIDIA have never done a ground-up redesign since
initial development of the NV1 architecture began in 1993, although
the Ship of Theseus argument applies here.
Clocks
The RIVA 128 is not dependent on the host machine's clock. It has a
13.5 or 14.3 MHz (depending on boot-time configuration) clock
crystal, split by the hardware into a memory clock (MCLK) and video
clock (VCLK). Note that these names are misleading; the MCLK also
handles the chip's actual rendering and timing, with the VCLK
seemingly just handling the actual pushing out of frames.
The actual clocks are controlled by registers in PRAMDAC set by the
video BIOS, which can later be overridden by drivers. In this
iteration of the NVIDIA architecture, the VBIOS only performs a very
basic POST sequence, initialises the card and sets its clock speed;
once the chip is initialised, the VBIOS is effectively never needed
again, although there are mechanisms to read from it after
initialisation. Clocks were controlled by card manufacturers through
parameters m/n/p, from which the chip derives the final memory and
pixel clock speed with the formula (frequency * n) / (m << p).
Generally, most manufacturers set the memory clock at around 100 MHz,
and the pixel clock at around 40 MHz, although drivers seemingly
reduce these clocks in some cases.
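In code, the formula looks like the sketch below; the coefficient
values in the comment are hypothetical ones chosen to hit a 100 MHz
memory clock, not values read from any real board, and the register
packing of m/n/p is not shown.

#include <stdint.h>

/* The m/n/p clock formula as described above: output = (crystal * n) / (m << p). */
double nv3_pll_frequency(double crystal_hz, uint32_t m, uint32_t n, uint32_t p)
{
    if (m == 0)
        return 0.0;   /* avoid dividing by zero on an unprogrammed PLL */
    return (crystal_hz * (double)n) / (double)(m << p);
}

/* Example: a 13.5 MHz crystal with hypothetical coefficients n=200, m=27, p=0
   gives (13.5e6 * 200) / 27 = 100 MHz. */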
The chip's RAMDAC handles final conversion of the digital image
generated by the GPU into an analog video signal, and clock
generation via three phase-locked loops. It has its own clock (ACLK)
running at around 200 MHz on RIVA 128 (revision A/B) and 260 MHz on
RIVA 128 ZX (revision C) chips, which unlike the other clocks, was
not configurable by manufacturers.
Memory mapping
Before we can discuss any part of how the RIVA 128 works, its memory
architecture must be explained, since this is a fundamental
requirement to even access the graphics card's registers in the first
place. NVIDIA picked a fairly strange memory mapping architecture, at
least for cards of that time. The exact setup of the memory mapping
changed numerous times as NVIDIA's architecture evolved, so only
NV3-based GPUs will be analyzed.
The memory mapping is split into three primary components, all
exposed via memory-mapped I/O through Base Address Registers (BAR) in
PCI configuration space; there is no port I/O support outside of the
Weitek core's registers for SVGA compatibility. The RIVA 128 uses two
BARs, both 16 MB in size: BAR0 holding the main GPU registers, and
BAR1 holding the DFB and RAMIN areas (which really refer to
overlapping areas of memory).
MMIO
This is the primary memory mapping area, set up as Base Address
Register 0 in the PCI configuration registers. This is how you speak
to the GPU: 16 MB (!) of MMIO, mapped at a memory location defined by
the system BIOS. Since the video BIOS has no access to PCI services,
it instead uses I/O ports 0x3D0-0x3D3 in the Weitek SVGA core, mapped
to a mechanism called RMA (Real Mode Access); a 32-bit address is
formed by writing to all four RMA registers, then the next read/write
to the VGA I/O region is redirected to the MMIO area, allowing the
VBIOS to access it from real mode and initialise the GPU.
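A loose sketch of the RMA idea follows, assuming the simple four-byte
address window described above; the real register protocol is more
involved than this, and io_write8 is a hypothetical port I/O helper.

#include <stdint.h>

/* Hypothetical port I/O helper (outb or a real-mode equivalent on real hardware). */
extern void io_write8(uint16_t port, uint8_t value);

/* Form the 32-bit MMIO target address by writing one byte to each of the
   four RMA registers; the following access to the VGA I/O region would then
   be redirected into the GPU's MMIO space. */
void nv3_rma_set_address(uint32_t mmio_address)
{
    io_write8(0x3D0, (uint8_t)(mmio_address >> 0));
    io_write8(0x3D1, (uint8_t)(mmio_address >> 8));
    io_write8(0x3D2, (uint8_t)(mmio_address >> 16));
    io_write8(0x3D3, (uint8_t)(mmio_address >> 24));
}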
This MMIO area has numerous functional subsystems of the GPU mapped
into it, with some overlap. The actual function of each graphics
object will be described later.
Range               Name        Purpose
0x0-0xFFF           PMC         Controls the GPU functional units and interrupt state
0x1000-0x1FFF       PBUS        Controls the 128-bit internal bus
0x1800-0x18FF       PCI mirror  Mirror of PCI configuration registers
0x2000-0x3FFF       PFIFO       FIFO buffer for graphics command submission from DMA
0x4000-0x4FFF       PRM         Real mode device support (e.g. MPU-401)
0x6000-0x6FFF       PRAM        Controls RAMIN area configuration
0x7000-0x7FFF       PRMA        Real Mode Access registers
0x9000-0x9FFF       PTIMER      Custom programmable interval timer
0xA0000-0xAFFFF     PRMFB       Real Mode Framebuffer: emulated VGA video memory
0xC0000-0xCFFFF     PRMVIO      Real Mode Video: VGA emulation registers (Weitek)
0x100000-0x100FFF   PFB         Framebuffer interface (config, debug, initialisation)
0x101000-0x101FFF   PEXTDEV     External Device interface
0x101000            PSTRAPS     Device configuration bits (set at factory)
0x110000-0x110FFF   PROM        Video BIOS mirror
0x120000-0x120FFF   PALT        External memory access mirror (unknown, possible NV1 remnant)
0x200000-0x200FFF   PME         Mediaport: External MPEG decoder interface
0x400000-0x401FFF   PGRAPH      2D/3D graphics engine: Core
0x410000-0x411FFF   UBETA       2D/3D graphics engine: Beta factor object
0x420000-0x421FFF   UROP        2D/3D graphics engine: Render operation object
0x430000-0x431FFF   UCHROMA     2D/3D graphics engine: Chroma key object
0x440000-0x441FFF   UPLANE      2D/3D graphics engine: Plane mask object
0x450000-0x451FFF   UCLIP       2D/3D graphics engine: Clip object
0x460000-0x461FFF   UPATT       2D/3D graphics engine: Blit pattern object (e.g. for BitBLT)
0x470000-0x471FFF   URECT       2D/3D graphics engine: Rectangle object
0x480000-0x481FFF   UPOINT      2D/3D graphics engine: Point object
0x490000-0x491FFF   ULINE       2D/3D graphics engine: Line object
0x4A0000-0x4A1FFF   ULIN        2D/3D graphics engine: Lin (line without starting or ending pixels)
0x4B0000-0x4B1FFF   UTRI        2D/3D graphics engine: Triangle object (possible NV1 leftover)
0x4C0000-0x4C1FFF   UW95TXT     2D/3D graphics engine: Windows 95 GDI text acceleration object
0x4D0000-0x4D1FFF   UMEMFMT     2D/3D graphics engine: Memory to memory format object
0x4E0000-0x4E1FFF   USCALED     2D/3D graphics engine: Scaled image from memory object
0x500000-0x501FFF   UBLIT       2D/3D graphics engine: Blit object
0x510000-0x511FFF   UIMAGE      2D/3D graphics engine: Image object
0x520000-0x521FFF   UBITMAP     2D/3D graphics engine: Bitmap object
0x540000-0x541FFF   UTOMEM      2D/3D graphics engine: Transfer to memory object
0x550000-0x551FFF   USTRTCH     2D/3D graphics engine: Stretched image from CPU object
0x570000-0x571FFF   UD3D0Z      2D/3D graphics engine: Direct3D 5.0 triangle w/zeta buffer^11 object
0x580000-0x581FFF   UPOINTZ     2D/3D graphics engine: Point w/zeta buffer^11
0x5C0000-0x5C1FFF   UINMEM      2D/3D graphics engine: Image in memory object
0x601000-0x601FFF   PRMCIO      VGA CRTC registers
0x680000-0x6802FF   PVIDEO      Video overlay engine
0x680300-0x680FFF   PRAMDAC     Video signal generation, cursor, CLUT, clock generation
0x681200-0x681FFF   USER_DAC    Optional for external DAC?
0x800000-0xFFFFFF   USER        Graphics object submission area (for PFIFO, via DMA)
DFB
DFB is the aptly-named "Dumb Framebuffer", a linear framebuffer set
up as Base Address Register 1 on the NV3 (moved to BAR0 at 0x1000000
on later GPUs). The default size of 4 MB may change depending on VRAM
size. This area is presumably meant for manipulating the GPU without
using its DMA facilities.
RAMIN
RAMIN is also located in BAR1. It's a somewhat complicated area, but
also the most important one to understand when it comes to the actual
operation of the GPU, as it's the part of video RAM where graphics
objects and structures containing references to them are stored.
This area is effectively the last megabyte of VRAM (regardless of
VRAM size), but organized as 16-byte blocks which are then stored
from the top down. A RAMIN address can be converted to a real VRAM
address with the formula ramin_address ^ (vram_size - 16). I'm not
entirely sure why they did this, but I assume it was for providing a
more convenient interface to the user and for general efficiency
reasons.
Interrupts
A traditional interrupt system is implemented, supporting interrupts
issued by different GPU components. PMC contains an interrupt status
register and an interrupt enable register, with one bit for each
component (including the eventually-removed PAUDIO), as well as a
software interrupt represented by bit 31; components also have a
local status register and enable register, with each bit representing
an individual interrupt from that block. If the PMC interrupt status
and enable bits for a given component are both 1, with some minor
exceptions to be explained in later parts, an interrupt is declared
to be pending and a PCI IRQ is sent.
Interrupts can be turned off globally (or just component interrupts,
or just the software interrupt) via the PMC_INTR_EN register.
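A sketch of that check in C, following the description above; the
register offsets are assumptions used for illustration, and
nv3_mmio_read32 is a hypothetical helper that reads a 32-bit MMIO
register from BAR0.

#include <stdint.h>
#include <stdbool.h>

extern uint32_t nv3_mmio_read32(uint32_t offset);   /* hypothetical MMIO read helper */

#define NV_PMC_INTR_0    0x000100   /* per-component interrupt status (assumed offset) */
#define NV_PMC_INTR_EN_0 0x000140   /* per-component interrupt enable (assumed offset) */

/* An interrupt is considered pending (and a PCI IRQ would be raised) when a
   component's bit is set in both the status and the enable register. */
bool nv3_irq_pending(void)
{
    uint32_t status = nv3_mmio_read32(NV_PMC_INTR_0);
    uint32_t enable = nv3_mmio_read32(NV_PMC_INTR_EN_0);
    return (status & enable) != 0;
}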
Programmable interval timer
Time-sensitive functions are provided by a relatively simple
programmable interval timer, PTIMER, which fires an interrupt
whenever its nanosecond counter exceeds the threshold value set in
PTIMER_ALARM. This is how the drivers internally keep track of many
actions that they need to perform, and it is the first functional
block which must be done right if you ever hope to emulate the RIVA 128.
The least straightforward part of this timer is the counter itself, a
56-bit value split across two 32-bit registers: the lower 27 bits are
stored in bits [31:5] of PTIMER_TIME0, and the upper 29 bits are
stored in bits [28:0] of PTIMER_TIME1.
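Reassembling the counter looks roughly like this; the register
offsets are placeholders within the PTIMER range, and the bit packing
follows the description above.

#include <stdint.h>

extern uint32_t nv3_mmio_read32(uint32_t offset);   /* hypothetical MMIO read helper */

#define NV_PTIMER_TIME_0 0x009400   /* assumed offset within the PTIMER block */
#define NV_PTIMER_TIME_1 0x009410   /* assumed offset within the PTIMER block */

/* TIME0 bits [31:5] hold the low 27 bits of the counter; TIME1 bits [28:0]
   hold the upper 29 bits. */
uint64_t nv3_ptimer_read(void)
{
    uint64_t lo = (nv3_mmio_read32(NV_PTIMER_TIME_0) >> 5) & 0x07FFFFFF;  /* 27 bits */
    uint64_t hi =  nv3_mmio_read32(NV_PTIMER_TIME_1)       & 0x1FFFFFFF;  /* 29 bits */
    return (hi << 27) | lo;
}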
Graphics commands and DMA engine
What may be called graphics commands in other GPU architectures are
instead called graphics objects in the NV3 and all other NVIDIA
architectures. Objects are submitted into the GPU core by writing
into the NV_USER section of the MMIO BAR0 region through programmed I
/O. Despite the fact that a custom memory access engine with its own
translation lookaside buffer and other memory management structures
was implemented for graphics object types that perform memory
transfers, it does not seem to be used for graphics object submission
until the NV4 architecture. Existing documentation is contradictory
on whether or not this exists on the NV3, but drivers do not seem to
use DMA to submit graphics objects; if a DMA submission method
exists, it certainly works very differently to later versions of the
architecture.
There are 8 DMA channels, with the default being channel 0 (also the
only channel accessible through PIO?), but only one can be used at a
time; using other channels requires a context switch, which entails
writing the current channel ID to PGRAPH registers for every class.
All DMA channels use 64 KB of RAMIN memory (to be explained later),
further divided into 8 KB (0x2000) subchannels, each effectively
representing one object; the meaning of what is in those subchannels
depends on the type (or class to use NVIDIA terminology) of the
object submitted into them, with the attributes of each object being
called a method. A simple way to program the GPU is to just create
subchannels for specific objects (such as one for text, one for
rectangle, and so on) and change their data and methods as the
program runs in order to create a graphical effect; however, this is
a severely limited approach^12, and tapping the chip's full potential
requires the use of context switches between DMA channels, as well as
the additional classes implemented in software by the drivers.
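Assuming the 64 KB-per-channel / 8 KB-per-subchannel granularity
described above also applies to the USER submission window (an
assumption on my part), the MMIO offset for submitting a method by
programmed I/O would look something like this:

#include <stdint.h>

#define NV_USER_BASE 0x800000   /* graphics object submission area in BAR0 */

/* Sketch of how a (channel, subchannel, method) triple could map into the
   USER window: 64 KB (0x10000) per channel, 8 KB (0x2000) per subchannel,
   with the method offset selecting the attribute being written. */
uint32_t nv3_user_offset(uint32_t channel, uint32_t subchannel, uint32_t method)
{
    return NV_USER_BASE + (channel << 16) + (subchannel << 13) + method;
}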
All objects have a context, consisting of a 32-bit "name" and another
32-bit value storing its class, associated channel and subchannel ID,
where it is relative to the start of RAMIN, and whether it's a
software-injected or hardware graphical rendering object (bit 31).
Contexts are stored in an area of RAM called RAMFC if the object's
channel is not being used; otherwise, they are stored in RAMHT, a
hash table where the hash key is a single byte calculated by XORing
each byte of the object's name^13 as well as the channel ID. Objects
are stored in RAMHT as structures consisting of their 8-byte context
followed by the methods mentioned earlier; an object's byte offset in
RAMHT is its hash multiplied by 16.
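The hash itself is simple enough to show in a few lines; this sketch
follows the description above and ignores the larger RAMHT
configurations, where the addressing is presumably wider.

#include <stdint.h>

/* XOR all four bytes of the object's 32-bit name together with the channel
   ID to get a single-byte hash; the entry's byte offset within RAMHT is the
   hash multiplied by 16 (the entry size). */
uint32_t nv3_ramht_offset(uint32_t name, uint8_t channel_id)
{
    uint8_t hash = (uint8_t)((name >> 0) ^ (name >> 8) ^
                             (name >> 16) ^ (name >> 24) ^ channel_id);
    return (uint32_t)hash * 16;
}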
The exact set of methods of every graphics object in the architecture
is incredibly long and often shared between several different types
of objects (although the first 256 bytes and usually a few more after
that are shared), and thus won't be listed in part 1. An overall list
of graphics objects can be found in the next section, but note that
these are the ones defined by the hardware, while the drivers
implement a much larger set of objects that do not map exactly to the
ones in the GPU; furthermore, as you will see later, since each
object is quite large at 8 KB, a single object does not necessarily
mean a single graphics primitive is drawn once the object is
processed - it may produce several, or none at all (some are used to
represent DMA objects, for example).
Objects can also be connected together with a special type of object
called a "patchcord" constructed by the Resource Manager; the name is
a remnant from the old NV1 quad patching days.
After being written to NV_USER, graphics objects are sent to one of
two caches within the PFIFO subsystem: CACHE0 which holds a single
entry (really intended for the notifier engine to be able to inject
graphics commands from software), or CACHE1 which holds 32 entries on
revisions A-B and 64 on revision C onwards. What these critical
components actually do will be explored in full in later parts, but
they effectively just store object names and contexts as they are
waiting to be sent to RAMIN; a "pusher" pushes objects in from the
bus as they are written into NV_USER, and a "puller" pulls them out
of the bus and sends them where they need to be inside of the VRAM
(or to RAMRO if they are invalid).
Once objects are pulled out, the GPU will simply manipulate the
various registers in the PGRAPH subsystem in order to draw the object
(if the object is actually rendered), and/or perform any DMA
operations the graphics object may require using the DMA engine.
Objects do not appear to "disappear" on frame refresh; instead, it
would simply appear that they are simply drawn over, and most likely,
any renderer will simply clear the entire screen (with a Rectangle
object for instance) before resubmitting any graphics objects they
need to render.
Both RAMFC and RAMHT can have their sizes, and to some extent their
location within RAMIN, configured by registers within the PFIFO
block. RAMHT can be 4 KB (of questionable usefulness as that cannot
fill CACHE1), 8 KB, 16 KB, or 32 KB in size, while RAMFC is either
512 bytes or 8 KB.
Object list
Any class values not listed here are invalid; in theory, the 5-bit
value in the object context allows for 32 classes, but NVIDIA did not
implement the full amount, and moved to a different approach (where
the classes are somewhat more constructed in software) with the NV4
architecture.
0x01 (Beta factor): The beta factor used for blending operations.
(combining an output pixel with another pixel to produce a final
image)
0x02 (ROP5 operation): The Render OPeration used for blending. (e.g.
XOR)
0x03 (Chroma Key): Similar to a color key used in video editing.
0x04 (Plane mask): Seems to be implemented similarly to Chroma Key,
not sure what it has to do with planes. (bitplane? 2D plane?)
0x05 (Clipping rectangle): A rectangle used for enabling/disabling
render operations within a specific region.
0x06 (Pattern): Pattern used for bitblit and other blits.
0x07 (Rectangle): Up to 16 rectangles with size and position
represented as a 32-bit value. (low 16 bits are X, high 16 bits are
Y)
0x08 (Point): An arbitrary point on the screen. Depending on the
methods used to submit the object, this object can take the form of:
* Up to 32 points, each with a single arbitrary 32-bit colour
(probably BGRA format) and 16-bit size and position values;
* Up to 16 points, each with a single arbitrary 32-bit colour
(probably BGRA format) and 32-bit size and position values;
* Up to 16 points, making up a polygon, with an arbitrary 32-bit
colour for each polygon line (probably BGRA format) and 16-bit
size and position values.
0x09 (Line): An arbitrary line on the screen. Depending on the
methods used to submit the object, this object can take the form of:
* Up to 16 lines, each with a single arbitrary 32-bit colour
(probably BGRA format) and 16-bit size and position values;
* Up to 8 lines, each with a single arbitrary 32-bit colour
(probably BGRA format) and 32-bit size and position values;
* Up to 32 lines, each making up a polygon, with a single arbitrary
32-bit colour (probably BGRA format) and 16-bit size and position
values;
* Up to 16 lines, each making up a polygon, with a single arbitrary
32-bit colour (probably BGRA format) and 32-bit size and position
values;
* Up to 16 lines, each making up a polygon, with an arbitrary
32-bit colour for each polygon line (probably BGRA format) and
16-bit size and position values.
0x0A (Lin): Same as Line, but the starting and ending pixels are not
drawn for each line.
0x0B (Triangle): A basic (presumably pre-transformed?) 2D triangle.
Depending on the methods used to submit the object, this object can
take the form of:
* A single triangle with a single arbitrary 32-bit colour and three
16-bit position values for each of the triangle's vertexes;
* A single triangle with a single arbitrary 32-bit colour for the
entire mesh, and three 16-bit position values for each of the
triangle's vertexes;
* A part of a mesh of up to 32 triangles with a single arbitrary
32-bit colour and two 16-bit position values for each of the
points on the mesh;
* A part of a mesh of up to 16 triangles with a single arbitrary
32-bit colour and two 32-bit position values for each of the
points on the mesh;
* A set of up to 8 triangles with a single arbitrary 32-bit colour
for the entire mesh, and three 16-bit position values for each of
the triangle's vertexes;
* A part of a mesh of up to 16 triangles with a 32-bit colour and
two 32-bit position values for each of the points on the mesh.
0x0C (Windows 95 GDI Text Acceleration): A specialized hardware
accelerator for the manner by which Windows 95's GDI (and its DIB
Engine?) renders text. This is a very complicated set of clipping
logic that won't be covered until Part 3; it's too long for this
part, and I don't fully understand it yet.
0x0D (Memory to memory format): Changes the format of a set of pixels
in VRAM. Allows for changing the line (vertical size) length, count
and pitch of the image.
0x0E (Scaled image from memory): Obtains an image from VRAM (in YUV
or RGB format) and scales it before displaying it to the screen,
performing a bit of differentiation to achieve this. Parameters are
an output position and size for the final screen, as well as an input
position and size.
0x10 (Blit): Blits an image (a final one made up of 3D polygons or a
2D one) between two different parts of the screen. Has an input and
output position and a size.
0x11 (Image from CPU): Takes an image from "CPU" (main memory?),
optionally scales it, and then displays it on the screen. Parameters
are an input size, set of 32-bit colour values and output position
and size.
0x12 (Bitmap): Similar to Image from CPU, but deals with monochrome
or two-colour bitmaps instead, possibly as an optimisation.
0x14 (Transfer to Memory): Takes an image from the screen (?) and
transfers it to memory. Parameters are a start position offset from
VRAM and a pitch, as well as a position and size for the image.
0x15 (Stretched image from CPU): Takes an image from "CPU" (main
memory?), stretches it using an optional clip region and a little bit
of differentiation, and then uses it. Parameters are an input size
and a clip region using the same 16-bit coordinate format used by the
basic primitive drawing silicon.
0x17 (Direct3D 5.0 accelerated triangle with zeta buffer): Seemingly
an attempt to implement the Direct3D 5.0 specification to the letter
in silicon. Allows for up to 128 triangles to be submitted at a time,
with six coordinates:
* The traditional X, Y and Z coordinates used for representing
vector values in 3D space
* U and V coordinates for textures. Textures may be uploaded at
sizes up to 2048x2048 (only power of two textures are allowed!),
but are scaled down to 256x256 during upload, if they are larger.
* An "M" coordinate, apparently a "measurement dimension" used for
more precise measurement of real-world distances
Each triangle may have a 32-bit colour value as well. Note that the
RIVA 128 is not a multitexture-capable GPU; only one texture can be
applied to each batch of 128 triangles, so the Direct3D
implementation in the drivers should attempt to fill each batch with
as many triangles sharing the same texture as possible. If you write
applications targeting this GPU, you should try to ensure that
objects with the same texture add up to close to a multiple of 128
triangles, so that the D3D driver's implementation of this
optimisation can improve the efficiency of your renderer (see the
sketch after this list).
These triangles, as a group, may have the following effects applied
to them:
* "Zeta buffer" (may be similar to the Z-buffer used for polygon
ordering, or for mipmapping?)
* "Alpha buffer" (probably for alpha blending)
* Specular highlighting
* Vertex fog (of any 32-bit colour)
* Interpolation between vertex positions (using a zero-order hold,
"Microsoft" variant of zero-order hold, or full-order hold
implementation)
* Frustum culling clockwise or counterclockwise (discarding
triangles, although presumably this would only work in the batch
of 128 triangles sent to the hardware for processing)
* Texture UV coordinate wrapping for seamless textures (coordinates
can wrap cleanly, be clamped to their "last" pixels or mirror
themselves)
0x18 (Point with zeta buffer): Similar to Point, but the zeta and
alpha buffer can be applied to it too.
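As promised in the 0x17 entry, here is a sketch of the batching idea;
the vertex layout and the nv3_submit_d3d5_batch function are
hypothetical stand-ins for whatever the driver (or your renderer)
actually uses, and only the grouping logic matters.

#include <stdint.h>
#include <stddef.h>

#define NV3_TRIS_PER_BATCH 128   /* one texture per batch of up to 128 triangles */

/* Hypothetical vertex layout and submission call, for illustration only. */
typedef struct { float x, y, z, u, v, m; uint32_t colour; } Nv3Vertex;
extern void nv3_submit_d3d5_batch(uint32_t texture_id,
                                  const Nv3Vertex *verts, size_t tri_count);

/* Submit triangles that all share one texture, 128 at a time; callers should
   group their geometry by texture before calling this. 'verts' holds
   tri_count triangles, three vertices each. */
void submit_triangles(uint32_t texture_id, const Nv3Vertex *verts, size_t tri_count)
{
    while (tri_count > 0) {
        size_t batch = tri_count > NV3_TRIS_PER_BATCH ? NV3_TRIS_PER_BATCH : tri_count;
        nv3_submit_d3d5_batch(texture_id, verts, batch);
        verts     += batch * 3;
        tri_count -= batch;
    }
}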
When you screw up: RAMRO
Aside from the previously-covered RAMFC and RAMHT, another important
structure is stored in RAMIN. RAMRO saves the day and prevents the
GPU from blowing up if a graphics object you submit is invalid,
because after all, nothing is perfect and there are always bugs in
code.
During object submission, if the GPU detects that the cache ran out,
was turned off, or any kind of illegal access was performed, the
submission is not processed; instead, it is sent to a special area of
RAMIN known as RAMRO (always half the size of RAMHT), which stores
the object, what went wrong, and whether a read or write operation
was involved in the error. Additionally, an interrupt is fired so
that any drivers running on the system can catch the error and
(hopefully) correct it.
The PFIFO_RUNOUT_STATUS register holds the current state of the RAMRO
region, including whether or not any errors have occurred.
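A minimal check might look like the following; the offset and bit
layout here are assumptions for illustration, since only the
register's purpose is described above.

#include <stdint.h>
#include <stdbool.h>

extern uint32_t nv3_mmio_read32(uint32_t offset);   /* hypothetical MMIO read helper */

#define NV_PFIFO_RUNOUT_STATUS       0x002400        /* assumed offset within PFIFO  */
#define NV_PFIFO_RUNOUT_STATUS_EMPTY (1u << 0)       /* assumed "RAMRO is empty" bit */

/* Returns true if at least one faulted submission is sitting in RAMRO
   waiting for the driver to inspect it. */
bool nv3_ramro_has_errors(void)
{
    return (nv3_mmio_read32(NV_PFIFO_RUNOUT_STATUS) & NV_PFIFO_RUNOUT_STATUS_EMPTY) == 0;
}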
RAMAU
RAMAU was an area used on the NV1 and revision A NV3 chips for
storing audio data being streamed into the CPU. On Revision B and
later cards, this area is still mapped to MMIO space, but its
functionality has been removed entirely.
Interrupts 2.0: Notifiers
Some people at NVIDIA decided that they were too cool for interrupts
and thought: why have an interrupt that tells the GPU to do
something, when you could have an interrupt that has the GPU tell the
drivers to do something? And thus the incredible notifier system was
born.
Notifiers appear to have been implemented to allow the drivers to
manage GPU resources implemented in software instead of silicon.
Every single subsystem in the GPU has a notifier enable register
alongside its interrupt enable register, with some having multiple
enable registers for different notifier types; notifiers are mostly
found within PGRAPH, PME and PVIDEO, but may also exist in other
subsystems.
PGRAPH notifiers appear to be intended to work with the object class
system, and actually differ per class - there is basically one "type"
of notification for each object class, with each object having a set
of "notification parameters" that can be used to trigger the
SetNotify method at 0x104 within an object stored in RAMHT. There is
also the SetNotifyCtxDma method, usually but not always at 0x0, which
is used for context switching. Notifiers appear to be "requested"
until the GPU processes them, and PGRAPH can take up to 16 software
and 1 hardware notifier type. Objects are signalled to the driver by
directly DMAing into the Resource Manager's memory. Notifications are
represented by the following structure:
typedef struct {
    struct {
        uint32_t nanoseconds[2];
    } TimeStamp;
    int32_t info32;
    int16_t info16;
    int16_t status; /* 0xFF (NV_NOTIFICATION_IN_PROGRESS) = pending; 0x00 = complete; others = error */
} NvNotification;
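A small usage sketch of the structure above: the driver (or any
client the Resource Manager notifies) would poll the status field
that the GPU DMAs into its memory.

#include <stdbool.h>

#define NV_NOTIFICATION_IN_PROGRESS 0xFF

/* Uses the NvNotification structure defined above; 0xFF means the GPU is
   still working, 0x00 means success, and any other value is an error code. */
bool notification_complete(const volatile NvNotification *n)
{
    return n->status != NV_NOTIFICATION_IN_PROGRESS;
}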
More research is ongoing. It appears that most notifiers are
generated by the driver in order to manage hardware resources that
they would not otherwise be capable of managing, such as the PFIFO
caches.
PRAMDAC
This is the final part of the GPU: it handles the intricacies of
generating a video signal, sets the resolution, and holds a colour
lookup table for the various modes.
I haven't looked into this part as much, so expect more information
in an update on this part or in future parts of this series. It's not
really super critical to emulate anyway, outside of the clock control
portion, as the actual analog video generation part mostly does not
apply to emulation.
---------------------------------------------------------------------
Next part
The next part will dive into how NVIDIA's drivers work and how they
make this ridiculously complicated mess of an architecture transform
itself into a GPU that allows you to run games you may actually want
to play. Stay tuned!
---------------------------------------------------------------------
1. Only Matrox is both still around and still in the graphics space,
after exiting the consumer market in 2003 and ceasing to design
graphics cards entirely from 2014 until their recent comeback
with Intel Arc-based products. Cirrus Logic is still around as an
audio chip designer, stemming from their acquisition of Crystal
in 1991.
2. Source: Strategic Collaboration Agreement between NVIDIA and
SGS-Thomson, originally covering NV1 but later revised to include
NV3, apparently part of a filing with the US Securities and
Exchange Commission.
3. S3's later proprietary API for the Savage family, not to be
confused with Apple's Metal from many years later.
4. By 1997, SGI had over 15 years of experience in developing
graphics hardware, while also suffering from rampant
mismanagement and experiencing the start of what would later
prove to be their terminal decline.
5. In the edge interpolation process, a triangle is converted into
"spans" of horizontal lines, and the positions of nearby vertexes
are used to determine the span's start and end positions.
6. To simplify a complex topic, in a GPU of this era, span
interpolation generally involves Z-buffering (also known as depth
buffering), sorting polygons back to front, and color buffering,
storing the color of each pixel sent to the screen in a buffer
which allows for blending and alpha transparency. Some GPUs do
not implement a Z-buffer and delegate polygon sorting to software
instead; examples include the NV1, original ATI Rage and the
PlayStation 1's Geometry Transformation Engine.
7. David Kirk is perhaps notable as a "Special Thanks" credit on Gex
and the producer of the truly unparalleled 3D Baseball on Sega
Saturn during his time at Crystal Dynamics.
8. NVIDIA's revenue in the first nine months of 1997 was only $5.5
million, but skyrocketed up to $23.5 million in the last three
months, corresponding to the first three months of the RIVA 128's
availability, owing to the numerous sales of chips to add-in
board partners.
9. While there are mentions of quality problems with early cards in
a lawsuit involving STB Systems, the RIVA 128's first OEM
partner, it is not clear if the problems were on STB or NVIDIA's
end.
10. Mipmapping is a graphical technique involving scaling down
textures as you move away from an object in order to prevent
shimmering.
11. A zeta buffer is NVIDIA parlance for a combined Z-buffer (a
buffer within the framebuffer for sorting polygons based on their
distance from the camera) and stencil buffer (a buffer for
discarding parts of an image). In this case, a 16-bit Z-buffer
and 8-bit stencil buffer are interleaved. This evolved to a
"super zeta buffer" on later NVIDIA GPUs. - -^2
12. NVIDIA have successfully deployed this approach on simpler
projects, such as early versions of their Windows NT miniport
drivers, before the full Resource Manager was able to be ported.
13. Object names below 4096 are reserved on NVIDIA's drivers, which
also have the duty to prevent the hash table area from getting
full with only basic error handling from the hardware itself.