[HN Gopher] Ask HN: How does a CPU communicate with a GPU?
___________________________________________________________________
Ask HN: How does a CPU communicate with a GPU?
I've been learning about computer architecture [1] and I've become
comfortable with my understanding of how a processor communicates
with main memory - be it directly, with the presence of caches or
even virtual memory - and I/O peripherals. But something that
seems weirdly absent from the courses I took and what I have found
online is how the CPU communicates with other processing units,
such as GPUs - not only that, but an in-depth description of
interconnecting different systems with buses (by in-depth I mean an
RTL example/description). I understand that as you add more
hardware to a machine, complexity increases and software must
intervene - so a general answer won't exist and the answer
will depend on the implementation being talked about. That's fine
by me. What I'm looking for is a description of how a CPU tells a
GPU to start executing a program. Through what means do they
communicate - a bus? What does such a communication instance look
like? I'd love to get pointers to resources such as books and
lectures that are more hands-on/implementation aware. [1] Just so
that my background knowledge is clear: I've completed NAND2TETRIS,
watched Berkeley's 2020 CS61C in full, and have read a good
chunk of H&P (both Computer Architecture: A Quantitative Approach
and Computer Organization and Design: RISC-V edition), and now am
moving on to Onur Mutlu's lectures on advanced computer
architecture.
Author : pedrolins
Score : 58 points
Date : 2022-03-30 20:17 UTC (2 hours ago)
| simne wrote:
| A lot of things happen there.
|
| But most importantly, PCIe is a serial bus with a virtualized
| interface, so there is no single shared physical wire being
| driven; what happens is more similar to an Ethernet network: on
| each device there are a few endpoints, each with its own
| controller, its own address, a few registers to store state and
| transitions, and memory buffer(s).
|
| Video cards usually support several behaviors. In the simplest
| modes, they behave just like RAM mapped into a large chunk of the
| system address space, plus video registers to control video
| output, to control the address mapping of video RAM, and to
| switch modes.
|
| In more complex modes, video cards generate interrupts (just a
| special type of message on PCIe).
|
| In 3D modes, which are the most complex, the video controller
| takes data from its own memory (which is mapped into the system
| address space), where a tree of graphics primitives is stored.
| Some primitives are drawn directly from video RAM, but for others
| the bus-mastering feature of PCIe is used, in which the video
| controller reads additional data (textures) from predefined
| chunks of system RAM.
|
| As for GPU operation: usually the CPU copies data into video RAM
| directly, then asks the video controller to run a program from
| video RAM; when it completes, the GPU issues an interrupt, and
| then the CPU copies the result back from video RAM.
|
| Recent additions give the GPU the ability to read data from
| system disks, using the bus mastering mentioned above, but those
| additions are not yet widely implemented.
| simne wrote:
| For a beginner, I think the best place to start is reading about
| the Atari consoles, the Atari 65/130, and the NES, as their
| ideas were later implemented in all commodity video cards, just
| slightly extended.
|
| BTW, all modern video cards use bank switching.
| melenaboija wrote:
| It is old and I am not sure everything still applies but I found
| this course useful to understand how GPUs work:
|
| Intro to Parallel Programming:
|
| https://classroom.udacity.com/courses/cs344
|
| https://developer.nvidia.com/udacity-cs344-intro-parallel-pr...
| aliasaria wrote:
| There is some good information on how PCI-Express works here:
| https://blog.ovhcloud.com/how-pci-express-works-and-why-you-...
| dragontamer wrote:
| I'm no expert on PCIe, but it's been described to me as a network.
|
| PCIe has switches, addresses, and so forth. Very much like IP-
| addresses, except PCIe operates on a significantly faster level.
|
| At its lowest-level, PCIe x1 is a single "lane", a singular
| stream of zeros-and-ones (with various framing / error correction
| on top). PCIe x2, x4, x8, and x16 are simply 2, 4, 8, or 16
| lanes running in parallel and independently.
|
| -------
|
| PCIe is a very large and complex protocol, however. This "serial"
| comms layer gets abstracted into memory-mapped I/O. Instead of
| programming at the "packet" level, most PCIe operations are seen
| as just RAM.
|
| > even virtual memory
|
| So you understand virtual memory? PCIe abstractions go up to and
| include the virtual memory system. When your OS maps some
| virtual memory onto a PCIe device and programs read/write those
| memory addresses, the CPU and PCIe bridge translate those
| reads/writes into PCIe messages.
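|
| On Linux you can poke at this mapping yourself: the kernel
| exposes each device's BARs as "resource" files in sysfs, and
| mmap()ing one gives you a pointer whose loads/stores turn into
| PCIe reads/writes. A minimal sketch (the device path is just an
| example, and you need root):
|
|     #include <fcntl.h>
|     #include <stdint.h>
|     #include <stdio.h>
|     #include <sys/mman.h>
|     #include <unistd.h>
|
|     int main(void) {
|         /* BAR0 of some example device; the path varies per system. */
|         int fd = open("/sys/bus/pci/devices/0000:01:00.0/resource0",
|                       O_RDWR | O_SYNC);
|         if (fd < 0) { perror("open"); return 1; }
|
|         /* Map 4 KiB of the BAR into our virtual address space. */
|         volatile uint32_t *regs =
|             mmap(NULL, 4096, PROT_READ | PROT_WRITE, MAP_SHARED,
|                  fd, 0);
|         if (regs == MAP_FAILED) { perror("mmap"); return 1; }
|
|         /* This load becomes a PCIe memory-read transaction. */
|         printf("reg[0] = 0x%08x\n", regs[0]);
|
|         munmap((void *)regs, 4096);
|         close(fd);
|         return 0;
|     }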
|
| --------
|
| I now handwave a few details and note: GPUs do the same thing on
| their end. GPUs can also have a "virtual memory" that they
| read/write to, and translates into PCIe messages.
|
| This leads to a system called "Shared Virtual Memory" which has
| become very popular in a lot of GPGPU programming circles. When
| the CPU (or GPU) reads/writes a memory address, the data is then
| automatically copied over to the other device as needed. Caching
| layers are layered on top to improve the efficiency (Some SVM may
| exist on the CPU-side, so the GPU will fetch the data and store
| it in its own local memory / caches, but always rely upon the CPU
| as the "main owner" of the data. The reverse, GPU-side shared
| memory, also exists, where the CPU will communicate with the
| GPU).
|
| To coordinate access to RAM properly, the entire set of atomic
| operations + memory barriers have been added to PCIe 3.0+. So you
| can perform "compare-and-swap" to shared virtual memory, and
| read/write to these virtual memory locations in a standardized
| way across all PCIe devices.
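|
| As a rough illustration of what that buys you (not any
| particular GPU API - just plain C11 atomics on a pointer you'd
| get from some shared-virtual-memory allocator, e.g. CUDA managed
| memory or OpenCL SVM):
|
|     #include <stdatomic.h>
|     #include <stdbool.h>
|     #include <stdint.h>
|
|     /* 'counter' is assumed to live in a coherent shared-virtual-
|      * memory region visible to both CPU and GPU; how you allocate
|      * it depends on the GPU stack you use. */
|     bool try_claim_slot(_Atomic uint32_t *counter,
|                         uint32_t expected, uint32_t desired) {
|         /* Compare-and-swap: succeeds only if nobody - CPU or GPU
|          * side - changed the value since we read 'expected'. */
|         return atomic_compare_exchange_strong(counter, &expected,
|                                               desired);
|     }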
|
| PCIe 4.0 and PCIe 5.0 are adding more and more features, making
| PCIe feel more-and-more like a "shared memory system", akin to
| cache-coherence strategies that multi-CPU / multi-socket CPUs use
| to share RAM with each other. In the long term, I expect future
| PCIe standards to push the interface even further in this "like a
| dual-CPU-socket" memory-sharing paradigm.
|
| This is great because you can have 2-CPUs + 4 GPUs on one system,
| and when GPU#2 writes to Address#0xF1235122, the shared-virtual-
| memory system automatically translates that to its "physical"
| location (wherever it is), and the lower-level protocols pass the
| data to the correct location without any assistance from the
| programmer.
|
| This means that a GPU can do things like perform a linked-list
| traversal (or tree traversal), even if all of the nodes of the
| tree/list are in CPU#1, CPU#2, GPU#4, and GPU#1. The shared-
| virtual-memory paradigm just handwaves the details and lets the
| PCIe 3.0 / 4.0 / 5.0 protocols handle them automatically.
| simne wrote:
| I agree that PCIe is mostly a shared memory system.
|
| But for video cards this sharing is unequal, because their RAM
| sizes can exceed the 32-bit address space, and many still-used
| mainboards have a 32-bit PCIe controller, so all PCIe addresses
| have to fit inside the 4GB address space. You can see this on
| Windows machines where the reported total installed memory is
| not the full amount, but minus approximately 0.5GB, of which
| 256MB is the video RAM access window.
|
| So in most cases the old rule remains in force: the video card
| shares all of its memory through a 256MB window using bank
| switching.
|
| As for the GPU reading main system memory: usually this is of
| little use, because VRAM is orders of magnitude faster, even
| before considering the bus bandwidth used by other devices like
| HDD/SSD.
|
| And in most cases the only use of GPU access to main system
| memory is the traditional reading of textures (for the 3D
| accelerator) from system RAM - for example, 3D software doing
| GPU rendering generally renders out of video RAM only, not
| system RAM.
| roschdal wrote:
| Through the electrical wires in the PCI express port.
| danielmarkbruce wrote:
| I could be misunderstanding the context of the question, but I
| think OP is imagining some sophisticated communication logic
| involved at the chip level. The CPU doesn't know anything much
| about the GPU other than it's there and data can be sent back
| and forth to it. It doesn't know what any of the data means.
|
| I think the logic OP imagines does exist, but it's actually in
| the compiler (eg the cuda compiler), figuring out exactly what
| bytes to send in order to start a program, etc.
| coolspot wrote:
| Not in the compiler but in the GPU driver. A graphics (or
| compute) program just calls the APIs (DirectX/Vulkan/CUDA) of a
| driver, which then knows how to do that at a low level by
| writing to particular regions of the address space mapped to GPU
| registers.
| danielmarkbruce wrote:
| Yes! This is correct. My bad, it's been too long. I guess
| either way the point is that it's done in software, not
| hardware.
| lxgr wrote:
| There's also odd/interesting architectures like one of
| the earlier Raspberry Pis, where the GPU was actually
| running its own operating system that would take care of
| things like shader compilation.
|
| In that case, what's actually being written to
| shared/mapped memory is very high level instructions that
| are then compiled or interpreted on the GPU (which is
| really an entire computer, CPU and all) itself.
| alberth wrote:
| Nit pick...
|
| Technically it's not "through" the electrical wires, it's
| actually through the electrical field created _around_ the
| electrical wires.
|
| Veritasium explains https://youtu.be/bHIhgxav9LY
| tux3 wrote:
| Nitpicking the nitpick: the energy is what's in the fields,
| but the electrical wires aren't just for show, the electrons
| do need to be able to move in the wire for there to be a
| current, and the physical properties of the wire have a big
| impact on the signal.
|
| So things get very complicated and unintuitive, especially at
| high frequencies, but it's okay to say through the wire!
| a9h74j wrote:
| And as you might be alluding, particularly high
| frequencies: in the skin (via skin effect) of the wire!
|
| I'll confess I have never seen a plot of actual rms current
| density vs radius related to skin effect.
| rayiner wrote:
| Typically CPU and GPU communicate over the PCI Express bus. (It's
| not technically a bus but a point to point connection.) From the
| perspective of software running on the CPU, these days, that
| communication is typically in the form of memory-mapped IO. The
| GPU has registers and memory mapped into the CPU address space
| using PCIE. A write to a particular address generates a message
| on the PCIE bus that's received by the GPU and produces a write
| to a GPU register or GPU memory.
|
| The GPU also has access to system memory through the PCIE bus.
| Typically, the CPU will construct buffers in memory with data
| (textures, vertices), commands, and GPU code. It will then store
| the buffer address in a GPU register and ring some sort of
| "doorbell" by writing to another GPU register. The GPU
| (specifically, the GPU command processor) will then read the
| buffers from system memory, and start executing the commands.
| Those commands can include, for example, loading GPU shader
| programs into shader memory and triggering the shader cores to
| execute those programs.
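|
| In (hypothetical) driver pseudo-C, the submission path looks
| roughly like this - the register offsets are made up, since every
| real GPU defines its own:
|
|     #include <stdint.h>
|
|     /* Hypothetical register offsets within the GPU's MMIO BAR. */
|     #define REG_CMDBUF_LO  0x100  /* bus address of command buffer */
|     #define REG_CMDBUF_HI  0x104
|     #define REG_CMDBUF_LEN 0x108
|     #define REG_DOORBELL   0x200  /* write here to "ring the bell" */
|
|     static inline void reg_write(volatile uint32_t *mmio,
|                                  uint32_t off, uint32_t val) {
|         mmio[off / 4] = val;      /* becomes a PCIe memory write   */
|     }
|
|     /* cmd_bus_addr: DMA/bus address of a buffer the CPU filled in
|      * system memory (commands, plus pointers to data and code). */
|     void submit(volatile uint32_t *mmio, uint64_t cmd_bus_addr,
|                 uint32_t cmd_words) {
|         reg_write(mmio, REG_CMDBUF_LO, (uint32_t)cmd_bus_addr);
|         reg_write(mmio, REG_CMDBUF_HI,
|                   (uint32_t)(cmd_bus_addr >> 32));
|         reg_write(mmio, REG_CMDBUF_LEN, cmd_words);
|         /* The GPU's command processor wakes up and starts
|          * fetching the buffer from system memory. */
|         reg_write(mmio, REG_DOORBELL, 1);
|     }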
| Keyframe wrote:
| If OP or anyone else wants to see this firsthand.. well shit, I
| feel old now, but.. try an exercise in assembly programming
| on the Commodore 64. Get the VICE emulator and dig into it for a
| few weeks. It's real easy to get into: CPU (6502 based), video
| chip (VIC II), sound chip (famous SID), ROM chips.. they all live
| in the same address space (yeah, not mentioning pages), the CPU
| has three registers.. it's also real fun to get into, even to
| this day.
| vletal wrote:
| Nice exercise. Similarly, I learned most of what I know about
| basic computer architecture by programming the 8050 in ASM as
| well as C.
|
| And I'm 32. Am I old yet? I'm not right? Right?
| silisili wrote:
| Sorry pal!
|
| I remember playing Halo in my early 20's, and chatting with
| a guy from LA who was 34. Wow, he's so old, why was he
| still playing video games.
|
| Here I sit in my late 30's...still playing games when I
| have time, denying that I'm old, despite the noises I make
| getting up and random aches and pains.
| Keyframe wrote:
| 40s are new thirties, my friend. Also, painkillers help.
| jeroenhd wrote:
| There's a nice guide by Ben Eater on Youtube about building a
| breadboard computer: https://www.youtube.com/playlist?list=P
| LowKtXNTBypFbtuVMUVXN...
|
| It doesn't sport any modern features like DMA, but builds up
| from the core basics: a 6502 chip, a clock, and a blinking
| LED, all hooked up on a breadboard. He also built a basic VGA
| card and explains protocols like PS/2, USB, and SPI. It's a
| great introduction or refresher into the low level hardware
| concepts behind computers. You can even buy kits to play
| along at home!
| zokier wrote:
| Is my understanding correct that compared to those historical
| architectures, modern GPUs are a lot more asynchronous?
|
| What I mean is that these days you'd issue a data transfer or
| program execution on the GPU, they will complete at its own
| pace and the CPU in the meanwhile continues executing other
| code; in contrast in those 8 bitters you'd poke a video
| register or whatev and expect that to have more immediate
| effect allowing those famous race the beam effects etc?
| Keyframe wrote:
| There were interrupts telling you when certain things
| happened. If anything, it was asynchronous. The big thing is
| also that you had to tally the cost of what you were doing.
| There was a budget of how many cycles you got per line and per
| screen, and you had to fit whatever you were doing into that.
| When playing sound it was common to change a color when you fed
| the music into the SID, so you could tell, like a crude debug/ad
| hoc printf, how many cycles your music routines ate.
| divbzero wrote:
| Going one deeper, how does the communication work on a physical
| level? I'm guessing the wires of the PCI Express bus passively
| propagate the voltage and the CPU and GPU do "something" with
| that voltage?
| throw82473751 wrote:
| Voltages, yes.. usually it's all binary digital signals,
| running serial/parallel and following some communication
| protocol. Maybe you should have a look at something really
| simple/old like UART communication to get some idea how this
| works, and then study how this is scaled up over PCIe to
| understand the chat between CPU/GPU?
|
| Or maybe not - one does not need all the details, often just the
| scaled-up concepts :)
|
| https://en.m.wikipedia.org/wiki/Universal_asynchronous_recei.
| ..
|
| Edit: Wait, is it really already QAM over PCIe? Yeah, then UART
| is a gross simplification, but maybe still a good one to start
| with, depending on knowledge level?
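|
| For example, transmitting one byte on a classic 16550-style UART
| is just a couple of memory accesses: poll the line status
| register until the transmit holding register is empty, then
| write the data register. A sketch (the base address here is
| hypothetical):
|
|     #include <stdint.h>
|
|     #define UART_BASE 0x10000000u  /* hypothetical MMIO base      */
|     #define UART_THR  0x00         /* transmit holding register   */
|     #define UART_LSR  0x05         /* line status register        */
|     #define LSR_THRE  0x20         /* "THR empty" bit             */
|
|     static volatile uint8_t *const uart =
|         (volatile uint8_t *)UART_BASE;
|
|     void uart_putc(uint8_t c) {
|         while ((uart[UART_LSR] & LSR_THRE) == 0)
|             ;                      /* wait until ready to send    */
|         uart[UART_THR] = c;        /* byte is shifted out serially */
|     }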
| _3u10 wrote:
| https://pcisig.com/sites/default/files/files/PCI_Express_El
| e... It doesn't say QAM explicitly but it has all the QAM
| terminology like 128 codes. Inter symbol interference etc.
| I'm not an RF guy by any stretch but it sounds like QAM to
| me.
|
| This is an old spec. I think it's like equivalent to
| QAM-512 for PCIe 6
| rayiner wrote:
| PCI-E isn't QAM. It's NRZ over a differential link, with
| 128b/130b encoding, and then scrambled to reduce long runs of
| 0s or 1s.
| wyldfire wrote:
| It might be easier to start with older or simpler/slower
| buses. ISA, SPI, I2C. In some ways ISA is very different -
| latching multiple parallel channels together instead of
| ganging independent serial lanes. But it makes sense to start
| off simple and consider the evolution. Modern PCIe layers
| several awesome technologies together, especially FEC.
| Originally they used 8b/10b but I see now they're using
| 242B/256B (FLIT-based encoding).
| rayiner wrote:
| Before you get that deep, you need to step back for a bit.
| The CPU is itself several different processors and
| controllers. Look at a modern Intel CPU:
| https://www.anandtech.com/show/3922/intels-sandy-bridge-
| arch.... The individual x86 cores are connected via a ring
| bus to a system agent. The ring bus is a kind of parallel
| bus. In general, a parallel bus works by having every device
| on the bus operating on a clock. At each clock tick (or after
| some number of clock ticks), data can be transferred by
| pulling address lines high or low to signify an address, and
| pulling data lines high or low to signify the data value to
| be written to that address.
|
| The system agent then receives the memory operation and looks
| at the system address map. If the target address is PCI-E
| memory, it generates a PCI-E transaction using its built-in
| PCI-E controller. The PCI-E bus is actually a multi-lane
| serial bus. Each lane is a pair of wires using differential
| signaling
| (https://en.wikipedia.org/wiki/Differential_signalling). Bits
| are sent on each lane according to a clock by manipulating
| the voltages on the differential pairs. The voltage swings
| don't correspond directly to 0s and 1s. Because of the data
| rates involved and the potential for interference, cross-
| talk, etc., an extremely complex mechanism is used to turn
| bits into voltage swings on the differential pairs: https://p
| cisig.com/sites/default/files/files/PCI_Express_Ele...
|
| From the perspective of software, however, it's just bits
| sent over a wire. The bits encode a PCI-E message packet:
| https://www.semisaga.com/2019/07/pcie-tlp-header-packet-
| form.... The packet has headers, address information, and
| data information. But basically the packet can encode
| transactions such as a memory write or read or register write
| or read.
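|
| A heavily simplified view of what one of those memory-write
| packets carries - the real header is bit-packed exactly as the
| spec defines it, this struct just names the main fields:
|
|     #include <stdint.h>
|
|     /* Simplified sketch of a PCI-E Memory Write TLP. Field widths
|      * and ordering are illustrative, not the on-the-wire layout. */
|     struct tlp_mem_write {
|         uint8_t  fmt_type;     /* format/type: memory write        */
|         uint16_t length;       /* payload length in 32-bit dwords  */
|         uint16_t requester_id; /* bus/device/function of sender    */
|         uint8_t  tag;          /* matches completions to requests  */
|         uint8_t  byte_enables; /* valid bytes in first/last dword  */
|         uint64_t address;      /* target memory address            */
|         /* ...followed by the data payload and an optional CRC.    */
|     };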
| tenebrisalietum wrote:
| Older CPUs - the CPU had a bunch of A pins (address), a bunch
| of D pins (data).
|
| The A pins would be a binary representation of an address,
| and the D pins would be the binary representation of data.
|
| A couple of other pins would select behavior (read or write)
| and allow handshaking.
|
| Those pins were connected to everything else that needed to
| talk with the CPU on a physical level, such as RAM, I/O
| devices, and connectors for expansion. Think 10-base-T
| networking where multiple nodes are physically modulating one
| common wire on an electrical level. Same concept, but you
| have many more wires (and they're way shorter).
|
| Arbitration logic was needed so things didn't step on each
| other. Sometimes things did anyway and you couldn't talk to
| certain devices in certain ways or your system would lock up
| or misbehave.
|
| Were there "switches" to isolate and select among various
| banks of components? Sure, they are known as "gate arrays" -
| those could be ASICs or implemented with simple 74xxx ICs.
|
| Things like NuBus and PCI came about - the bus controller is
| directly connected and addressable to the CPU as a device,
| but everything else is connected to the bus controller, so
| now the new-style bus isn't tied to the CPU and can operate
| at a different speed and CPU and bus speed are now decoupled.
| (This was done on video controllers in the old 8-bit days as
| well - to get to video RAM you had to talk to the video chip,
| and couldn't talk to video RAM directly on some 8-bit
| systems).
|
| PCIE is no longer a bus, it's more like switched Ethernet -
| there's packets and switching and data goes over what's
| basically one wire - this ends up being faster and more
| reliable if you use advanced modulation schemes than keeping
| multiple wires in sync at high speeds. The controllers facing
| the CPU still implement the same interface, though.
| _3u10 wrote:
| It's signaled similar to QAM. Far more complicated than GPIO
| type stuff. Think FM radio / spread spectrum rather than
| bitbanging / old school serial / parallel ports.
|
| Similar to old school modems if the line is noisy it can drop
| to lower "baud" rates. You can manually try to recover higher
| rates if the noise is gone but it's simpler to just reboot.
| tux3 wrote:
| Oh, that is _several_ levels deeper! PCIe is a big standard
| with several layers of abstraction, and it's far from
| passive.
|
| The different versions of PCIe use a different encoding, so
| it's hard to sum it all up in a couple sentences in terms of
| what the voltage does.
| monkeybutton wrote:
| IMO memory-mapped IO is the coolest thing since sliced bread.
| It's a great example in computing where many different kinds of
| hardware can all be brought together under a relatively simple
| abstraction.
| the__alchemist wrote:
| It was a glorious "click" when learning embedded programming.
| Even when writing Rust in typical desktop uses, it all
| feels... abstract. Computer program logic. Where does the
| magic happen? Where do you go from abstract logic to making
| things happen? The answer is in volatile memory reads and
| writes to memory-mapped IO. You write a word to a memory
| address, and a voltage changes. Etc.
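|
| The canonical embedded example of that "click" (the addresses
| here are made up - on a real MCU they come from the datasheet):
|
|     #include <stdint.h>
|
|     /* Hypothetical GPIO peripheral mapped at this address. */
|     #define GPIO_BASE 0x40020000u
|     #define GPIO_ODR  (*(volatile uint32_t *)(GPIO_BASE + 0x14))
|
|     void led_on(void) {
|         /* This ordinary-looking store drives a real pin high. */
|         GPIO_ODR |= (1u << 5);
|     }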
| justsomehnguy wrote:
| TL;DR: bi-directional memory access with some means to notify the
| other part about "something has changed".
|
| It's not that different from any other PCI/E device, be it a
| network card or a disk/HBA/RAID controller.
|
| If you want to understand how it came to this - look at the
| history of ISA, PCI/PCI-X, a short stint for AGP and finally
| PCI-E.
|
| Other comments provide a good ELI15 for the topic.
|
| A minor note about "bus" - for PCEe it is mostly a historic term,
| because it's a serial, P2P connection, though the process of
| enumerating and qurying the devices is still very akin to what
| you would do on some bus-based system, e.g.: SAS is a serial
| "bus", compared to SCSI, but still you operate with it as some
| "logical" bus, because it is easier for humans to grok it this
| way.
| dyingkneepad wrote:
| On my system, the CPU sees the GPU as a PCI device. The "PCI
| config space" [0] is a standard thing and so the CPU can read it
| and figure out its device ID, vendor ID, revision, class, etc.
| From that, the OS looks at its PCI drivers and tries to find
| which one claims to drive that specific PCI device_id/vendor_id
| combination (or class in case there's some kind of generic
| universal driver for a certain class).
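|
| You can see that enumeration data from user space: the kernel
| exposes the config space it read as a file, and the vendor and
| device IDs sit at fixed offsets (0x00 and 0x02). A sketch (the
| device path is just an example):
|
|     #include <stdint.h>
|     #include <stdio.h>
|
|     int main(void) {
|         FILE *f =
|             fopen("/sys/bus/pci/devices/0000:01:00.0/config", "rb");
|         if (!f) { perror("fopen"); return 1; }
|
|         uint8_t cfg[4];
|         if (fread(cfg, 1, sizeof cfg, f) != sizeof cfg) {
|             perror("fread"); fclose(f); return 1;
|         }
|
|         /* PCI config space: vendor ID at 0x00, device ID at 0x02. */
|         uint16_t vendor = (uint16_t)(cfg[0] | (cfg[1] << 8));
|         uint16_t device = (uint16_t)(cfg[2] | (cfg[3] << 8));
|         printf("vendor 0x%04x, device 0x%04x\n", vendor, device);
|
|         fclose(f);
|         return 0;
|     }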
|
| From there, the driver pretty much knows what to do. But
| primarily the driver will map the registers to memory addresses,
| so accessing offset 0xF0 from that map is equivalent to accessing
| register 0xF0. The definition of what each register does is
| something that the HW developers provide to the SW developers
| [1].
|
| Setting modes (screen resolution) and a lot of other stuff is
| done directly by reading and writing to these registers. At some
| point they also have to talk about memory (and virtual addresses)
| and there's quite a complicated dance to map GPU virtual memory
| to CPU virtual memory. On discrete GPUs the data is actually
| "sent" to the memory somehow through the PCI bus (I suppose the
| GPU can read directly from the memory without going through the
| CPU?), but in the driver this is usually abstracted to "this is
| another memory map". On integrated systems both the CPU and GPU
| read directly from the system memory, but they may not share all
| caches, so extra care is required here. In fact, caches may also
| mess up the communication on discrete graphics, so extra care is
| always required. The work in this paragraph is mostly done by the
| kernel driver in Linux.
|
| At some point the CPU will tell the GPU that a certain region of
| memory is the framebuffer to be displayed. And then the CPU will
| formulate binary programs that are written in the GPU's machine
| code, and the CPU will submit those programs (batches) and the
| GPU will execute them. These programs are generally in the form
| of "I'm using textures from these addresses, this memory holds
| the fragment shader, this other holds the geometry shader, the
| configuration of threading and execution units is described in
| this structure as you specified, SSBO index 0 is at this address,
| now go and run everything". After everything is done the CPU may
| even get an interrupt from the GPU saying things are done, so
| they can notify user space. This paragraph describes mostly the
| work done by the user space driver (in Linux, this is Mesa),
| which implements OpenGL/Vulkan/etc abstractions.
|
| [0]: https://en.wikipedia.org/wiki/PCI_configuration_space [1]:
| https://01.org/linuxgraphics/documentation/hardware-specific...
| derekzhouzhen wrote:
| Others have mentioned MMIO. MMIO comes in several kinds:
|
| 1. CPU accessing GPU hw with uncache-able MMIO, such as lower
| level register access
|
| 2. GPU accessing CPU memory with cache-able MMIO, or DMA, such
| as command and data streams
|
| 3. CPU accessing GPU memory with cache-able MMIO, such as
| textures
|
| They all happen on the bus with different latency and bandwidth.
| ar_te wrote:
| And if you're looking for some strange architecture forgotten
| by time :). https://www.copetti.org/writings/consoles/sega-saturn/
| throwra620 wrote:
| brooksbp wrote:
| Woah there, my dude. Let's try to understand a simple model
| first.
|
| A CPU can access memory. When a CPU performs loads & stores it
| initiates transactions containing the address of the memory.
| Therefore, it is a bus master--it initiates transactions. A slave
| accepts transactions and services them. The interconnect routes
| those transactions to the appropriate hardware, e.g. the DDR
| controller, based on the system address map.
|
| Let's add a CPU, interconnect, and 2GB of DRAM memory:
|
|     +-------+
|     |  CPU  |
|     +---m---+
|         |
|     +---s--------------------+
|     |      Interconnect      |
|     +-------m----------------+
|             |
|        +----s-----------+
|        | DDR controller |
|        +----------------+
|
|     System Address Map:
|       0x8000_0000 - 0x0000_0000   DDR controller
|
| So, a memory access to 0x0004_0000 is going to DRAM memory
| storage.
|
| Let's add a GPU.
|
|     +-------+    +-------+
|     |  CPU  |    |  GPU  |
|     +---m---+    +---s---+
|         |            |
|     +---s------------m-------+
|     |      Interconnect      |
|     +-------m----------------+
|             |
|        +----s-----------+
|        | DDR controller |
|        +----------------+
|
|     System Address Map:
|       0x9000_0000 - 0x8000_0000   GPU
|       0x8000_0000 - 0x0000_0000   DDR controller
|
| Now the CPU can perform loads & stores from/to the GPU. The CPU
| can read/write registers in the GPU. But that's only one-way
| communication. Let's make the GPU a bus master as well:
|
|     +-------+    +-------+
|     |  CPU  |    |  GPU  |
|     +---m---+    +--s-m--+
|         |           | |
|     +---s-----------m-s------+
|     |      Interconnect      |
|     +-------m----------------+
|             |
|        +----s-----------+
|        | DDR controller |
|        +----------------+
|
|     System Address Map:
|       0x9000_0000 - 0x8000_0000   GPU
|       0x8000_0000 - 0x0000_0000   DDR controller
|
| Now, the GPU can not only receive transactions, but it can also
| initiate transactions. Which also means it has access to DRAM
| memory too.
|
| But this is still only one-way communication (CPU->GPU). How can
| the GPU communicate to the CPU? Well, both have access to DRAM
| memory. The CPU can store information in DRAM memory (0x8000_0000
| - 0x0000_0000) and then write to a register in the GPU
| (0x9000_0000 - 0x8000_0000) to inform the GPU that the
| information is ready. The GPU then reads that information from
| DRAM memory. In the other direction, the GPU can store
| information in DRAM memory, and then send an interrupt to the CPU
| to inform the CPU that the information is ready. The CPU then
| reads that information from DRAM memory. An alternative to using
| interrupts is to have the CPU poll. The GPU stores information in
| DRAM memory and then sets some bit in DRAM memory. The CPU polls
| on this bit in DRAM memory, and when it changes, the CPU knows
| that it can read the information in DRAM memory that was
| previously written by the GPU.
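|
| In made-up C, the polling variant of that handshake, using the
| address map above (the offsets inside each region are invented):
|
|     #include <stdint.h>
|
|     #define GPU_START (*(volatile uint32_t *)0x80000000u) /* GPU  */
|     #define DONE_FLAG (*(volatile uint32_t *)0x00100000u) /* DRAM */
|     #define RESULT    (*(volatile uint32_t *)0x00100040u) /* DRAM */
|
|     uint32_t run_job(void) {
|         DONE_FLAG = 0;           /* clear the flag in shared DRAM  */
|         GPU_START = 1;           /* store into the GPU window: go  */
|         while (DONE_FLAG == 0)   /* GPU sets this when finished    */
|             ;
|         return RESULT;           /* read back the GPU's result     */
|     }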
|
| Hope this helps. It's very fun stuff!
| pizza234 wrote:
| You'll find a very good introduction in the comparch book "Write
| Great Code, Volume 1", chapter 12 ("Input and Output"), which
| also explains the history of system buses (therefore, you'll find
| an explanation of how ISA works).
|
| Interestingly, there is a footnote explaining that "Computer
| Architecture: A Quantitative Approach provided a good chapter on
| I/O devices and buses; sadly, as it covered very old peripheral
| devices, the authors dropped the chapter rather than updating it
| in subsequent revisions."
| throwmeariver1 wrote:
| Everyone in tech should read the book "Understanding the Digital
| World" by Brian W. Kernighan.
| arduinomancer wrote:
| Is it very in-depth or more for layman readers?
| throwmeariver1 wrote:
| Most normal people would go red in the face reading it, and
| techies would nod along and sometimes say "uh... so that's
| how it really works". It's in between, but a good primer on
| the essentials.
| dyingkneepad wrote:
| Is this before or after they read Knuth?
| zoenolan wrote:
| Others are not wrong in saying memory-mapped IO. Taking a look
| at the Amiga Hardware Reference Manual [1], a simple example [2],
| or a NES programming guide [3] would be a good way to see this in
| operation.
|
| A more modern CPU/GPU setup is likely to use a ring buffer. The
| buffer will be in CPU memory. That memory is also mapped into the
| GPU address space. The driver on the CPU will write commands into
| the buffer which the GPU will execute. These commands are
| distinct from the shader unit instruction set.
|
| Commands would be things like setting some internal GPU register
| to a value: setting the framebuffer base pointer, setting up the
| output resolution, setting the mouse pointer position,
| referencing a texture from system memory, loading a shader,
| executing a shader, setting a fence value (useful for seeing when
| a resource, texture, or shader is no longer in use).
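|
| A toy version of such a ring (the packet format and the GPU-side
| write-pointer register are invented for illustration):
|
|     #include <stdint.h>
|
|     #define RING_WORDS 1024
|
|     /* The ring lives in CPU memory that is also mapped into the
|      * GPU's address space; 'wptr_reg' is a hypothetical GPU
|      * register the driver bumps so the GPU knows how far it may
|      * read ahead. */
|     struct ring {
|         uint32_t buf[RING_WORDS];
|         uint32_t wptr;                 /* CPU-side write pointer  */
|         volatile uint32_t *wptr_reg;   /* mapped GPU register     */
|     };
|
|     /* Hypothetical packet: [opcode | arg count], then the args. */
|     void ring_emit(struct ring *r, uint32_t opcode,
|                    const uint32_t *args, uint32_t n) {
|         r->buf[r->wptr++ % RING_WORDS] = (opcode << 16) | n;
|         for (uint32_t i = 0; i < n; i++)
|             r->buf[r->wptr++ % RING_WORDS] = args[i];
|         *r->wptr_reg = r->wptr % RING_WORDS; /* tell GPU: more work */
|     }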
|
| Hierarchical DMA buffers are a useful feature of some DMA
| engines. You can think of them as similar to subroutines. The
| command buffer can contain an instruction to switch execution to
| another chunk of memory. This allows the driver to reuse
| operations or expensive-to-generate sequences. OpenGL display
| lists were commonly compiled down to a separate buffer.
|
| [1] https://archive.org/details/amiga-hardware-reference-
| manual-...
|
| [2] https://www.reaktor.com/blog/crash-course-to-amiga-
| assembly-...
|
| [3] https://www.nesdev.org/wiki/Programming_guide
| chubot wrote:
| BTW I believe memory maps are set up by the ioctl() system call
| on Unix (including OS X), which is kind of a "catch all" hole
| poked through the kernel. Not sure about Windows.
|
| I didn't understand that for a long time ...
|
| I would like to see a "hello world GPU" example. I think you
| open() the device and then ioctl() it ... But what happens when
| things go wrong?
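|
| The closest thing to a "hello world GPU" I know of is opening a
| DRM node and asking the kernel driver to identify itself - a
| sketch using DRM_IOCTL_VERSION from the kernel's DRM uapi (the
| header location can vary by distro; error handling kept minimal):
|
|     #include <fcntl.h>
|     #include <stdio.h>
|     #include <string.h>
|     #include <sys/ioctl.h>
|     #include <unistd.h>
|     #include <drm/drm.h>   /* DRM_IOCTL_VERSION, struct drm_version */
|
|     int main(void) {
|         int fd = open("/dev/dri/card0", O_RDWR);
|         if (fd < 0) { perror("open"); return 1; }
|
|         char name[64] = {0};
|         struct drm_version v;
|         memset(&v, 0, sizeof v);
|         v.name = name;
|         v.name_len = sizeof name - 1;
|
|         /* Ask the kernel driver behind this node who it is. */
|         if (ioctl(fd, DRM_IOCTL_VERSION, &v) == 0)
|             printf("driver: %s %d.%d.%d\n", name, v.version_major,
|                    v.version_minor, v.version_patchlevel);
|         else
|             perror("DRM_IOCTL_VERSION");
|
|         close(fd);
|         return 0;
|     }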
|
| Similar to this "Hello JIT", where it shows you have to call
| mmap() to change permissions on the memory to execute dynamically
| generated code.
|
| https://blog.reverberate.org/2012/12/hello-jit-world-joy-of-...
|
| I guess one problem is that this may be typically done in vendor
| code and they don't necessarily commit to an interface? They make
| you link their huge SDK
___________________________________________________________________
(page generated 2022-03-30 23:01 UTC)