[HN Gopher] Dissecting the Apple M1 GPU, part I
       ___________________________________________________________________
        
       Dissecting the Apple M1 GPU, part I
        
       Author : caution
       Score  : 309 points
       Date   : 2021-01-07 17:01 UTC (5 hours ago)
        
 (HTM) web link (rosenzweig.io)
 (TXT) w3m dump (rosenzweig.io)
        
       | wil421 wrote:
        | Can someone explain why they would buy a Mac and install Linux?
       | 
        | I used to dual boot Windows for school, and when I first
        | switched to Mac I had an old laptop's backup in Boot Camp.
        | Cross-platform software is much more ubiquitous than 5-10
        | years ago. For Linux I always used another box or just ran a
        | VM. Nowadays my laptop can SSH or Remote Desktop into a more
        | powerful machine. I have a custom-built Windows box, a
        | custom-built NAS running FreeNAS (FreeBSD), 2 RPis running
        | Raspbian, and a not-always-on Linux box based on old
        | hardware. There is a machine, big or small, to do things or
        | play with. My VPN allows me to connect from anywhere.
       | 
        | What are you guys doing that you have to install Linux
        | instead of running a VM or remotely connecting to a Linux
        | box? If it's just for the sake of knowledge, I can understand
        | it.
       | 
        | Apple's trackpad experience in macOS is the best on the
        | market, and it always feels very different in Windows and
        | Linux. The XPS, Lenovo and smaller vendors make killer
        | Linux/Windows laptops with many more options than Macs.
        
         | gameswithgo wrote:
          | I would very much like the Mac hardware, but I do not like
          | the Mac operating system, or the trend of making it harder
          | and harder to install apps from outside the store.
        
         | ogre_codes wrote:
         | > Can someone explain why they would buy a Mac and install
          | Linux?
         | 
          | Same reason you run Linux on a Dell or any other laptop:
          | because you prefer Linux. In Apple's case, the M1 is
          | particularly appealing right now.
         | 
          | For me personally, this project is mostly interesting as
          | insurance for if/when Apple ships a version of macOS I
          | don't care for or stops supporting the M1. That's likely...
          | 10 years out, but you never know.
        
         | colonwqbang wrote:
         | Personally I think Apple's new hardware looks good (both inside
         | and outside) but their software doesn't really impress me. I
         | would rather run standard Linux software like I usually do. I
         | certainly won't buy any Apple hardware until that is a viable
          | option. I'm also not interested in running my daily driver
          | OS as a VM guest.
        
         | londons_explore wrote:
         | Right now the draw is the M1 CPU...
         | 
         | If Linux had software support for it, it would probably be the
         | best platform for Linux (server) software development.
        
         | wmf wrote:
         | The result may be the fastest Linux machine available (by some
         | metrics).
        
           | sliken wrote:
            | Single-core performance, maybe. Not much else though. But
            | the Apple machines do quite well on perf/$ and perf/watt.
            | The M1 Mac mini is pretty competitive at $700 for a fast,
            | silent desktop.
        
             | wlesieutre wrote:
             | You mentioned performance/watt - in user terms, battery
             | life is a huge advantage of this hardware. The official
             | specs of the 13" MBP are 17 hours of web browsing, 20 hours
             | of video playback.
             | 
              | I assume it won't be as long under Linux, but it'll
              | still be longer than on a lightweight, high-performance
              | x86 laptop.
        
         | [deleted]
        
         | BluSyn wrote:
          | Apple's hardware and build quality is superior to virtually
          | anything else on the market, even at the same price point.
          | You're talking about stationary remote machines, not a
          | single portable device that can do everything you need.
          | Most people don't have that. Linux isn't just a secondary
          | OS you run in a VM; many want to run it as a primary
          | desktop OS on the best hardware available. A Dell XPS
          | laptop may be more compatible with Linux out of the box,
          | but it falls short on every other metric.
        
           | wil421 wrote:
            | Apple's hardware quality is the way it is because they
            | control both the software and the hardware. Looking at
            | System76, I can see why people would want a Mac. Purism
            | looks about as decent as an XPS.
           | 
           | You also pay a hefty premium for MacOS if you are just
           | running Linux.
        
             | josephg wrote:
             | > You also pay a hefty premium for MacOS if you are just
             | running Linux.
             | 
              | You can't run Linux on the M1 yet, but will this still be
              | true when you can? How does price/performance of the M1 Air
             | compare to x86 laptops? (Or the Mac mini vs other small
             | form factor PCs?)
        
             | gameswithgo wrote:
              | That is not really the case in any meaningful way. I
              | mean, until _just now_ they did not control the CPU or
              | GPU in their laptops and desktops.
        
           | jamespo wrote:
            | Realistically, how many people buy MacBooks to run Linux
            | as a primary OS? And subject themselves to Mac keyboards,
            | to boot.
        
             | ogre_codes wrote:
             | Save the 2015-2019 MBP keyboard debacle, Apple's keyboards
             | are among the best on the market.
             | 
              | As for how many people run Linux on bare metal on a
              | Mac, that's hard to guess. Likely not a ton. But then...
              | not many people run Linux as their primary desktop or
              | laptop OS at all, so that's not surprising.
        
             | selectodude wrote:
             | Linus Torvalds worked on a MacBook Air for years.
        
           | snapcore wrote:
          | Statistics from repair shops suggest otherwise on build
          | quality. For instance:
           | 
           | https://www.appleworld.today/blog/2020/6/1/apples-mac-
           | get-a-...
           | 
           | It varies by year and place though.
        
             | uncledave wrote:
              | Can't take that seriously at all. MS ranks #1, yet I
              | know someone who works at a large customer where the
              | Surface devices last 12-18 months. They had over 500 of
              | them. They don't send them in for repair because the
              | warranty is a year, so after that they are scrap and
              | contribute no statistics to that study. They are
              | replacing them with Lenovo units with 3-year NBD repair
              | and support, and they are cheaper.
              | 
              | As for Apple, you can buy a 3-year warranty with
              | AppleCare for less than a Surface costs, and that
              | includes accidental damage for a small excess fee per
              | incident.
        
         | ur-whale wrote:
         | > Can someone explain why they would buy a Mac and install
          | Linux?
         | 
         | Because OSX, besides being a user prison, is certainly far from
         | being as good as their hardware.
        
           | baggy_trough wrote:
            | Quite some prison that allows you to install Linux and
            | whatever other software you'd like.
        
           | machello13 wrote:
           | Still leagues better than Linux.
        
             | uncledave wrote:
             | I have to agree. I've tried to be a desktop Linux user for
             | 20 years unsuccessfully. It only ever gets to 80% done and
             | then they go and break everything. At least Apple get to
             | 95% :)
        
         | remexre wrote:
         | For "why not macOS on M1," I'm one of those people who
         | customizes their WM to reduce number of keystrokes / mouse
         | usage to a minimum, and that's only really well-served by the
         | free operating systems' WMs.
         | 
         | For "why not Linux on not-M1," I vastly prefer reading and
         | writing AArch64 assembly to amd64... And the battery-
         | life+performance claims of M1 are certainly extremely exciting,
         | coming from a Pinebook Pro (which has the battery life, but
         | pretty wimpy performance by comparison).
        
       | osamagirl69 wrote:
       | Exciting! I can't believe the incredible rate at which progress
       | is being made on this front. And by a sophomore(?!) undergraduate
       | student no less!
        
         | pantalaimon wrote:
          | She also wrote Panfrost while still in high school.
        
           | azinman2 wrote:
           | Is there an interview / write up with her somewhere? That's
           | incredible for most people let alone a high schooler.
        
             | l1k wrote:
             | XDC 2018 - Lyude Paul & Alyssa Rosenzweig - Introducing
             | Panfrost
             | 
             | https://www.youtube.com/watch?v=qtt2Y7XZS3k
        
             | leetreveil wrote:
             | https://blogs.gnome.org/engagement/2019/10/03/alyssa-
             | rosenzw...
        
         | jesse_cureton wrote:
         | Alyssa was one of the primary developers behind the Panfrost
         | open source drivers for a subset of Mali GPUs - she's a
         | brilliant engineer and great to work with.
        
         | ognarb wrote:
         | You would be surprised by the amount of high-schoolers doing
         | incredible work in open source software.
        
           | tomnipotent wrote:
           | Like George Hotz, who at 17 removed the SIM lock on the
           | iPhone and a few years later got sued by Sony for breaking
           | security on the PlayStation 3.
        
             | Medox wrote:
             | Obligatory George Hotz (geohot) PS3 rap from back then:
             | https://www.youtube.com/watch?v=9iUvuaChDEg
             | 
              | The first time he appeared in the PS3 jailbreak scene I
              | was like "wait, is this a Romanian?! Some boy-genius
              | expat?!", because George is a common first name here and
              | (fun fact) Hotz means "thief". Actually it is written
              | "hoț", but Romanian teens sometimes write "tz" instead
              | of "ț". He was meant to be a key thief, I guess.
             | 
             | I still wonder if they got to examine the video in the
             | courtroom...
        
       | londons_explore wrote:
        | Why start with the shader compiler rather than simply
        | whatever commands are necessary to get the screen turned on
        | and a simple memory-mapped framebuffer?
        | 
        | It would seem easier to get Linux booting (by just sending
        | the same commands Apple's software does) before worrying
        | about 3D acceleration and shaders...
        
         | ATsch wrote:
          | Alyssa Rosenzweig specializes in writing GPU
          | compilers/drivers. Other people, like marcan, specialize in
          | the low-level hardware and firmware work required to get
          | the SoC to boot, including the display controller. The DC
          | and GPU are almost entirely unrelated. She's working ahead
          | here so that GPU acceleration, which is predicted to be the
          | single hardest feature to enable, can be developed more
          | quickly once the easier part of booting Linux is done.
        
         | wmf wrote:
         | IIRC the firmware turns the screen on for you so it's already
         | there when Linux boots.
        
           | StillBored wrote:
            | Yes, this is a standard UEFI feature. The firmware
            | basically hands off a structure indicating
            | horizontal/vertical resolution, pixel depth and a buffer.
            | 
            | Which is fine until you need to change the mode, or start
            | doing heavy blit/etc. operations, at which point your
            | machine will feel like something from the early 1990s.
            | 
            | So yes, you can get a full Linux DE running with Mesa and
            | CPU GLES emulation, but it's not really particularly
            | usable.
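            | 
            | Concretely, the handoff boils down to something like this
            | (a minimal C sketch; the field names are illustrative,
            | not the actual UEFI GOP structures):
            | 
            |   #include <stdint.h>
            | 
            |   struct fb_info {
            |       uint32_t width;   /* horizontal resolution, pixels */
            |       uint32_t height;  /* vertical resolution, pixels   */
            |       uint32_t stride;  /* pixels per scanline, >= width */
            |       uint32_t *pixels; /* base of the linear framebuffer */
            |   };
            | 
            |   /* Plot one pixel, assuming a 32-bit XRGB layout. Every
            |      pixel goes through the CPU, which is why mode changes
            |      and heavy blits crawl. */
            |   static void put_pixel(struct fb_info *fb, uint32_t x,
            |                         uint32_t y, uint32_t xrgb)
            |   {
            |       fb->pixels[y * fb->stride + x] = xrgb;
            |   }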
        
         | q3k wrote:
        | I'm not nearly as experienced as the blog post author, but I
        | suspect the M1's GPU does not have any way of blitting to the
        | screen other than driving it 'fully', i.e. using the standard
        | command queue and shader units to get even the simplest 2D
        | output.
        
       | Anka33 wrote:
       | "He a brilliant engineer and great to work with."
       | 
       | Fixed that for you.
        
       | frozenport wrote:
       | >>There are no convoluted optimization tricks, but doing away
       | with the trickery is creating a streamlined, efficient design
       | that does one thing and does it well. Maybe Apple's hardware
       | engineers discovered it's hard to beat simplicity.
       | 
        | What? They shifted the complexity from software to hardware.
        
       | Veedrac wrote:
       | > Yet Metal optimization resources imply 16-bit arithmetic should
       | be significantly faster, in addition to a reduction of register
       | usage leading to higher thread count (occupancy).
       | 
        | I believe this is a difference between the A14 GPU and the M1
        | GPU; the former's 32-bit throughput is half its 16-bit
        | throughput, whereas on the latter they are equal.
        
       | MayeulC wrote:
       | > This suggests the hardware is superscalar, with more 16-bit
       | ALUs than 32-bit ALUs
       | 
       | To me, it sounds like it might mean 32-bit ALUs can be used as
       | two 16-bit ones; that's how I would approach it, unless I'm
       | missing something? The vectorization can also happen at the
       | superscalar level, if borrowing the instruction queue concept
       | from out-of-order designs: buffer operations for a while until
       | you've filled a vector unit's worth, align input data in the
       | pipeline, execute. A smart compiler could rearrange opcodes to
       | avoid dependency issues, and insert "flushes" or filling
       | operations at the right time.
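        | 
        | A minimal C sketch of the "one 32-bit ALU as two 16-bit
        | ALUs" idea (SWAR in software; real hardware would isolate
        | the lane carries inside the ALU itself):
        | 
        |   #include <stdint.h>
        | 
        |   /* Two independent 16-bit adds in a single 32-bit add.
        |      Mask off each lane's top bit so a carry cannot ripple
        |      across lanes, then patch the top bits back in via XOR. */
        |   static uint32_t add16x2(uint32_t a, uint32_t b)
        |   {
        |       uint32_t sum = (a & 0x7FFF7FFFu) + (b & 0x7FFF7FFFu);
        |       return sum ^ ((a ^ b) & 0x80008000u);
        |   }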
        
         | bullen wrote:
          | I recently added half floats to my 3D MMO engine, and I was
          | very disappointed when I discovered that very few GPUs
          | support them in hardware!
          | 
          | Desktop Nvidia simply converts them to 32-bit floats up
          | front, so performance is kept but memory is wasted, while
          | the Raspberry Pi does the 16 -> 32 bit conversion on every
          | calculation, resulting in horrible performance.
          | 
          | I still have to test the engine on the Jetson Nano with
          | half floats, but I'm pretty sure I will be disappointed
          | again, and since the Raspberry Pi doesn't support them I
          | need to backtrack the code anyway!
         | 
          | After some further research I heard Snapdragon has 16-bit
          | support in hardware, and hopefully this is where we are
          | heading! 32-bit is complete overkill for model data; it
          | wastes memory and cycles! 16-bit for model data (vertices,
          | normals, texture coordinates and indices) and 8-bit for
          | bone indices and weights! Back to the 80s-90s!
         | 
         | This is the last memory & performance increase we can grab now
         | at "5"nm without adding too much complexity!
         | 
         | You can try the engine without half float here:
         | http://talk.binarytask.com/task?id=5959519327505901449
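          | 
          | The packing itself is cheap on the CPU side. Here's a
          | minimal C sketch of a float -> half (IEEE binary16)
          | conversion for vertex data; it truncates the mantissa and
          | skips NaN/denormal handling for brevity:
          | 
          |   #include <stdint.h>
          |   #include <string.h>
          | 
          |   static uint16_t float_to_half(float f)
          |   {
          |       uint32_t u;
          |       memcpy(&u, &f, sizeof u);  /* bit-level view */
          |       uint16_t sign = (u >> 16) & 0x8000u;
          |       /* re-bias exponent from 127 (fp32) to 15 (fp16) */
          |       int32_t exp = (int32_t)((u >> 23) & 0xFF) - 127 + 15;
          |       uint32_t mant = (u >> 13) & 0x3FFu; /* top 10 bits */
          | 
          |       if (exp <= 0)  return sign;            /* flush to 0 */
          |       if (exp >= 31) return sign | 0x7C00u;  /* clamp: inf */
          |       return sign | (uint16_t)(exp << 10) | (uint16_t)mant;
          |   }
          | 
          | Even on GPUs that upconvert to 32 bits in the ALU, halving
          | the vertex fetch bandwidth this way can still be a win.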
        
           | kllrnohj wrote:
           | > Desktop Nvidia simply converts them to 32-bit floats up
           | front so the performance is kept but memory is wasted
           | 
            | Pascal has native FP16 operations and can execute two FP16s at
           | once ( https://docs.nvidia.com/cuda/pascal-tuning-
           | guide/index.html#... )
           | 
            | BUT, and this is where things get fucked up, Nvidia then
            | neutered that in the GeForce lineup because of market
            | segmentation. In fact, it's _slower_ than FP32 operations:
           | "GTX 1080's FP16 instruction rate is 1/128th its FP32
           | instruction rate" https://www.anandtech.com/show/10325/the-
           | nvidia-geforce-gtx-...
        
             | AzN1337c0d3r wrote:
             | TU102 and GA102 are back up to double-rate FP16.
        
               | my123 wrote:
               | GA102 is 1:1 FP16/FP32 rate, but with a ton of FP32 (30
               | Tflops for the RTX 3080).
        
           | mechanical_berk wrote:
           | The Raspberry Pi 4 3D hardware has pretty extensive support
          | for 16-bit floats, e.g. all 32-bit FP ops support "free"
          | conversion from/to 16-bit float on input/output, and there
          | are a few 16-bit float ops (e.g. multiply) which work on 16-bit
           | vec2s (effectively giving double the throughput if they can
           | be used). I don't know how well the Mesa driver supports any
           | of that stuff though. Do you know if there was something
           | specific that was slow?
        
       | MangoCoffee wrote:
        | Hopefully Apple's M1 (ARM) and AMD's Zen (x86) can push
        | Intel. Being stuck at 14nm is what happens when there is no
        | viable competitor.
        
         | pengaru wrote:
         | Competition is great but it's not like Intel has been resting
         | on their laurels; they've invested heavily in iterating on
         | process nodes beyond 14nm.
         | 
         | Intel has simply failed.
        
       | ksec wrote:
       | >Some speculate it might descend from PowerVR GPUs, as used in
       | older iPhones, while others believe the GPU to be completely
       | custom. But rumours and speculations are no fun when we can peek
       | under the hood ourselves!
       | 
        | As per IMG's CEO, Apple has never not been an IMG customer
        | (referring to the period between 2015 and 2019).
        | Unfortunately that quote, along with the article, has simply
        | vanished. It was said during an interview with a British
        | newspaper/website, if I remember correctly.
       | 
       |  _" On 2 January 2020, Imagination Technologies announced a new
       | multi-year license agreement with Apple including access to a
       | wider range of Imagination's IP in exchange for license fees.
       | This deal replaced the prior deal signed on 6 February 2014."_
       | [1]
       | 
        | According to Apple's official Metal Feature Set document [2],
        | all Apple A-series SoCs, including the latest A14, support
        | PVRTC, which stands for PowerVR Texture Compression [3].
        | 
        | It could be a custom GPU, but it still has plenty of PowerVR
        | tech in it, just as Apple's custom CPUs are still ARM.
       | 
       | Note: I am still bitter at what Apple has done to IMG / PowerVR.
       | 
       | [1] https://www.imaginationtech.com/news/press-
       | release/imaginati...
       | 
       | [2] https://developer.apple.com/metal/Metal-Feature-Set-
       | Tables.p...
       | 
       | [3] https://en.wikipedia.org/wiki/PVRTC
        
         | remexre wrote:
         | > Note: I am still bitter at what Apple has done to IMG /
         | PowerVR.
         | 
            | I'm unfamiliar with this; are you bitter about a lack of
            | attribution, given that they produced a lot of the IP
            | Apple's GPUs are built on?
        
           | ksec wrote:
            | Apple stated they would go with a custom GPU; depending
            | on how you define "custom", it has plenty of PowerVR tech
            | in it. (IMG even deleted that press release, though you
            | can still find a copy here [1].)
           | 
           |  _" Apple expects that they will no longer be using
           | Imagination's IP for new products in 15 to 24 months."_
           | 
            | But at no point did Apple announce they would stop
            | supporting PVRTC, nor did they issue any deprecation
            | notice.
           | 
            | That announcement caused IMG's stock price to fall by
            | nearly 80%. IMG was later sold to a Chinese VC (which
            | somehow had a tiny connection to Apple, but that is
            | conspiracy territory so I won't go into it).
           | 
            | And if you look back now, Apple were either not telling
            | the truth or lying by omission.
            | 
            | That is what got me to scrutinise every detail when they
            | had their battle with Qualcomm, and the picture was very
            | different from what the mainstream media tells you.
           | 
           | [1] https://www.displaydaily.com/article/press-
           | releases/discussi...
        
           | masklinn wrote:
              | They're probably talking about Apple announcing out of
              | the blue in 2017 that they'd be dropping Imagination.
        
             | raverbashing wrote:
                | But in the end it seems they didn't drop them? Or are
                | they just licensing basic IP and building on top of
                | it?
             | 
             | I remember the IMG stuff being full of bugs
        
               | masklinn wrote:
                | It's really not clear. A licensing agreement was
                | signed between the two in early 2020, but the interim
                | hurt IMG a lot: they lost a number of engineers, had
                | to sell off their MIPS business, and the board
                | ultimately put the company up for sale.
        
       ___________________________________________________________________
       (page generated 2021-01-07 23:00 UTC)