[HN Gopher] Snapdragon 8 Gen 1's iGPU: Adreno Gets Big
___________________________________________________________________
Snapdragon 8 Gen 1's iGPU: Adreno Gets Big
Author : rbanffy
Score : 81 points
Date : 2024-03-07 12:00 UTC (11 hours ago)
(HTM) web link (chipsandcheese.com)
(TXT) w3m dump (chipsandcheese.com)
| xlazom00 wrote:
| Does anyone know if there is any processor (in a phone) with
| SVE/SVE2? SVE/SVE2 SHOULD be in all new ARMv9.0-A+ CPU cores.
|
| I only had a chance to test Qualcomm Snapdragon cores (Samsung
| Galaxy S23).
| fotad wrote:
| Snapdragon 888 was probably the first one to have SVE. The
| Snapdragon 7/8 gen chips all have the Qualcomm(r) Type-1
| Hypervisor, so what's your test? Are you able to run any Linux
| on it?
| xlazom00 wrote:
| I am using the default Android on a Samsung Galaxy S23 with
| UserLAnd
| https://play.google.com/store/apps/details?id=tech.ula So any
| advice on how to run it?
| monocasa wrote:
| Looking at the docs, it doesn't look like ARMv9-A strictly
| requires FEAT_SVE to be implemented.
|
| What that means in practice is anyone's guess.
| my123 wrote:
| Qualcomm disables SVE (masked at the hypervisor level) on all
| their silicon. If you want SVE on a phone today, your options
| are Tensor G3, Exynos 2200/2400, or MediaTek phones with ARMv9
| CPUs.
|
| Or if you have code execution at the hypervisor exception level
| (including on an unfused phone), you can patch up that
| limitation.
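|
| (A quick way to see what actually gets exposed to userspace,
| assuming an aarch64 Linux userland on the phone, is to read the
| kernel's hwcaps; a minimal sketch, nothing Qualcomm-specific:)
|
|     /* Probe for SVE/SVE2 via aarch64 Linux hwcaps; build and run
|      * in an arm64 Linux environment (e.g. Termux or a chroot). */
|     #include <stdio.h>
|     #include <sys/auxv.h>
|     #include <asm/hwcap.h>
|
|     int main(void) {
|         unsigned long hwcap  = getauxval(AT_HWCAP);
|         unsigned long hwcap2 = getauxval(AT_HWCAP2);
|         printf("SVE:  %s\n", (hwcap  & HWCAP_SVE)   ? "yes" : "no");
|         printf("SVE2: %s\n", (hwcap2 & HWCAP2_SVE2) ? "yes" : "no");
|         return 0;
|     }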
| xlazom00 wrote:
| So I have this:
| https://en.wikipedia.org/wiki/Windows_Dev_Kit_2023
|
| And we can run Ubuntu on that:
| https://github.com/armbian/build
|
| There isn't any hypervisor running on that, and still no SVE.
| So any advice?
| rayiner wrote:
| Snapdragon 800 -> Snapdragon 8 Gen N is terrible marketing. It
| makes everything sound like a point release.
| causi wrote:
| My headcanon is that the Snapdragon 875 would've been a great
| chip, unlike the infamously overheating 888.
| nfriedly wrote:
| I'm on a Snapdragon 870 (moto g100), and it's great.
| Performance and battery life are both good enough that I
| don't think about it very often, and I've never had any
| overheating issues.
|
| I think the 870 was the highest end Snapdragon that didn't
| have any overheating issues for a while.
| joecool1029 wrote:
| The 5nm Samsung process was just bad. It needs active cooling.
|
| I've had the sm8250 (865 in my case) and it's a great chip
| that's faster in real-world conditions than the 888 hand
| warmer was. Now I have an 8 gen 2 device and it has no
| overheating issues. The 8 gen 1+ doesn't appear to have any
| either. I avoided the 8 gen 1 after having the bad
| experience with the 888.
| nfriedly wrote:
| Aah, ok, I edited my comment to just say that it was the
| best for a while, not still the best. Agree with you
| about Samsung's 5nm process.
| rbanffy wrote:
| I believe their biggest branding failure is not putting out
| Raspberry Pi-like small boards for people to experiment with.
| Anyone with a slight interest in computers knows an M3 is a
| beast and that an i3 is meh. Almost nobody outside Qualcomm
| knows why an 8 Gen N is better than an 800 or what the
| difference would be.
| RuggedPineapple wrote:
| They do in fact put those out. That said, the pricing is not at
| all Pi-like. The Snapdragon 8 Gen 2 board goes for north of a
| thousand dollars.
|
| https://www.lantronix.com/products/snapdragon-8-gen-2-mobile...
| jsheard wrote:
| That's only Pi-like in the sense of being an SBC. Unlike the
| Pi, it's meant to be an evaluation/reference platform for
| integrators who want to build their own board around the SoC,
| not something you would buy to use for its own sake.
| rbanffy wrote:
| And this is where they fail - the people who get their boards
| have almost zero interest in upstreaming whatever hardware
| enablement they do to make the boards sort of work for their
| specific use cases.
|
| I really don't think undercutting their evaluation board
| business with affordable SBCs would hurt their bottom
| line.
| jsheard wrote:
| The people who buy those eval boards probably aren't even
| _allowed_ to upstream whatever they do with them; they'll have
| signed an NDA with Qualcomm to get access to the documentation.
| GuB-42 wrote:
| But maybe they could make it more Pi-like, that is, pricing it
| so that it can interest both integrators and hobbyists, with a
| bootloader that makes it easy to tinker with the OS (not
| necessarily the firmware).
| rbanffy wrote:
| I am sure that if the RPi Foundation can do it, Qualcomm
| can as well.
| jsheard wrote:
| Broadcom is the counterpart to Qualcomm in that comparison, and
| the two have similar attitudes towards the hobbyist/enthusiast
| market - they don't care in the slightest. It took an entity
| outside of Broadcom which nonetheless had deep connections to
| them (Eben Upton was there prior to starting RPi) to broker a
| compromise where the Pi could happen, and even then Broadcom
| kept most of the documentation tied up in NDAs and the bare
| SoCs unavailable for sale to the general public.
|
| The Raspberry Pi is an anomaly that's unlikely to be
| replicated.
| monocasa wrote:
| There are economic reasons behind the cheap SoCs you see.
| Invariably it's when an SoC was made in bulk expecting some
| market that never materialized.
|
| For instance, the original Pi SoC was very clearly intended for
| some set-top box OEM that didn't pick it up for some reason.
|
| When that happens, after a while the chip manufacturer is
| willing to sell them for way below cost just to get anything
| back from inventory that, from their perspective, looks like a
| complete loss.
|
| So you get an industrious cottage industry that takes those
| too-cheap-to-make-sense chips, cost-reduces the reference
| design, and ships them with the just-barely-building SDK from
| the chip manufacturer.
|
| At the end of the day Qualcomm doesn't care about this market,
| because they are pretty good about keeping their customers on
| the hook for their bulk orders. So they're focused on
| supporting current bulk customers, for whom a $1k dev board is
| actually a really reasonable price.
| jsight wrote:
| That's a good point. Perhaps even something like the NUC
| form-factor would work well.
| MikusR wrote:
| The SoC alone is 160+ dollars.
| Dalewyn wrote:
| The Intel Core i3 corresponds to the Apple M3, the i5 to the M3
| Pro, the i7 to the M3 Max, and the i9 to the M3 Ultra.
|
| If you think an i3 is "meh", you know nothing about
| computers. For the vast majority of users including gamers,
| an i3 is overkill.
| bartekpacia wrote:
| I remember having Snapdragon 800 in my LG G2. It was a beast at
| the time!
| rbanffy wrote:
| Most people switch phones on a cadence that makes every new
| one feel like a beast (my personal phone is a 2nd gen iPhone
| SE and my new work phone is a 13, which feels like a beast
| next to the other).
|
| It's also incredibly rare to find someone who knows what chip
| is in their phone. I have the vague notion my personal phone
| has an A15 or something like that.
| umanwizard wrote:
| The title should say 8+, not 8 (@dang)
| nfriedly wrote:
| You're correct. But I believe they have an identical design -
| the only difference is that the 8+ runs a little faster and
| more efficiently, because it was manufactured by TSMC, whereas
| the 8 was manufactured by Samsung.
| clamchowder wrote:
| The 8+ runs at slightly faster clocks and has a smaller die.
| Somehow TSMC's 4 nm is better than Samsung's 4 nm.
|
| But yeah everything I wrote there should be applicable to the
| Snapdragon 8 Gen 1 as well, just without the 10% GPU clock
| speed increase that Qualcomm says the 8+ gets.
| cubefox wrote:
| Any explanation why Qualcomm uses "tile based rendering" while
| Nvidia and AMD don't?
| charcircuit wrote:
| Both Nvidia and AMD do use tile-based rendering.
| bpye wrote:
| And for what it's worth, Apple and Imagination do as well -
| Imagination being where tiled rendering really started.
| kllrnohj wrote:
| Not quite, not in the same way mobile GPUs do. Also I don't
| think AMD ever did the transition. They announced it (draw
| stream binning), but don't seem to have ever shipped it?
| kllrnohj wrote:
| It's significantly more bandwidth-efficient to do tile-based
| rendering; however, it has more performance cliffs and requires
| more care from game developers to avoid hitting issues. For
| mobile SoCs (PowerVR, Adreno, Mali, and whatever Apple calls
| their PowerVR derivative) you can't just throw GDDR at the
| problem and get gobs of bandwidth, and bandwidth is also
| power-expensive, so the savings more than offset the
| performance cliffs and developer complications.
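|
| Roughly, the flow looks like this (a toy software sketch of a
| tiler's render loop with made-up sizes; real hardware also bins
| geometry up front and keeps depth/blend state in tile memory):
|
|     /* Toy tile-based renderer: shade each tile in a small
|      * "on-chip" buffer, then write the finished tile to DRAM
|      * exactly once. Sizes and the bbox-as-primitive stand-in
|      * are illustrative only. */
|     #include <stdint.h>
|     #include <stdio.h>
|     #include <string.h>
|
|     #define W 256
|     #define H 256
|     #define TILE 32                 /* fits in on-chip tile memory */
|
|     typedef struct { int x0, y0, x1, y1; uint32_t color; } Quad;
|
|     static uint32_t dram_fb[W * H]; /* stand-in for DRAM framebuffer */
|
|     int main(void) {
|         Quad q[] = { {10, 10, 100, 80, 0xff0000ffu},
|                      {60, 40, 200, 220, 0xff00ff00u} };
|         int n = 2;
|
|         for (int ty = 0; ty < H; ty += TILE)
|             for (int tx = 0; tx < W; tx += TILE) {
|                 uint32_t tile[TILE * TILE];   /* "on-chip" memory */
|                 memset(tile, 0, sizeof tile); /* clear: no DRAM traffic */
|
|                 /* Only primitives overlapping this tile are replayed. */
|                 for (int i = 0; i < n; i++) {
|                     if (q[i].x1 < tx || q[i].x0 >= tx + TILE ||
|                         q[i].y1 < ty || q[i].y0 >= ty + TILE)
|                         continue;
|                     for (int y = ty; y < ty + TILE; y++)
|                         for (int x = tx; x < tx + TILE; x++)
|                             if (x >= q[i].x0 && x <= q[i].x1 &&
|                                 y >= q[i].y0 && y <= q[i].y1)
|                                 tile[(y - ty) * TILE + (x - tx)] = q[i].color;
|                 }
|
|                 /* Resolve: one sequential write of the tile to DRAM. */
|                 for (int y = 0; y < TILE; y++)
|                     memcpy(&dram_fb[(ty + y) * W + tx], &tile[y * TILE],
|                            TILE * sizeof(uint32_t));
|             }
|
|         printf("pixel(128,128) = 0x%08x\n", dram_fb[128 * W + 128]);
|         return 0;
|     }
|
| All the read-modify-write traffic for blending and depth stays in
| the tile buffer; DRAM only sees one write per pixel per frame,
| which is where the bandwidth (and power) saving comes from.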
| MikusR wrote:
| Nvidia has been using it for 10 years.
| https://www.realworldtech.com/tile-based-rasterization-nvidi...
| kbolino wrote:
| If you can afford to render the entire screen, why would you
| tile? Tiling complicates certain kinds of shaders, tiles have
| to be stitched together to make the final image, and tiling
| redraws (different parts of) the same triangles multiple times.
|
| On a dedicated GPU with lots of memory bandwidth, there's
| probably no benefit (and maybe even some penalty) to using
| tiling at lower resolutions (e.g. 1080p). However, 4K rendering
| might benefit from it and 8K rendering probably requires it.
| kllrnohj wrote:
| The output resolution doesn't matter. The reason you tile is
| to improve locality and thus have more cache hits. For mobile
| GPUs, the cache is literally a specific tile buffer, but for
| nvidia (who also do this) the cache is just L2. But by tiling
| the geometry, they stay in L2 more and hit DRAM less often.
| This is a performance _and_ power win. The actual resolution is
| irrelevant as the tiles are very small, even on nvidia GPUs.
| 16x16 and 32x32 are common tile sizes; see
| https://www.techpowerup.com/231129/on-nvidias-tile-based-ren...
| and https://www.youtube.com/watch?v=Nc6R1hwXhL8 where this was
| reverse engineered before nvidia actually talked about it.
|
| You might be confusing tiled rendering with various upscaling
| technologies or variable rate shading?
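|
| To put rough numbers on "the tiles are very small" (illustrative
| RGBA8 figures, not vendor specs):
|
|     /* Per-tile working set at 4 bytes/pixel vs. whole framebuffer:
|      * the hot footprint is tiny and independent of resolution. */
|     #include <stdio.h>
|
|     int main(void) {
|         const int bpp = 4;                   /* RGBA8 */
|         const int tile_edges[] = {16, 32};   /* common tile sizes */
|         const int res[][2] = {{1920, 1080}, {3840, 2160}, {7680, 4320}};
|
|         for (int i = 0; i < 2; i++)
|             printf("%2dx%-2d tile: %6d bytes of hot pixel data\n",
|                    tile_edges[i], tile_edges[i],
|                    tile_edges[i] * tile_edges[i] * bpp);
|
|         for (int i = 0; i < 3; i++)
|             printf("%4dx%-4d framebuffer: %9d bytes\n",
|                    res[i][0], res[i][1], res[i][0] * res[i][1] * bpp);
|         return 0;
|     }
|
| A 32x32 RGBA8 tile is 4 KB whether the output is 1080p or 8K; it
| is the full framebuffer that scales with resolution, and that is
| what tiling avoids streaming through the caches repeatedly.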
| kllrnohj wrote:
| For the 'CPU to GPU Copy Bandwidth' section, the more likely
| reason the GPU -> CPU copy is slow is that there's no reason to
| do it. Adreno is unified memory; you can just mmap that GPU
| buffer on the CPU. This is done on Android via the "gralloc"
| HAL, also called (A)HardwareBuffer.
|
| CPU->GPU is still valuable in that it's where texture swizzling
| happens to optimize the data for non-linear access, and vendors
| are all cagey about documenting these formats. But I don't
| think there's a copy engine for it at all; I think it's just
| CPU code. If you run a Perfetto trace you can see Adreno
| actually using multiple threads for this, which is likely why
| CPU->GPU is so much faster than the reverse. But you almost
| never need non-linear output, so since vendor-specific
| swizzling isn't helpful you just don't bother and use shared
| memory between the two.
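|
| As a rough sketch of that zero-copy path using the NDK's
| AHardwareBuffer API (format, size, and usage flags here are made
| up for illustration):
|
|     /* Allocate a gralloc-backed buffer the GPU can sample and the
|      * CPU can map directly; no GPU->CPU copy is ever issued. */
|     #include <android/hardware_buffer.h>
|     #include <stdint.h>
|     #include <stdio.h>
|
|     int main(void) {
|         AHardwareBuffer_Desc desc = {
|             .width  = 1024,
|             .height = 1024,
|             .layers = 1,
|             .format = AHARDWAREBUFFER_FORMAT_R8G8B8A8_UNORM,
|             .usage  = AHARDWAREBUFFER_USAGE_CPU_READ_OFTEN |
|                       AHARDWAREBUFFER_USAGE_CPU_WRITE_OFTEN |
|                       AHARDWAREBUFFER_USAGE_GPU_SAMPLED_IMAGE,
|         };
|
|         AHardwareBuffer *buf = NULL;
|         if (AHardwareBuffer_allocate(&desc, &buf) != 0) {
|             fprintf(stderr, "allocation failed\n");
|             return 1;
|         }
|
|         /* Map the unified-memory allocation into the CPU address
|          * space; plain loads/stores replace a copy. */
|         void *ptr = NULL;
|         AHardwareBuffer_lock(buf, AHARDWAREBUFFER_USAGE_CPU_WRITE_OFTEN,
|                              -1, NULL, &ptr);
|         ((uint8_t *)ptr)[0] = 0xff;
|         AHardwareBuffer_unlock(buf, NULL);
|
|         /* The same buffer can then be imported into GL or Vulkan
|          * without a copy (eglGetNativeClientBufferANDROID,
|          * VK_ANDROID_external_memory_android_hardware_buffer). */
|         AHardwareBuffer_release(buf);
|         return 0;
|     }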
| flakiness wrote:
| Great article, as always!
|
| The author not only looks into the spec sheet and the
| presentation, but also digs into the open-source Mesa code and
| uses OpenGL introspection to reverse-engineer (well, not by
| himself, but...) the architecture. For me this is one of the
| most detailed explanations of what a mobile GPU architecture
| looks like.
|
| The comparison to the older NVIDIA GPU is also very helpful
| (there is roughly a six-year gap between this and the discrete
| NVIDIA GTX 1050). Now I wonder how it compares to other mobile
| GPUs like Apple's or ARM's.
___________________________________________________________________
(page generated 2024-03-07 23:01 UTC)