[HN Gopher] Clip control on the Apple GPU
___________________________________________________________________
Clip control on the Apple GPU
Author : stefan_
Score : 207 points
Date : 2022-08-22 14:04 UTC (8 hours ago)
(HTM) web link (rosenzweig.io)
(TXT) w3m dump (rosenzweig.io)
| bla3 wrote:
| > Here's a little secret: there are two graphics APIs called
| "Metal". There's the Metal you know, a limited API that Apple
| documents for App Store developers, an API that lacks useful
| features supported by OpenGL and Vulkan.
|
| > And there's the Metal that Apple uses themselves, an internal
| API adding back features that Apple doesn't want you using.
|
| Apple does stuff like this so much and gets so little flak for
| it.
|
  | I use macOS since it seems like the least bad option if you want
| a Unix but also don't want to spend a lot of time on system
| management, but this is a real turn-off.
| bri3d wrote:
| > Apple does stuff like this so much and gets so little flak
| for it.
|
| Why should they get flak for having internal APIs? The fact
| that the internal API is a superset of the external API is
| smart engineering.
|
| Think about it this way: Apple could just as well have made the
| "Metal that Apple uses themselves" some arcane "foocode" IR
| language or something, as I'm sure many shader compilers and
| OpenGL runtime implementations do, and nobody would be nearly
| as mad about it.
|
| The fact that they use internal APIs for external apps in their
| weird iOS walled garden is obnoxious, but having private,
| undocumented APIs in a closed-source driver is not exactly an
| Apple anomaly.
| LeifCarrotson wrote:
| > Why should they get flak for having internal APIs? The fact
| that the internal API is a superset of the external API is
| smart engineering.
|
| It's not about having good segmentation of user-facing and
| kernel-side libraries, no one faults them for that.
|
| It's about Apple building user-facing apps that use the whole
| API, and then demanding that other developers not use the
| features required to implement those apps because we're not
| trusted to maintain the look-and-feel, responsiveness, or
| battery life expectations of apps on the platform.
| dcx wrote:
| But isn't it kind of fair to say that when you look at the
| case studies presented by (a) the Android app store in the
| past decade and (b) Windows malware in the decade before
| that, this trust has in fact not been earned?
|
| I hate a walled garden as much as the next developer, and
| the median HN reader is probably more than trustworthy. But
| past performance does predict future performance.
| fezfight wrote:
| If you buy a hackintosh, you have to sometimes mess around to
| get stuff to work. Same goes for Linux on random hardware. If
| you check first and buy a machine that supports the OS you're
| using, you don't have to do anything special. It'll work as you
| expect.
|
    | It's freeing not to be beholden to the likes of Tim Cook who,
    | it would seem, spends the majority of his waking hours figuring
    | out how to hide anticonsumer decisions under rugs.
| gjsman-1000 wrote:
| > Apple does stuff like this so much and gets so little flak
| for it.
|
| To be fair, Windows has a _ludicrous_ amount of undocumented
| APIs for internal affairs as well, and you can get deep into
| the weeds very quickly, just ask the WINE Developers who have
| to reverse-engineer the havoc. There is no OS without Private
| APIs, but Windows is arguably the worst with more Private or
| Undocumented APIs than Apple.
|
| This actually bears parallels to Metal. Until DirectX 12,
| Windows had no official way to get low-level. Vulkan and OpenGL
| are only 3rd-party supported, not Microsoft-supported,
| Microsoft officially only supports DirectX. If you want
| Vulkan/OpenGL, that's on your GPU vendor. If you wanted low-
| level until 12, you _may_ have found yourself pulling some
    | undocumented shenanigans. Apple hasn't gotten to their DirectX
| 12 yet, but they'll get there eventually.
|
| As for why they are Private, there could be many reasons, not
| least of which that (in this case) Apple has a very complicated
| Display Controller design and is frequently changing those
| internal methods, which would break compatibility if third-
| party applications used them. Just ask Asahi about how the DCP
| changed considerably from 11.x to 13.x.
| chongli wrote:
| _Apple has a very complicated Display Controller design_
|
| Can anyone in the know give more information here? Why would
| Apple want to do this? What could they be doing that's so
| complicated in the display controller?
| gjsman-1000 wrote:
| https://twitter.com/marcan42/status/1549672494210113536
|
| and
|
| https://twitter.com/marcan42/status/1415360411260493826?lan
| g...
|
| and
|
| https://twitter.com/marcan42/status/1526104383519350785
|
        | As to why? Well, "if it ain't broke, don't fix it" carried
        | over from the iPhone, but it is still a bit of a mystery.
|
| In a nutshell from those threads:
|
| 1. Apple's DCP silicon layout is actually massive,
| explaining the 1 external display limit
|
| 2. Apple implements half the DCP firmware on the main CPU
| and the other half on the coprocessor with RPC calls, which
| is hilariously complicated.
|
| 3. Apple's DCP firmware is versioned, with a different
| version for every macOS release. This is also why Asahi
| Linux currently uses a "macOS 12.3" shim, so they can focus
| on the macOS 12.3 DCP firmware in the driver, which will
| probably not work with the macOS 12.4+ DCP firmware or the
| macOS 12.2- firmware.
|
| I can totally see why Apple doesn't want people using their
| low-level Metal implementation that deals with the mess
| yet.
| chongli wrote:
| Yeah it makes perfect sense that they don't want to
| expose any of that complexity to 3rd parties and risk
| constant breakage with new models. I'm just really
| curious about what sort of complex logic they have going
| on in that silicon.
| phire wrote:
| The complexity with the firmware split across the main
| CPU and a coprocessor seems to be a historical artefact.
|
| Seems the DCP driver was originally all on the main CPU,
          | and when Apple got these cheap coprocessor cores, they
| took a lazy approach of just inserting a simple RPC layer
| in the middle. The complexity for Asahi comes from the
          | fact that it's a C++ API that can change very dynamically
| from version to version.
|
          | And yes, these ARM coprocessor cores are cheap; Apple has
          | put at least 16 of them [1] on the M1, on top of the 4
          | performance and 4 efficiency cores. They are an Apple
          | custom design that implements only the 64-bit parts of the
          | ARMv8 spec. I'm not entirely sure why the actual DCP is
          | so big, but it's not because of the complex firmware.
| Potentially because the DCP includes enough dedicated RAM
| to store an entire framebuffer on-chip.
|
| If so, they will be doing this because it allows for
| lower power consumption. The main DRAM could be put in a
| power-saving mode and kept there for seconds or even
| minutes at a time without having to wake it up multiple
| times per frame, even when just showing a static image.
|
| [1]
| https://twitter.com/marcan42/status/1557242428876537856
| throwaway08642 wrote:
| @marcan42 said that on the M1 MacBook Pro models, the DCP
| also implements hardware-level antialiasing for the notch
| and rounded display corners.
| Pulcinella wrote:
| _Apple does stuff like this so much and gets so little flak for
| it._
|
| It would be one thing if the private APIs were limited to
| system frameworks and features while Apple's own apps weren't
| allowed to use them, but they do. E.g. The Swift Playgrounds
| app for iPad is allowed to share and compile code, run separate
| processes, etc. which isn't normally allowed in the AppStore.
| They also use blur and other graphical effects (outside of the
| background blur material and the SwiftUI blur modifier) that
| are unavailable outside of private APIs.
|
| It stinks because of the perceived hypocrisy and the inability
| to compete on a level playing field or leave the AppStore (and
| I say this as someone who normally doesn't mind the walled
| garden!)
| adrian_b wrote:
      | Unfortunately such behavior is not at all new.
|
| The best known example of these methods is how Microsoft has
| exploited the replacement of MS-DOS with Windows 3.0 and
| especially with Windows 95.
|
| During the MS-DOS years, the only Microsoft software products
| that were successful were their software development tools,
| i.e. compilers and interpreters, and even those had strong
| competition, mainly from Borland. Those MS products addressed
| only a small market and they could not provide large
| revenues. The most successful software products for MS-DOS
| were from many other companies.
|
| That changed abruptly with the transition to various Windows
| versions, when the Microsoft developers started to have a
| huge advantage over those from any other company, both by
| being able to use undocumented internal APIs provided by the
| MS operating systems and also by knowing in advance the
| future documented APIs, before they were revealed to
| competitors.
|
      | Thus in a few years MS Office transitioned from an
      | irrelevant product, much inferior to the competition, to
      | the dominant suite of office programs, which eliminated
      | its competitors and became the main source of revenue
      | for MS.
| [deleted]
| [deleted]
| Jasper_ wrote:
  | As a graphics engineer, good riddance to the old clip space,
  | 0...1 really is the correct option. We also don't know what
| else "OpenGL mode" enables, and the details of what it does
| probably changes between GPU revisions -- the emulation stack
| probably has the details, and changes its own behavior of
| what's in hardware and what's emulated in the OpenGL stack
| depending on the GPU revision.
|
| Also, to Alyssa, if she's reading this: you're just going to
  | have to implement support for shader variants. Build your
| infrastructure for supporting them now. It's going to be far
| more helpful than just for clip control.
|
  | But yes, the Vulkan extension was just poorly specified:
| allowing you to change clip spaces between draws in the same
| render pass is, again, ludicrous, and the extension should just
| be renamed VK_EXT_i_hate_tilers (like so many others of their
| kind). Every app is going to set it at app init and forget it;
| the implementation using the render pass bit and flushing on
| change will cover the 100% case, and won't be slow at all.
| garaetjjte wrote:
    | >good riddance to the old clip space, 0...1 really is the
| correct option
|
| More like 1...0, which nicely improves depth precision.
    | Annoyingly, due to the symmetric -1...1 range, reverse-Z cannot
    | be used in OpenGL out of the box, but it can be fixed with
| ARB_clip_control. https://developer.nvidia.com/content/depth-
| precision-visuali...
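    |
    | For reference, the basic reverse-Z setup on desktop GL looks
    | something like this (a minimal sketch; glClipControl is core
    | in OpenGL 4.5, and you'd also want a floating-point depth
    | buffer to get the full precision benefit):
    |
    |   /* Depth range [0,1] instead of GL's default [-1,1]. */
    |   glClipControl(GL_LOWER_LEFT, GL_ZERO_TO_ONE);
    |   glClearDepth(0.0);       /* "far" clears to 0 under reverse-Z */
    |   glDepthFunc(GL_GREATER); /* nearer fragments now win          */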
| bpye wrote:
| > you're just going to have to implement support shader
| variants
|
| I admittedly have zero experience with Mesa, but it seems
| like shader variants is something that should be common
| infrastructure? Though of course the reason that a variant is
| needed would be architecture specific.
| chaxor wrote:
  | This is why the Asahi Linux project is so exciting!! You get
  | great performance at low power (the M-series ARM processors)
  | while still getting the more performant and useful Linux
  | experience.
  |
  | I am really thankful to the Asahi Linux team, and specifically
  | in this instance for the GPU work of [Alyssa
  | Rosenzweig](https://github.com/alyssarosenzweig), [Asahi
  | Lina](https://github.com/asahilina), and [Dougall
  | Johnson](https://github.com/dougallj).
| rowanG077 wrote:
| Amazing that this works because of the herculean effort of just a
| handful of people.
| hrydgard wrote:
| There is no good reason to flip the flag dynamically at runtime
| and apps just don't do that, so flushing the pipeline should be
| perfectly fine, even in an implementation of the clip control
| extension.
| gjsman-1000 wrote:
| Optimistic that OpenGL 2.1 will be available by the end of the
| year on Asahi - well that is news. It's only 2.1, but that's
| enough (as stated) for a web browser, desktop acceleration, and
| old games.
|
| Also RIP all the countless pessimistic "engineers" here and
| elsewhere saying we'd be waiting for years more for _any_
| graphics acceleration.
|
| Edit: It is true though that AAA Gaming will wait: "Please temper
| your expectations: even with hardware documentation, an optimized
| Vulkan driver stack (with enough features to layer OpenGL 4.6
  | with Zink) requires many years of full time work. At least
| for now, nobody is working on this driver full time. Reverse-
| engineering slows the process considerably. We won't be playing
| AAA games any time soon."
|
  | Still, even if that's the case, an accelerated desktop is an
  | accelerated desktop, much sooner than many expected.
| smoldesu wrote:
| It's pretty insane that OpenGL 2.1 is even functional on a GPU
    | this strange, but remember: this is still an unfinished, hacky
| implementation (the author's own concession). Plus, you're
| going to be stuck on x11 until any serious GPU drivers get
| written, which in many people's opinion is just as bad as no
| hardware acceleration at all. No MacOS-like trackpad gestures
| either, you'll be waiting for Wayland support to get that too.
| It'll definitely be a boon for web browsing though, so I won't
    | deny that. What I'm _really_ curious about is older WINE titles
    | with Box86; if you could get DOS titles like Diablo 2 running
| smoothly, it could probably replace my Switch as a portable
| emulation machine...
| gjsman-1000 wrote:
| > pretty insane that OpenGL 2.1 is even functional on a GPU
| this strange,
|
| Well... you were one of the most vocal critics saying it
| wouldn't happen anytime soon.
|
| > unfinished, hacky implementation (the author's own
| concession)
|
| Still more stable than Intel's official Arc drivers, so who
| defines "hacky"? ;)
|
| > Plus, you're going to be stuck on x11 until any serious GPU
| drivers get written
|
| Only because it is running on macOS, which supports X11 but
| not Wayland. On Linux, Wayland or X11 will both work, no
| problem.
|
| > No MacOS-like trackpad gestures either, you'll be waiting
| for Wayland support to get that too
|
| Again, Wayland will work on Day 1, it's just a limitation of
| running the driver on macOS until the kernel support is
| ready. When it is on Linux, Wayland will be a full-go.
| smoldesu wrote:
| > Well... you were one of the most vocal critics saying it
| wouldn't happen anytime soon.
|
        | Yep. Been beating that drum since 2020; looks like history
        | proved me right on this one.
|
| > Still more stable than Intel's official Arc drivers, so
| who defines "hacky"? ;)
|
| Apparently not me, I had no idea that the M1 supported
| Vulkan and DirectX 12.
| viraptor wrote:
| > if you could get DOS titles like Diablo 2
|
| Did you mean some other title? Even diablo 1 was a Windows
| game.
| [deleted]
| kirbyfan64sos wrote:
| I'm a bit confused as to why X11/wayland would be a huge
| issue here? The Mesa docs do say X11-only, but they're
| referring to running the driver on macOS (hence the XQuartz
| reference), where Wayland basically doesn't exist.
| smoldesu wrote:
| Ah, looks like I definitely missed that.
|
| In any case, I don't think Asahi/M1 has proper KWin or
| Mutter support yet. It's still going to take a while before
| you get a truly smooth desktop Linux experience on those
| devices, but some hardware acceleration is definitely
| better than none!
| rowanG077 wrote:
| I mean the signs were clear for basically one and a half
      | years now. It was never a question of if, but a question of
      | when. There were just so many voices that didn't know what
| they were talking about. Comparing it to nouveau for example.
| Miraste wrote:
| Why would you need Wayland for trackpad gestures?
| 3836293648 wrote:
        | You technically don't, but the implementations on X are kinda
| terrible and can't do 1:1
| pornel wrote:
| Apple could help by documenting this stuff. I remember the good
| old days when every Mac OS X came with an extra CD with Xcode,
| and Apple was regularly publishing Technical Notes detailing
| implementation details. Today the same level of detail is treated
| as top secret, and it seems that Apple doesn't want developers to
| even think beyond the surface of the tiny App Store sandbox.
| dagmx wrote:
| Even back in the day, those technical notes would not cover
| private APIs like this, because they're subject to change or
| are for internal use only.
|
    | These are the same in any closed-source OS.
| madeofpalk wrote:
| Apple Platform Security: May 2022. 242 pages.
|
| https://help.apple.com/pdf/security/en_GB/apple-platform-sec...
| alberth wrote:
| I really wish Apple would do another "Snow Leopard" - go an
| entire year WITHOUT any new features and just fix bugs and
| documentation.
|
| This twitter thread is a perfect example of why it's needed
|
| https://twitter.com/nikitonsky/status/1557357661171204098
| argsnd wrote:
| I mean that thread is looking at pre-release software
| ianlevesque wrote:
| There is software that approaches barely functional after
| dozens of rounds of QA testing, and then there is software
| that is implemented on a solid foundation with care and
| happens to have a few bugs. Unfortunately that many bugs in
| a beta implies the former. I think the thread comes from a
| disappointment that Apple is moving from the second
| category to the first.
| buildbot wrote:
          | But it is not even a "consumer beta", it is a developer
          | beta, for catching bugs and allowing devs to create
          | applications for new APIs while Apple polishes the build
          | for release. Was Snow Leopard ever even released as a dev
          | beta?
| DerekL wrote:
| Snow Leopard had few user-facing features, but it did have
| new APIs, such as Grand Central Dispatch and OpenCL, and also
| an optional 64-bit kernel.
|
| https://en.wikipedia.org/wiki/Mac_OS_X_Snow_Leopard
| sudosysgen wrote:
| OpenCL is not an OS level API. I guarantee you they were
| basically just redistributing Intel and NVidia
| implementations. GCD isn't OS level either, it's just a
| library, but it is at least a new API.
| madeofpalk wrote:
| Ahh yes, that "just fix bugs" release that would delete your
| main user account if you used a guest user
| https://www.engadget.com/2009-10-12-snow-leopard-guest-
| accou...
|
| Besides, rebuilding/redesigning the settings screen is a
| perfect "snow leopard" thing. That's not an actual Feature.
|
      | The problem isn't doing features, the problem is doing a bad
      | job.
| naillo wrote:
  | This person has an awesome set of blog posts. One of the few
  | RSS feeds I keep track of.
| [deleted]
| bob1029 wrote:
| Clip space is the bane of my existence. I've been building a
| software rasterizer from scratch and implementing vertex/triangle
| clipping has turned into one of the hardest aspects. It took me
| about 50 hours of reading various references before I learned you
| cannot get away with doing this in screen space or any time after
| perspective divide.
|
| It still staggers me that there is not 1 coherent reference for
| how to do all of this. Virtually every reference about clipping
| winds up with something like "and then the GPU waves its magic
| wand and everything is properly clipped & interpolated :D". Every
| paper I read has some "your real answer is in another paper" meme
| going on. I've got printouts of Blinn & Newell, Sutherland &
  | Hodgman, et al. littered all over my house right now. About 4
  | decades' worth of materials.
|
| Anyone who works on the internals of OGL or the GPU stack itself
| has the utmost respect from me. I cannot imagine working in that
| space full-time. About 3 hours of this per weekend is about all
| my brain can handle.
| joakleaf wrote:
| Not sure if you got through clipping, but it was one of those
| things I had to go through first at some point in the mid 90s
| myself. I feel your pain, but after having implemented it about
| 5-10 times in various situations, variants and languages, I can
| promise it gets a lot easier.
|
| In my experience it is most elegant to clip against the 6
    | planes of the view frustum in succession (one plane at a
| time). Preferably clipping against the near-plane first, as
| that reduces the set of triangles the most for subsequent
| clips.
|
| Your triangles can turn into convex polygons after a clip. So
| it is convenient to start with a generic convex polygon vs.
| plane-clipping algorithm; The thing to be careful about here is
| that points can (and will) lie on the plane.
|
| Use the plane equation (f(x,y,z)=ax+by+cz+d) to determine if a
| point is on one side, on the plane, or the other side.
|
    | It is convenient to use a "mask" to designate the side a point
    | v=(x,y,z) is on:
    |
    |   1 := inside_plane   (f(x,y,z) >  eps)
    |   2 := outside_plane  (f(x,y,z) < -eps)
    |   3 := on_plane       (f(x,y,z) >= -eps && f(x,y,z) <= eps)
    |
    | Let m(v_i) be the mask of v_i.
|
| When you go through each edge of the convex polygon
| (v_i->v_{i+1}), you can check if you should clip the edge using
| the mask. I.e.:
|
    |   if ((m(v_i) & m(v_{i+1})) == 0, the points are on opposite
    |   sides => clip [determine the intersection point].
|
    | Since you are just clipping to the frustum, just return a list
| of the points that are inside or on the plane (i.e.
| m(v_i)&1==1) and the added intersection points.
|
    | There is lots of potential for optimization, of course, but I
    | wouldn't worry about that. There are lots of other places to
    | optimize a software rasterizer with more potential, in my
    | experience.
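    |
    | A minimal sketch of that polygon-vs-plane step in C (the Vec3
    | type, EPS value, and helper names here are made up for
    | illustration, not from any particular codebase):
    |
    |   typedef struct { float x, y, z; } Vec3;
    |
    |   #define INSIDE   1   /* f(v) >  EPS              */
    |   #define OUTSIDE  2   /* f(v) < -EPS              */
    |   #define ON_PLANE 3   /* |f(v)| <= EPS, both bits */
    |   #define EPS      1e-6f
    |
    |   /* Plane f(x,y,z) = a*x + b*y + c*z + d */
    |   static float plane_eval(const float p[4], Vec3 v) {
    |       return p[0]*v.x + p[1]*v.y + p[2]*v.z + p[3];
    |   }
    |
    |   static int mask(float f) {
    |       return f > EPS ? INSIDE : f < -EPS ? OUTSIDE : ON_PLANE;
    |   }
    |
    |   static Vec3 lerp3(Vec3 a, Vec3 b, float t) {
    |       Vec3 r = { a.x + t*(b.x - a.x), a.y + t*(b.y - a.y),
    |                  a.z + t*(b.z - a.z) };
    |       return r;
    |   }
    |
    |   /* Clip convex polygon in[0..n-1] against one plane; returns
    |      the vertex count written to out (room for n+1 needed). */
    |   int clip_poly(const Vec3 *in, int n, const float plane[4],
    |                 Vec3 *out) {
    |       int m = 0;
    |       for (int i = 0; i < n; i++) {
    |           Vec3 v0 = in[i], v1 = in[(i + 1) % n];
    |           float f0 = plane_eval(plane, v0);
    |           float f1 = plane_eval(plane, v1);
    |           int m0 = mask(f0), m1 = mask(f1);
    |           if (m0 & INSIDE)     /* keep inside/on-plane points */
    |               out[m++] = v0;
    |           if ((m0 & m1) == 0)  /* strictly opposite sides */
    |               out[m++] = lerp3(v0, v1, f0 / (f0 - f1));
    |       }
    |       return m;
    |   }
    |
    | Run it once per frustum plane, ping-ponging between two
    | buffers, starting with the near plane as described above.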
| fabiensanglard wrote:
| I wrote about this a few years ago
| (https://fabiensanglard.net/polygon_codec/index.php).
|
    | It was a pain to learn indeed, and the best resources were quite
| old:
|
| - "CLIPPING USING HOMOGENEOUS COORDINATES" by James F. Blinn
| and Martin E. Newell
|
| - A Trip Down the Graphics Pipeline by Jim Blinn (yes the same
| Blinn that co-authored the paper above).
| Jasper_ wrote:
| Vertex/triangle clipping is quite rare, and mostly used for
| clipping against the near plane (hopefully rare in practice).
| Most other implementations use a guard band as a fast path (aka
| doing it in screen space) -- real clipping is only used where
| your guard band doesn't cover you, precision issues mostly.
|
| I'm not sure what issues you're hitting, but I've never found
| clipping to be that challenging or difficult. Also, clip
| control and clip space aren't really specifically about
| clipping -- clip space is just the output space of your vertex
| shader, and the standard "clip control" extension just controls
| whether the near plane is at 0 or -1. And 0 is the correct
| option.
| Sharlin wrote:
| Guard band clipping is only really applicable to "edge
| function" type rasterizers. For the classic scanline-based
| algorithm, sure, you can easily clip to the right and bottom
| edges of the viewport while rasterizing, but the top and left
| edges are trickier. Clipping in clip space, before
| rasterization, is more straightforward, given that you have
| to frustum cull primitives anyway.
| bob1029 wrote:
| > clipping against the near plane (hopefully rare in
| practice)
|
| I am not sure I understand why this would be rare. If I am
| intending to construct a rasterizer for a first-person
| shooter, clipping is essentially mandatory for all but the
| most trivial of camera arrangements.
| Jasper_ wrote:
| Yes, of course, I was definitely imagining you were
| struggling to get simpler scenes to work. But also,
| proportionally few of your triangles in any given scene
| should be near-plane clipped. It's OK to have a slow path
| for it, and then speed it up later. I've never felt the
| math for the adjusted barycentrics is too hard, but it can
| take a bit to wrap your head around. Good luck :)
| Sharlin wrote:
| You and the GP have different rasterization algorithms in
| mind I think. The GP, I presume, is talking about a
| classic scanline-based rasterizer rather than an edge
| function "am I inside or not" type rasterizer that GPUs
| use.
| bpye wrote:
| Clipping or culling? I expect it's mostly the latter unless
| your camera ends up intersecting the geometry.
| royjacobs wrote:
| As an example, if you're writing a shooter then the floor
| might be a large square that will almost definitely be
| intersecting the near plane. You absolutely need clipping
| here.
| bob1029 wrote:
| This is precisely the first place I realized I needed
| proper clipping. Wasted many hours trying to hack my way
| out of doing it the right way.
| bob1029 wrote:
| Both. You almost always need both.
|
| Clipping deals with geometry that is partially inside the
| camera. Culling (either for backfaces or entire
| instances) is a preliminary performance optimization that
| can be performed in a variety of ways.
| bpye wrote:
| Even with a guard band don't you need to at least test the
| polygons for Z clipping prior to the perspective divide?
|
| Clipping in X and Y is simpler at least, and again the guard
| band hopefully mostly covers you.
| bob1029 wrote:
| > Even with a guard band don't you need to at least test
| the polygons for Z clipping prior to the perspective
| divide?
|
| Yes. Guard band is an optimization that reduces the amount
| of potential clipping required. You still need to be able
| to clip for fundamental correctness.
|
| If you totally reject a vertex for a triangle without
| determining precisely where it intersects the desired
| planes, you are effectively rejecting the entire triangle
| and creating yucky visual artifacts.
| nauful wrote:
| You have to clip against planes in 4D space (xyzw) before
| perspective divide (xyz /= w), not 3D (xyz).
|
| This simplified sample shows Sutherland-Hodgman with 4D
| clipping:
| https://web.archive.org/web/20040713023730/http://wwwx.cs.un...
| The main difference is the intersect method finds the
| intersection of a 4D line segment against a 4D plane.
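        |
        | For example, clipping one edge against the near plane in
        | homogeneous coordinates might look like this (a rough
        | sketch; the Vec4 type and names are made up, and the
        | caller is assumed to have tested the endpoints):
        |
        |   typedef struct { float x, y, z, w; } Vec4;
        |
        |   /* Near plane in a -1..1 clip space: keep z >= -w,
        |      so f(v) = v.z + v.w. (In a 0..1 clip space the
        |      test is simply z >= 0.) */
        |   static float near_eval(Vec4 v) { return v.z + v.w; }
        |
        |   /* Intersection of edge [a,b] with the near plane,
        |      computed in 4D before the perspective divide;
        |      assumes near_eval(a) and near_eval(b) straddle 0. */
        |   static Vec4 clip_near(Vec4 a, Vec4 b) {
        |       float fa = near_eval(a), fb = near_eval(b);
        |       float t = fa / (fa - fb);
        |       Vec4 r = { a.x + t*(b.x - a.x),
        |                  a.y + t*(b.y - a.y),
        |                  a.z + t*(b.z - a.z),
        |                  a.w + t*(b.w - a.w) };
        |       return r;
        |   }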
| Sharlin wrote:
| I also implemented clipping in my software rasterizer a while
| ago and can definitely sympathize! (Although I've written
| several simple scanline rasterizers in my life, this was the
| first time I actually bothered to implement proper clipping. I
| actually reinvented Sutherland-Hodgman from scratch which was
| pretty fun.) The problematic part is actually only the near
| plane due to how projective geometry works. At z=0 there's a
| discontinuity in real coordinates after z division, which means
| there can be no edges that cross from negative to positive z. Z
| division turns such an edge [a0, a1] into an "inverse" edge
    | (-∞, a0'] ∪ [a1', ∞), which naturally makes rendering a
    | bit tricky. In projective/homogeneous coordinates, however, it
| is fine, because the space "wraps around" from positive to
| negative infinity. All the other planes you can clip against in
| screen space / NDC space if you wish, but I'm not sure there
| are good reasons to split the job like that.
| samstave wrote:
| Be the documentation you want to see in the world.
| hashishen wrote:
| "RTFM" - The manual
| samstave wrote:
| "Look up error code on stack exchange to find the error in
        | question seeking a solution, but it's you from 5 years ago."
| sph wrote:
| "What was I working on? What did I see?!"
|
| https://xkcd.com/979/
| moondev wrote:
  | Can you run a PCIe enclosure over Thunderbolt on Asahi Linux
  | yet? Could this enable GPUs that already work on aarch64 Linux?
| hishnash wrote:
    | I would assume all Linux GPU drivers would need to be adapted
    | at least a little to support the larger page size (most Linux
    | AArch64 kernel-level code is written assuming 4kB pages).
| kmeisthax wrote:
| Yes, but NOT for GPUs. Apple Silicon does not support non-
| Device mappings over Thunderbolt, so eGPUs will never work.
| andrewmcwatters wrote:
  | OpenGL on macOS is so frustrating that I and many other
  | developers have basically abandoned it, and not in favor of
  | using Metal--the easier alternative is to just no longer
  | support Macs.
|
| Yes, OpenGL on macOS is now implemented over Metal, but
| unfortunately a side effect of this is that implementation-level
| details that were critical to debugging and profiling OpenGL just
| no longer exist for tools to work with. Anything is possible?
| Maybe? I'm sure Apple Graphics engineers could make old tooling
| work with the new abstraction layer, but it's not happening.
|
| Tooling investment is all on Metal now. But so much existing NON-
| LEGACY software relied on OpenGL.
|
| So what do you do? You debug and perf test on Windows and Linux
| and hope that fixing issues there addresses concerns on macOS,
| and hopefully your problems aren't platform-specific.
|
| This is how some graphics engineers, including myself, continue
| to ship for macOS while never touching it.
|
| Edit: Also, Vulkan is a waste of time for anyone who isn't a
| large studio. No one wants to write this stuff. The most common
| argument is "You only write it once." No, you don't.
|
| You have to support this stuff. If it were that easy, bgfx would
| have been written in a month and it would have been considered
| "done" afterwards.
| [deleted]
| fbanon wrote:
| Couldn't you just pre-multiply the projection matrix to remap the
| Z range from [-1,1] to [0,1]?
| NobodyNada wrote:
| What projection matrix?
|
| Remember that this translation needs to happen at the graphics
| driver level. For fixed-function OpenGL where the application
| actually passes the graphics driver a projection matrix this
| would be doable. But if your application is using a version of
| OpenGL newer than 2004, the projection matrix is a part of your
| vertex shader. The graphics driver can't tell what part of your
| shader deals with projection, and definitely can't tell what
| uniforms it would need to tweak to modify the projection matrix
| -- many shaders might not even _have_ a projection matrix.
| fbanon wrote:
| I know. But the second sentence of the article starts with:
|
| "Neverball uses legacy "fixed function" OpenGL."
|
| But also you could simply remap the Z coordinate of
| gl_Position at the end of the vertex stage, do the clipping
| in [0,1] range, then map it back to [-1,1] for gl_FragCoord
| at the start of the fragment stage.
| NobodyNada wrote:
| > "Neverball uses legacy "fixed function" OpenGL."
|
| Sure, it'd work for Neverball, but the article is clear
| that they're looking for a general solution: something
| that'd work not just for Neverball, but for all OpenGL
| applications, and would ideally let them give applications
| control over the clip-control bit through OpenGL/Vulkan
| extensions.
|
| > But also you could simply remap the Z coordinate of
| gl_Position at the end of the vertex stage, do the clipping
| in [0,1] range, then map it back to [-1,1] for gl_FragCoord
| at the start of the fragment stage.
|
| Yes, that was the current state-of-the-art before this
| article was written:
|
| > As Metal uses the 0/1 clip space, implementing OpenGL on
| Metal requires emulating the -1/1 clip space by inserting
| extra instructions into the vertex shader to transform the
| Z coordinate. Although this emulation adds overhead, it
| works for ANGLE's open source implementation of OpenGL ES
| on Metal.
|
| > Like ANGLE, Apple's OpenGL driver internally translates
| to Metal. Because Metal uses the 0 to 1 clip space, it
| should require this emulation code. Curiously, when we
| disassemble shaders compiled with their OpenGL
| implementation, we don't see any such emulation. That means
| Apple's GPU must support -1/1 clip spaces in addition to
| Metal's preferred 0/1. The problem is figuring out how to
| use this other clip space.
| Jasper_ wrote:
    | This is effectively what the vertex shader modification would
    | do -- the same trick that ANGLE does:
    |
    |   gl_Position.z = (gl_Position.z + gl_Position.w) * 0.5;
|
| This is the same as modifying a projection matrix -- you're
| doing the same post-multiply to the same column. But note that
| there's no guarantee there's ever a projection matrix. Clip
| space coordinates could be generated directly in the vertex
| shader.
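    |
    | Equivalently, for the fixed-function path you can fold the
    | remap into the projection matrix; a hedged sketch in C
    | (column-major storage as GL expects; mat4_mul is a
    | hypothetical 4x4 multiply helper):
    |
    |   /* R: z' = 0.5*z + 0.5*w, mapping depth -1..1 -> 0..1.
    |      Columns listed in order (OpenGL column-major). */
    |   static const float remap_z[16] = {
    |       1, 0, 0,    0,   /* column 0 */
    |       0, 1, 0,    0,   /* column 1 */
    |       0, 0, 0.5f, 0,   /* column 2 */
    |       0, 0, 0.5f, 1,   /* column 3 */
    |   };
    |   /* Apply last: new_proj = mat4_mul(remap_z, proj). */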
| skocznymroczny wrote:
| I don't know why there's so much love for OpenGL in the
| communities still. Maybe it's the "open" part in the name, which
  | was always confusing people into thinking it's an open source
  | standard or something like that.
|
| The API is very antiquated, doesn't match modern GPU
| architectures at all and requires many workarounds in the driver
| to get the expected functionality, often coming at a performance
| cost.
|
| Vulkan is nice, but it goes into the other extreme. It's very low
| level and designed for advanced users. Even getting anything on
| the screen in Vulkan is intimidating because you have to write
| everything from scratch. To go beyond hello world, you even have
| to write your own memory allocator (or use an existing opensource
| one) because you can only do a limited amount of memory
| allocations and you're expected to allocate a huge block of
| memory and suballocate it as needed by your application.
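  |
  | The suballocation pattern itself is simple enough; a minimal
  | bump-allocator sketch in C (names are made up, and a real
  | allocator like the open source VulkanMemoryAllocator also
  | handles freeing and multiple memory types):
  |
  |   #include <vulkan/vulkan.h>
  |
  |   typedef struct {
  |       VkDeviceMemory block;  /* one big vkAllocateMemory block */
  |       VkDeviceSize   cursor;
  |   } Arena;
  |
  |   int arena_init(Arena *a, VkDevice dev, uint32_t type,
  |                  VkDeviceSize size) {
  |       VkMemoryAllocateInfo info = {
  |           .sType = VK_STRUCTURE_TYPE_MEMORY_ALLOCATE_INFO,
  |           .allocationSize = size,
  |           .memoryTypeIndex = type,
  |       };
  |       a->cursor = 0;
  |       return vkAllocateMemory(dev, &info, NULL, &a->block)
  |              == VK_SUCCESS;
  |   }
  |
  |   /* Hand out an aligned offset into the block; bind it with
  |      vkBindBufferMemory(dev, buf, a->block, offset). */
  |   VkDeviceSize arena_alloc(Arena *a, VkDeviceSize size,
  |                            VkDeviceSize align) {
  |       a->cursor = (a->cursor + align - 1) & ~(align - 1);
  |       VkDeviceSize offset = a->cursor;
  |       a->cursor += size;
  |       return offset;
  |   }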
|
| In comparison, DX12 is a bit easier to grasp. It has some nice
  | abstractions such as committed resources, which take some of the
| pain away.
|
| Personally I like Metal as an API. It is lower level than OpenGL,
| getting rid of most nasty OpenGL things (state machine, lack of
| pipeline state objects), yet it is very approachable and easy to
| transition to from DX11/OpenGL. I was happy when I saw WebGPU was
| based on Metal at first. WebGPU is my go-to 3D API at the moment,
| especially with projects like wgpu-native which make it usable on
| native platforms too (don't let the Web in WebGPU confuse you).
| Teknoman117 wrote:
| > Vulkan is nice, but it goes into the other extreme. It's very
| low level and designed for advanced users. Even getting
| anything on the screen in Vulkan is intimidating because you
| have to write everything from scratch.
|
| I honestly believe that this is the major reason. Developing a
| hobby project with OpenGL is little more than using SDL or GLFW
| to get a window with a GLContext and then you can just start
| calling commands. Vulkan is much more complicated and unless
| you're really pushing performance limits, you're not getting
| much of a benefit for the extra headache.
| gary_0 wrote:
| OpenGL is what you use if you just want to render some
| triangles on the GPU with a minimum of hassle on the most
| platforms (which is quite a few if you include GLES, WebGL,
| and ANGLE). Most people aren't writing graphics engines for
| AAA games so OpenGL is all they need.
| mort96 wrote:
| You acknowledge that Vulkan is too low level for people who
| aren't investing billions into an AAA graphics engine. And you
| surely know that OpenGL and Vulkan are the only two cross-
| platform graphics APIs. Are you sure you can't infer why people
| like OpenGL from those two points? Especially in Linux-heavy
| communities where DX and Metal aren't even options?
|
| I assure you, none of the "love" for OpenGL comes from the
| elegance of its design.
| bitwize wrote:
| There should be more effort to support Direct3D under Linux.
| We have Wine and DXVK, but it should be easier to integrate
| the D3D support into Linux applications.
| skrrtww wrote:
| Despite the progress here, for me it raises a question: Most of
| the old games she mentions are x86 32bit games. What's the story
| for how these programs are actually going to run in Asahi? Box86
| [1] doesn't sound like it's projected to run on M1. Rosetta 2 on
| macOS allows 32-bit code to be run by a 64-bit process, which is
| the workaround CrossOver et. al. use (from what I understand),
| but that obviously won't be available?
|
| [1] https://box86.org
| TazeTSchnitzel wrote:
| QEMU has a "user mode" feature where it can transparently
| emulate a Linux process and translates syscalls. You can
| probably run at least old 32-bit Linux games that way, assuming
| you have appropriate userland libraries available. Windows
| content might be trickier.
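    |
    | Something like this, assuming an i386 sysroot with the game's
    | libraries is available (paths here are made up):
    |
    |   qemu-i386 -L /opt/i386-sysroot ./some-32bit-game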
| rowanG077 wrote:
| Rosetta 2 runs on Linux. There's also FEX.
| amluto wrote:
| Does it? Or does Rosetta 2 run on Mac OS with a Linux shim to
| ask the host to kindly Rosetta-ify a given binary?
| skrrtww wrote:
| I guess that's true, I forgot about Apple making Rosetta 2
| installable in Linux VMs.
|
| Also though, since Rosetta 2 was released, it's had an
| incredibly slow implementation of x87 FPU operations, and
| anything that relies on x87 floating point math (including
| lots of games) is currently running about 100x slower than it
| ought to. Apple is aware of it but it's still not fixed in
| Ventura.
|
| I hadn't heard of FEX before, looks interesting.
| mort96 wrote:
| Huh, I thought everyone used SSE floats these days. I
| suppose there may be old games compiled with x87 floats,
| but I'd expect those to be made for CPUs so old that even
| slow x87 emulation wouldn't be a big issue.
|
| What software do people have x87-related issues with?
| skrrtww wrote:
| The software I personally have the most issues with is
| Star Wars Episode 1: Racer, a 3d title from 1999 that
| from what I understand uses x87 math extensively. In
| Parallels (i.e. no Rosetta) it runs at 120fps easily,
| while in CrossOver the frame rate barely ekes above 20.
| Old titles like Half-Life, all other Source games,
| Fallout 3, SWTOR etc. all run vastly worse than they
| should, and many cannot run at playable framerates
| through Rosetta. Honestly, the problem most likely
| extends to more of Rosetta's floating point math than
| just x87.
|
| The author of REAPER has also written about it some:
| https://user.cockos.com/~deadbeef/index.php?article=842
|
| There's been lots of discussion about the issue in the
| Codeweavers forums, and Codeweavers points the blame
| squarely at Apple, who have been, predictably, very quiet
| about it.
| 58028641 wrote:
| Does Rosetta on Linux support 32 bit code? I believe FEX
| does.
| saagarjha wrote:
| Rosetta supports emulating 32-bit code.
| 58028641 wrote:
| On Linux? I know it has been confirmed on macOS. I
| haven't heard anyone say they ran 32 bit code on Linux.
| mort96 wrote:
| Someone would need to make an x86 -> ARM recompiler like
| Rosetta 2. That's not an easy task, but also not the task she's
| tackling with the GPU driver.
|
| It's not unprecedented in the open-source space though; the
| PCSX2 PlayStation 2 emulator for example contains a MIPS -> x86
| recompiler, and the RPCS3 PlayStation 3 emulator contains a
| Cell -> x86 recompiler.
| viktorcode wrote:
| Can someone explain to me why support OpenGL at all? Vulkan is
| easier to implement. Is there a need for OpenGL on Linux?
| dagmx wrote:
| Because Vulkan, despite the mystical reputation it has in
    | gaming circles, actually has fairly low adoption vs OpenGL.
|
| Very few applications in the grand scheme of things use Vulkan,
| and a minority of games do.
|
| Therefore the ROI on supporting OpenGL is very high.
| 58028641 wrote:
| Doesn't implementing Vulkan give you DirectX with DXVK and
| VKD3D and OpenGL with Zink for free?
| Cu3PO42 wrote:
| Only if you support all of the necessary Vulkan features
| and extensions. The article states that getting to that
| point would be a multi-year full time effort, whereas
| "only" OpenGL seems to be within grasp for this year. And
| arguably having a lower OpenGL standard soon is better than
| OpenGL 4.6 in a few years.
| erichocean wrote:
| Yes, with appropriate (and reasonably-available) Vulkan
| extensions.
| phire wrote:
| Keep in mind that Mesa actually implements most of OpenGL for
    | you. It's not like you are implementing a whole OpenGL driver
| from scratch, you are mostly implementing a hardware
| abstraction layer.
|
    | My understanding is that this hardware abstraction layer for
    | Mesa is way easier to implement than a full Vulkan driver,
    | especially since the earlier versions of OpenGL only require a
    | small subset of the features that a Vulkan driver requires.
| Jasper_ wrote:
    | Because of how Mesa is structured. OpenGL is notoriously
| terrible to implement, so there's a whole framework called
| Gallium that does the hard work for you, and you slot yourself
| into that. Meanwhile, Vulkan is easier to implement from
    | scratch, so there's a lot less infrastructure for it in Mesa,
| and you have to implement more of the boring paperwork
| correctly.
|
| It's an accident of history more than anything else. Once the
| reverse engineering is further along, I expect a Vulkan driver
| to be written for it, and the Gallium one to be phased out in
| favor of Zink.
| gjsman-1000 wrote:
| On a reverse-engineered GPU like this, because of Vulkan's low-
| level design, implementing (early) OpenGL might actually be
| significantly easier.
|
| Also, Vulkan isn't popular with game developers because
| availability sucks. Vulkan doesn't run on macOS. Or iOS. Or 40%
| of Android phones. Or Xbox. Or PlayStation. Or Nintendo
| Switch[1].
|
| Unless you are targeting Windows (which has DirectX and OpenGL
| already), or those 60% of Android phones only, or Linux, why
| would you use Vulkan? On Windows, DirectX is a generally-
| superior alternative, and you get Xbox support basically free,
| and if you also support an older DirectX, much broader PC
| compatibility. On Android, just use OpenGL, and don't worry
| about separate implementations for the bifurcated Vulkan/OpenGL
| support. On Linux, just use Proton with an older DirectX. Whiz
    | bang, no need for Vulkan whatsoever. Yes, some systems might
    | perform better with Vulkan over OpenGL, but is the cost worth
    | it when you don't need it?
|
| [1] Technically, Vulkan does exist for Nintendo Switch, but it
| is so slow almost no production game uses it, and it is widely
| considered not an option. Nintendo Switch is slow enough
| without Vulkan making it slower. Much easier just to use the
| proprietary NVIDIA library.
| [deleted]
___________________________________________________________________
(page generated 2022-08-22 23:00 UTC)