[HN Gopher] 2D Graphics on Modern GPU (2019)
       ___________________________________________________________________
        
       2D Graphics on Modern GPU (2019)
        
       Author : peter_d_sherman
       Score  : 165 points
       Date   : 2021-03-15 03:31 UTC (19 hours ago)
        
 (HTM) web link (raphlinus.github.io)
 (TXT) w3m dump (raphlinus.github.io)
        
       | raphlinus wrote:
       | Hi again! This post was an early exploration into GPU rendering.
       | The work continues as piet-gpu, and, while it's not yet a
       | complete 2D renderer, there's good progress, and also an active
       | open source community including people from the Gio UI project.
        | I recently implemented nested clipping (which can be
        | generalized to blend modes) and have a half-finished blog post
        | draft. I'm also working on this as part of my role as a
        | researcher on the Google Fonts team. Feel free to ask questions
        | in this thread - I probably won't follow up with everything, as
        | the discussion is pretty sprawling.
        
         | lame88 wrote:
         | One small error (I think) - I noticed your link to pathfinder
         | linked to someone's 2020 fork of the repository rather than the
         | upstream servo repository.
        
           | littlestymaar wrote:
           | > someone's 2020 fork
           | 
            | pcwalton was the developer behind pathfinder at Mozilla (but
            | was part of last summer's layoffs).
        
       | danybittel wrote:
        | It would be fantastic if something like this were part of the
        | modern APIs: Vulkan, Metal, DX12. But I guess it's not as sexy
        | as raytracing.
        
         | moonchild wrote:
         | Nvidia tried to make it happen[0].
         | 
         | Sadly, it didn't catch on.
         | 
         | 0. https://developer.nvidia.com/nv-path-rendering
        
           | slimsag wrote:
           | It's still there on all NVIDIA GPUs as an extension, just
           | nobody uses it.
           | 
            | IMO it didn't catch on because of all three of these points:
           | 
           | 1. It only works on NVIDIA GPUs, and is riddled with joint
           | patents from NVIDIA and Microsoft forbidding anyone like AMD
           | or Intel from supporting it.
           | 
           | 2. It's hard to use: you need to get your data into a format
           | it can consume, usage is non-trivial, and often video game
           | artists are already working with rasterized textures anyway
           | so it's easy to omit.
           | 
            | 3. Vector graphics suck for artists. The number of graphic
            | designers I have met (who are the most likely subset of
            | artists to work with vector graphics) who simply hate or do
            | not understand the cubic Bézier curve control points in
            | Adobe, Inkscape, and other tools is incredible.
        
             | redisman wrote:
             | > Vector graphics suck for artists
             | 
             | Someone mentioned Flash in this thread and that was a very
             | approachable vector graphics tool. I don't know how many
             | games translate to the vector style though - it's almost
             | invariably a cartoonish look. The tools are very geometric
             | so it just kind of nudges you towards that. Pixels these
             | days are more like painting so it's no surprise artists
             | like that workflow (they all secretly want to be painting
             | painters).
        
               | monsieurbanana wrote:
               | > they all secretly want to be painting painters
               | 
               | This is going to be literally life changing for them!
               | Quick, someone inform the artists of that!
        
             | derefr wrote:
             | > It only works on NVIDIA GPUs, and is riddled with joint
             | patents from NVIDIA and Microsoft forbidding anyone like
             | AMD or Intel from supporting it.
             | 
             | Why do companies do this? What do they expect to get out of
             | creating an API that is proprietary to their specific non-
             | monopoly device, and that therefore very obviously nobody
             | will ever actually use?
        
               | numlock86 wrote:
                | Well, you could always implement a fallback for
                | unsupported devices and advertise better performance
                | on supported devices. Implementation will usually be
                | "sponsored" by the one providing the technology. If the
                | feature then gets adopted enough, others will have to
                | come up with something similar or support it natively,
                | too. This happens all the time with GPU and CPU vendors,
                | both in gaming and industry.
        
               | derefr wrote:
               | Maybe, but I got the sense from the article that what
               | NVIDIA has patented are the semantics of the API itself
               | (i.e. the types of messages sent between the CPU and
               | GPU.) A polyfill for the API might still be infringing,
               | given that it would need to expose the same datatypes on
               | the CPU side.
               | 
                | And even if it weren't, the path-shader semantics are
                | _so_ different from those of regular 2D UI frameworks
                | that a 2D UI framework implemented to use path-shading,
                | falling back to the polyfill, might perform much _worse_
                | than one implemented using regular CPU-side 2D plotting.
                | It would very likely also suffer from issues like
                | flicker, judder, etc., which are much worse / more
                | noticeable than just "increased CPU usage".
        
               | my123 wrote:
                | Intel and NVIDIA had a patent cross-licensing agreement
                | at the time, which is still in effect for all patents
                | filed on or prior to March 31, 2017.
               | 
               | https://www.sec.gov/Archives/edgar/data/1045810/000119312
               | 511...
               | 
                | Patents here are used for mutually assured destruction in
                | case one tries to sue the other; NVIDIA won't use them
                | against AMD (and the reverse is true too) where GPUs are
                | concerned.
                | 
                | Intel and AMD would, however, enforce their patents if
                | NVIDIA tried to enter the x86 CPU market.
        
               | adwn wrote:
               | > _What do they expect to get out of creating an API that
               | is proprietary to their specific non-monopoly device, and
               | that therefore very obviously nobody will ever actually
               | use?_
               | 
               | You mean like Cuda, which is wildly successful, has a
               | huge ecosystem, and which basically ensures you'll have
               | to buy NVidia if you're serious about GPU computing?
        
               | derefr wrote:
               | CUDA is, essentially, a B2B product-feature.
               | 
               | A business is flexible when serving its own needs: if
               | they want to do GPGPU computations, they can evaluate the
                | alternatives and _choose_ a platform/technology like
                | CUDA to build on; and so _choose_ to lock _themselves_
               | into the particular GPUs that implement that technology.
               | But that's a choice they're only making for themselves.
               | They're using those GPUs to compute something; they're
               | not forcing anyone _else_ downstream of them to use those
               | same GPUs to _consume_ what they produce. The GPUs
               | they're using become part of their black box.
               | 
               | Path shaders, on the other hand, would--at least as
               | described in the article--be chiefly a B2C / end-user
               | product-feature. They'd be something games/apps would
               | rely on to draw 2D UI elements. But path shaders would
               | need to be _implemented_ + by the developers creating
               | those games/apps.
               | 
                | + (Yes, really, they _would_ need to be implemented by
                | the game/app developer, not game-engine framework devs.
               | Path shaders are a low-level optimization, like Vulkan --
               | something that is only advantageous to use when you use
               | it directly to take advantage of its low-level semantics,
               | rather than through a framework that tries to wrap it in
               | some other, more traditional semantics. As with OpenGL,
               | the ill-fitted abstractions of the traditional API were
               | precisely what made it slow!)
               | 
               | And those game/app developers, unlike the developers
               | doing GPGPU computations, have to make a decision based
                | on _what devices their customers have_, rather than what
               | _their own company_ can buy. (Or, if they think they have
               | a "killer app", they can try to _force_ their customers
               | to buy into the platform they built on. But most
               | companies who think they've built a B2C killer app,
               | haven't.)
               | 
               | For a B2B product-feature, a plurality position is fine.
               | Even one forced to be permanent by patents.
               | 
               | For a B2C product-feature, a plurality position _by
               | itself_ is fine, because rivals will just ship something
               | similar and offer cross-support. But an _indefinite_
               | plurality position is death.
               | 
               | Compare, in the web ecosystem:
               | 
               | * Features shipped originally in one renderer as "clean,
               | standalone" ideas, e.g. -webkit-border-radius. These were
               | cloned (-moz-border-radius, etc.) and then standardized
               | (plain old border-radius.)
               | 
               | * Features shipped in one renderer as implementations
               | dependent on that renderer's particular
               | environment/ecosystem, e.g. IE's ActiveX-based CSS
               | "filter" stanza. Because of how they were implemented,
               | these had no potential to ever be copied by rivals, so
               | _nobody outside of Microsoft bothered to use them on
                | their sites_, since they knew that only IE would render
               | the results correctly, and IE at that point already had
               | only a plurality position.
        
               | d110af5ccf wrote:
               | > What do they expect to get out of ...
               | 
               | Nvidia did exactly this with CUDA, so go take a look at
               | the ML world for an example of how it works. It seems to
               | be going quite well for them. A common enough refrain is
               | "I don't really _want_ to buy Nvidia, but my toolchain
               | requires CUDA ".
               | 
               | Pretty much every FPGA and SoC vendor does exactly this
               | as far as I understand things. It's why you can't
               | trivially run an open OS on most mobile hardware.
               | 
               | Apparently such schemes don't meaningfully affect the
               | purchasing decisions of a sufficiently large fraction of
               | people to disincentivize the behavior.
        
               | my123 wrote:
               | Intel and AMD also did the exact same thing with x86.
               | 
                | And yet, people still use it. (Arm also has its fair
                | share of patents, but it is an architecture that others
                | can license.)
        
             | Const-me wrote:
             | > just nobody uses it
             | 
              | AFAIK Skia and derived works (Chrome, Chromium, Electron,
              | etc.) are all using it when available.
        
           | Jasper_ wrote:
            | Stencil-and-cover approaches like NV_path_rendering have a
            | lot of drawbacks, as documented below, but probably the
            | biggest of all is that they're still mostly doing the
            | tessellation on the CPU. A lot of the important things, like
            | winding mode calculations, are handled on the CPU. Modern
            | research is looking for ways out of that.
        
             | Lichtso wrote:
             | Actually, they calculate the winding per fragment on the
             | GPU [0]. They require polygon tessellation only for
             | stroking (which has no winding rule). The downside of their
             | approach is that it is memory bandwidth limited, precisely
             | because it does the winding on the GPU instead of using CPU
             | tessellation to avoid overlap / overdraw.
             | 
              | Curve filling is pretty much solved with implicit curves,
              | stencil-and-cover, or scanline-intersection-sort approaches
              | (all of which can be done in a GPU-only fashion). Stroking
              | is where things could still improve a lot, as it is almost
              | always approximated by polygons.
             | 
             | [0]: https://developer.nvidia.com/gpu-accelerated-path-
             | rendering
        
         | raphlinus wrote:
         | I think the world is going into a more layered approach, where
         | it's the job of the driver API (Vulkan etc) to expose the power
         | of the hardware fairly directly, and it's the job of higher
         | levels to express the actual rendering in terms of those lower
         | level primitives. Raytracing is an exception because you do
         | need hardware support to do it well.
         | 
         | Whether there's hardware that can make 2D rendering work better
         | is an intriguing question. The NV path rendering stuff
         | (mentioned elsethread) was an attempt (though I think it may be
         | more driver/API than hardware), but I believe my research
         | direction is better, in that it will be higher quality, faster,
         | and more flexible with respect to the imaging model on standard
         | compute shaders than an approach using the NV path rendering
         | primitives. Obviously I have to back that up with empirical
         | measurements, which is not yet done :)
        
       | moonchild wrote:
       | Something else worth looking at: the slug font renderer[0]. Sadly
       | it's patented, but the paper[1] is there for those of you in the
       | EU.
       | 
       | 0. http://sluglibrary.com/
       | 
       | 1. http://jcgt.org/published/0006/02/02/paper.pdf
        
         | Jasper_ wrote:
          | Slug isn't great for lots of shapes since it does the winding
          | order scanning per-pixel in the pixel shader. It does have a
          | novel quadratic root-finder. Put simply, it's better suited to
          | fonts than to large vector graphics.
        
           | vg_head wrote:
            | I once implemented the basic idea behind the algorithm used
            | in Slug (described in the paper [1], though without the
            | 'band' optimization; I just wanted to see how it works), and
            | I agree with you: the real innovation is in that quadratic
            | root-finder. It can tell you whether you are inside or
            | outside just by manipulating the three control points of a
            | curve, and it's very fast; what remains to be done is to use
            | an acceleration data structure so that you don't have to
            | check every curve. That works very well for quadratic Bezier
            | curves; the paper says it can be easily extended to cubics,
            | though no example is provided (and I doubt it's trivial).
            | What I think would be hard with Slug's method is extending
            | it to draw gradients, shadows - basically general vector
            | graphics, like you say. Eric Lengyel showed a demo on his
            | Twitter [2] using Slug to render general vector graphics;
            | I'm not sure how many features it supports, but it
            | definitely supports cubic Bezier curves. I'd also add that
            | the algorithm didn't impress me with how the text looks at
            | small sizes, which I think is very important in general,
            | though maybe not so much for games (or maybe I just didn't
            | implement it correctly).
           | 
           | [1] "GPU-Centered Font Rendering Directly from Glyph
           | Outlines" http://jcgt.org/published/0006/02/02/
           | 
           | [2]
           | https://twitter.com/EricLengyel/status/1190045334791057408
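            | 
            | To make the idea concrete, here is a minimal sketch in Rust
            | of the per-sample test (not Slug's banded, optimized
            | classification; the function name and conventions are just
            | for illustration): cast a ray in +x from the sample, solve
            | each segment's quadratic for the crossing parameters, and
            | sum the signed crossings. A nonzero total means the sample
            | is inside.
            | 
            |   fn quad_winding(p0: (f32, f32), p1: (f32, f32), p2: (f32, f32),
            |                   px: f32, py: f32) -> i32 {
            |       // Translate so the sample is at the origin.
            |       let (x0, x1, x2) = (p0.0 - px, p1.0 - px, p2.0 - px);
            |       let (y0, y1, y2) = (p0.1 - py, p1.1 - py, p2.1 - py);
            |       // y(t) = a*t^2 + b*t + c for t in [0, 1)
            |       let (a, b, c) = (y0 - 2.0 * y1 + y2, 2.0 * (y1 - y0), y0);
            |       let mut winding = 0;
            |       let mut check = |t: f32| {
            |           if (0.0..1.0).contains(&t) {
            |               let x = (x0 - 2.0 * x1 + x2) * t * t + 2.0 * (x1 - x0) * t + x0;
            |               if x > 0.0 {
            |                   // Sign of the crossing comes from dy/dt.
            |                   winding += if 2.0 * a * t + b > 0.0 { 1 } else { -1 };
            |               }
            |           }
            |       };
            |       if a.abs() < 1e-6 {
            |           if b.abs() > 1e-6 { check(-c / b); }
            |       } else {
            |           let disc = b * b - 4.0 * a * c;
            |           if disc >= 0.0 {
            |               let s = disc.sqrt();
            |               check((-b - s) / (2.0 * a));
            |               check((-b + s) / (2.0 * a));
            |           }
            |       }
            |       winding
            |   }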
        
         | korijn wrote:
         | Any alternative solutions for the problem of GPU text rendering
         | (that are not patent infringing)?
        
           | pvidler wrote:
           | You can always render the text to a texture offline as a
           | signed distance field and just draw out quads as needed at
           | render time. This will always be faster than drawing from the
           | curves, and rendering from an SDF (especially multi-channel
           | variants) scales surprisingly well if you choose the
           | texture/glyph size well.
           | 
           | A little more info:
           | 
           | https://blog.mapbox.com/drawing-text-with-signed-distance-
           | fi...
           | 
           | MIT-licensed open-source multi-channel glyph generation:
           | 
           | https://github.com/Chlumsky/msdfgen
           | 
           | The only remaining issue would be the kerning/layout, which
           | is admittedly far from simple.
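            | 
            | At draw time the per-fragment work is tiny. A rough scalar
            | model in Rust of what the shader does (in a real shader,
            | px_range would come from screen-space derivatives of the
            | sampled value, e.g. fwidth):
            | 
            |   // `sampled` is the SDF texel value in [0, 1], with 0.5 on
            |   // the glyph outline; `px_range` is how much that value
            |   // changes across one screen pixel.
            |   fn sdf_coverage(sampled: f32, px_range: f32) -> f32 {
            |       let w = 0.5 * px_range.max(1e-4);
            |       // smoothstep(0.5 - w, 0.5 + w, sampled)
            |       let t = ((sampled - (0.5 - w)) / (2.0 * w)).clamp(0.0, 1.0);
            |       t * t * (3.0 - 2.0 * t)
            |   }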
        
           | djmips wrote:
           | A signed distance field approach can be good depending on
           | what you're after.
           | https://github.com/libgdx/libgdx/wiki/Distance-field-fonts
        
             | onion2k wrote:
             | There's a great WebGL library for doing that on the web
             | using any .ttf, .otf, or .woff font - https://github.com/pr
             | otectwise/troika/tree/master/packages/t...
        
           | srgpqt wrote:
           | FOSS does not magically circumvent patents.
        
             | orhmeh09 wrote:
             | Is there a serious risk of patent enforcement in common
             | open source repositories ranging from GitHub to PPAs and
             | Linux package repositories located outside any relevant
             | jurisdictions?
        
             | korijn wrote:
             | Does that imply it's possible to implement 2D font/vector
             | graphics rendering on a GPU and end up getting burned by
             | patent law? I am having a hard time imagining they were
             | awarded such a generic patent.
             | 
             | Anyway, I will adjust my question based on your feedback.
        
         | Lichtso wrote:
         | In 2005 Loop & Blinn [0] found a method to decide if a sample /
         | pixel is inside or outside a bezier curve (independently of
         | other samples, thus possible in a fragment shader) using only a
         | few multiplications and one subtraction per sample.
          | 
          |   - Integral quadratic curve: One multiplication
          |   - Rational quadratic curve: Two multiplications
          |   - Integral cubic curve: Three multiplications
          |   - Rational cubic curve: Four multiplications
         | 
         | [0] https://www.microsoft.com/en-us/research/wp-
         | content/uploads/...
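          | 
          | For the integral quadratic case the whole per-sample test is a
          | one-liner. A minimal sketch (the vertex stage assigns
          | (u, v) = (0, 0), (1/2, 0), (1, 1) to the segment's three
          | control points and lets the rasterizer interpolate them;
          | which sign counts as "inside" depends on the segment's
          | orientation):
          | 
          |   fn inside_integral_quadratic(u: f32, v: f32) -> bool {
          |       // f(u, v) = u^2 - v: one multiplication, one subtraction.
          |       u * u - v < 0.0
          |   }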
        
           | vg_head wrote:
            | It's referenced in Slug's algorithm description paper [1].
            | The main disadvantage of Loop-Blinn is the triangulation
            | step that is required, and at small text sizes you lose a
            | bit of performance; Slug only needs to render a quad for
            | each glyph. That is not to say that either method is better
            | than the other, though! They both have advantages and
            | disadvantages. I think the two most advanced techniques for
            | rendering vector graphics on the GPU are "Massively Parallel
            | Vector Graphics" [2] and "Efficient GPU Path Rendering Using
            | Scanline Rasterization" [3], though I don't know of any
            | well-known usage of them. Maybe it's because they're very
            | hard to implement: the sources attached to them are not
            | trivial to understand, even if you've read the papers. They
            | also use OpenCL/CUDA if I remember correctly.
           | 
           | [1] "GPU-Centered Font Rendering Directly from Glyph
           | Outlines" http://jcgt.org/published/0006/02/02/
           | 
           | [2] http://w3.impa.br/~diego/projects/GanEtAl14/
           | 
           | [3] http://kunzhou.net/zjugaps/pathrendering/
           | 
           | EDIT: I've only now seen that [2] and [3] are already
           | mentioned in the article
           | 
           | EDIT2: To compensate for my ignorance, I will add that one of
           | the authors of MPVG has a course on rendering vector
           | graphics: http://w3.impa.br/~diego/teaching/vg/
        
             | Lichtso wrote:
              | If I understand correctly, the second link is basically an
              | extension of Loop-Blinn's implicit curve approach with
              | vector textures in order to find the winding count for
              | each fragment in one pass.
              | 
              | >> Slug only needs to render a quad for each glyph.
              | 
              | I don't know how many glyphs you want to render (to the
              | point that there are so many that you can't read them
              | anymore), but modern GPUs are heavily optimized for
              | triangle throughput, so 2 or 20 triangles per glyph makes
              | only a little difference. The bigger problem is usually the
              | sample fill rate and memory bandwidth (especially if you
              | have to write to pixels more than once).
             | 
              | I have been eyeing the scanline-intersection-sort approach
             | (your third link) too. Sadly they have no answer to path
             | stroking (same as everybody else) and it also requires an
             | efficient sorting algorithm for the GPU (implementations of
             | such are hard to come by outside of CUDA, as you
             | mentioned).
        
               | vg_head wrote:
                | Indeed, most techniques that target the GPU have no
                | answer to stroking; they recommend converting strokes to
                | filled paths beforehand so the result looks stroked.
               | 
               | And yes, the number of triangles doesn't really make a
               | difference in general, but in Slug's paper they say:
               | 
               | "At small font sizes, these triangles can become very
               | tiny and decrease thread group occupancy on the GPU,
               | reducing performance"
               | 
               | I'm not experienced enough to say how true that is/how
               | much of a difference it makes.
               | 
               | > If I understand correctly the second link is basically
               | an extension of Loop-Blinns implicit curve approach with
               | vector textures in order to find the winding counter for
               | each fragment in one pass.
               | 
                | I've read the paper, but to be honest it's a bit over my
                | head right now. AFAIK MPVG is an extension of this
                | [1], which looks like it's an extension of Loop-Blinn
                | itself, so I think you're right.
               | 
               | [1] "Random-Access Rendering of General Vector Graphics"
               | http://hhoppe.com/ravg.pdf
        
       | skohan wrote:
        | Is this approach novel? For instance, does Apple's approach to
        | native UI rendering do the rendering on the CPU, or use a 3D
        | renderer?
        
         | stephen_g wrote:
          | The author mentions Pathfinder, the GPU font renderer from
          | Servo, a lot, so there do seem to be existing systems that do
          | things that way.
          | 
          | I'm not 100% sure about Apple's and others' approach
          | though - definitely when compositing desktop environments were
          | new, the way it was done was to software-render UI elements
          | into textures and then use the GPU to composite it all
          | together. I assume more is being done on the GPU now, but it
          | may not actually be all that performance critical for regular
          | UIs (he talks about things like CAD which are more performance
          | sensitive).
        
         | Animats wrote:
         | He's describing roughly the feature set of Flash, which is a
         | system for efficiently putting 2D objects on top of other 2D
         | objects.
        
           | skohan wrote:
            | No, I mean: is the approach of doing this on the GPU
            | actually novel?
        
             | Animats wrote:
             | Games often use the GPU for their 2D elements. It's
             | inefficient to do a window system that way, because you
             | have to update on every frame, but if you're updating the
             | whole window on every frame anyway, it doesn't add cost. As
             | the original poster points out, it does run down the
             | battery vs. a "window damage" approach.
        
         | CyberRabbi wrote:
         | Depends on what you mean by novel. No other "mainstream" API
         | that implements the traditional 2D imaging model popularized by
         | Warnock et al. with PostScript is implemented this way, except
         | for Pathfinder. Apple does all 2D drawing operations on the CPU
         | and composites distinct layers using the GPU. This does a lot
         | more work on the GPU.
        
           | garethrowlands wrote:
           | What about Direct2D? Surely Windows counts as mainstream? The
           | docs are from 2018, https://docs.microsoft.com/en-
           | us/windows/win32/direct2d/comp...
        
             | CyberRabbi wrote:
             | > Rendering method
             | 
             | > In order to maintain compatibility, GDI performs a large
             | part of its rendering to aperture memory using the CPU. In
             | contrast, Direct2D translates its APIs calls into Direct3D
             | primitives and drawing operations. The result is then
              | rendered on the GPU. Some of GDI's rendering is performed
             | on the GPU when the aperture memory is copied to the video
             | memory surface representing the GDI window.
             | 
             | I can't say for certain but I think the main point being
             | communicated here is that Direct2D uses 3D driver
             | interfaces to get its pixels on the screen. Not necessarily
             | that it renders the image using the GPU. I could be wrong.
        
           | The_rationalist wrote:
           | What about skia?
        
             | Jasper_ wrote:
             | Skia renders paths on the CPU. There was a prototype of a
             | GPU based approach called skia-compute but it was removed a
             | few years ago. I believe some parts of skia can use SDFs
             | for font rendering, but that's only really accurate at
             | small sizes.
        
               | raphlinus wrote:
               | The skia-compute project is now Spinel, and is under
               | Fuchsia. It is very interesting, perhaps the fastest way
               | to render vector paths on the GPU, but the code is almost
               | completely inscrutable, and it has lots of tuning
               | parameters for specific GPU hardware, so porting is a
               | challenge.
               | 
               | Skia has a requirement that rendering of paths cannot
               | suffer from conflation artifacts (though compositing
               | different paths can), as they don't want to regress on
               | any existing web (SVG, canvas) content. That's made it
               | difficult to move away from their existing software
               | renderer which is highly optimized and deals with this
               | well. Needless to say, I consider that an interesting
               | challenge.
        
               | The_rationalist wrote:
               | Wow the codebase looks quite small for such ambitions!
        
         | Jasper_ wrote:
          | Apple has an incredibly fast software 2D renderer (Quartz 2D)
          | and a limited GPU 2D renderer and compositor (Quartz
          | Compositor).
         | Doing PostScript rendering on the GPU is still an active
         | research project. And Raph is doing some of that research!
        
           | bumbada wrote:
           | That was in the past.
           | 
            | Apple has one of the best systems for drawing 2D, and it is
            | accelerated on the GPU. It is used by Apple Maps, and it
            | made its offline maps much better than Google's.
            | 
            | But Apple is relying on trade secrets here; they are not
            | publishing it for everyone to copy.
        
             | Pulcinella wrote:
              | Do you have a source for this? I would like to know more.
        
       | klaussilveira wrote:
        | Loosely related: Blend2D has been innovating a lot in this space.
       | 
       | https://blend2d.com/
        
       | Const-me wrote:
       | The quality is not good, and the performance is not even
       | mentioned.
       | 
        | I have used a different approach: https://github.com/Const-
       | me/Vrmac#vector-graphics-engine
       | 
       | My version is cross-platform, tested with Direct3D 12 and GLES
       | 3.1.
       | 
        | My version does not treat the GPU as a SIMD CPU; it actually
        | treats it as a GPU.
       | 
        | When rendering a square without anti-aliasing, the library will
        | render 2 triangles. When rendering a filled square with anti-
        | aliasing, the library will render about 10 triangles: a large
        | opaque square in the center, and a thin border about 1 pixel
        | thick around it for AA.
       | 
        | It uses the hardware Z buffer with early Z rejection to save
        | pixel shader invocations and fill rate. It uses screen-space
        | derivatives in the pixel shader for anti-aliasing. It renders
        | arbitrarily complex 2D scenes with only two draw calls: one
        | front to back with opaque stuff, another back to front with
        | translucent stuff. It does not have quality issues with stroked
        | lines much thinner than 1px.
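        | 
        | Roughly, the per-pixel work in that thin AA border looks like
        | this (a scalar Rust model of the idea, not the actual HLSL; in
        | the shader the footprint comes from ddx/ddy):
        | 
        |   // signed_dist: distance from the pixel center to the shape
        |   // edge, negative inside; footprint: how much that distance
        |   // changes across one pixel (from screen-space derivatives).
        |   fn edge_coverage(signed_dist: f32, footprint: f32) -> f32 {
        |       (0.5 - signed_dist / footprint.max(1e-6)).clamp(0.0, 1.0)
        |   }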
        
         | fho wrote:
         | Dumb question ... but isn't the easiest AA method rendering at
         | a higher resolution and downsampling the result?
         | 
         | I see that it's not feasible for a lot of complex 3D graphics,
         | but 2D is (probably) a lot less taxing for modern GPUs?
        
           | dahart wrote:
           | High quality supersampling is about equally hard in 2D and
           | 3D, since the result is 2D. It is also the most common
           | solution in both 2D and 3D graphics, so your instinct is
           | reasonably good.
           | 
           | But, font and path rendering, for example in web browsers
           | and/or with PDF or SVG - these things can benefit in both
           | efficiency and in quality from using analytic antialiasing
           | methods. 2D vector rendering is a place where doing something
           | harder has real payoffs.
           | 
           | Just a fun aside - not all supersampling algorithms are
            | equally good. If you use the wrong filter, it can be very
           | surprising to discover that there are ways you can take a
           | million samples per pixel or more and never succeed in
           | getting rid of aliasing artifacts. (An example is if you just
           | average the samples, aka use a Box Filter.) I have a 2D
           | digital art project to render mathematical functions that can
           | have arbitrarily high frequencies. I spent money making large
           | format prints of them, so I care a lot about getting rid of
           | aliasing problems. I've ended up with a Gaussian filter,
           | which is a tad blurrier than experts tend to like, because
           | everything else ends up giving me visible aliasing somewhere.
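            | 
            | A tiny sketch of what "use a better filter" means in code
            | (hypothetical Rust, with sub-sample offsets in pixel units;
            | in practice the filter support also extends into
            | neighboring pixels):
            | 
            |   // Weight each sub-sample by a Gaussian centered on the
            |   // pixel instead of plainly averaging them (box filter).
            |   fn filtered_pixel(samples: &[(f32, f32, f32)], sigma: f32) -> f32 {
            |       let (mut sum, mut wsum) = (0.0, 0.0);
            |       for &(dx, dy, v) in samples {
            |           let w = (-(dx * dx + dy * dy) / (2.0 * sigma * sigma)).exp();
            |           sum += w * v;
            |           wsum += w;
            |       }
            |       sum / wsum
            |   }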
        
           | [deleted]
        
           | ishitatsuyuki wrote:
           | piet-gpu contributor here. You're right that supersampling is
           | the easiest way to achieve AA. However, its scalability
           | issues are immense: for 2D rendering it's typically
           | recommended to use 32x for a decent quality AA, but as the
           | cost of supersampling scales linearly (actually superlinearly
            | due to memory/register pressure), it becomes more than an
            | order of magnitude slower than the baseline. So if you want
            | to do anything real-time (e.g. smooth page zoom without
            | resorting to prerendered textures, which become blurry),
            | supersampling is mostly an unacceptable choice.
           | 
           | What is more practical is some form of adaptive
           | supersampling: a lot of pixels are filled by only one path
            | and don't require supersampling. There are also more
            | heuristics that can be used: one that I want to try out in
            | piet-gpu is to exploit the fact that in 2D graphics, most
            | pixels are covered by at most two paths. So as a baseline
            | we can track only two values per pixel plus a coverage
            | mask, then in the rare cases of three or more shapes
            | overlapping, fall back to full supersampling. This should
            | keep the cost amplification more under control.
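            | 
            | As a data-structure sketch of that idea (hypothetical Rust,
            | not piet-gpu code; 16 samples per pixel assumed):
            | 
            |   // Track up to two path contributions plus which samples
            |   // each covers; only a third overlapping path forces the
            |   // expensive per-sample fallback.
            |   enum PixelAccum {
            |       Slots {
            |           color: [[f32; 4]; 2], // RGBA per contribution
            |           mask: [u16; 2],       // 16-sample coverage masks
            |           used: u8,
            |       },
            |       Full(Box<[[f32; 4]; 16]>), // rare full supersampling
            |   }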
        
           | Const-me wrote:
           | The short answer -- it depends.
           | 
           | What you described is called super-sampling. Supersampling is
           | indeed not terribly hard to implement, the problem is
            | performance overhead. Many parts of the graphics pipeline
            | scale linearly with the number of pixels. If you render at
            | 16x16 upscaled resolution, that's going to result in 256x
            | more pixel shader invocations and 256x the fill rate.
           | 
           | There's a good middle ground called MSAA
           | https://en.wikipedia.org/wiki/Multisample_anti-aliasing In
           | practice, 16x MSAA often delivers very good results for both
            | 3D and 2D. In the case of 2D, even low-end PC GPUs are fast
            | enough at the 8x or 16x MSAA level.
           | 
           | Initial versions of my library used that method.
           | 
            | The problem was that the Raspberry Pi 4 GPU is way slower
            | than PC GPUs. The performance with 4x or 2x MSAA was too low
            | for 1920x1080 resolution, even just for 2D. Maybe the
            | problem is the actual hardware, maybe it's a performance bug
            | in the Linux kernel
           | or GPU drivers, I have no idea. I didn't want to mess with
           | the kernel, I wanted a library that works fast on officially
           | supported 32-bit Debian Linux. That's why I bothered to
           | implement my own method for antialiasing.
        
             | esperent wrote:
             | > Maybe the problem is actual hardware
             | 
              | I think it is - as far as I know most modern GPUs implement
              | MSAA at the hardware level, and that's why even a mobile
              | GPU can handle 8x MSAA at 1080p.
             | 
             | I don't know anything about the Raspberry Pi GPU, but maybe
             | you'd have better results switching to FXAA or SMAA there
             | (by which I mean faster, not visually better).
        
               | Const-me wrote:
               | > maybe you'd have better results switching to FXAA or
               | SMAA there
               | 
               | I've thought about that, but decided I want to prioritize
               | the quality. 16x MSAA was almost perfect in my tests, 8x
               | MSAA was still good. With levels lower than that, the
               | output quality was noticeably worse than Direct2D, which
               | was my baseline.
               | 
                | And another thing: I have found that stroked lines much
                | thinner than 1px after transforms need special handling;
                | otherwise they don't look good: too aliased, too thick,
                | and/or causing temporal artifacts while zooming
                | continuously. Some real-world vector art, like the
                | Ghostscript tiger from Wikipedia, has quite a lot of
                | such lines.
        
         | skohan wrote:
          | Does your approach allow for quality optimizations like
          | subpixel rendering for arbitrary curves? It seems like this is
         | what is interesting about this approach.
         | 
         | Also in terms of "two draw calls", does that include drawing
         | text as part of your transparent pass, or are you assuming that
         | your window contents are already rendered to textures?
        
           | Const-me wrote:
           | > Does your approach allow for quality optimizations like sub
           | pixel rendering for arbitrary curves?
           | 
           | The library only does grayscale AA for vector graphics.
           | 
            | Subpixel rendering is implemented for text but comes with
            | limitations: it only works when text is not transformed (or
            | transformed in specific ways, like rotated 180deg or
            | horizontally flipped), and you need to pass the background
            | color behind the text to the API. It will only look good if
            | the text is on a solid color background or a slowly changing
            | gradient.
           | 
            | Subpixel AA is hard for arbitrary backgrounds. I'm not sure
           | many GPUs support the required blend states, and workarounds
           | are slow.
           | 
           | > does that include drawing text as part of your transparent
           | pass
           | 
            | Yes, that includes text and bitmaps. Here are two HLSL
            | shaders that do that; the GPU abstraction library I've picked
           | recompiles the HLSL into GLSL or others on the fly:
           | https://github.com/Const-
           | me/Vrmac/blob/master/Vrmac/Draw/Sha...
           | https://github.com/Const-
           | me/Vrmac/blob/master/Vrmac/Draw/Sha... These shaders are
            | compiled multiple times with different sets of preprocessor
           | macros, but I did test with all of them enabled at once.
        
         | glowcoil wrote:
         | > The quality is not good, and the performance is not even
         | mentioned.
         | 
         | I notice that your renderer doesn't even attempt to render text
         | on the GPU and instead just blits glyphs from an atlas texture
         | rendered with Freetype on the CPU: https://github.com/Const-
         | me/Vrmac/blob/master/Vrmac/Draw/Sha...
         | 
         | In contrast, piet-gpu (the subject of the original blog post)
         | has high enough path rendering quality (and performance) to
         | render glyphs purely on the GPU. This makes it clear you didn't
         | even perform a cursory investigation of the project before
         | making a comment to dump on it and promote your own library.
        
           | Const-me wrote:
           | > instead just blits glyphs from an atlas texture rendered
           | with Freetype on the CPU
           | 
           | Correct.
           | 
           | > has high enough path rendering quality (and performance) to
           | render glyphs purely on the GPU
           | 
            | Do you have screenshots showing quality, and performance
            | measurements showing speed? Ideally from a Raspberry Pi 4?
           | 
           | > This makes it clear you didn't even perform a cursory
           | investigation of the project
           | 
           | I did, and mentioned in the docs, here's a quote: "I didn't
           | want to experiment with GPU-based splines. AFAIK the research
           | is not there just yet." Verifiable because version control:
           | https://github.com/Const-
           | me/Vrmac/blob/bbe83b9722dcb080f1aed...
           | 
           | For text, I think bitmaps are better than splines. I can see
           | how splines are cool from a naive programmer's perspective,
           | but practically speaking they are not good enough for the
           | job.
           | 
            | Vector fonts are not resolution independent, because of
            | hinting. Fonts include bytecode of compiled programs that do
            | that. GPUs are massively parallel vector chips, not a good
            | fit for interpreting the bytecode of a traditional
            | programming language. This means whatever splines you upload
            | to the GPU will only be correct for a single font size;
            | trying to reuse them at a different resolution will cause
            | artifacts.
           | 
            | Glyphs are small and contain lots of curves: lots of data to
            | store, and lots of math to render, for a comparatively small
            | count of output pixels. Copying bitmaps is very fast; modern
            | GPUs, even low-power mobile and embedded ones, are designed
            | to output a ridiculous volume of textured triangles per
            | second. Font face and size are more or less consistent
            | within a given document/page/screen. Apart from synthetic
            | tests, glyphs are reused a lot, and there aren't too many of
            | them.
           | 
            | When I started the project, the very first support for
            | compute shaders on the Pi 4 had just been introduced in the
            | Mesa upstream repo. It was not yet in the official OS
            | images. Bugs are very likely in version 1.0 of anything at
            | all.
           | 
            | Finally, even if the Pi 4 had awesome support for compute
            | shaders back then, the raw compute power of the GPU is not
            | that impressive. Here in my Windows PC, my GPU is 30 times
            | faster than the CPU in terms of raw FP32 performance. With
            | that kind of performance gap, you can probably make GPU
            | splines work fast enough after spending enough time on
            | development. Meanwhile, on the Pi 4 there's no difference:
            | the quad-core CPU has raw performance pretty close to the
            | raw performance of the GPU. To a lesser extent the same
            | applies to low-end PCs: I only have a fast GPU because I'm
            | a graphics programmer; many people are happy with their
            | Intel UHD graphics, and those are not necessarily faster
            | than their CPUs.
        
         | davrosthedalek wrote:
         | If you render two anti-aliased boxes next to each other (i.e.
         | they share an edge), will you see a thin line there, or a solid
         | fill? Last time I checked, cairo-based PDF readers get this
         | wrong, for example.
        
           | Const-me wrote:
           | Good question.
           | 
           | If you use the fillRectangle method https://github.com/Const-
            | me/Vrmac/blob/1.2/Vrmac/Draw/iDrawC... to draw them, I think
            | you should get a solid fill. That particular API doesn't use
            | AA for that shape. Modern GPU hardware, with its rasterization
           | rules https://docs.microsoft.com/en-
           | us/windows/win32/direct3d11/d3... is good at that use case,
           | otherwise 3D meshes would contain holes between triangles.
           | 
            | If you render them as 2 distinct paths, filled, not stroked,
            | and anti-aliased, you are indeed going to see a thin line
            | between them. Currently, my AA method shrinks filled paths
            | by about 0.5 pixels. For stroked paths it's the opposite,
            | BTW: the output is inflated by half of the stroke width (the
            | midpoint of the stroke corresponds to the source geometry).
           | 
           | You can merge boxes into a single path with 2 figures
           | https://github.com/Const-
            | me/Vrmac/blob/1.2/Vrmac/Draw/Path/P...; in this case the
            | library's C++ code should collapse the redundant inner edge
            | and the output will be identical to a single box, i.e. a
            | solid fill. It will also render slightly faster because
            | there are fewer triangles in the mesh.
        
       | slmjkdbtl wrote:
        | I always appreciate Raph's work on rendering and UI programming,
        | but I want to ask a somewhat unrelated question: does anyone
        | have a lot of experience doing 2D graphics on the CPU? I wonder
        | if there'll be a day when we're confident doing all 2D stuff on
        | the CPU, since CPUs are much easier to work with and give much
        | more control. I've also read that some old 3D games used
        | software rendering and did well on old hardware, which gave me a
        | lot of confidence in software rendering every (lowrez) thing.
        
         | bumbada wrote:
          | I have lots of experience; I've created several font renderers
          | on the CPU and GPU.
          | 
          | No, doing CPU drawing is too inefficient. Without too much
          | trouble you get 50x more efficiency on the GPU; that is, you
          | can draw 50 frames for every frame on the CPU, using the same
          | amount of energy and time.
          | 
          | If you go deeper, low-level with Vulkan/Metal, and especially
          | control the GPU memory, you can get 200x, though it's way
          | harder to program.
         | 
         | CPU drawing is very useful for testing: You create the
         | reference you compare the GPU stuff with.
         | 
         | CPU drawing is the past, not the future.
        
           | slmjkdbtl wrote:
            | Thanks for the write-up. Yeah, I can see the huge performance
            | difference. One thing that bothers me about GPUs is that now
            | every vendor provides a different API, so you pretty much
            | have to use a hardware abstraction layer if you really want
            | cross-platform support, and that's often a huge effort or
            | dependency and hard to do right. Even in the OpenGL days it
            | was easier because you only had to deal with one API instead
            | of three (Vulkan/Metal/D3D). With the CPU, if you ignore the
            | lack of performance, it's just a plain pixel array that can
            | be easily displayed in any environment, and you have control
            | over every bit of it. I just can't get over the difference
            | in lightness and elegance between the two.
        
             | bumbada wrote:
             | Vulkan/metal/d3d give you control to do things you could
             | not do with OpenGL, and they are very similar. All my code
             | works first in Vulkan and Metal, but d3d is not hard when
             | you have your vulkan code working.
             | 
              | OpenGL was designed by committee and politics (like not
              | giving you the option to compile shaders for a long time
              | while D3D could do it).
              | 
              | The hard part is thinking in terms of extreme parallelism.
              | That is super hard.
             | 
              | Once you have something working on the GPU, you can
              | translate it to electronics, e.g. using FPGAs.
             | 
              | The difference in efficiency is so big that most GPU
              | approaches do not really work. They are really
             | approximations that fail with certain glyphs. Designers
             | create a model that works with most glyphs, and sometimes
             | they have a fallback, inefficient method for them.
             | 
             | With the CPU you can calculate the area under a pixel
             | exactly.
        
         | fulafel wrote:
          | The future is now: on many systems there's fairly little GPU
          | acceleration going on when running e.g. a web browser, and
          | things work fine.
        
         | Jasper_ wrote:
          | Yes? We know how to write scanline renderers for 2D graphics.
          | They're not that hard; a simple one can be done in ~100 lines
          | of code or so. See my article here:
          | https://magcius.github.io/xplain/article/rast1.html
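          | 
          | The core of such a renderer fits in a screenful. A toy
          | even-odd scanline fill in Rust (the article's version also
          | handles anti-aliasing properly):
          | 
          |   // For each scanline, collect the x coordinate of every edge
          |   // crossing, sort them, and fill between alternating pairs.
          |   fn fill_polygon(pts: &[(f32, f32)], w: usize, h: usize, buf: &mut [u8]) {
          |       for y in 0..h {
          |           let yc = y as f32 + 0.5;
          |           let mut xs = Vec::new();
          |           for i in 0..pts.len() {
          |               let (x0, y0) = pts[i];
          |               let (x1, y1) = pts[(i + 1) % pts.len()];
          |               if (y0 <= yc) != (y1 <= yc) {
          |                   xs.push(x0 + (yc - y0) * (x1 - x0) / (y1 - y0));
          |               }
          |           }
          |           xs.sort_by(|a, b| a.partial_cmp(b).unwrap());
          |           for pair in xs.chunks(2) {
          |               if let [xa, xb] = pair {
          |                   let (s, e) = (xa.max(0.0) as usize, xb.min(w as f32) as usize);
          |                   for x in s..e { buf[y * w + x] = 255; }
          |               }
          |           }
          |       }
          |   }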
        
           | slmjkdbtl wrote:
            | Thanks for the amazing article! I wonder if you ran into any
            | performance annoyances / bottlenecks when doing actual GUI /
            | game dev with this?
        
             | iainmerrick wrote:
             | You just need to render your font on the CPU once, and
             | upload mipmaps to the GPU (either signed distance fields,
             | or just sharpened textures, that works absolutely fine
             | too).
             | 
             | I think all this GPU font rendering stuff is a bit of a red
             | herring. Are any popular apps or games actually making
             | heavy use of it?
        
           | olau wrote:
           | Enjoyable article, thanks!
        
       | choxi wrote:
       | I spent a couple years learning graphics programming to build an
        | iPad app for creating subtle animated effects. The idea was kind
        | of like Procreate, but with "animated brushes" that produced
        | glimmer or other types of looping animated effects.
       | 
       | From what I've read, the technique behind most digital brushes is
       | to render overlapping "stamps" over the stroke. They're spaced
       | closely enough that you can't actually see the stamps.
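        | 
        | A small sketch of that stamping step (hypothetical Rust; the
        | spacing is usually a small fraction of the brush diameter so the
        | stamps blend into a continuous stroke):
        | 
        |   // Walk the stroke polyline and emit a stamp position every
        |   // `spacing` units of arc length.
        |   fn stamp_positions(stroke: &[(f32, f32)], spacing: f32) -> Vec<(f32, f32)> {
        |       let (mut out, mut carry) = (Vec::new(), 0.0);
        |       for seg in stroke.windows(2) {
        |           let ((x0, y0), (x1, y1)) = (seg[0], seg[1]);
        |           let (dx, dy) = (x1 - x0, y1 - y0);
        |           let len = (dx * dx + dy * dy).sqrt();
        |           let mut d = carry;
        |           while d < len {
        |               let t = d / len;
        |               out.push((x0 + t * dx, y0 + t * dy));
        |               d += spacing;
        |           }
        |           carry = d - len;
        |       }
        |       out
        |   }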
       | 
       | But if you want to animate the stamps, you either have to store
       | the stroke data as a very large sequence of stamp meshes or you
       | can only work with the data in a raster format. The former is way
       | too many meshes even with instancing, and the latter loses a lot
       | of useful information about the stroke. Imagine you wanted to
        | create a brush where the edges of the stroke kind of pulsate like
        | a laser beam; you ideally want to store that stroke data in a
       | vector format to make it easier to identify e.g. centers and
       | edges.
       | 
       | But it turned out to be too challenging for me to figure out how
       | to 1) build a vector representation of a stroke/path without
       | losing some of the control over detail you get with the stamping
       | technique and 2) efficiently render those vectors on the GPU.
       | 
       | I'm not sure if this would help with the issues I ran into, but
       | I'm definitely excited to see some focus on 2D rendering
       | improvements!
        
       | ink_13 wrote:
       | This would have been a lot better with examples in the form of
       | rendered images or perhaps even a video. Maybe it's just my lack
       | of background in graphics, but I had a lot of trouble grasping
       | what the author was attempting to communicate without a concrete
       | example.
        
         | eternalban wrote:
          | It's about taking known 2D graphics and UI approaches, which
          | were developed for CPUs, and looking at effective rendering
          | engine architectures doing the same using GPUs. Terms such as
          | "scene graph", "retained mode UI", etc. come from that
          | existing 2D graphics work.
          | 
          | So the approach, afaiu, is a data layout for the scene graph
          | that is basically the more domain-general concern of mapping
          | graph data structures, e.g. linked lists (which are CPU
          | friendly), to array forms (GPU friendly) suitable for parallel
          | treatment. Other GPU concerns also apply, such as minimizing
          | global traffic via local caching and mapping thread groups to
          | tiles. I found the idea of having the scene graph resident on
          | the GPU interesting.
         | 
          | (note to author: "serialization" comes from the networking
          | roots of serializing a data structure for transmission over
          | the net. So, definitely serial. /g)
        
         | MattRix wrote:
         | It wouldn't have been better, it just would have been more high
         | level and generalized, but I don't think that's what the author
         | was going for. I found the amount of detail refreshing, and as
         | someone about to make a GPU based 2D display tree renderer, it
         | was written at just the right level to be quite useful.
        
         | Agentlien wrote:
         | I understand where you are coming from. There is a lot of
         | jargon and it assumes familiarity with many concepts. I think
         | any explanatory images which would help someone unfamiliar with
         | the field would need to be accompanied by quite a bit of
         | explanation.
         | 
         | One thing which I think made reading this a bit more work than
         | necessary is that it feels like it's prattling on about a lot
         | of tangential details and never quite gets to the point.
         | 
         |  _edit: a prime example of an unnecessary aside is mentioning
          | the warp/wavefront/subgroup terminology. I feel anyone in the
         | target audience should know this already and it's not really
         | relevant to what's being explained._
        
           | skohan wrote:
           | It doesn't seem to be a finished work. I guess this is more
           | of a journal entry on the author's initial takeaways from a
           | week-long coding retreat.
        
       ___________________________________________________________________
       (page generated 2021-03-15 23:02 UTC)