[HN Gopher] 2D Graphics on Modern GPU (2019)
___________________________________________________________________
2D Graphics on Modern GPU (2019)
Author : peter_d_sherman
Score : 165 points
Date : 2021-03-15 03:31 UTC (19 hours ago)
(HTM) web link (raphlinus.github.io)
(TXT) w3m dump (raphlinus.github.io)
| raphlinus wrote:
| Hi again! This post was an early exploration into GPU rendering.
| The work continues as piet-gpu, and, while it's not yet a
| complete 2D renderer, there's good progress, and also an active
| open source community including people from the Gio UI project.
| I recently implemented nested clipping (which can be
| generalized to blend modes) and have a half-finished blog post
| draft about it. This is also part of my work as a researcher on
| the Google Fonts team. Feel free to ask questions in this thread - I
| probably won't follow up with everything, as the discussion is
| pretty sprawling.
| lame88 wrote:
| One small error (I think) - I noticed your link to pathfinder
| linked to someone's 2020 fork of the repository rather than the
| upstream servo repository.
| littlestymaar wrote:
| > someone's 2020 fork
|
| pcwalton was the developer behind pathfinder at Mozilla (but
| was part of last summer's layoffs).
| danybittel wrote:
| It would be fantastic if something like this were part of the
| modern APIs: Vulkan, Metal, DX12. But I guess it's not as sexy as
| raytracing.
| moonchild wrote:
| Nvidia tried to make it happen[0].
|
| Sadly, it didn't catch on.
|
| 0. https://developer.nvidia.com/nv-path-rendering
| slimsag wrote:
| It's still there on all NVIDIA GPUs as an extension, just
| nobody uses it.
|
| IMO it didn't catch on because of all three of these points:
|
| 1. It only works on NVIDIA GPUs, and is riddled with joint
| patents from NVIDIA and Microsoft forbidding anyone like AMD
| or Intel from supporting it.
|
| 2. It's hard to use: you need to get your data into a format
| it can consume, usage is non-trivial, and often video game
| artists are already working with rasterized textures anyway
| so it's easy to omit.
|
| 3. Vector graphics suck for artists. The number of graphic
| designers I have met (who are the most likely subset of
| artists to work with vector graphics) who simply hate or do
| not understand the cubic bezier curve control points in
| Adobe, Inkscape, and other tools is incredible.
| redisman wrote:
| > Vector graphics suck for artists
|
| Someone mentioned Flash in this thread and that was a very
| approachable vector graphics tool. I don't know how many
| games translate to the vector style though - it's almost
| invariably a cartoonish look. The tools are very geometric
| so it just kind of nudges you towards that. Pixels these
| days are more like painting so it's no surprise artists
| like that workflow (they all secretly want to be painting
| painters).
| monsieurbanana wrote:
| > they all secretly want to be painting painters
|
| This is going to be literally life changing for them!
| Quick, someone inform the artists of that!
| derefr wrote:
| > It only works on NVIDIA GPUs, and is riddled with joint
| patents from NVIDIA and Microsoft forbidding anyone like
| AMD or Intel from supporting it.
|
| Why do companies do this? What do they expect to get out of
| creating an API that is proprietary to their specific non-
| monopoly device, and that therefore very obviously nobody
| will ever actually use?
| numlock86 wrote:
| Well, you could always implement a fallback for
| unsupported devices and advertise better performance
| on supported devices. Implementation will usually be
| "sponsored" by the one providing the technology. If the
| feature then gets adopted enough others will have to come
| up with something similar or support it natively, too. This
| happens all the time with GPU and CPU vendors, both in
| gaming and industry.
| derefr wrote:
| Maybe, but I got the sense from the article that what
| NVIDIA has patented are the semantics of the API itself
| (i.e. the types of messages sent between the CPU and
| GPU.) A polyfill for the API might still be infringing,
| given that it would need to expose the same datatypes on
| the CPU side.
|
| And even if it wasn't, the path-shader semantics are _so_
| different from those of regular 2D UI frameworks that a
| 2D UI framework built on path-shading, falling back to
| the polyfill, might perform much _worse_ than one
| implemented using regular CPU-side 2D plotting. It would
| very likely also suffer from issues like flicker, judder,
| etc., which are much worse/more noticeable than just
| "increased CPU usage".
| my123 wrote:
| Intel and NVIDIA had a patent cross-licensing agreement
| at the time, which is still ongoing for all patents filed
| on or prior to March 31, 2017.
|
| https://www.sec.gov/Archives/edgar/data/1045810/000119312
| 511...
|
| Patents here are used for mutually assured destruction in
| case one tries to sue the other; NVIDIA won't use them
| against AMD (and the reverse is true too) where GPUs are
| concerned.
|
| Intel and AMD would however enforce their patents if
| NVIDIA tries to enter the x86 CPU market.
| adwn wrote:
| > _What do they expect to get out of creating an API that
| is proprietary to their specific non-monopoly device, and
| that therefore very obviously nobody will ever actually
| use?_
|
| You mean like CUDA, which is wildly successful, has a
| huge ecosystem, and which basically ensures you'll have
| to buy NVIDIA if you're serious about GPU computing?
| derefr wrote:
| CUDA is, essentially, a B2B product-feature.
|
| A business is flexible when serving its own needs: if
| they want to do GPGPU computations, they can evaluate the
| alternatives and _choose_ a platform /technology like
| CUDA to build on; and so _choose_ to lock _themselves_
| into the particular GPUs that implement that technology.
| But that's a choice they're only making for themselves.
| They're using those GPUs to compute something; they're
| not forcing anyone _else_ downstream of them to use those
| same GPUs to _consume_ what they produce. The GPUs
| they're using become part of their black box.
|
| Path shaders, on the other hand, would--at least as
| described in the article--be chiefly a B2C / end-user
| product-feature. They'd be something games/apps would
| rely on to draw 2D UI elements. But path shaders would
| need to be _implemented_ + by the developers creating
| those games/apps.
|
| + (Yes, really, they _would_ need to be implemented by
| the game/app developer, not game-engine framework devs.
| Path shaders are a low-level optimization, like Vulkan --
| something that is only advantageous to use when you use
| it directly to take advantage of its low-level semantics,
| rather than through a framework that tries to wrap it in
| some other, more traditional semantics. As with OpenGL,
| the ill-fitted abstractions of the traditional API were
| precisely what made it slow!)
|
| And those game/app developers, unlike the developers
| doing GPGPU computations, have to make a decision based
| on _what devices their customers have_, rather than what
| _their own company_ can buy. (Or, if they think they have
| a "killer app", they can try to _force_ their customers
| to buy into the platform they built on. But most
| companies who think they've built a B2C killer app,
| haven't.)
|
| For a B2B product-feature, a plurality position is fine.
| Even one forced to be permanent by patents.
|
| For a B2C product-feature, a plurality position _by
| itself_ is fine, because rivals will just ship something
| similar and offer cross-support. But an _indefinite_
| plurality position is death.
|
| Compare, in the web ecosystem:
|
| * Features shipped originally in one renderer as "clean,
| standalone" ideas, e.g. -webkit-border-radius. These were
| cloned (-moz-border-radius, etc.) and then standardized
| (plain old border-radius.)
|
| * Features shipped in one renderer as implementations
| dependent on that renderer's particular
| environment/ecosystem, e.g. IE's ActiveX-based CSS
| "filter" stanza. Because of how they were implemented,
| these had no potential to ever be copied by rivals, so
| _nobody outside of Microsoft bothered to use them on
| their sites_, since they knew that only IE would render
| the results correctly, and IE at that point already had
| only a plurality position.
| d110af5ccf wrote:
| > What do they expect to get out of ...
|
| Nvidia did exactly this with CUDA, so go take a look at
| the ML world for an example of how it works. It seems to
| be going quite well for them. A common enough refrain is
| "I don't really _want_ to buy Nvidia, but my toolchain
| requires CUDA".
|
| Pretty much every FPGA and SoC vendor does exactly this
| as far as I understand things. It's why you can't
| trivially run an open OS on most mobile hardware.
|
| Apparently such schemes don't meaningfully affect the
| purchasing decisions of a sufficiently large fraction of
| people to disincentivize the behavior.
| my123 wrote:
| Intel and AMD also did the exact same thing with x86.
|
| And yet, people still use it. (Arm also has its fair
| share of patents, but it is an architecture that others
| can license.)
| Const-me wrote:
| > just nobody uses it
|
| AFAIK Skia and derived works (Chrome, Chromium, Electron,
| etc.) are all using it when available.
| Jasper_ wrote:
| Stencil-and-cover approaches like NV_path_rendering have a
| lot of drawbacks, as documented below, but probably biggest
| of all, they're still mostly doing the tessellation on the
| CPU. A lot of the important things, like winding mode
| calculations, are handled on the CPU. Modern research is
| looking for ways out of that.
| Lichtso wrote:
| Actually, they calculate the winding per fragment on the
| GPU [0]. They require polygon tessellation only for
| stroking (which has no winding rule). The downside of their
| approach is that it is memory bandwidth limited, precisely
| because it does the winding on the GPU instead of using CPU
| tessellation to avoid overlap / overdraw.
|
| Curve filling is pretty much solved with implicit curves,
| stencil-and-cover or scanline-intersection-sort approaches
| (all of which can be done in a GPU-only fashion). Stroking
| is where things could still improve a lot as it is almost
| always approximated by polygons.
|
| [0]: https://developer.nvidia.com/gpu-accelerated-path-
| rendering
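|
| (For readers unfamiliar with the term, "winding per fragment"
| means each covered sample ends up with a signed crossing count.
| NVIDIA accumulates it with stencil increments/decrements in a
| stencil pass rather than an explicit loop; the plain-Rust
| sketch below, over a flattened path, only shows the idea, not
| their implementation.)
|
|     // Nonzero winding of a sample (px, py) against the line
|     // segments of a flattened path: cast a ray in +x and count
|     // signed crossings.
|     fn winding(px: f32, py: f32,
|                segs: &[((f32, f32), (f32, f32))]) -> i32 {
|         let mut w = 0;
|         for &((x0, y0), (x1, y1)) in segs {
|             let up = y0 <= py && y1 > py;
|             let down = y0 > py && y1 <= py;
|             if up || down {
|                 // x where the segment crosses the horizontal ray
|                 let x = x0 + (py - y0) * (x1 - x0) / (y1 - y0);
|                 if x > px {
|                     w += if up { 1 } else { -1 };
|                 }
|             }
|         }
|         w // filled under the nonzero rule if w != 0
|     }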
| raphlinus wrote:
| I think the world is going into a more layered approach, where
| it's the job of the driver API (Vulkan etc) to expose the power
| of the hardware fairly directly, and it's the job of higher
| levels to express the actual rendering in terms of those lower
| level primitives. Raytracing is an exception because you do
| need hardware support to do it well.
|
| Whether there's hardware that can make 2D rendering work better
| is an intriguing question. The NV path rendering stuff
| (mentioned elsethread) was an attempt (though I think it may be
| more driver/API than hardware), but I believe my research
| direction is better, in that it will be higher quality, faster,
| and more flexible with respect to the imaging model on standard
| compute shaders than an approach using the NV path rendering
| primitives. Obviously I have to back that up with empirical
| measurements, which is not yet done :)
| moonchild wrote:
| Something else worth looking at: the slug font renderer[0]. Sadly
| it's patented, but the paper[1] is there for those of you in the
| EU.
|
| 0. http://sluglibrary.com/
|
| 1. http://jcgt.org/published/0006/02/02/paper.pdf
| Jasper_ wrote:
| Slug isn't great for lots of shapes since it does the winding
| order scanning per-pixel in the pixel shader. It does have a
| novel quadratic root-finder. Put simply, it's better suited to
| fonts than large vector graphics.
| vg_head wrote:
| I once implemented the basic idea behind the algorithm
| used in Slug (described in the paper [1], though without
| the 'band' optimization; I just wanted to see how it
| works), and I agree with you: the real innovation is that
| quadratic root-finder. It can tell you whether you are
| inside or outside just by manipulating the three control
| points of a curve, and it's very fast; what remains is to
| use an acceleration data structure so that you don't have
| to check every curve. That works very well for quadratic
| Bezier curves. The paper says it can easily be extended to
| cubics, though no example is provided (and I doubt it's
| trivial). What I think would be hard with Slug's method is
| extending it to draw gradients, shadows - basically general
| vector graphics, like you say. Eric Lengyel showed a demo
| on his Twitter [2] using Slug to render general vector
| graphics; I'm not sure how many features it supports, but
| it definitely supports cubic Bezier curves. I'd also add
| that the algorithm didn't impress me with how the text
| looks at small sizes, which I think is very important in
| general, though maybe not so much for games (maybe I just
| didn't implement it correctly).
|
| [1] "GPU-Centered Font Rendering Directly from Glyph
| Outlines" http://jcgt.org/published/0006/02/02/
|
| [2]
| https://twitter.com/EricLengyel/status/1190045334791057408
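|
| To give a rough idea of the per-pixel test (only the general
| idea, written as plain Rust; not Slug's actual root-finder or
| its band/classification tables): for each quadratic curve you
| solve where it crosses a horizontal ray from the pixel and
| accumulate a winding count.
|
|     // Hypothetical sketch: winding contribution of one quadratic
|     // Bezier (p0, p1, p2) at a pixel. Control points are given
|     // relative to the pixel; the ray goes in the +x direction.
|     fn winding(p0: (f32, f32), p1: (f32, f32), p2: (f32, f32)) -> i32 {
|         // y(t) = a*t^2 + b*t + c, the curve's y component
|         let a = p0.1 - 2.0 * p1.1 + p2.1;
|         let b = 2.0 * (p1.1 - p0.1);
|         let c = p0.1;
|         let mut w = 0;
|         let mut check = |t: f32| {
|             if t >= 0.0 && t <= 1.0 {
|                 let x = (1.0 - t) * (1.0 - t) * p0.0
|                     + 2.0 * (1.0 - t) * t * p1.0
|                     + t * t * p2.0;
|                 // a crossing to the right of the pixel counts,
|                 // signed by the curve's vertical direction there
|                 if x > 0.0 {
|                     w += if 2.0 * a * t + b > 0.0 { 1 } else { -1 };
|                 }
|             }
|         };
|         if a.abs() < 1e-6 {
|             if b.abs() > 1e-6 { check(-c / b); }
|         } else {
|             let disc = b * b - 4.0 * a * c;
|             if disc > 0.0 {
|                 let s = disc.sqrt();
|                 check((-b - s) / (2.0 * a));
|                 check((-b + s) / (2.0 * a));
|             }
|         }
|         w
|     }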
| korijn wrote:
| Any alternative solutions for the problem of GPU text rendering
| (that are not patent infringing)?
| pvidler wrote:
| You can always render the text to a texture offline as a
| signed distance field and just draw out quads as needed at
| render time. This will always be faster than drawing from the
| curves, and rendering from an SDF (especially multi-channel
| variants) scales surprisingly well if you choose the
| texture/glyph size well.
|
| A little more info:
|
| https://blog.mapbox.com/drawing-text-with-signed-distance-
| fi...
|
| MIT-licensed open-source multi-channel glyph generation:
|
| https://github.com/Chlumsky/msdfgen
|
| The only remaining issue would be the kerning/layout, which
| is admittedly far from simple.
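|
| The render-time part per pixel is tiny; roughly this (a sketch
| of the usual single-channel SDF shading, written as plain Rust
| rather than shader code, names made up):
|
|     // dist: SDF value sampled from the glyph texture, with 0.5
|     // at the glyph edge.
|     // dist_per_px: how much the sampled distance changes across
|     // one screen pixel (what fwidth(dist) gives a shader).
|     fn sdf_coverage(dist: f32, dist_per_px: f32) -> f32 {
|         ((dist - 0.5) / dist_per_px + 0.5).clamp(0.0, 1.0)
|     }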
| djmips wrote:
| A signed distance field approach can be good depending on
| what you're after.
| https://github.com/libgdx/libgdx/wiki/Distance-field-fonts
| onion2k wrote:
| There's a great WebGL library for doing that on the web
| using any .ttf, .otf, or .woff font - https://github.com/pr
| otectwise/troika/tree/master/packages/t...
| srgpqt wrote:
| FOSS does not magically circumvent patents.
| orhmeh09 wrote:
| Is there a serious risk of patent enforcement in common
| open source repositories ranging from GitHub to PPAs and
| Linux package repositories located outside any relevant
| jurisdictions?
| korijn wrote:
| Does that imply it's possible to implement 2D font/vector
| graphics rendering on a GPU and end up getting burned by
| patent law? I am having a hard time imagining they were
| awarded such a generic patent.
|
| Anyway, I will adjust my question based on your feedback.
| Lichtso wrote:
| In 2005 Loop & Blinn [0] found a method to decide if a sample /
| pixel is inside or outside a bezier curve (independently of
| other samples, thus possible in a fragment shader) using only a
| few multiplications and one subtraction per sample.
| - Integral quadratic curve: one multiplication
| - Rational quadratic curve: two multiplications
| - Integral cubic curve: three multiplications
| - Rational cubic curve: four multiplications
|
| [0] https://www.microsoft.com/en-us/research/wp-
| content/uploads/...
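|
| For the integral quadratic case the per-sample test is
| literally this (a sketch; u and v are the texture coordinates
| the paper assigns to the three control points, interpolated by
| the rasterizer):
|
|     // Loop-Blinn, integral quadratic case: in canonical texture
|     // space the curve is v = u^2, so the sign of u*u - v says
|     // which side of the curve the sample is on (the convention
|     // depends on the curve's orientation).
|     fn inside_quadratic(u: f32, v: f32) -> bool {
|         u * u - v < 0.0 // one multiplication, one subtraction
|     }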
| vg_head wrote:
| It's referenced in Slug's algorithm description paper [1].
| The main disadvantage of Loop-Blinn is the triangulation
| step that is required, and at small text sizes you lose a bit
| of performance; Slug only needs to render a quad for each
| glyph. That is not to say that any one method is better than
| the other though! They both have advantages and
| disadvantages. I think the two most advanced techniques for
| rendering vector graphics on the GPU are "Massively Parallel
| Vector Graphics" [2] and "Efficient GPU Path Rendering Using
| Scanline Rasterization" [3], though I don't know of any well-
| known usage of them. Maybe that's because they are very hard
| to implement; the sources attached to them are not trivial
| to understand, even if you've read the papers. They also use
| OpenCL/CUDA if I remember correctly.
|
| [1] "GPU-Centered Font Rendering Directly from Glyph
| Outlines" http://jcgt.org/published/0006/02/02/
|
| [2] http://w3.impa.br/~diego/projects/GanEtAl14/
|
| [3] http://kunzhou.net/zjugaps/pathrendering/
|
| EDIT: I've only now seen that [2] and [3] are already
| mentioned in the article
|
| EDIT2: To compensate for my ignorance, I will add that one of
| the authors of MPVG has a course on rendering vector
| graphics: http://w3.impa.br/~diego/teaching/vg/
| Lichtso wrote:
| If I understand correctly the second link is basically an
| extension of Loop-Blinn's implicit curve approach with
| vector textures in order to find the winding count for
| each fragment in one pass.
|
| >> Slug only needs to render a quad for each glyph.
|
| I don't know how many glyphs you want to render (to the
| point that there are so many that you can't read them
| anymore), but modern GPUs are heavily optimized for
| triangle throughput. So 2 or 20 triangles per glyph makes
| only a little difference. The bigger problem is usually the
| sample fill rate and memory bandwidth (especially if you
| have to write to pixels more than once).
|
| I have been eying the scanline-intersection-sort approach
| (your third link) too. Sadly they have no answer to path
| stroking (same as everybody else) and it also requires an
| efficient sorting algorithm for the GPU (implementations of
| such are hard to come by outside of CUDA, as you
| mentioned).
| vg_head wrote:
| Indeed, most techniques that target the GPU have no
| answer to stroking; they recommend generating fill paths
| beforehand so that the result looks stroked.
|
| And yes, the number of triangles doesn't really make a
| difference in general, but in Slug's paper they say:
|
| "At small font sizes, these triangles can become very
| tiny and decrease thread group occupancy on the GPU,
| reducing performance"
|
| I'm not experienced enough to say how true that is/how
| much of a difference it makes.
|
| > If I understand correctly the second link is basically
| an extension of Loop-Blinns implicit curve approach with
| vector textures in order to find the winding counter for
| each fragment in one pass.
|
| I've read the paper, but to be honest it's a bit over my
| head right now. AFAIK MPVG is an extension of this
| [1], which itself looks like an extension of Loop-Blinn,
| so I think you're right.
|
| [1] "Random-Access Rendering of General Vector Graphics"
| http://hhoppe.com/ravg.pdf
| skohan wrote:
| Is this approach novel? For instance, does Apple's approach to
| native UI rendering do the rendering on the CPU, or use a 3D
| renderer?
| stephen_g wrote:
| The author mentions Pathfinder, the GPU font renderer from
| Servo, a lot, so there do seem to be existing systems that do
| things that way.
|
| I'm not 100% sure about Apple's and others' approaches,
| though - definitely when compositing desktop environments were
| new, the way it was done was to software-render UI elements
| into textures, and then use the GPU to composite it all
| together. I assume more is being done on the GPU now, but it
| may not actually be all that performance-critical for regular
| UIs (he talks about things like CAD, which are more
| performance-sensitive).
| Animats wrote:
| He's describing roughly the feature set of Flash, which is a
| system for efficiently putting 2D objects on top of other 2D
| objects.
| skohan wrote:
| No, I mean: is the approach of doing this on the GPU actually
| novel?
| Animats wrote:
| Games often use the GPU for their 2D elements. It's
| inefficient to do a window system that way, because you
| have to update on every frame, but if you're updating the
| whole window on every frame anyway, it doesn't add cost. As
| the original poster points out, it does run down the
| battery vs. a "window damage" approach.
| CyberRabbi wrote:
| Depends on what you mean by novel. No other "mainstream" API
| that implements the traditional 2D imaging model popularized by
| Warnock et al. with PostScript is implemented this way, except
| for Pathfinder. Apple does all 2D drawing operations on the CPU
| and composites distinct layers using the GPU. This does a lot
| more work on the GPU.
| garethrowlands wrote:
| What about Direct2D? Surely Windows counts as mainstream? The
| docs are from 2018, https://docs.microsoft.com/en-
| us/windows/win32/direct2d/comp...
| CyberRabbi wrote:
| > Rendering method
|
| > In order to maintain compatibility, GDI performs a large
| part of its rendering to aperture memory using the CPU. In
| contrast, Direct2D translates its APIs calls into Direct3D
| primitives and drawing operations. The result is then
| rendered on the GPU. Some of GDI?s rendering is performed
| on the GPU when the aperture memory is copied to the video
| memory surface representing the GDI window.
|
| I can't say for certain but I think the main point being
| communicated here is that Direct2D uses 3D driver
| interfaces to get its pixels on the screen. Not necessarily
| that it renders the image using the GPU. I could be wrong.
| The_rationalist wrote:
| What about skia?
| Jasper_ wrote:
| Skia renders paths on the CPU. There was a prototype of a
| GPU based approach called skia-compute but it was removed a
| few years ago. I believe some parts of skia can use SDFs
| for font rendering, but that's only really accurate at
| small sizes.
| raphlinus wrote:
| The skia-compute project is now Spinel, and is under
| Fuchsia. It is very interesting, perhaps the fastest way
| to render vector paths on the GPU, but the code is almost
| completely inscrutable, and it has lots of tuning
| parameters for specific GPU hardware, so porting is a
| challenge.
|
| Skia has a requirement that rendering of paths cannot
| suffer from conflation artifacts (though compositing
| different paths can), as they don't want to regress on
| any existing web (SVG, canvas) content. That's made it
| difficult to move away from their existing software
| renderer which is highly optimized and deals with this
| well. Needless to say, I consider that an interesting
| challenge.
| The_rationalist wrote:
| Wow the codebase looks quite small for such ambitions!
| Jasper_ wrote:
| Apple has an incredibly fast software 2D renderer (Quartz 2D),
| and a limited GPU 2D renderer and compositor (Quartz Compositor).
| Doing PostScript rendering on the GPU is still an active
| research project. And Raph is doing some of that research!
| bumbada wrote:
| That was in the past.
|
| Apple has one of the best systems for drawing 2D and it is
| accelerated on the GPU. It is used by Apple Maps, and it made
| its offline maps much better than Google's.
|
| But Apple is treating this as a trade secret; they are not
| publishing it for everyone to copy.
| Pulcinella wrote:
| Do you have a source for this? I would like to know more.
| klaussilveira wrote:
| Loosely related: Blend2D has been innovating a lot in this space.
|
| https://blend2d.com/
| Const-me wrote:
| The quality is not good, and the performance is not even
| mentioned.
|
| I have used a different approach: https://github.com/Const-
| me/Vrmac#vector-graphics-engine
|
| My version is cross-platform, tested with Direct3D 12 and GLES
| 3.1.
|
| My version does not view GPUs as SIMD CPUs; it actually treats
| them as GPUs.
|
| When rendering a square without anti-aliasing, the library will
| render 2 triangles. When rendering a filled square with anti-
| aliasing, the library will render about 10 triangles: a large
| opaque square in the center, and a thin border, about 1 pixel
| thick, around it for AA.
|
| It uses a hardware Z buffer with early Z rejection to save pixel
| shader invocations and fill rate. It uses screen-space derivatives
| in the pixel shader for anti-aliasing. It renders arbitrarily
| complex 2D scenes with only two draw calls: one front to back with
| opaque stuff, another back to front with translucent stuff. It does
| not have quality issues with stroked lines much thinner than 1px.
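|
| Conceptually the AA boils down to something like this (a
| sketch, not the actual shader; in the real thing the
| screen-space derivatives are what convert the interpolated
| value into pixel units):
|
|     // signed_dist_px: signed distance from the sample to the
|     // shape edge, in pixels (positive inside). Spread the edge
|     // over roughly one pixel of transition.
|     fn edge_coverage(signed_dist_px: f32) -> f32 {
|         (signed_dist_px + 0.5).clamp(0.0, 1.0)
|     }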
| fho wrote:
| Dumb question ... but isn't the easiest AA method rendering at
| a higher resolution and downsampling the result?
|
| I see that it's not feasible for a lot of complex 3D graphics,
| but 2D is (probably) a lot less taxing for modern GPUs?
| dahart wrote:
| High quality supersampling is about equally hard in 2D and
| 3D, since the result is 2D. It is also the most common
| solution in both 2D and 3D graphics, so your instinct is
| reasonably good.
|
| But, font and path rendering, for example in web browsers
| and/or with PDF or SVG - these things can benefit in both
| efficiency and quality from using analytic antialiasing
| methods. 2D vector rendering is a place where doing something
| harder has real payoffs.
|
| Just a fun aside - not all supersampling algorithms are
| equally good. If you use the wrong filter, it can be very
| surprising to discover that there are ways you can take a
| million samples per pixel or more and never succeed in
| getting rid of aliasing artifacts. (An example is if you just
| average the samples, aka use a Box Filter.) I have a 2D
| digital art project to render mathematical functions that can
| have arbitrarily high frequencies. I spent money making large
| format prints of them, so I care a lot about getting rid of
| aliasing problems. I've ended up with a Gaussian filter,
| which is a tad blurrier than experts tend to like, because
| everything else ends up giving me visible aliasing somewhere.
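|
| Concretely, the difference is just in the per-sample weights;
| a toy sketch (the sigma and the sample offsets are arbitrary
| choices here, not a recommendation):
|
|     // Resolve one pixel from supersamples. A box filter weights
|     // every sample equally; a Gaussian weights each sample by
|     // its offset (dx, dy) from the pixel center, in pixels.
|     fn gaussian_resolve(samples: &[(f32, f32, f32)], sigma: f32) -> f32 {
|         let mut sum = 0.0;
|         let mut wsum = 0.0;
|         for &(dx, dy, value) in samples {
|             let w = (-(dx * dx + dy * dy) / (2.0 * sigma * sigma)).exp();
|             sum += w * value;
|             wsum += w;
|         }
|         sum / wsum
|     }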
| [deleted]
| ishitatsuyuki wrote:
| piet-gpu contributor here. You're right that supersampling is
| the easiest way to achieve AA. However, its scalability
| issues are immense: for 2D rendering it's typically
| recommended to use 32x for a decent quality AA, but as the
| cost of supersampling scales linearly (actually superlinearly
| due to memory/register pressure), it becomes more than an
| order of magnitude slower than the baseline. So if you want
| to do anything real-time (e.g. smooth page zoom without
| resorting to prerendered textures, which become blurry),
| supersampling is mostly an unacceptable choice.
|
| What is more practical is some form of adaptive
| supersampling: a lot of pixels are filled by only one path
| and don't require supersampling. There are also more
| heuristics that can be used: one that I want to try out in
| piet-gpu is to exploit the fact that in 2D graphics, most
| pixels are covered by at most two paths. So as a baseline
| we can track only two values per pixel plus a coverage
| mask, then in the rare case of three or more shapes
| overlapping, fall back to full supersampling. This should
| keep the cost amplification more under control.
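|
| Roughly the per-pixel state I have in mind, as a sketch (the
| exact layout is an open question and the names are made up):
|
|     // Fast path: at most two contributing paths per pixel plus a
|     // coverage mask saying which samples belong to the second one;
|     // only a third overlapping path forces full supersampling.
|     struct PixelState {
|         colors: [[f32; 4]; 2], // the (at most) two path colors
|         mask: u32,             // per-sample coverage of the second path
|         spilled: bool,         // 3+ overlapping paths: supersample
|     }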
| Const-me wrote:
| The short answer -- it depends.
|
| What you described is called supersampling. Supersampling is
| indeed not terribly hard to implement; the problem is
| performance overhead. Many parts of the graphics pipeline
| scale linearly with the pixel count. If you render at 16x16
| upscaled resolution, that's going to result in 256x more
| pixel shader invocations, and 256x the fill rate.
|
| There's a good middle ground called MSAA
| https://en.wikipedia.org/wiki/Multisample_anti-aliasing In
| practice, 16x MSAA often delivers very good results for both
| 3D and 2D. In the case of 2D, even low-end PC GPUs are fast
| enough at an 8x or 16x MSAA level.
|
| Initial versions of my library used that method.
|
| The problem was, the Raspberry Pi 4 GPU is way slower than PC
| GPUs. The performance with 4x or 2x MSAA was too low for
| 1920x1080 resolution, even just for 2D. Maybe the problem is
| the actual hardware, maybe it's a performance bug in the Linux kernel
| or GPU drivers, I have no idea. I didn't want to mess with
| the kernel, I wanted a library that works fast on officially
| supported 32-bit Debian Linux. That's why I bothered to
| implement my own method for antialiasing.
| esperent wrote:
| > Maybe the problem is actual hardware
|
| I think it is - as far as I know most modern GPUs implement
| MSAA at the hardware level, and that's why even a mobile
| GPU can handle 8x MSAA at 1080p.
|
| I don't know anything about the Raspberry Pi GPU, but maybe
| you'd have better results switching to FXAA or SMAA there
| (by which I mean faster, not visually better).
| Const-me wrote:
| > maybe you'd have better results switching to FXAA or
| SMAA there
|
| I've thought about that, but decided I want to prioritize
| the quality. 16x MSAA was almost perfect in my tests, 8x
| MSAA was still good. With levels lower than that, the
| output quality was noticeably worse than Direct2D, which
| was my baseline.
|
| And another thing: I have found that stroked lines much
| thinner than 1px after transforms need special handling,
| otherwise they don't look good: too aliased, too thick,
| and/or causing temporal artifacts while zooming
| continuously. Some real-world vector art, like the
| Ghostscript tiger from Wikipedia, has quite a lot of
| such lines.
| skohan wrote:
| Does your approach allow for quality optimizations like
| subpixel rendering for arbitrary curves? It seems like this is
| what is interesting about this approach.
|
| Also in terms of "two draw calls", does that include drawing
| text as part of your transparent pass, or are you assuming that
| your window contents are already rendered to textures?
| Const-me wrote:
| > Does your approach allow for quality optimizations like sub
| pixel rendering for arbitrary curves?
|
| The library only does grayscale AA for vector graphics.
|
| Subpixel rendering is implemented for text but comes with
| limitations: it only works when text is not transformed (or
| transformed in specific ways, like rotated 180deg, or
| horizontally flipped), and you need to pass the background
| color behind the text to the API. It will only look good if
| the text is on a solid color background, or on a slowly
| changing gradient.
|
| Sub pixel AA is hard for arbitrary backgrounds. I'm not sure
| many GPUs support the required blend states, and workarounds
| are slow.
|
| > does that include drawing text as part of your transparent
| pass
|
| Yes, that includes text and bitmaps. Here are two HLSL shaders
| that do that; the GPU abstraction library I've picked
| recompiles the HLSL into GLSL or others on the fly:
| https://github.com/Const-
| me/Vrmac/blob/master/Vrmac/Draw/Sha...
| https://github.com/Const-
| me/Vrmac/blob/master/Vrmac/Draw/Sha... These shaders are
| compiled multiple times with different sets of preprocessor
| macros, but I did test with all of them enabled at once.
| glowcoil wrote:
| > The quality is not good, and the performance is not even
| mentioned.
|
| I notice that your renderer doesn't even attempt to render text
| on the GPU and instead just blits glyphs from an atlas texture
| rendered with Freetype on the CPU: https://github.com/Const-
| me/Vrmac/blob/master/Vrmac/Draw/Sha...
|
| In contrast, piet-gpu (the subject of the original blog post)
| has high enough path rendering quality (and performance) to
| render glyphs purely on the GPU. This makes it clear you didn't
| even perform a cursory investigation of the project before
| making a comment to dump on it and promote your own library.
| Const-me wrote:
| > instead just blits glyphs from an atlas texture rendered
| with Freetype on the CPU
|
| Correct.
|
| > has high enough path rendering quality (and performance) to
| render glyphs purely on the GPU
|
| Do you have screenshots showing quality, and performance
| measures showing speed? Ideally from Raspberry Pi 4?
|
| > This makes it clear you didn't even perform a cursory
| investigation of the project
|
| I did, and mentioned in the docs, here's a quote: "I didn't
| want to experiment with GPU-based splines. AFAIK the research
| is not there just yet." Verifiable because version control:
| https://github.com/Const-
| me/Vrmac/blob/bbe83b9722dcb080f1aed...
|
| For text, I think bitmaps are better than splines. I can see
| how splines are cool from a naive programmer's perspective,
| but practically speaking they are not good enough for the
| job.
|
| Vector fonts are not resolution independent because of
| hinting. Fonts include bytecode of compiled programs that do
| that. GPUs are massively parallel vector chips, not a good
| fit for interpreting the bytecode of a traditional
| programming language. This means whatever splines you upload
| to the GPU will only contain a single size of the font;
| trying to reuse them for a different resolution will cause
| artifacts.
|
| Glyphs are small and contain lots of curves. That's lots of
| data to store, and lots of math to render, for a
| comparatively small count of output pixels. Copying bitmaps
| is very fast; modern GPUs, even low-power mobile and embedded
| ones, are designed to output a ridiculous volume of textured
| triangles per second. Font face and size are more or less
| consistent within a given document/page/screen. Apart from
| synthetic tests, glyphs are reused a lot, and there aren't
| too many of them.
|
| When I started the project, the very first support for compute
| shaders on the Pi 4 had just been introduced in the Mesa
| upstream repo. It was not yet in the official OS images. Bugs
| are very likely in version 1.0 of anything at all.
|
| Finally, even if the Pi 4 had had awesome support for compute
| shaders back then, the raw compute power of the GPU is not
| that impressive. Here on my Windows PC, my GPU is 30 times
| faster than the CPU in terms of raw FP32 performance. With
| that kind of performance gap, you can probably make GPU
| splines work fast enough, after spending enough time on
| development. Meanwhile, on the Pi 4 there's no difference;
| the quad-core CPU has raw performance pretty close to the raw
| performance of the GPU. To a lesser extent the same applies
| to low-end PCs: I only have a fast GPU because I'm a graphics
| programmer; many people are happy with their Intel UHD
| graphics, and those are not necessarily faster than CPUs.
| davrosthedalek wrote:
| If you render two anti-aliased boxes next to each other (i.e.
| they share an edge), will you see a thin line there, or a solid
| fill? Last time I checked, cairo-based PDF readers get this
| wrong, for example.
| Const-me wrote:
| Good question.
|
| If you use the fillRectangle method https://github.com/Const-
| me/Vrmac/blob/1.2/Vrmac/Draw/iDrawC... to draw them I think
| you should get solid fill. That particular API doesn't use AA
| for that shape. Modern GPU hardware with its rasterization
| rules https://docs.microsoft.com/en-
| us/windows/win32/direct3d11/d3... is good at that use case,
| otherwise 3D meshes would contain holes between triangles.
|
| If you render them as 2 distinct paths, filled, not stroked,
| and anti-aliased, you will indeed see a thin line between
| them. Currently, my AA method shrinks filled paths by about
| 0.5 pixels. For stroked paths it's the opposite, BTW: the
| output is inflated by half of the stroke width (the midpoint
| of the stroke corresponds to the source geometry).
|
| You can merge boxes into a single path with 2 figures
| https://github.com/Const-
| me/Vrmac/blob/1.2/Vrmac/Draw/Path/P...; in this case the C++
| code of the library should collapse the redundant inner edge
| and the output will be identical to a single box, i.e. solid
| fill. It will also render slightly faster because there are
| fewer triangles in the mesh.
| slmjkdbtl wrote:
| always appreciate raph's work on rendering and UI programming,
| but want to ask a question somewhat unrelated to this post: does
| anyone have a lot of experience doing 2d graphics on the cpu? i
| wonder if there'll be a day we're confident doing all 2d stuff on
| the cpu, since cpus are much easier to work with and give much
| more control. i also read that some old 3d games used software
| rendering and did well on old hardware; that gave me a lot of
| confidence in software rendering every (lowrez) thing
| bumbada wrote:
| I have lots of experience; I've created several font renderers
| on the CPU and GPU.
|
| No, CPU drawing is too inefficient. Without too much
| trouble you get 50x more efficiency on the GPU; that is, you
| can draw 50 frames for every frame on the CPU, using the same
| amount of energy and time.
|
| If you go deeper, low level with Vulkan/Metal and especially
| controlling the GPU memory yourself, you can get 200x, though
| it's way harder to program.
|
| CPU drawing is very useful for testing: You create the
| reference you compare the GPU stuff with.
|
| CPU drawing is the past, not the future.
| slmjkdbtl wrote:
| Thanks for the write up. Yeah i can see the huge performance
| difference. one thing that bothers me about the GPU is that now
| every vendor provides a different API, so you kinda have to use
| a hardware abstraction layer to use the GPU if you really want
| cross platform, and that's often a huge effort or dependency
| and hard to do right. even in the OpenGL days it was easier
| because you only had to deal with one API instead of three
| (vulkan/metal/d3d). With the CPU, if you ignore the performance
| gap, it's just a plain pixel array that can be easily displayed
| in any environment and you have control over every bit of it. I
| just can't get over the lightness and elegance difference
| between the two..
| bumbada wrote:
| Vulkan/metal/d3d give you control to do things you could
| not do with OpenGL, and they are very similar. All my code
| works first in Vulkan and Metal, but d3d is not hard when
| you have your vulkan code working.
|
| OpenGL was designed by committee and politics (like not
| giving you the option to compile shaders for a long time
| while d3d could do it).
|
| The hard part is thinking in terms of extreme parallelism. That
| is super hard.
|
| Once you have something working on the GPU, you can
| translate it to electronics, e.g. using FPGAs.
|
| The difference in efficiency is so big that most GPU
| approaches do not really work. They are really
| approximations that fail with certain glyphs. Designers
| create a model that works with most glyphs, and sometimes
| they have a fallback, inefficient method for them.
|
| With the CPU you can calculate the area covered within a
| pixel exactly.
| fulafel wrote:
| The future is now: on many systems there's fairly little GPU
| acceleration going on when running e.g. a web browser and things
| work fine.
| Jasper_ wrote:
| Yes? We know how to write scanline renderers for 2D graphics.
| They're not that hard; a simple one can be done in ~100 lines
| of code or so. See my article here:
| https://magcius.github.io/xplain/article/rast1.html
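|
| For a sense of scale, the core of an even-odd, no-AA version
| fits in a couple dozen lines (a rough sketch, not the code from
| the article):
|
|     // Fill a closed polygon into an 8-bit coverage buffer, one
|     // sample per pixel, even-odd rule, no anti-aliasing.
|     fn fill_polygon(pts: &[(f32, f32)],
|                     width: usize, height: usize) -> Vec<u8> {
|         let mut img = vec![0u8; width * height];
|         for y in 0..height {
|             let sy = y as f32 + 0.5; // sample at the pixel center
|             // x positions where edges cross this scanline
|             let mut xs = Vec::new();
|             for i in 0..pts.len() {
|                 let (x0, y0) = pts[i];
|                 let (x1, y1) = pts[(i + 1) % pts.len()];
|                 if (y0 <= sy) != (y1 <= sy) {
|                     xs.push(x0 + (sy - y0) * (x1 - x0) / (y1 - y0));
|                 }
|             }
|             xs.sort_by(|a, b| a.partial_cmp(b).unwrap());
|             // fill between pairs of crossings (even-odd)
|             for pair in xs.chunks(2) {
|                 if let &[xa, xb] = pair {
|                     let x0 = xa.max(0.0).ceil() as i32;
|                     let x1 = xb.min(width as f32 - 1.0).floor() as i32;
|                     for x in x0..=x1 {
|                         img[y * width + x as usize] = 255;
|                     }
|                 }
|             }
|         }
|         img
|     }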
| slmjkdbtl wrote:
| Thanks for the amazing article! I wonder if you ran into any
| performance annoyances / bottlenecks when doing actual GUI /
| game dev with this?
| iainmerrick wrote:
| You just need to render your font on the CPU once, and
| upload mipmaps to the GPU (either signed distance fields,
| or just sharpened textures, that works absolutely fine
| too).
|
| I think all this GPU font rendering stuff is a bit of a red
| herring. Are any popular apps or games actually making
| heavy use of it?
| olau wrote:
| Enjoyable article, thanks!
| choxi wrote:
| I spent a couple years learning graphics programming to build an
| iPad app for creating subtle animated effects. The idea was kind
| of like Procreate but if you had "animated brushes" that produced
| glimmer or other types of looping animated effects.
|
| From what I've read, the technique behind most digital brushes is
| to render overlapping "stamps" over the stroke. They're spaced
| closely enough that you can't actually see the stamps.
|
| But if you want to animate the stamps, you either have to store
| the stroke data as a very large sequence of stamp meshes or you
| can only work with the data in a raster format. The former is way
| too many meshes even with instancing, and the latter loses a lot
| of useful information about the stroke. Imagine you wanted to
| create a brush where the edges of the stroke kind of pulsate like
| a laser beam, you ideally want to store that stroke data in a
| vector format to make it easier to identify e.g. centers and
| edges.
|
| But it turned out to be too challenging for me to figure out how
| to 1) build a vector representation of a stroke/path without
| losing some of the control over detail you get with the stamping
| technique and 2) efficiently render those vectors on the GPU.
|
| I'm not sure if this would help with the issues I ran into, but
| I'm definitely excited to see some focus on 2D rendering
| improvements!
| ink_13 wrote:
| This would have been a lot better with examples in the form of
| rendered images or perhaps even a video. Maybe it's just my lack
| of background in graphics, but I had a lot of trouble grasping
| what the author was attempting to communicate without a concrete
| example.
| eternalban wrote:
| It's about taking known 2D graphics and UI approaches which
| were developed for CPUs and looking at effective rendering
| engine architectures that do the same using GPUs. Terms such
| as "scene graph", "retained mode UI", etc. come from that
| existing 2D graphics world.
|
| So the approach, afaiu, is a data layout for the scene graph;
| it's basically the more general concern of mapping graph data
| structures (e.g. linked lists, which are CPU friendly) to
| array forms (GPU friendly) suitable for parallel treatment.
| There are other GPU concerns too, such as minimizing global
| memory traffic via local caching, and mapping thread groups
| to tiles. I found the idea of having the scene graph resident
| on the GPU interesting.
|
| (note to author: "serialization" comes from the networking
| roots of serializing a data structure for transmission over
| the net. So, definitely serial. /g)
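|
| The flattening step is conceptually just this (a made-up
| sketch, not the actual layout from the post):
|
|     // Flatten a pointer-y scene graph into GPU-friendly arrays:
|     // nodes are stored depth-first, children refer to parents by
|     // index, and the whole thing uploads as a single buffer.
|     struct Node {
|         transform: [f32; 6], // 2D affine transform
|         children: Vec<Node>,
|     }
|
|     struct FlatScene {
|         transforms: Vec<[f32; 6]>,
|         parents: Vec<i32>, // -1 for the root
|     }
|
|     fn flatten(root: &Node) -> FlatScene {
|         fn walk(node: &Node, parent: i32, out: &mut FlatScene) {
|             let idx = out.transforms.len() as i32;
|             out.transforms.push(node.transform);
|             out.parents.push(parent);
|             for child in &node.children {
|                 walk(child, idx, out);
|             }
|         }
|         let mut out = FlatScene {
|             transforms: Vec::new(),
|             parents: Vec::new(),
|         };
|         walk(root, -1, &mut out);
|         out
|     }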
| MattRix wrote:
| It wouldn't have been better, it just would have been more high
| level and generalized, but I don't think that's what the author
| was going for. I found the amount of detail refreshing, and as
| someone about to make a GPU based 2D display tree renderer, it
| was written at just the right level to be quite useful.
| Agentlien wrote:
| I understand where you are coming from. There is a lot of
| jargon and it assumes familiarity with many concepts. I think
| any explanatory images which would help someone unfamiliar with
| the field would need to be accompanied by quite a bit of
| explanation.
|
| One thing which I think made reading this a bit more work than
| necessary is that it feels like it's prattling on about a lot
| of tangential details and never quite gets to the point.
|
| _edit: a prime example of an unnecessary aside is mentioning
| the warp/wavefront/subgroup terminology. I feel anyone in the
| target audience should know this already and it's not really
| relevant to what's being explained._
| skohan wrote:
| It doesn't seem to be a finished work. I guess this is more
| of a journal entry on the author's initial takeaways from a
| week-long coding retreat.
___________________________________________________________________
(page generated 2021-03-15 23:02 UTC)