[HN Gopher] Making Your Game Go Fast by Asking Windows Nicely
       ___________________________________________________________________
        
       Making Your Game Go Fast by Asking Windows Nicely
        
       Author : zdw
       Score  : 113 points
       Date   : 2022-01-16 18:21 UTC (1 day ago)
        
 (HTM) web link (www.anthropicstudios.com)
 (TXT) w3m dump (www.anthropicstudios.com)
        
       | Const-me wrote:
        | About switchable graphics: the nVidia APIs do work. The
        | problem with them is that there's no API to switch to the
        | faster GPU at runtime; they only have APIs to set up a profile
        | for an application and request the faster GPU in that profile,
        | and the changes are applied the next time the app launches.
       | 
        | I had to do that a couple of times for Direct3D 11 or 12 apps
        | with a frontend written in WPF. Microsoft doesn't support
        | exporting DWORD variables from .NET executables.
       | 
       | Technical info there: https://stackoverflow.com/a/40915100
        
         | masonremaley wrote:
          | It's possible I'm misunderstanding the docs, but here's the
          | line that led me to believe linking to one of their
          | libraries alone would be enough (and led to my surprise when
          | it didn't work):
         | 
         | (https://docs.nvidia.com/gameworks/content/technologies/deskt..
         | .)
         | 
         | > For any application without an existing application profile,
         | there is a set of libraries which, when statically linked to a
         | given application executable, will direct the Optimus driver to
         | render the application using High Performance Graphics. As of
         | Release 302, the current list of libraries are vcamp110.dll,
         | vcamp110d.dll, nvapi.dll, nvapi64.dll, opencl.dll, nvcuda.dll,
          | and cudart*.*.
        
           | Const-me wrote:
           | Can it be that you linked to one of these libraries, but
           | never called any function from that DLL, so your linker
           | dropped the unused DLL dependency?
           | 
            | However, I don't really like that method. The app will
            | fail to launch on computers without nVidia drivers,
            | complaining about the missing DLL. For languages like C++
            | or Rust, the exported DWORD variable is the best way to
            | go; the only reason I bothered with custom installer
            | actions was that this method wasn't available.
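            | 
            | For reference, the exported-variable approach looks
            | something like this in C++ (a sketch; both symbols are
            | documented by the respective GPU vendors):
            | 
            |     #include <windows.h>
            | 
            |     // Must be exported from the .exe itself, not a DLL.
            |     extern "C" {
            |         // nVidia Optimus: request the discrete GPU
            |         __declspec(dllexport) DWORD NvOptimusEnablement = 0x00000001;
            |         // AMD switchable graphics: same idea
            |         __declspec(dllexport) int AmdPowerXpressRequestHighPerformance = 1;
            |     }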
        
             | masonremaley wrote:
              | Hmm. I _think_ I tried calling into their API to rule
              | that out--but it's been a while, so it's 100% possible
              | I'm remembering incorrectly, which would explain why it
              | didn't work!
        
       | shawnz wrote:
       | > This isn't often relevant for games, but, if you need to check
       | how much things would have been scaled if you weren't DPI aware,
       | you can call GetDpiForWindow and divide the result by 96.
       | 
       | If you aren't scaling up text and UI elements based on the DPI
       | then it doesn't really sound like your application is truly DPI
       | aware to me. I don't see why that applies any differently to
       | games versus any other kind of application.
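        | 
        | For reference, the call being discussed is tiny--a sketch,
        | assuming an HWND named hwnd (96 is the Win32 baseline DPI):
        | 
        |     UINT dpi = GetDpiForWindow(hwnd); // e.g. 144 at 150% scaling
        |     float scale = dpi / 96.0f;        // the user's scale factor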
        
         | ziml77 wrote:
          | Games should either be aware of the user's preferred scaling
          | or at least offer their own UI scaling option. But they
          | should always register as DPI aware so they don't render the
          | 3D scene at a lower resolution than what's selected.
        
         | [deleted]
        
         | jeroenhd wrote:
          | Unless the game engine is doing its own scaling, this does
          | sound like lying to the operating system to get those pesky
          | user-friendly features out of the way in exchange for more
          | frames.
         | 
         | I think Microsoft made it this hard to enable the DPI-aware
         | setting exactly because it forces developers to think about
         | things like DPI. If everyone follows this guide and ignores it,
         | then I predict that in a few years this setting will be ignored
         | as well and a new DPI-awareness API will be released.
        
         | makomk wrote:
         | I think it's reasonably common for games to scale their text
         | and UI elements by the overall screen or window size, in which
         | case opting out of clever OS DPI tricks is the right choice.
          | Using actual DPI doesn't make much sense in general - the
          | player could be sitting right in front of their laptop
          | screen or feet away from a big TV, which obviously require
          | very different font sizes in real-world units.
        
           | shawnz wrote:
           | But in those cases you'd expect the user to manually adjust
           | their scaling settings, which wouldn't be respected if
           | following the author's advice here.
        
           | masonremaley wrote:
           | Yup, you hit the nail on the head (author of the article
            | here). I guess I could've clarified that; I didn't expect
           | people to assume I was advocating against scaling your UIs to
           | fit the user's screen! Many games scale to fit the window by
           | default, and even offer additional controls on top of that.
        
             | shawnz wrote:
              | It's not as simple as just scaling the UI to the size of
              | the screen though, because the UI elements should be
              | _bigger_ at the same screen size if the scaling is
              | higher. That's why, like you mention in the article,
              | you'll be able to tell when the setting has been changed
              | simply by looking at the scale of the UI: it will
              | wrongly be too small once the setting is activated.
        
               | masonremaley wrote:
                | Yup! I'm aware of what DPI scale is for; I use it when
                | I write game tools. I don't use it in game, though--
                | that's an intentional tradeoff I'm making, and it
                | seems like a pretty common one for games.
               | 
               | If you want to see why, try mocking up a typical shooter
               | HUD. Now try scaling up/down all the elements by 50% and
               | see what happens. Feel free to play with the anchoring,
               | etc. Chances are you're not gonna like what you see!
               | Things get even more complicated when you consider that
               | players with controllers often change their view distance
               | when gaming and don't wanna reconfigure their display all
               | the time.
               | 
               | The typical solution is to fix the UI scale to the window
                | size, and keep the text large enough that it's
                | readable across a wide range of DPIs and viewing
                | distances. If you can't get 100% there that way,
                | you'll typically add an in-game UI scale option. (The
                | key difference between that
               | and the built in UI scaling in Windows being that it's
               | specific to the game, so you'll set it to something
               | milder than you'd set the Windows option, and it will
               | only affect the game so you don't have to keep changing
               | it back and forth.)
               | 
               | [EDIT] I think I came up with a way to explain this that
               | saves you the trouble of drawing it out yourself. The
                | fundamental issue, view distance changes aside, is
                | that games are balancing a third variable most apps
                | don't have to balance: how much of the background--the
                | actual game--is the UI occluding?
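                | 
                | A minimal sketch of fixing the UI scale to the window
                | size, assuming a 1080p reference layout (names are
                | illustrative, not from the article):
                | 
                |     // UI is authored against a 1920x1080 canvas;
                |     // scaling by window height keeps the HUD occluding
                |     // the same fraction of the scene at any resolution.
                |     float uiScale(int windowHeightPx) {
                |         const float referenceHeightPx = 1080.0f;
                |         return windowHeightPx / referenceHeightPx;
                |     }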
        
       | TimTheTinker wrote:
       | > by linking with PowrProf.dll, and then calling this function
       | from powersetting.h as follows
       | 
       | > This function is part of User32.lib, and is defined in
       | winuser.h which is included in Windows.h.
       | 
       | This is one reason I think Windows is such a mess of an OS. (Look
       | at the contents of C:\Windows and tell me it's not, if you can do
       | so with a straight face!)
       | 
       | To make what ought to be a system call you have to load some DLL,
       | sys, or lib file at a random (but fixed) path and call a function
       | on it.
       | 
        | Combine that with COM and the registry, and I don't want to
        | touch it with a ten-foot pole.
        
         | Someone wrote:
         | > To make what ought to be a system call you have to load some
         | DLL, sys, or lib file at a random (but fixed) path and call a
         | function on it.
         | 
         |  _"Ought to be a system call"_ is a matter of opinion. Among
         | OSes, Linux is an outlier in that it keeps its system call
         | interface stable.
         | 
          | Many other OSes choose to provide a library with a stable
          | interface through which system calls can be made (and, in
          | some cases, must be made; see
          | https://lwn.net/Articles/806776/, discussed in
          | https://news.ycombinator.com/item?id=21859612). That allows
          | them to change the system call ABI, for example to retire
          | calls that have been superseded by other ones.
         | 
         | (ideally, IMO, that library should not be the C library. There
         | should be two libraries, a "Kernel interface library" and a "C
         | library". That's a different subject, though)
        
       | bobbyi wrote:
        | ASSERT(SetProcessDpiAwarenessContext(DPI_AWARENESS_CONTEXT_PER_MONITOR_AWARE_V2));
       | 
        | If ASSERT is a no-op in release mode, then you're only getting
        | your setting set here in debug builds.
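        | 
        | A common fix is to keep the side effect outside the assert so
        | it still runs in release builds--a sketch:
        | 
        |     #include <cassert>
        | 
        |     BOOL ok = SetProcessDpiAwarenessContext(
        |         DPI_AWARENESS_CONTEXT_PER_MONITOR_AWARE_V2);
        |     assert(ok && "failed to set DPI awareness");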
        
         | masonremaley wrote:
         | It's not, in my codebase, but I'll edit that when I have the
         | chance so nobody blindly copy pastes it and ends up with
         | something super broken
        
       | sdflhasjd wrote:
        | PowerSetActiveScheme sets the system-wide power plan; it's not
        | something a game should be doing without telling the user
        | first.
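        | 
        | At minimum it should be restored on exit. A sketch of the
        | save/switch/restore pattern, assuming the scheme GUIDs from
        | the Windows headers (GUID_MIN_POWER_SAVINGS is, confusingly,
        | the "High performance" plan):
        | 
        |     #include <windows.h>
        |     #include <powersetting.h>
        |     #pragma comment(lib, "PowrProf.lib")
        | 
        |     GUID *userScheme = nullptr;
        |     PowerGetActiveScheme(NULL, &userScheme);  // remember plan
        |     PowerSetActiveScheme(NULL, &GUID_MIN_POWER_SAVINGS);
        |     // ... run the game ...
        |     PowerSetActiveScheme(NULL, userScheme);   // put it back
        |     LocalFree(userScheme);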
        
         | usbqk wrote:
         | Well said.
         | 
         | https://devblogs.microsoft.com/oldnewthing/20081211-00/?p=19...
        
           | CamperBob2 wrote:
           | Translation: "We didn't provide the necessary API support, so
           | now we're going to whine about ad-hoc brute force solutions
           | that developers would never have had to resort to if we'd
           | done our jobs."
           | 
            | Why isn't there a function I can call that enforces full
            | CPU power, but only while my application is running? I
            | never _wanted_ to change global system-level settings, but
            | if that's the only affordance provided by the Win32 API,
            | then so be it.
        
             | londons_explore wrote:
              | Perhaps if the user has a slow old CPU, we could also
              | order them a new one on Amazon for use only while the
              | game is running...
        
             | detaro wrote:
              | Because you're generally not supposed to overwrite the
              | user's performance settings temporarily either?
        
               | CamperBob2 wrote:
               | It would be nice if it were that simple. Unfortunately,
               | power settings under Windows are incredibly (and
               | unnecessarily) complex, and I doubt that one in twenty
               | users even knows the options are available. Worse, the
               | Windows power settings tend to revert magically to
               | "energy saving" mode under various unspecified
               | conditions. This phenomenon almost cost me an expensive
               | session at an EMC test lab once, when data acquisition on
               | the device under test repeatedly timed out due to CPU
               | starvation.
               | 
               | It's entirely reasonable for performance-critical
               | applications (not just games!) to be able to request
               | maximum available performance from the hardware without
               | resorting to stupid tricks like the measures described in
               | this story, launching threads that do nothing but run
               | endless loops, and so forth.
               | 
               | I do agree with those who point out that this should be a
               | user-controlled option. On the application side, this
               | could be as simple as a checkbox labeled "Enable maximum
               | performance while running" or something similar. Ideally,
               | the OS would then switch back to the system-level
               | performance setting when the application terminates,
               | rather than leaving it up to the application to do the
               | right thing and restore it explicitly.
        
               | johncolanduoni wrote:
                | Sometimes those are the user's performance settings,
                | but more often the user has no idea what these
                | performance settings you speak of are; they just don't
                | want to see your game stutter. It would be nice to be
                | able to distinguish these cases. As a user, I would
                | love it if games could temporarily disable aggressive
                | power saving while I'm running a game and put it back
                | the rest of the time.
        
             | protastus wrote:
             | Alternative translation: "Our documentation is weak and our
             | engineering teams aren't held accountable for it, so we're
             | blaming third-party developers instead of doing our jobs".
        
         | classichasclass wrote:
         | Interestingly, Garage Band on my G5 kicks power management to
         | highest performance without asking, though it turns it back
         | down when it quits. Guess Apple didn't have a problem with it.
        
         | masonremaley wrote:
         | That's a good point--I'll look into whether Microsoft has any
         | guidelines on this, and add a disclaimer to the article when I
         | get a chance.
        
         | discreditable wrote:
         | I've had games do this and found it annoying since I like my PC
         | to run in balanced mode. Not so much to save power but to let
         | the machine idle when I'm not using it. Found I could work
         | around it by deleting the power plans other than balanced.
         | 
          | I've never played OP's game, so evidently there are a few
          | other games out there doing this.
        
         | connordoner wrote:
         | Yeah, this feels like really bad UX.
        
       | ahelwer wrote:
        | I used to work at a high-performance scientific computing
        | company. In the mid-2000s they ran into a weird issue where
        | performance would crater on customer PCs running Windows,
        | unless that PC was currently running Windows Media Player.
        | Something to do with process scheduling priority. I don't know
        | whether this was a widely disseminated old-hand trick of the
        | era or anything.
        
         | bee_rider wrote:
         | It is astonishing to me that someone would want to use Windows
         | for something HPC related. I'm not generally a Windows hater
         | (actually I am, but I see that there are legitimate business
         | reasons to use it), but the HPC ecosystem seems much more
         | Linux-friendly.
        
           | aldebran wrote:
            | The Windows team works closely with hardware makers to
            | support new and upcoming specialized hardware. This
            | enables hardware makers to focus on hardware bring-up and
            | not worry about OS support.
            | 
            | There are many technologies that, for a while, work first
            | or work better on Windows. For example (not necessarily
            | HPC related): SMR drive support.
        
             | bee_rider wrote:
             | I definitely agree that if I had to get some random device
             | working, Windows is probably a good first OS to try. But
             | since Linux has such a large supercomputer/cluster/cloud
             | presence, the situation is sort of flipped for HPC. At
             | least as far as I've seen -- most numerical codes seem to
             | target the Unix-verse first, and the only weird drivers you
              | need are the GPU drivers (actually I haven't tried much
              | GPGPU, but I believe the Linux NVIDIA GPGPU drivers
              | aren't the same horrorshow that their desktop
              | counterparts are).
        
               | pjmlp wrote:
                | Speaking from my time at CERN: while the cluster is
                | fully UNIX-based, there is a big crowd of researchers
                | running visualisation software and other
                | research-related work on Windows, e.g.
                | Matlab/Tableau/Excel, and nowadays I assume macOS as
                | well (it was early days for it 20 years ago).
        
               | bee_rider wrote:
                | I was thinking more of the number-crunching bits,
                | rather than visualization, since the original issue
                | was about performance. But I guess visualization can
                | be computationally crunchy too.
        
           | ahelwer wrote:
           | It is, but there are a lot of applications that people like
           | to use on Windows PCs (think CAD or data analysis stuff) that
           | have computationally-intensive subroutines. In that company's
           | case it was GPU-accelerated electromagnetic wave simulations,
           | seismic imaging reconstruction, and CT scan reconstruction.
           | The company developed these libraries and licensed them for
           | use in larger CAD or data analysis software packages.
        
         | Const-me wrote:
          | Probably the timeBeginPeriod WinAPI call made by that media
          | player:
         | https://docs.microsoft.com/en-us/windows/win32/api/timeapi/n...
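          | 
          | For anyone curious, it's a one-liner from winmm.lib, and
          | should be paired with timeEndPeriod--a sketch:
          | 
          |     #include <windows.h>
          |     #include <timeapi.h>
          |     #pragma comment(lib, "winmm.lib")
          | 
          |     timeBeginPeriod(1);  // raise timer resolution to ~1 ms
          |     // ... latency-sensitive work ...
          |     timeEndPeriod(1);    // must match the earlier call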
        
           | vardump wrote:
            | Google Chrome had exactly that effect: at least in the
            | past, running Google Chrome made some software function
            | correctly. (Although perhaps there's also software that
            | timeBeginPeriod(1) affects negatively.)
            | 
            | Doesn't help when your testers run Google Chrome all the
            | time...
        
         | [deleted]
        
         | shmerl wrote:
          | Reminds me of this:
         | https://www.extremetech.com/computing/294907-why-moving-the-...
        
       | SamReidHughes wrote:
        | You can also see performance improvements in processes that do
        | I/O by having a low-priority process running that does nothing
        | but run an infinite loop. This keeps the computer from
        | switching to idle CPU states during the I/O. This was on
        | Linux; there is probably an OS setting to accomplish the same
        | thing, but it was pretty counter-intuitive.
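        | 
        | The Win32 flavor of that trick would look something like this
        | (a sketch; the thread burns a core purely to keep the CPU out
        | of deep idle states):
        | 
        |     #include <windows.h>
        | 
        |     static DWORD WINAPI SpinForever(LPVOID) {
        |         for (;;) {}  // never sleeps, never yields
        |     }
        | 
        |     int main() {
        |         HANDLE h = CreateThread(nullptr, 0, SpinForever,
        |                                 nullptr, 0, nullptr);
        |         SetThreadPriority(h, THREAD_PRIORITY_IDLE);
        |         // ... do the I/O-heavy work here ...
        |     }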
        
       | wruza wrote:
       | _missing vblank due to power management_
       | 
        | Ugh. A few years ago I built a gaming rig with an i5-8400 and
        | a GTX 1080 (both chosen for known workloads). Some games ran
        | fine, but some were jerky af, and the frametime monitor was
        | zigzag-y all over the place. I thought that maybe the 8400 was
        | not the best choice despite my research and bought an i7-8700,
        | only to see the situation get much _worse_. After days of
        | googling and discussions I found the issue: the mobo BIOS had
        | the C1E state enabled. In short, it allows the CPU frequency
        | and voltage to drop significantly when it's idling, but this
        | technique isn't ready to operate 100+ times per second. After
        | drawing a frame, the CPU basically did nothing for a period of
        | time (<10ms), which was enough to drop to C1E, but it can't
        | get out of it quickly for some reason. And of course the 8700
        | was much better at sucking at it, since it had more free time
        | to fall asleep.
       | 
       | I understand that power saving is useful in general, but man,
       | when Direct3D sees every other frame skipped, maybe it's time to
       | turn the damn thing off for a while. Idk how a regular consumer
       | could deal with it. You basically spend a little fortune on a
       | rig, which then stutters worse than an average IGP because of
       | some stupid misconfiguration.
        
         | stedolph wrote:
          | I strongly agree with you; I have an i7 rather than an i5,
          | but the experience was quite similar.
        
         | Const-me wrote:
         | > but it can't get out of it quickly for some reason
         | 
          | As overclockers are aware, to achieve higher frequencies
          | while keeping the CPU stable, you need a higher CPU voltage.
          | It works the other way too: lowering the frequency allows
          | lowering the voltage, and that's what mostly delivers the
          | power savings from these low-power states.
         | 
          | These chips can't adjust voltage instantly because the wires
          | inside them are rather thin, and there's non-trivial
          | capacitance everywhere. This means CPUs can drop frequency
          | instantly, then decrease the voltage over time. However, if
          | they raised the frequency instantly without first raising
          | the voltage, the chip would glitch.
          | 
          | That's AFAIK the main reason why increasing the clock
          | frequency takes time: the chip first raises the voltage,
          | which takes time because of that capacitance, and only then
          | raises the frequency, which is instant.
        
         | Bancakes wrote:
          | There are apps that disable C-states and core parking, and
          | boost P-states.
          | 
          | I use QuickCPU and max everything out. Yes, it sounds like a
          | sham, but it works wonders.
         | 
         | https://coderbag.com/product/quickcpu
        
           | wruza wrote:
            | I simply disabled C1E in the BIOS, because it's a desktop.
            | But I still had to use the EmptyStandbyList technique
            | afterwards, which helps with the rest of the issue (tested
            | with and without for a few days; it really works).
        
       | splittingTimes wrote:
        | Is it possible to employ any of those API calls from Java?
        | What would the equivalents look like there?
        
       | howdydoo wrote:
       | > As of April 5th 2017 with the release of Windows 10 Version
       | 1703, SetProcessDpiAwarenessContext used above is the replacement
       | for SetProcessDpiAwareness, which in turn was a replacement for
       | SetProcessDPIAware. Love the clear naming scheme.
       | 
       | This is the kind of thing I hate about "New Windows". Once upon a
       | time MS used to strive for backward compatibility. These days
       | every few years there's a new function you need to call. You
       | can't get optimal behavior just by writing good code from the
       | start. You need to do that, and also call the
       | YesIKnowHowPixelsWork api call, and set
       | <yesIAmCompetent>true</yesIAmCompetent> in your manifest to get
       | what should be the default behavior. It's a mess.
        
         | zamadatix wrote:
         | This is precisely the "Old Windows" way of doing things where
         | there are legacy APIs still supported for that forever
         | backwards compatibility and current APIs exist for ways you
         | probably want to do things in a new app.
         | 
          | For reference, SetProcessDPIAware solidified over 15 years
          | ago, and 15 years prior to that there wasn't even a taskbar.
          | Of course it's going to be out of date from a UI API
          | perspective, but that's what's needed if you want to also
          | support apps from 15 years ago well.
        
           | andrewf wrote:
            | A specific example from the good old days: EnableTraceEx2
            | "supersedes the EnableTrace and EnableTraceEx functions."
            | https://docs.microsoft.com/en-us/windows/win32/api/evntrace/...
            | 
            | Func -> FuncEx -> FuncExN was a common pattern. (Which I
            | like more than Func -> Funcness -> FuncnessContext,
            | despite the lack of creativity!) Another one was tagging
            | structures with their own length as the first member
            | variable, so if a later SDK creates a newer version of the
            | struct, the callee can tell the difference, e.g.
            | https://docs.microsoft.com/en-us/windows/win32/seccrypto/cry...
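            | 
            | The size-tagging pattern in practice, using
            | OSVERSIONINFOEX as a classic example:
            | 
            |     #include <windows.h>
            | 
            |     OSVERSIONINFOEX info = {};
            |     // The callee reads this field to tell which version
            |     // of the struct the caller compiled against.
            |     info.dwOSVersionInfoSize = sizeof(info);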
        
         | ziml77 wrote:
          | The reason it's so complex is _because_ of backwards
          | compatibility. Non-DPI-aware applications from before DPI
          | settings were a thing can't advertise whether they're DPI
          | aware, so if an application doesn't announce that it is,
          | Windows has to assume it's not. A couple of years ago,
          | Microsoft was able to change the GDI libraries to
          | automatically adjust the size of elements they're rendering,
          | which makes a lot of things sharper. But things like images,
          | or anything on screen not rendered by GDI, won't magically
          | become sharp.
        
       | rossy wrote:
       | As someone who contributed to a (formerly) OpenGL-based video
       | player[1], these issues with waiting for vblank and frame time
       | variability on Windows are depressingly familiar. Dropping even
       | one frame is unacceptable in a video player, but we seemed to
        | drop them unavoidably. We fought a losing battle with frame
        | timings in OpenGL for years, which eventually ended with us
        | just porting the renderer to Vulkan and Direct3D 11.
       | 
       | One thing that we noticed was that wakeups after wglSwapBuffers
       | were just more jittery than wakeups after D3D9/D3D11 Present()
       | with the same software on the same system. In windowed mode, this
        | could be mitigated by blocking on DwmFlush() instead of
        | wglSwapBuffers (it seems like GLFW does this too, but only on
        | Vista and 7).
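        | 
        | Roughly, the windowed-mode workaround looked like this (a
        | sketch, assuming a current GL context on an HDC named hdc,
        | with vsync disabled via wglSwapIntervalEXT so SwapBuffers
        | doesn't block):
        | 
        |     #include <windows.h>
        |     #include <dwmapi.h>
        |     #pragma comment(lib, "Dwmapi.lib")
        | 
        |     SwapBuffers(hdc); // queue the frame without blocking
        |     DwmFlush();       // wait for the compositor's next vblank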
       | 
       | The developer might also get some mileage from using ANGLE (a
       | GLES 3.1 implementation on top of D3D11) or Microsoft's new
       | GLon12.
       | 
       | [1]: https://mpv.io/
        
       | nottorp wrote:
       | Two comments that kinda go against the flow:
       | 
        | 1. Please add options to _conserve_ battery too. An FPS
        | limiter would be good. Messing with the system power
        | management when the user doesn't want to be tethered to a wall
        | plug is Not Nice(tm).
       | 
       | 2. When you do UI scaling, especially if you're young with 20/20
       | eyesight, please allow scaling beyond what you think is big
       | enough.
        
       ___________________________________________________________________
       (page generated 2022-01-17 23:01 UTC)