[HN Gopher] 32 MiB Working Sets on a 64 GiB machine
       ___________________________________________________________________
        
       32 MiB Working Sets on a 64 GiB machine
        
       Author : nikbackm
       Score  : 127 points
       Date   : 2023-10-02 06:09 UTC (2 days ago)
        
 (HTM) web link (randomascii.wordpress.com)
 (TXT) w3m dump (randomascii.wordpress.com)
        
       | leptons wrote:
       | ..
        
         | Dylan16807 wrote:
         | If you want a PowerShell command to check your memory capacity,
         | try (Get-CimInstance Win32_PhysicalMemory).Capacity
         | 
         | My RAM sticks each have 8589934592 = 2^33 bytes.
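         | 
         | A minimal Python sketch of that unit arithmetic, for anyone who
         | wants to check it. The helper name `describe` and the
         | eight-stick total are illustrative assumptions, not something
         | from the thread.

```python
# Convert a raw byte count (such as the Capacity value quoted above)
# into binary (GiB) and decimal (GB) units.
GIB = 2**30  # gibibyte: 1,073,741,824 bytes
GB = 10**9   # gigabyte: 1,000,000,000 bytes


def describe(capacity_bytes: int) -> str:
    """Format a byte count in both unit systems."""
    return (
        f"{capacity_bytes} bytes = {capacity_bytes / GIB:g} GiB"
        f" = {capacity_bytes / GB:.2f} GB"
    )


stick = 8589934592          # value quoted above, 2**33
print(describe(stick))
print(describe(8 * stick))  # hypothetical machine with eight such sticks
```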
        
         | HideousKojima wrote:
         | > don't know of any computers that have 64 GiB of RAM.
         | 
         | It's the other way around: most computers have RAM in gibibytes
         | but list it as gigabytes. It's mostly hard drive and other
         | storage manufacturers that treat gigabytes as actual gigabytes
         | as a way to skimp.
        
       | NavinF wrote:
       | Wow, PROCESS_MODE_BACKGROUND_BEGIN is quite the footgun. I wonder
       | if this started out as a hack to reduce the impact of OEM
       | bloatware on benchmarks. Is there an easy way to list all
       | processes that use this priority class?
        
         | brucedawson wrote:
         | There probably is a way to find what processes have this mode
         | enabled, but it's probably more productive to search github for
         | references to the problematic constant.
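         | 
         | A rough sketch of that kind of search over a local checkout, in
         | Python. The function name `find_references`, the extension
         | list, and the example path are assumptions for illustration; a
         | hosted code search would be needed for repositories you
         | haven't cloned.

```python
import os

NEEDLE = "PROCESS_MODE_BACKGROUND_BEGIN"
# Hypothetical choice of file types worth scanning.
SOURCE_EXTS = (".c", ".cc", ".cpp", ".h", ".hpp", ".cs", ".rs", ".py")


def find_references(root, needle=NEEDLE, exts=SOURCE_EXTS):
    """Yield (path, line_number, line) for each source line containing needle."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if not name.lower().endswith(exts):
                continue
            path = os.path.join(dirpath, name)
            try:
                with open(path, encoding="utf-8", errors="replace") as f:
                    for lineno, line in enumerate(f, start=1):
                        if needle in line:
                            yield path, lineno, line.strip()
            except OSError:
                continue  # unreadable file: skip it


# Example use (the path is a placeholder):
#   for path, lineno, line in find_references("path/to/checkout"):
#       print(f"{path}:{lineno}: {line}")
```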
        
           | fluoridation wrote:
           | Run Process Explorer and enable the Memory Priority column.
           | In the Select Columns dialog, its checkbox is on the Process
           | Memory tab. Priority 5 is normal; priority 1 is background.
        
       | Majromax wrote:
       | I wonder if this is some cross-pollination legacy of Windows
       | Phone. On such a device, background processes are still
       | necessary, memory is a much tighter constraint, and enough pages
       | might be mapped directly from device flash for auto-eviction to
       | make sense.
        
         | sznio wrote:
         | I remember using some software to increase the pagefile on my
         | Windows Phone. Multitasking worked so much better after that.
         | Apps loaded faster since it would just take the image of an app
         | suspended in the pagefile rather than starting it over.
        
         | toast0 wrote:
         | As a Windows Phone fan, the timing doesn't quite match up. They
         | reported this happens as far back as Windows 7, but Windows
         | Phone 7 was based on Windows CE; WP 8 was NT based, so I would
         | have expected cross-pollination to start showing up in Windows
         | 8.
        
       | tedunangst wrote:
       | I feel like there must be another part to the story. How did only
       | one chrome user notice this?
        
         | remram wrote:
         | I mean, it's only the installer. I know I only run it once per
         | machine, usually right in the middle of installing all the rest
         | of my usual utilities. It would be easy to miss that it's much
         | too slow.
         | 
         | It's not like I have a good sense of what it does anyway, in
         | addition to copying files for a fraction of a second.
        
       | jrmg wrote:
       | _Trimming the working set of a process doesn't actually save
       | memory. It just moves the memory from the working set of the
       | process to the standby list. Then, if the system is under memory
       | pressure the pages in the standby list are eligible to be
       | compressed, or discarded (if unmodified and backed by a file), or
       | written to the page file. [... If] the system has gobs of free
       | and available memory then it may never do anything with the page,
       | making the trimming pointless. The memory isn't "saved", it's
       | just moved from one list to another. It's the digital equivalent
       | of paper shuffling._
       | 
       | This sounds like it should be basically free in a memory-
       | unconstrained system - is the system really just spending all its
       | time managing these lists? Why is it so expensive?
        
         | toast0 wrote:
         | When the address isn't in the process page table, accessing it
         | traps into the kernel; then you've got to do all the cache
         | clearing stuff for speculative execution vulnerabilities; then
         | the paging logic needs to figure out what to do, in this case
         | because of the set limit, you've also got to demote at least
         | one page, which means updating the page table, but also sending
         | IPIs to all processors to flush that page/those pages from the
         | TLBs; and then you return to execution.
         | 
         | If you're accessing a lot of pages, mostly randomly, it's going
         | to be a mess. If you're decompressing files (as an installer
         | might do), and the compression window is large relative to the
         | limit (as it might be to get the highest compression ratio),
         | that could be a lot of mostly random access to pages, causing a
         | lot of page faults, and high cpu.
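         | 
         | A toy model of that effect, in Python. It assumes 4 KiB pages,
         | a hard 32 MiB cap, least-recently-used eviction, and uniformly
         | random access; the real Windows trimming policy isn't described
         | in this thread, so treat the numbers as illustrative only. Each
         | counted fault stands for the whole trap / demote / TLB-flush
         | sequence described above.

```python
import random
from collections import OrderedDict

PAGE = 4096                      # assumed page size: 4 KiB
CAP_PAGES = 32 * 2**20 // PAGE   # 32 MiB cap -> 8192 pages


def count_faults(accesses, cap_pages=CAP_PAGES):
    """Count accesses that miss a working set capped at cap_pages.

    Eviction is least-recently-used, which is an assumption of this model.
    """
    working_set = OrderedDict()
    faults = 0
    for page in accesses:
        if page in working_set:
            working_set.move_to_end(page)    # hit: mark as most recent
            continue
        faults += 1                          # miss: would trap into the kernel
        working_set[page] = True
        if len(working_set) > cap_pages:
            working_set.popitem(last=False)  # demote the oldest page
    return faults


def random_accesses(region_bytes, n, seed=0):
    """Generate n uniformly random page numbers within a region."""
    rng = random.Random(seed)
    pages = region_bytes // PAGE
    return (rng.randrange(pages) for _ in range(n))


N = 200_000
for region_mib in (16, 64):  # one region under the cap, one over it
    faults = count_faults(random_accesses(region_mib * 2**20, N))
    print(f"{region_mib} MiB region: {faults} faults in {N} accesses")
```

         | In this model a region that fits under the cap only pays for
         | its first touch of each page, while a region twice the size of
         | the cap keeps faulting for as long as the access pattern runs.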
        
         | tetha wrote:
         | I don't know about Windows, but the Linux kernel is entirely
         | fine to just keep rarely used memory pages around as long as it
         | doesn't need the memory. If you don't need the memory
         | otherwise, having the page loaded and cached is better than
         | needing to load it.
         | 
         | Even so, you then end up with confused reports about "Linux
         | eating all the memory" and "an idle system using all the memory
         | (for caching)".
         | 
         | AFAIK, newer kernels also have a preemptive swap-out of idle
         | pages so they can evict the pages quicker, so an idle system
         | might even be swapping (out). This results in even more
         | confusion.
        
           | wongarsu wrote:
           | My impression is that Windows is somewhat aggressive at
           | paging or using memory compression (as a faster alternative
           | to a write to swap), with the goal of freeing more space for
           | cache use. Not very aggressive, but a lot more than Linux.
        
           | zamalek wrote:
           | > "Linux eating all the memory" and "an idle system using all
           | the memory (for caching)".
           | 
           | I find these reports bewildering (alongside CPU usage). You
           | would hope that something would be exploiting your expensive
           | machine to the fullest (so long as the work being done is
           | useful/intentional/desired - obviously not wasteful page
           | thrashing).
        
             | andrepd wrote:
             | There's the power consumption issue, of course.
        
               | wtallis wrote:
               | For CPU time, yes, but I don't think DRAM power
               | consumption for idle refreshing varies meaningfully
               | depending on the data stored in it.
        
               | vlovich123 wrote:
               | You don't see it in x86 I think, but I believe ARM mobile
               | SoCs can completely turn off banks that are unused. I
               | haven't followed to see if anyone actually does this yet,
               | but I recall hearing about attempts to turn off unused
               | RAM when in sleep (when awake, the RAM refresh cost is
               | negligible). To turn off the RAM implies that you might
               | want to prefer to place cache data in the banks that you
               | will want to turn off so that you can just drop caches on
               | sleep quickly. Of course, whether that's actually
               | beneficial in practice is hard to say - some caching you
               | want to be persistent as you'll need it on wake anyway.
               | 
               | Outside cases where you want to optimize for sleep
               | performance, you probably don't want to do this though
               | because NUMA systems have conflicting requirements for
               | storage (but thankfully for now the systems with NUMA and
               | the systems that benefit from turning off RAM refresh
               | when sleeping have 0 overlap). Consumer desktops are
               | probably not worth doing this on, and on laptops it
               | probably doesn't make a huge difference because of
               | battery size / usage patterns.
        
           | placesalt wrote:
           | The effect of caching on Windows can be quite pronounced if
           | you process large datasets on a machine with a large amount
           | of RAM.
           | 
           | If, say, you have 256GB RAM and a 100GB folder of ~1GB files,
           | you will only ever have a few GB used actively. A first pass
           | of processing over the folder will take a long time (reading
           | from disk). Subsequent passes will be much faster, though,
           | because reading is done from the RAM-cached versions of the
           | files (the output from the previous run, was my
           | understanding).
        
           | 5e92cb50239222b wrote:
           | https://youtu.be/beefUhRH5lU
        
       | Borg3 wrote:
       | Haha :) I just checked MSDN docs and voila:
       | PROCESS_MODE_BACKGROUND_BEGIN - Not supported under Win2003 or
       | Windows XP.
       | 
       | Another improvement from Microsoft without really understanding
       | their own OS.
        
         | jacobgorm wrote:
         | I think they understood their OS and memory model better than
         | most commenters in this thread do. I was once paid to read most
         | of Windows Internals 5 and try to understand their memory
         | management concepts, some of which date back to VMS, and most
         | of them actually make a lot of sense.
         | 
         | The Background mode has its merits too, but I guess they never
         | imagined that a background process would need to spend huge
         | amounts of memory just to do some background work.
        
           | Borg3 wrote:
           | Sure, most of the time I'm very happy with Win2003. Pretty
           | solid OS. There are some issues that unfortunately were never
           | resolved, like PageFile management and Cache management, for
           | example. I bet there are more...
        
           | Filligree wrote:
           | Huge? The limit is 32MiB. That isn't large by any measure.
        
             | Borg3 wrote:
             | Yeah, especially as it was introduced with Win7 AFAIR. So
             | processes were already pretty inflated ;)
        
               | jacobgorm wrote:
               | Win7 was supposed to be usable with 1GiB of RAM, so 32MiB
               | for some low priority background task seems quite
               | reasonable.
               | 
               | https://support.microsoft.com/en-us/windows/windows-7-system...
        
       | kristianp wrote:
       | So he didn't report the bug to Microsoft? His final comments were
       | just to say:
       | 
       | "This issue has been known for eight years, on many versions of
       | Windows, and it still hasn't been corrected or even documented. I
       | hope that changes now."
        
         | brucedawson wrote:
         | I reported the documentation bug. I don't use Feedback Hub
         | because it never works for this type of issue. I know that some
         | Microsoft developers will see this blog post and I hope that
         | internal bugs get filed.
         | 
         | But ultimately as a non-paid tester of Windows I'm under no
         | obligation to report non-security bugs in any particular way. I
         | like reporting them through twitter and blog posts.
        
           | dabbz wrote:
           | Back when I worked on Windows, we would triage Windows
           | Feedback hub reported issues once a sprint (~2 weeks). But
           | only those which had at least 2-5 upvotes on them. One-off
           | reports usually meant the bug wasn't having widespread impact,
           | so it was lower priority on our list (even if that wasn't
           | actually the case).
           | 
           | But the feedback hub is an abyss of random complaints to
           | filter through. Sometimes we'd see items like "Windows
           | wouldn't save my word document and now I hate windows" and
           | being on the networking team were like "uhhh thanks"
        
             | ack_complete wrote:
             | As an external developer, it's become clear to me that some
             | bugs don't get fixed simply because feedback doesn't get
             | through to the product teams.
             | 
             | An example is the MediaFoundation AAC encoder in Windows
             | 10. It's unusable, due to a bug that randomly introduces
             | oink artifacts into the output. I submitted it to Feedback
             | Hub, but of course it was too niche to get upvotes. Looked
             | around, OBS Studio ran into the same issue and had to
             | switch to a different AAC encoder, so it wasn't a rare
             | problem. Someone even tried posting on Microsoft Answers
             | reproducing it with the Windows SDK sample, and got only a
             | generic response. Finally someone got through, and the
             | product team mentioned that this was the first they'd heard
             | of it... but at least it's actually fixed for Windows 11.
             | 
             | The design of Feedback Hub partly encourages the current
             | behavior. It used to have a very visible categorization on
             | the left side, but now they're easily missable drop-downs,
             | and the auto-suggestions are completely bonkers. You can
             | type up a bug about graphics errors and it suggests the
             | accessibility and first-time install categories. Lack of
             | curation also doesn't help; there's something to be said
             | for having a light touch, but there are posts consisting of
             | just random letters that have been there for years.
        
           | _a_a_a_ wrote:
           | I reported a probable Windows perf bug. Nothing happened. I
           | reported it again. Basically they weren't interested despite
           | my careful spelling it out. Time wasted. Fuck them.
           | 
           | Maybe it wasn't a bug (hah!) but they could have come back to
           | me and at least told me they'd looked at it and it wasn't.
           | I'd have appreciated that. But no.
        
           | theolivenbaum wrote:
           | Your blog is the most effective bug reporting tool for
           | Windows issues. Thanks for keeping it going!
        
         | markdog12 wrote:
         | He reported it in the best way possible. We all know what would
         | have happened if he filed a bug report (especially if they
         | didn't realize it was him).
        
         | glonq wrote:
         | I'd wager that the best solution is to get Raymond Chen's
         | attention. There's nothing that he can't do!
        
       | criddell wrote:
       | Bruce - any chance you could make your ETW videos available
       | again? I see they used to be available through a Microsoft
       | partnership of some type but that seems to have ended a few years
       | ago.
       | 
       | I'd love to learn more about ETW and your videos seem like a good
       | place to start. If anybody else has other recommendations, please
       | share!
        
         | a1o wrote:
         | ETW is Event Tracing for Windows if someone reads the above and
         | is wondering.
        
           | criddell wrote:
           | Thanks.
           | 
           | And the videos I'm talking about are linked from here:
           | 
           | https://randomascii.wordpress.com/2014/08/19/etw-training-vi...
        
       | chrisbolt wrote:
       | Previous discussion (2 days ago):
       | https://news.ycombinator.com/item?id=37734685
        
         | dang wrote:
         | That one never made the front page so maybe we'll move the
         | comments hither and let this one run.
         | 
         | (It's on my list to do more for submissions like that which
         | accrue a bunch of upvotes slowly but never break the front
         | page...)
        
           | remram wrote:
           | Looks like you did the move. It's really weird because it
           | changed the timestamps on all the comments... They are even
           | different between my "threads" page and what I see when I
           | click through.
        
             | dang wrote:
             | Yes, sorry, I know it's confusing. I relativized the
             | timestamps for 2 reasons:
             | 
             | (1) if we didn't do that, commenters would start asking
             | "why are there a bunch of comments here older than the OP
             | they're commenting on?" - and our experience is that this
             | leads to more confusion than the other way around; plus
             | 
             | (2) it seems unfair to have all the comments on the earlier
             | thread plummet to the bottom of the new thread because
             | they're so much older.
             | 
             | This topic comes up regularly in a slightly different
             | context, which is re-upping stories that make it into the
             | second-chance pool
             | (https://news.ycombinator.com/item?id=26998308). There are
             | a bunch of past explanations here in case helpful:
             | https://hn.algolia.com/?dateRange=all&page=0&prefix=true&que....
        
       | [deleted]
        
       | jacobgorm wrote:
       | But why would Chrome's setup.exe need a 32MiB working set? If all
       | it does is download and unpack some files in the background, it
       | would seem like something that is doable in 32MiB or less.
       | 
       | The Windows BACKGROUND mode is useful for stuff like virus
       | scanning or database log compaction, where you want to make some
       | progress, but only if you can be certain not to hurt any
       | foreground workloads. Back when we had hard drives, this wasn't
       | actually trivial to achieve, because it is very easy to interfere
       | with the foreground workload just by doing a few disk seeks every
       | now and then.
       | 
       | I agree the 32MiB working set limit is somewhat arbitrary and
       | should be documented, but Windows is full of these arbitrary
       | constants, like the 32kiB paging chunk for instance.
       | 
       | My recommendation for Chrome would be to stick with the
       | background mode, and fix whatever problem is causing the working
       | set to exceed 32MiB.
        
         | macqm wrote:
         | It actually does a lot more than you think.
         | 
         | For example it does pretty interesting diffing - Chrome
         | downloads a patch and uses the previously left-behind copy of
         | the installer to re-build new binaries from the small diff it
         | pulled. See more:
         | 
         | - https://chromium.googlesource.com/chromium/src/+/HEAD/compon...
         | 
         | - https://www.chromium.org/developers/design-documents/softwar...
        
         | toast0 wrote:
         | The offline installer reports as 103 MB; depending on the
         | compression settings, it might need a significant amount of
         | that mapped in at once.
        
           | jacobgorm wrote:
           | Probably because the files to be binary patched are now quite
           | large, and are loaded into memory verbatim. Perhaps they
           | could memory-map them instead, and ensure that their binary
           | diffing tool works from left to right to avoid thrashing?
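           | 
           | A sketch of the memory-mapped, left-to-right idea in Python.
           | The CRC-32 is only a stand-in for whatever a real patcher
           | does with each chunk, and `checksum_sequential` and the
           | 1 MiB chunk size are hypothetical choices.

```python
import mmap
import os
import tempfile
import zlib


def checksum_sequential(path, chunk=1 << 20):
    """CRC-32 of a non-empty file, read through a memory map in ascending order."""
    crc = 0
    with open(path, "rb") as f:
        # Length 0 maps the whole file; pages are faulted in only when touched.
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as view:
            for offset in range(0, len(view), chunk):
                crc = zlib.crc32(view[offset:offset + chunk], crc)
    return crc


# Demo on a throwaway file of a little over 3 MiB.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(os.urandom(3 * (1 << 20) + 123))
    demo_path = tmp.name
try:
    print(hex(checksum_sequential(demo_path)))
finally:
    os.remove(demo_path)
```

           | Because the map is only ever touched in ascending order,
           | pages behind the cursor are not needed again, so trimming
           | them should not cause them to be faulted back in.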
        
         | stefan_ wrote:
         | Your recommendation is to stick to a pathological
         | implementation where every page fault results in the equivalent
         | of doing a Java GC run? All independent of the actual memory
         | pressure?
         | 
         | This isn't even about the exact limit, this implementation just
         | makes no sense. Add a hysteresis at least because running this
         | on every page fault is guaranteed to cause a lot of CPU load
         | while not returning very much to the system.
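         | 
         | One possible reading of the hysteresis idea, as a Python toy
         | model. LRU eviction, uniformly random access, and the specific
         | high/low water marks are all assumptions: instead of evicting
         | one page on every fault over the limit, evict a batch down to
         | a low-water mark. It says nothing about what Windows actually
         | does.

```python
import random
from collections import OrderedDict


def simulate(accesses, high, low):
    """Return (faults, trim_passes) for an LRU-evicting working set.

    A trim pass starts when residency exceeds `high` pages and evicts
    down to `low` pages. low == high models trimming on every fault.
    """
    resident = OrderedDict()
    faults = trims = 0
    for page in accesses:
        if page in resident:
            resident.move_to_end(page)  # hit: mark as most recent
            continue
        faults += 1
        resident[page] = True
        if len(resident) > high:
            trims += 1
            while len(resident) > low:
                resident.popitem(last=False)  # evict least recently used
    return faults, trims


def random_pages(total_pages, n, seed=0):
    """Generate a reproducible trace of uniformly random page numbers."""
    rng = random.Random(seed)
    return [rng.randrange(total_pages) for _ in range(n)]


PAGES = 16384  # 64 MiB of 4 KiB pages touched at random
HIGH = 8192    # 32 MiB limit
LOW = 6144     # hypothetical 24 MiB low-water mark

trace = random_pages(PAGES, 200_000)
for label, low in (("trim on every fault", HIGH), ("trim in batches", LOW)):
    faults, trims = simulate(trace, HIGH, low)
    print(f"{label:>20}: {faults} faults, {trims} trim passes")
```

         | In this model batching cannot reduce the fault count, since
         | the resident set is never larger than in the per-fault case,
         | but it cuts the number of trim passes, each of which would
         | otherwise need its own round of TLB invalidation.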
        
         | iforgotpassword wrote:
         | > The Windows BACKGROUND mode is useful for stuff like virus
         | scanning or database log compaction, where you want to make
         | some progress, but only if you can be certain not to hurt any
         | foreground workloads.
         | 
         | But you are hurting the overall system performance, or battery
         | life if this is a laptop and you keep thrashing the working set
         | for no good reason. I don't see how this proactive approach has
         | any advantages over just swapping out the process first if you
         | run into actual memory pressure, or maybe start paging out
         | pages of that process if they've been idle for a few minutes if
         | you insist on some kind of proactive measure.
        
           | jacobgorm wrote:
           | The background mode is not hurting anyone, it is the
           | thrashing process that is hurting itself by using too much
           | memory. This shows up as high CPU but, because it is in the
           | background, the rest of the system performance should be
           | largely unaffected. Anything with foreground priority will
           | still be scheduled to run whenever it needs to.
           | 
           | Had the background process been allowed to keep growing its
           | working set then foreground applications would be forced to
           | page out, defeating the goal of having a background mode in
           | the first place.
        
             | saagarjha wrote:
             | Doing useless work nobody asked for is always a problem.
             | People use systems with batteries and they would appreciate
             | their system not doing that.
        
             | AHTERIX5000 wrote:
             | But the limit of 32 MB is arbitrary and undocumented which
             | is a huge problem.
        
         | dist-epoch wrote:
         | In general using a 32 MB or larger compression dictionary
         | really improves compression size.
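         | 
         | A scaled-down illustration in Python using the standard `lzma`
         | module: a 1 MiB incompressible block repeated once, compressed
         | with a 64 KiB and then a 4 MiB dictionary. The sizes are
         | assumptions chosen to keep it quick, and nothing here reflects
         | the Chrome installer's actual format or settings.

```python
import hashlib
import lzma


def pseudo_random_bytes(n):
    """Deterministic, effectively incompressible bytes (SHA-256 of a counter)."""
    out = bytearray()
    counter = 0
    while len(out) < n:
        out += hashlib.sha256(counter.to_bytes(8, "little")).digest()
        counter += 1
    return bytes(out[:n])


def xz_size(data, dict_size):
    """Compressed size of data using LZMA2 with an explicit dictionary size."""
    filters = [{"id": lzma.FILTER_LZMA2, "preset": 6, "dict_size": dict_size}]
    return len(lzma.compress(data, format=lzma.FORMAT_XZ, filters=filters))


block = pseudo_random_bytes(1 << 20)  # 1 MiB that does not compress on its own
data = block + block                  # the repeat sits 1 MiB back in the stream

for dict_size in (64 * 1024, 4 * 1024 * 1024):
    print(f"dict {dict_size >> 10:>5} KiB -> {xz_size(data, dict_size)} bytes")
```

         | With the small dictionary the second copy is out of reach and
         | gets stored again; with the large one it can be encoded as
         | references to the first. The flip side is that the
         | decompressor needs a window of roughly the dictionary size in
         | memory.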
         | 
         | Not sure if that's why the Chrome installer uses 32MB+.
        
       | Varriount wrote:
       | I found this paragraph particularly interesting:
       | 
       | > Trimming the working set of a process doesn't actually save
       | memory. It just moves the memory from the working set of the
       | process to the standby list. Then, if the system is under memory
       | pressure the pages in the standby list are eligible to be
       | compressed, or discarded (if unmodified and backed by a file), or
       | written to the page file. But "eligible" is doing a lot of heavy
       | lifting in that sentence. The OS doesn't immediately do anything
       | with the page, generally speaking. And, if the system has gobs of
       | free and available memory then it may never do anything with the
       | page, making the trimming pointless. The memory isn't "saved",
       | it's just moved from one list to another. It's the digital
       | equivalent of paper shuffling.
       | 
       | I'd always been under the impression that as soon as memory was
       | trimmed from the working set. Perhaps this _was_ the case at some
       | point, and was a reason for the PROCESS_MODE_BACKGROUND_BEGIN
       | priority? As the blog mentions, the SetPriorityClass call has had
       | this behavior since at least 2015, though I wouldn't be
       | surprised if this behavior has existed for much longer.
       | 
       | As for why this "bug" hasn't been fixed, my guess is that it's
       | due to a couple of factors:
       | 
       | - Windows has become fairly good over the years at keeping the
       | core UI responsive even when the system is under heavy load.
       | 
       | - There are plenty of ways to reduce memory/CPU usage that
       | _don't_ involve a call to SetPriorityClass. I'd wager that
       | setting a process's priority class is not the first thing that
       | would come to mind.
       | 
       | - As a result of the previous two points, the actual number of
       | programs using that call is quite small. I'd actually be
       | interested in knowing what, if any, parts of Windows use it.
       | 
       | (As a side note, if there _was_ a bug in a Windows API function,
       | how would you even report it?)
        
         | Bjartr wrote:
         | > Windows has become fairly good over the years at keeping the
         | core UI responsive even when the system is under heavy load.
         | 
         | I would say that's debatable. There are fewer complete freezes
         | that require a restart, but things like the task manager (can't
         | get much more core than that), which used to be instantly
         | available and responsive if the machine was recoverable at all,
         | now can take tens of seconds to show up and respond to
         | interactions under heavy load.
        
       | mrguyorama wrote:
       | I'm not sure I understand, why not just manage your own memory
       | instead of asking the OS to magically understand your memory use-
       | case and properly manage/limit?
       | 
       | If the memory isn't owned by the OS, that's probably the wrong
       | actor to manage it.
       | 
       | They say that, instead, they used an undocumented API they
       | obviously didn't understand. Why? If you are concerned about
       | using up too much memory, then _do something about it_
        
         | remram wrote:
         | What's undocumented?
        
           | mrguyorama wrote:
           | >They didn't document this (!)
           | 
           | Is the fourth line of the article linked.
        
             | tiagod wrote:
             | The article makes it clear that the API is documented. The
             | 32MiB limit isn't.
        
               | fluoridation wrote:
               | To be fair, the limit is part of the API's semantics. If
               | the limit is undocumented, the API is at least partly
               | undocumented. It's as if a car had a headlights button
               | and all the manual said about it was: "The headlights
               | button turns on the headlights and the headlights
               | indicator light on the dashboard." However, nothing in the
               | manual states that the headlights indicator shares a
               | circuit with a valve that drains the fuel tank, so in
               | effect when you turn the headlights on, you're causing
               | the fuel to leak. I would say the button is not properly
               | documented in that case.
        
       ___________________________________________________________________
       (page generated 2023-10-04 23:01 UTC)