[HN Gopher] The case of the UI thread that hung in a kernel call
___________________________________________________________________
The case of the UI thread that hung in a kernel call
Author : luu
Score : 81 points
Date : 2025-04-15 17:13 UTC (5 hours ago)
(HTM) web link (devblogs.microsoft.com)
(TXT) w3m dump (devblogs.microsoft.com)
| simscitizen wrote:
| Oh I've debugged this before. Native memory allocator had a
| scavenge function which suspended all other threads. Managed
| language runtime had a stop the world phase which suspended all
| mutator threads. They ran at about the same time and ended up
| suspending each other. To fix this you need to enforce some sort
| of hierarchy or mutual exclusion for suspension requests.
|
| > Why you should never suspend a thread in your own process.
|
| This sounds like a good general principle, but suspending threads
| in your own process is kind of necessary for e.g. many GC
| algorithms. Now imagine multiple of those runtimes running in the
| same process.
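|
| A minimal sketch of the "mutual exclusion for suspension requests"
| idea, in Python for brevity. The SuspensionArbiter class and the two
| owner names are hypothetical, and time.sleep() stands in for the real
| suspend / do work / resume sequence. The only point is that every
| component must take one process-wide gate before suspending anyone,
| so two suspenders can never freeze each other halfway through.

```python
import threading
import time


class SuspensionArbiter:
    """Hypothetical process-wide gate for 'suspend other threads' requests."""

    def __init__(self):
        self._gate = threading.Lock()    # held for the whole pause
        self._stats = threading.Lock()   # protects the counters below
        self._active = 0                 # suspenders currently inside a pause
        self.max_concurrent = 0          # highest overlap ever observed
        self.log = []                    # order in which pauses were granted

    def stop_the_world(self, owner, pause_seconds=0.002):
        with self._gate:                 # only one suspender at a time
            with self._stats:
                self._active += 1
                self.max_concurrent = max(self.max_concurrent, self._active)
                self.log.append(owner)
            time.sleep(pause_seconds)    # stand-in for suspend, work, resume
            with self._stats:
                self._active -= 1


def run_demo(rounds=10):
    arbiter = SuspensionArbiter()
    owners = ("allocator-scavenge", "gc-stop-the-world")

    def worker(owner):
        for _ in range(rounds):
            arbiter.stop_the_world(owner)

    threads = [threading.Thread(target=worker, args=(o,)) for o in owners]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return arbiter
```

| After run_demo(), arbiter.max_concurrent shows whether two pauses
| ever overlapped. A real fix would also need every runtime in the
| process to agree on the same gate, which is the hard part.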
| hyperpape wrote:
| > suspending threads in your own process is kind of necessary
| for e.g. many GC algorithms
|
| I think this is typically done by having the compiler/runtime
| insert safepoints, which cooperatively yield at specified
| points to allow the GC to run without mutator threads being
| active. Done correctly, this shouldn't be subject to the
| problem the original post highlighted, because it doesn't rely
| on the OS's ability to suspend threads when they aren't
| expecting it.
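|
| A rough sketch of the safepoint idea described above, in Python.
| SafepointCoordinator, poll(), and run_demo() are hypothetical names,
| not any real runtime's API. Mutators call poll() at points where
| they hold no locks and park themselves. The collector only runs once
| every live mutator has parked, so nothing is frozen at an arbitrary
| instruction.

```python
import threading
import time


class SafepointCoordinator:
    """Hypothetical cooperative stop-the-world: mutators park themselves."""

    def __init__(self, mutator_count):
        self._cond = threading.Condition()
        self._mutators = mutator_count   # mutators that have not exited
        self._parked = 0                 # mutators waiting at a safepoint
        self._pause_requested = False

    def poll(self):
        """Called by a mutator at a point where it is safe to stop."""
        if not self._pause_requested:    # cheap fast path (plain flag read)
            return
        with self._cond:
            self._parked += 1
            self._cond.notify_all()      # let the collector re-check
            while self._pause_requested:
                self._cond.wait()
            self._parked -= 1

    def mutator_exit(self):
        with self._cond:
            self._mutators -= 1
            self._cond.notify_all()

    def run_collector(self, collect):
        with self._cond:
            self._pause_requested = True
            while self._parked < self._mutators:
                self._cond.wait()
            try:
                return collect()         # every live mutator is parked here
            finally:
                self._pause_requested = False
                self._cond.notify_all()


def run_demo(mutator_count=3):
    heap = []
    done = threading.Event()
    coord = SafepointCoordinator(mutator_count)

    def mutator():
        while not done.is_set():
            heap.append(object())        # "allocation" between safepoints
            coord.poll()
        coord.mutator_exit()

    def collect():
        before = len(heap)
        time.sleep(0.01)                 # mutators would grow heap if running
        after = len(heap)
        heap.clear()
        return before, after

    threads = [threading.Thread(target=mutator) for _ in range(mutator_count)]
    for t in threads:
        t.start()
    result = coord.run_collector(collect)
    done.set()
    for t in threads:
        t.join()
    return result
```

| run_demo() returns the heap size seen at the start and end of the
| pause. The two values match because no mutator runs while parked.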
| ot wrote:
| On Linux you'd do this by sending a signal to the thread you want
| to analyze, and then the signal handler would take the stack
| trace and send it back to the watchdog.
|
| The tricky part is ensuring that the signal handler code is
| async-signal-safe (which pretty much boils down to "ensure you're
| not acquiring any locks and be careful about reentrant code"),
| but at least that only has to be verified for a self-contained
| small function.
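|
| The shape of that watchdog, sketched in Python on a POSIX system.
| All function names here are made up for illustration. One caveat:
| CPython runs Python-level signal handlers in the main thread between
| bytecodes, not in true async-signal context, so this shows the flow
| rather than the strict async-signal-safety rules a C handler faces.
| SimpleQueue.put() is used because it is documented as reentrant.

```python
import queue
import signal
import threading
import time
import traceback

# Hand-off channel from the signal handler back to the watchdog.
_traces = queue.SimpleQueue()


def _dump_stack(signum, frame):
    # Keep the handler small: format the interrupted stack and hand it off.
    _traces.put("".join(traceback.format_stack(frame)))


def simulated_ui_work(stop):
    # Stand-in for a UI thread that appears to be stuck.
    while not stop.is_set():
        time.sleep(0.01)


def run_demo():
    # Must run in the main thread: CPython only allows installing
    # handlers there, and it also executes them there.
    signal.signal(signal.SIGUSR1, _dump_stack)
    stop = threading.Event()
    captured = []
    main_id = threading.main_thread().ident

    def watchdog():
        try:
            time.sleep(0.05)                              # UI looks hung
            signal.pthread_kill(main_id, signal.SIGUSR1)  # ask for a trace
            captured.append(_traces.get(timeout=5))
        finally:
            stop.set()                                    # let the UI finish

    t = threading.Thread(target=watchdog)
    t.start()
    simulated_ui_work(stop)
    t.join()
    return captured[0]
```

| The string returned by run_demo() is the main thread's stack at the
| moment the signal arrived, so it includes the simulated_ui_work frame.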
|
| Is there anything similar to signals on Windows?
| dblohm7 wrote:
| The closest thing is a special APC enqueued via QueueUserAPC2
| [1], but that's relatively new functionality in user-mode.
|
| [1] https://learn.microsoft.com/en-
| us/windows/win32/api/processt...
| jvert wrote:
| Or SetThreadContext() if you want to be hardcore. (not
| recommended)
| zavec wrote:
| I knew from seeing a title like that on microsoft.com that it was
| going to be a Raymond Chen post! He writes fascinating stuff.
| eyelidlessness wrote:
| I thought the same thing. It's usually content that's well
| outside my areas of familiarity, often even outside my areas of
| interest. But I usually find _his writing_ interesting enough
| to read through anyway, and clear enough that I can usually
| follow it even without familiarity with the subject matter.
| pitterpatter wrote:
| Reminds me of a hang in the Settings UI that was because it would
| get stuck on an RPC call to some service.
|
| Why was the service holding things up? Because it was waiting on
| acquiring a lock held by one of its other threads.
|
| What was that other thread doing? It was deadlocked because it
| tried to recursively acquire an exclusive srwlock (exactly what
| the docs say will happen if you try).
|
| Why was it even trying to reacquire said lock? Ultimately because
| of a buffer overrun that ended up overwriting some important
| structures.
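|
| The self-deadlock step is easy to reproduce by analogy. This Python
| sketch uses threading.Lock, which is non-reentrant in the same way
| an exclusive SRW lock is. It is not the Windows API, and the helper
| name is hypothetical. A timeout is used so the demo reports the
| failed reacquire instead of hanging forever.

```python
import threading


def reacquire_while_held(lock, timeout=0.1):
    """Take `lock`, then try to take it again from the same thread."""
    lock.acquire()
    try:
        got_it = lock.acquire(timeout=timeout)  # would block forever without timeout
        if got_it:
            lock.release()                      # undo the nested acquire
        return got_it
    finally:
        lock.release()                          # undo the outer acquire


if __name__ == "__main__":
    # Non-reentrant lock: the second acquire cannot succeed.
    print("Lock :", reacquire_while_held(threading.Lock()))
    # Reentrant lock, for contrast: the owning thread may acquire again.
    print("RLock:", reacquire_while_held(threading.RLock()))
```

| With a plain Lock the nested acquire times out, while an RLock lets
| the owning thread back in.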
| rat87 wrote:
| Reminds me of a bug that would bluescreen Windows if I stopped
| debugging in Visual Studio while it was in the middle of calling
| the native Ping from C#.
| bob1029 wrote:
| I've been able to get managed code to BSOD my machine by simply
| having a lot of thread instances that are aggressively
| communicating with each other (i.e., via Channel<T>). It's
| probably more of a hardware thing than a software thing. My
| Spotify fails to keep the audio buffer filled when I've got it
| fully saturated. I feel like the kernel occasionally panics
| when something doesn't resolve fast enough with regard to
| threads across core complexes.
| brcmthrowaway wrote:
| Can this happen with Grand Central Dispatch?
| immibis wrote:
| did... did you understand what the bug was?
| markus_zhang wrote:
| Although I understand nothing from these posts, reading Raymond's
| posts somehow always calms my inner struggles.
|
| Just curious, is this customer a game studio? I have never done
| any serious system programming but the gist feels like one.
| ajkjk wrote:
| I would guess it's something corporate. They can afford to
| pause the UI and ship debugging traces home more than a real-
| time game might.
| delusional wrote:
| I'd actually expect a customer-facing program more. Corporate
| software wouldn't care that the UI hung, you're getting paid
| to sit there and look at it.
| tedunangst wrote:
| The banker trying to close a deal isn't paid by the hour.
| immibis wrote:
| Unless the user's boss complained to the programmer's boss
| boxed wrote:
| I had a support issue once at a well known and big US defense
| firm. We got kernel hangs consistently in kernel space from
| normal user-level code. Crazy shit. I opened a support issue
| which eventually got closed because we used an old compiler. Fun
| times.
| makz wrote:
| Looking at the title, at first I thought "uh?", but then I saw
| Microsoft and it made sense.
| frabona wrote:
| Such a clean breakdown. "Don't suspend your own threads" should
| be tattooed on every Windows dev's arm at this point
___________________________________________________________________
(page generated 2025-04-15 23:00 UTC)