Post AWTF4mrAnwrBwFdLSi by mattst88@fosstodon.org
 (DIR) More posts by mattst88@fosstodon.org
 (DIR) Post #AWTF4l2nXky2Jj3bIu by mattst88@fosstodon.org
       2023-06-07T22:42:08Z
       
       0 likes, 0 repeats
       
       I'm the only person maintaining #Gentoo support for DEC Alpha. I was testing GTK 4 on Alpha for https://bugs.gentoo.org/838709 and noticed that its test suite generated a lot of unaligned accesses (e.g. a 4-byte load from an address that isn't 4-byte aligned). These are slow on Alpha because the kernel has to trap and emulate the memory operation.
       
       For debugging, you can use the `prctl` utility (search for "prctl" on https://wiki.gentoo.org/wiki/Project:Alpha/Porting_guide for instructions).
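       
       To make "unaligned" concrete, here is a minimal C sketch of the check involved. The helper names `misalignment` and `is_aligned` are hypothetical and do not come from GTK, Mesa, or the kernel; the sketch only assumes that the alignment is a power of two.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Number of bytes by which p sits past the previous `align`-byte boundary.
 * 0 means the address is aligned. `align` must be a power of two.
 * (Hypothetical helper, for illustration only.) */
static size_t misalignment(const void *p, size_t align)
{
    return (size_t)((uintptr_t)p & (align - 1));
}

/* Nonzero if p is aligned to `align` bytes. */
static int is_aligned(const void *p, size_t align)
{
    return misalignment(p, align) == 0;
}
```

       A 4-byte load through a pointer for which `is_aligned(p, 4)` is false is the kind of access that gets trapped and emulated on Alpha.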
       
 (DIR) Post #AWTF4m75ZHzfdK4XdA by mattst88@fosstodon.org
       2023-06-07T22:46:23Z
       
       0 likes, 0 repeats
       
       But I decided to use Gentoo's #SPARC development system since it's a lot faster than any of my Alphas, and on SPARC unaligned accesses generate a SIGBUS and the process is killed.
       
       So I build GTK on the SPARC, see that about 250 of the ~750 tests in its test suite fail with SIGBUS, and run the first in gdb.
       
       And gdb doesn't work...
       
       So I investigate, try newer versions, older versions, latest git. Everything I try is broken, until I go all the way back to the gdb-10-branchpoint tag from 09/2020.
       
 (DIR) Post #AWTF4mrAnwrBwFdLSi by mattst88@fosstodon.org
       2023-06-07T22:50:45Z
       
       1 likes, 0 repeats
       
       I bisect between the gdb-10-branchpoint and gdb-11-branchpoint tags:
       > Bisecting: 1837 revisions left to test after this (roughly 11 steps)
       and eventually arrive at https://sourceware.org/git/?p=binutils-gdb.git;a=commitdiff;h=05c06f318fd9a112529dfc313e6512b399a645e4 which landed in July 2021.
       
       I file a bug upstream https://sourceware.org/bugzilla/show_bug.cgi?id=30525 and I think we'll have this fixed in a few days. #gdb has been broken for non-trivial programs on SPARC64 for years. /o\
       
       In the meantime, I use the old, but working, gdb to inspect the unaligned accesses.
       
 (DIR) Post #AWTF4nWeJk2A0t2T6u by mattst88@fosstodon.org
       2023-06-07T22:56:02Z
       
       0 likes, 0 repeats
       
       The first GTK unit test I run in gdb shows an unaligned access in Mesa. Great! I have a couple thousand commits in Mesa, so I'm sure I can deal with this.
       
       The problem seems to be in Mesa's freelist memory allocator, and I start reading the code. I spot where the allocation is coming from (`gc_alloc_size`) but the function ends with
       >    assert(((uintptr_t)ptr & (align - 1)) == 0);
       >    return ptr;
       so it's asserting that the pointer it returns is properly aligned. So what's wrong?
       
 (DIR) Post #AWTF4oJvMXRuTi5oum by mattst88@fosstodon.org
       2023-06-07T23:00:29Z
       
       1 likes, 0 repeats
       
       I rebuild Mesa with assertions enabled and rerun the unit test... and it passes. Huh?
       
       I rebuild without assertions, assuming I made some kind of mistake. Unit test fails again.
       
       I confirm that the allocations are properly aligned when Mesa is built with assertions and unaligned when built without assertions. So I'm not going crazy, but WTF?
       
       Then I spot this in the definition of the `gc_block_header` struct (which `gc_alloc_size` allocates):
       > #ifndef NDEBUG
       >    unsigned canary;
       > #endif
       Oh.
       
 (DIR) Post #AWTF4p4MZsb0njouIa by mattst88@fosstodon.org
       2023-06-07T23:04:39Z
       
       0 likes, 0 repeats
       
       So the size of the allocation's header (used for tracking purposes, placed immediately before the memory allocation returned to the API user) is dependent on whether assertions are enabled or not! /o\
       
       The result is that when assertions are enabled, allocations are always aligned. But when assertions are disabled, allocations are unaligned. Heisenbug: the problem is there until you look for it.
       
       It's late and I'm tired, so I file https://gitlab.freedesktop.org/mesa/mesa/-/issues/9166 with all the nitty-gritty details.
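       
       The effect can be sketched in a few lines of C. This is an illustration only: the two structs below are hypothetical stand-ins, not Mesa's actual `gc_block_header` (only the `canary` field is taken from the snippet quoted earlier in the thread), and the sizes assume a typical ABI where `unsigned` is 4 bytes. Note also that `assert()` is compiled out when `NDEBUG` is defined, so an alignment assert cannot fire in exactly the build where the header shrinks.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical header as laid out WITH assertions (NDEBUG not defined). */
struct hdr_debug {
    unsigned canary;   /* debug-only field */
    uint16_t offset;   /* hypothetical bookkeeping fields */
    uint8_t  bucket;
    uint8_t  flags;
};

/* The same hypothetical header as laid out WITHOUT assertions. */
struct hdr_release {
    uint16_t offset;
    uint8_t  bucket;
    uint8_t  flags;
};

/* Naive layout: the payload starts directly after the header. */
static size_t naive_payload_offset(size_t header_size)
{
    return header_size;
}

/* One generic remedy: round the payload offset up to the requested
 * alignment (a power of two). Not necessarily what the Mesa fix does. */
static size_t padded_payload_offset(size_t header_size, size_t align)
{
    return (header_size + align - 1) & ~(align - 1);
}
```

       With an 8-byte-aligned block, the 8-byte debug header happens to leave the payload aligned, while the 4-byte release header puts it 4 bytes off. Rounding the offset up makes the result independent of the header's size.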
       
 (DIR) Post #AWTF4pkC4M3YtTOJV2 by mattst88@fosstodon.org
       2023-06-07T23:06:48Z
       
       1 likes, 0 repeats
       
       The author of the freelist allocator puts together a merge request with a fix and a new unit test to prevent this sort of thing from ever happening again (https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23501).
       
       With that patch applied, 249 tests in GTK's test suite that were failing now pass on SPARC64.
       
       What about the couple of tests that still fail?
       
 (DIR) Post #AWTF4qQNXVnh0J80Fk by mattst88@fosstodon.org
       2023-06-07T23:23:11Z
       
       0 likes, 0 repeats
       
       In one I find that the unaligned access is again in Mesa. It's a pretty typical pointer-aliasing bug, but as an added complication it occurs in code generated at build time by a Python program.
       
       I make a merge request to fix it (https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23482) after confirming that the unit test passes. The commit I reference in the "Fixes: ..." tag in the commit message is from 2010.
       
       The next unit test fails with an unaligned access in GTK itself, finally :)
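       
       For readers unfamiliar with this class of bug, here is a generic C sketch of the pattern and the usual remedy. It is not the code from the merge request; `read_float` and `write_float` are hypothetical helpers, and the sketch assumes the bug has the common shape of dereferencing a byte pointer cast to a wider type.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Problematic pattern (shown only as a comment):
 *
 *     float f = *(const float *)src;
 *
 * This accesses byte storage through an incompatible pointer type and
 * requires src to be suitably aligned for float, which a byte buffer
 * does not guarantee. */

/* Safe alternative: copy the bytes out. Compilers typically turn this
 * into a plain load where the target allows it. */
static float read_float(const uint8_t *src)
{
    float f;
    memcpy(&f, src, sizeof f);
    return f;
}

/* Matching store into an arbitrarily aligned byte buffer. */
static void write_float(uint8_t *dst, float f)
{
    memcpy(dst, &f, sizeof f);
}
```

       Because `memcpy` makes no alignment assumption about either pointer, the same code is valid on strict-alignment targets such as SPARC and on targets that tolerate unaligned loads.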
       
 (DIR) Post #AWTF4r8Ku4xjCdh6lk by mattst88@fosstodon.org
       2023-06-07T23:29:37Z
       
       1 likes, 0 repeats
       
       So in order to fix some unaligned accesses I noticed in #GTK on DEC #Alpha, I
       - used a SPARC64 system
       - discovered that gdb is broken on SPARC64
       - bisected and reported the gdb regression, which happened nearly 2 years ago
       - found a legitimate heisenbug in Mesa and reported it after fully understanding it
       - reviewed the fix and unit test, confirmed that it fixes the problem and the unit tests now pass
       - debugged a strict-aliasing violation in Mesa and made a merge request fixing a bug from 2010
       
 (DIR) Post #AWTF4tMac3YI7QNWGO by mattst88@fosstodon.org
       2023-06-07T23:31:44Z
       
       1 likes, 0 repeats
       
       I just reran the GTK test suite on Alpha with the patches applied, and found out that there are more. Many, many more. /o\
       
       The saga continues...
       
       Edit: I was wrong! I failed to apply the second patch. With it applied, the test suite passes and only a single test has unaligned accesses (which is the one I was aware of already).
       
       Ok:                 722
       Expected Fail:      0
       Fail:               0
       Unexpected Pass:    0
       Skipped:            1
       Timeout:            0