https://github.com/cisco/ChezScheme/pull/879
cisco / ChezScheme

workaround Clang v15 AArch64 miscompile that affects parallel collection #879

Merged: mflatt merged 1 commit into cisco:main from mflatt:clang15-workaround on Oct 14, 2024
+67 -0, Conversation 4, Commits 1, Checks 15, Files changed 4

mflatt (Contributor) commented Oct 13, 2024

This patch avoids a miscompile using Clang v15 on macOS. The default compiler on macOS was recently upgraded to Clang v16, which appears to fix the problem, and I have not been able to replicate the problem with Clang v15 variants that are available in Linux distributions. So, it might be ok to just ignore the problem.
But since v15 installations are likely to hang around for a while in other macOS installations, since the workaround is simple, since Racket users who build themselves are affected, and since I spent a lot of time tracking down the problem, I'm inclined to include a workaround.

For details on the miscompile as it affects Chez Scheme, see clang15-miscompile.zip.

mflatt added commit 39f9932: workaround Clang v15 AArch64 miscompile that affects parallel collection

mflatt (Contributor, Author) commented Oct 13, 2024

I spent so long tracking this down that I'd like to tell you the long story, even though it doesn't really matter. The miscompile seems like a run-of-the-mill compiler error, but the way it affected Chez Scheme and Racket made it especially difficult to find.

During 2022-2024, I've tried off and on to track down an occasional failure in Racket builds on my macOS M1/M2 laptops. Memory would get mangled late in the build -- specifically during documentation rendering for the "math" library, which uses libgmp and libmpfr in multi-threaded mode. Since the problem never happened on x86_64, and since it only happened during parallel documentation rendering, I was pretty sure that I was looking for some sort of race condition exposed by AArch64's weak memory coherence.

Although I discovered that I could provoke a crash by just rebuilding documentation, even that step takes 10 minutes, and the crash would only happen rarely, so getting a crash would take hours. Any little change I made to try to gather information would make the crash go away or become much more difficult to provoke, so hours turned to days.

Meanwhile, users of the Racket main distribution were not running into problems, which I chalked up to the fact that documentation is pre-rendered. Also, maybe more generally libgmp or libmpfr needed to be involved, so maybe it wasn't my problem.
In any case, the lack of reports made the problem feel like less of an emergency than I would normally consider crashing bugs, especially since I had so much trouble replicating the crash or pinpointing an issue. So, I'd burn a day or three on the issue every few months.

In September 2024, I finally gathered evidence to suspect that the problem was in the GC's parallel mode. And with that suspicion, I was finally able to make a small Chez Scheme program with the right ingredients to crash, showing that the problem was independent of Racket and math libraries. The big difference was being able to provoke a crash within seconds instead of hours, and I found the problem over the next day.

In retrospect, it's clear why the problem was so difficult to find. I was pretty sure I was looking for a memory race, but that turned out to be because only multi-threaded programs could reach the miscompiled code. And only during parallel collections. And only when the collector is looking at specific words within a thread representing virtual registers, which are not something that programs normally use directly.

The effect of the miscompile was that a "does this object belong to me?" check would succeed when it shouldn't. That matters only when a thread has an object in its virtual register that was allocated by a different thread, which is an even rarer use of a virtual register. And even when it goes wrong, there's only a small chance that different collector threads will end up looking at the same object at the same time, and even concurrent traversal of the same object will turn out ok a lot of the time!

Finally, and most perniciously, the miscompile creates a race that isn't in the source code, and in a code template that is put in place by a macro that is used dozens of times in the output (and compiled ok in all other instances).
Meanwhile, Racket distributions are compiled with Clang v12, which is why it hasn't been a problem for Racket users, even when they run programs with parallelism.

jltaylor-us (Contributor) approved these changes Oct 13, 2024

Great detective work, Matthew!

maoif (Contributor) commented Oct 14, 2024

Thanks for the fix and for sharing your experience of tracking down this tricky bug.

ufo5260987423 commented Oct 14, 2024

You are the hero!

mflatt merged commit fc577f2 into cisco:main Oct 14, 2024 (15 checks passed)

mflatt added a commit to racket/racket that referenced this pull request Oct 14, 2024: "Chez Scheme: workaround Clang v15 AArch64 miscompile that affects parallel collection" (e27d876). See cisco/ChezScheme#879 for more information.
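Editorial sketch: the "does this object belong to me?" check from the long comment above can be illustrated with a small C model. This is not Chez Scheme's collector code and not the code in this patch. The `sweeper` and `object` types, the function names, and the "always succeeds" stand-in for the miscompiled instance are all hypothetical, chosen only to show why a check that wrongly succeeds lets two collector threads claim the same object. The two sweepers run one after the other here so the result is deterministic; in a parallel collection the two traversals could overlap in time, which is where the race would come from.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical types for illustration; not Chez Scheme's data structures. */
typedef struct sweeper { int id; } sweeper;

typedef struct object {
    const sweeper *owner;  /* sweeper associated with the allocating thread */
    int sweep_count;       /* how many sweepers have traversed this object */
} object;

typedef bool (*ownership_check)(const sweeper *, const object *);

/* The intended check: an object belongs to exactly one sweeper. */
static bool belongs_to(const sweeper *me, const object *obj) {
    return obj->owner == me;
}

/* Assumed stand-in for the miscompiled instance: the check succeeds
   even when the object was allocated by a different thread. */
static bool belongs_to_miscompiled(const sweeper *me, const object *obj) {
    (void)me;
    (void)obj;
    return true;
}

/* A sweeper traverses an object only when the check says it owns it.
   What a real collector does with non-owned objects is not modeled. */
static bool sweep_if_owned(const sweeper *me, object *obj, ownership_check check) {
    assert(me != NULL && obj != NULL);
    if (!check(me, obj))
        return false;
    obj->sweep_count++;
    return true;
}

/* Two sweepers look at one object owned by the first sweeper;
   returns how many of them ended up traversing it. */
static int sweeps_claimed(ownership_check check) {
    sweeper a = { 1 }, b = { 2 };
    object obj = { &a, 0 };
    sweep_if_owned(&a, &obj, check);
    sweep_if_owned(&b, &obj, check);
    return obj.sweep_count;
}
```

With the intended check, `sweeps_claimed(belongs_to)` counts one traversal. With the stand-in for the miscompiled check, `sweeps_claimed(belongs_to_miscompiled)` counts two, which corresponds to the case described above where different collector threads end up looking at the same object.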