[HN Gopher] Making my debug build run 100x faster so that it is ...
___________________________________________________________________
Making my debug build run 100x faster so that it is finally usable
Author : broken_broken_
Score : 40 points
Date : 2025-02-18 08:48 UTC (14 hours ago)
(HTM) web link (gaultier.github.io)
(TXT) w3m dump (gaultier.github.io)
| bean-weevil wrote:
| Why not just compile that particular object file with optimizations
| on and the rest of the project with optimizations off?
| broken_broken_ wrote:
| Yes, that's the obvious (and boring!) answer, that I mention in
| the introduction and that's in a way the implicit conclusion.
| But that does not teach us SIMD then :)
| saurik wrote:
| Your article isn't really about how to speed up a debug
| build, though, so I think you're likely not going to find
| the right audience. Like, to be honest, I gave up on
| your article, because while I found the premise of speeding
| up a debug build really interesting, I (currently) have no
| interest in hand-optimizing SIMD... but, in another time, or
| if I were someone else, I might find that really interesting,
| but then would not have thought to look at this article.
| "Hand-optimizing SHA-1 using SIMD intrinsics and assembly" is
| just a very different mental space than "making my debug
| build run 100x faster", even if they are two ways to describe
| the same activity. "Using SIMD and assembly to avoid relying
| on compiler optimizations for performance" also feels better?
| I would at least get it if your title was a pun or a joke or
| was in some way fun--at which point I would blame Hacker News
| for pulling articles out of their context and not having a
| good policy surrounding publicly facing titles or subtitles--
| but it feels like, in this case, the title is merely a poor
| way to describe the content.
| mperham wrote:
| Yep, I was hoping to learn how to do this. Seems like a much
| better long term lesson.
| senkora wrote:
| For gcc: #pragma GCC optimize ("O0")
|
| For clang: #pragma clang optimize off
|
| For MSVC: #pragma optimize("", off)
|
| Put one of these at the top of your source file.
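|
| A minimal sketch of the reverse direction, which is what the
| question above asks for: raising the optimization level for one hot
| function while the rest of the file stays at -O0. It assumes GCC;
| the function name rotl_sum is hypothetical and only stands in for
| a hot loop. GCC documents these pragmas as a debugging aid, so treat
| this as an illustration rather than a recommended build setup.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* GCC only: everything between push_options and pop_options is
   compiled as if -O2 had been passed, even in a -O0 build. */
#if defined(__GNUC__) && !defined(__clang__)
#pragma GCC push_options
#pragma GCC optimize ("O2")
#endif

/* Hypothetical hot function: rotate-and-add over an array. */
uint32_t rotl_sum(const uint32_t *v, size_t n)
{
    assert(v != NULL || n == 0);
    uint32_t acc = 0;
    for (size_t i = 0; i < n; i++) {
        acc = ((acc << 5) | (acc >> 27)) + v[i];
    }
    return acc;
}

/* Restore the command-line optimization level for the rest of the file. */
#if defined(__GNUC__) && !defined(__clang__)
#pragma GCC pop_options
#endif
```

| The guard skips the pragmas on Clang, whose optimize pragma can
| turn optimizations off but not raise them above the command-line
| level; there, a per-file compiler flag in the build system is the
| usual route.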
| molenzwiebel wrote:
| For this use case, you can squeeze out even more performance by
| using the SHA-1 implementation in Intel ISA-L Crypto [1]. The
| SHA-1 implementation there allows for multi-buffer hashes, giving
| you the ability to calculate the hashes for multiple chunks in
| parallel on a single core. Given that that is basically your
| use case, it might be worth considering. I doubt it'll provide
| much speedup if you're already I/O bound here though.
|
| [1]: https://github.com/intel/isa-l_crypto
| secondcoming wrote:
| I came across this repo recently and it looks great. It's a
| pity there doesn't seem to be an official Ubuntu package for it,
| although there is one for the Intelligent Storage Acceleration
| Library.
| broken_broken_ wrote:
| Thank you, I will definitely have a look, and update the
| article if there are any interesting findings.
| ack_complete wrote:
| SHA1 is difficult to vectorize due to a tight loop-carried
| dependency in the main operation. In an optimized build, I've
| only seen about a 15% speedup over the scalar version with x64
| SSSE3 without hardware SHA1 support. Debug builds of course can
| benefit more from the reduction in operations since the
| inefficient code generation is a bigger issue there than the
| dependency chains. I think the performance delta is bigger for
| ARM64 CPUs, but it's pretty rare to not have the Crypto extension
| (except notably some Raspberry Pi models).
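|
| A minimal scalar SHA-1 sketch (not the article's code) that shows
| where the dependency sits. The message schedule w[] depends only on
| the input block, which is the part SIMD versions typically
| vectorize; the 80 rounds each need the previous round's a..e, so
| they stay serial. The function names are illustrative.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

static uint32_t rotl32(uint32_t x, unsigned n)
{
    return (x << n) | (x >> (32 - n));
}

/* Compress one 64-byte block into the running state h. */
static void sha1_block(uint32_t h[5], const uint8_t blk[64])
{
    uint32_t w[80];
    for (int i = 0; i < 16; i++)
        w[i] = (uint32_t)blk[4 * i] << 24 | (uint32_t)blk[4 * i + 1] << 16 |
               (uint32_t)blk[4 * i + 2] << 8 | blk[4 * i + 3];
    /* Message schedule: reads only earlier w[] entries, never a..e. */
    for (int i = 16; i < 80; i++)
        w[i] = rotl32(w[i - 3] ^ w[i - 8] ^ w[i - 14] ^ w[i - 16], 1);

    uint32_t a = h[0], b = h[1], c = h[2], d = h[3], e = h[4];
    for (int i = 0; i < 80; i++) {
        uint32_t f, k;
        if (i < 20)      { f = (b & c) | (~b & d);          k = 0x5A827999; }
        else if (i < 40) { f = b ^ c ^ d;                   k = 0x6ED9EBA1; }
        else if (i < 60) { f = (b & c) | (b & d) | (c & d); k = 0x8F1BBCDC; }
        else             { f = b ^ c ^ d;                   k = 0xCA62C1D6; }
        /* Loop-carried dependency: t needs this round's a..e, and the
           next round cannot start until a has been replaced by t. */
        uint32_t t = rotl32(a, 5) + f + e + k + w[i];
        e = d; d = c; c = rotl32(b, 30); b = a; a = t;
    }
    h[0] += a; h[1] += b; h[2] += c; h[3] += d; h[4] += e;
}

/* One-shot SHA-1 of data[0..len) into out[0..20). */
void sha1(const uint8_t *data, size_t len, uint8_t out[20])
{
    assert(data != NULL || len == 0);
    uint32_t h[5] = {0x67452301, 0xEFCDAB89, 0x98BADCFE, 0x10325476, 0xC3D2E1F0};
    size_t i = 0;
    for (; i + 64 <= len; i += 64)
        sha1_block(h, data + i);

    /* Padding: 0x80, zeros, then the bit length as a 64-bit big-endian value. */
    uint8_t tail[128] = {0};
    size_t rem = len - i;
    if (rem)
        memcpy(tail, data + i, rem);
    tail[rem] = 0x80;
    size_t tail_len = (rem < 56) ? 64 : 128;
    uint64_t bits = (uint64_t)len * 8;
    for (int j = 0; j < 8; j++)
        tail[tail_len - 1 - j] = (uint8_t)(bits >> (8 * j));
    for (size_t j = 0; j < tail_len; j += 64)
        sha1_block(h, tail + j);

    for (int j = 0; j < 5; j++) {
        out[4 * j]     = (uint8_t)(h[j] >> 24);
        out[4 * j + 1] = (uint8_t)(h[j] >> 16);
        out[4 * j + 2] = (uint8_t)(h[j] >> 8);
        out[4 * j + 3] = (uint8_t)h[j];
    }
}
```

| At -O0 every one of those temporaries goes through the stack, which
| is one plausible reason debug builds gain more from intrinsics than
| optimized builds do.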
|
| The comments in the SSE2 version are a bit odd, as they reference
| MMX and the Pentium M and Efficeon CPUs. Those CPUs are
| _ancient_ -- 2003/2004 era. The vectorized code you have also
| uses SSE2 and not MMX, which is important since SSE2 is double
| the width and has different performance characteristics from MMX.
| IIRC, mainstream Intel Core CPUs didn't start supporting SHA until
| ~2019 with Ice Lake, so the target for non-hardware-accelerated
| vectorized SHA1 for Intel CPUs would be mostly Skylake-based.
| tvbusy wrote:
| I understand the post is about learning to speed up SHA1
| calculation, and on that I have no comment. However, the state
| file is a solved problem for me. State files are rarely corrupted,
| and when one is, it's simple to just re-check the file. I cannot
| imagine a torrent client checking the hash of TBs of files for
| every single start. It's not a coincidence that many torrent
| clients have a feature to skip hash checking, assume the file is
| correct, and start seeding immediately.
___________________________________________________________________
(page generated 2025-02-18 23:00 UTC)