[HN Gopher] SIMD Perlin Noise: Beating the Compiler with SSE (2014)
___________________________________________________________________
SIMD Perlin Noise: Beating the Compiler with SSE (2014)
Author : homarp
Score : 43 points
Date : 2025-07-21 05:28 UTC (2 days ago)
(HTM) web link (scallywag.software)
(TXT) w3m dump (scallywag.software)
| jesse__ wrote:
| Author here, AMA :)
| 0points wrote:
| Nice write-up, and congratulations on the result! Since it's
| about Perlin and performance, have you had a look at
| OpenSimplex?
|
| PS. bonsai looks really cool! Checking it out right now
| jesse__ wrote:
| I haven't looked at OpenSimplex. I will when I get around to
| doing a simplex implementation.
|
| And thanks for the kind words!
| Keyframe wrote:
| Pretty sweet! I'm mostly interested in what you did to measure
| the performance and focus on a single function. Was it pretty
| much perf with hist, or a visualizer, or what?
| jesse__ wrote:
| I just called _rdtsc() before and after the noise gen once
| every iteration, and pushed the sample onto a fixed size
| buffer .. after some N iterations (4k maybe, can't remember)
| of samples, computed min/max/avg.
|
| There's a little project here that I used to benchmark in
| part 4
|
| https://github.com/scallyw4g/bonsai_noise_bench
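
A minimal sketch of the measurement loop described in the comment above, not the code from bonsai_noise_bench. It assumes GCC or Clang on x86, where `__rdtsc()` comes from `<x86intrin.h>`, and falls back to a monotonic clock on other targets. `SAMPLE_COUNT`, `TickStats`, `measure`, and `spin_work` are hypothetical names chosen for illustration.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L  /* for clock_gettime in the fallback path */
#endif
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#if defined(__x86_64__) || defined(__i386__)
#include <x86intrin.h>  /* __rdtsc() on GCC and Clang */
static inline uint64_t read_ticks(void) { return __rdtsc(); }
#else
#include <time.h>
/* Assumption: on non-x86 targets, use nanoseconds from a monotonic clock. */
static inline uint64_t read_ticks(void) {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}
#endif

#define SAMPLE_COUNT 4096  /* hypothetical; the comment only says "4k maybe" */

typedef struct {
    uint64_t min;
    uint64_t max;
    double avg;
} TickStats;

typedef void (*WorkFn)(void *user);

/* Stand-in workload; in the thread this would be the noise generator. */
static void spin_work(void *user) {
    volatile int x = 0;
    for (int i = 0; i < 100; ++i) x += i;
    *(int *)user += 1;  /* count calls so the caller can check them */
}

/* Time `work` once per iteration, store each sample in a fixed-size
   buffer, then reduce the buffer to min/max/avg. */
static TickStats measure(WorkFn work, void *user) {
    static uint64_t samples[SAMPLE_COUNT];
    assert(work != NULL);

    for (size_t i = 0; i < SAMPLE_COUNT; ++i) {
        uint64_t start = read_ticks();
        work(user);
        samples[i] = read_ticks() - start;
    }

    TickStats s = { samples[0], samples[0], 0.0 };
    double sum = 0.0;
    for (size_t i = 0; i < SAMPLE_COUNT; ++i) {
        if (samples[i] < s.min) s.min = samples[i];
        if (samples[i] > s.max) s.max = samples[i];
        sum += (double)samples[i];
    }
    s.avg = sum / (double)SAMPLE_COUNT;
    return s;
}
```

Calling `measure(spin_work, &counter)` returns the three statistics in ticks. The minimum is usually the most stable of the three, since interrupts and cache misses only ever add time to a sample.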
| jokoon wrote:
| you should post the result at the end
|
| and yes, make a benchmark
|
| (although I would not know how to make one, or what reference
| point to use)
|
| what do you think about the FastNoiseLite implementation used
| in Godot?
| jesse__ wrote:
| I kinda did post results at the end of part 4 .. I beat the
| SOTA by 1.8x
|
| There's a benchmark utility here:
| https://github.com/scallyw4g/bonsai_noise_bench
|
| FastNoise2 is a high-quality library. Can't speak to
| FastNoiseLite .. never looked at it.
| vlovich123 wrote:
| Which compiler & optimization settings did you use? Out of
| curiosity, any idea why the compiler failed to auto-vectorize
| the loops?
| jesse__ wrote:
| Clang -O2
|
| ..
|
| -O3 didn't seem to make any appreciable difference.
|
| Re. the auto-vectorization, I really don't know. I didn't
| even read the assembly the compiler generated until at least
| halfway through the process. Generally I've found that you
| basically can't rely on the compiler auto-vectorizing
| anything, ever, if it actually matters.
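
For readers unfamiliar with what writing the vector code by hand looks like, here is a small sketch. It is not taken from the article: it applies the quintic fade curve from Perlin's improved noise, 6t^5 - 15t^4 + 10t^3, to an array using SSE intrinsics, four floats per iteration, with a scalar loop for the tail. `fade_scalar` and `fade_array` are hypothetical names, and the `__SSE__` guard is an assumption so the file still builds on non-x86 targets.

```c
#include <assert.h>
#include <stddef.h>

#if defined(__SSE__)
#include <xmmintrin.h>  /* SSE: __m128 and the _mm_*_ps intrinsics */
#endif

/* Scalar reference: 6t^5 - 15t^4 + 10t^3 in Horner-like form. */
static float fade_scalar(float t) {
    return t * t * t * (t * (t * 6.0f - 15.0f) + 10.0f);
}

/* Apply the fade curve to n floats. With SSE available, four lanes are
   processed per iteration; leftover elements use the scalar path. */
static void fade_array(const float *in, float *out, size_t n) {
    assert(in != NULL && out != NULL);
    size_t i = 0;
#if defined(__SSE__)
    const __m128 six = _mm_set1_ps(6.0f);
    const __m128 fifteen = _mm_set1_ps(15.0f);
    const __m128 ten = _mm_set1_ps(10.0f);
    for (; i + 4 <= n; i += 4) {
        __m128 t = _mm_loadu_ps(in + i);                   /* unaligned load */
        __m128 inner = _mm_sub_ps(_mm_mul_ps(t, six), fifteen);
        inner = _mm_add_ps(_mm_mul_ps(t, inner), ten);     /* t*(6t-15)+10 */
        __m128 t3 = _mm_mul_ps(_mm_mul_ps(t, t), t);       /* t^3 */
        _mm_storeu_ps(out + i, _mm_mul_ps(t3, inner));
    }
#endif
    for (; i < n; ++i) {
        out[i] = fade_scalar(in[i]);
    }
}
```

The explicit version fixes the instruction selection in the source, so the result does not depend on whether the optimizer recognizes the loop. The cost is that it only targets one instruction set.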
| addaon wrote:
| Memories. As a personal project back in... 2003?... I decided to
| do something similar, implement 4D Perlin Noise in AltiVec
| assembly. The only problem was that I had a G3 iBook; so I would
| write one instruction of assembly, then write a C function to
| interpret that assembly, building an interpreter for a very
| selective subset of PPC w/ AltiVec that ran (slooooowly) on the
| G3. As I recall I got it down to ~200 instructions, and it worked
| perfectly the first time I ran it on a G4, which was pretty
| rewarding. Took me more than half a day, though. On an unrelated
| note, I got an internship with Apple's performance team that
| summer.
| rincebrain wrote:
| Did you profile the results with different compilers?
|
| The last time I tried doing this kind of microoptimization for
| fun, I ended up bundling actual assembly files, because the
| generated assembly for intrinsics was so variable in performance
| across compilers it was the only way to get consistent results on
| many platforms.
| jesse__ wrote:
| I only build the project this is embedded in with clang, so
| that's the only compiler I tested.
| llm_nerd wrote:
| HN loves SIMD, and there is a "how I hand crafted a SIMD
| optimization" post doing numbers on here regularly. They're fun
| posts, and it absolutely speaks to the fact that writing code
| that optimizing compilers can robustly and comprehensively turn
| into good SIMD branches is somewhat of a black art.
|
| Which is why you, _generally_, shouldn't be doing either. You
| shouldn't rely upon the compiler to figure out your intentions,
| and you shouldn't be writing SIMD instructions directly unless
| you're writing a SIMD library or an optimizing compiler.
|
| Instead you should reach for one of the many available libraries
| that not only force you into appropriately structuring your data
| and calls for SIMD goodness, they're massively more portable and
| powerful.
|
| Google's Highway, for instance, will let you use their abstracted
| SIMD functions and it provides the optimization whether your
| target is SSE2-4, AVX, AVX2, AVX512, AVX10, or if you build for
| ARM NEON or SVE, for any conceivable vector size, or WASM's weird
| SIMD functions, or RISC-V's RVV, and several more, and when new
| widths and new options come out, the library adds the support and
| you might not have to change your code at all.
|
| There are loads of libraries like this (xsimd, EVE, SIMDe, etc).
| They all force you into thinking about structuring your code in a
| manner that is SIMDable -- instead of hoping the optimizing
| compiler will figure it out on its own -- and provide targeting
| for a vast trove of SIMD options without hand-writing for every
| option.
|
| I was going to quickly rewrite the example in Highway just to
| demonstrate but the Perlin stuff seems to be missing or
| significantly restructured.
|
| "_But that is obvious and I'm mad that you commented this_" -
| no, it isn't obvious whatsoever, and this "I hand-rolled some SSE
| now my app is super awesome look at the microbenchmark results on
| a very narrow, specific machine" content appears on here
| regularly, betraying a pretty big influx of beginners who
| _don't_ know that it's almost certainly the wrong approach.
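
One common reading of "structuring your data for SIMD" in the comment above is a structure-of-arrays layout. That reading is an assumption; the comment does not name a specific layout. The sketch below uses plain C with no SIMD library, and `PointAoS`, `PointsSoA`, `aos_to_soa`, `dot_with`, and `POINT_COUNT` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

#define POINT_COUNT 8  /* hypothetical batch size */

/* Array-of-structures: x, y, z are interleaved in memory, so loading
   four consecutive x values needs a strided or shuffled load. */
typedef struct {
    float x, y, z;
} PointAoS;

/* Structure-of-arrays: each component is contiguous, so adjacent
   elements map directly onto adjacent vector lanes. */
typedef struct {
    float x[POINT_COUNT];
    float y[POINT_COUNT];
    float z[POINT_COUNT];
} PointsSoA;

/* Repack interleaved points into per-component arrays. */
static void aos_to_soa(const PointAoS *in, PointsSoA *out) {
    assert(in != NULL && out != NULL);
    for (size_t i = 0; i < POINT_COUNT; ++i) {
        out->x[i] = in[i].x;
        out->y[i] = in[i].y;
        out->z[i] = in[i].z;
    }
}

/* Dot product of every point with one direction vector. Each iteration
   is independent and reads contiguous memory, which is the shape both
   SIMD libraries and auto-vectorizers work best with. */
static void dot_with(const PointsSoA *p, float dx, float dy, float dz,
                     float *out) {
    for (size_t i = 0; i < POINT_COUNT; ++i) {
        out[i] = p->x[i] * dx + p->y[i] * dy + p->z[i] * dz;
    }
}
```

Whether a particular compiler actually vectorizes the loop in `dot_with` is not guaranteed, which is the disagreement running through this thread. The layout only removes one common obstacle.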
| 63 wrote:
| This is a valuable viewpoint that lines up somewhat with some
| other discussion I've seen on the topic [0]. I'd like to see
| more posts about structuring code for the auto-vectorizer (with
| libraries or otherwise) rather than writing SIMD by hand. Do
| you have any documentation you'd recommend?
|
| [0] https://matklad.github.io/2023/04/09/can-you-trust-a-
| compile...
| jesse__ wrote:
| I disagree pretty strongly with most of what you said, but I'd
| be very interested in seeing a Highway example and looking at
| the differences. Take a look through the comments, I left a
| link to the test bench I made, which contains all the code.
___________________________________________________________________
(page generated 2025-07-23 23:01 UTC)