Post AWS1mEVqGNfXeDWDTs by leviramsey@social.vivaldi.net
 (DIR) Post #AWS1mDa3jDRql0U4Po by SethTisue@fosstodon.org
       2023-06-07T01:00:00Z
       
       0 likes, 0 repeats
       
       “Warning: Do not ever use JMH on Apple's M-series hardware”, says @djspiewak (with characteristic comic overstatement, of course) — they're weird processors that have very different performance characteristics than the CPUs you'll likely deploy on. #ScalaDays
       
 (DIR) Post #AWS1mEVqGNfXeDWDTs by leviramsey@social.vivaldi.net
       2023-06-07T13:37:39Z
       
       0 likes, 0 repeats
       
       @SethTisue @djspiewak Likely even if the processors you end up deploying on are ARM.
       
 (DIR) Post #AWS1mFFDXfxtuwkSCu by djspiewak@fosstodon.org
       2023-06-07T13:58:39Z
       
       0 likes, 0 repeats
       
       @leviramsey @SethTisue Indeed. The problem isn’t ARM (usually). The problem is the SoC.
       
 (DIR) Post #AWS1mFdg4jWz8oBzKi by SethTisue@fosstodon.org
       2023-06-07T01:04:33Z
       
       0 likes, 0 repeats
       
       P.S. on Planet Spiewak, slow things are “even even even even even more worse” and fast things are “l-u-u-u-u-u-udicrously fast”
       
 (DIR) Post #AWS1mFwSwsYm54yzcO by alexelcu@social.alexn.org
       2023-06-07T14:41:44Z
       
       0 likes, 0 repeats
       
       @djspiewak I was aware that M1/M2 have specific optimizations that aren't representative of other ARM platforms, but not that they make such a difference in benchmarks. Are there any resources we can read? @leviramsey @SethTisue
       
 (DIR) Post #AWS9MCDYyKKydP43MG by djspiewak@fosstodon.org
       2023-06-07T16:06:42Z
       
       0 likes, 0 repeats
       
       @alexelcu @leviramsey @SethTisue It hasn't been talked about too much publicly to my knowledge, but it really works out to the same factors which make the M-series chips so incredibly fast in practice. The memory bandwidth and the physical proximity of main memory to the compute units are the most relevant differences (for non-graphical workloads).
       
 (DIR) Post #AWSHu8iRIXUnFiCe2a by alexelcu@social.alexn.org
       2023-06-07T17:42:30Z
       
       0 likes, 0 repeats
       
       @djspiewak Ah, so memory access is faster. Therefore, there's less pressure to optimize memory access patterns. @leviramsey @SethTisue
       
 (DIR) Post #AWSI5fkGpqnisRYhf6 by djspiewak@fosstodon.org
       2023-06-07T17:44:34Z
       
       0 likes, 0 repeats
       
       @alexelcu @leviramsey @SethTisue Exactly. In fact, memory access is so fast that it's almost like having a really, really large L3 cache. Since cache management is such a massive bottleneck in modern processors, this ends up significantly distorting performance.