[HN Gopher] Intel's "Cripple AMD" Function (2019)
       ___________________________________________________________________
        
       Intel's "Cripple AMD" Function (2019)
        
       Author : phab
       Score  : 251 points
       Date   : 2022-04-06 17:27 UTC (5 hours ago)
        
 (HTM) web link (www.agner.org)
 (TXT) w3m dump (www.agner.org)
        
       | mhh__ wrote:
        | This could arguably be dated 2009, as that is when it was
        | originally discovered (approximately).
       | https://www.agner.org/optimize/blog/read.php?i=49
        
         | ReleaseCandidat wrote:
          | It was discovered before that, at least as early as 2005:
         | 
         | https://techreport.com/news/8547/does-intels-compiler-crippl...
        
       | snvzz wrote:
       | RISC is less amenable to this category of BS.
       | 
       | Looking forward to RISC-V pushing x86 into retrocomputing
       | territory.
        
       | mechanical_bear wrote:
        | So not only is this posting from 2019, but the most recent
        | information it references is from 2010. Is this still relevant?
        | I'd love it if submissions on HN had a small blurb from the
        | author explaining why their submission is interesting/relevant.
        
         | shrx wrote:
         | It certainly is still relevant. Please read the article (and
         | the 2020 update below) before commenting on it.
        
           | mechanical_bear wrote:
            | I didn't state it was not relevant; I asked. Glad to see
            | there is an update. My main point stands, though: I'd love to
            | see an explanation with posts.
        
           | Anunayj wrote:
            | Yup, in fact MATLAB recently applied a fix for this in their
            | software [1].
           | 
           | [1] https://www.extremetech.com/computing/308501-crippled-no-
           | lon...
        
             | MikePlacid wrote:
             | Thank you for the link. It helped me to find the actual
             | performance difference. It is significant:
             | 
             |  _AMD's performance improves by 1.32x - 1.37x overall...
             | changing what looked like a narrow victory [for Intel] over
             | the 3960X and a good showing against the 3970X into an all-
             | out loss._
             | https://www.extremetech.com/computing/302650-how-to-
             | bypass-m...
        
         | nonplus wrote:
          | There are 2020 updates around MKL (but you may be correct that
          | that content is about 2019 MKL optimizations).
         | 
         | At any rate though, based on Intel's track record I think this
         | content is still relevant and of value to engineers who don't
         | have domain knowledge in compilers or work downstream.
        
       | bee_rider wrote:
       | Has anyone tried a recent version of MKL on AMD? I assume they
        | were shunting AMD off into an AVX codepath because pre-Zen AMD
        | lacked AVX2 (well, Excavator had it, I guess...).
       | 
       | If they are sending Zen down the generic AVX2 codepaths by
       | default and those are competitive with, say, openBLAS, that seems
       | reasonable, right?
       | 
       | Hopefully BLIS will save us all from this kind of confusion
       | eventually.
        
       | penguin_booze wrote:
        | This sounds to me very much like VW's emissions defeat devices:
        | detect the current situation, and "act accordingly".
        
       | JoshTriplett wrote:
       | If Intel had shipped a library/compiler that _did_ just use
        | feature flags and didn't check the CPU vendor, and the resulting
       | code used features that on AMD ran much more slowly than the
       | equivalent unoptimized code, would people blame AMD for the slow
       | instructions, or blame Intel for releasing a library/compiler
       | that they didn't optimize for their competitor's processor?
       | 
       | This isn't a hypothetical; quoting
       | https://en.wikipedia.org/wiki/X86_Bit_manipulation_instructi... :
       | 
       | > AMD processors before Zen 3[11] that implement PDEP and PEXT do
       | so in microcode, with a latency of 18 cycles rather than a single
       | cycle. As a result it is often faster to use other instructions
       | on these processors.
       | 
       | There's no feature flag for "technically supported, but slow,
       | don't use it"; you have to check the CPU model for that.
       | 
       | All that said, the _right_ fix here would have been to release
       | this as Open Source, and then people could contribute
       | optimizations for many different processors. But that would have
       | required a decision to rely on winning in hardware quality,
        | rather than sometimes squeezing out a "win" via software even in
       | generations where the hardware quality isn't as good as the
       | competition.
        
         | jdsully wrote:
         | There are feature flags for formerly slow instructions that are
         | now fast. E.g. rep mov
         | 
         | https://www.phoronix.com/scan.php?page=news_item&px=Intel-5....
        
         | [deleted]
        
       | Certified wrote:
       | Last time this came up on Hacker News I discovered SolidWorks
       | 2021 was using an older MKL library that supports the
        | MKL_DEBUG_CPU_TYPE=5 environment variable. I'm on an AMD CPU and
        | measured a small SolidWorks FPS and rebuild-time improvement with
        | the flag enabled.
        
         | eatonphil wrote:
         | The first comment seems to suggest that flag no longer works.
         | 
         | https://www.agner.org/forum/viewtopic.php?t=6#p82
        
           | Certified wrote:
            | Multiple versions of the MKL DLLs exist in the install
            | directory of Solidworks 2021. Indeed, the DLLs supporting
            | FloXpress and Simulation seem to be the updated MKL version
            | that no longer supports the flag. However, the main
            | executable only seems to call sldmkl_parts.dll, which appears
            | to be MKL version 2018.1.156, which does support the flag.
        
           | bee_rider wrote:
           | It would depend on the version of MKL. If Solidworks has
           | (just for example) statically linked to or bundled in an old
            | version of MKL, then it should still work there.
        
       | dang wrote:
       | Related:
       | 
       |  _Intel 's "cripple AMD" function (2019)_ -
       | https://news.ycombinator.com/item?id=24307596 - Aug 2020 (104
       | comments)
       | 
       |  _Intel 's "Cripple AMD" Function_ -
       | https://news.ycombinator.com/item?id=21709884 - Dec 2019 (10
       | comments)
       | 
       |  _Intel 's "cripple AMD" function (2009)_ -
       | https://news.ycombinator.com/item?id=7091064 - Jan 2014 (124
       | comments)
       | 
       |  _Intel 's "cripple AMD" function_ -
       | https://news.ycombinator.com/item?id=1028795 - Jan 2010 (80
       | comments)
        
       | midjji wrote:
        | So blacklist the Intel compiler in favor of GCC and Clang; seems
        | entirely reasonable!
        
       | phkahler wrote:
       | Worth noting that Intel has dropped their "old" compiler and the
       | newer "Intel" compilers are LLVM based. IMHO they will likely be
       | pulling similar anti-AMD tricks with it and they are keeping
        | their paid version closed source - which is allowed by LLVM's
       | license.
       | 
       | RMS was right that compilers should be GPL licensed to prevent
        | exactly this kind of thing (and worse things which haven't
        | happened yet).
       | 
        | On another compiler-related note, I find it insane that GCC had
        | not turned on vectorization at -O2 for x86-64 targets. The
        | baseline for that arch has SSE2, so vectorization has always made
        | sense there. The upcoming GCC 12 will have it enabled at -O2. I'd
        | bet the Intel compiler always did vectorization at -O2 for their
        | 64-bit builds.
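        | 
        | To make that concrete (a generic example, not from any
        | particular codebase): a loop like this is the classic
        | auto-vectorization candidate. On older GCC it only gets
        | vectorized at -O3 (or -O2 -ftree-vectorize); GCC 12 turns the
        | vectorizer on at plain -O2.
        | 
        |   /* Compiles to packed SSE/AVX instructions once the
        |      vectorizer is enabled. */
        |   void saxpy(float *restrict y, const float *restrict x,
        |              float a, int n) {
        |       for (int i = 0; i < n; i++)
        |           y[i] += a * x[i];
        |   }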
        
         | pcwalton wrote:
         | > RMS was right that compilers should be GPL licensed to
          | prevent exactly this kind of thing (and worse things which
          | haven't happened yet).
         | 
         | The problem with this is that it wouldn't solve the problem in
         | question: Intel would just have stuck with their old compiler
         | backend instead of LLVM.
         | 
         | Besides, LLVM wouldn't have gotten investment to begin with if
         | it were GPL licensed, since the entire reason for Apple's
         | investment in LLVM is that it wasn't GPL. Ultimately, LLVM
         | itself is a counterexample to RMS's theory that keeping
         | compilers GPL can force organizations to do things: given deep
         | enough pockets, a company can overcome that by developing non-
         | GPL competitors.
        
           | iib wrote:
            | From what I saw of him, he never said that it is impossible
            | to build non-GPL compilers; he said that the work free
            | software developers do should not "help" proprietary
            | software.
           | 
           | So yes, he basically said that if you want to develop a
           | proprietary compiler, it should cost you, and not take GCC as
           | a base to freeload. Intel basing their new compilers on LLVM
           | clearly saved them effort.
        
           | bayindirh wrote:
            | While clang is getting this support because it's not GPL,
            | it's also providing well-deserved competition for GCC, and
            | clang's presence has pushed the GCC devs to build a better
            | compiler.
           | 
           | All in all I avoid non-GPL compilers for my code, but I'm
           | happy that clang acted as a big (hard) foam cluebat for GCC.
           | 
           | In my opinion, we need a well polished GNU/GPL toolchain both
           | to show it's possible, and provide a good benchmark to
           | compete with. This competition is what drives us forward.
        
           | Beltalowda wrote:
           | > Besides, LLVM wouldn't have gotten investment to begin with
           | if it were GPL licensed. The entire reason for Apple's
           | investment in LLVM in the first place is that it wasn't GPL.
           | 
           | I don't think that's the case; Apple/LLVM actually offered to
           | sign over the copyright to the FSF, under the GPL; from
           | https://gcc.gnu.org/legacy-ml/gcc/2005-11/msg00888.html
           | 
           | > The patch I'm working on is GPL licensed and copyright will
           | be assigned to the FSF under the standard Apple copyright
           | assignment. Initially, I intend to link the LLVM libraries in
           | from the existing LLVM distribution, mainly to simplify my
           | work. This code is licensed under a BSD-like license [8], and
           | LLVM itself will not initially be assigned to the FSF. If
           | people are seriously in favor of LLVM being a long-term part
           | of GCC, I personally believe that the LLVM community would
           | agree to assign the copyright of LLVM itself to the FSF and
           | we can work through these details.
           | 
            | The reason people worked on LLVM/clang is that GCC was (and
            | to some degree, is) not very good in various areas, and had
            | a difficult community that made fixing those issues hard.
            | There's a reason a lot of the newer languages like Swift,
            | Rust, and Zig are based on LLVM and not GCC. See e.g.
            | https://undeadly.org/cgi?action=article&sid=20070915195203#p...
            | for a run-down (from 2007; I'm not sure how many of these
            | issues persist today - GCC has not stood still either, of
            | course; error messages are much better than they were in
            | 2007, for example).
           | 
           | GPL3 changed things a bit; I'm not sure Lattner would have
           | made the same offer with GPL3 around, but that was from 2005
           | when GPL3 didn't exist yet. But the idea that LLVM was
            |  _primarily_ motivated by license issues doesn't seem to be
           | the case, although it was probably seen as an additional
           | benefit.
        
         | mistrial9 wrote:
         | > x68-64 targets
         | 
          | That's a typo... Are you describing a case of AVX instructions
          | not being generated by GCC? Where are the details here? Isn't
          | SSE2 from twenty years ago?
         | 
         | https://en.wikipedia.org/wiki/SSE2
        
       | Kon-Peki wrote:
       | This has been discussed on HN before.
       | 
        | I don't condone Intel's behavior, but let's be honest here: AMD
       | underinvests in software and expects others to pick up the slack.
       | That isn't acceptable.
        
         | hpcjoe wrote:
          | AMD[1] and NVidia[2] do "make" their own compilers. AMD is
          | notorious for a "build it and they will come" mentality,
          | despite the fact that this hasn't worked. AMD needs to make it
          | easy to adopt their hardware, and the way this is done is with
          | software.
          | 
          | When they finally get to the point that their drivers/libs are
          | as easy to install as Nvidia's, it might be too late. I've
          | argued this with AMD folks before.
         | 
         | The barriers to adoption need to be low. Friction needs to be
         | low. They need to target ubiquity[3].
         | 
         | [1] https://developer.amd.com/amd-aocc/
         | 
         | [2] https://developer.nvidia.com/nvidia-hpc-sdk-downloads
         | 
         | [3] https://blog.scalability.org/2008/02/target-ubiquity-a-
         | busin...
        
           | Cloudef wrote:
            | AMD is the best on Linux right now, but that's mostly thanks
            | to them opening up their hardware to driver developers.
        
             | hpcjoe wrote:
             | I was getting better performance out of the NVidia HPC SDK
             | compilers, but then again, the old PGI compilers it is
              | based upon (now with an LLVM backend) have always been my
              | go-to for higher-performance code.
             | 
             | I've got some Epycs and Zen2s at home here, and I have both
             | compilers. Haven't done testing in recent months, but
             | they've been updating them, so maybe I should look into
             | that again. Thanks for the nudge!
        
           | ReleaseCandidat wrote:
           | > NVidia[2] do "make" their own compilers.
           | 
            | Actually, Nvidia bought the Portland Group compilers. And
            | Intel's Fortran compiler is (or rather was; its backend is
            | now LLVM) MS's compiler, via DEC, Compaq, and HP: MS Visual
            | Fortran 4 -> DEC Visual Fortran 5 -> Compaq Visual Fortran 6
            | -> Intel Visual Fortran ;).
        
             | hpcjoe wrote:
             | I know about the PGI purchase ... was unaware of the Intel
             | link to MSFT. Huh.
        
           | salawat wrote:
           | You call Nvidia driver installation easy? Every bit of "ease"
           | about that is hardly Nvidia's doing.
        
             | hpcjoe wrote:
              | I'm not sure what issue you have with my statement. For
             | me, it is a painless download + sh NVIDIA-....run. I have
             | mostly newer GPUs, though the 3 systems (1 laptop and 2
             | desktops) with older GTX 750ti and GT 560m run the nouveau
             | driver (as Nvidia dropped support for those).
             | 
              | It's a 13-year-old laptop, and still running strong (Linux,
              | though). Desktops are Sandy Bridge based. The RTX2060 and
             | RTX3060 are doing fine with the current drivers. I usually
             | only update when CUDA changes.
             | 
              | But yeah, it's pretty simple. I can't speak to non-Linux
             | OSes generally, though my experiences with windows driver
             | updates have always been fraught with danger.
             | 
              | My Zen 2 laptop has an inbuilt Renoir iGPU, and I use it
              | with the NVidia dGPU that's also built in (GTX 1660 Ti). I
              | leverage the Linux Mint OS's packaging system for the GPU
             | switcher. I run the AMD on the laptop panel and the NVidia
             | on the external display. Outside of weirdness with kernel
             | 5.13, I've not had any problems with this setup.
        
         | MereInterest wrote:
          | There are a variety of options available here, and I
         | don't buy the argument that AMD's behavior is automatically
         | unethical.
         | 
         | A. Company makes and sells hardware, and offers no software.
         | 
         | B. Company makes and sells uniquely featured hardware, and
         | offers software that uses those unique features.
         | 
         | C. Company makes and sells hardware that adheres to an industry
         | standard, and offers software that targets hardware adhering to
         | that standard.
         | 
         | D. Company makes and sells hardware that adheres to an industry
         | standard, then uses their position in related markets to give
         | themselves an unfair advantage in the hardware market.
         | 
         | Of these, options A, B, and C are all acceptable options. AMD
         | has traditionally chosen option A, which is a perfectly
         | reasonable option. There's no reason that a company is
         | obligated to participate in a complementary market. Option D is
         | the only clearly unethical option.
        
           | ReleaseCandidat wrote:
           | > AMD has traditionally chosen option A, which is a perfectly
           | reasonable option
           | 
           | AMD has optimized libraries https://developer.amd.com/amd-
           | aocl/ and their own compilers: https://developer.amd.com/amd-
           | aocc/
        
           | ncmncm wrote:
           | Intel's legitimate course is to make their CPUs run _actually
           | faster_ than the competition, instead of tricking people into
           | running slower code on the competition.
        
         | bri3d wrote:
         | I think it's more nuanced than that:
         | 
         | In the past, AMD just straight up had horrible software.
         | 
         | More recently, AMD have been investing more in open software,
          | probably with the goal that, indeed, a community forms and they
          | get "leverage"/ROI on their investment.
         | 
         | On the flip side, Intel invest heavily in high-quality but
         | jealously guarded and closed source software.
         | 
         | With this nuance, I'm not so sure it's clear cut which one is
         | "acceptable," and it's an interesting ethical question about
         | Open Source and open-ness in general.
        
           | midjji wrote:
            | AMD still has horrible software; compare CUDA to whatever
            | crap AMD thinks you should use. The truth is it's even hard
            | to say what their alternative is, not to mention how poorly
            | they support what is, or at least should be, their second-
            | most-important if not most lucrative target.
        
         | post_break wrote:
          | And Intel sandbagged us with 4 CPU cores for ages, leading to
          | software that isn't optimized for more cores. Suddenly AMD
          | starts pushing many cores with high single-core performance,
          | and Intel magically turns hyperthreading on for lower-tier
          | CPUs and starts putting out way more cores.
        
         | CalChris wrote:
         | AMD pays substantial royalties to Intel for x86.
         | 
         | https://jolt.law.harvard.edu/digest/intel-and-the-x86-archit...
         | 
         | However, this will become moot as even Intel is shifting
         | towards LLVM.
         | 
         | https://www.intel.com/content/www/us/en/developer/articles/t...
        
         | stavros wrote:
         | What should they have done instead? Built a compiler with a
         | "cripple Intel" function? So people would have to download the
         | executable that's fastest on their CPU, even though they use
         | the same instruction set?
         | 
         | The issue here is that they used a slower code path even on
         | CPUs that could run the faster one, just because they were made
         | by a competitor.
         | 
         | You say "AMD should have made their own compiler", but why?
         | What else should they have made? An OS? An office suite? Why?
        
           | benbenolson wrote:
           | Very likely, this was not done intentionally.
           | 
           | I think we can simply imagine a common scenario: some
           | employee working for Company X, developing a compiler suite,
           | and adding necessary optimizations for Company X's
           | processors. Meanwhile, Company Y's processors don't get as
           | much focus (perhaps due to the employee not knowing about
           | Company Y's CPUIDs, supported optimizations for different
           | models, etc.). Thus, Company Y's processors don't run as
           | quickly with this particular library.
           | 
           | Why does this have to be malicious intent? Surely it's not
           | surprising to you that Company X's software executes quicker
           | on Company X's processors: I should hope that it does! The
           | same would hold true if Company Y were to develop a compiler;
           | unique features of their processors (and perhaps not Company
           | X's) should be used to their fullest extent.
        
             | magila wrote:
             | No, this was definitely intentional. Intel is doing extra
             | work to gate features on the manufacturer ID when there are
             | feature bits which exist specifically to signal support for
             | those features (and these bits were defined by Intel
             | themselves!).
             | 
             | If they had fixed the issue shortly after it was publicly
             | disclosed it might have been unintentional, but this issue
             | has been notorious for over a decade and they still refuse
             | to remove the unnecessary checks. They know what they're
             | doing.
        
             | DiabloD3 wrote:
             | Thats not how these CPUs work.
             | 
             | The CPUID instruction allows software to query the CPU on
             | if an instruction set is supported. Code emitted by Intel's
             | compiler would only query if the instruction set exists if
             | the CPU is from Intel, instead of just always detecting.
             | 
             | AMD can choose to to implement (or not) any instruction set
             | that Intel specifies, and Intel can choose to implement (or
             | not) any instruction set AMD specifies, however, it would
             | in 100% of cases be wrong to check who made the CPU instead
             | of checking the implemented instruction set. AMD implements
             | MMX, SSE1-4, AVX1 and 2. Any software compatible with these
             | _must_ work on AMD CPUs that also implement these
             | instructions.
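              | 
              | To make that concrete (a minimal, hypothetical sketch, not
              | Intel's actual dispatch code): with GCC/Clang's <cpuid.h>
              | you can read both the vendor string and the feature bits.
              | Keying the fast path on the former instead of the latter
              | is exactly the pattern being criticized.
              | 
              |   #include <cpuid.h>
              |   #include <stdio.h>
              |   #include <string.h>
              | 
              |   int main(void) {
              |       unsigned a, b, c, d;
              |       char vendor[13] = {0};
              | 
              |       /* Leaf 0: vendor string, stored EBX:EDX:ECX. */
              |       __get_cpuid(0, &a, &b, &c, &d);
              |       memcpy(vendor + 0, &b, 4);
              |       memcpy(vendor + 4, &d, 4);
              |       memcpy(vendor + 8, &c, 4);
              | 
              |       /* Leaf 1: feature bits - what dispatch should
              |          actually key on. */
              |       __get_cpuid(1, &a, &b, &c, &d);
              |       int has_avx = (c & bit_AVX) != 0;
              | 
              |       printf("vendor=%s avx=%d\n", vendor, has_avx);
              |       return 0;
              |   }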
             | 
             | If AMD ever chooses to sue Intel over this (likely as a
             | Sherman Act violation, same as the 2005 case), a court
             | would likely side with AMD due to the aforementioned
             | previous case: Intel has an established history of
             | violating the law to further its own business interests.
        
               | mtklein wrote:
               | I'm with you generally, but having written some code
               | targeting these instructions from a disinterested third-
                | party perspective, there are big enough differences in
                | performance or even behavior for some instructions that
                | they can sincerely drive you to inspect the particular
                | CPU model and not just the cpuid bits offered.
               | 
               | Off the top of my head, SSSE3 has a very flexible
               | instruction to permute the 16 bytes of one xmm register
               | at byte granularity using each byte of another xmm
               | register to control the permutation. On many chips this
               | is extremely cheap (eg 1 cycle) and its flexibility
               | suggests certain algorithms that completely tank
               | performance on other machines, eg old mobile x86 chips
               | where it runs in microcode and takes dozens or maybe even
               | hundreds of cycles to retire. There the best solution is
               | to use a sequence of instructions instead of that single
               | permute instruction, often only two or three depending on
               | what you're up to. And you could certainly just use that
               | replacement sequence everywhere, but if you want the best
               | performance _everywhere_, you need to not only look for
               | that SSSE3 bit but also somehow decide if that permute is
               | fast so you can use it when it is.
               | 
               | Much more seriously, Intel and AMD's instructions
               | sometimes behave differently, within specification. The
               | approximate reciprocal and reciprocal square root
               | instructions are specified loosely enough that they can
               | deliver significantly different results, to the point
               | where an algorithm tuned on Intel to function perfectly
               | might have some intermediate value from one of these
               | approximate instructions end up with a slightly different
               | value on AMD, and before you know it you end up with a
               | number slightly less than zero where you expect zero, a
               | NaN, square root of a negative number, etc. And this sort
               | of slight variation can easily lead to a user-visible
               | bug, a crash, or even an exploitable bug, like a buffer
               | under/overflow. Even exhaustively tested code can fail if
               | it runs on a chip that's not what you exhaustively tested
               | on. Again, you might just decide to not use these
               | loosely-specified instructions (which I entirely support)
               | but if you're shooting for the absolute maximum
               | performance, you'll find yourself tuning the constants of
               | your algorithms up or down a few ulps depending on the
               | particular CPU manufacturer or model.
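                | 
                | One common mitigation (a generic sketch, not a fix from
                | any particular library) is to refine the hardware
                | estimate with a Newton-Raphson step, so the final value
                | depends far less on which vendor's approximation you
                | happened to get:
                | 
                |   #include <immintrin.h>
                | 
                |   /* rcpps only guarantees ~12 bits of precision and
                |      its exact output is vendor-dependent; one
                |      refinement step, r' = r*(2 - x*r), tightens it
                |      to near full single precision. */
                |   static inline __m128 recip_refined(__m128 x) {
                |       __m128 r = _mm_rcp_ps(x);
                |       __m128 two = _mm_set1_ps(2.0f);
                |       return _mm_mul_ps(r,
                |           _mm_sub_ps(two, _mm_mul_ps(x, r)));
                |   }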
               | 
               | I've even discovered problems when using the high-level C
               | intrinsics that correspond to these instructions across
               | CPUs from the same manufacturer (Intel). AVX512 provided
               | new versions of these approximations with increased
               | precision, the instruction variants with a "14" in their
               | mnemonic. If using intrinsics, instruction selection is
               | up to your compiler, and you might find compiling a piece
               | of code targeting AVX2 picks the old low precision
               | version, while the compiler helpfully picks the new
               | increased-precision instructions when targeting AVX-512.
               | This leads to the same sorts of problems described in the
               | previous paragraph.
               | 
               | I really wish you could just read cpuid, and for the most
               | part you're right that it's the best practice, but for
               | absolutely maximum performance from this sort of code,
               | sometimes you need more information, both for speed and
               | safety. I know this was long-winded, and again, I
               | entirely understand your argument and almost totally
               | agree, but it's not 100%, more like 100-epsilon%, where
               | that epsilon itself is sadly manufacturer-dependent.
               | 
               | (I have never worked for Intel or AMD. I have been both
               | delighted and disappointed by chips from both of them.)
        
             | djmips wrote:
             | I don't think you read the article. Go read it first before
             | you make your hypothesis. If it was as easy to fix as using
             | a environment variable (which no longer works) then it was
             | done intentionally.
        
               | bally0241 wrote:
                | I don't think the fact that it can be enabled/disabled by
                | an environment variable indicates malicious intent. It
                | could be as simple as Intel not caring to test their
                | compiler optimizations on competitors' CPUs. If I had to
                | distribute two types of binaries (one which is optimized
                | but could break, vs. un-optimized and unlikely to break),
                | I would default to distributing the un-optimized version.
                | Slow is better than broken.
               | 
                | I understand some end users may not be able to re-compile
                | the application for their machines, but I wouldn't say
                | it's Intel's fault, but rather the fault of the
                | distributors of that particular application. For example,
                | if AMD users want Solidworks to run faster on their
                | system, they should ask Dassault Systemes for
                | AMD-optimized binaries, not the upstream compiler
                | developers!
               | 
               | Anyways, for those compiling their own code, why would
               | anyone expect an Intel compiler to produce equally
               | optimized code for an AMD cpu? Just use gcc/clang or
               | whatever AMD recommends.
        
               | brasic wrote:
               | https://news.ycombinator.com/newsguidelines.html
               | 
               | > Please don't comment on whether someone read an
               | article.
        
             | colejohnson66 wrote:
             | The thing is: the bits to check for SSE, SSE2, ..., AVX,
             | AVX2, AVX-512? They're in the same spot on Intel and AMD
             | CPUs. So you don't need to switch based on manufacturer.
             | The fact that they force a `GenuineIntel` check makes it
              | seem malicious to many.
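              | 
              | For comparison, a minimal sketch of vendor-neutral dispatch
              | using GCC/Clang's __builtin_cpu_supports(), which reads
              | exactly those shared feature bits (pick_kernel and its
              | return codes are just illustrative):
              | 
              |   /* Chooses a kernel purely from CPUID feature bits;
              |      no vendor string involved. */
              |   int pick_kernel(void) {
              |       if (__builtin_cpu_supports("avx2")) return 2;
              |       if (__builtin_cpu_supports("sse2")) return 1;
              |       return 0; /* scalar fallback */
              |   }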
        
               | vlovich123 wrote:
               | All browsers pretend to be MSIE (and all compilers
               | pretend to be GCC). You'd think AMD would make it trivial
               | to change the vendor ID string to GenuineIntel for
               | "compatibility".
        
           | not2b wrote:
           | AMD should concentrate on making LLVM and GCC work great on
           | AMD processors, by contributing the needed code. They are
           | already making some contributions but could be doing more,
           | and they could be funding experts to work on that and giving
           | those experts the information they need.
        
             | ReleaseCandidat wrote:
              | They do. Actually, their own (LLVM-based) compilers are
              | about as fast as GCC and LLVM.
             | 
             | https://www.phoronix.com/scan.php?page=article&item=aocc32-
             | c...
        
               | Jap2-0 wrote:
               | I don't know if it necessarily says much that their LLVM-
               | based compiler is about as fast as LLVM.
        
             | DiabloD3 wrote:
             | But they already do this. AMD is one of the largest
             | corporate contributors to LLVM and GCC.
             | 
             | It's Intel that tends to phone this in and make everyone
             | else pick up the slack.
        
               | jcranmer wrote:
               | Per
               | https://www.phoronix.com/scan.php?page=news_item&px=LLVM-
               | Rec..., Intel actually contributes (slightly) more to
               | LLVM than AMD does.
        
             | pedrocr wrote:
             | To fix this problem AMD would have to work on making LLVM
             | and GCC work great on _Intel_ processors. That would be the
             | only way to make people not use the Intel compiler for
             | extra performance and ending up with binaries that are
            | crippled for AMD. Clearly that's not a solution for this
             | problem.
        
           | mhh__ wrote:
           | AMD's software offerings (e.g. look at uProf vs vTune) are
           | functional at best. Intel's are much easier to use, have a
           | lot more documentation, and actually make your life easier
           | versus having basically just a firehose of data.
        
         | amelius wrote:
         | I think it's great if a hardware company leaves the software
         | for others. This leads to open specifications.
        
           | ethbr0 wrote:
            | At the firmware / driver level, fully open specifications for
            | high-performance hardware are an impossible dream.
           | 
           | At best, detailed documentation is a lower priority item
           | below "make it work" and "increase performance".
           | 
           | At worst, it requires exposing trade secrets.
           | 
           |  _Edit_ : It'd probably be more productive for everyone if we
           | set incentives and work such that the goal we want (compilers
           | that produce code that runs optimally on Intel, AMD, and
           | other architectures) isn't contingent on Intel writing them
           | for non-Intel architectures. (Said somewhat curmudgeonly,
           | because everyone complains about things like this, but also
            | doesn't really appreciate how insanely hard and frustratingly
            | edge-case-ridden compiler work is.)
        
           | sedatk wrote:
           | No, just don't falsely market your product as fair or
           | neutral.
        
             | dodobirdlord wrote:
              | It's the Intel MKL; I don't think Intel has ever even
              | endorsed using it on other vendors' CPUs, much less claimed
             | that it is "fair" or "neutral".
        
               | ReleaseCandidat wrote:
                | Well:
                | 
                |   On November 12, 2009 AMD and Intel Corporation
                |   announced a comprehensive settlement agreement to end
                |   all outstanding legal disputes between the companies,
                |   including antitrust and patent cross license disputes.
                |   In addition to a payment of $1.25B that Intel made to
                |   AMD, Intel agreed to abide by an important set of
                |   ground rules that continue in effect until November
                |   11, 2019.
                | 
                |   Customers and Partners
                | 
                |   With respect to customers and partners, Intel must not:
                |   [...]
                |   * Intentionally include design/engineering elements in
                |     its products that artificially impair the performance
                |     of any AMD microprocessor.
               | 
               | https://www.amd.com/en/corporate/antitrust-ruling
               | 
               | I like that 'in effect until November 11, 2019.' part :D
        
           | mhh__ wrote:
            | If Intel did that, there probably wouldn't be a software
            | suite at all for their processors.
            | 
            | Compared to vTune, just about all open-source profilers are
            | either a bad joke or like programming in BASIC in a C++ age.
        
         | IntelThrowaway1 wrote:
         | The thing that gets me about Intel's culture, as someone who
         | worked there, was that Intel as an organisation was completely
         | unable to actually accept they'd done anything wrong. Ever.
         | 
          | There are lots of cases where Intel has either screwed up or
          | done things that were unarguably anti-competitive. It happens
          | at every company; I don't like Uber, but I'm not going to blame
          | Uber today for the fuckery that Kalanick got up to.
          | 
          | In each case you could ask Intel HR, or Intel senior
          | management, what they thought about it, and it was never
          | Intel's fault. The answers to any questions about this sort of
          | stuff would be full of pettifogging, passive voice, and
          | legalese. The result was that the internal culture was an
          | extremely low-trust environment, since you _knew_ people were
          | willing to be transparently intellectually dishonest to further
          | their careers. I haven't been there since Gelsinger arrived,
          | but I hope that changes; I wonder how much it can change in the
          | legal environment we're in.
        
           | kmeisthax wrote:
           | I don't think this is dishonesty - it's auteur mentality. In
           | Intel's view, AMD was a second-source vendor that went rogue,
            | and gets to free-ride on their patents because Intel couldn't
           | be arsed to extend x86 to 64-bit. If they had their way,
           | they'd own the x86 ISA interface and all their competition
           | would be incompatible architectures that you have to
           | recompile for. Crippling AMD processors with their C compiler
            | wasn't dishonest; it was DRM to protect their """intellectual
           | property"""[0].
           | 
           | Gelsinger was the head designer on the 486, so he was around
           | during the time when Intel was obsessed with keeping
           | competition out of their ISA and probably has a case of
           | auteur mentality, too.
           | 
           | [0] In case you couldn't tell, I _really hate_ this word. The
           | underlying concepts are, at best, necessary evils.
        
       | holdenk wrote:
        | Huh, I had wondered why I saw so many Python packages blacklist
        | MKL; now I know why.
        
         | dbcurtis wrote:
         | The philosophy behind MKL is that each CPU vendor provides an
         | MKL for their CPU. If you expect to mix and match MKLs and
         | CPUs, you don't understand the goals of MKL.
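          | 
          | That model works because the BLAS interface itself is
          | vendor-neutral: the same CBLAS call links against MKL, AMD's
          | AOCL/BLIS, or OpenBLAS, and whichever library you link supplies
          | the CPU-specific tuning. A minimal sketch (plain CBLAS, nothing
          | MKL-specific; small_gemm is just an illustrative name):
          | 
          |   #include <cblas.h>
          | 
          |   void small_gemm(void) {
          |       double A[2*3] = {1, 2, 3, 4, 5, 6};
          |       double B[3*2] = {7, 8, 9, 10, 11, 12};
          |       double C[2*2] = {0};
          |       /* C = 1.0*A*B + 0.0*C, row-major 2x3 * 3x2 */
          |       cblas_dgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans,
          |                   2, 2, 3, 1.0, A, 3, B, 2, 0.0, C, 2);
          |   }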
        
           | monocasa wrote:
           | Are there any implementations of MKL other than Intel's?
        
             | ReleaseCandidat wrote:
              | No. There are AMD's AOCL and Apple's 'Accelerate', but they
              | only cover subsets of the MKL, AFAIK.
             | 
             | https://developer.amd.com/amd-aocl/
             | https://developer.apple.com/documentation/accelerate
        
               | stephencanon wrote:
               | Accelerate and MKL have some overlap (notably BLAS,
               | LAPACK, signal processing libraries and basic vectorized
               | math operations), but each also contains a whole bunch of
               | API that the other lacks. Neither is a subset of the
               | other.
               | 
               | They both contain a sparse matrix library, but exactly
               | what operations are offered is somewhat different between
               | the two. They both have image processing operations, but
               | fairly different ones. Accelerate has BNNS, MKL has its
               | own set of deep learning interfaces...
        
           | wyldfire wrote:
           | Each CPU vendor or each CPU architecture? (genuinely asking,
           | I don't know how it's intended)
        
             | wmf wrote:
             | Each vendor. Intel BLAS (MKL) has Intel-specific
             | optimizations and AMD BLAS has AMD-specific optimizations.
             | 
             | Intel is still acting in bad faith by allowing MKL to run
             | in crippled mode on AMD. They should either let it use all
             | available instructions or make it refuse to run.
        
               | danieldk wrote:
               | The latest oneMKL versions have sgemm/dgemm kernels for
               | Zen CPUs that are almost as fast as the AVX2 kernels
               | (that require disabling Intel CPU detection on Zen).
        
             | bee_rider wrote:
             | The expectation in the HPC community is that an interested
             | vendor will provide their own BLAS/LAPACK implementation
             | (MKL is a BLAS/LAPACK implementation, along with a bunch of
             | other stuff), which is well-tuned for their hardware. These
             | sort of libraries aren't just tuned for an architecture,
             | they might be tuned for a given generation or even
             | particular SKUs.
        
               | hallway_monitor wrote:
               | I learned about this recently when trying to optimize ML
               | test architecture running on Azure. It turns out having
               | access to Ice Lake chips would allow optimizations that
               | should decrease compute time and therefore cost by
               | 20-30%.
        
               | bee_rider wrote:
               | Some AVX-512 stuff I guess?
               | 
               | AVX-512 had a rough rollout, but it seems like it is
               | finally turning into something nice.
        
           | ReleaseCandidat wrote:
           | That would be 'each CPU vendor provides an optimized BLAS
           | library for their CPU'. The problem is that Intel's MKL is
           | more than just BLAS.
           | 
           | But AMD does have its own optimized libraries:
           | 
           | https://developer.amd.com/amd-aocl/
        
       | marginalia_nu wrote:
       | I'm getting flashbacks to the AARD code and Microsoft's attempts
       | to sabotage DR-DOS.
        
       ___________________________________________________________________
       (page generated 2022-04-06 23:01 UTC)