[HN Gopher] Intel Distribution for Python
___________________________________________________________________
Intel Distribution for Python
Author : EntICOnc
Score : 90 points
Date : 2021-07-21 06:29 UTC (16 hours ago)
(HTM) web link (software.intel.com)
(TXT) w3m dump (software.intel.com)
| RocketSyntax wrote:
| Is there a pip package?
| mushufasa wrote:
| There is a longstanding issue around MKL and OpenBLAS
| optimization flags making Intel systems artificially faster than
| AMD ones for numpy computations.
| https://stackoverflow.com/questions/62783262/why-is-numpy-wi...
|
| If there are true optimizations to be had, wonderful. But those
| should be added to the core binaries on PyPI / conda. I am
| worried that Intel may again be trying to artificially segment
| their math-library optimization work for business rather than
| technical reasons.
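|
| If you want to check which BLAS your own numpy build is linked
| against, numpy ships a helper for that:
|
|     import numpy as np
|     np.show_config()  # prints the BLAS/LAPACK (MKL, OpenBLAS, ...)
|                       # that numpy was built against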
| jxy wrote:
| That SO performance benchmark would be so much more useful if
| the OP had also run OpenBLAS on the Xeon.
| mistrial9 wrote:
| What, no Debian/Ubuntu? _sigh_
| zvr wrote:
| Of course:
|
|     echo "deb https://apt.repos.intel.com/oneapi all main" | sudo tee /etc/apt/sources.list.d/oneAPI.list
|
| You can read the "apt" section of the package manager
| documentation, if that's what you prefer: https://software.intel.com/content/www/us/en/develop/documen...
| dsign wrote:
| Thanks for bringing up that link; I'd had that nagging question
| about how specific Intel's performance libraries are to Intel
| hardware. At least in this case, it seems: not much.
| gnufx wrote:
| At least single-threaded "large" OpenBLAS GEMM has always been
| similar to MKL once OpenBLAS has the micro-architecture covered.
| If there's some problem with the threaded version (which one?),
| has it been reported, as it would be for use in Julia? Anyway,
| on AMD, why wouldn't you use AMD's BLAS (just a version of
| BLIS)? That tends to do well multi-threaded, though I'm normally
| only interested in single-threaded performance. I don't
| understand why people are so obsessed with MKL, especially when
| they don't measure and understand the measurements.
| vitorsr wrote:
| > those should be added to the core binaries on PyPI / conda
|
| They have.
|
| PyPI:
|
| https://pypi.org/user/Intel-Python
|
| https://pypi.org/user/IntelAutomationEngineering
|
| https://pypi.org/user/sat-bot
|
| Anaconda:
|
| https://anaconda.org/intel
| mhh__ wrote:
| Do AMD even have optimized packages available? Don't get me
| wrong, I'm not a huge fan of what Intel get up to, but AMD's
| profiling software is dreadful, so I'm not exactly surprised
| that Intel don't even entertain the option.
| thunkshift1 wrote:
| What do you mean by 'artificially faster'?
| jchw wrote:
| Intel libraries whitelist their own CPUs for certain extension
| instruction sets, instead of checking the relevant CPUID feature
| flag for the feature, as their own documentation tells you to.
| jeffbee wrote:
| CPUID is insufficient. CPUID can tell you that a CPU has a
| working PDEP/PEXT, but it can't tell you that a CPU's PDEP
| _sucks_ like the one on all AMD processors prior to Zen3.
| sitkack wrote:
| The real answer is to do feature probing and benchmark the
| underlying implementation. In the cloud you never really know
| the hardware backing your instance.
| jchw wrote:
| This argument crops up every time, but it's irrelevant: MKL
| does, and always has, worked absolutely fine on AMD processors
| with the checks disabled. And no, reproducibility is not a
| feature of MKL that is enabled by default, and it never was.
| Intel even had to add a disclaimer that MKL doesn't work
| properly on non-Intel processors after legal threats, and they
| still ran with that for literally years despite knowing it
| could just be fixed.
|
| When this first cropped up, I was using _Digg_.
|
| Edit: removed the note that they fixed the cripple-AMD function;
| they didn't - they actually just removed the workaround that
| made it easier to disable the checks; I was misinformed.
| Apparently some software now does runtime patching to fix it,
| including Matlab...
| jeffbee wrote:
| Yeah, I don't think all the hacks are out yet. But my point is
| only that the availability of some feature is not the only input
| to the decision to use that feature at runtime. Some of these
| conditions may look suspiciously like shorthand for
| IsGenuineIntel() even if they are legit, like blacklisting BMI2
| on AMD, because BMI2 on AMD was useless over most of its
| history.
| gnufx wrote:
| Recent MKL will generate reasonable code for Zen if you set a
| magic environment variable, but coverage was very limited
| (possibly only sgemm and/or dgemm when I looked). Once you've
| generated AVX2 with a reasonable block size, you're most of the
| way there. But why not just use a free BLAS that has been
| specifically tuned for your AMD CPU (and probably your Intel
| one too)?
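|
| (For reference, the widely reported variable was
| MKL_DEBUG_CPU_TYPE. On the MKL versions that honored it, it had
| to be set before MKL was loaded - e.g. from Python:
|
|     import os
|     os.environ["MKL_DEBUG_CPU_TYPE"] = "5"  # request the AVX2 path
|     import numpy as np  # import numpy/MKL only after setting it
|
| )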
| user5994461 wrote:
| Nope, they removed support for the magic environment
| variable in the latest MKL release.
| pletnes wrote:
| From a practical perspective you have to use _some_ BLAS
| library. If there is a working alternative from AMD, it would be
| great if you shared it. They did have one in the past, although
| I don't recall its name.
| rshm wrote:
| Looks like a recompilation. I'm guessing the gains are in numpy
| and scipy. For a Python-heavy code base, I doubt it can be more
| performant than PyPy.
| bananaquant wrote:
| Quite unsurprisingly, this distribution has no support for ARM:
| https://software.intel.com/content/www/us/en/develop/article...
|
| I once was excited about Intel releasing their own Linux distro
| (Clear Linux), but it has the same problem. It looks like Intel
| is trying to make custom optimized versions of popular open-
| source projects just to get people to use their CPUs, as they
| lose their leadership in hardware.
| smoldesu wrote:
| "Their" CPUs meaning x86 platforms, in this case.
|
| Plus, who's surprised? This is how Intel _makes money_. The
| consumer segment is a plaything for them; the real high-rollers
| are in the server segment, where Intel butters them up with
| fancy technology and the finest digital linens. Is it dumb? A
| little, but it's hardly a "problem" unless you intended to ship
| this software on first-party hardware, which, hint-hint, the
| license forbids in the first place.
|
| At the end of the day, this doesn't really irk me. I can buy a
| compatible processor for less than $50, that's accessible
| enough.
| mistrial9 wrote:
| The capital model for cost recovery and earnings is one thing,
| but these days the amount of money that flows through Intel Inc.
| is not the same thing. Intel played dirty for long years to
| crush competitors, not to "make money" like they need it.
| "Greed is good" - remember that? So, no. Apologists can count
| their quarterly dividends, but they have no platform for social
| advocacy here, IMO.
| stonemetal12 wrote:
| No, their CPUs as in the ones from Intel. Intel has long done a
| thing in their compilers where they detect the CPU model and run
| less optimized code if it isn't Intel. They claim it is because
| they can't be sure "other" processors have correctly implemented
| SSE and other extensions. So Clear Linux is going to run faster
| on an Intel CPU because it was compiled with ICC.
| zorgmonkey wrote:
| I don't know much about it, but Intel's Clear Linux does not use
| ICC; this is in their FAQ:
| https://docs.01.org/clearlinux/latest/FAQ/index.html#does-it...
| Sanguinaire wrote:
| This is trivially easy to defeat, just so you know. If
| anyone reading is ever in need of optimized math library
| performance on AMD, just speak to your hardware/cloud
| vendor; they all know the tricks.
| klelatti wrote:
| The link says Core Gen 10 or Xeon, so you may be out of luck on
| AMD or at less than $50.
|
| I think this is aimed more at AMD than Arm - I don't think Arm
| is yet a threat in this space - and whilst they're entitled to
| do what they want, it does make me less enthused about Intel and
| frankly more likely to support their competitors.
| mumblemumble wrote:
| AMD has their own equivalent:
| https://developer.amd.com/amd-aocl/
|
| I'm not sure it's a sin for hardware manufacturers to
| support their products? In the days of yore, we even
| expected it of them.
| gnufx wrote:
| Yes. The difference is that it may be "theirs", but I think it's
| all free software. At least the linear algebra stuff is. They
| supply changes for BLIS (which seem not to get merged for ages).
| Their changes may well be relevant to Haswell, for instance. I
| don't remember what the difference in implementation was between
| Zen and Haswell, but they were roughly the same code at one
| time.
| klelatti wrote:
| Not a sin, but it's not really just about supporting (or
| optimising) their products; it's about doing so whilst trying to
| increase the lock-in beyond what is achieved on performance
| grounds alone.
|
| I may be wrong, but my experience is that AMD has been a bit
| better on this in the past, e.g. their OpenCL libraries
| supported both Intel and AMD whereas Intel's were Intel-only.
| mumblemumble wrote:
| I would assume that's not entirely a fair comparison,
| though. Intel's 3D acceleration hardware only ever
| appears in Intel-manufactured chipsets, which only ever
| contain Intel-manufactured CPUs.
|
| AMD, on the other hand, also supplies Radeon GPUs for use
| with Intel CPUs. For example, that's the setup in the
| computer on which I'm typing this.
|
| So I have a hard time seeing anything nefarious there.
| The one is obviously a business necessity, while the
| other would obviously be silly. Perhaps that changes with
| the new Xe GPUs?
| klelatti wrote:
| Sorry, should have been clearer - Intel's CPU OpenCL drivers
| only supported Intel and not AMD, whereas AMD's CPU OpenCL
| drivers supported both - so GPUs aren't relevant in this case.
|
| I can see how, if you've invested a lot in software, you'd like
| to get a competitive advantage over your nearest rival, so maybe
| it's a price we have to pay.
| vel0city wrote:
| I wonder what features are missing from a Comet Lake generation
| Pentium; those can be had for ~$70 these days. Other than the
| feature that the box says "Core" on it instead of "Pentium".
|
| EDIT: Ah, I found it: AVX2.
| mumblemumble wrote:
| I'm not sure I see why you would expect anything different? The
| entire point of this framework is to provide a bunch of tools
| for squeezing the most you can out of SSE, which is specific to
| x86.
|
| I don't know if there's an ARM-specific equivalent, but, if you
| want to use TensorFlow or PyTorch or whatever on ARM, they'll
| work quite happily with the Free Software implementations of
| BLAS & friends. If you code at an appropriately high level, the
| nice thing about these libraries is that you get to have
| vendor-specific optimizations without having to code against
| vendor-specific APIs. Which is _great_. I sincerely wish I had
| that for the vector-optimized code I was writing 20 years ago.
| In any case, if ARM Holdings or a licensee wants to code up
| their own optimized libraries that speak the same standard APIs
| (and assuming they haven't already), that would be awesome,
| too. The more the merrier. How about we all get in on the
| vendor-optimized libraries for standard APIs bandwagon. Who
| doesn't want all the vendor-specific optimizations without all
| the vendor lock-in?
|
| Alternatively, if you would rather get really good and locked
| in to a specific vendor, you could opt instead to spam the CUDA
| button. That's a popular (and, as far as I'm concerned, valid,
| if not necessarily suited to my personal taste) option, too.
| mhh__ wrote:
| Alder Lake looks seriously impressive if the rumoured
| performance is even close to accurate, so I wouldn't count them
| out just yet - that being said, they will never get a run like
| they did over the last 10 years again.
| gnufx wrote:
| Clear Linux looked unconvincing to me. When I looked at their
| write-up, the example of what they say they do with
| vectorization was FFTW. That depends on hand-coded machine-
| specific stuff for speed, and the example was actually for the
| testing harness, i.e. quite irrelevant. I did actually run the
| patching script for amusement.
| agloeregrets wrote:
| I wonder who the person is who saw python and was like "You know
| what this needs? INTEL."
| amelius wrote:
| Maybe I'm missing something but it seems to me that this can only
| cause fragmentation in the Python space.
|
| Why not use the original distributions?
| gnufx wrote:
| Mystique (PR)?
| lbhdc wrote:
| There are a number of alternate interpreters available. The
| selling point typically is that they are faster, and that seems
| to be the value proposition of Intel's.
|
| One use might be improving the throughput of a compute-bound
| system, like an ETL pipeline written in Python, with little
| effort - ideally just by downloading the new interpreter.
| amelius wrote:
| Ok. If they offer Python without the GIL then I'm all ears :)
| gautamdivgi wrote:
| I don't think Python is ever going to get rid of the GIL. I
| haven't looked, but there are two things that may speed things
| up quite a bit:
|
| - Use native types
|
| - Provide the ability to turn "off" the GIL if you know you
| will not be using multi-threading within a process.
|
| I guess that is my naive wish list for a short-term speed up :)
| TOMDM wrote:
| A pythonic language that included something analogous to
| Golang's channels/goroutines would be my ideal.
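|
| (The closest stdlib approximation today is probably a queue
| shared between threads - a rough sketch, with the caveat that
| the GIL means this overlaps I/O, not CPU-bound work:
|
|     import threading, queue
|
|     ch = queue.Queue()  # plays the role of a channel
|
|     def worker():
|         for i in range(3):
|             ch.put(i)   # like ch <- i in Go
|         ch.put(None)    # sentinel standing in for close(ch)
|
|     threading.Thread(target=worker, daemon=True).start()
|     while True:
|         item = ch.get()
|         if item is None:
|             break
|         print(item)
|
| )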
| borodi wrote:
| Julia does have channels similar to those of Go, although
| whether you want to call it pythonic is up to you.
| TOMDM wrote:
| I've seen hype for Julia over and over, but this is the
| first piece of information that's made me genuinely
| interested.
|
| Thanks for the heads up!
|
| EDIT: Oh god, it's 1-indexed
| borodi wrote:
| While people discuss it a lot, in the end 1-indexing doesn't
| really matter. I think it comes from Fortran/MATLAB.
| TOMDM wrote:
| I agree, it doesn't really matter, but I've been
| programming long enough that I can see it being that top
| step that's always half an inch too tall that I'm going
| to stub my toe on.
| borodi wrote:
| For sure. I switch between Python, C/C++ and Julia a lot and,
| well, let's say bounds errors are pretty common for me.
| oscardssmith wrote:
| My advice would be to use begin and end. Then you don't
| have to think about the indexing.
| dec0dedab0de wrote:
| Jython doesn't have a GIL, but it doesn't support Python 3, and
| I've never used it.
| gautamdivgi wrote:
| Jython would also have issues with the many C libraries that
| Python code relies on today.
| [deleted]
| shepardrtc wrote:
| Numba might be what you're looking for:
| http://numba.pydata.org/
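|
| For example, numba's @njit compiles a numeric function, and
| nogil=True releases the GIL while the compiled code runs (a
| minimal sketch; numba is a separate install):
|
|     import numpy as np
|     from numba import njit
|
|     @njit(nogil=True)  # compiled code runs without holding the GIL
|     def dot(a, b):
|         s = 0.0
|         for i in range(a.shape[0]):
|             s += a[i] * b[i]
|         return s
|
|     x = np.random.rand(1_000_000)
|     print(dot(x, x))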
| _joel wrote:
| Why are they making their own distro instead of putting code
| back into mainline if it's useful? Do they have some particular
| IP that makes this impossible?
| SkipperCat wrote:
| I think there is a pretty big base of people who do big-data
| work using numpy and pandas (fintech, etc.). They want to
| squeeze every bit of computing power out of a specific Intel
| chipset, GPUs, etc., and Intel's distro really helps them out.
|
| A 10% speed improvement on thousands of jobs could in theory
| save you a nice chunk of time. This becomes very important in
| financial markets, where you need batch jobs to finish before
| markets open, or when you just want to save 10% on your EC2
| bill.
| gnufx wrote:
| 10% is around the noise level for HPC, especially for
| throughput, depending on scheduling. I rather doubt you couldn't
| do the same with free software.
| Sanguinaire wrote:
| You are correct: nothing Intel provides in their Python distro
| can't be obtained elsewhere - this is just a nice wrapper.
| LeifCarrotson wrote:
| Here's the list of CPUs which incorporate the AVX2 instructions
| that enable some of these optimizations:
|
| https://en.wikipedia.org/wiki/Advanced_Vector_Extensions#CPU...
|
| You could write your distro to check the feature flags from
| /proc/cpuinfo, which will tell it whether or not the CPU
| supports these. Or you could check whether it's in the Intel
| half of the list or the AMD half. Or you could write your own
| distro that only runs on the first half of the list.
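|
| A minimal sketch of both checks on Linux, using the field names
| as they appear in /proc/cpuinfo:
|
|     def cpuinfo(field):
|         with open("/proc/cpuinfo") as f:
|             for line in f:
|                 if line.startswith(field):
|                     return line.split(":", 1)[1].strip()
|         return ""
|
|     has_avx2 = "avx2" in cpuinfo("flags").split()      # feature check
|     is_intel = cpuinfo("vendor_id") == "GenuineIntel"  # vendor check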
|
| I get that Intel's contributions aren't purely altruistic.
| There are likely to be subtle tuning problems that require
| slight changes to optimize on different platforms, and they
| can't really be expected to do free work for AMD. But it looks
| to me like they're being unnecessarily anticompetitive.
| falcor84 wrote:
| >being unecessarily anticompetitive
|
| Isn't setting up barriers to entry generally considered to be
| a part of healthy competition? I'd hazard to say that as long
| as a company is playing within the boundaries of what's
| allowed, there's nothing they could do that's
| anticompetitive; at the most, you could accuse them of being
| somewhat unsportsmanlike.
| dec0dedab0de wrote:
| _Isn't setting up barriers to entry generally considered to be
| a part of healthy competition?_
|
| No, it is not. This is better described as vendor lock-in than
| as a barrier to entry. But vendor lock-in also works against
| healthy competition.
|
| Healthy competition means that users choose your product
| because it suits their needs the best, not because they are
| somehow forced to choose your product.
| DasIch wrote:
| Competition is desirable because it aligns with society's
| goals of innovation and progress which also imply increased
| productivity and lower prices.
|
| Artificial barriers to entry are contrary to that and if
| they're not illegal they should be.
| falcor84 wrote:
| Where do you draw the line for a barrier becoming 'artificial'?
| LeifCarrotson wrote:
| It's artificial when the vendor expends additional time,
| effort, or funds to construct a barrier, or chooses an
| equally-priced non-interoperable design that a rational,
| informed consumer with a choice would reject. If you're
| expending great effort to write custom DRM or to reinvent
| open industry standards that you could have installed
| cheaply, that's artificial.
|
| I fully admit that there are natural barriers that occur
| at times. I don't think that you should be expected to
| reverse-engineer your competitor's products and bend over
| backwards to make them work better.
|
| Here, for a concrete example, Intel had a clear choice to
| test whether a processor supported a feature by checking
| a feature flag - It's in the name, they're literally
| implemented for that exact purpose - or they could expend
| extra effort in building their own feature flag database
| by checking manufacturer and part number. They could have
| either expended extra effort to launch and distribute
| their own entire custom Python distribution, or submitted
| pull requests to the existing main distribution. For
| another example, Apple could have used industry-standard
| Phillips or Torx screws in their hardware: Manufacturers
| had lines to produce them, distributors had inventory of
| the fasteners, users had tools to turn them. Instead,
| they went to great expense to build their own
| incompatible tri-lobe screws, requiring probably millions
| of dollars in investment in custom tooling and production
| lines, all for the sake of creating an artificial
| barrier.
| Sanguinaire wrote:
| We could start with something similar to the concept of Pareto
| optimality: Intel could have delivered their maximum performance
| without preventing the optimizations from being applied equally
| on AMD hardware, but instead they chose to disadvantage AMD
| without providing anything extra on top of what they could do
| while remaining "neutral".
| gnufx wrote:
| I don't know what Intel did for the proprietary version, but the
| first thing you should do for Python is compile with GCC's
| -fno-semantic-interposition. I don't know if there's a benefit
| from vectorization in parts of the interpreter, for instance, or
| whether -Ofast helps generally if so, but I doubt there's
| anything Intel-CPU-specific involved if there is. I've never
| looked at it; has the interpreter not been well profiled, with
| such optimizations provided? Anyway, if you want speed, don't
| use Python.
|
| It's obviously not relevant to Python per se, but you get
| basically equivalent performance to MKL with OpenBLAS or,
| perhaps, BLIS, possibly with libxsmm on x86. BLIS may do better
| on operations other than {s,d}gemm, and/or threaded, than
| OpenBLAS, but they're both generally competitive.
| Rd6n6 wrote:
| > the Intel CPU dispatcher does not only check which instruction
| set is supported by the CPU, it also checks the vendor ID string.
| If the vendor string says "GenuineIntel" then it uses the optimal
| code path. If the CPU is not from Intel then, in most cases, it
| will run the slowest possible version of the code, even if the
| CPU is fully compatible with a better version.[1]
|
| I've been a little shy about using Intel software since reading
| about this years ago.
|
| [1] https://www.agner.org/optimize/blog/read.php?i=49
| ciupicri wrote:
| Python 3.7.4, when 3.10 is just around the corner.
| TOMDM wrote:
| To me this just looks like Intel saw what Nvidia accomplished
| with CUDA - locking in large portions of the scientific
| computing community with a hardware-specific API - and went
| "yeah, me too thanks".
|
| Thankfully, accelerated math libraries already exist for Python
| without the vendor lock-in.
| bostonsre wrote:
| Intel has been releasing MKL (Math Kernel Library) for Java for
| a really long time. Hopefully the core Python devs can learn a
| few tricks and similar changes can make it upstream.
| vitorsr wrote:
| You can easily try it yourself [1]:
|
|     conda create -n intel -c intel intel::intelpython3_core
|
| Or [2]:
|
|     docker pull intelpython/intelpython3_core
|
| Note that it is quite bloated but includes many high-quality
| libraries.
|
| You can think of it as a recompilation in addition to a
| collection of patches to make use of their proprietary
| libraries.
|
| Other useful links to reduce the noise in this thread: [3],
| [4], [5], [6].
|
| [1]
| https://software.intel.com/content/www/us/en/develop/article...
|
| [2]
| https://software.intel.com/content/www/us/en/develop/article...
|
| [3] https://www.nersc.gov/assets/Uploads/IntelPython-NERSC.pdf
|
| [4] https://hub.docker.com/u/intelpython
|
| [5] https://anaconda.org/intel
|
| [6] https://github.com/IntelPython
| tkinom wrote:
| Any benchmark comparison data? For example: ".... benchmark
| with this Python is XXX% faster than ... (standard Python, AMD,
| ARM)"
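|
| (A minimal way to compare builds yourself is to run the same
| script under each interpreter, assuming a BLAS-bound workload
| like a matrix multiply:
|
|     import timeit
|     import numpy as np
|
|     a = np.random.rand(2000, 2000)
|     print(timeit.timeit(lambda: a @ a, number=10), "s for 10 matmuls")
|
| )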
| mumblemumble wrote:
| I haven't done a comparison in a long time, and, even then,
| it wasn't very thorough, so take this with a grain of salt.
|
| But, 6 years ago, when I was in grad school, just swapping to
| the Intel build of numpy was an instant ~10x speedup in the
| machine learning pipeline I was working on at the time.
|
| No idea if that's typical or specific to what I was doing at
| the time. I don't use MKL anymore because ops doesn't want to
| deal with it and the standard packages are already plenty
| good enough for what I'm doing nowadays. If you forced me to
| guess, I'd say my experience was atypical.
___________________________________________________________________
(page generated 2021-07-21 23:00 UTC)