https://www.matecdev.com/posts/numpy-julia-fortran.html

# Julia: faster than Fortran, cleaner than Numpy

*Assembling a large matrix of complex exponentials.* I'm switching to Julia after running this comparison of Julia vs Numpy and Fortran, for performance and code simplicity.

Scientific Computing | Julia | Python

Last updated: 2021-06-15

Julia is a pretty new language which, among other things, aims to solve the so-called "two-language problem" in scientific computing. That is, we usually test ideas in a rapid-prototyping language like Matlab or Python, but when the testing is done and it's time to do some serious computation, we need to rely on a different (compiled) programming language.

Many tools exist to ease the transition, and wrapping Fortran libraries into Python has been my preference so far. For example, wrapping up some Fortran with F2PY seems like a very convenient way to use (and distribute) efficient Fortran code that anybody can run. I also keep track of various ways of using Fortran in Python in this post.

Now, Julia aims to solve this issue in a radical way. The idea is to use a single programming language that offers an interactive mode suitable for rapid prototyping, but that can also be compiled and executed at C/Fortran performance.

Some of the most basic characteristics of the alternatives under consideration in this article are:

| Language | Vectorization | Code readability | Multithreaded |
|----------|---------------|------------------|---------------|
| Numpy    | Manual        | Regular          | No            |
| Fortran  | Automatic     | Good             | Yes           |
| Julia    | Automatic     | Excellent        | Yes           |

## Julia vs Fortran vs Numpy: speed and code clarity summary

Here are my conclusions after running this benchmark:

* Numpy is restricted to single-threading in many cases, is hard to code and read, and can be much slower than the two alternatives.
* Fortran offers great performance with pretty simple code, but some wrapping is required to call Fortran from high-level languages like Python.
* Julia is faster and easier to write than Numpy, and is even able to beat gfortran's performance.

Honestly, I was shocked by what I found in my performance comparison. Julia is amazing. I didn't expect Julia to beat gfortran with compiler optimizations in my test, and it currently does so by a wide margin.

## Code readability: the impact of manual vectorization vs for loops

Opinions on code readability are, of course, subjective, but I want to explain why I think Numpy offers the worst experience of the three. So far, interpreted languages have required manual rewriting of loops as vector operations. With some practice, this becomes an easy task, but after having done so extensively for many years (first in Matlab, then in Python), I came to the conclusion that I hate to manually vectorize code. I prefer to write for loops and have the compiler vectorize them for me!

Consider the following Numpy code, for example:

```python
n = len(a)
M = np.exp(1j*k*np.sqrt(np.tile(a,(n,1))**2 + np.tile(a.reshape(n,1),(1,n))**2))
```

Can you tell what it does, at first sight? Can you reason about the code's performance? Can you even guess the dimensions of the output M?

Now, here is a nested for loop (in Fortran) which does the same as the above code:

```fortran
do j=1,n
  do i=1,n
    M(i,j) = exp( (0,1)*k * sqrt(a(i)**2 + a(j)**2) )
  end do
end do
```

And here is the mathematical formula it represents:

$$ M_{ij} = e^{i k \sqrt{a_i^2 + a_j^2}} $$

No wonder they named Fortran "FORmula TRANslator" in the first place. Interestingly, the Julia syntax is essentially the same in this regard. So even though I agree with the idea that Python is a beautiful language to work with, I don't think the same holds for Numpy. Yes, Numpy allows us to stay within a Python framework and get things done with simple one-liners (which is pretty neat anyway), but the resulting Numpy code becomes hard to read afterward.
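To make the comparison concrete, here is a small sketch of my own (not code from the original post) that checks the tile-based one-liner against a plain double loop. The function names and the value of `k` are illustrative choices:

```python
import numpy as np

# The tile-based one-liner from the text (with the sqrt from the formula),
# next to the equivalent explicit double loop, to confirm they agree.
def eval_exp_tiled(a, k):
    n = len(a)
    return np.exp(1j * k * np.sqrt(np.tile(a, (n, 1))**2
                                   + np.tile(a.reshape(n, 1), (1, n))**2))

def eval_exp_loop(a, k):
    n = len(a)
    M = np.empty((n, n), dtype=complex)
    for i in range(n):
        for j in range(n):
            # Direct translation of M_ij = exp(i*k*sqrt(a_i^2 + a_j^2))
            M[i, j] = np.exp(1j * k * np.sqrt(a[i]**2 + a[j]**2))
    return M

a = np.linspace(0, 2 * np.pi, 50)
print(np.allclose(eval_exp_tiled(a, 2.0), eval_exp_loop(a, 2.0)))  # prints True
```

As an aside, modern Numpy broadcasting (`a[None, :]**2 + a[:, None]**2`) avoids `np.tile` entirely and tends to be both faster and somewhat easier to read, though it is still a far cry from writing the plain loop.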
## My personalized micro-benchmark

I believe there is no such thing as a definitive benchmark. Running a series of completely unrelated tests on toy problems, and claiming that a certain language is overall the best because it does well in most of the tests, is meaningless if we are interested in certain specific and more realistic cases.

Rather than trying to rely on big, all-purpose benchmarks, I believe it is preferable to run, from time to time, some micro-benchmarks suited to our own projects. If we can predict what the major performance bottlenecks of our code will be, we can come up with a minimalistic but spot-on example, and create a specific performance test that suits our needs.

In my numerical projects, for example, I do a lot of array operations and I evaluate a lot of complex exponentials, so that's what this test is about. My benchmark was already described above; it's just: given a vector a, form the matrix

$$ M_{ij} = e^{i k \sqrt{a_i^2 + a_j^2}} $$

### Benchmark details and code

All tests were run on an Intel i7-4700MQ CPU (3.40 GHz, 6 MB of cache) with 8 GB of DDR3 RAM. The complete source code of the tests can be found here.
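For reference, the Numpy side of the benchmark can be sketched as follows. This is a minimal harness of my own, not the author's published script: it uses broadcasting rather than `np.tile`, the constant `100 + 1j` mirrors the `(100+im)` factor in the Julia code below, and the sizes are kept small so it runs quickly:

```python
import time
import numpy as np

# Build M_ij = exp(i*k*sqrt(a_i^2 + a_j^2)) over a grid a in [0, 2*pi]
# and time each evaluation for a range of sizes N.
def eval_exp(N, k=100 + 1j):
    a = np.linspace(0, 2 * np.pi, N)
    return np.exp(1j * k * np.sqrt(a[None, :]**2 + a[:, None]**2))

for N in (250, 500, 1000):
    t0 = time.perf_counter()
    A = eval_exp(N)
    dt = time.perf_counter() - t0
    print(f"N={N}: {dt:.4f} s, shape={A.shape}")
```

Note that with a complex k, the matrix entries decay in magnitude away from the origin, since the imaginary part of k contributes a real, negative exponent.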
For a quick reference, here is the complete Julia code:

```julia
function eval_exp(N)
    a = range(0, stop=2*pi, length=N)
    A = Matrix{ComplexF64}(undef, N, N)
    @inbounds Threads.@threads for j in 1:N
        for i in 1:N
            A[i,j] = exp((100+im)*im*sqrt(a[i]^2 + a[j]^2))
        end
    end
    return A
end

eval_exp(5)  # warm-up call, so compilation time is not measured below

print(string("running loop on ", Threads.nthreads(), " threads \n"))
for N in 1000:1000:10000
    @time begin
        A = eval_exp(N)
    end
end
```

[Figure: wall-clock times for the Numpy, Fortran, and Julia versions of the benchmark.]

In case the above figure is hard to read, here are the actual numbers (times in seconds):

| N     | Numpy | Fortran ST | Julia ST | Fortran MT | Julia MT |
|-------|-------|------------|----------|------------|----------|
| 1000  | 0.114 | 0.048      | 0.035    | 0.035      | 0.007    |
| 2000  | 0.204 | 0.191      | 0.126    | 0.054      | 0.031    |
| 3000  | 0.448 | 0.421      | 0.289    | 0.117      | 0.062    |
| 4000  | 0.778 | 0.756      | 0.554    | 0.205      | 0.116    |
| 5000  | 1.229 | 1.173      | 0.831    | 0.317      | 0.215    |
| 6000  | 1.741 | 1.686      | 1.108    | 0.456      | 0.274    |
| 7000  | 2.671 | 2.283      | 1.489    | 0.614      | 0.304    |
| 8000  | 4.190 | 2.994      | 1.948    | 0.884      | 0.402    |
| 9000  | 5.319 | 3.797      | 2.460    | 1.022      | 0.505    |
| 10000 | 6.847 | 4.895      | 3.039    | 1.353      | 0.616    |

Note: ST and MT stand for single-threaded and multithreaded, respectively.

Note that in all cases the fast-math optimization flag was used, which allows compilers to reorder mathematical expressions for speed, in a way that could break IEEE compliance.

### Multithreading efficiency

Interestingly, from the above table we can compute the threading efficiency, that is, the speed-up factor when moving from single-threaded to multithreaded execution: the single-threaded time divided by the multithreaded time. On a quad-core CPU, Julia's multithreading efficiency is about 4.9, whereas relying on gfortran and OpenMP we get about 3.6. So Julia is effectively taking advantage of hyperthreading, something which is rarely seen with OpenMP.

| N     | Fortran | Julia |
|-------|---------|-------|
| 1000  | 1.389   | 4.897 |
| 2000  | 3.520   | 3.976 |
| 3000  | 3.601   | 4.660 |
| 4000  | 3.682   | 4.769 |
| 5000  | 3.701   | 3.866 |
| 6000  | 3.697   | 4.036 |
| 7000  | 3.717   | 4.890 |
| 8000  | 3.386   | 4.844 |
| 9000  | 3.714   | 4.866 |
| 10000 | 3.618   | 4.928 |

## Additional performance tweaks in Julia

Surprisingly, the Julia performance story doesn't end here. It turns out that Julia has further tools to increase performance.
One such tool is LoopVectorization.jl. As suggested by Mason in this thread on the JuliaLang forum, at least the single-threaded version can be sped up quite significantly by resorting to that library.

I'll continue to dig deeper into how to squeeze the last bit of performance out of computer hardware by resorting to Julia. Stay tuned!

## Related posts

* Why is Fortran still used: I have a fair degree of experience with Fortran, and I believe there are solid reasons to use Fortran for scientific computing today.
* Will Julia Replace Fortran in HPC? After comparing Fortran with Julia on various grounds, I tend to think that, over time, Julia is on track to supersede Fortran, even in the HPC niche.

Copyright (c) Martin D. Maas PhD, 2021