[HN Gopher] Is Fortran easier to optimize than C for heavy calcu...
       ___________________________________________________________________
        
       Is Fortran easier to optimize than C for heavy calculations?
        
       Author : jokoon
       Score  : 76 points
       Date   : 2023-09-21 09:26 UTC (13 hours ago)
        
 (HTM) web link (stackoverflow.com)
 (TXT) w3m dump (stackoverflow.com)
        
       | saagarjha wrote:
       | The difference is mostly not relevant these days. Yes, restrict
       | is important, but anyone who is optimizing numerical computing
       | hard is probably already using it judiciously, and for them what
       | is more important is what is familiar to them and makes them most
       | productive.
        
         | pjmlp wrote:
         | The keyword is _judiciously_ , without triggering UB, and I bet
         | most don't.
        
         | gnufx wrote:
         | Unfortunately I see rather few people doing serious optimizing
         | knowledgeably at all in research computing. (I know how and
         | where it's done well, of course.) That has global effects on
         | productivity of users, though some of the biggest gains can
         | come from more global optimization, like in the use of MPI.
        
       | theodorethomas wrote:
       | People mention the no-aliasing, the compilers, the intrinsics,
       | the libraries, and the expressivity but one aspect of the
       | difference is ignored and it is this: C is a language for
       | specifying behaviour of hardware (time sequence of detailed
       | states), Fortran is a language for specifying computation of
       | values. Fortran abstracts far more of the hardware than C and
       | consequently, a Fortran compiler can benefit from quantum
       | processors, mind-readers or time-machines, should they ever be
       | invented.
       | 
       | A Fortran program that reads its input, calculates and finally
       | writes out its output does not have to execute any particular
       | instruction at all. As long as the answer is "AS IF" it had done
       | the user-specified computation, the Fortran compiler has done its
       | job. In between I/O, it submerges into the ineffable like a Cold
       | War SSBN.
       | 
       | C is about the instruments, the players and the conductor,
       | Fortran is about the music.
        
         | dTal wrote:
         | Maybe this was once true, but the hardware that C was designed
         | for specifying the behavior of was a PDP-11. Nowadays you are
         | programming an abstract C-ish virtual machine that provides
         | certain semantic guarantees that don't necessarily map terribly
         | well to the physical hardware. For example, if you write to a
         | memory address, and then read from the memory address in the
         | "next instruction", you expect the change to be immediate, even
         | though the code is actually running on a pipeline that could be
         | dozens of instructions deep, with several layers of cache
         | between the core and system memory. So in a sense there's not
         | really a qualitative difference between C and Fortran - they
         | are _both_ for specifying a sequence of operations on an
         | abstract machine, relying on the compiler to implement that
         | machine - and indeed modern optimizing C compilers actually
         | provide very few guarantees about specific assembly
         | instructions to be executed, happily rewriting or omitting code
         | so long as it executes  "as if" it ran on the C virtual
         | machine.
         | 
         | See "C is not a low-level language" -
         | https://queue.acm.org/detail.cfm?id=3212479
        
           | AnimalMuppet wrote:
           | > you expect the change to be immediate, even though the code
           | is actually running on a pipeline that could be dozens of
           | instructions deep, with several layers of cache between the
           | core and system memory.
           | 
           | I expect the _hardware_ to handle cache coherency in that
           | situation. What the compiler does should be irrelevant.
        
             | dTal wrote:
             | Right, but the point is that the hardware is still "meeting
             | you halfway" to present the _appearance_ of something which
             | isn 't actually happening. Those pointers in C aren't
             | really "memory addresses" at all, they're keys in a key-
             | value store managed by the hardware to preset the illusion
             | of flat, contiguous memory, as mandated by the C
             | programming model.
             | 
             | So maybe it's accurate to say that C is "more compatible"
             | with real hardware, in the sense that its abstract machine
             | is more isomorphic to what's really happening than
             | Fortran's is. But it's not exactly "closer to hardware" in
             | the way we might be tempted to think; it's more of a lingua
             | franca that your processor happens to speak.
             | 
             | If you're still tempted to consider C "close to hardware",
             | consider that you can compile the same code for a Z80 and a
             | Threadripper. What hardware exactly are you controlling
             | that's common to both?
        
               | AnimalMuppet wrote:
               | > as mandated by the C programming model.
               | 
               | As PhilipRoman said, this is also true of assembly (or
               | _any other programming language model_ [1]).
               | 
               | > If you're still tempted to consider C "close to
               | hardware", consider that you can compile the same code
               | for a Z80 and a Threadripper. What hardware exactly are
               | you controlling that's common to both?
               | 
               | In both of them I can write to a memory-mapped I/O
               | device, if it has one. I can write a custom memory
               | allocator for a pool that I'm managing myself. I can't do
               | either of those in Fortran or Javascript.
               | 
               | [1] Why does it have to be true of any other programming
               | language model? Well, maybe I exaggerate slightly. But
               | can you show me a (single threaded) programming language
               | where "a = 1" does not mean that on the next line, a will
               | be 1?
        
               | kmstout wrote:
               | > But can you show me a (single threaded) programming
               | language where "a = 1" does not mean that on the next
               | line, a will be 1?
               | 
               | MIPS I.
               | 
               | https://en.wikipedia.org/wiki/Delay_slot#Load_delay_slot
        
               | gpderetta wrote:
               | >But can you show me a (single threaded) programming
               | language where "a = 1" does not mean that on the next
               | line, a will be 1
               | 
               | Generally agree with your point, but just to play the
               | devil's advocate, in a CPU with exposed pipeline and no
               | interlocks, setting a register to a value doesn't
               | guarantee that a following instruction reading from that
               | register will see the last value written.
        
           | theodorethomas wrote:
           | I don't see how C can match Fortran's abstraction level and
           | still reliably control hardware that uses memory-mapped I/O.
           | 
           | C, as an operating system implementation language, is trying
           | to do something fundamentally different than Fortran.
           | 
           | You live by memory address, you die by memory address.
        
           | PhilipRoman wrote:
           | > For example, if you write to a memory address, and then
           | read from the memory address in the "next instruction", you
           | expect the change to be immediate
           | 
           | This would also be true for assembly, hardly a high level
           | language
        
             | lmm wrote:
             | On modern CPUs assembly is a high level language (or
             | rather, it's a language that doesn't have any of the
             | advantages of traditional low-level languages, even if it
             | also lacks the advantages of traditional high-level
             | languages. Much like C)
        
         | coliveira wrote:
         | Yes, you're correct. C was created to control a CPU, it is a
         | low level language with a comfortable syntax. C abstracts the
         | hardware. But Fortran has nothing to do with hardware, it is
         | just a notation for computing matrix algorithms. Fortran can be
         | thought as a primitive APL. You can do all kinds of
         | optimizations in Fortran that you cannot do in C, because it
         | doesn't care or know about the underlying hardware.
        
         | mhh__ wrote:
         | That was maybe true for C in the seventies but there's
         | practically no difference anymore e.g. C has an as if rule too.
        
       | Alifatisk wrote:
       | This might be a bit irrelevant but,
       | 
       | my professor who teaches computer programming (who previously
       | worked as a nuclear physicist) loved mentioning Fortran whenever
       | possible.
       | 
       | He truly enjoy the language and believes it is well suited for
       | heavy computation. He said its used a lot for weather
       | forecasting.
       | 
       | He is quite old now so I don't know how true it is today, but he
       | made me more interested in Fortran and I might give it a shot
       | instead of C sometime soon.
        
         | nonameiguess wrote:
         | I haven't talked to him in a few years, but used to be good
         | friends with a meteorology researcher at Purdue and Fortran was
         | virtually all he knew, it's so heavily used. When I first
         | started out in GEOINT processing, a lot of our code (NRO code)
         | at the library level was still Fortran, but that was mostly
         | older libraries. Anything written after 2005 or so was
         | predominantly C++. I don't really keep up to date with the
         | constellation status since I haven't worked there in a while
         | now, but there were still a few orbital platforms older than
         | that back around 2017 or so and the code for handling them was
         | still largely Fortran. We never used Fortran directly, though,
         | only bindings for C++, though I did have to learn a tiny bit of
         | Fortran to hack on those once.
        
           | green-salt wrote:
           | This was the case for an atmospheric research facility's HPC
           | cluster which was pretty much running all fortran for the
           | calculations at least back in 2010.
        
         | dkarl wrote:
         | Professors will do weird things if they get a chance. I know of
         | a junior college where the intro to programming course used
         | Fortran77 in the late 2000s. A lot of kids signed up for it
         | because they needed a STEM elective and didn't want to take
         | math or science.
        
         | Joel_Mckay wrote:
         | There are always language specific features that match use-
         | cases. Languages like Fortran and Cobol are a niche area
         | specific to legacy systems.
         | 
         | Anyone who wants to stay employed in the private sector should
         | initially learn JavaScript, Java, Python, C and C++. Then a
         | shop specific choice like Scala (Alphabet), Go (Alphabet),
         | Erlang (telecom), PHP (Meta), Ruby (small time web devs), and
         | or Visual Studio .NET (small MS shops).
         | 
         | Your best advisor is talking with people currently active with
         | the company of interest. Some languages are very industry
         | specific (Amazon has its own certificates for the proprietary
         | stacks).
         | 
         | I would also recommend looking at job postings outside the
         | academic walled garden, and chaotic startups. Most academic
         | institutions have been sued many times due to the volume of
         | nonsense they feed kids. If you are from Engineering, things
         | like PLC Ladder-Logic and Verilog are often more important.
        
           | therealcamino wrote:
           | "Most academic institutions have been sued many times due to
           | the volume of nonsense they feed kids."
           | 
           | I've never heard of that happening, let alone many times to
           | most institutions.
        
             | Joel_Mckay wrote:
             | T
        
               | __mharrison__ wrote:
               | Can someone enlighten me as to what "T" means? First time
               | I've seen it.
        
               | jcranmer wrote:
               | Not sure, but the individual previously had a different
               | comment that was edited to be just "T".
        
               | Joel_Mckay wrote:
               | F
        
               | bananapub wrote:
               | so your weird unfounded assertion is also explicitly
               | unprovable? awesome, thanks for sharing.
        
               | Joel_Mckay wrote:
               | In general lexis nexus holds the records of former
               | students suing for BS employment promises.
               | 
               | You would be fairly surprised who showed up besides trump
               | u in the states.
               | 
               | It was a lot more common than most like to admit. =)
        
               | jcranmer wrote:
               | > Part of the settlements is an NDA.
               | 
               | If someone is sued, that creates a court record. Even if
               | the _terms_ of the settlement are not disclosed, that
               | there was a settlement is lodged in the court record, and
               | publicly accessible. Even if court records are sealed,
               | the existence of the case is not, and initial complaints
               | are rarely sealed.
               | 
               | In short, if suing universities because of what they're
               | teaching is routine, you should be able to point to court
               | cases. If you can't find them, then it's not because it's
               | secret--secret court cases don't exist--it's because they
               | don't exist.
        
               | tptacek wrote:
               | In the spirit of generosity, you might conclude that the
               | parent commenter simply has their wires crossed, and has
               | cases involving for-profit schools like University of
               | Phoenix confused with real universities. If you didn't
               | know much about higher education, you might think
               | UPhoenix was one; it's in the name.
        
               | Joel_Mckay wrote:
               | Nope, but thank you for your concern. The local class
               | action case was with a local school that privatized, lost
               | accreditation, and eventually became a trade school. The
               | other University case I alluded to was covered in the
               | news, and was notable indeed,
               | 
               | I'm just being careful here. Have a great day =)
        
               | jcranmer wrote:
               | And you're still refusing to tell us any names because
               | we're somehow lazy for not being able to correctly guess
               | what locality you're in, what time you're talking about,
               | and search local media history for that timeframe to
               | uncover the case you're referring to?
        
               | Joel_Mckay wrote:
               | T
        
               | jcranmer wrote:
               | Give me three case numbers of cases where former students
               | successfully sued the institution (cases that were
               | dismissed don't count--just because a case was filed does
               | not mean it is meritorious). Those case numbers should
               | make it easy for anyone to find the complete docket of
               | the case on PACER (if it's federal court) or applicable
               | state courts (well, to the degree to which states make
               | their dockets easily accessible).
               | 
               | You made the assertion; it's your job, not my job, to
               | provide evidence for your assertion, and I've put you on
               | notice as to the evidence that is satisfactory. If you
               | can't provide the evidence, I have no reason to believe
               | your assertion.
        
               | Joel_Mckay wrote:
               | T
        
               | jcranmer wrote:
               | > You would be fairly surprised who showed up besides
               | trump u in the states.
               | 
               | Okay then, who showed up? You imply you know who. It
               | shouldn't take any effort to name anybody, if you already
               | know them.
        
               | Joel_Mckay wrote:
               | T
        
               | [deleted]
        
               | tptacek wrote:
               | POC||GTFO, as the kids say.
        
         | kergonath wrote:
         | It is still very much used for weather forecasting (ECMWF has
         | components in it), as well as materials physics, quantum
         | chemistry, and things like finite elements and nuclear fuel
         | performance codes.
         | 
         | It is much, much better than C for all these applications.
         | Nobody in the field seriously considers C; all the non-Fortran
         | codes are in C++. Both the Fortran and C++ bits have Python
         | wrappers anyway.
        
       | mschaef wrote:
       | Back in school (mid-90's), I made a point of taking a FORTRAN
       | introductory class to go along with all the other work I was
       | doing in Pascal, C++, Lisp, etc.
       | 
       | For most of the class, FORTRAN felt mostly like a penalty box. It
       | was full of behaviors and limitations that dated straight back to
       | the early 1960's, if not earlier. It was at least easy to make it
       | do the things it could do.
       | 
       | Where my opinion of FORTRAN changed was in the last time that
       | class met. By that point, we 'knew' the language and the
       | instructor was on to more advanced topics, including
       | optimization. He made a point of how an optimizing FORTRAN
       | compiler could re-nest three nested loops for the purpose of
       | automatically improving memory access patterns. Vector Cray
       | machines (which implemented common looping patterns in hardware)
       | shipped with FORTRAN compilers that could take normal-looking
       | loops and turn them into code that optimized for the hardware.
       | 
       | I don't know if it's still the case (thanks to restrict and
       | better compilers), but he made a compelling point at the time for
       | the simplicity and restrictions of the language leading to more
       | options for optimization.
        
         | dralley wrote:
         | >I don't know if it's still the case (thanks to restrict and
         | better compilers)
         | 
         | Nobody (really) uses restrict, certainly not at scale.
         | 
         | This is especially obvious because when Rust came along (which
         | can effectively use restrict in most places, much like Fortran)
         | it uncovered a lot of serious bugs in LLVM's implementation for
         | restrict. That whole situation only really stabilized in the
         | past year or so.
         | 
         | Maybe GCC does a better job as a result of having gfortran.
        
       | jcranmer wrote:
       | Fortran has a lot more (N-dimensional) array logic intrinsic to
       | the language than C or C++ does, and in general, Fortran has a
       | fair number of "little" things that can add up for optimization:
       | 
       | * Fortran arrays do not alias by default. (This is the most well-
       | known, and is indeed repeated in the answers on StackOverflow)
       | 
       | * Fortran N-D array indexes are UB if they go out-of-bounds in a
       | way that isn't true for most other languages. In C, if you have a
       | declaration `int x[2][3];`, the expression `x[0][4]` is a legal
       | way to access one of the elements--it's only UB to go outside the
       | bounds of x as a whole, not individual rows. In Fortran, the
       | equivalent expression (well, modified because Fortran is column-
       | major) is illegal.
       | 
       | * Fortran has built in array expressions--you can express a
       | vector addition just by adding the arrays themselves. This makes
       | it easier to autovectorize.
        
         | LegionMammal978 wrote:
         | > In C, if you have a declaration `int x[2][3];`, the
         | expression `x[0][4]` is a legal way to access one of the
         | elements--it's only UB to go outside the bounds of x as a
         | whole, not individual rows.
         | 
         | Even though multidimensional arrays are laid out contiguously
         | in C/C++, it doesn't follow that indexing can traverse from one
         | row to the next. By C17 6.5.6/8, pointer-integer addition can
         | only create pointers to other elements (or one past the last
         | element) of the _same array object_ that the original pointer
         | came from. However, an int[2][3] array has two int[3] arrays as
         | elements, and each of those arrays has three int objects as
         | elements. The int objects are not all elements of the same
         | array (the standard 's definition of an array element,
         | 6.2.5/20, doesn't apply recursively), so you aren't allowed to
         | index between them with expressions like (x[0] + 4).
         | 
         | Compilers seem to be pretty lenient about this in practice,
         | though, only optimizing out out-of-bounds accesses when they
         | exit the complete object. But at least Clang's UBSan will
         | complain if you try to index between rows [0].
         | 
         | [0] https://godbolt.org/z/GYosP39Mc
        
         | pklausler wrote:
         | Fortran's _dummy arguments_ can (generally) be assumed in the
         | scope of a subprogram to not alias, whether arrays or not. The
         | burden is on the programmer of calls to procedures to not
         | associate any of its dummy arguments that are modified during
         | the call with data that might be associated with any other
         | dummy argument.
        
         | KeplerBoy wrote:
         | > In C, if you have a declaration `int x[2][3];`, the
         | expression `x[0][4]` is a legal way to access one of the
         | elements
         | 
         | Is it really? Does the C-compiler guarantee that your 6 ints
         | will be stored in a contiguous memory area?
        
           | jcranmer wrote:
           | Yes. A[i] in C is exactly equivalent to *(A + i).
           | (Incidentally, this means that 2[x] is a legal expression in
           | C).
        
             | KeplerBoy wrote:
             | Of course, that part is clear. The question is, if the two
             | rows are guaranteed to be contiguous in memory. If using
             | heap allocated memory, one could easily construct a
             | counterexample (i'm not sure how the stack allocated array
             | behaves, i guess it works because it's the sane choice, but
             | not strictly guaranteed):
             | 
             | int *x; int numRows = 2; int numCols = 3;
             | 
             | x = (int *)malloc(numRows * sizeof(int _));
             | 
             | for (int i = 0; i < numRows; i++) { x[i] = (int
             | _)malloc(numCols * sizeof(int)); }
             | 
             | // This could segfault x[0][4] = 42;
        
               | jcranmer wrote:
               | Yes, it's a multidimensional array, everything is
               | required to be consecutively allocated. (See 6.5.2.1 for
               | more details). `int[2][3]` is 6 integer objects
               | consecutively allocated (as 3 arrays of 2 arrays of
               | integers), not 3 pointers to arrays of 2 integers.
        
               | KeplerBoy wrote:
               | Thanks for pointing to the actual section. Good to know.
        
       | pipo234 wrote:
       | > The performance difference comes from the fact that Fortran
       | says aliasing is not allowed [...]
       | 
       | This is one of the legacies modern C++ is still wrestling with
       | today, supposedly solved by Rust, Zig, Val and the likes.
        
         | _flux wrote:
         | Hmm, I thought basically only Rust solves it. How do Zig and
         | Val solve it?
        
           | Conscat wrote:
           | Multiple-aliasing cannot exist in a language without
           | reference semantics, such as Val (or whatever its new name
           | is).
        
         | zik wrote:
         | It's true that there's no "restrict" inthe C++ standard,
         | however all the major C++ compilers have extensions for it
         | ("__restrict__") and have had for a long time.
        
           | patrec wrote:
           | Except that it doesn't work and is basically WONTFIX since no
           | C or C++ programmer ever uses it.
           | 
           | This is only very slightly exaggerated.
        
             | Conscat wrote:
             | I've never had issues with it, personally. In what way does
             | it not work?
        
               | lmkg wrote:
               | Rust has aliasing guarantees. In theory, Rust code can
               | pass noalias annotations to LLVM and become more
               | optimized.
               | 
               | In practice, it took _years_ for Rust access these
               | optimizations because the noalias annotations kept
               | hitting codegen bugs in LLVM. It was an area of the
               | compiler that had not seen much active use before Rust,
               | and they were causing a lot of dormant bugs to become
               | issues.
               | 
               | The feedback cycle was also very long because LLVM didn't
               | fix the problems quickly. I didn't pay close attention so
               | I don't know if this is prioritization or because those
               | fixes required major surgery, but Rust sometimes had to
               | wait for major version bumps in LLVM before seeing if
               | "this time was the last time."
        
               | vlovich123 wrote:
               | Yup. But it is true now though so hopefully they've added
               | the test cases to LLVM to make sure things continue to
               | work correctly in the face of more and more optimizations
               | being thrown at code.
        
               | Measter wrote:
               | Not just in theory. Rust does now mark almost all
               | references as `noalias`, the exception being `&T` where
               | `T` contains an `UnsafeCell`. The equivalent safe Rust
               | function signature[1] would have all four references
               | marked `noalias`, even though the `input` and `matrix`
               | slices could alias.
               | 
               | How much the optimizer can take advantage of that is
               | another matter, due to what you said. Doing a quick
               | translation to more idiomatic Rust[2], it does hoist the
               | accesses to `matrix` out of the hot loop, and does also
               | seem to be a bit less moving stuff around compared to the
               | C version, which I think is putting the `matrix` values
               | in the right places, which Rust did before the loop.
               | 
               | [1] fn transform(output: &mut [f32], input: &[f32],
               | matrix: &[f32], n: &i32);
               | 
               | [2] https://godbolt.org/z/5PPn3eh19
        
               | Conscat wrote:
               | That's interesting. It also looks like with the
               | `__restrict` specifier (https://godbolt.org/z/cvaYoc6zh),
               | the clang code is similar regarding the matrix
               | multiplication. The body of the vectorized loop itself
               | looks identical, `.LBB0_4` in clang and `.LBB1_8` in
               | rustc.
        
             | Blackthorn wrote:
             | We use it all the time in audio code. How do you get the
             | idea that it doesn't work?
        
               | patrec wrote:
               | Look at the gyrations the Rust compiler team has been
               | going through for _years_ to be able to actually use LLVM
               | 's noalias (which is a very desirable optimization in
               | Rust where the borrow checker statically guarantees
               | absence of aliasing in a lot of code).
               | 
               | I haven't kept up with the current state of things, but
               | if you can finally liberally use noalias in LLVM without
               | having to fear miscompilations, you probably have solely
               | the increased significance of Rust to thank for that.
        
               | Blackthorn wrote:
               | We use gcc so honestly I'm not sure how the difference
               | goes. I'm glad for Rust making the whole ecosystem more
               | usable regardless.
        
               | kergonath wrote:
               | GCC had to spend some time getting things correctly
               | because of gfortran.
        
           | Conscat wrote:
           | https://gcc.gnu.org/onlinedocs/gcc/Loop-Specific-
           | Pragmas.htm...
           | 
           | The #pragma GCC ivdep annotation also enables this in loops.
           | C23 also has the [[unsequenced]] attribute, which supports
           | pointer arguments for optimizations like these, unlike
           | [[gnu::const]].
        
         | fsckboy wrote:
         | fortran is call by reference, it's all aliasing, you can even
         | in some implementations change the value of literals
         | 
         | c is call by value, you have to at least try to alias
        
           | gpderetta wrote:
           | my understanding is that in fortran, if two arguments alias
           | each other, it is UB. So it is not that fortran doesn't have
           | aliasing, it is just that the optimizer assume that it
           | doesn't (but don't quote me on this!).
        
             | Bostonian wrote:
             | Arguments are allowed to alias if they are both inputs that
             | will not be changed in the function -- technically if they
             | are both "intent(in)".
        
               | pklausler wrote:
               | INTENT(IN) is not required to allow aliasing of
               | unmodified dummy arguments.
        
               | kergonath wrote:
               | Yes, but in practice no compiler checks whether an
               | argument with no INTENT is modified or not, or could
               | alias with something else or not. So the compilers tend
               | to trust the programmer. With INTENTS, it's obvious and
               | much easier to check.
        
               | pklausler wrote:
               | Dummy argument aliasing has nothing to do with INTENT.
        
           | gnufx wrote:
           | Fortran does _not_ call by reference, and the term aliasing
           | isn 't used by the standard. It calls by what's usually
           | called value-return or copy-in/copy-out, according to the
           | "storage association" rules in the standard.
           | 
           | While it may be true that the rules aren't generally checked
           | by compilers, Toolpack did it for pure Fortran77 in the
           | 1980s, though I don't know how comprehensive it actually was.
        
           | bregma wrote:
           | It took me days one time to figure out why 0 had adopted the
           | value four.
           | 
           | The technicality is that Fortran is call by descriptor (not
           | by reference, not by value). That's a powerful difference but
           | a sharp two-edged sword.
        
             | kaba0 wrote:
             | Could you please explain how is it differs from the other
             | two? Never looked into Fortran.
        
               | kergonath wrote:
               | The Fortran standard does not specify anything about
               | passing by references or value. Instead, it specifies the
               | expected results. The compilers do it however they want
               | or can, which means that they can pass pointers, values,
               | copies, or pointers to copies depending on the situation.
               | 
               | Sometimes (most of the time, to be fair), the compiler
               | can work out the best solution. Sometimes, the developer
               | has to jump through some hoops to avoid unnecessary
               | copies.
        
       | bregma wrote:
       | Fortran is column-major and C is row-major. That difference alone
       | makes vectorization in Fortran much much more frequently used.
        
         | defrost wrote:
         | As a 60 year old C | Fortran coder since ~1980 or so, that
         | difference makes near zero practical difference to numerical
         | computing.
         | 
         | Pretty much any vector|matrix operations will see row against
         | column (or colum against row) operations in equal proportion -
         | what you gain on the swings you lose on the roundabouts
         | whichever way you choose to orientate storage.
         | 
         | Indeed, given large sparse arrays, language specific default
         | array storage methods might not even be used - and here C is
         | (arguably) more flexible at building task specific data
         | structures - a column(?) of _pointers_ to row fragments
         | (perhaps).
         | 
         | The big advantage early FORTRAN had was baked in anti alias
         | assumptions and a truckload of pre existing relatively
         | accessible NIST | NASA grade quality numerical algorithms that
         | had been crawled over, eyeballed, tested, and pounded in
         | simulations by _many_ qualified applied scientists.
        
           | AnimalMuppet wrote:
           | > The big advantage early FORTRAN had was baked in anti alias
           | assumptions and a truckload of pre existing relatively
           | accessible NIST | NASA grade quality numerical algorithms
           | that had been crawled over, eyeballed, tested, and pounded in
           | simulations by many qualified applied scientists.
           | 
           | Yeah. That Fortran library that has a solver for sparse
           | complex matrices that won't produce nonsense if you give it a
           | stiff problem, that has 50 years of people pounding on it to
           | make sure that it's rock solid? I'm not sure that there's a
           | C/C++ equivalent for it. I _absolutely_ cannot write an
           | equivalent. Yes, I can write a matrix solver that takes
           | complex data. No, I can 't write one that is as efficient, as
           | numerically stable, or as bug-free as what already exists in
           | Fortran.
        
           | bregma wrote:
           | As a 60 year old C and Fortran coder since ~1980 or so, and
           | who has spent the better part of the last decade supporting
           | compilers and their internals, I say it makes a big
           | difference when it comes to autovectorization optimizations.
           | But that's my experience based on most users being naive and
           | know-enough-to-be-dangerous programmers, and your experience
           | may differ.
        
         | aap_ wrote:
         | This is down to convention though. You decide how you interpret
         | your data in C.
        
           | bregma wrote:
           | And the compiler determines how to implement your decisions.
        
       | andy99 wrote:
       | In terms of being easier (for the programmer, not the compiler)
       | one thing worth mentioning is that Fortran already has optimized
       | intrinsics for matmul and dot_product. I was recently doing a
       | small comparison between C and Fortran for matrix-vector
       | multiplication and was surprised how fast the matmul is "out of
       | the box" vs handwritten loops. Personally I also think that for
       | code that is linear algebra heavy Fortran is easier to work with
       | because of the intrinsics - it's analogous to numpy, whereas C
       | takes more effort, mainly keeping track of sizes of things.
       | 
       | https://gist.github.com/rbitr/3b86154f78a0f0832e8bd171615236...
        
         | physicsguy wrote:
         | True for small matrices but above certain size you wouldn't
         | want to use these methods or loops anyway, and they're not
         | optimised if your matrix is sparse either. So practically you
         | end up using a library and you're back to basically being
         | equivalent with C.
        
           | andy99 wrote:
           | It's possible to link in an external BLAS library that will
           | be used by the intrinsics instead. See https://fortran-
           | lang.discourse.group/t/compiler-option-matmu...
           | 
           | But yes I agree, at some point it's always going to be back
           | to something more customized.
        
             | microtherion wrote:
             | And this is where the performance starts converging again,
             | because BLAS is also available from C. So if you can
             | express your computation in terms of BLAS primitives, it
             | does not matter much from what language you're calling
             | them.
        
               | kergonath wrote:
               | Yes, but you retain Fortran's expressiveness. You can
               | write a matrix product A = x * B, regardless of whether
               | the matrix multiplication is handled by a compiler's
               | multiplication function, any flavour of BLAS, or
               | ScaLAPACK over MPI. I don't think Fortran has that much
               | of an edge in performance compared to decently optimised
               | C, but the Fortran code to get there is much more
               | readable and easier to grasp. And with much fewer
               | footguns (think allocatables versus pointer arithmetic).
        
           | leereeves wrote:
           | The Fortran matmul intrinsic doesn't use an optimal
           | algorithm?
        
             | kergonath wrote:
             | The compiler is responsible for the implementation of the
             | matmul intrinsic. Most of them have a "decent for small
             | matrices and easy to inline" version by default and an
             | option to convert it to call to BLAS under certain
             | conditions (like larger matrices). Then, the performance is
             | that of your BLAS library. The assumption (quite good in my
             | experience) is that people use the intrinsic naturally
             | because it is easy and natural, but turn to BLAS calls when
             | performance becomes important. Which is not the case for
             | the vast majority of matrix multiplication, even in high
             | performance computing.
             | 
             | There is no universal optimal algorithm.
        
             | duped wrote:
             | Even theoretically this is only possible if the dimensions
             | and sparseness are known at compile time, which can be
             | pretty heavy assumptions.
        
               | jcranmer wrote:
               | The implementation of every BLAS implementation I'm aware
               | of basically looks like this:                 if (dims in
               | range A)         gemm_implA       else if (dims in range
               | B)         gemm_implB       else if (dims in range C)
               | gemm_implC       // several more implementations...
               | 
               | The immediate call function is invariably a switch over
               | several implementations, and by the time you reach that
               | call, you have actual knowledge of dimensions.
        
       | pklausler wrote:
       | Subroutine and function "dummy arguments" are basically
       | restricted references. When analyzing data dependences in the
       | body of a subprogram, a compiler doesn't have to worry about
       | writes to one dummy argument affecting any reads to or writes
       | from another dummy argument. The burden is on the programmer of a
       | procedure's caller to ensure that any dummy argument that is
       | modified during the call does not alias with any other dummy
       | argument. This is the big one.
       | 
       | Fortran >= 90 has ALLOCATABLE variables (and now components),
       | which are dynamically managed objects that are akin to
       | std::unique_ptr<> or std::vector<> smart objects. A compiler can
       | assume that an ALLOCATABLE object is free from aliasing, unlike a
       | POINTER.
       | 
       | Fortran does have POINTERs, but they can only point to objects
       | that have been explicitly tagged with the TARGET attribute.
       | Objects that are not TARGETs are safe from aliasing with
       | POINTERs. Fortran does still not have the concept of a pointer to
       | immutable data.
       | 
       | Do any of these still yield performance advantages over C/C++?
       | Probably not.
        
       ___________________________________________________________________
       (page generated 2023-09-21 23:01 UTC)