[HN Gopher] How to write better scientific code in Python?
       ___________________________________________________________________
        
       How to write better scientific code in Python?
        
       Author : EntICOnc
       Score  : 68 points
       Date   : 2022-02-19 14:47 UTC (8 hours ago)
        
 (HTM) web link (zerowithdot.com)
 (TXT) w3m dump (zerowithdot.com)
        
       | zmmmmm wrote:
       | I think the title is a bit provocative / incorrect. There is
       | nothing better about the scientific result of this.
       | 
       | The title could be, "How to improve engineering of scientific
       | code".
        
       | tpoacher wrote:
       | For some reason, this post reminded me of Fizzbuzz Enterprise
       | Edition [0]
       | 
       | [0]
       | https://github.com/EnterpriseQualityCoding/FizzBuzzEnterpris...
        
       | kortex wrote:
       | Excellent to see functional programming ideas like deferred
       | computation and clean interfaces make their way into the
       | scientific computing space.
       | 
       | One thing though: _I_ know these are good ideas. But to someone
       | not as familiar with these patterns, they may wonder  "why go
       | through all this trouble?"
        
         | Frost1x wrote:
         | >One thing though: I know these are good ideas. But to someone
         | not as familiar with these patterns, they may wonder "why go
         | through all this trouble?"
         | 
          | It's not only _why_; it ignores the fact that the vast
          | majority of scientific code is written for the science.
          | You're usually already dealing with layers of abstract
          | theory in the science you're working in, so you often
          | don't want the additional cognitive load of then
          | abstracting that for a computer, because that's now a new
          | problem. If you take this approach in scientific
          | software, unless you're a commercial vendor implementing
          | some battle-tested theory for a paying market, you're
          | going to quickly find yourself without a job--that, or
          | you're a wizard at efficiently abstracting abstract
          | theory. I've yet to see someone abstract such an
          | implementation more efficiently than they could simply
          | write a working, good-enough, concrete implementation.
          | You only abstract and optimize what really _has_ to be,
          | or what will clearly result in a net savings of resources
          | for your specific project context.
         | 
          | The vast majority of scientific code isn't written to be
          | robust _by design_: it's a known cost-cutting measure to
          | meet budgetary constraints, written for a few target
          | goals, not for generalization. I've worked in scientific
          | computing and applied science for quite a while, and
          | ideas like this just don't make sense in most contexts.
          | From a software design perspective, sure, it makes
          | complete sense, but it's just unnecessary additional
          | complexity and overhead in _most_ contexts. People with
          | software backgrounds often walk into scientific computing
          | with naivety, but often good intent, believing practices
          | should mirror enterprise software approaches, and that's
          | just flat-out misguided.
         | 
          | A lot of theory is iterative, with a short half-life so
          | to speak, so any code you write grounded in it will often
          | become outdated quickly. You're often not implementing a
          | method or function like the toy problem here of computing
          | expected value, a common shared statistical idea that
          | will long outlive the majority of the research you're
          | involved in. You'll instead be writing something highly
          | specific to some new toy theory/model that may later be
          | found to be false, or iterated on to find an altogether
          | better abstraction that you then need to abstract again
          | in your software.
         | 
          | You're going to use these sorts of clean, robust
          | numerical abstractions, like expected value, when you
          | need them, because they already exist and you can simply
          | call them. It's the unique part tightly tied to the
          | research you're doing (which is often inherently unique
          | in terms of the software it needs) that will be slapped
          | together rapidly. The vast majority of it will be tossed
          | away. If you hit something successful, _that's when_ it's
          | time to start thinking about these ideas, because now you
          | should consider refactoring the nice, generalizable,
          | sharable theory that others find value in and can use
          | into a clean implementation, then throwing your existing
          | prototype into the fires of hell where it belongs.
          | There's usually not a lot of funding for that, though,
          | and that's where commercial industry comes in to scoop up
          | publications and create clean, efficient implementations
          | of said theory that it rolls into some computational
          | package to sell.
        
           | pid-1 wrote:
           | I don't think that's exclusive to academia.
           | 
            | Businesses also need to choose wisely when to make ugly
            | POCs for prototyping and when to create robust products
            | / libs to save money long term.
           | 
           | It's not uncommon for labs to have frameworks and internal
           | libs to aid prototyping and experimenting.
        
           | sharikous wrote:
           | > If you hit something successful, that's when it's time to
           | start thinking about these ideas
           | 
           | I agree with you completely. But the article did not specify
           | the use case of these guidelines. They are not to be applied
           | (in my opinion) for research code when you quickly need to
           | publish something. They can be useful however when your
           | already proven and battle tested ideas are used by other
           | people. For example for keeping a shared code base inside a
           | lab, or when you want to provide a robust implementation on
           | top of your ideas.
        
         | mattkrause wrote:
         | I thought the deferred part could be better.
         | 
         | For example, one easy win would be to replace the list
         | comprehensions with a generator: there's no point in allocating
         | that entire list just so that statistics.mean can iterate over
         | it.
         | 
          | At the opposite end, the switch to a Distribution class
          | also enables a _huge_ speedup: keep the sampling
          | machinery for situations where you actually need it
          | (Cauchy?), but let die.expected_value() return
          | (self.n_sides + 1) / 2, which is effectively free.
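          | A minimal sketch of both suggestions (Die and
          | expected_value echo the article's toy problem, but the
          | exact signatures here are guesses):

```python
import random
import statistics

def roll(n_sides: int) -> int:
    """One sample from a fair die."""
    return random.randint(1, n_sides)

# Generator expression instead of a list comprehension:
# statistics.mean consumes the values lazily, so no intermediate
# list is ever allocated.
random.seed(0)
approx_ev = statistics.mean(roll(6) for _ in range(10_000))

class Die:
    def __init__(self, n_sides: int):
        self.n_sides = n_sides

    def sample(self, n: int):
        # Keep the sampling machinery for distributions that
        # genuinely need it (no closed form, e.g. Cauchy).
        return (roll(self.n_sides) for _ in range(n))

    def expected_value(self) -> float:
        # Closed form -- effectively free, no sampling at all.
        return (self.n_sides + 1) / 2

print(Die(6).expected_value())  # 3.5
```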
        
       | mjburgess wrote:
       | I think to any audience other than fairly hardcore software
       | engineers this is going to read a little... mad. The first code
        | example was exceptionally clear; from then on, it gets
        | increasingly incomprehensible. "Better" it isn't.
       | 
        | Scientific code is _often_ write-once, run-once, done. And
        | since mathematics/statistics is largely already formalised
        | at the level of distributions, sampling, and so on -- we
        | shouldn't expect to need to "software engineer" this type
        | of code.
       | 
       | This article should establish a specific audience it has in mind,
        | presumably data scientists in long-running scientific
        | projects who need to write generic, maintainable code to be
        | shared across the organization. _Then_ the article should
        | establish when the first
       | example is _GOOD_ , and when it fails _in this specific use
       | case_.
        
         | atoav wrote:
          | I think _some_ degree of abstraction _can_ lead to more
          | clarity - especially if it hides a bunch of crap that is
          | done over and over again throughout the code. But these
          | abstractions need to be very well chosen and clearly
          | named, and unless they _really_ give you more clarity
          | (e.g. by hiding some confusing details to bring across
          | the general point you are trying to make more clearly), I
          | would advise against them.
         | 
         | If your scientific code is meant to be used in actual software
         | or if you want to write tests for the functions used, a little
         | bit more abstraction might be a good idea, however.
        
           | mattkrause wrote:
           | I wish the article had discussed abstraction more, because
           | it's especially tricky in "scientific" code.
           | 
           | The initial abstraction is actually pretty good,
           | conceptually. You have a function that returns a value, like
           | roll(sides=6), or an object you can manipulate to get a
           | value: Die(sides=6).roll(). You then take those samples,
           | somehow analyze them, and get a result. That code matches
           | people's mental model very well.
           | 
           | If you were going to do this in C++, I'd stop here. Things
           | only get more complicated because Python (Matlab and R too)
           | penalize you _heavily_ for leaving the BLAS sandbox.
        
         | bmitc wrote:
         | > Scientific code is often write-once, run-once, done.
         | 
         | I don't buy that in the sense that you mean. In my experience,
         | code generally lives on no matter what, and code that remains
         | write-once is often because no one can read it and thus update
         | it (i.e., write it again). I've seen a decent amount of
          | scientific code, and although it was never meant to be
          | run more than once by anyone other than the original
          | author, it almost always was. And because it wasn't
          | designed well,
         | written well, or even commented well, code like that basically
         | becomes this immutable artifact that people simultaneously
         | don't want touched but also want it updated and keep on
         | running.
         | 
         | I did an REU in mathematics back in the day where I wrote a
         | decent amount of MAPLE code. I wasn't even a software engineer
         | at the time, since I was a math and EE major. I honestly had
         | little concept of what computer science even was and was not a
         | programmer. It was not hard at all to write decent code that
         | was commented and organized.
         | 
         | The problem is that most scientists, in my experience, view
         | software and code as somewhat below them. It's like a hammer
         | and a box of nails to them.
         | 
         | I inherited someone's Ph.D. dissertation Python code that had
         | lived on across several labs. It had basically reached the
         | "needs a rewrite" state because it was impossible to get it to
         | run. It was stuck on Python <2.7 and basically couldn't be
         | upgraded without replacing a lot of it because the Python
         | distribution it was based on had become deprecated and none of
         | the packages could be upgraded without changing a lot of the
         | code. I'm not sure I ever saw a single comment. The person who
         | originally wrote the code actually ended up working at one of
         | the major scientific Python shops, somewhere I once applied to
         | and was unceremoniously turned down. An interesting turn of
         | events.
        
         | noobermin wrote:
          | Thanks for the level-headed critique. I began reading the
          | article in earnest, but the moment he started making ABCs
          | I dropped to skimming, came here, and was about to rant.
          | This is the right perspective: people need to understand
          | their audience first when writing, and at the very least
          | justify why the level of abstraction they employ is
          | needed versus, for example, their first suggestion for
          | improving the die example, which is what I'd go with.
          | Even better, you have numpy.average, which would be even
          | clearer to scientific programmers, because they already
          | know numpy has the arithmetic average built in and will
          | recognize it.
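          | A sketch of that numpy route (seeding is added here only
          | for reproducibility):

```python
import numpy as np

# 10,000 rolls of a fair six-sided die; numpy.average is the
# arithmetic mean scientific programmers recognize at once.
rng = np.random.default_rng(seed=0)
rolls = rng.integers(low=1, high=7, size=10_000)  # faces 1..6
approx_ev = np.average(rolls)  # close to the exact (6 + 1) / 2 = 3.5
```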
        
         | pgorczak wrote:
         | I get the author's idea and I've been there before. When you're
         | still exploring a problem domain it's really useful to abstract
         | and compose because it helps you both understand the problem
         | better and iterate more efficiently.
         | 
         | At some point you'll want to come back to code that looks more
         | like the first example, ideally with plenty of documentation
         | that explains how you got there. Some other comment put it
         | really well: you create a kind of personal mapping of
         | scientific concepts to programming abstractions. You don't want
         | to have colleagues (or your future self) figure out that
         | mapping to understand your results in terms of the science
         | you're working on.
        
         | modernpink wrote:
         | > I think to any audience other than fairly hardcore software
         | engineers this is going to read a little... mad
         | 
         | So most of what gets posted to HN?
        
         | taeric wrote:
          | I want to disagree with you. But I just can't. We go from
          | having to read ten lines to about forty, with five or six
          | new concepts that are not related to the original.
        
         | toponaut wrote:
         | To a software engineer this will also read mad. This article is
         | ultimately explaining why abstraction is good and why it's
         | helpful to build classes in python. That is already obvious to
         | SWEs, not at all specific to "scientific computing", and
         | explained elsewhere much more succinctly.
        
           | d0mine wrote:
            | I see it more as an example of an anti-pattern:
            | unnecessary layers obfuscating what is happening. Good
            | abstraction is hard, and it is not free. It is better
            | to make the mistake of underusing abstraction than of
            | overusing it (the latter is much harder to maintain).
           | 
           | I would understand if the list comprehension were replaced
           | with the corresponding numpy/scipy/pandas/etc code.
        
             | toponaut wrote:
             | I agree. I didn't mean to imply this was a good explanation
             | or example. I still think "abstraction/classes are good"
             | was the intention/gist of the article.
        
         | ad404b8a372f2b9 wrote:
         | I agree, the bad code at the start is clearer than 99% of the
         | scientific code I've read, without exaggeration. The typing &
         | the abstractions proposed are not only unnecessary but also
         | unrealistic for research code.
         | 
         | If scientists want to improve their code, the first and most
         | important step is to get them to use descriptive verbose names
         | for their variables and functions, and to learn the single
         | responsibility principle.
        
       | jointpdf wrote:
       | The terrible irony of this post is that they use a completely
       | wrong definition of expected value. They write down an
       | integral...but then implement something totally different.
       | 
       | You cannot calculate the EV of a random variable X by taking the
       | mean of random samples drawn from the distribution of X (e.g. try
       | it with the Cauchy distribution--see if you get something
       | approaching the actual EV). The only thing they accomplished is
       | building up a _very_ convoluted way of calculating a sample mean
       | --which is already trivial with Numpy or just standard Python.
       | Why do this? I'm perplexed.
       | 
       | This is the issue with software engineers writing scientific code
       | --they often flagrantly misunderstand basic mathematical
       | definitions, and then obfuscate this misunderstanding with their
       | "pure" and "robust" code.
       | 
       | Sorry for the harsh criticism, but I'm tired of seeing this kind
       | of thing.
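        | A quick standard-library check of the Cauchy point (the
        | inverse-CDF construction below is just one way to draw
        | Cauchy samples):

```python
import math
import random

random.seed(1)

def cauchy() -> float:
    # Standard Cauchy draw via the inverse CDF: tan(pi * (U - 1/2)).
    return math.tan(math.pi * (random.random() - 0.5))

# Sample means never settle down as n grows: the Cauchy
# distribution has no defined expected value, so the "mean of
# the draws" is not estimating anything.
for n in (100, 10_000, 1_000_000):
    mean_n = sum(cauchy() for _ in range(n)) / n
    print(n, mean_n)
```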
        
         | ww_wpg wrote:
         | > You cannot calculate the EV of a random variable X by taking
         | the mean of random samples drawn from the distribution of X
         | 
          | My understanding of statistics is rudimentary, so forgive
          | me, but doesn't the sample mean of a normally distributed
          | variable tend towards the expected value for the
          | population?
        
         | tgb wrote:
          | Well, the integral doesn't work for a Cauchy distribution
          | either, so that seems a little unfair as a criticism. And
          | this kind of procedure is not unusual for more complex
          | distributions that have no well-understood density
          | function to integrate, like the results of a physics
          | simulation, is it?
        
         | hrzn wrote:
         | Note that the actual EV of a Cauchy random variable is
         | undefined...
        
       | nooorofe wrote:
        | I don't like the code. My starting point would be:
        | 
        |     import pandas as pd
        |     import numpy as np
        | 
        |     if __name__ == "__main__":
        |         n_samples = 10000
        |         samples_np = pd.DataFrame(
        |             np.random.randint(1, 7, n_samples),
        |             columns=["face_value"])
        |         print(samples_np.face_value.mean())
       | 
        | Speaking about abstraction, I don't know math, so my first
        | thought would be to look for *existing* abstractions. When
        | I work with relational data, my first option to check is
        | SQL. For math, it looks like DataFrame is a *standard*
        | abstraction. To be fair, maybe at first I would be using
        | the built-in `random.randint`. I am not very familiar with
        | `numpy`, but I would definitely google "pandas random
        | sample", which would bring up
        | https://pandas.pydata.org/docs/reference/api/pandas.DataFram...
        |     if __name__ == "__main__":
        |         n_samples = 10000
        |         sample_pd = pd.DataFrame(
        |             {'face_value': [1, 2, 3, 4, 5, 6]})
        |         print(sample_pd.sample(
        |             n=n_samples,
        |             replace=True,
        |             random_state=np.random.bit_generator.randbits(20)
        |         ).face_value.mean())
       | 
        | The article's code uses lambda functions in some examples,
        | which probably kills the performance advantages of `numpy`.
        | Using the DataFrame API at least helps to avoid those
        | pitfalls.
       | 
        | Type annotations: I like the idea, but in the end the code
        | looks like Java without performing like Java. It is very
        | hard to get them right in Python, and some of the
        | article's are wrong.
        | 
        | ( _@dataclass(frozen=True):_ - doesn't need the ":";
        | _Gaussian.sample_ - missing return )
        | 
        | When the return is added, it doesn't return
        | `-> Sequence[float]:`
        | 
        |     Gaussian().sample(90).dtype
        |     >>> dtype('float64')
        | 
        | _-> Sequence[Union[numpy.float64, numpy.float32,
        | numpy.float16]]:_ # ?
       | 
        | I don't believe "scientific code" is fundamentally
        | different from any other code; I would follow normal
        | development practices:
       | 
        | 1) review the design ("don't reinvent the wheel")
        | 
        | 2) add tests
        | 
        | 3) do code reviews
        | 
        | 4) use version control
       | 
       | etc.
        
       | brilee wrote:
       | This isn't a bad article; the title is just completely
       | misleading. More appropriate would be:
       | 
       | "How to implement the simplest possible version of TensorFlow
       | Probability [1], a probabilistic programming framework".
       | 
       | [1] https://www.tensorflow.org/probability/overview
        
       | ivan_ah wrote:
       | I agree with the others commenters who say the code
       | "improvements" lead to unwarranted complexity and unreadability.
       | Academic version of enterprise fizz-buzz this is.
       | 
        | IMHO, the quality of the code itself matters less than the
        | documentation and testing around it, and the use of best
        | practices like git, versioning, etc. A good place to learn
        | about
       | these things is Patrick Mineault's _The Good Research Code
       | Handbook_ which is available here https://goodresearch.dev/
       | (intended for scientists who must produce code artifacts as part
       | of their research).
        
       | adam_arthur wrote:
        | Write it like natural language, rather than with poorly
        | named, succinct, hard-to-decipher variables.
       | 
       | That alone will put your code above 90% of scientific code.
        
       | toponaut wrote:
        | The author's solution to the toy problem is
        | 
        | ---
        | 
        |     die = Die(12)
        |     expected_value(die, n=10000)
        | 
        |     gaussian = Gaussian(mu=4.0, sigma=2.0)
        |     expected_value(gaussian, n=100000)
        | 
        |     coin = Coin(fairness=0.75)
        |     expected_value(coin,
        |         f=lambda x: np.where(x == "H", 1.0, 0.0))
        | 
        | ---
        | 
        | After all this work, the answers to the problems are right
        | there in the constructors.
        | 
        |     ev_die = (sides + 1) / 2
        |     ev_gaussian = mu
        |     ev_coin = sign * fairness
        | 
        | What is the point?
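        | A rough numerical check of that gap for the coin case
        | (sketched with plain numpy; the Coin class itself is
        | bypassed, and heads is scored as 1):

```python
import numpy as np

rng = np.random.default_rng(seed=1)
fairness = 0.75

# Monte Carlo route, mimicking the article's API inline:
flips = rng.choice(["H", "T"], size=100_000, p=[fairness, 1 - fairness])
mc_ev = np.where(flips == "H", 1.0, 0.0).mean()

# Closed form, read straight off the constructor argument:
ev_coin = fairness  # heads scored as 1, tails as 0

# They agree to within Monte Carlo error: all that machinery
# reproduces a number already sitting in the parameters.
assert abs(mc_ev - ev_coin) < 0.01
```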
        
       | bluenose69 wrote:
        | This article starts with naive code that is unlikely to be
        | written by working scientists unless they are really quite
        | new to this type of work.
       | 
       | And, quite quickly, the article reaches a level of detail that
       | might not engage the initial readers.
       | 
        | As a scientist with several decades of programming
        | experience, my sense is that those who write this sort of
        | naive code are at an early stage of learning, and they
        | might be better off learning Julia, which offers high
        | performance even when written in a naive style. Granted,
        | Python has superior (formal and informal) documentation,
        | and its error messages are a lot easier to understand than
        | those spewed by Julia. But it is quite freeing to
       | be able to express things in a manner that is closer to
       | mathematical notation, without paying a high performance cost.
       | And, if the task really requires it, Julia will let you tune
       | things to Fortran-level speeds (and sometimes better ... but
       | that's another story).
        
       | voorwerpjes wrote:
       | Former academic scientist, now software engineer.
       | 
        | I think this article misses the point that rarely in
        | science is the code the actual product/deliverable. The way
        | scientific code should be judged is: is it performant
        | enough to answer my question in a reasonable amount of
        | time, is it clear enough to another reader (who, face it,
        | 99% of the time in science will likely be your future self
        | or a student with little coding experience), and does it
        | give the correct result in a reproducible way? To my eye,
        | the first "bad" example actually answers all these
        | questions best.
       | 
        | To my mind it feels more like "how can I use the scientific
        | coding I have to do as a way to improve my software
        | engineering skills?" That isn't a bad endeavor and might
        | make leaving academic science easier for you; heck, I did
        | the same thing. However, in my humble opinion, it won't
        | make you write "better" scientific code.
        
         | [deleted]
        
       | diarrhea wrote:
       | Isn't there also a blatant mistake in the content of the post?
       | 
       | The naive approach for approximating e keeps track of two floats.
       | No arrays. So minimal, aka negligible memory footprint.
       | 
        | Then they introduce iterators, arguing that yielding values
        | is more efficient. Yet suddenly, using islice, they are
        | actually producing and having to store a list of values. So
        | now they do have to worry about memory, as opposed to
        | before. Yet the article claims/implies the opposite.
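        | For what it's worth, islice itself stays lazy; the memory
        | cost only appears once the slice is materialized (a toy
        | reconstruction of the e approximation, not the article's
        | exact code):

```python
import math
from itertools import count, islice

def e_partial_sums():
    """Yield successive partial sums of sum(1/k!), which -> e."""
    term, total = 1.0, 0.0
    for k in count(1):
        total += term
        term /= k
        yield total

# islice itself is lazy: nothing is computed or stored here...
lazy = islice(e_partial_sums(), 20)

# ...the list only exists once the slice is materialized --
# that is where a memory cost would appear.
approximations = list(lazy)
print(approximations[-1])  # ~2.718281828
```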
        
       | qualudeheart wrote:
        | I'd use R instead. Python is best used as a glue language
        | to access premade libraries.
        
         | odonnellryan wrote:
         | R doesn't give you magic code. I have worked on several
         | projects where my job was to rewrite the R code that took over
          | a day to run in Python. Each time I was (fairly easily)
          | able to get to a place where the Python code ran in under
          | a minute. Mostly
         | this was because the people who developed the R code had no
         | good concept of data structures.
        
         | kortex wrote:
         | That's exactly what this is doing. Numpy is glue around
         | lapack/blas.
         | 
         | R is okay for batch processing, but what happens when
         | management wants engineering to implement data science's models
         | alongside some pytorch models?
         | 
         | Python is totally performant enough if you know how to wield
         | it.
        
           | qualudeheart wrote:
           | My mistake. I scrolled through the article but didn't see the
           | part where numpy is called.
        
           | igouy wrote:
           | So "Python is totally performant" when it's Fortran :-)
        
             | odonnellryan wrote:
              | I always feel this is a little unfair. Sure, a Python
              | program and an R program, written the same way, using
              | the same data structures, etc., will usually show the
              | R program is faster.
             | 
              | But getting to that point is where the challenge is,
              | and I feel that Python makes thinking about the data
              | structures and algorithms you're using (in the case
              | of external libraries) or writing much easier than
              | other languages do.
             | 
             | In my experience once you "get there" that is enough.
        
               | mattkrause wrote:
               | The catch, IMO, is that the need to stay in the numpy
               | sandbox really constrains how you choose data structures
               | and algorithms.
               | 
               | Storing N-dimensional points in a Point class, for
               | example, is often so slow as to be a non-starter.
               | Admittedly, it's not _just_ a Python problem: struggles
               | with array-of-structs vs. struct-of-array representations
               | are pretty ubiquitous.
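                | A toy version of that trade-off (the
                | Point class is made up for the example):

```python
import numpy as np

# Array-of-structs: one Python object per point. Flexible, but
# every coordinate access goes through the interpreter.
class Point:
    def __init__(self, x: float, y: float):
        self.x, self.y = x, y

points = [Point(float(i), float(2 * i)) for i in range(1000)]
centroid_x = sum(p.x for p in points) / len(points)

# Struct-of-arrays: one array per coordinate. Staying inside
# the numpy/BLAS sandbox turns the same reduction into a single
# vectorized call.
xs = np.arange(1000, dtype=np.float64)
ys = 2.0 * xs
centroid_x_fast = xs.mean()

# Same answer, very different speed at scale.
assert centroid_x == centroid_x_fast
```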
        
               | odonnellryan wrote:
               | That's true if you need to stay in that sandbox, which
               | isn't always the case!
        
             | jpgvm wrote:
             | Pretty much. The only fast parts of Python aren't Python.
        
       | kgwgk wrote:
       | Distrinution
        
       | rob_c wrote:
       | Having worked in the field long enough, I always feel the need to
       | add:
       | 
        | A) If you're writing high-performance code, DON'T: you
        | _will_ hit bottlenecks which are non-trivial in the
        | language unless you're a guru. If you're a scientist,
        | you're going to have to work to get to this level.
       | 
        | A part 2) Python is excellent for saying "perform analysis
        | A with variables B,C,D,E,F on X,Y,Z 30 times using this
        | random seed library". What Python is not excellent at is
        | saying, "I have 20GB of raw doubles and I need to push
        | these through non-euclidean algebra as fast as a modern CPU
        | can cope". The biggest surprise is that modern Python
        | really can do both; it just struggles a lot more with the
        | latter than the former. IPython and Jupyter notebooks are
        | amazing for the former, too!
       | 
       | B) For the love of whatever deity you worship. PLEASE DOCUMENT
       | YOUR DEPENDENCIES CORRECTLY. This is non-trivial and doesn't mean
       | "Works on CentOS6". I mean at least tested on Python x.y with
        | depA version 1.2.3. Python isn't as bad as npm, but it's
        | getting
       | there...
       | 
        | B part 2) Unless you're starting out, _avoid_conda_.
        | Installing/maintaining things through it is not nice. This
        | is the sort of system that makes your local sysadmin groan
        | when you come to them and say "it's easy because it has a
        | UI". Behind the scenes it makes so many assumptions that
        | are great for people starting out, but often painfully
        | wrong for people looking to run code at hyper-scale on
        | clusters. I recommend investing _some_ time in learning how
        | Python packages work (not much! -- packaging is not
        | coding). As a result you will learn or employ a lot of
        | skills which will help you with larger Python codebases.
        | There are large, performant Python packages/tools out
        | there, and they all tend to leverage advanced features for
        | good reasons.
       | 
        | C) Never! reinvent the wheel unless you know you explicitly
        | need to. And avoiding pulling in a dependency (unless the
        | dependency is _huge_) is not always the correct answer.
       | 
        | D) Ask an expert. It's fine to acknowledge you don't know
        | how to use the tool that is modern Python. It's a Swiss
        | Army knife with a shotgun attached, pointed at your face.
        | When it goes wrong it will come back to bite you, but when
        | you get it right it's so efficient you will marvel at how
        | much you can do with just 5/6 lines of code.
       | 
        | E) Have fun. Just try things and see whether they fail or
        | work really well; Python is great for avoiding too much
        | boilerplate in mid-scale projects :)
        
         | jonnycomputer wrote:
         | >B) For the love of whatever deity you worship. PLEASE DOCUMENT
         | YOUR DEPENDENCIES CORRECTLY. This is non-trivial and doesn't
         | mean "Works on CentOS6". I mean at least tested on Python x.y
          | with depA version 1.2.3. Python isn't as bad as npm, but
          | it's
         | getting there...
         | 
         | That is why for R code, I really like the renv project.
         | 
         | https://www.rstudio.com/blog/renv-project-environments-for-r...
        
       | pphysch wrote:
       | On a related note I recently wrote some scientific codes with Go.
       | MPI bindings were ergonomic enough and generics is a big win for
       | writing reusable & fairly performant codes... Built once and
       | deployed on multiple nodes without a hitch. And zero mucking with
       | venvs.
       | 
       | I feel like if you're reaching for interfaces and type hints in
       | Python, it's time to revisit the ole toolkit.
        
       | AitchEmArsey wrote:
       | This is an article about how to write theoretically better code
       | (from a CS perspective) for a scientific use-case - but that is
       | not the same thing as "better scientific code". For most
       | purposes, the effort a scientist will go to in understanding how
       | classes work (as an example) will outweigh any benefit they might
       | see from using them.
       | 
       | One of the nice things about Python is that the programming style
       | arrived at here is totally unnecessary - a point which was
       | soundly missed by the author.
        
       | Someone wrote:
       | I think the author of this article should read the Mythical Man
       | Month (https://en.wikipedia.org/wiki/The_Mythical_Man-Month), in
       | particular on the difference between (all quotes are from that
       | book)
       | 
       | - a _program_ , "complete in itself, ready to be run by the
       | author on the system on which it was developed"
       | 
       | - a _programming product_ : "a program that can be run, tested,
       | repaired, and extended by anybody. It is usable in many operating
       | environments, for many sets of data"
       | 
       | - a _programming system_ : "a collection of interacting programs,
       | coordinated in function and disciplined in format, so that the
       | assemblage constitutes an entire facility for large tasks"
       | 
       | - a _programming systems product_ : "This differs from the simple
       | program in all of the above ways. [...] But it is the truly
       | useful object, the intended product of most system programming
       | effort"
       | 
       | They all have their place, and if you need a _program_ , spending
       | time on writing a _product_ , a _programming system_ or a
       | _programming systems product_ is a waste of effort.
       | 
       | Most scientific code falls in the _program_ or, maybe, somewhat
       | in the direction of the _programming product_ category, and
       | there's nothing wrong with that.
       | 
       | (Note that _quality_ is a concept that's orthogonal to this
       | distinction)
        
       | gillesjacobs wrote:
       | Great writeup, generalisation and data seperation is difficult
       | but will more often than not improve code. Learning how to
       | structure your code around general concepts has the advantage
       | that there many low-level vectorized libraries available
       | increasing compute efficiency.
       | 
       | This is especially true for numerical programming (numpy), stats
       | (scipy) and ML (torch, tf, jax), but the difficulty is often in
       | finding the name of functions to corresponding concepts. Better
       | get good at formulating queries against the API documentation
       | too!
        
       | sega_sai wrote:
       | Scientist here, who has been writing code for > 20 years. I don't
       | buy pretty much anything in the article. This was a bunch of
       | opinionated examples. Science programming in my opinion has many
       | different levels which require different approaches and
       | techniques.
       | 
       | Very often the first code written will be just a quick and dirty.
       | If the idea worked out, I may refactor it, make the code
       | prettier, speed it up a bit. Very-very rarely the bit of code
       | will be something that I'll need to constantly reuse, and there I
       | have to think about interfaces, organisation etc. Also a
       | situation that often comes up, you have a working code that does
       | the job, and then you keep adding more and more functionality to
       | it, and that slowly requires making the code more generic, better
       | organized. But this is not the first thing you do, you only do
       | that if it is needed. That's why I think there are very few very
       | generic recommendations for scientific programming.
        
       | mahathu wrote:
       | There's a typo in "Distribution" in the "Boosting up the purity"
       | section.
        
       | [deleted]
        
       | izhak wrote:
       | How to better scratch some working thoughts on a napkin?
        
       ___________________________________________________________________
       (page generated 2022-02-19 23:01 UTC)