[HN Gopher] Research software code is likely to remain a tangled...
       ___________________________________________________________________
        
       Research software code is likely to remain a tangled mess
        
       Author : hacksilver
       Score  : 147 points
       Date   : 2021-02-22 11:04 UTC (11 hours ago)
        
 (HTM) web link (shape-of-code.coding-guidelines.com)
 (TXT) w3m dump (shape-of-code.coding-guidelines.com)
        
       | ejz wrote:
       | This is actually the opportunity for our startup. I think there
       | is generally a great opportunity to be the Databricks of a lot of
       | academic software. We're starting in a big research area in
       | biology :)
        
       | bjarneh wrote:
       | I agree that academia produces its fair share of spaghetti code,
       | but I don't think all of his arguments are correct.
       | 
       | > writing software is a low status academic activity
       | 
       | This is just not true. People like Stallman, Knuth, Ritchie,
       | Kernighan, Norvig or Torvalds are not considered people of low
       | status in the academic world.
       | 
       | Writing horrible spaghetti code in academia may be considered
       | "low status"; but that's another story.
       | 
       | He should compare apples to apples: do people who work in
       | academia write better or worse code there than when they work
       | for a business? That is, they should be compared to themselves
       | in different situations, not to some imaginary high coding
       | standard that I've never seen anywhere.
       | 
       | In my own experience from academia, at least, I'd say that the
       | lack of deadlines, the possibility to do whatever I want, plus
       | the lack of management, creates much higher quality software in
       | academia. When you work commercially, you will churn out
       | embarrassing stuff just to make something work before a
       | deadline.
        
         | pca006132 wrote:
         | I think rather than being a low status academic activity, it
         | is just not valued in academia... The code is usually just a
         | by-product of the paper, and higher quality code does not
         | translate into a higher quality paper.
         | 
         | When your code works, you have probably already developed the
         | main part of your paper, and you have no incentive to improve
         | your program if what you want is just the publication... At
         | least this is what I think.
        
           | bjarneh wrote:
           | I've seen many research papers where all they disclose about
           | the software is some pseudo code + some tables with timing
           | results to prove their "performance gains".
           | 
           | For those types of papers I agree with your statement. But in
           | many academic scenarios others will want to inspect the
           | source, and the quality of that code is certainly something
           | that will add or subtract from your "status" in the academic
           | world so to speak :-)
        
             | pca006132 wrote:
             | OK, perhaps I should read more papers :)
        
         | abcc8 wrote:
         | Agreed. Based on my lab experiences, it might have been more
         | accurate for the author to write that code is often not
         | written by the individuals who are the intellectual drivers
         | of the lab. In many labs the 'thinkers' get far more credit
         | and are more valued than the 'doers'.
        
           | bjarneh wrote:
           | > In many labs the 'thinkers' get far more credit and are
           | more valued than the 'doers'
           | 
           | In terms of software this never made much sense to me. I
            | would understand if we were talking chemistry or some other
           | discipline where a "new idea" has to be investigated/verified
           | by some "lab-rat" doing mundane tasks for 2 years. In that
           | case the lab-rat would probably get less credit than the
           | person with the actual idea, but this just does not apply to
           | software. Developers are not doing mundane tasks on behalf of
           | some great thinker.
        
             | abcc8 wrote:
             | Rarely are the developers writing grants, generating
             | hypotheses, planning experiments, composing manuscripts,
             | presenting the lab's work, teaching at the university, etc.
             | Perhaps my choice of words was poor, but this should be
             | more clear.
        
               | bjarneh wrote:
               | I guess this differs from place to place, but at my
               | university (Oslo), we did all that..
        
         | nirse wrote:
         | >> writing software is a low status academic activity
         | 
         | > This is just not true. People like Stallman, Knuth,
         | > Ritchie, Kernighan, Norvig or Torvalds are not considered
         | > people of low status in the academic world.
         | 
         | I understand the meaning of 'academic software developers' to
         | mean 'software developers that assist in building software for
         | other, non-CS, fields of research', but you only mention people
         | famous within CS. I don't think this article is meant to apply
         | to CS.
        
       | neffy wrote:
       | I don't think research code in aggregate is any worse than any
       | other source of code. If we had the same kind of visibility
       | into all the commercially written code, we would see the same
       | pattern: some well structured, some a complete mess, with no
       | correlation with the companies concerned but a lot of
       | correlation with the ability of the author.
       | 
       | The recent example of Citibank's loan payment interface comes
       | immediately to mind. So does Imperial's Covid model (the one that
       | had timing issues when run on different computers.)
        
         | AshamedCaptain wrote:
         | Exactly. You can imagine most engineering software to be in a
         | similar state as research code. It's just that people get to
         | see research code.
        
       | currymj wrote:
       | as people are saying, the typical software engineering advice
       | simply wouldn't work in a research context.
       | 
       | one exception is the most basic stuff - people should use version
       | control, do light unit testing, and explicitly track
       | dependencies. These weren't really done in the past but are
       | becoming more and more common, fortunately.
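       | 
       | A minimal sketch of that basic level, assuming Python and
       | pytest (the model function and its expected value are
       | hypothetical):
       | 
       |     # test_model.py -- run with `pytest`; pin versions in
       |     # requirements.txt so the environment is reproducible.
       |     import numpy as np
       |     from model import growth_rate  # hypothetical research code
       | 
       |     def test_growth_rate_known_case():
       |         # a hand-checked value for a simple input guards
       |         # against silent regressions during refactoring
       |         assert np.isclose(growth_rate(n0=100.0, t=1.0), 105.1,
       |                           atol=0.1)
       | 
       |     def test_growth_rate_zero_population():
       |         assert growth_rate(n0=0.0, t=1.0) == 0.0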
       | 
       | I think if software engineering experts actually sat down, looked
       | at how researchers work with computers, and figured out a set of
       | practices to follow that would work well in the research context,
       | they could do a lot of good. This is really needed. But the
       | standard software engineering advice won't work as it is, it has
       | to be adapted somehow.
        
         | pydry wrote:
         | Another issue is that the standard software engineering advice
         | doesn't guarantee clean code either.
        
       | orange_tee wrote:
       | Well as somebody who has written research software, I don't agree
       | that research software is a "tangled mess". A couple of points,
       | 
       | 1. often when I read software written by professional
       | programmers I find it very hard to read because it is too
       | abstract; almost every time I try to figure out how something
       | works, it turns out I need to learn a new framework and API. By
       | contrast, research code tends to be very self contained
       | 
       | 2. when I first wrote research software I applied all the
       | programming best practices and was told these weren't any good;
       | it turns out using lots of abstraction to increase modularity
       | makes the code much slower (this is language dependent, of
       | course)
       | 
       | 3. you will find it much harder to read research code if you
       | don't understand the math+science behind it
       | 
       | > many of those writing software know very little about how to do
       | it
       | 
       | This is just not true. In my experience, people writing
       | research software have a very specific skillset that very very
       | few industry programmers are likely to have. They know how to
       | write good numerics code, and they know how to write fast code
       | for supercomputers. Not to mention, interpreting the numerics
       | theory correctly in the first place is not a trivial matter
       | either.
        
         | deklund wrote:
         | As someone who's worked for a large part of my career as a sort
         | of bridge between academia and industry (working with
         | researchers to implement algorithms in production), both you
         | and the original author are right to an extent.
         | 
         | On one hand, academics I've worked with absolutely undervalue
         | good software engineering practices and the value of
         | experience. They tend to come at professional code from the
         | perspective of "I'm smart, and this abstraction confuses me, so
         | the abstraction must be bad", when really there's good reason
         | to it. Meanwhile they look at their thousands of lines of
         | unstructured code, and the individual bits make sense so it
         | seems good, but it's completely untestable and unmaintainable.
         | 
         | On the other side, a lot of the smartest software engineers
         | I've known have a terrible tendency to over-engineer things.
         | Coming up with clever designs is a fun engineering problem, but
         | then you end up with a system that's too difficult to debug
         | when something goes wrong, and that abstracts the wrong things
         | when the requirements slightly change. And when it comes to
         | scientific software, they want to abstract away mathematical
         | details that don't come as easily to them, but then find that
         | they can't rely on their abstractions in practice because the
         | implementation is buried under so many levels of abstraction
         | that they can't streamline the algorithm implementation to an
         | acceptable performance standard.
         | 
         | If you really want to learn about how to properly marry good
         | software engineering practice with performant numerical
         | routines, I've found the 3D gaming industry to be the most
         | inspirational, though I'd never want to work in it myself. They
         | do some really incredible stuff with millions of lines of code,
         | but I can imagine a lot of my former academia colleagues
         | scoffing at the idea that a bunch of gaming nerds could do
         | something better than they can.
        
           | acmj wrote:
            | > _a lot of the smartest software engineers I've known have
            | a terrible tendency to over-engineer things._
           | 
           | Your definition of "smartest software engineers" is the
           | opposite of mine. In my view, over-engineering is the symptom
           | of dumb programmers. The best programmers simplify complex
           | problems; they don't complicate simple problems.
        
             | deklund wrote:
             | I don't know that our definitions are that different. Most
             | of the over-engineering I've seen in practice was done in
             | the name of simplifying a complex problem, but resulted in
             | a system that was too rigid to adapt. Our definition of
             | "over-engineered" might be different, though.
        
         | taeric wrote:
         | Your points apply to industry, too. I heretically push flatter
         | code all the time. I'm not against abstraction, but it is easy
         | to fall into the trap of building a solution machine while
         | missing the solution you need.
        
         | acmj wrote:
         | Quite a few professional programmers evaluate the quality of
         | code by "look": presence of tests, variable length, function
         | length etc. However, what makes great code is really the code
         | structure and logical flow behind it. In my experience, good
         | industrial programmers are as rare as good academic
         | programmers. Many industrial programmers make a fuss about
         | coding styles but are not really good at organizing structured
         | code for a medium sized project.
        
         | exdsq wrote:
         | Point 1 is so true, I think it's why I like Golang without
         | generics so people can't go crazy with abstractions.
        
         | disabled wrote:
         | I work on mathematical modeling, dealing with human physiology.
         | Likewise, the software packages used can be esoteric, and the
         | structure of your "code" can be very different looking, to say
         | the least.
         | 
         | This is certainly a lot of work, and it takes a lot of
         | practice to perform efficiently, but no matter what, I comment
         | every single line of code, no matter how mundane it is. I also
         | cite my sources in the comments themselves, and I keep a
         | bibliography at the bottom of my code.
         | 
         | I organize my code in general with sections and chapters, like
         | a book. I always give an overview for each section and chapter.
         | I make sure that my commenting makes sense to a novice
         | reading it, line by line.
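         | 
         | A small sketch of that layout, with hypothetical values and a
         | placeholder reference:
         | 
         |     # ========================================================
         |     # Chapter 2: Cardiac output
         |     # Overview: cardiac output is heart rate times stroke
         |     # volume, CO = HR * SV [1].
         |     # ========================================================
         | 
         |     # Section 2.1: inputs (units noted on every line)
         |     hr = 72.0    # heart rate, beats per minute [1]
         |     sv = 0.070   # stroke volume, litres per beat [1]
         | 
         |     # Section 2.2: cardiac output, litres per minute [1]
         |     co = hr * sv  # ~5 L/min at rest for these values
         | 
         |     # --------------------------------------------------------
         |     # Bibliography
         |     # [1] Placeholder for the physiology text these values
         |     #     and the relation are taken from.
         |     # --------------------------------------------------------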
         | 
         | I do not know why I do this. I guess it makes me feel like my
         | code is more meaningful. Of course it makes it easier to come
         | back to things and to reuse old code. I also want people to
         | follow my thought process. But, ultimately, I guess I want
         | people to learn how to do what I have done.
        
       | tryonenow wrote:
       | Not all "research" code is equal. I would imagine that research
       | code closer to hard sciences and cutting edges is more difficult
       | to keep clean. The trouble is when you're breaking new ground in
       | applied science, you don't necessarily know how well the new tech
       | will work, and exactly what you'll be able to do with it. As
       | development progresses your expectations must be adjusted, and it
       | is impossible to forecast the direction of highly experimental
       | research since progress is typically incremental and constantly
       | dependent on the most recent results.
       | 
       | There are many paths to scaling a mountain, so to speak, and
       | sometimes for any of a multitude of reasons you end up on another
       | peak long after you've started climbing.
        
       | optiklab wrote:
       | I'm an engineer from outside research and science, but somehow
       | interested in the topic. Recently, I've learned about great
       | work done by Grigori Fursin and an entire community of research
       | engineers with the goal of making research software more
       | applicable to industry by building it within some kind of
       | framework. I want to leave some links here, if you don't mind
       | watching them. The talk is called "Reproducing 150 Research
       | Papers and Testing Them in the Real World". ACM page with
       | webcast:
       | https://event.on24.com/wcc/r/2942043/9C904C7AE045B5C92AAB2CF...
       | 
       | Also, source docs available here:
       | https://zenodo.org/record/4005773?fbclid=IwAR1JGaAj4lwCJDrkJ...
       | 
       | And, their solution product https://cknowledge.io/ and source
       | code https://github.com/ctuning/ck
       | 
       | I guess it should be helpful to the research community.
        
       | brakus127 wrote:
       | Structure is great for a well understood problem space, but this
       | is not usually the case when working on something novel. As
       | a researcher your focus should be on learning and problem
       | solving, not creating a beautiful code base. Imposing too many
       | constraints early on can negatively impact your project later on.
       | In the worst case, your code starts to limit the way you think
       | about your research. I agree that there are some general best
       | practices that should be applied to nearly all forms of coding,
       | but beyond that it's a balance.
       | 
       | The same thinking should be used when adding regulation to an
       | industry. Heavy regulation of a rapidly developing industry can
       | stifle innovation. Regulation (if needed) should be applied as
       | our understanding of the industry increases.
        
         | bordercases wrote:
         | Results need to be refined so that the way they were first
         | formulated doesn't get in the way of their replication. At
         | scale, this too becomes a cost to industry.
         | 
         | In the small, this isn't different from taking a lab notebook
         | and making it clearer and better summarized so that it can be
         | passed on to the poor sucker who has to do what you did after
         | you move on to another project.
         | 
         | Furthermore, software projects that are put under the same
         | iterative stress you imply for R&D inevitably go through a
         | refactoring phase so that performance isn't affected in the
         | long run.
        
           | brakus127 wrote:
           | Agreed that there should be a minimum bar for "completed"
           | research code such as reproducibility and a clear summary,
           | but engineers shouldn't expect the first version of a new
           | algorithm to be easy to understand without additional
           | material or ready for production without a complete rewrite.
        
       | ellimilial wrote:
       | I am actually quite surprised at the figure of 73% of research-
       | related code packages not being updated after publication; I
       | was expecting it to be higher.
        
         | rovr138 wrote:
         | Same. But it could be an issue with the sample. 213 in a span
         | of 14 years is not a lot.
         | 
         | Also, a question. If you publish a paper with a repo, what
         | would be the best way to handle the version in the paper
         | matching the repo in the future?
         | 
         | An opinion, there is such a thing as software being 'done' and
         | 'as is'. Software solves a need. After that's met, that's it.
         | 
         | There's also this part that strikes me,
         | 
         | >Given a tangled mess of source code, I think I could reproduce
         | the results in the associated paper (assuming the author was
         | shipping the code associated with the paper; I have encountered
         | cases where this was not true).
         | 
         | And it strikes me as weird. The main issue to reproduce results
         | is usually data. And depending on the dataset, it's very hard
         | to get. To be able to reproduce the code, I just need the
         | paper.
         | 
         | The code may have bugs, may stop working, may be in a different
         | language/framework. The source of truth is the paper. This is
         | why the _paper_ was published.
        
           | medstrom wrote:
           | >The source of truth is the paper. This is why the _paper_
           | was published.
           | 
           | Speaking as someone who's not the best at math, I find it
           | easier to understand what a paper is saying after I run the
           | code and see all the intermediate results.
           | 
           | When the code doesn't work, it takes me 20 times longer to
           | digest a paper. They could do with _only_ uploading code --
            | to me it's the shortest and most effective way to express
           | the ideas in the paper.
        
             | rovr138 wrote:
             | >Speaking as someone who's not the best at math, I find it
             | easier to understand what a paper is saying after I run the
             | code and see all the intermediate results.
             | 
             | As long as you understand the paper after, that's okay.
             | 
             | > When the code doesn't work, it takes me 20 times longer
             | to digest a paper.
             | 
             | What if the data isn't available? That's another issue. I
             | see where you're coming from, but that's why the paper
             | itself is the source of truth. Not the implementation.
             | 
             | Another case, what if the implementation makes assumptions
             | on the data? Or on the OS it's being run on?[0][1]
             | 
             | > They could do with only uploading code -- to me it's the
             | shortest and most effective way to express the ideas in the
             | paper.
             | 
             | In my opinion, no. The math and algorithm behind it is more
             | important than an implementation and better for longevity.
             | 
             | [0] https://science.slashdot.org/story/19/10/12/1926252/pyt
             | hon-c...
             | 
             | [1] https://arstechnica.com/information-
             | technology/2019/10/chemi...
        
           | speters wrote:
           | > Also, a question. If you publish a paper with a repo, what
           | would be the best way to handle the version in the paper
           | matching the repo in the future?
           | 
           | You can include the hash of the commit used for your paper.
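            | 
            | A small sketch of stamping results with that hash
            | automatically (a hedged example, assuming Python and a git
            | checkout; the output file name is arbitrary):
            | 
            |     # record_provenance.py
            |     import json
            |     import subprocess
            | 
            |     def current_commit() -> str:
            |         # `git rev-parse HEAD` prints the hash of the
            |         # checked-out commit.
            |         out = subprocess.run(
            |             ["git", "rev-parse", "HEAD"],
            |             capture_output=True, text=True, check=True)
            |         return out.stdout.strip()
            | 
            |     with open("results_metadata.json", "w") as fh:
            |         json.dump({"code_commit": current_commit()}, fh)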
        
             | rovr138 wrote:
             | oh, that's good. Or even a tag
        
           | jimmyvalmer wrote:
           | > The source of truth is the paper.
           | 
           | Yes, although truth of the flimsiest kind. A lowly but wise
           | code monkey once said "Talk is cheap. Show me the code."
        
             | rovr138 wrote:
             | Here's some code. Data is proprietary. There's no paper
             | explaining the data, prep, steps to gather, caveats,
             | assumptions, etc.
             | 
             | What now?
        
               | jimmyvalmer wrote:
                | No need for the reductionist strawman. Some experiments
                | cannot be reproduced because the data is proprietary.
                | Those that can should be.
        
         | jonnycomputer wrote:
         | It turns out that maintaining a package is a lot of work, and
         | the career benefit post-publishing said package and
         | accompanying paper is really low.
         | 
         | - writing general purpose software that works on multiple
         | platforms and is bug free is really really hard. So you're just
         | going to be inundated with complaints that it doesn't work on X
         | 
         | - maintaining software is lots of work. Dependencies change,
         | etc.
         | 
         | - supporting and helping an endless number of noobs use your
         | software is a major pita. "I don't know why it wouldn't compile
         | on your system. Leave me alone."
         | 
         | - "oh that was just my grad work"
         | 
         | - it's hard to get money to pay for developing it further.
         | great when that happens though.
        
       | hntrader wrote:
       | These are some concepts that I believe in for research code.
       | 
       | Research code shouldn't be a monolith. Each hypothesis should be
       | a script that follows a data pipeline pattern. If you have a big
       | research question, think about what the most modular progression
       | of steps would be along the path from raw data to final output,
       | and write small scripts that perform each step (input is the
       | output from the previous step). Glue them all together with the
       | data pipeline, which itself is a standalone, disposable script.
       | If step N has already been run, then running the pipeline script
       | once again shouldn't resubmit step N (as long as the input hasn't
       | changed since the last run).
       | 
       | This "intermediate data" approach is useful because we can check
       | for errors each step on the way and we don't need to redo
       | calculations if a particular step is shared by multiple research
       | questions.
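       | 
       | A minimal sketch of such a pipeline script, assuming Python and
       | hypothetical step scripts (clean.py, fit.py, plot.py) that each
       | read one file and write one file:
       | 
       |     # pipeline.py -- disposable glue; reruns a step only when
       |     # its input is newer than its existing output.
       |     import os
       |     import subprocess
       | 
       |     STEPS = [
       |         ("clean.py", "raw.csv",    "clean.csv"),
       |         ("fit.py",   "clean.csv",  "model.json"),
       |         ("plot.py",  "model.json", "figure1.png"),
       |     ]
       | 
       |     def up_to_date(src, dst):
       |         return (os.path.exists(dst) and
       |                 os.path.getmtime(dst) >= os.path.getmtime(src))
       | 
       |     for script, src, dst in STEPS:
       |         if up_to_date(src, dst):
       |             print("skip", script)
       |             continue
       |         subprocess.run(["python", script, src, dst], check=True)
       | 
       | Comparing content hashes instead of timestamps would be more
       | robust, and tools like make or Snakemake implement the same idea.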
       | 
       | I was taught this by a good mentor and I've been using this
       | approach for many years for various ML projects and couldn't
       | recommend it more highly.
        
         | porker wrote:
         | This, absolutely this.
         | 
         | I looked over a friend's PhD program because the results were
         | unstable. I knew nothing about the domain, which was a large
         | disadvantage, but on the code front it was a monolith
         | following a vague data pipeline approach. Unfortunately,
         | components wouldn't run separately and there was only a
         | single end-to-end test taking hours to run. Had each section
         | had its own tests, diagnosing which algorithm(s) were
         | malfunctioning would have been easier. We never did.
        
       | ArtWomb wrote:
       | There's a huge digital divide forming as well, between the
       | hardware a junior software engineer at a well funded research
       | institution such as DeepMind has access to and what's available
       | to the postdoc in Theoretical Physics at Princeton, who is
       | expected not only to write software but to maintain hardware
       | for a proprietary "supercomputer" that was probably cast off
       | ages ago from a government lab or Wall Street.
       | 
       | We don't expect Aerospace / Mechanical engineering students to
       | learn metalworking. They typically have access to shop
       | technicians for that work. Why not persuade university
       | administrators to similarly invest in in-house software
       | engineering talent: generalists who can provide services to any
       | problem domain, from digital humanities to deep reinforcement
       | learning?
        
         | einpoklum wrote:
         | > We don't expect Aerospace / Mechanical engineering students
         | to learn metalworking. They typically have access to shop
         | technicians for that work.
         | 
         | You'd be surprised, but that is often not the case. Lack of
         | sufficient funding, or technicians being dicks, or
         | mismanagement by PIs, often results in graduate students
         | having to do the technical work of metalwork, welding, lab
         | equipment calibration, and a bunch of other tasks. Sometimes
         | they even have to operate heavier machinery, or lasers, etc.,
         | without the minimum reasonable technical staff support.
         | 
         | I know this from my time on the executive committee of my old
         | university's Grad Student Organization.
        
         | mattkrause wrote:
         | > We don't expect Aerospace / Mechanical engineering students
         | to learn metalworking.
         | 
         | Umm...we sorta do.
         | 
         | As a neuroscience postdoc, I have done virtually everything
         | from analysis to zookeeping, including some (light)
         | fabrication. We outsource really difficult or mass-production
         | stuff to pros, and there's a single, very overworked machinist
         | who can sometimes help you, but most of the time it's DIY.
        
       | analog31 wrote:
       | >>> writing software is a low status academic activity; it is a
       | low status activity in some companies, but those involved don't
       | commonly have other higher status tasks available to work on.
       | 
       | If measured by compensation, then _research_ is a low status
       | activity. Perhaps more precisely, researchers have low bargaining
       | power. But I don't think that academics actually analyze
       | activities in such detail. The PI might not even know how much
       | programming is being done.
       | 
       | The researcher is programming, not because they see it as a way
       | to raise (or lower) their status, but because it's a force
       | multiplier for making themselves more productive overall. Though
       | I work in industry, I'm a "research" programmer for all intents
       | and purposes. I program because I need stuff right away, and I do
       | the kind of work that the engineers hate. Reacting to rapidly
       | changing requirements on a moment's notice disrupts their long
       | term planning. Communicating requirements to an engineer who
       | doesn't possess domain knowledge or math skills is painful.
       | Often, a working piece of spaghetti code that demonstrates a
       | process is the best way to communicate what I need. They can
       | translate it into fully developed software if it threatens to go
       | into a shipping product. That's a good use of their time and not
       | of mine.
       | 
       | >>> Why would a researcher want to invest in becoming proficient
       | in a low status activity?
       | 
       | To get a better job. I sometimes suspect that anybody who is
       | good enough at programming to get paid for it is already doing
       | so.
       | 
       | >>> Why would the principal investigator spend lots of their
       | grant money hiring a proficient developer to work on a low status
       | activity?
       | 
       | Because they don't know how to manage a developer. Software
       | development is costly in terms of both time and effort, and
       | nobody knows how to manage it. Entire books have been written on
       | this topic, and it has been discussed at length on HN. A software
       | project that becomes an end unto itself or goes entirely off the
       | rails can eat you alive. Finding a developer who can do
       | quantitative engineering is hard, and they're already in high
       | demand. It may be that the PI has a better chance of managing a
       | researcher who happens to know how to translate their own needs
       | into "good enough" code than of managing a software project.
        
       | civilized wrote:
       | I see people here saying research is like writing software with
       | fast-changing requirements. I can see how that could seem like an
       | adequate analogy to a software engineer, but it's not.
       | 
       | Researchers use code as a _tool of thought_ to make progress on
       | very ambiguous, high-level problems that lack pre-existing
       | methodology. Like, how could I detect this theoretical
       | astrophysical phenomenon in this dataset? What would it take to
       | predict disease transmission dynamics in a complex environment
       | like a city? Could a neural network leveraging this bag of tricks
       | in some way improve on the state-of-the-art?
       | 
       | If you have JIRA tickets like that in your queue, _maybe_ you can
       | compare your job to that of a researcher.
        
       | yudlejoza wrote:
       | This topic is near and dear to my heart and at a quick glance, I
       | pretty much agree with all/most of this post.
       | 
       | I gained multiple years of industry software engineering
       | experience before joining academia (non-CS, graduate-level). And
       | I was flabbergasted at the way software and programming are
       | treated in a research setting where the "domain" is not CS or
       | software itself. It took me a few years just to get a hint of
       | what on earth these people (my collaborators who program side-
       | by-side with me) are thinking, and what kind of mindset they
       | come from.
       | 
       | Then I took a short break and went into industry. Software
       | engineering, hardcore CS; no domain, no BS. I was expecting that
       | it would feel like an oasis. It didn't. Apart from a handful of
       | process improvements, like the use of version control, issue
       | tracking, and deadline management, the quality of the tangled
       | mess of the code was only slightly better.
       | 
       | Initially I took away the lesson that it's the same in academia
       | and industry. But on further reflection there are two big
       | differences:
       | 
       | - The codebase I worked on in the industry was at least 10x
       | bigger. Despite that, the quality was noticeably better.
       | 
       | - More importantly, I could connect with my coworkers in the
       | industry. If I raised a point about some SwE terminology like
       | test-driven dev, agile, git, whatever, I could have a meaningful
       | discussion. Whereas in academia, not only did most domain
       | experts know jack about 90% of software-engineering concepts
       | and terminology, they were experts at hiding their ignorance,
       | and would steer the conversation in a way that you couldn't
       | tell whether they really didn't know or knew too much. I never
       | got over that deceitful ignorance mixed with elitist arrogance.
       | 
       | In the end, I do think that, despite enormous flaws, the industry
       | is doing way better than academia when it comes to writing and
       | collaborating on software and programming, and that the side-by-
       | side comparison of actual codebases is a very small aspect of it.
        
       | screye wrote:
       | > writing software is a low status academic activity
       | 
       | Yep, that's the one liner right there.
       | 
       | The incentives simply do not match the complaints. Researchers
       | already work upwards of 60 hrs/wk on most occasions. Alongside
       | writing code, they also have to do actual research, write papers,
       | give talks and write grants.
       | 
       | All of the latter tasks are primary aspects of their jobs and are
       | commensurately rewarded. The only situation where a well coded
       | tool is rewarded is when a package blows up, which is quite
       | rare.
       | 
       | As in all fields, the high-level answer to such questions is
       | rather straightforward: individual contributors align their
       | efforts with the incentives. Find a way to incentivize good
       | research code, and we will see changes overnight.
        
       | dariosalvi78 wrote:
       | I think that incentives play a big role here. Software has near
       | zero value in academic evaluation, and its updating and
       | maintenance even less. The only way to make research software
       | survive is to offer packages that other researchers can also
       | use. Maybe.
        
         | Frost1x wrote:
         | This is changing drastically. The issue is that more and more
         | science relies heavily on computation. Analytic platforms,
         | computational science, modeling/simulation, etc. There's less
         | "bench" science and more of the scientific process is being
         | embedded in software.
         | 
         | There's a certain degree of naivety in this process, in that
         | SMEs think translating their research into software is a
         | trivial step. It's not, not if you demand the rigor science
         | should be operating at. As such, many budgets are
         | astronomically lower
         | than they should be. This has worked in the past but as more
         | science moves into software and it becomes more critical to the
         | process, you must invest in the software and it's not going to
         | be cheap. The shortcuts taken in the past won't cut it.
         | 
         | There's a bigger issue in that, as a society, we don't want to
         | invest in basic research, so it's already cash strapped.
         | Combine that with research scientists who already have to cut
         | corners and the massive cost that quality software will take,
         | and you're creating a storm where science will either produce
         | garbage or we'll need to reevaluate how we invest in software
         | systems for science.
        
       | asdf_snar wrote:
       | This article seems to cover research software that can even be
       | built. I claim the majority of _code_ written to support research
       | articles is a collection of scripts written to produce figures to
       | put in the paper. Even when the article is about an algorithm,
       | the script that runs this algorithm is just good enough to
       | produce the theoretically expected results; it is never tested,
       | reproduced, or published, never mind being updated after
       | publication.
       | 
       | While others here point out that researchers = bad programmers is
       | a lazy excuse, I think it is important to point out just how
       | steep the learning curve of computer environments can be for the
       | layperson that uses Excel or MATLAB for all their computational
       | work. It can be a huge time investment to get started with tools,
       | such as git or Docker, that we take for granted. I think
       | recognizing this dearth of computer skills is a first step
       | towards training researchers to be computer-competent. Currently,
       | I find the attitude among academics (especially theorists) to be
       | dismissive of the importance of such competencies.
        
         | Doctor_Fegg wrote:
         | > it is never tested, reproduced, or published
         | 
         | This never ceases to amaze me. I regularly read recent papers
         | on shortest-path algorithms. Each one is religiously
         | benchmarked down to the level of saying what C++ compiler was
         | used. But the code itself is almost never published.
        
         | txdv wrote:
         | Reproducibility is a major principle of the scientific method.
         | 
         | Yet computer scientists consistently fail to achieve
         | reproducibility with a tool that is the most consistent at
         | following instructions - the computer.
         | 
         | Even private business is on board with the DevOps movement,
         | because they see the positive effects of reproducibility.
         | 
         | If the academic world is truly about science, then there is no
         | more excuse: the tools are out there, and they need to use
         | them.
        
           | Frost1x wrote:
            | This is really an artifact of unreasonable expectations and
            | modern software ecosystems. When I say unreasonable
            | expectations, the issue is that people assume they can use
            | the latest greatest trendy library and get reproducible
            | results. Good luck getting the level of determinism you're
            | looking for.
           | 
           | You need to step back and look at more mature, simple
           | codebases and what you can do in those sorts of environments
           | when you want reproducibility. You can't cobble together a
           | bunch of async services in the cloud and hope your
           | Frankenstein tool gives you perfect results. It will give you
           | good enough results for certain aspects if you focus on those
           | specific aspects (banking does a good job of this with
           | transactional processing and making sure values are
           | consistent because it's their entire business, maybe your
            | account or their web interface is screwy but that's fine,
           | that can fail).
        
         | Dumblydorr wrote:
         | I am a research scientist published via R, Stata, and Excel
         | analyses. My code documents wouldn't be helpful since the data
         | is all locked up due to HIPAA concerns. We're talking names,
         | health conditions, scrambled SSN, this isn't reproducible
         | because the data is locked to those without security clearance.
         | 
         | The code itself is a ton of munging and then some basic stat
         | functions. This information can be gleaned from the methods
         | section of the article anyway.
         | 
         | So, really, my field of public health doesn't use GitHub or
         | sharing much, there's simply too little benefit to the
         | researcher to share their code.
         | 
         | There's an unwarranted fear of getting your work poached. In
         | modern science, publications are everything, they determine
         | your career. Enabling your direct competitors, those who want
         | the same grants and students and glories, is not common in
         | science.
        
           | asdf_snar wrote:
           | I don't disagree with you on any points. I have some academic
           | friends who mostly do "a ton of munging and then some basic
           | stat functions", as you say (but with less sensitive data).
           | The problem is that their workflow is prone to human error.
           | Even though the stat functions are simple, the proper
           | labeling of inputs and outputs is less reliable.
           | 
           | I have some research published for which I wrote MATLAB code
           | years ago. I trust the fundamental results but not the values
           | displayed in the tables. I would have personally benefited
           | from rudimentary version control and unit testing.
        
           | vharuck wrote:
           | As a public health statistician, I am very grateful to all
           | researchers who publish code. More so for those who publish
           | packages that make their techniques easy to use. I am not an
           | expert in the field of statistics, just a grunt applying what
           | you guys devise. It takes a while for me to do enough
           | research and testing to be sure I'm correctly implementing
           | new techniques. Even a basic pseudo-code walkthrough would
           | immensely help.
           | 
           | >My code documents wouldn't be helpful since the data is all
           | locked up due to HIPAA concerns. We're talking names, health
           | conditions, scrambled SSN, this isn't reproducible because
           | the data is locked to those without security clearance.
           | 
           | Is there a standard format for this kind of data? If so,
           | consider using it. That way, others can easily create
           | artificial datasets to test it. Even if you have no control
           | over your data source, you can convert the raw data to the
           | standard as a "pre-munging" step.
           | 
           | >So, really, my field of public health doesn't use GitHub or
           | sharing much, there's simply too little benefit to the
           | researcher to share their code.
           | 
           | Sad but true.
        
           | stult wrote:
           | In well designed software, data ingestion should be easily
           | separable from the core logic of the application. Which is
           | the point the parent comment is making. Some basic best
           | practices would allow you to share your core code without
           | implicating HIPAA. Even if it's just basic stats, sharing the
           | code makes it easier to reproduce your results and to check
           | your logic.
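            | 
            | A minimal sketch of that separation, assuming Python (the
            | module and function names are illustrative):
            | 
            |     # analysis.py -- shareable core logic; pure functions
            |     # that never see the protected data source.
            |     def incidence_rate(cases, person_years):
            |         return cases / person_years
            | 
            |     # ingest.py -- the only code that touches HIPAA data;
            |     # it stays private, and others can substitute a
            |     # synthetic cohort with the same columns.
            |     import csv
            | 
            |     def load_cohort(path):
            |         with open(path, newline="") as fh:
            |             return list(csv.DictReader(fh))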
           | 
           | Although I agree with your analysis that enabling competitors
           | in science is not common, it really, really should be. That's
           | kinda the point of publication, at least in theory. Sharing
           | knowledge and methods.
        
             | jimmyvalmer wrote:
             | > enabling competitors in science ... really, really should
             | be.
             | 
             | Said someone whose livelihood doesn't depend on said
             | competition.
        
       | RocketSyntax wrote:
       | jupyterlab github issue advocating for documenting research:
       | https://github.com/jupyterlab/team-compass/issues/121
        
       | lmilcin wrote:
       | So here is a simple fact.
       | 
       | It does not make sense to judge any piece of code that does not
       | meet the "highest standard" to be a tangled mess.
       | 
       | There are valid reasons to have varying quality of code, and
       | the idea of quality itself may change from problem to problem
       | and project to project.
       | 
       | The quality of code that governs your car's ECU should be
       | different from the quality of code that some research team
       | threw together to demonstrate an idea.
       | 
       | A coding project should achieve some kind of goal or set of
       | goals as efficiently as possible, and in many valid cases
       | quality is just not high on the list, for good reason.
       | 
       | Right now I am working on a PoC to verify an idea that will take
       | a longer time to implement. We do this because we don't want to
       | spend weeks on development just to see it doesn't work or that we
       | want to change something. So spending 2-3 days to avoid a
       | significant part of the risk of the rest of the project is fine.
       | It does not need to be spelled out that the code is going to be
       | incomplete, messy and maybe buggy.
       | 
       | There is also something to be said for research people actually
       | focusing on something else.
       | 
       | Professional developers focus their careers on a single problem
       | -- how to write well (or at least they should).
       | 
       | But not all people do. Some people actually focus on something
       | else (physics maybe?) and writing code is just a tool to achieve
       | some other goals.
       | 
       | If you think about people working on UIs and why UI code tends
       | to be so messy, this is probably also why: these guys focus on
       | something else entirely, and the code is there just to animate
       | their graphical design.
        
         | sdwvit wrote:
         | Yeah, but then you spend more time debugging than if you had
         | written it once with a good architecture and unit tests, let's
         | say.
        
       | bsenftner wrote:
       | Not all research software is a tangled mess. I have extensively
       | worked as a "quant" (before the term was popular) for math,
       | medical, network, media, and physics researchers as my side gig
       | for decades. I'd say about 1/3 of the home brewed research
       | software is constructed with fairly reasonable assumptions, the
       | authors are scientists after all, and I am able to grow their
       | basic setup into a framework they intimately understand and
       | prefer to use. More than once I've found brilliantly engineered
       | software not unlike what I'd find at a pro software development
       | firm.
        
       | f6v wrote:
       | Keep in mind that there're different kinds of research software.
       | Take Seurat[1] as an example. There's CI, issue tracking, etc. It
       | might not be the prettiest code you've ever seen, but it
       | absolutely has to be maintainable as it's being actively
       | developed. Such projects are rare, but low quality is often an
       | indication of software that isn't used by anyone.
       | 
       | 1. https://github.com/satijalab/seurat
        
         | cratermoon wrote:
         | Also things like EISPACK, BLAS, LINPACK and so on, for FORTRAN.
         | Back in the 70s my dad worked a bit with them when he was
         | employed for UTHERCC: The University of Texas Health,
         | Education, and Research Computer Center. You can find
         | references to UTHERCC in papers from that era.
         | 
         | Come to think of it, something like UTHERCC might be exactly
         | what is needed to help the current situation.
        
       | deeeeplearning wrote:
       | Why is this surprising? Has anyone been inside a Chemistry or Bio
       | lab? You think that what happens in those labs to get research
       | done is industrial grade?
        
       | chilukrn wrote:
       | I agree the post makes valid points, but is there anything new
       | in it? It has been discussed several times here and on other
       | forums as well. "RSE" is just another made-up position with a
       | very average pay structure -- even this is not new.
       | 
       | However, RSEs (or just general software training) may help
       | research groups establish a structure on how to format code, put
       | some standards in place, and at least have some basic tests. This
       | way, more people can read/modify the code efficiently (more = not
       | necessarily general public, but it at least helps incoming grad
       | students/postdocs to pick up the project easily).
        
       | TomMasz wrote:
       | I once interviewed for a programming job that was a bit of a
       | bait and switch. The hiring manager showed me a foot-high stack
       | of green bar paper that was Fortran code written by optical
       | scientists that I was expected to convert to C. He was somewhat
       | surprised when I declined and ended the interview. I pity whoever
       | got stuck with that task.
        
       | nowardic wrote:
       | A nice counterexample of research software code that adheres to
       | general software engineering best practices and is easy to pick
       | up and use is the OSMNX project: https://github.com/gboeing/osmnx
       | 
       | Props to Geoff for setting a nice standard.
        
       | cratermoon wrote:
       | I was reminded that there are research packages like LINPACK,
       | BLAS, and EISPACK for FORTRAN (and some other languages) that
       | have been maintained since the 70s and are still in use.
       | 
       | Back in the 70s my dad was working for an organization called
       | UTHERCC, the University of Texas Health, Education, and Research
       | Computer Center, and these libraries were some of the code he
       | worked with.
       | 
       | You can find references to UTHERCC in papers from the time,
       | although I don't think it exists under that name. Maybe
       | institutions need something like UTHERCC as an ongoing department
       | now.
        
       | milliams wrote:
       | Disclaimer: I am one of the trustees of the mentioned charity,
       | The Society of Research Software Engineering.
       | 
       | You say that you don't see it having much "difference with
       | regard to status and salary". The problem here is two-fold.
       | Firstly,
       | salaries at UK universities are set on a band structure and so an
       | RSE will earn a comparable amount to a postdoc or lecturer. These
       | aren't positions that are known for high wages and historically
       | the reason that people work in research is not for a higher
       | salary.
       | 
       | As for status, I can see that the creation of the Research
       | Software Engineer title (since about 2012) has done great good
       | for improving the status of people with those skills. Before,
       | they were "just" postdocs with not many papers, but now they can
       | focus on doing what they do best and have career paths which
       | recognise their skills.
       | 
       | My role (at the University of Bristol -
       | https://www.bristol.ac.uk/acrc/research-software-engineering...)
       | is focused almost entirely on teaching. I'm not trying to create
       | a new band of specialists who would identify as RSEs but rather
       | provide technical competency for people working in research so
       | that the code they write is better.
       | 
       | There is a spectrum of RSEs, from primarily research-focused
       | postdocs who write code to support their work to full-time
       | RSEs whose job is to support others with their research (almost a
       | contractor-type model). We need to have impact all the way along
       | that spectrum, from training at one end to careers and status at
       | the other.
       | 
       | For more info on the history of the role, there's a great article
       | at https://www.software.ac.uk/blog/2016-08-17-not-so-brief-
       | hist... written by one of the founding members of the Society of
       | Research Software Engineering.
        
       | k__ wrote:
       | I did some research projects, but the problem is that they are a
       | mix of regular projects and experiments.
       | 
       | Things like Nix worked out great, but other stuff I saw is a
       | tangled mess of Java grown over the last 10 years, written by 30
       | different students who didn't talk to, let alone know, each
       | other.
        
       | shadowgovt wrote:
       | One of the biggest eye-openers for me as an undergrad was when,
       | upon getting to the point where I'd have to decide whether to
       | pursue graduate education or exit academia and join the
       | workforce, I began to look at the process for publishing novel
       | computer science.
       | 
       | To be clear, novel computer science is valuable and the lifeblood
       | of the software engineering industries. But the actual product? I
       | discovered about myself that I like quality code more than I
       | like novel discovery, and the output of the academic world
       | ain't it.
       | Examples I saw were damn near pessimized... not just a lack of
       | comments, but single-letter variables (attempting to represent
       | the Greek letters in the underlying mathematical formulae) and
       | five-letter abbreviated function names.
       | 
       | I walked away and never looked back.
       | 
       | If there's one thing I wish I could have told freshman-year me,
       | it's that software as a discipline is extremely wide. If you find
       | yourself hating it and you're surprised you're hating it, you may
       | just be doing the kind that doesn't mesh with your interests.
        
       | sumanthvepa wrote:
       | What I find really surprising about research software is that
       | even people in Computer Science write poorly designed code as
       | part of their research. I would have imagined that they would be
       | better qualified to create good code. Just goes to show that
       | Software Engineering != Computer Science.
        
       | knuthsat wrote:
       | I think there aren't enough researchers who publish code.
       | 
       | For example, discrete optimization research (nurse rostering,
       | travelling salesman, vehicle routing problem, etc.) is filled
       | with papers where people are evaluating their methods on public
       | benchmarks but the code never sees the light of day. There are
       | a lot of state-of-the-art methods that never have their code
       | released.
       | 
       | I'm pretty sure it's like that elsewhere. Machine learning and
       | deep learning for some reason has a lot of code in the open but
       | that's not the norm.
       | 
       | I'd prefer the code to be open first. Once that's abundant then I
       | might prefer the code to also be well designed.
        
         | lou1306 wrote:
         | > I think there aren't enough researchers who publish code.
         | 
         | I agree, although lately there's been some effort by academia
         | to make authors publish their code, or at least disclose it to
         | the reviewers.
         | 
         | Several conferences have an artifact evaluation committee,
         | which tries to reproduce the experimental part of submitted
         | papers. Some conferences actually _require_ a successful
         | artifact evaluation to be accepted (see, for instance, the tool
         | tracks at CAV [1] and TACAS [2]).
         | 
         | Others, while not requiring an artifact evaluation, may
         | encourage it by other means. The ACM, for instance, marks
         | accepted papers with special badges [3] reflecting how well the
         | alleged findings can be reproduced and whether the code is
         | publicly available.
         | 
         | [1] http://i-cav.org/2021/artifact-evaluation/
         | 
         | [2] https://etaps.org/2021/call-for-papers
         | 
         | [3] https://www.acm.org/publications/policies/artifact-review-
         | an...
        
           | cratermoon wrote:
           | This feels like the right approach. If peer review were to
           | include artifact evaluation, including some kind of code
           | review, and require certain standards be met for acceptance,
           | things would change. As others have noted here, the
           | mechanisms of grant-funded work strongly discourage attention
           | to code quality, and that would have to change as well.
           | 
           | I'm not in academia now, but I started out my career doing
           | sysops and programming in a lab at a medical school and have
           | worked with academics a bit since. I don't do it much because
           | it's basically volunteer work, and it's almost impossible to
           | contribute meaningfully unless you are also well-versed in
           | the field.
        
       | svalorzen wrote:
       | I don't really agree with the reasons given, even though my
       | conclusions are the same. The main reason why research code
       | becomes a tangled mess is due to the intrinsic nature of
       | research. It is highly iterative work where assumptions keep
       | being broken and reformed depending on what you are testing and
       | working on at any given time. Moreover, you have no idea in
       | advance where your experiments are going to take you, thus giving
       | no opportunity to structure the code in advance so it is easy to
       | change.
       | 
       | To make a concrete example, imagine writing an application where
       | requirements changed unpredictably every day, and where the scope
       | of those changes is unbounded.
       | 
       | The closest to "orderly" I think research code can become would
       | be akin to Enterprise style coding, where literally everything is
       | an interface and all implementation details can be changed in all
       | possible ways. We already know how those codebases tend to end...
        
         | Bukhmanizer wrote:
         | As someone who has been on both the research and industry
         | software end, there's really not that much difference.
         | Requirements change; you build that into your plans. Frankly, a
         | lot of best practice software development that gets totally
         | ignored by academia (e.g. OOP) can handle this exact case, and
         | makes things way more flexible.
         | 
         | If the problem was only unpredictability, then projects with a
         | clear and defined end goal (eg, a website to host results)
         | would be of substantially higher quality. But they're not. Well
         | defined projects tend to end up basically just as crappy as
         | exploratory projects.
         | 
         | The problem is evaluation and incentives. There's literally no
         | evaluation of software or software development capability in
         | the industry. I know of a researcher that held a multimillion
         | dollar informatics grant for 3 years. In that 3 years they
         | literally did nothing except collect money. Usually there are
         | grant updating mechanisms, and reports, but he BSed his way
         | through that knowing there's a 0.0000000% chance that any
         | granting agency is going to look through his code. The fraud
         | was only found because he got fired for unrelated activities.
         | 
         | I once looked up older web projects on a grant. 4/6 were
         | completely offline less than 2 years after their grants
         | completed. For 2 of those 4, it's unclear whether the site was
         | ever completed in the first place.
        
           | paulclinger wrote:
           | > I know of a researcher that held a multimillion dollar
           | informatics grant for 3 years. In that 3 years they literally
           | did nothing except collect money.
           | 
           | I wonder if a whistleblower payout similar to the one the SEC
           | offers for $1M+ fines (10-30%) would help in cases like
           | this. The host organization would potentially be on the hook
           | as well, so there is going to be a significant incentive to
           | not let that happen (especially with all the associated
           | reputational damage).
        
           | qmmmur wrote:
           | I can tell you why the sites went offline: the funding
           | stopped. I don't know what your research background is, but
           | it's painful to get even 5 GBP a month to host a droplet on
           | DigitalOcean, even in a pretty lucrative department with
           | liberal internal funding.
        
             | Bukhmanizer wrote:
             | Agreed, but all these little things are just a sign that
             | the industry just does not give a shit about software. They
             | _could_ develop mechanisms to fund this stuff, pretty
             | easily actually. But they don't.
             | 
             | A couple of other weird inequities that I've found are: 1.
             | It's hard to get permission to spend money on software
             | subscription-based licenses since you won't "have anything"
             | at the end. However, it's much easier to get funding for
             | hardware with time-based locks (e.g. after 3 years the
             | system will lock up and you have to pay them to unlock).
             | The end result is the same, you can't use the hardware
             | after the time period is up, but for some reason the admin
             | feels much more comfortable about it.
             | 
             | 2. It's hard to get funding to hire someone to set up a
             | service to transfer large amounts of data from different
             | places. It's much easier to hire someone to drive out to a
             | bunch of places with a stack of hard drives and manually
             | load the data on them, and drive back. Even if it's 2x more
             | expensive and would take longer. Why? Again my speculation
             | is that the higher ups are just more comfortable with the
             | latter strategy. They can picture the work being done in
             | their head, so they know what they're paying for.
        
               | mschuster91 wrote:
               | > The end result is the same, you can't use the hardware
               | after the time period is up, but for some reason the
               | admin feels much more comfortable about it.
               | 
               | Simple: predictability. With a subscription based model,
               | admin has to deal with recurring (monthly / yearly)
               | payments, and the possibility is always there that
               | whatever SaaS you choose gets bought up and
               | discontinued. Something you own and host yourself, even
               | if it gets useless after three years, does not incur any
               | administrative overhead and there is no risk of the
               | provider vanishing. Also, there are no "surprise auto
               | renewals" or random price hikes.
               | 
               | > 2. It's hard to get funding to hire someone to set up a
               | service to transfer large amounts of data from different
               | places.
               | 
               | Never underestimate the bandwidth of a 40 ton truck
               | filled with SD cards. Joke aside: especially off-campus
               | buildings have ... less than optimal Internet / fibre
               | connections, and those that do exist are often under enough
               | load to make it unwise to shuffle large amounts of
               | data through them without disrupting ongoing operations.
        
               | selimthegrim wrote:
               | Louisiana state government spent a buttload of money on
               | dedicated high speed fiber optic lines between a bunch of
               | different universities in the state for
               | videoconferencing, telenetworking, "grid computing" etc.
               | 10 years later the only people who remember how to use
               | the system are at LSU, rendering the purpose moot.
               | Everyone else just uses Zoom or Skype.
               | 
               | https://www.regents.la.gov/assets/docs/Finance_and_Facili
               | tie...
        
             | pbourke wrote:
             | Is N years of opex not part of the budget in grant
             | applications?
        
               | qmmmur wrote:
               | In research, no, and it would depend entirely on your
               | institution. For example, I looked at a job putting
               | together a portal that would let people freely examine
               | the research a team had put together. The project had
               | secured a connection with the British Museum, so the
               | website would live on under them. However, if the
               | project had asked to host it themselves, even for $60 a
               | year for 10 years, the answer would have been no. Funding
               | bodies see small opex that extends beyond the life of the
               | project as open to corruption, or just too trivial to
               | fund, rightly or wrongly.
        
           | matthewdgreen wrote:
           | >I know of a researcher that held a multimillion dollar
           | informatics grant for 3 years. In that 3 years they literally
           | did nothing except collect money.
           | 
           | I hate that every HN post about academia ends with an
           | anecdote describing some rare edge-case they've heard about.
           | Intentional academic fraud is a very small percentage of what
           | happens in academia. Partly this is because it's so stupid:
           | academia pays poorly compared to industry, requires years to
           | establish a reputation, and the systems make it hard to
           | extract funds in a way that would be beneficial to the
           | fraudster (hell, I can barely get reimbursed for buying pizza
           | for my students.) So you're going to do a huge amount of work
           | qualifying to receive a grant, write a proposal, and your
           | reward is a relatively mediocre salary for a little while
           | before you shred your reputation. Also, where is your
           | "collected money" going? If you hire a team, then you're
           | paying them to do nothing and collude with you, and your own
           | ability to extract personal wealth is limited.
           | 
           | A much more common situation is that a researcher burns out
           | or just fails to deliver much. That's always a risk in the
           | academic funding world, and it's why grant agencies rarely
           | give out 5-10 year grants (even though sometimes they should)
           | and why the bar for getting a grant is so high. The idea is
           | to let researchers do actual work, rather than having teams
           | manage them and argue about their productivity.
           | 
           | (Also long-term unfunded project maintenance is a big, big
           | problem. It's basically a labor of love slash charitable
           | contribution at that point.)
        
             | Bukhmanizer wrote:
             | > I hate that every HN post about academia ends with an
             | anecdote describing some rare edge-case they've heard about
             | 
             | This isn't a rare edge case, this is very common in
             | software projects. I've heard of it because I was part of
             | the team brought in to fix the situation.
             | 
             | Intentional fraud is only rare when it's recognized as
             | fraud. P-hacking was incredibly widespread (and to some
             | extent still is) because it wasn't recognized as a form of
             | fraud. Do you really think not delivering on a software
             | project has any consequences? Who is going to go in and say
             | what's fraud, what's incompetence, and what's bad luck?
             | 
             | The problem is that the bar for getting software grants
             | isn't high, it's nonsensical. As far as I can tell, ability
             | to produce or manage software development isn't factored in
             | at all. As with everything else, it's judged on papers, and
             | the grant application. In some cases, having working
             | software models and preexisting users ends up being
             | detrimental to the process, since it shows less of a "need"
             | for the money. You get "stars" in their field, who end up
             | with massive grants and no idea of how to implement their
             | proposals. Conversely, plenty of scientists who slave away
             | on their own time on personal projects that hundreds of
             | other scientists depend on get no funding whatsoever.
        
               | lazyjeff wrote:
               | Just curious, what kind of 3-year informatics grant not
               | being completed ends up with a team brought in to fix the
               | situation? Multi-million dollar grants don't sound big
               | enough to be a dependency for any major customer (like
               | defense or pharma), so I imagine if fraud was detected,
               | they would just demand a reimbursement and ban the PI.
               | 
               | But I think you're both right in some sense. Cases of
               | intentional major fraud are probably rare edge cases, and
               | they make the news when they're uncovered. But there's a
               | lot of grey-ish area like p-hacking as you mentioned,
               | plus funding agencies know there needs to be some
               | flexibility in the proposed timeline due to realities.
               | Realities like you don't necessarily get the perfect
               | student for the project right when the grant starts, as
               | the graduate student cycle is annual, plus the research
               | changes over time and it isn't ideal to have students
               | work on an exact plan as if they are an employee.
               | 
               | But I totally agree that maintaining software that people
               | are using should be funded and rewarded by the academic
               | communities. A possible way to do this is to have a
               | supplement so that, after a grant is over, people who have
               | software generated from the grant that is used by at
               | least 10 external parties without COI are funded
               | $100K/yr for however many years they are willing to
               | maintain and improve it. Definitions of what this means
               | need to be carefully constructed, of course.
        
               | Bukhmanizer wrote:
               | I'll be a bit vague to protect my coworker's privacy, but
               | the scientist was fired for other, unrelated violations,
               | and my boss was brought in to replace him. I think he was
               | leading an arm of a "U" grant, so he wasn't the only
               | senior PI on it. Since they handled it internally, they
               | couldn't just demand a reimbursement. On some level
               | administration knew that the project wasn't moving
               | forward, but once we started asking around, it was clear
               | that there was no effort to start the project at all.
               | 
               | >But I totally agree that maintaining software that
               | people are using should be funded and rewarded by the
               | academic communities. A possible way to do this is to have
               | a supplement so that, after a grant is over, people who
               | have software generated from the grant that is used by at
               | least 10 external parties without COI are funded $100K/yr
               | for however many years they are willing to maintain and
               | improve it. Definitions of what this means need to be
               | carefully constructed, of course.
               | 
               | I think that this is a great idea.
        
         | j-pb wrote:
         | There's only one way to solve this: Simplicity.
         | 
         | Ironically this is also what Occam's razor would demand from
         | good science, so you'd have a win-win scenario, where you both
         | create good software and good research, because you focus on
         | the simplest most minimal approach that could possibly work.
        
           | WanderPanda wrote:
           | In my experience simplicity and generality don't go well with
           | performance. If you want to build something simple that can be
           | used for all kinds of problems, it will be slow as hell
           | compared to the (dirty) optimised code running hardcoded
           | structures on the GPU.
        
             | j-pb wrote:
             | Simplicity pretty much excludes generality in a lot of
             | cases; you're only able to port code to the GPU if it
             | wasn't a million LOC to begin with, so you're pretty much
             | making the case for it.
             | 
             | Note that Simple != Easy or Naive
             | 
             | Hardcoded structures are potentially exactly the kind of
             | simplicity needed.
             | 
             | What's not simple is a general "this solves everything and
             | beyond" code-base with every imaginable feature and legacy
             | capability.
        
           | gmueckl wrote:
           | How do you keep a codebase simple when you need to have things
           | in it like implementations of state-of-the-art algorithms to
           | compare against and the previous iterations of your own
           | method so that you can test whether you're actually
           | improving? Then, depending on what you're doing, there's also
           | all the extra nontrivial code for tests and sanity checks of
           | all these implementations.
           | 
           | Simplicity is a nice dream. The realities of research are
           | very often stacked against it.
        
             | j-pb wrote:
             | How the heck do you hope to gain any insightful metrics
             | when you've got a cobbled-together mess that you only half
             | understand? For what it's worth, you might only be
             | benchmarking random code layout fluctuations.
             | 
             | I've seen research groups drown in their legacy code base.
             | 
             | The issue of juggling too many balls that you describe only
             | arises because the state-of-the-art implementations are so
             | shoddy to begin with.
             | 
             | Research suffers as much as everybody else from feature
             | creep. Good experiments keep the number of new variables
             | low.
        
               | gmueckl wrote:
               | Research code is not only written to measure runtime.
               | Reducing the argument to only that aspect is not helping
               | the discussion.
               | 
               | And you say it yourself: good experiments change a single
               | variable at a time. So how do you check that a series of
               | potential improvements that you are making is sound?
        
               | tylermw wrote:
               | > good experiments change a single variable at a time
               | 
               | Although this is a tangent from the above conversation,
               | this isn't actually true: well-designed experiments can
               | indeed change multiple variables at the same time.
               | There's an entire field of statistics dedicated to
               | experimental design (google "factorial designs" for more
               | information). One-factor-at-a-time (OFAT) experiments are
               | often the least efficient method of running experiments,
               | although they are conceptually simple.
               | 
               | See the following article for a discussion:
               | http://www.engr.mun.ca/~llye/czitrom.pdf
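               | 
               | To make the contrast concrete, here is a minimal
               | sketch, assuming numpy; the toy response, factor
               | names and numbers are made up, not from the article.
               | A full 2x2 factorial recovers all four effects,
               | while three OFAT runs cannot separate them:
               | 
               |   from itertools import product
               |   import numpy as np
               | 
               |   # two factors at coded levels -1/+1
               |   full = list(product([-1, 1], repeat=2))
               |   ofat = [(-1, -1), (1, -1), (-1, 1)]
               | 
               |   def toy(a, b):
               |       # made-up response with an interaction term
               |       return 10 + 2*a + 3*b + 4*a*b
               | 
               |   def fit(runs):
               |       # least squares for intercept, A, B and A*B
               |       X = [[1, a, b, a*b] for a, b in runs]
               |       obs = [toy(a, b) for a, b in runs]
               |       return np.linalg.lstsq(np.array(X, float),
               |                              np.array(obs),
               |                              rcond=None)[0]
               | 
               |   print(fit(full))  # recovers [10, 2, 3, 4]
               |   print(fit(ofat))  # 3 runs cannot pin down 4 effects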
        
             | childintime wrote:
             | It seems Julia has the answer:
             | https://arstechnica.com/science/2020/10/the-unreasonable-
             | eff...
        
               | gmueckl wrote:
               | I can't quite follow what the article is trying to
               | describe because of the heavy use of analogies.
               | 
               | A Google search makes it look like Julia has a mechanism
               | where you can extend the set of overloads of a function
               | or method outside the original module. The terminology is
               | different (functions have methods instead of overloads in
               | their speak). I don't see how that feature solves the
               | problem in practice.
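               | 
               | For what it's worth, a rough sketch of the general
               | idea in Python terms (functools.singledispatch is
               | single dispatch only, so a much weaker cousin of
               | Julia's multiple dispatch; the types here are made
               | up): a generic function defined in one module can
               | have cases registered for new types elsewhere,
               | without touching the original definition.
               | 
               |   from functools import singledispatch
               | 
               |   # the "original module" defines a generic function
               |   @singledispatch
               |   def area(shape):
               |       raise NotImplementedError(type(shape))
               | 
               |   # another module defines its own type and adds a
               |   # case for it without editing the original code
               |   class Square:
               |       def __init__(self, side):
               |           self.side = side
               | 
               |   @area.register(Square)
               |   def _(shape):
               |       return shape.side ** 2
               | 
               |   print(area(Square(3)))  # 9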
        
         | f6v wrote:
         | > The main reason why research code becomes a tangled mess is
         | due to the intrinsic nature of research. It is highly iterative
         | work where assumptions keep being broken and reformed depending
         | on what you are testing and working on at any given time.
         | 
         | Oh, boy, how many times have I heard this working at a startup.
         | There is some truth to it; it's hard to organise code in the
         | first weeks of a new project. But if you work on something for
         | 3+ months, it becomes a matter of making a conscious effort to
         | clean things up.
         | 
         | > To make a concrete example, imagine writing an application
         | where requirements changed unpredictably every day,
         | 
         | Welcome to working with product managers at any early-stage
         | company. Somehow I managed to apply TDD and good practices most
         | of the time. Moreover, I went back to school after 7+ years
         | developing software full-time. I guarantee that most of the
         | low-quality research code is a result of a lack of discipline
         | and experience in writing maintainable software.
        
           | warkdarrior wrote:
           | > I guarantee that most of the low-quality research code is a
           | result of a lack of discipline and experience in writing
           | maintainable software.
           | 
           | Bingo! Most research code is written by graduate students who
           | never had a job before, so they do not know how to write
           | maintainable software. You are definitely the exception, as
           | you held a software dev job before going back to school.
        
             | f6v wrote:
             | Some researchers from top-10 schools still publish python2
             | code in 2020. I don't have an explanation for that. It's
             | not even a lack of experience, but something on another
             | level.
        
               | Symbiote wrote:
               | Mathematics doesn't suddenly stop working because your
               | interpreter is a bit old.
        
         | statstutor wrote:
         | >It is highly iterative work where assumptions keep being
         | broken and reformed depending on what you are testing and
         | working on at any given time.
         | 
         | This is describing infinitely fast and efficient p-hacking
         | (i.e. research that is likely to produce invalid results).
         | 
         | If your assumptions are broken then that should ideally be
         | reported as part of your research.
         | 
         | When you do research, you ideally start out with fixed
         | assumptions, and then test those assumptions. The code required
         | to do this can be buggy (and can therefore get fixed), and you
         | can re-purpose earlier code, but the assumptions/brief
         | shouldn't change in the middle of coding it up.
         | 
         | If you aren't following the original brief, you've rejected
         | your original research concept and you're now doing a different
         | piece of research than you started out - and this is no longer
         | a sound piece of research.
         | 
         | Research _should_ be highly dissimilar to a web design project
         | in this respect.
         | 
         | The reason these projects often become a tangled mess is that
         | researchers don't have the coding skill to program any other
         | way (in my opinion; nor do institutions invest sufficiently in
         | people who do have this skill).
        
         | ellimilial wrote:
         | There certainly is quite a lot to be said about constant
         | requirements drift. However, this is not untypical of
         | fast-paced product work or, even more closely, of R&D
         | efforts within the industry.
         | 
         | What then drives the improvement of the code quality is the
         | potential need for continuity and knowledge retention - either
         | in the form of iterative cleaning of the debt or the re-write.
         | This is reliant on the perceived value for the organisation.
         | From this perspective it's more straightforward to get to the
         | author's reasons.
        
         | xgb84j wrote:
         | I think software quality in research has nothing to do with the
         | problems themselves. It's more, as the article suggests, that
         | nobody cares about your software. The only goal is to get
         | published and be cited as many times as possible. Your coding
         | mistakes don't matter if they cannot be found out or cannot
         | hurt your reputation.
         | 
         | How many tests would be written for business software if it had
         | only to run for one meeting and then never be looked at again?
        
           | yummypaint wrote:
           | There seems to be an underlying assumption in many of these
           | posts that code has no value once papers are published. This
           | hasn't been my experience working in a research environment
           | at all. The big, complex pieces of code are almost always re-
           | used in some way. For example, theory collaborators send us
           | their code so we can generate predictions from their work
           | without bothering them. Probably 50% or more (and usually the
           | most important parts) of the code written to process
           | experimental data ends up in other experiments. From the
           | perspective of an individual experimentalist, there is
           | tremendous value in creating quality code that can be easily
           | repurposed for future tasks. This core code tends to follow
           | the individual in their career. In some ways it's an
           | extension of commonly used mental tools, and there are
           | diverse incentives to maintain it.
        
         | virgo_eye wrote:
         | Do you think that people doing research at large technical
         | organizations structure their code in the same way as
         | academics? No; although there's always a portion which is
         | active and unstable, they create packages, define interfaces,
         | abstract out pieces which can be reused reliably and depended
         | on. Similarly for other types of researchers in fields where
         | the code is considered an important product. Eg. if you are
         | doing research in compiler design, you're likely to want to
         | create a compiler which can be used by other people. So you
         | make a stable thing with tests, automated builds and so on. And
         | you delimit and instrument the experimental parts.
         | 
         | The real reason is the incentives. Not just are there no
         | incentives to produce good quality code, there are incentives
         | which make people focus on other outputs. Publish or perish
         | means that people put up with technical debt just to get to the
         | next result for the next paper, then do it again and again.
        
           | Frost1x wrote:
           | >The real reason is the incentives. Not just are there no
           | incentives to produce good quality code, there are incentives
           | which make people focus on other outputs. Publish or perish
           | means that people put up with technical debt just to get to
           | the next result for the next paper, then do it again and
           | again.
           | 
           | I believe this is true and is fueled by a misconception of
           | what software is in research. Software in research is often
           | akin to experimentalist work in the past. It's tacked onto
           | theoretical work projects as an afterthought and not treated
           | as what it really is: forcing the theory to be tested in a
           | computational environment.
           | 
           | If we start treating research software like experimentalism
           | in the past, we might get a bit more rigor out of the
           | development process as well as the respect it really
           | deserves.
        
         | Xelbair wrote:
         | >To make a concrete example, imagine writing an application
         | where requirements changed unpredictably every day, and where
         | the scope of those changes is unbounded.
         | 
         | That sounds like software development, alright. It takes a
         | while for domain experts to learn that when a programmer asks
         | "is X always true/false", they mean that there are no exceptions
         | to that rule.
         | 
         | I would like for researchers to just name variables sensibly.
         | Even that would improve code quality a lot.
         | 
         | Still the key problem is that there are zero incentives for
         | researchers to even make their code readable! It does not
         | improve any of the metrics they are judged by.
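         | 
         | As a toy illustration of the naming point (the formula and
         | names are made up, purely for contrast):
         | 
         |   # mirrors the Greek letters in the paper
         |   def f(v, g):
         |       return v**2 / (2*g)
         | 
         |   # same computation, readable without the paper
         |   def velocity_head(velocity, gravity):
         |       return velocity**2 / (2*gravity)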
        
         | leecarraher wrote:
         | Yes, not pointing out the difference between coding some novel
         | technique and a well-defined software project completely misses
         | the reason the code is often not well organized. Suggesting that
         | researchers are bad programmers is just a lazy excuse, somewhat
         | damaging, and by no means the rule. I wrote a large, complex
         | framework for my research, and the very nature of it causes me
         | to add modules and techniques for parts I didn't know would
         | work. And at times hard forks when I wanted to try something
         | new that would be impossible to merge back cleanly. At times
         | you have a hunch and, like a fever dream, change who knows
         | what, but you just have to see something through. There is no
         | waterfall method, kanban and agile make no sense here, and even
         | unit tests are ill-defined.
        
           | User23 wrote:
           | This sounds like my software development methodology when I
           | was in my early teens. I was certainly able to get things
           | done and explore all kinds of things (I was doing game dev of
           | course), but the code was a mess and I didn't even have a
           | mature understanding that it was. I just thought that was how
           | programming was and you just had to be really smart to keep
           | things straight in your head.
        
         | commandlinefan wrote:
         | > imagine writing an application where requirements changed
         | unpredictably every day
         | 
         | Imagine?
        
         | Zababa wrote:
         | > The main reason why research code becomes a tangled mess is
         | due to the intrinsic nature of research. It is highly iterative
         | work where assumptions keep being broken and reformed depending
         | on what you are testing and working on at any given time.
         | Moreover, you have no idea in advance where your experiments
         | are going to take you, thus giving no opportunity to structure
         | the code in advance so it is easy to change.
         | 
         | I'd say you're confirming the author's theory that writing code
         | is a low-status activity. Papers and citations are high-status,
         | so papers are well refined after the research is "done". Code,
         | however, is not. If the code was considered on the same level
         | as the paper, I think people would refine their code more after
         | they finish the iteration process.
        
           | svalorzen wrote:
           | Yes... and no. It is true that after a result is obtained,
           | one could clean up the code for publication. And it is true
           | that coding is not seen as first class at the moment.
           | 
           | At the same time, you need to consider that such a clean up
           | is only realistically helpful for other people to check
           | whether there are bugs in the original results, and not much
           | else. Reproducing results can be done with ugly code, and
           | future research efforts will not benefit from the clean up
           | for the same reasons I outlined in my previous post.
           | 
           | While easing code review for other people is definitely
           | helpful (it can still be done if one really wants to, and
           | clean code does not guarantee that people will look at it
           | anyway), overall the gains are smaller than what "standard"
           | software engineers might assume. And I'm saying this as a
           | researcher that always cleans up and publishes his own code
           | (just because I want to mostly).
        
             | jmcdl wrote:
             | Shouldn't checking for bugs be of primary importance? How
             | many times have impressive research results turned out to
             | be a mirage built upon a pile of buggy code? I get the
             | sense that this is far too common already.
        
               | pessimizer wrote:
               | > How many times have impressive research results turned
               | out to be a mirage built upon a pile of buggy code?
               | 
               | You're actually making bugs sound like a feature here.
               | I'm pretty sure that if you've gotten impressive results
               | with ugly code, the last thing you want to do is touch
               | the code. If you find a bug, you have no paper.
        
             | throwaway6734 wrote:
             | I am under the impression that most authors do not even
             | publish functioning code when publishing ML/DL papers, which
             | I find absurd. The paper is describing software. IMO the
             | code is more important than the written word.
        
             | Zababa wrote:
             | > At the same time, you need to consider that such a clean
             | up is only realistically helpful for other people to check
             | whether there are bugs in the original results, and not
             | much else.
             | 
             | I assumed that most code published could be directly useful
             | as an application or a library. Considering what you're
             | saying, this might be only a minority of the code. In that
             | case, I agree with your conclusion about smaller gains.
        
               | jonnycomputer wrote:
               | Most academic code runs once, on one collection of data,
               | on a particular file system.
               | 
               | Academic code can be really bad. But most of the time it
               | doesn't matter, unless they're building libraries,
               | packages, or applications intended for others. That's
               | when it hurts and shows.
               | 
               | I'm a research programmer. I have a master's in CS. I
               | take programming seriously. I think academic programmers
               | could benefit from better practice. But I think software
               | developers make the mistake of thinking that just because
               | academics use code the objective is the same or that best
               | practices should be the same too. Yes, research code
               | should perform tests, though that should mostly look like
               | running code on dummy data and making sure the results
               | look like you expect.
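               | 
               | For what it's worth, a minimal sketch of that kind of
               | dummy-data check, assuming numpy; the analysis
               | function and numbers are hypothetical, just to show
               | the shape: build synthetic input with a known answer
               | and assert the output lands near it.
               | 
               |   import numpy as np
               | 
               |   def estimate_slope(x, y):
               |       # stand-in for the analysis code under test
               |       return np.polyfit(x, y, 1)[0]
               | 
               |   rng = np.random.default_rng(0)
               |   x = np.linspace(0, 10, 200)
               |   y = 3.0*x + 1.0 + rng.normal(0, 0.1, x.size)
               |   slope = estimate_slope(x, y)
               |   # dummy data was built with slope 3, so the
               |   # estimate should land very close to it
               |   assert abs(slope - 3.0) < 0.05, slope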
        
               | geebee wrote:
               | I know a lot of "research programmers" (meaning people
               | who write code in research labs but are not themselves
               | the researchers or investigators on a study), and they
               | often have MS degrees in CS - though actually, highly
               | quantitative masters degrees where very elaborate code is
               | used to generate answers is a bit more common than CS per
               | se (math, operations research, branches of engineering,
               | bioinformatics, etc).
               | 
               | Here's the thing - in industry, this background (quant
               | undergrad + MS, high programming ability, industry
               | experience) is kind of the gold standard for data science
               | jobs. In academic job ladders it's... hmm. Here's the
               | thing - by the latest data, MS grads in these fields from
               | top programs are starting between $120k and $160k in
               | industry, and there are very good opportunities for
               | growth.
               | 
               | I actually think that universities and research centers
               | can compete for highly in-demand workers in spite of
               | lower salaries, but highly talented people in demand will
               | not turn down an industry job with salary _and_
               | advancement potential to remain in a dead-end job.
        
               | galangalalgol wrote:
               | Yeah my standard quote about research code is that it is
               | not the product, so it is ok that it is bad. The results
               | are the product and those need to be good. Someday
               | someone will take those results (in the form of some data
               | or a paper) and make a software product, and that should
               | be good.
        
               | [deleted]
        
         | geebee wrote:
         | I've worked with a lot of research code. I agree with you that
         | tangled code is somewhat intrinsic to the kind of code written
         | for research.
         | 
         | Here's the thing. Sometimes, there's no code - I mean, they'll
         | find something, but nobody can say, with certainty, that it is
         | the code that generated the data or results you're trying to
         | recreate. There's often no data - and by that, I mean, nothing,
         | not even a dummy file so you can tell if it even runs or
         | understand what structure the data needs to be in. No build, no
         | archive history, no tests. And when I say no tests, I'm not
         | talking about red/green bar integration and unit tests, I mean,
         | ok, the code ran... was this what it was supposed to produce?
         | 
         | Many of these projects are far, far more messed up than the
         | intrinsic nature of research would explain - though I will
         | again agree that research code may be unusually likely to
         | descend into entropy.
        
         | lmm wrote:
         | > To make a concrete example, imagine writing an application
         | where requirements changed unpredictably every day, and where
         | the scope of those changes is unbounded.
         | 
         | I don't have to imagine it, I'm employed in the software
         | industry.
         | 
         | Seriously, nothing you describe sounds any different from
         | normal software development.
        
           | Dumblydorr wrote:
           | In my world, it does sound different: I work with HIPAA data
           | that takes months to get access to. So sharing your code is
           | borderline unacceptable to some orgs; even if it doesn't
           | itself contain any private data, there's a mass paranoia that
           | you'll accidentally leak patient data, which can lead to
           | fines of 2 million USD.
        
           | burntoutfire wrote:
           | The only difference is speed IMO. Sure, new requirements
           | appear and they can wildly change the underlying assumptions
           | of the systems - but usually, in such cases we're given months
           | or years to adapt/rewrite the system in a systematic manner.
           | If, for every wild idea the researcher wants to explore, this
           | amount of rigor was applied in its implementation, I'm
           | guessing the research would slow down immensely. BTW, most
           | research code is written for chasing dead ends (quickly
           | testing some small hypotheses) and will be discarded without
           | being shared with anyone - so investing in writing it properly
           | seems especially wasteful.
        
           | wbl wrote:
           | The program I wrote for my dissertation is as good as it
           | needs to be for a program that had to run once!
        
           | [deleted]
        
       | fabian2k wrote:
       | There is no real incentive to organize and clean up the code,
       | even if the scientists involved have the skills to write well-
       | organized software. And organizing this kind of code that often
       | starts in a more exploratory way is a pretty large amount of
       | additional effort. This kind of effort is simply not appreciated,
       | and if spending time on it means you publish fewer papers it's a
       | net negative for your career.
       | 
       | I'd settle for just publishing the code at all, even if it is a
       | tangled mess. This is still not all that common in the natural
       | sciences, though I have a bit of hope this will change.
        
         | hnedeotes wrote:
         | Yeah I mean, if your study is implying the apocalypse (or even
         | if not, but more so if that's the case) you better put the code
         | there, because that's what the scientific method requires. How
         | should I believe your conclusions and cute graphs if I can't
         | see how you arrived at them? Maybe it was drawn in Narnia for
         | all I know, maybe it has significant errors, or it's so tailored
         | to produce those results that it's irrelevant.
         | 
         | And if the tools and methods you used for arriving at them are
         | so messy that you dare not publish them, what does that tell me
         | about: - your process; - the organisation of your ideas; - the
         | conclusions or points made in the paper?
         | 
         | I don't mean it has to be idiomatic, well-written code, but it
         | should be readable enough to be followed.
        
       | [deleted]
        
       | mkl95 wrote:
       | If you want to write good research software, a good way is to
       | have professional developers implement it.
       | 
       | I worked closely with an NLP researcher for a while on a project
       | that had received a hefty state grant. She knew more or less what
       | her team needed, but she needed someone to implement it cleanly
       | and in a way that would not make users step on each other's toes.
       | 
       | The chances of that project being a buggy mess would have been
       | pretty high if it had been written by people who don't write
       | software for a living. And maybe that's OK.
        
         | mattkrause wrote:
         | Here's the problem with hiring a pro.
         | 
         | The workhorse NIH grant[0] is an R01 with a $250,000/year x 5
         | years "modular" budget. Most labs have, at most, one. Some have
         | two, and a very few have more than that. This covers everything
         | involved in the research: salaries (including the prof's),
         | supplies, publication fees, etc. Suppose you find a programmer
         | for $75k. With benefits/fringe (~31% for us, all-in), that's
         | nearly $100k/year. If the principal investigator (prof,
         | usually) takes a similar amount out of the grant, there's very
         | little money left to do the (often very expensive) work. In
         | contrast, you can get a student or postdoc for far less--and
         | they might even be eligible for a training grant slot, TAship,
         | or their own fellowship, making their net cost to the lab ~$0.
         | 
         | This would be easy to fix: the NIH already has a program for
         | staff scientists, the R50. However, they fund like two dozen
         | per year; that number should be way higher.
         | 
         | [0] Other mechanisms exist at the NIH--and elsewhere--but NSF
         | (etc) grants are often much smaller.
        
           | qudat wrote:
           | > In contrast, you can get a student or postdoc for far less
           | 
           | Yeah I totally agree on this part. The academic system relies
           | not on monetary compensation for its labor; rather, it
           | rewards people with reputation by getting their names on a
           | paper.
           | 
           | I worked essentially for free for a lab in my spare time for
           | 4 years. They get to the result they want, even if it's built
           | on a shaky foundation, and for basically free (it doesn't
           | cost anything to put a name on a paper). At the end of the 4
           | years the dream of getting my name on a paper didn't even pan
           | out (lab was ramping down and was essentially a teaching
           | research lab by the time I showed up).
        
       | einpoklum wrote:
       | One of the things which has helped derail my own research career
       | [1] is the tendency to not write tangled-mess code, and to
       | publish and maintain much of my research code after I was
       | supposed to be done with it.
       | 
       | Annoyingly, more people now know of me due to those pieces of
       | software than for my research agenda. :-(
       | 
       | [1] : Not the only thing mind you.
        
         | jimmyvalmer wrote:
         | C'mon, no good deed goes unpunished. Everyone knows that.
        
       | nirse wrote:
       | What doesn't really get mentioned in the article, is that a lot
       | of academic software was written by a single developer. All
       | bigger software projects, academic or not, that were only built
       | and maintained by a single person tend to become messier and
       | messier with time. Perhaps most software suffers from this - that
       | over time it becomes a mess - but having more developers look at
       | the code (and enough time, and many other factors) can certainly
       | help to keep things in better shape.
        
         | Robotbeat wrote:
         | Plus it becomes impossible to get multiple developers to work
         | on the code if they can't understand it because of its
         | messiness, so there's a bit of survivor's bias and stronger
         | motivation to clean the code up to be comprehensible to others
         | when you have multiple people working on it.
         | 
         | Also, I feel personally attacked by the headline. :)
         | 
         | It is for this reason I try to keep my code and models pretty
         | simple, only two or three pages of code (or ideally a single
         | page), and I don't try to do too many things with one program,
         | and I choose implementations and algorithms that are simpler to
         | implement to make concise code feasible (sometimes at the
         | expense of speed or generality).
        
       | SavageBeast wrote:
       | I'm currently refactoring a fairly large piece of research code
       | myself. It was written with lean startup thinking in that a
       | little code ought to produce some value in its results. If I was
       | able to eke some usefulness out of this code, then I'd put more
       | energy into it. Otherwise I was perfectly happy to Fail Fast and
       | Fail Cheap.
       | 
       | How did it become such a mess in the first place? Simple - I
       | didn't know my requirements when I started writing it. I built it
       | to do one thing. In running it I learned more things (this is
       | good - why you build stuff like this in the first place). The
       | code changed rapidly to accommodate these lessons.
       | 
       | It wasn't long before I was running into limitations in the
       | design of the underlying libs I was using etc. Of course I could
       | find a way to make it work but it wasn't going to win any
       | Software Design Awards.
       | 
       | I'm happy to report that despite ending up a tangled mess, it
       | actually helped me to come to understand and conquer a very
       | specific kind of problem. In doing so I learned the limitations
       | of commercially available tooling, the limitations of
       | commercially available data, not to mention a great deal about
       | the problem domain itself.
       | 
       | This research software has earned its keep and is now being
       | cleaned up into a more organized, near commercial quality kind of
       | project. I'm glad I threw out "architecture" when I first started
       | with this. It could have gone the other way where I had a very
       | well built piece of code that didn't in fact perform any useful
       | function.
        
         | analog31 wrote:
         | I believe that of all the lessons to come from contemporary
         | software development, _constant refactoring_ may be the most
         | valuable.
         | 
         | The spaghetti monster looms large when you're in the heat of
         | battle. But we've all got some idle time for whatever reason. I
         | spend some time every week doing a couple of things: 1) Reading
         | about good techniques. 2) Working through old code and cleaning
         | it up.
         | 
         | Because changing your code could always break it, refactoring
         | also reinforces the habit of writing code that can be readily
         | tested -- also a good thing.
        
           | amelius wrote:
           | Makes you wonder why most languages don't come with good
           | refactoring tools.
        
             | bobthepanda wrote:
             | At least in the languages I work primarily in (JS and
             | Java), I find my IDE to be pretty good at analyzing a lot
             | of it.
             | 
             | Refactoring is kind of subjective, because there is rarely
             | One Right Way to solve a problem, and you need context, so
             | I could see why it's not something that languages
             | themselves take strong opinions on.
        
             | realusername wrote:
             | I don't think refactoring tools are that useful for
             | refactoring; most of the time you are doing non-obvious
             | refactoring that tools can't help with anyway. It depends
             | what we call "refactoring": tools are mostly useful for
             | what I would call "housekeeping".
        
               | amelius wrote:
               | Well, every refactoring can be seen as a series of
               | correctness-preserving housekeeping operations.
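               | 
               | A toy sketch of that view (the function and numbers
               | are made up): two mechanical, behaviour-preserving
               | steps - introduce a named constant, extract a helper
               | - add up to the "real" refactoring.
               | 
               |   def total(items):            # original
               |       t = 0
               |       for p, q in items:
               |           t += p * q * 1.2
               |       return t
               | 
               |   TAX = 1.2                    # step 1: name the constant
               | 
               |   def line_total(price, qty):  # step 2: extract a helper
               |       return price * qty * TAX
               | 
               |   def total_refactored(items):
               |       return sum(line_total(p, q) for p, q in items)
               | 
               |   basket = [(10.0, 2), (3.5, 4)]
               |   assert total(basket) == total_refactored(basket)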
        
             | cratermoon wrote:
             | Safe automatic refactoring requires the ability to do
             | static analysis of the code. Many refactorings are harder
             | in loosely-typed languages.
        
               | grahamlee wrote:
               | Refactoring tools were invented in Smalltalk and worked
               | just fine.
        
               | disgruntledphd2 wrote:
               | This has always surprised me, since I learned it.
               | 
               | What are the features of Smalltalk that allowed this to
               | happen? Conversely, what is stopping this from existing
               | in more modern dynamic languages?
        
               | xkriva11 wrote:
               | Smalltalk has simple and strong reflective features.
               | Moreover, it does not distinguish between the developed
               | program and the IDE. This means that doing things
               | like that is very natural and well established in the
               | Smalltalk cultural background.
        
               | grahamlee wrote:
               | Indeed. Having the whole system in front of you, and
               | knowing how patterns like MVC or Thing-Model-View-Editor
               | encapsulate parts of it, makes it very easy to "reason
               | about" wholesale changes to the system.
        
             | analog31 wrote:
             | I should have added that I am probably abusing the term
             | _refactoring_ if it has a precise definition. What I'm
             | talking about is working on improving the readability of my
             | code, but also improving its structure. Today, "spaghetti"
             | probably doesn't refer to a tangled mess of code sequences
             | because we've gotten rid of the GOTO, but to tangled
             | interactions between modules, many of which are vestigial.
             | 
             | A lot of my code interacts with hardware configurations
             | that will cease to exist when a project is done, but I
             | mainly look at the stuff that's potentially reusable, and
             | making it worth re-using.
             | 
             | I'm using Python, and there are a lot of tools for
             | enforcing coding styles and flagging potential errors. I
             | try to remove all of the red and yellow before closing any
             | program file. I don't trust myself with too much
             | automation! "Walk before you run."
        
           | titanomachy wrote:
           | That was my exact thought reading this.
           | 
           | I used to write some crazy spaghetti code as an untrained
           | student working in a lab. Coding would go really quickly at
           | first, but as I kept adding on to accommodate new
           | requirements it became a huge kludgy mess.
           | 
           | Recently (after quite a few years of software engineering
           | experience) I helped a researcher friend to build some
           | software. He was following along with my commits and asked
           | why I kept changing the organization and naming of the code,
           | pulling things out into classes, deleting stuff that he
           | thought might be needed later, etc. He spends only a small
           | part of his time writing code, so he's never realized how
           | much time it actually saves to keep things organized and
           | well-factored.
        
         | hyperpallium2 wrote:
         | I like Brooks' "plan to throw one away; you will, anyhow.":
         | 
         |  _This [first] system acts as a "pilot plan" that reveals
         | techniques that will subsequently cause a complete redesign of
         | the system._
         | 
         | However, in practice I'm not confident enough in my
         | understanding, and fear losing all that hard-won work, so I
         | refactor too.
         | 
         | A rewrite from scratch is probably more viable when the project
         | is small enough to keep in your head at once.
        
           | titanomachy wrote:
           | Brooks has since amended this[1] to say that he really meant
           | it in the context of traditional "waterfall" development,
           | where the first iteration is meticulously planned and
           | designed as a whole system before any code is written at all.
           | 
           | Rapid, iterative prototyping, followed by refactoring, is a
           | perfectly reasonable approach today. No need to create a
           | fresh repository and rewrite all code from scratch.
           | 
           | David Heinemeier Hansson, creator of Rails and a big advocate
           | of building working code as early as possible, wasn't even
           | born in 1975 when the mythical man-month was written. Linus
           | Torvalds was a (presumably) plucky 6-year-old. Brooks wrote
           | that book for an audience that would have known waterfall as
           | the only way.
           | 
           | [1] https://wiki.c2.com/?PlanToThrowOneAway
        
           | RangerScience wrote:
           | Good architecture is pretty much just about slicing things up
           | so that rewrites / refactors can happen incrementally, rather
           | than all at once. This can actually go both bottom-up (these
           | functions are easy to re-arrange and don't need a rewrite)
           | and top-down (these functions suck, but I don't have to
           | rearrange anything to replace them).
           | 
           | "Good architecture is the one that allows you to change."
        
             | cratermoon wrote:
             | What if you slice it up wrong?
        
               | RangerScience wrote:
               | Then you're gonna have a shit time of it, and may need to
               | do a total rewrite, instead of an incremental one.
        
       | cryptica wrote:
       | Math and physics are a tangled mess so it's not surprising that
       | mathematicians and physicists write code which looks like a
       | tangled mess. Mathematicians and physicists are trained to handle
       | ambiguous concepts and they can work with weird abstractions
       | which are far detached from reality. Unlike programming
       | languages, the language of math is full of gaps - this requires
       | the reader to make assumptions using past knowledge and
       | conventions. Computers, on the other hand, cannot make
       | assumptions, so the code must be extremely precise and
       | unambiguous.
       | 
       | Writing good code requires a different mindset; firstly, it
       | requires acknowledging that communication is extremely ambiguous
       | and that it takes a great deal of effort to communicate clearly
       | and to choose the right abstractions.
       | 
       | A lot of the best coders I've met struggle with math and a lot of
       | the best mathematicians I've met struggle with writing good code.
        
       ___________________________________________________________________
       (page generated 2021-02-22 23:01 UTC)