[HN Gopher] Research software code is likely to remain a tangled...
___________________________________________________________________
Research software code is likely to remain a tangled mess
Author : hacksilver
Score : 147 points
Date : 2021-02-22 11:04 UTC (11 hours ago)
(HTM) web link (shape-of-code.coding-guidelines.com)
(TXT) w3m dump (shape-of-code.coding-guidelines.com)
| ejz wrote:
| This is actually the opportunity for our startup. I think there
| is generally a great opportunity to be the Databricks of a lot of
| academic software. We're starting in a big research area in
| biology :)
| bjarneh wrote:
| I agree that academia produces its fair share of spaghetti code,
| but I don't think all of his arguments are correct.
|
| > writing software is a low status academic activity
|
| This is just not true. People like: Stallman, Knuth, Ritchie,
| Kernighan, Norvig or Torvalds are not considered as people of low
| status in the academic world.
|
| Writing horrible spaghetti code in academia may be considered
| "low status"; but that's another story.
|
| He should compare apples to apples: do people who work in
| academia write better or worse code there than when they work
| for a business? They should be compared to themselves in
| different situations, not to some imaginary high coding
| standard that I've never seen anywhere.
|
| In my own experience from academia, at least, the lack of
| deadlines, the freedom to do whatever I want, and the lack of
| management create much higher quality software. When you work
| commercially, you will churn out embarrassing stuff just to make
| something work before a deadline.
| pca006132 wrote:
| I think rather than being a low status academic activity, it is
| just not valued in academia... The code is usually just a by-
| product of the paper, and higher quality code does not translate
| to a higher quality paper.
|
| When your code works, you have probably already developed the
| main part of your paper, and you have no incentive to improve
| your program if all you want is publication... At least this is
| what I think.
| bjarneh wrote:
| I've seen many research papers where all they disclose about
| the software is some pseudo code + some tables with timing
| results to prove their "performance gains".
|
| For those types of papers I agree with your statement. But in
| many academic scenarios others will want to inspect the
| source, and the quality of that code is certainly something
| that will add or subtract from your "status" in the academic
| world so to speak :-)
| pca006132 wrote:
| OK, perhaps I should read more papers :)
| abcc8 wrote:
| Agreed. Based on my lab experiences, it might have been more
| accurate for the author to write that often code is not being
| written by individuals who are the intellectual drivers of the
| lab. In many labs the 'thinkers' get far more credit and are
| more valued than the 'doers'.
| bjarneh wrote:
| > In many labs the 'thinkers' get far more credit and are
| more valued than the 'doers'
|
| In terms of software this never made much sense to me. I
| would understand if we were talking chemistry or some other
| discipline where a "new idea" has to be investigated/verified
| by some "lab-rat" doing mundane tasks for 2 years. In that
| case the lab-rat would probably get less credit than the
| person with the actual idea, but this just does not apply to
| software. Developers are not doing mundane tasks on behalf of
| some great thinker.
| abcc8 wrote:
| Rarely are the developers writing grants, generating
| hypotheses, planning experiments, composing manuscripts,
| presenting the lab's work, teaching at the university, etc.
| Perhaps my choice of words was poor, but this should be
| more clear.
| bjarneh wrote:
| I guess this differs from place to place, but at my
| university (Oslo), we did all that..
| nirse wrote:
| >> writing software is a low status academic activity
|
| > This is just not true. People like: Stallman, Knuth, Ritchie,
| Kernighan, Norvig or Torvalds are not considered as people of
| low status in the academic world.
|
| I understand the meaning of 'academic software developers' to
| mean 'software developers that assist in building software for
| other, non-CS, fields of research', but you only mention people
| famous within CS. I don't think this article is meant to apply
| to CS.
| neffy wrote:
| I don't think research code in aggregate is any worse than any
| other source of code. If we had the same kind of visibility into
| all the commercially written code, we would see the same
| pattern: some well structured, some a complete mess, with little
| correlation to the companies concerned but a lot of correlation
| to the ability of the author.
|
| The recent example of Citibank's loan payment interface comes
| immediately to mind. So does Imperial's Covid model (the one that
| had timing issues when run on different computers.)
| AshamedCaptain wrote:
| Exactly. You can imagine most engineering software to be in a
| similar state as research code. It's just that people get to
| see research code.
| currymj wrote:
| as people are saying, the typical software engineering advice
| simply wouldn't work in a research context.
|
| one exception is the most basic stuff - people should use version
| control, do light unit testing, and explicitly track
| dependencies. These weren't really done in the past but are
| becoming more and more common, fortunately.
|
| I think if software engineering experts actually sat down, looked
| at how researchers work with computers, and figured out a set of
| practices to follow that would work well in the research context,
| they could do a lot of good. This is really needed. But the
| standard software engineering advice won't work as it is, it has
| to be adapted somehow.
| pydry wrote:
| Another issue is that the standard software engineering advice
| doesn't guarantee clean code either.
| orange_tee wrote:
| Well as somebody who has written research software, I don't agree
| that research software is a "tangled mess". A couple of points,
|
| 1. often when I read software written by professional
| programmers I find it very hard to follow because it is too
| abstract; almost every time I try to figure out how something
| works, it turns out I need to learn a new framework and API. By
| contrast, research code tends to be very self contained
|
| 2. when I first wrote research software I applied all the
| programming best practices and was told these weren't any good;
| it turns out using lots of abstraction to increase modularity
| makes the code much slower (this is language dependent of
| course)
|
| 3. you will find it much harder to read research code if you
| don't understand the math+science behind it
|
| > many of those writing software know very little about how to do
| it
|
| This is just not true. In my experience, people writing research
| software have a very specific skillset that very, very few
| industry programmers are likely to have. They know how to write
| good numerics code, and they know how to write fast code for
| supercomputers. Not to mention, interpreting the numerics theory
| correctly in the first place is not a trivial matter either.
| deklund wrote:
| As someone who's worked for a large part of my career as a sort
| of bridge between academia and industry (working with
| researchers to implement algorithms in production), both you
| and the original author are right to an extent.
|
| On one hand, academics I've worked with absolutely undervalue
| good software engineering practices and the value of
| experience. They tend to come at professional code from the
| perspective of "I'm smart, and this abstraction confuses me, so
| the abstraction must be bad", when really there's good reason
| to it. Meanwhile they look at their thousands of lines of
| unstructured code, and the individual bits make sense so it
| seems good, but it's completely untestable and unmaintainable.
|
| On the other side, a lot of the smartest software engineers
| I've known have a terrible tendency to over-engineer things.
| Coming up with clever designs is a fun engineering problem, but
| then you end up with a system that's too difficult to debug
| when something goes wrong, and that abstracts the wrong things
| when the requirements slightly change. And when it comes to
| scientific software, they want to abstract away mathematical
| details that don't come as easily to them, but then find that
| they can't rely on their abstractions in practice because the
| implementation is buried under so many levels of abstraction
| that they can't streamline the algorithm implementation to an
| acceptable performance standard.
|
| If you really want to learn about how to properly marry good
| software engineering practice with performant numerical
| routines, I've found the 3D gaming industry to be the most
| inspirational, though I'd never want to work in it myself. They
| do some really incredible stuff with millions of lines of code,
| but I can imagine a lot of my former academia colleagues
| scoffing at the idea that a bunch of gaming nerds could do
| something better than they can.
| acmj wrote:
| > _a lot of the smartest software engineers I've known have a
| terrible tendency to over-engineer things._
|
| Your definition of "smartest software engineers" is the
| opposite of mine. In my view, over-engineering is the symptom
| of dumb programmers. The best programmers simplify complex
| problems; they don't complicate simple problems.
| deklund wrote:
| I don't know that our definitions are that different. Most
| of the over-engineering I've seen in practice was done in
| the name of simplifying a complex problem, but resulted in
| a system that was too rigid to adapt. Our definition of
| "over-engineered" might be different, though.
| taeric wrote:
| Your points apply to industry, too. I heretically push flatter
| code all the time. I'm not against abstraction, but it is easy
| to fall into the trap of building a solution machine while
| missing the solution you need.
| acmj wrote:
| Quite a few professional programmers evaluate the quality of
| code by "look": presence of tests, variable length, function
| length, etc. However, what makes great code is really the code
| structure and logical flow behind it. In my experience, good
| industrial programmers are as rare as good academic
| programmers. Many industrial programmers make a fuss about
| coding styles but are not really good at organizing structured
| code for a medium sized project.
| exdsq wrote:
| Point 1 is so true, I think it's why I like Golang without
| generics so people can't go crazy with abstractions.
| disabled wrote:
| I work on mathematical modeling, dealing with human physiology.
| Likewise, the software packages used can be esoteric, and the
| structure of your "code" can be very different looking, to say
| the least.
|
| This is certainly a lot of work, and it takes a lot of practice
| to do efficiently, but no matter what, I comment every single
| line of code, no matter how mundane it is. I also cite my
| sources in the comments themselves, and I have a bibliography at
| the bottom of my code.
|
| I organize my code in general with sections and chapters, like
| a book. I always give an overview for each section and chapter.
| I make sure that my commenting makes sense for a novice reading
| them, from line-to-line.
|
| I do not know why I do this. I guess it makes me feel like my
| code is more meaningful. Of course it makes it easier to come
| back to things and to reuse old code. I also want people to
| follow my thought process. But, ultimately, I guess I want
| people to learn how to do what I have done.
| tryonenow wrote:
| Not all "research" code is equal. I would imagine that research
| code closer to hard sciences and cutting edges is more difficult
| to keep clean. The trouble is when you're breaking new ground in
| applied science, you don't necessarily know how well the new tech
| will work, and exactly what you'll be able to do with it. As
| development progresses your expectations must be adjusted, and it
| is impossible to forecast the direction of highly experimental
| research since progress is typically incremental and constantly
| dependent on the most recent results.
|
| There are many paths to scaling a mountain, so to speak, and
| sometimes for any of a multitude of reasons you end up on another
| peak long after you've started climbing.
| optiklab wrote:
| I'm an engineer from the other side of researches or science, but
| somehow interested in the topic. Recently, I've learned about
| great work done by Grigori Fursin and entire community of
| reserach engineers with the goal to make research software more
| applicable to the industries by doing it with some kind of
| framework inside. I want to leave some links here, if you don't
| mind to watch it - the talk is called " Reproducing 150 Research
| Papers and Testing Them in the Real World":ACM page with webcast
| https://event.on24.com/wcc/r/2942043/9C904C7AE045B5C92AAB2CF...
|
| Also, source docs available here:
| https://zenodo.org/record/4005773?fbclid=IwAR1JGaAj4lwCJDrkJ...
|
| And, their solution product https://cknowledge.io/ and source
| code https://github.com/ctuning/ck
|
| I guess it should be helpful to the research community.
| brakus127 wrote:
| Structure is great for a well understood problem space, but this
| is not usually the case when working on something novel. As a
| researcher your focus should be on learning and problem solving,
| not creating a beautiful code base. Imposing too many
| constraints early on can negatively impact your project later
| on. In the worst case, your code starts to limit the way you
| think about your research. I agree that there are some general
| best practices that should be applied to nearly all forms of
| coding, but beyond that it's a balance.
|
| The same thinking should be used when adding regulation to an
| industry. Heavy regulation on a rapidly developing industry can
| stifle innovation. Regulation (if needed) should be applied as
| our understanding of the industry increases.
| bordercases wrote:
| Results need to be refined so that the way they were first
| formulated doesn't get in the way of their replication. At
| scale, this too becomes a cost to industry.
|
| In the small, this isn't different from taking a lab notebook
| and making it clearer and better summarized so that it can be
| passed on to the poor sucker who has to do what you did after
| you move on to another project.
|
| Furthermore, software projects that are put under the same
| iterative stress you imply for R&D inevitably go through a
| refactoring phase so that performance isn't affected in the
| long run.
| brakus127 wrote:
| Agreed that there should be a minimum bar for "completed"
| research code such as reproducibility and a clear summary,
| but engineers shouldn't expect the first version of a new
| algorithm to be easy to understand without additional
| material or ready for production without a complete rewrite.
| ellimilial wrote:
| I am actually quite surprised at the figure of 73% of research-
| related code packages not being updated after publication; I was
| expecting it to be higher.
| rovr138 wrote:
| Same. But it could be an issue with the sample. 213 in a span
| of 14 years is not a lot.
|
| Also, a question. If you publish a paper with a repo, what
| would be the best way to handle the version in the paper
| matching the repo in the future?
|
| An opinion: there is such a thing as software being 'done' and
| 'as is'. Software solves a need. After that's met, that's it.
|
| There's also this part that strikes me,
|
| >Given a tangled mess of source code, I think I could reproduce
| the results in the associated paper (assuming the author was
| shipping the code associated with the paper; I have encountered
| cases where this was not true).
|
| And it strikes me as weird. The main obstacle to reproducing
| results is usually the data, and depending on the dataset, it
| can be very hard to get. To be able to reproduce the code, I
| just need the paper.
|
| The code may have bugs, may stop working, may be in a different
| language/framework. The source of truth is the paper. This is
| why the _paper_ was published.
| medstrom wrote:
| >The source of truth is the paper. This is why the _paper_
| was published.
|
| Speaking as someone who's not the best at math, I find it
| easier to understand what a paper is saying after I run the
| code and see all the intermediate results.
|
| When the code doesn't work, it takes me 20 times longer to
| digest a paper. They could do with _only_ uploading code --
| to me it's the shortest and most effective way to express
| the ideas in the paper.
| rovr138 wrote:
| >Speaking as someone who's not the best at math, I find it
| easier to understand what a paper is saying after I run the
| code and see all the intermediate results.
|
| As long as you understand the paper after, that's okay.
|
| > When the code doesn't work, it takes me 20 times longer
| to digest a paper.
|
| What if the data isn't available? That's another issue. I
| see where you're coming from, but that's why the paper
| itself is the source of truth. Not the implementation.
|
| Another case, what if the implementation makes assumptions
| on the data? Or on the OS it's being run on?[0][1]
|
| > They could do with only uploading code -- to me it's the
| shortest and most effective way to express the ideas in the
| paper.
|
| In my opinion, no. The math and algorithm behind it is more
| important than an implementation and better for longevity.
|
| [0] https://science.slashdot.org/story/19/10/12/1926252/pyt
| hon-c...
|
| [1] https://arstechnica.com/information-
| technology/2019/10/chemi...
| speters wrote:
| > Also, a question. If you publish a paper with a repo, what
| would be the best way to handle the version in the paper
| matching the repo in the future?
|
| You can include the hash of the commit used for your paper.
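| For instance, a results script can stamp its own output with the
| commit it ran from. A minimal sketch in Python (assumes the
| script runs inside the repo and git is on the PATH):
|
|     import subprocess
|
|     # Record the exact commit the results were produced from, so
|     # the paper can cite it and readers can check out that tree.
|     commit = subprocess.run(["git", "rev-parse", "HEAD"],
|                             capture_output=True, text=True,
|                             check=True).stdout.strip()
|     print(f"results generated at commit {commit}")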
| rovr138 wrote:
| oh, that's good. Or even a tag
| jimmyvalmer wrote:
| > The source of truth is the paper.
|
| Yes, although truth of the flimsiest kind. A lowly but wise
| code monkey once said "Talk is cheap. Show me the code."
| rovr138 wrote:
| Here's some code. Data is proprietary. There's no paper
| explaining the data, prep, steps to gather, caveats,
| assumptions, etc.
|
| What now?
| jimmyvalmer wrote:
| No need for the reductionist strawman. Some experiments
| cannot be reproduced because of proprietary data. Those that
| can should be.
| jonnycomputer wrote:
| It turns out that maintaining a package is a lot of work, and
| the career benefit after publishing said package and
| accompanying paper is really low.
|
| - writing general purpose software that works on multiple
| platforms and is bug free is really really hard. So you're just
| going to be inundated with complaints that it doesn't work on X
|
| - maintaining software is lots of work. Dependencies change,
| etc.
|
| - supporting and helping an endless number of noobs use your
| software is a major pita. "I don't know why it wouldn't compile
| on your system. Leave me alone."
|
| - "oh that was just my grad work"
|
| - it's hard to get money to pay for developing it further. great
| when that happens though.
| hntrader wrote:
| These are some concepts that I believe in for research code.
|
| Research code shouldn't be a monolith. Each hypothesis should be
| a script that follows a data pipeline pattern. If you have a big
| research question, think about what the most modular progression
| of steps would be along the path from raw data to final output,
| and write small scripts that perform each step (input is the
| output from the previous step). Glue them all together with the
| data pipeline, which itself is a standalone, disposable script.
| If step N has already been run, then running the pipeline script
| once again shouldn't resubmit step N (as long as the input hasn't
| changed since the last run).
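| A minimal sketch of what I mean, in Python (the step functions,
| file names, and state file here are only illustrative):
|
|     import hashlib, json, pathlib
|
|     STATE = pathlib.Path(".pipeline_state.json")
|
|     def file_hash(path):
|         # Fingerprint a step's input so we know when it changes.
|         data = pathlib.Path(path).read_bytes()
|         return hashlib.sha256(data).hexdigest()
|
|     def run_step(name, fn, src, dst):
|         # Run fn(src, dst) unless it already ran on this input.
|         state = json.loads(STATE.read_text()) if STATE.exists() else {}
|         digest = file_hash(src)
|         if state.get(name) == digest and pathlib.Path(dst).exists():
|             return dst  # input unchanged since last run: skip it
|         fn(src, dst)
|         state[name] = digest
|         STATE.write_text(json.dumps(state))
|         return dst
|
|     # Illustrative steps: each reads the previous step's output.
|     def clean(src, dst):
|         text = pathlib.Path(src).read_text().strip()
|         pathlib.Path(dst).write_text(text)
|
|     def summarise(src, dst):
|         n = len(pathlib.Path(src).read_text().splitlines())
|         pathlib.Path(dst).write_text(str(n))
|
|     # The disposable glue script.
|     cleaned = run_step("clean", clean, "raw.csv", "cleaned.csv")
|     run_step("summarise", summarise, cleaned, "summary.txt")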
|
| This "intermediate data" approach is useful because we can check
| for errors each step on the way and we don't need to redo
| calculations if a particular step is shared by multiple research
| questions.
|
| I was taught this by a good mentor and I've been using this
| approach for many years for various ML projects and couldn't
| recommend it more highly.
| porker wrote:
| This, absolutely this.
|
| I looked over a friend's PhD program because the results were
| unstable. I knew nothing about the domain, which was a large
| disadvantage, but on the code front it was a monolith following
| a vague data pipeline approach. Unfortunately components
| wouldn't run separately and there was only a single end-to-end
| test taking hours to run. Had each section had its own tests,
| diagnosing which algorithm(s) were malfunctioning would have
| been easier. We never did.
| ArtWomb wrote:
| There's a huge digital divide forming as well, between the
| hardware a junior software engineer at a well funded research
| institution such as DeepMind has access to and the postdoc in
| Theoretical Physics at Princeton, who is expected not only to
| write software but to maintain hardware for a proprietary
| "supercomputer" that was probably cast off ages ago from a
| government lab or Wall Street.
|
| We don't expect Aerospace / Mechanical engineering students to
| learn metalworking. They typically have access to shop
| technicians for that work. Why not persuade university
| administrators to similarly invest in in-house software
| engineering talent: generalists who can provide services to any
| problem domain, from digital humanities to deep reinforcement
| learning?
| einpoklum wrote:
| > We don't expect Aerospace / Mechanical engineering students
| to learn metalworking. They typically have access to shop
| technicians for that work.
|
| You'd be surprised, but that is often not the case. Lack of
| sufficient funding, technicians being dicks, or mismanagement
| by PIs often results in graduate students having to do the
| technical work of metalwork, welding, lab equipment
| calibration, and a bunch of other tasks. Sometimes they even
| have to operate heavier machinery, or lasers, etc., without the
| minimum reasonable technical staff support.
|
| I know this from my time on the executive committee of my old
| university's Grad Student Organization.
| mattkrause wrote:
| > We don't expect Aerospace / Mechanical engineering students
| to learn metalworking.
|
| Umm...we sorta do.
|
| As a neuroscience postdoc, I have done virtually everything
| from analysis to zookeeping, including some (light)
| fabrication. We outsource really difficult or mass-production
| stuff to pros, and there's a single, very overworked machinist
| who can sometimes help you, but most of the time it's DIY.
| analog31 wrote:
| >>> writing software is a low status academic activity; it is a
| low status activity in some companies, but those involved don't
| commonly have other higher status tasks available to work on.
|
| If measured by compensation, then _research_ is a low status
| activity. Perhaps more precisely, researchers have low bargaining
| power. But I don't think that academics actually analyze
| activities in such detail. The PI might not even know how much
| programming is being done.
|
| The researcher is programming, not because they see it as a way
| to raise (or lower) their status, but because it's a force
| multiplier for making themselves more productive overall. Though
| I work in industry, I'm a "research" programmer for all intents
| and purposes. I program because I need stuff right away, and I do
| the kind of work that the engineers hate. Reacting to rapidly
| changing requirements on a moment's notice disrupts their long
| term planning. Communicating requirements to an engineer who
| doesn't possess domain knowledge or math skills is painful.
| Often, a working piece of spaghetti code that demonstrates a
| process is the best way to communicate what I need. They can
| translate it into fully developed software if it threatens to go
| into a shipping product. That's a good use of their time and not
| of mine.
|
| >>> Why would a researcher want to invest in becoming proficient
| in a low status activity?
|
| To get a better job. I sometimes suspect that anybody who is good
| enough at programming to get paid for it, is already doing so.
|
| >>> Why would the principal investigator spend lots of their
| grant money hiring a proficient developer to work on a low status
| activity?
|
| Because they don't know how to manage a developer. Software
| development is costly in terms of both time and effort, and
| nobody knows how to manage it. Entire books have been written on
| this topic, and it has been discussed at length on HN. A software
| project that becomes an end unto itself or goes entirely off the
| rails can eat you alive. Finding a developer who can do
| quantitative engineering is hard, and they're already in high
| demand. It may be that the PI has a better chance managing a
| researcher who happens to know how to translate their own needs
| into "good enough" code, than to manage a software project.
| civilized wrote:
| I see people here saying research is like writing software with
| fast-changing requirements. I can see how that could seem like an
| adequate analogy to a software engineer, but it's not.
|
| Researchers use code as a _tool of thought_ to make progress on
| very ambiguous, high-level problems that lack pre-existing
| methodology. Like, how could I detect this theoretical
| astrophysical phenomenon in this dataset? What would it take to
| predict disease transmission dynamics in a complex environment
| like a city? Could a neural network leveraging this bag of tricks
| in some way improve on the state-of-the-art?
|
| If you have JIRA tickets like that in your queue, _maybe_ you can
| compare your job to that of a researcher.
| yudlejoza wrote:
| This topic is near and dear to my heart and at a quick glance, I
| pretty much agree with all/most of this post.
|
| I gained multiple years of industry software engineering
| experience before joining academia (non-CS, graduate-level). And
| I was flabbergasted at the way software and programming are
| treated in a research setting where the "domain" is not CS or
| software itself. It took me a few years just to get a hint of
| what on earth these people (my collaborators who program side-by-
| side with me) are thinking, and what kind of mindset they come
| from.
|
| Then I took a short break and went to the industry. Software
| engineering, hardcore CS; no domain, no BS. I was expecting that
| it would feel like an oasis. It didn't. Apart from a handful of
| process improvements, like use of version control, issue
| tracking, deadline-management, the quality of the tangled mess of
| the code was only slightly better.
|
| Initially I took away the lesson that it's the same in academia
| and industry. But on further reflection there are two big
| differences:
|
| - The codebase I worked on in the industry was at least 10x
| bigger. Despite that, the quality was noticeably better.
|
| - More importantly, I could connect with my coworkers in
| industry. If I raised a point about some SwE terminology like
| test-driven dev, agile, git, whatever, I could have a meaningful
| discussion. Whereas in academia, not only did most domain
| experts know jack about 90% of software-engineering concepts and
| terminology, they were experts at hiding their ignorance, and
| would steer the conversation in a way that you couldn't tell
| whether they really didn't know or knew too much. I never got
| over that deceitful ignorance mixed with elitist arrogance.
|
| In the end, I do think that, despite enormous flaws, the industry
| is doing way better than academia when it comes to writing and
| collaborating on software and programming, and that the side-by-
| side comparison of actual codebases is a very small aspect of it.
| screye wrote:
| > writing software is a low status academic activity
|
| Yep, that's the one liner right there.
|
| The incentives simply do not match the complaints. Researchers
| already work upwards of 60 hrs/wk on most occasions. Alongside
| writing code, they also have to do actual research, write papers,
| give talks and write grants.
|
| All of the latter tasks are primary aspects of their jobs and are
| commensurately rewarded. The only situation where a well coded
| tool is rewarded is when a package blows up, which is quite
| rare.
|
| Like all fields, the high-level answer to such questions is
| rather straightforward. The individual contributors align their
| efforts to the incentives. Find a way to incentivize good
| research code, and we will see changes overnight.
| dariosalvi78 wrote:
| I think that incentives play a big role here. Software has near-
| zero value in academic evaluation, and its updating and
| maintenance even less. The only way to make research software
| survive is to offer packages that other researchers can also
| use. Maybe.
| Frost1x wrote:
| This is changing drastically. The issue is that more and more
| science relies heavily on computation. Analytic platforms,
| computational science, modeling/simulation, etc. There's less
| "bench" science and more of the scientific process is being
| embedded in software.
|
| There's a certain degree of naivety in this process, in that
| SMEs think translating their research into software is a trivial
| step. It's not, not if you demand the rigor science should be
| operating at. As such, many budgets are astronomically lower
| than they should be. This has worked in the past, but as more
| science moves into software and it becomes more critical to the
| process, you must invest in the software and it's not going to
| be cheap. The shortcuts taken in the past won't cut it.
|
| There's a bigger issue in that as a society we don't want to
| invest in basic research, so it's already cash strapped. Combine
| research scientists who already have to cut corners with the
| massive cost quality software demands, and you're creating a
| storm where science will either produce garbage or we'll need to
| reevaluate how we invest in software systems for science.
| asdf_snar wrote:
| This article seems to cover research software that even can be
| built. I claim the majority of _code_ written to support research
| articles is a collection of scripts written to produce figures to
| put in the paper. Even when the article is about an algorithm,
| the script that runs this algorithm is just good enough to
| produce the theoretically expected results; it is never tested,
| reproduced, or published, never mind being updated after
| publication.
|
| While others here point out that researchers = bad programmers is
| a lazy excuse, I think it is important to point out just how
| steep the learning curve of computer environments can be for the
| layperson that uses Excel or MATLAB for all their computational
| work. It can be a huge time investment to get started with tools,
| such as git or Docker, that we take for granted. I think
| recognizing this dearth of computer skills is a first step
| towards training researchers to be computer-competent. Currently,
| I find the attitude among academics (especially theorists) to be
| dismissive of the importance of such competencies.
| Doctor_Fegg wrote:
| > it is never tested, reproduced, or published
|
| This never ceases to amaze me. I regularly read recent papers
| on shortest-path algorithms. Each one is religiously
| benchmarked down to the level of saying what C++ compiler was
| used. But the code itself is almost never published.
| txdv wrote:
| Reproducibility is a major principle of the scientific method.
|
| Yet computer scientists consistently fail to achieve
| reproducibility with a tool that is the most consistent at
| following instructions - the computer.
|
| Even private business is on board with the DevOps movement,
| because they see the positive effects of reproducibility.
|
| If the academic world is truly about science, then there is no
| more excuse, the tools are out there, they need to use them.
| Frost1x wrote:
| This is really an artifact of unreasonable expectations and
| modern software ecosystems. When I say unreasonable
| expectations, the issue is that people assume they can use
| the latest greatest trendy library and get reproducible
| results. Good luck on the level of determinism you're looking
| for.
|
| You need to step back and look at more mature, simple
| codebases and what you can do in those sorts of environments
| when you want reproducibility. You can't cobble together a
| bunch of async services in the cloud and hope your
| Frankenstein tool gives you perfect results. It will give you
| good enough results for certain aspects if you focus on those
| specific aspects (banking does a good job of this with
| transactional processing and making sure values are
| consistent because it's their entire business, maybe your
| account or their web interface is screwy but that's fine,
| that can fail).
| Dumblydorr wrote:
| I am a research scientist published via R, Stata, and Excel
| analyses. My code documents wouldn't be helpful since the data
| is all locked up due to HIPAA concerns. We're talking names,
| health conditions, scrambled SSN, this isn't reproducible
| because the data is locked to those without security clearance.
|
| The code itself is a ton of munging and then some basic stat
| functions. This information can be gleaned from the methods
| section of the article anyway.
|
| So, really, my field of public health doesn't use GitHub or
| sharing much, there's simply too little benefit to the
| researcher to share their code.
|
| There's an unwarranted fear of getting your work poached. In
| modern science, publications are everything, they determine
| your career. Enabling your direct competitors, those who want
| the same grants and students and glories, is not common in
| science.
| asdf_snar wrote:
| I don't disagree with you on any points. I have some academic
| friends who mostly do "a ton of munging and then some basic
| stat functions", as you say (but with less sensitive data).
| The problem is that their workflow is prone to human error.
| Even though the stat functions are simple, the proper
| labeling of inputs and outputs is less reliable.
|
| I have some research published for which I wrote MATLAB code
| years ago. I trust the fundamental results but not the values
| displayed in the tables. I would have personally benefited
| from rudimentary version control and unit testing.
| vharuck wrote:
| As a public health statistician, I am very grateful to all
| researchers who publish code. More so for those who publish
| packages that make their techniques easy to use. I am not an
| expert in the field of statistics, just a grunt applying what
| you guys devise. It takes a while for me to do enough
| research and testing to be sure I'm correctly implementing
| new techniques. Even a basic pseudo-code walkthrough would
| immensely help.
|
| >My code documents wouldn't be helpful since the data is all
| locked up due to HIPAA concerns. We're talking names, health
| conditions, scrambled SSN, this isn't reproducible because
| the data is locked to those without security clearance.
|
| Is there a standard format for this kind of data? If so,
| consider using it. That way, others can easily create
| artificial datasets to test it. Even if you have no control
| over your data source, you can convert the raw data to the
| standard as a "pre-munging" step.
|
| >So, really, my field of public health doesn't use GitHub or
| sharing much, there's simply too little benefit to the
| researcher to share their code.
|
| Sad but true.
| stult wrote:
| In well designed software, data ingestion should be easily
| separable from the core logic of the application. Which is
| the point the parent comment is making. Some basic best
| practices would allow you to share your core code without
| implicating HIPAA. Even if it's just basic stats, sharing the
| code makes it easier to reproduce your results and to check
| your logic.
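| To sketch what that separation can look like (all the names
| here are made up, and the private loader is only hinted at):
|
|     # analysis.py -- the shareable part: pure logic, no
|     # protected data baked in, works on records however obtained.
|     def rate_per_1000(cases, population):
|         return 1000.0 * cases / population
|
|     def summarize(records):
|         # records: iterable of (region, cases, population) tuples
|         return {r: rate_per_1000(c, p) for r, c, p in records}
|
|     # ingest.py -- kept private: the only code that touches the
|     # HIPAA-protected extract. Never published.
|     # records = load_protected_extract("/secure/extract.csv")
|     # print(summarize(records))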
|
| Although I agree with your analysis that enabling competitors
| in science is not common, it really, really should be. That's
| kinda the point of publication, at least in theory. Sharing
| knowledge and methods.
| jimmyvalmer wrote:
| > enabling competitors in science ... really, really should
| be.
|
| Said someone whose livelihood doesn't depend on said
| competition.
| RocketSyntax wrote:
| jupyterlab github issue advocating for documenting research:
| https://github.com/jupyterlab/team-compass/issues/121
| lmilcin wrote:
| So here is a simple fact.
|
| It does not make sense to judge any piece of code that does not
| meet the "highest standard" as a tangled mess.
|
| There are valid reasons to have varying quality of code, and the
| idea of quality itself may change from problem to problem and
| project to project.
|
| The quality of code that governs your car's ECU should be
| different from the quality of code that some research team threw
| together to demonstrate an idea.
|
| A coding project should achieve some kind of goal or set of
| goals as efficiently as possible, and in many valid cases
| quality is just not high on the list, for good reason.
|
| Right now I am working on a PoC to verify an idea that will take
| a longer time to implement. We do this because we don't want to
| spend weeks on development just to see that it doesn't work or
| that we want to change something. So spending 2-3 days to avoid
| a significant part of the risk of the rest of the project is
| fine. It does not need to be spelled out that the code is going
| to be incomplete, messy and maybe buggy.
|
| There is also something to be said for research people actually
| focusing on something else.
|
| Professional developers focus their careers on a single problem
| -- how to write well (or at least they should).
|
| But not all people do. Some people actually focus on something
| else (physics maybe?) and writing code is just a tool to achieve
| some other goals.
|
| If you think about people working on UIs and why UI code tends
| to be so messy, this is probably also why: these guys focus on
| something else entirely, and the code is there just to animate
| their graphical design.
| sdwvit wrote:
| Yeah, but you spend more time debugging than if you write it
| once with a good architecture and unit tests, let's say.
| bsenftner wrote:
| Not all research software is a tangled mess. I have extensively
| worked as a "quant" (before the term was popular) for math,
| medical, network, media, and physics researchers as my side gig
| for decades. I'd say about 1/3 of the home brewed research
| software is constructed with fairly reasonable assumptions, the
| authors are scientists after all, and I am able to grow their
| basic setup into a framework they intimately understand and
| prefer to use. More than once I've found brilliantly engineered
| software not unlike what I'd find at a pro software development
| firm.
| f6v wrote:
| Keep in mind that there're different kinds of research software.
| Take Seurat[1] as an example. There's CI, issue tracking, etc. It
| might not be the prettiest code you've ever seen, but it
| absolutely has to be maintainable as it's being actively
| developed. Such projects are rare, but low quality is often an
| indication of software that isn't used by anyone.
|
| 1. https://github.com/satijalab/seurat
| cratermoon wrote:
| Also things like EISPACK, BLAS, LINPACK and so on, for FORTRAN.
| Back in the 70s my dad worked a bit with them when he was
| employed by UTHERCC: the University of Texas Health,
| Education, and Research Computer Center. You can find
| references to UTHERCC in papers from that era.
|
| Come to think of it, something like UTHERCC might be exactly
| what is needed to help the current situation.
| deeeeplearning wrote:
| Why is this surprising? Has anyone been inside a Chemistry or Bio
| lab? You think that what happens in those labs to get research
| done is industrial grade?
| chilukrn wrote:
| I agree the post makes valid points, but is there anything new in
| that? It has been discussed several times here and on other
| forums as well. "RSE" is just another made-up position with a
| very average pay structure -- even this is not new.
|
| However, RSEs (or just general software training) may help
| research groups establish a structure on how to format code, put
| some standards in place, and at least have some basic tests. This
| way, more people can read/modify the code efficiently (more = not
| necessarily general public, but it at least helps incoming grad
| students/postdocs to pick up the project easily).
| TomMasz wrote:
| I once interviewed for a programming job that was a bit of a
| bait and switch. The hiring manager showed me a foot-high stack
| of green bar paper that was Fortran code written by optical
| scientists that I was expected to convert to C. He was somewhat
| surprised when I declined and ended the interview. I pity whoever
| got stuck with that task.
| nowardic wrote:
| A nice counter example of research software code that adheres to
| general software engineering best practices and is easy to pick
| up and use is the OSMNX project: https://github.com/gboeing/osmnx
|
| Props to Geoff for setting a nice standard.
| cratermoon wrote:
| I was reminded that there are research packages like LINPACK,
| BLAS, and EISPACK for FORTRAN (and some other languages) that
| have been maintained since the 70s and are still in use.
|
| Back in the 70s my dad was working for an organization called
| UTHERCC, the University of Texas Health, Education, and Research
| Computer Center, and these libraries were some of the code he
| worked with.
|
| You can find references to UTHERCC in papers from the time,
| although I don't think it exists under that name. Maybe
| institutions need something like UTHERCC as an ongoing department
| now.
| milliams wrote:
| Disclaimer: I am one of the trustees of the mentioned charity,
| The Society of Research Software Engineering.
|
| You say that you don't see it having much "difference with
| regard to status and salary". The problem here is two-fold.
| Firstly,
| salaries at UK universities are set on a band structure and so an
| RSE will earn a comparable amount to a postdoc or lecturer. These
| aren't positions that are known for high wages and historically
| the reason that people work in research is not for a higher
| salary.
|
| As for status, I can see that the creation of the Research
| Software Engineer title (since about 2012) has done great good
| for improving the status of people with those skills. Before they
| were "just" postdocs with not many papers but now they can focus
| on doing what they do best and have career paths which recognise
| their skills.
|
| My role (at the University of Bristol -
| https://www.bristol.ac.uk/acrc/research-software-engineering...)
| is focused almost entirely on teaching. I'm not trying to create
| a new band of specialists who would identify as RSEs but rather
| provide technical competency for people working in research so
| that the code they write is better.
|
| There is a spectrum of RSEs, from primarily research-focused
| postdocs who write code to support their work, all the way to
| full-time RSEs whose job is to support others with their
| research (almost a
| contractor-type model). We need to have impact all the way along
| that spectrum, from training at one end to careers and status at
| the other.
|
| For more info on the history of the role, there's a great article
| at https://www.software.ac.uk/blog/2016-08-17-not-so-brief-
| hist... written by one of the founding members of the Society of
| Research Software Engineering.
| k__ wrote:
| I did some research projects, but the problem is that they are a
| mix of regular projects and experiments.
|
| Things like Nix worked out great, but other stuff I saw is a
| tangled mess of Java grown over the last 10 years, written by 30
| different students who didn't talk to, let alone know, each
| other.
| shadowgovt wrote:
| One of the biggest eye-openers for me as an undergrad was when,
| upon getting to the point where I'd have to decide whether to
| pursue graduate education or exit academia and join the
| workforce, I began to look at the process for publishing novel
| computer science.
|
| To be clear, novel computer science is valuable and the lifeblood
| of the software engineering industries. But the actual product? I
| discovered of myself that I like quality code more than I like
| novel discovery, and the output of the academic world ain't it.
| Examples I saw were damn near pessimized... not just a lack of
| comments, but single-letter variables (attempting to represent
| the Greek letters in the underlying mathematical formulae) and
| five-letter abbreviated function names.
|
| I walked away and never looked back.
|
| If there's one thing I wish I could have told freshman-year me,
| it's that software as a discipline is extremely wide. If you find
| yourself hating it and you're surprised you're hating it, you may
| just be doing the kind that doesn't mesh with your interests.
| sumanthvepa wrote:
| What I find really surprising about research software is that
| even people in Computer Science write poorly designed code as
| part of their research. I would have imagined that they would be
| better qualified to create good code. Just goes to show that
| Software Engineering != Computer Science.
| knuthsat wrote:
| I think there's not enough researchers that publish code.
|
| For example, discrete optimization research (nurse rostering,
| travelling salesman, vehicle routing problem, etc.) is filled
| with papers where people are evaluating their methods on public
| benchmarks but the code never sees the light of day. There are a
| lot of state-of-the-art methods that never have their code
| released.
|
| I'm pretty sure it's like that elsewhere. Machine learning and
| deep learning for some reason has a lot of code in the open but
| that's not the norm.
|
| I'd prefer the code to be open first. Once that's abundant then I
| might prefer the code to also be well designed.
| lou1306 wrote:
| > I think there's not enough researchers that publish code.
|
| I agree, although lately there's been some effort by academia
| to make authors publish their code, or at least disclose it to
| the reviewers.
|
| Several conferences have an artifact evaluation committee,
| which tries to reproduce the experimental part of submitted
| papers. Some conferences actually _require_ a successful
| artifact evaluation to be accepted (see, for instance, the tool
| tracks at CAV [1] and TACAS [2]).
|
| Others, while not requiring an artifact evaluation, may
| encourage it by other means. The ACM, for instance, marks
| accepted papers with special badges [3] reflecting how well the
| alleged findings can be reproduced and whether the code is
| publicly available.
|
| [1] http://i-cav.org/2021/artifact-evaluation/
|
| [2] https://etaps.org/2021/call-for-papers
|
| [3] https://www.acm.org/publications/policies/artifact-review-
| an...
| cratermoon wrote:
| This feels like the right approach. If peer review were to
| include artifact evaluation, including some kind of code
| review, and require certain standards be met for acceptance,
| things would change. As others have noted here, the
| mechanisms of grant-funded work strongly discourage attention
| to code quality, and that would have to change as well.
|
| I'm not in academia now, but I started out my career doing
| sysops and programming in a lab at a medical school and have
| worked with academics a bit since. I don't do it much because
| it's basically volunteer work, and it's almost impossible to
| contribute meaningfully unless you are also well-versed in
| the field.
| svalorzen wrote:
| I don't really agree with the reasons given, even though my
| conclusions are the same. The main reason why research code
| becomes a tangled mess is due to the intrinsic nature of
| research. It is highly iterative work where assumptions keep
| being broken and reformed depending on what you are testing and
| working on at any given time. Moreover, you have no idea in
| advance where your experiments are going to take you, thus giving
| no opportunity to structure the code in advance so it is easy to
| change.
|
| To make a concrete example, imagine writing an application where
| requirements changed unpredictably every day, and where the scope
| of those changes is unbounded.
|
| The closest to "orderly" I think research code can become would
| be akin to Enterprise style coding, where literally everything is
| an interface and all implementation details can be changed in all
| possible ways. We already know how those codebases tend to end..
| Bukhmanizer wrote:
| As someone who has been on both the research and industry
| software end, there's really not that much difference.
| Requirements change, you build that into your plans. Frankly, a
| lot of best practice software development that gets totally
| ignored by academia (e.g. OOP) can handle this exact case, and
| makes things way more flexible.
|
| If the problem was only unpredictability, then projects with a
| clear and defined end goal (e.g., a website to host results)
| would be of substantially higher quality. But they're not. Well
| defined projects tend to end up basically just as crappy as
| exploratory projects.
|
| The problem is evaluation and incentives. There's literally no
| evaluation of software or software development capability in
| the industry. I know of a researcher that held a multimillion
| dollar informatics grant for 3 years. In that 3 years they
| literally did nothing except collect money. Usually there are
| grant updating mechanisms, and reports, but he BS'ed his way
| through that knowing there's a 0.0000000% chance that any
| granting agency is going to look through his code. The fraud
| was only found because he got fired for unrelated activities.
|
| I once looked up older web projects on a grant. 4/6 were
| completely offline less than 2 years after their grants
| completed. For 2 of those 4, it's unclear whether the site was
| ever completed in the first place.
| paulclinger wrote:
| > I know of a researcher that held a multimillion dollar
| informatics grant for 3 years. In that 3 years they literally
| did nothing except collect money.
|
| I wonder if a whistleblower payout similar to the one that
| SEC is doing for 1M+ fines (10-30%) would help in cases like
| this. The host organization would potentially be on the hook
| as well, so there is going to be a significant incentive to
| not let that happen (especially with all the associated
| reputational damage).
| qmmmur wrote:
| I can tell you why the sites went offline: because the
| funding stopped. I don't know what your research background
| is, but it's painful to even get 5 GBP a month to host a
| droplet on DigitalOcean in a pretty lucrative department
| with liberal internal funding.
| Bukhmanizer wrote:
| Agreed, but all these little things are just a sign that
| the industry just does not give a shit about software. They
| _could_ develop mechanisms to fund this stuff, pretty
| easily actually. But they don't.
|
| A couple of other weird inequities that I've found are: 1.
| It's hard to get permission to spend money on software
| subscription based licenses since you won't "have anything"
| at the end. However, it's much easier to get funding for
| hardware with time based locks (e.g. after 3 years the
| system will lock up and you have to pay them to unlock).
| The end result is the same, you can't use the hardware
| after the time period is up, but for some reason the admin
| feels much more comfortable about it.
|
| 2. It's hard to get funding to hire someone to set up a
| service to transfer large amounts of data from different
| places. It's much easier to hire someone to drive out to a
| bunch of places with a stack of hard drives and manually
| load the data on them, and drive back. Even if it's 2x more
| expensive and would take longer. Why? Again my speculation
| is that the higher ups are just more comfortable with the
| latter strategy. They can picture the work being done in
| their head, so they know what they're paying for.
| mschuster91 wrote:
| > The end result is the same, you can't use the hardware
| after the time period is up, but for some reason the
| admin feels much more comfortable about it.
|
| Simple: predictability. With a subscription based model,
| admin has to deal with recurring (monthly / yearly)
| payments, and the possibility is always there that
| whatever SaaS you choose gets bought up and
| discontinued. Something you own and host yourself, even
| if it gets useless after three years, does not incur any
| administrative overhead and there is no risk of the
| provider vanishing. Also, there are no "surprise auto
| renewals" or random price hikes.
|
| > 2. It's hard get funding to hire someone to set up a
| service to transfer large amounts of data from different
| places.
|
| Never underestimate the bandwidth of a 40 ton truck
| filled with SD cards. Joke aside: especially off-campus
| buildings have ... less than optimal Internet / fibre
| connections, and those that do exist are often under enough
| load to make it unwise to shuffle large amounts of data
| through them without disrupting ongoing operations.
| selimthegrim wrote:
| Louisiana state government spent a buttload of money on
| dedicated high speed fiber optic lines between a bunch of
| different universities in the state for
| videoconferencing, telenetworking, "grid computing" etc.
| 10 years later the only people who remember how to use
| the system are at LSU, rendering the purpose moot.
| Everyone else just uses Zoom or Skype.
|
| https://www.regents.la.gov/assets/docs/Finance_and_Facili
| tie...
| pbourke wrote:
| Is N years of opex not part of the budget in grant
| applications?
| qmmmur wrote:
| In research, no, and it would depend entirely on your
| institution. For example, I looked at a job putting
| together a portal for people to freely examine the
| research produced by a research team. The project had
| secured a connection with the British Museum, and so
| that website would live on under it. However, if the
| project had asked to host it themselves, even for $60 a
| year for 10 years, the answer would be no. Funding
| bodies see small opex that extends beyond the life of
| the project as open to corruption or just too facile to
| fund, wrongly or rightly.
| matthewdgreen wrote:
| >I know of a researcher that held a multimillion dollar
| informatics grant for 3 years. In that 3 years they literally
| did nothing except collect money.
|
| I hate that every HN post about academia ends with an
| anecdote describing some rare edge-case they've heard about.
| Intentional academic fraud is a very small percentage of what
| happens in academia. Partly this is because it's so stupid:
| academia pays poorly compared to industry, requires years to
| establish a reputation, and the systems make it hard to
| extract funds in a way that would be beneficial to the
| fraudster (hell, I can barely get reimbursed for buying pizza
| for my students.) So you're going to do a huge amount of work
| qualifying to receive a grant, write a proposal, and your
| reward is a relatively mediocre salary for a little while
| before you shred your reputation. Also, where is your
| "collected money" going? If you hire a team, then you're
| paying them to do nothing and collude with you, and your own
| ability to extract personal wealth is limited.
|
| A much more common situation is that a researcher burns out
| or just fails to deliver much. That's always a risk in the
| academic funding world, and it's why grant agencies rarely
| give out 5-10 year grants (even though sometimes they should)
| and why the bar for getting a grant is so high. The idea is
| to let researchers do actual work, rather than having teams
| manage them and argue about their productivity.
|
| (Also long-term unfunded project maintenance is a big, big
| problem. It's basically a labor of love slash charitable
| contribution at that point.)
| Bukhmanizer wrote:
| > I hate that every HN post about academia ends with an
| anecdote describing some rare edge-case they've heard about
|
| This isn't a rare edge case, this is very common in
| software projects. I've heard of it because I was part of
| the team brought in to fix the situation.
|
| Intentional fraud is only rare when it's recognized as
| fraud. P-hacking was incredibly widespread (and to some
| extent still is) because it wasn't recognized as a form of
| fraud. Do you really think not delivering on a software
| project has any consequences? Who is going to go in and say
| what's fraud, what's incompetence, and what's bad luck?
|
| The problem is that the bar for getting software grants
| isn't high, it's nonsensical. As far as I can tell, ability
| to produce or manage software development isn't factored in
| at all. As with everything else, it's judged on papers, and
| the grant application. In some cases, having working
| software models and preexisting users ends up being
| detrimental to the process, since it shows less of a "need"
| for the money. You get "stars" in the field who end up
| with massive grants and no idea of how to implement their
| proposals. Conversely, plenty of scientists who slave away
| on their own time on personal projects that hundreds of
| other scientists depend on get no funding whatsoever.
| lazyjeff wrote:
| Just curious, what kind of 3-year informatics grant not
| being completed ends up with a team brought in to fix the
| situation? Multi-million dollar grants don't sound big
| enough to be a dependency for any major customer (like
| defense or pharma), so I imagine if fraud was detected,
| they would just demand a reimbursement and ban the PI.
|
| But I think you're both right in some sense. Cases of
| intentional major fraud are probably rare edge cases, and
| they make the news when they're uncovered. But there's a
| lot of grey-ish area like p-hacking as you mentioned,
| plus funding agencies know there needs to be some
| flexibility in the proposed timeline due to realities.
| Realities like you don't necessarily get the perfect
| student for the project right when the grant starts, as
| the graduate student cycle is annual, plus the research
| changes over time and it isn't ideal to have students
| work on an exact plan as if they are an employee.
|
| But I totally agree that maintaining software that people
| are using should be funded and rewarded by the academic
| communities. A possible way to do this is to have a
| supplement so that, after a grant is over, people who have
| software generated from the grant that is used by at least
| 10 external parties without COI should be funded 100K/yr
| for however many years they are willing to maintain and
| improve it. Definitions of what this means need to be
| carefully constructed, of course.
| Bukhmanizer wrote:
| I'll be a bit vague to protect my coworker's privacy, but
| the scientist was fired for other, unrelated violations,
| and my boss was brought in to replace him. I think he was
| leading an arm of a "U" grant, so he wasn't the only
| senior PI on it. Since they handled it internally, they
| couldn't just demand a reimbursement. On some level
| administration knew that the project wasn't moving
| forward, but once we started asking around, it was clear
| that there was no effort to start the project at all.
|
| >But I totally agree that maintaining software that
| people are using should be funded and rewarded by the
| academic communities. A possible way to do this is to have
| a supplement so that, after a grant is over, people who
| have software generated from the grant that is used by at
| least 10 external parties without COI should be funded
| 100K/yr for however many years they are willing to maintain
| and improve it. Definitions of what this means need to be
| carefully constructed, of course.
|
| I think that this is a great idea.
| j-pb wrote:
| There's only one way to solve this: Simplicity.
|
| Ironically, this is also what Occam's razor would demand from
| good science, so you'd have a win-win scenario where you both
| create good software and good research, because you focus on
| the simplest, most minimal approach that could possibly work.
| WanderPanda wrote:
| In my experience simplicity and generality don't go well with
| performance. If you want to build something that can be used
| for all kinds of problems and is still simple, it will be slow
| as hell compared to the (dirty) optimised code running
| hardcoded structures on the GPU.
| j-pb wrote:
| Simplicity pretty much excludes generality in a lot of
| cases; you're only able to port code to the GPU if it
| wasn't a million LOC to begin with, so you're pretty much
| making the case for it.
|
| Note that Simple != Easy or Naive
|
| Hardcoded structures are potentially exactly the kind of
| simplicity needed.
|
| What's not simple is a general "this solves everything and
| beyond" code-base with every imaginable feature and legacy
| capability.
| gmueckl wrote:
| How do you keep a codebase simple when you need to have
| things in it like implementations of state-of-the-art
| algorithms to
| compare against and the previous iterations of your own
| method so that you can test whether you're actually
| improving? Then, depending on what you're doing, there's also
| all the extra nontrivial code for tests and sanity checks of
| all these implementations.
|
| Simplicity is a nice dream. The realities of research are
| very often stacked against it.
| j-pb wrote:
| How the heck do you hope to gain any insightful metrics
| when you've got a cobbled-together mess that you only half
| understand? For all you know, you might only be
| benchmarking random code-layout fluctuations.
|
| I've seen research groups drown in their legacy code base.
|
| The issue of juggling too many balls that you describe only
| exists because the state-of-the-art implementations are so
| shoddy to begin with.
|
| Research suffers as much as everybody else from feature
| creep. Good experiments keep the number of new variables
| low.
| gmueckl wrote:
| Research code is not only written to measure runtime.
| Reducing the argument to only that aspect is not helping
| the discussion.
|
| And you say it yourself: good experiments change a single
| variable at a time. So how do you check that a series of
| potential improvements that you are making is sound?
| tylermw wrote:
| > good experiments change a single variable at a time
|
| Although this is a tangent from the above conversation,
| this isn't actually true: well-designed experiments can
| indeed change multiple variables at the same time.
| There's an entire field of statistics dedicated to
| experimental design (google "factorial designs" for more
| information). One-factor-at-a-time (OFAT) experiments are
| often the least efficient method of running experiments,
| although they are conceptually simple.
|
| See the following article for a discussion:
| http://www.engr.mun.ca/~llye/czitrom.pdf
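|
| As a rough illustration (a made-up sketch, not from the
| linked article: the factor names, levels, and Python helper
| below are invented just to enumerate the runs):
|
|     import itertools
|
|     # Hypothetical factors and levels for a small 2^3 design.
|     factors = {
|         "temperature": [20, 40],
|         "pressure": [1.0, 2.0],
|         "catalyst": ["A", "B"],
|     }
|
|     # Full factorial: every combination of levels (2*2*2 = 8
|     # runs), which lets you estimate main effects as well as
|     # interactions between factors.
|     factorial_runs = [
|         dict(zip(factors, levels))
|         for levels in itertools.product(*factors.values())
|     ]
|
|     # OFAT: start from a baseline and vary one factor at a
|     # time (4 runs here); fewer runs, but interactions
|     # between factors are never observed.
|     baseline = {name: lvls[0] for name, lvls in factors.items()}
|     ofat_runs = [baseline] + [
|         {**baseline, name: lvls[1]}
|         for name, lvls in factors.items()
|     ]
|
|     print(len(factorial_runs), "factorial runs,",
|           len(ofat_runs), "OFAT runs")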
| childintime wrote:
| It seems Julia has the answer:
| https://arstechnica.com/science/2020/10/the-unreasonable-
| eff...
| gmueckl wrote:
| I can't quite follow what the article is trying to
| describe because of the heavy use of analogies.
|
| A Google search makes it look like Julia has a mechanism
| where you can extend the set of overloads of a function
| or method outside the original module. The terminology is
| different (functions have methods instead of overloads in
| their speak). I don't see how that feature solves the
| problem in practice.
| f6v wrote:
| > The main reason why research code becomes a tangled mess is
| due to the intrinsic nature of research. It is highly iterative
| work where assumptions keep being broken and reformed depending
| on what you are testing and working on at any given time.
|
| Oh, boy, how many times have I heard this working at a startup.
| There is some truth to it: it's hard to organise code in the
| first weeks of a new project. But if you work on something for
| 3+ months, it becomes a matter of making a conscious effort to
| clean things up.
|
| > To make a concrete example, imagine writing an application
| where requirements changed unpredictably every day,
|
| Welcome to working with product managers at any early-stage
| company. Somehow I managed to apply TDD and good practices most
| of the time. Moreover, I went back to school after 7+ years
| developing software full-time. I guarantee that most of the
| low-quality research code is a result of a lack of discipline
| and experience in writing maintainable software.
| warkdarrior wrote:
| > I guarantee that most of the low-quality research code is a
| result of a lack of discipline and experience in writing
| maintainable software.
|
| Bingo! Most research code is written by graduate students who
| never had a job before, so they do not know how to write
| maintainable software. You are definitely the exception, as
| you held a software dev job before going back to school.
| f6v wrote:
| Some researchers from top-10 schools still publish python2
| code in 2020. I don't have an explanation for that. It's
| not even a lack of experience, but something on another
| level.
| Symbiote wrote:
| Mathematics doesn't suddenly stop working because your
| interpreter is a bit old.
| statstutor wrote:
| >It is highly iterative work where assumptions keep being
| broken and reformed depending on what you are testing and
| working on at any given time.
|
| This is describing infinitely fast and efficient p-hacking
| (i.e. research that is likely to produce invalid results).
|
| If your assumptions are broken then that should ideally be
| reported as part of your research.
|
| When you do research, you ideally start out with fixed
| assumptions, and then test those assumptions. The code required
| to do this can be buggy (and can therefore get fixed), and you
| can re-purpose earlier code, but the assumptions/brief
| shouldn't change in the middle of coding it up.
|
| If you aren't following the original brief, you've rejected
| your original research concept and you're now doing a different
| piece of research than the one you started out with - and this
| is no longer a sound piece of research.
|
| Research _should_ be highly dissimilar to a web design project
| in this respect.
|
| The reason these projects often become a tangled mess is that
| researchers don't have the coding skill to program any other
| way (in my opinion; nor do institutions invest sufficiently in
| people who do have this skill).
| ellimilial wrote:
| There certainly is quite a lot to be said about constant
| requirements drift. However, this is not untypical of
| fast-paced product work or, even more closely, of R&D efforts
| within industry.
|
| What then drives the improvement of the code quality is the
| potential need for continuity and knowledge retention - either
| in the form of iterative cleaning of the debt or the re-write.
| This is reliant on the perceived value for the organisation.
| From this perspective it's more straightforward to get to the
| author's reasons.
| xgb84j wrote:
| I think software quality in research has nothing to do with the
| problems themselves. It's more that, as the article suggests,
| nobody cares about your software. The only goal is to get
| published and be cited as many times as possible. Your coding
| mistakes don't matter if they cannot be found out or hurt your
| reputation.
|
| How many tests would be written for business software if it had
| only to run for one meeting and then never be looked at again?
| yummypaint wrote:
| There seems to be an underlying assumption in many of these
| posts that code has no value once papers are published. This
| hasn't been my experience working in a research environment
| at all. The big, complex pieces of code are almost always re-
| used in some way. For example, theory collaborators send us
| their code so we can generate predictions from their work
| without bothering them. Probably 50% or more (and usually the
| most important parts) of the code written to process
| experimental data ends up in other experiments. From the
| perspective of an individual experimentalist, there is
| tremendous value in creating quality code that can be easily
| repurposed for future tasks. This core code tends to follow
| the individual in their career. In some ways it's an
| extension of commonly used mental tools, and there are
| diverse incentives to maintain it.
| virgo_eye wrote:
| Do you think that people doing research at large technical
| organizations structure their code in the same way as
| academics? No: although there's always a portion which is
| active and unstable, they create packages, define interfaces,
| and abstract out pieces which can be reused reliably and
| depended on. The same goes for researchers in fields where
| the code is considered an important product. E.g. if you are
| doing research in compiler design, you're likely to want to
| create a compiler which can be used by other people. So you
| make a stable thing with tests, automated builds and so on. And
| you delimit and instrument the experimental parts.
|
| The real reason is the incentives. Not only are there no
| incentives to produce good quality code, there are incentives
| which make people focus on other outputs. Publish or perish
| means that people put up with technical debt just to get to the
| next result for the next paper, then do it again and again.
| Frost1x wrote:
| >The real reason is the incentives. Not only are there no
| incentives to produce good quality code, there are incentives
| which make people focus on other outputs. Publish or perish
| means that people put up with technical debt just to get to
| the next result for the next paper, then do it again and
| again.
|
| I believe this is true and is fueled by a misconception of
| what software is in research. Software in research is often
| akin to experimentalist work in the past. It's tacked onto
| theoretical work projects as an afterthought and not treated
| as what it really is: forcing the theory to be tested in a
| computational environment.
|
| If we start treating research software like experimentalism
| in the past, we might get a bit more rigor out of the
| development process as well as the respect it really
| deserves.
| Xelbair wrote:
| >To make a concrete example, imagine writing an application
| where requirements changed unpredictably every day, and where
| the scope of those changes is unbounded.
|
| That sounds like software development, alright. It takes a
| while for domain experts to learn that if a programmer asks
| "is X always true/false", they mean that there are no
| exceptions to that rule.
|
| I would like for researchers to just name variables sensibly.
| Even that would improve code quality a lot.
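|
| For instance (a tiny, made-up sketch; the array layout,
| column indices, and threshold below are invented purely for
| illustration):
|
|     import numpy as np
|
|     rng = np.random.default_rng(0)
|     trials = rng.random((100, 4))  # stand-in data
|
|     # Before: names carry no meaning outside the author's head.
|     a = trials[trials[:, 2] > 0.8]
|     m = a[:, 3].mean()
|
|     # After: the same computation, readable on its own.
|     QUALITY_COL, RESPONSE_COL = 2, 3
|     quality_threshold = 0.8
|     valid = trials[trials[:, QUALITY_COL] > quality_threshold]
|     mean_response = valid[:, RESPONSE_COL].mean()
|
|     assert np.isclose(m, mean_response)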
|
| Still the key problem is that there are zero incentives for
| researchers to even make their code readable! It does not
| improve any of the metrics they are judged by.
| leecarraher wrote:
| Yes, not pointing out the difference between coding some novel
| technique and a well-defined software project completely
| misses the reason the code is often not well organized.
| Suggesting that researchers are bad programmers is just a lazy
| excuse, somewhat damaging, and by no means the rule. I wrote a
| large complex framework for my research and the very nature of
| it causes me to add modules and techniques for parts I didn't
| know would work. And at times hard forks when I wanted to try
| something new, which would be impossible to merge back
| cleanly. At times you have a hunch and, like a fever dream,
| change who knows what, but you just have to see something
| through. There is no waterfall method, kanban and agile make
| no sense here, and even unit tests are ill-defined.
| User23 wrote:
| This sounds like my software development methodology when I
| was in my early teens. I was certainly able to get things
| done and explore all kinds of things (I was doing game dev of
| course), but the code was a mess and I didn't even have a
| mature understanding that it was. I just thought that was how
| programming was and you just had to be really smart to keep
| things straight in your head.
| commandlinefan wrote:
| > imagine writing an application where requirements changed
| unpredictably every day
|
| Imagine?
| Zababa wrote:
| > The main reason why research code becomes a tangled mess is
| due to the intrinsic nature of research. It is highly iterative
| work where assumptions keep being broken and reformed depending
| on what you are testing and working on at any given time.
| Moreover, you have no idea in advance where your experiments
| are going to take you, thus giving no opportunity to structure
| the code in advance so it is easy to change.
|
| I'd say you're confirming the author's theory that writing code
| is a low-status activity. Papers and citations are high-status,
| so papers are well refined after the research is "done". Code,
| however, is not. If the code was considered on the same level
| as the paper, I think people would refine their code more after
| they finish the iteration process.
| svalorzen wrote:
| Yes... and no. It is true that after a result is obtained,
| one could clean up the code for publication. And it is true
| that coding is not seen as first class at the moment.
|
| At the same time, you need to consider that such a clean up
| is only realistically helpful for other people to check
| whether there are bugs in the original results, and not much
| else. Reproducing results can be done with ugly code, and
| future research efforts will not benefit from the clean up
| for the same reasons I outlined in my previous post.
|
| While easing code review for other people is definitely
| helpful (it can still be done if one really wants to, and
| clean code does not guarantee that people will look at it
| anyway), overall the gains are smaller than what "standard"
| software engineers might assume. And I'm saying this as a
| researcher who always cleans up and publishes his own code
| (mostly just because I want to).
| jmcdl wrote:
| Shouldn't checking for bugs be of primary importance? How
| many times have impressive research results turned out to
| be a mirage built upon a pile of buggy code? I get the
| sense that is far too common already.
| pessimizer wrote:
| > How many times have impressive research results turned
| out to be a mirage built upon a pile of buggy code?
|
| You're actually making bugs sound like a feature here.
| I'm pretty sure that if you've gotten impressive results
| with ugly code, the last thing you want to do is touch
| the code. If you find a bug, you have no paper.
| throwaway6734 wrote:
| I am under the impression that most authors do not even
| publish functioning code when publishing ML/DL papers, which
| I find absurd. The paper is describing software. IMO
| the code is more important than the written word.
| Zababa wrote:
| > At the same time, you need to consider that such a clean
| up is only realistically helpful for other people to check
| whether there are bugs in the original results, and not
| much else.
|
| I assumed that most code published could be directly useful
| as an application or a library. Considering what you're
| saying, this might be only a minority of the code. In that
| case, I agree with your conclusion about smaller gains.
| jonnycomputer wrote:
| Most academic code runs once, on one collection of data,
| on a particular file system.
|
| Academic code can be really bad. But most of the time it
| doesn't matter, unless they're building libraries,
| packages, or applications intended for others. That's
| when it hurts and shows.
|
| I'm a research programmer. I have a master's in CS. I
| take programming seriously. I think academic programmers
| could benefit from better practice. But I think software
| developers make the mistake of thinking that just because
| academics use code the objective is the same or that best
| practices should be the same too. Yes, research code
| should have tests, though those should mostly look like
| running the code on dummy data and making sure the results
| look like you expect.
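|
| A minimal sketch of what that can look like (the decay-rate
| fit below is a made-up example, not taken from any particular
| project):
|
|     import numpy as np
|
|     def estimate_decay_rate(times, counts):
|         # Fit log(counts) ~ intercept - rate * t, return rate.
|         slope, _ = np.polyfit(times, np.log(counts), 1)
|         return -slope
|
|     def check_on_dummy_data():
|         # Dummy data with a known answer: rate = 0.5.
|         rng = np.random.default_rng(0)
|         t = np.linspace(0.0, 10.0, 200)
|         counts = (1000 * np.exp(-0.5 * t)
|                   * rng.normal(1.0, 0.01, t.size))
|         rate = estimate_decay_rate(t, counts)
|         assert abs(rate - 0.5) < 0.05, f"unexpected rate: {rate}"
|
|     if __name__ == "__main__":
|         check_on_dummy_data()
|         print("dummy-data check passed")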
| geebee wrote:
| I know a lot of "research programmers" (meaning people
| who write code in research labs but are not themselves
| the researchers or investigators on a study), and they
| often have MS degrees in CS - though actually, highly
| quantitative master's degrees where very elaborate code is
| used to generate answers are a bit more common than CS per
| se (math, operations research, branches of engineering,
| bioinformatics, etc).
|
| Here's the thing - in industry, this background (quant
| undergrad + MS, high programming ability, industry
| experience) is kind of the gold standard for data science
| jobs. In academic job ladders it's... hmm. By the latest
| data, MS grads in these fields from
| top programs are starting at between 120k-160k in
| industry, and there are very good opportunities for
| growth.
|
| I actually think that universities and research centers
| can compete for highly in-demand workers in spite of
| lower salaries, but highly talented people in demand will
| not turn down an industry job with salary _and_
| advancement potential to remain in a dead-end job.
| galangalalgol wrote:
| Yeah my standard quote about research code is that it is
| not the product, so it is ok that it is bad. The results
| are the product and those need to be good. Someday
| someone will take those results (in the form of some data
| or a paper) and make a software product, and that should
| be good.
| [deleted]
| geebee wrote:
| I've worked with a lot of research code. I agree with you that
| tangled code is somewhat intrinsic to the kind of code written
| for research.
|
| Here's the thing. Sometimes, there's no code - I mean, they'll
| find something, but nobody can say, with certainty, that it is
| the code that generated the data or results you're trying to
| recreate. There's often no data - and by that, I mean, nothing,
| not even a dummy file so you can tell if it even runs or
| understand what structure the data needs to be in. No build, no
| archive history, no tests. And when I say no tests, I'm not
| talking about red/green bar integration and unit tests, I mean,
| ok, the code ran... was this what it was supposed to produce?
|
| Many of these projects are far, far more messed up than the
| intrinsic nature of research would explain - though I will
| again agree that research code may be unusually likely to
| descend into entropy.
| lmm wrote:
| > To make a concrete example, imagine writing an application
| where requirements changed unpredictably every day, and where
| the scope of those changes is unbounded.
|
| I don't have to imagine it, I'm employed in the software
| industry.
|
| Seriously, nothing you describe sounds any different from
| normal software development.
| Dumblydorr wrote:
| In my world, it does sound different, I work with HIPAA data
| that takes months to get access to. So sharing your code is
| borderline unacceptable to some orgs: even if it itself
| doesn't contain any private data, there's a mass paranoia
| that you'll accidentally leak patient data, which can lead to
| fines of 2 million USD.
| burntoutfire wrote:
| The only difference is speed IMO. Sure, new requirements
| appear and they can wildly change the underlying assumptions
| of the systems - but usually, in such case we're given months
| or years to adapt/rewrite the system in a systematic manner.
| If, for every wild idea the researcher wants to explore, this
| amount of rigor was applied in its implementation, I'm
| guessing the research would slow down immensely. BTW, most
| research code is written for chasing dead ends (quickly
| testing some small hypotheses) and will be discarded without
| being shared with anyone - so investing in writing it
| properly seems especially wasteful.
| wbl wrote:
| The program I wrote for my dissertation is as good as it
| needs to be for a program that had to run once!
| [deleted]
| fabian2k wrote:
| There is no real incentive to organize and clean up the code,
| even if the scientists involved have the skills to write well-
| organized software. And organizing this kind of code that often
| starts in a more exploratory way is a pretty large amount of
| additional effort. This kind of effort is simply not appreciated,
| and if spending time on it means you publish fewer papers it's a
| net negative for your career.
|
| I'd settle for just publishing the code at all, even if it is a
| tangled mess. This is still not all that common in the natural
| sciences, though I have a bit of hope this will change.
| hnedeotes wrote:
| Yeah I mean, if your study is implying the apocalypse (or even
| if not, but more so if that's the case), you had better put the
| code there, because that's what the scientific method requires.
| How should I believe your conclusions and cute graphs if I
| can't see how you arrived at them? Maybe they were drawn in
| Narnia for all I know, maybe they have significant errors, or
| the code is so tailored to produce those results that it's
| irrelevant.
|
| And if the tools and methods you used for arriving at them are
| so messy that you dare not publish them, what does that tell me
| about: - your process; - the organisation of your ideas; - the
| conclusions or points made in the paper?
|
| I don't mean it has to be idiomatic well written code, but it
| should be readable enough to be followed.
| [deleted]
| mkl95 wrote:
| If you want to write good research software, a good way is to
| have professional developers implement it.
|
| I worked closely with an NLP researcher for a while on a project
| that had received a hefty state grant. She knew more or less what
| her team needed, but she needed someone to implement it cleanly
| and in a way that would not make users step on each other's toes.
|
| The chances of that project being a buggy mess would have been
| pretty high if it had been written by people who don't write
| software for a living. And maybe that's OK.
| mattkrause wrote:
| Here's the problem with hiring a pro.
|
| The workhorse NIH grant[0] is an R01 with a $250,000/year x 5
| years "modular" budget. Most labs have, at most, one. Some have
| two, and a very few have more than that. This covers everything
| involved in the research: salaries (including the prof's),
| supplies, publication fees, etc. Suppose you find a programmer
| for $75k. With benefits/fringe (~31% for us, all-in), that's
| nearly $100k/year. If the principal investigator (prof,
| usually) takes a similar amount out of the grant, there's very
| little money left to do the (often very expensive) work. In
| contrast, you can get a student or postdoc for far less--and
| they might even be eligible for a training grant slot, TAship,
| or their own fellowship, making their net cost to the lab ~$0.
|
| This would be easy to fix: the NIH already has a program for
| staff scientists, the R50. However, they fund like two dozen
| per year; that number should be way higher.
|
| [0] Other mechanisms exist at the NIH--and elsewhere--but NSF
| (etc) grants are often much smaller.
| qudat wrote:
| > In contrast, you can get a student or postdoc for far less
|
| Yeah I totally agree on this part. The academic system relies
| not on monetary compensation for its labor; rather, it rewards
| people with reputation by getting their names on a paper.
|
| I worked essentially for free for a lab in my spare time for
| 4 years. They get to the result they want, even if it's built
| on a shaky foundation, and for basically free (it doesn't
| cost anything to put a name on a paper). At the end of the 4
| years the dream of getting my name on a paper didn't even pan
| out (lab was ramping down and was essentially a teaching
| research lab by the time I showed up).
| einpoklum wrote:
| One of the things which has helped derail my own research career
| [1] is the tendency to not write tangled-mess code, and to
| publish and maintain much of my research code after I was
| supposed to be done with it.
|
| Annoyingly, more people now know of me due to those pieces of
| software than for my research agenda. :-(
|
| [1] : Not the only thing mind you.
| jimmyvalmer wrote:
| C'mon, no good deed goes unpunished. Everyone knows that.
| nirse wrote:
| What doesn't really get mentioned in the article is that a lot
| of academic software was written by a single developer. All
| bigger software projects, academic or not, that were only
| built and maintained by a single person tend to become messier
| and messier with time. Perhaps most software suffers from this
| - over time it becomes a mess - but having more developers
| look at the code (and enough time, and many other factors) can
| certainly help to keep things in better shape.
| Robotbeat wrote:
| Plus it becomes impossible to get multiple developers to work
| on the code if they can't understand it because of its
| messiness, so there's a bit of survivor's bias and stronger
| motivation to clean the code up to be comprehensible to others
| when you have multiple people working on it.
|
| Also, I feel personally attacked by the headline. :)
|
| It is for this reason I try to keep my code and models pretty
| simple, only two or three pages of code (or ideally a single
| page), and I don't try to do too many things with one program,
| and I choose implementations and algorithms that are simpler to
| implement to make concise code feasible (sometimes at the
| expense of speed or generality).
| SavageBeast wrote:
| I'm currently refactoring a fairly large piece of research code
| myself. It was written with lean-startup thinking, in that a
| little code ought to produce some value in its results. If I
| was able to eke some usefulness out of this code, then I'd put
| more energy into it. Otherwise I was perfectly happy to Fail
| Fast and Fail Cheap.
|
| How did it become such a mess in the first place? Simple - I
| didn't know my requirements when I started writing it. I built it
| to do one thing. In running it I learned more things (this is
| good - why you build stuff like this in the first place). The
| code changed rapidly to accommodate these lessons.
|
| It wasn't long before I was running into limitations in the
| design of the underlying libs I was using etc. Of course I could
| find a way to make it work but it wasn't going to win any
| Software Design Awards.
|
| I'm happy to report that despite ending up a tangled mess, it
| actually helped me to come to understand and conquer a very
| specific kind of problem. In doing so I learned the limitations
| of commercially available tooling, the limitations of
| commercially available data, not to mention a great deal about
| the problem domain itself.
|
| This research software has earned its keep and is now being
| cleaned up into a more organized, near-commercial-quality kind
| of project. I'm glad I threw out "architecture" when I first
| started
| with this. It could have gone the other way where I had a very
| well built piece of code that didn't in fact perform any useful
| function.
| analog31 wrote:
| I believe that of all the lessons to come from contemporary
| software development, _constant refactoring_ may be the most
| valuable.
|
| The spaghetti monster looms large when you're in the heat of
| battle. But we've all got some idle time for whatever reason. I
| spend some time every week doing a couple of things: 1) Reading
| about good techniques. 2) Working through old code and cleaning
| it up.
|
| Because changing your code could always break it, refactoring
| also reinforces the habit of writing code that can be readily
| tested -- also a good thing.
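|
| A made-up Python example of that habit - pulling a step out
| of an analysis script into a function so a quick sanity check
| can run against it (the function and its name are invented):
|
|     import numpy as np
|
|     def normalise(signal):
|         """Scale a 1-D signal to zero mean and unit variance."""
|         signal = np.asarray(signal, dtype=float)
|         return (signal - signal.mean()) / signal.std()
|
|     def test_normalise():
|         out = normalise([1.0, 2.0, 3.0, 4.0])
|         assert abs(out.mean()) < 1e-12
|         assert abs(out.std() - 1.0) < 1e-12
|
|     if __name__ == "__main__":
|         test_normalise()
|         print("ok")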
| amelius wrote:
| Makes you wonder why most languages don't come with good
| refactoring tools.
| bobthepanda wrote:
| At least in the languages I work primarily in (JS and
| Java), I find my IDE to be pretty good at analyzing a lot
| of it.
|
| Refactoring is kind of subjective, because there is rarely
| One Right Way to solve a problem, and you need context, so
| I could see why it's not something that languages
| themselves take strong opinions on.
| realusername wrote:
| I don't think refactoring tools are that useful for
| refactoring; most of the time you are doing non-obvious
| refactoring that tools can't help with anyway. It depends
| what we call "refactoring" - tools are mostly useful for what
| I would call "housekeeping".
| amelius wrote:
| Well, every refactoring can be seen as a series of
| correctness-preserving housekeeping operations.
| cratermoon wrote:
| Safe automatic refactoring requires the ability to do
| static analysis of the code. Many refactorings are harder
| in loosely-typed languages.
| grahamlee wrote:
| Refactoring tools were invented in Smalltalk and worked
| just fine.
| disgruntledphd2 wrote:
| This has always surprised me, since I learned it.
|
| What are the features of Smalltalk that allowed this to
| happen? Conversely, what is stopping this from existing
| in more modern dynamic languages?
| xkriva11 wrote:
| Smalltalk has simple and strong reflective features.
| Moreover, it does not distinguish between the developed
| program and the IDE. This means that doing things like that
| is very natural and well established in the Smalltalk
| cultural background.
| grahamlee wrote:
| Indeed. Having the whole system in front of you, and
| knowing how patterns like MVC or Thing-Model-View-Editor
| encapsulate parts of it, makes it very easy to "reason
| about" wholesale changes to the system.
| analog31 wrote:
| I should have added that I am probably abusing the term
| _refactoring_, if it has a precise definition. What I'm
| talking about is working on improving the readability of my
| code, but also improving its structure. Today, "spaghetti"
| probably doesn't refer to a tangled mess of code sequences,
| because we've gotten rid of the GOTO, but to tangled
| interactions between modules, many of which are vestigial.
|
| A lot of my code interacts with hardware configurations
| that will cease to exist when a project is done, but I
| mainly look at the stuff that's potentially reusable, and
| making it worth re-using.
|
| I'm using Python, and there are a lot of tools for
| enforcing coding styles and flagging potential errors. I
| try to remove all of the red and yellow before closing any
| program file. I don't trust myself with too much
| automation! "Walk before you run."
| titanomachy wrote:
| That was my exact thought reading this.
|
| I used to write some crazy spaghetti code as an untrained
| student working in a lab. Coding would go really quickly at
| first, but as I kept adding on to accommodate new
| requirements it became a huge kludgy mess.
|
| Recently (after quite a few years of software engineering
| experience) I helped a researcher friend to build some
| software. He was following along with my commits and asked
| why I kept changing the organization and naming of the code,
| pulling things out into classes, deleting stuff that he
| thought might be needed later, etc. He spends only a small
| part of his time writing code, so he's never realized how
| much time it actually saves to keep things organized and
| well-factored.
| hyperpallium2 wrote:
| I like Brooks' "plan to throw one away; you will, anyhow.":
|
| _This [first] system acts as a "pilot plan" that reveals
| techniques that will subsequently cause a complete redesign of
| the system._
|
| However, in practice I'm not confident enough in my
| understanding, and fear losing all that hard-won work, so I
| refactor too.
|
| A rewrite from scratch is probably more viable when the project
| is small enough to keep in your head at once.
| titanomachy wrote:
| Brooks has since amended this[1] to say that he really meant
| it in the context of traditional "waterfall" development,
| where the first iteration is meticulously planned and
| designed as a whole system before any code is written at all.
|
| Rapid, iterative prototyping, followed by refactoring, is a
| perfectly reasonable approach today. No need to create a
| fresh repository and rewrite all code from scratch.
|
| David Heinemeier Hansson, creator of Rails and a big advocate
| of building working code as early as possible, wasn't even
| born in 1975 when The Mythical Man-Month was written. Linus
| Torvalds was a (presumably) plucky 6-year-old. Brooks wrote
| that book for an audience that would have known waterfall as
| the only way.
|
| [1] https://wiki.c2.com/?PlanToThrowOneAway
| RangerScience wrote:
| Good architecture is pretty much just about slicing things up
| so that rewrites / refactors can happen incrementally, rather
| than all at once. This can actually go both bottom-up (these
| functions are easy to re-arrange and don't need a rewrite)
| and top-down (these functions suck, but I don't have to
| rearrange anything to replace them).
|
| "Good architecture is the one that allows you to change."
| cratermoon wrote:
| What if you slice it up wrong?
| RangerScience wrote:
| Then you're gonna have a shit time of it, and may need to
| do a total rewrite, instead of an incremental one.
| cryptica wrote:
| Math and physics are a tangled mess, so it's not surprising that
| mathematicians and physicists write code which looks like a
| tangled mess. Mathematicians and physicists are trained to handle
| ambiguous concepts and they can work with weird abstractions
| which are far detached from reality. Unlike programming
| languages, the language of math is full of gaps - this
| requires the reader to make assumptions using past knowledge
| and conventions. Computers, on the other hand, cannot make
| assumptions, so the code must be extremely precise and
| unambiguous.
|
| Writing good code requires a different mindset; firstly, it
| requires acknowledging that communication is extremely ambiguous
| and that it takes a great deal of effort to communicate clearly
| and to choose the right abstractions.
|
| A lot of the best coders I've met struggle with math and a lot of
| the best mathematicians I've met struggle with writing good code.
___________________________________________________________________
(page generated 2021-02-22 23:01 UTC)