[HN Gopher] What my retraction taught me
       ___________________________________________________________________
        
       What my retraction taught me
        
       Author : mellosouls
       Score  : 81 points
       Date   : 2021-01-21 10:00 UTC (13 hours ago)
        
 (HTM) web link (www.nature.com)
 (TXT) w3m dump (www.nature.com)
        
       | sxp wrote:
       | The workflow for modern scientific research always seems so
       | amateurish from the perspective of basic software development.
       | 
       | > We set up a video meeting, and decided that Susanne would go
       | through simulations, and I would go through my old data, if I
       | could dig them up. That was a challenge. Only months before, my
       | current university had suffered a cyberattack, and access to my
       | back-up drive was prohibited at first. ... I spent a week piecing
       | together the necessary files and coding a pipeline to reproduce
       | the original findings. To my horror, I also reproduced the
       | problem that Susanne had found. The main issue was that I had
       | used the same data for selection and comparison, a circularity
       | that crops up again and again.
       | 
       | The proper solution would be to publish the code along with the
       | paper so that others could directly review it. Ideally, the raw
       | data would also be published, but there might be privacy issues
       | with that. Peer review for journals is supposed to catch this
       | type of workflow problem, but just throwing the code & paper up
       | onto GitHub would probably be more effective than publishing just
       | the paper and waiting for others to catch the problem.
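       | 
       | A minimal sketch of that circularity (purely illustrative, not
       | the paper's actual analysis): select the "extreme" cases with a
       | noisy measurement, then compare them to the rest using the same
       | measurement versus an independent re-measurement.
       | 
       |     import numpy as np
       | 
       |     rng = np.random.default_rng(0)
       |     n = 10_000
       |     truth = rng.normal(0, 1, n)       # true underlying values
       |     m1 = truth + rng.normal(0, 1, n)  # noisy measurement
       |     m2 = truth + rng.normal(0, 1, n)  # independent repeat
       | 
       |     top = m1 > np.quantile(m1, 0.9)   # "extreme" cases per m1
       | 
       |     # circular: compare on the same data used for selection
       |     print(m1[top].mean() - m1[~top].mean())
       |     # honest: compare on the independent re-measurement
       |     print(m2[top].mean() - m2[~top].mean())
       | 
       | With these settings the first difference comes out roughly twice
       | the second: the selection step has already baked part of the
       | noise into the comparison.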
        
         | justinpombrio wrote:
         | > The proper solution would be to publish the code along with
         | the paper so that others could directly review it.
         | 
         | This is taking root! More CS conferences should get on board.
         | 
         | https://www.artifact-eval.org/about.html
         | 
         | (certificate seems to be expired...)
        
         | robotresearcher wrote:
         | My lab practice is to publish the repo path and the full hash
         | of the commit that gave the results in the paper. The repo
         | might have subsequent bug fixes, new code or data, etc. But you
         | can get and run the exact code & data that the claims are based
         | on if you wish.
         | 
         | The published paper contains the hash printed in the text.
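         | 
         | A hypothetical sketch of the reader's side of that check (the
         | hash below is a placeholder, not a real commit): after cloning
         | and checking out, confirm the working copy really is the
         | commit printed in the paper before re-running anything.
         | 
         |     import subprocess
         | 
         |     # placeholder: the full hash printed in the paper
         |     PAPER_HASH = "0123456789abcdef0123456789abcdef01234567"
         | 
         |     head = subprocess.run(
         |         ["git", "rev-parse", "HEAD"],
         |         capture_output=True, text=True, check=True,
         |     ).stdout.strip()
         |     if head != PAPER_HASH:
         |         raise SystemExit(f"HEAD is {head}, not {PAPER_HASH}")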
        
         | marmaduke wrote:
         | > workflow for modern scientific research
         | 
         | Careful with the word modern: the latest generation of
         | students and postdocs is using Git, CI/CD etc., sometimes
         | militantly. The amateurish approach reflects the incentive
         | structure, where the goal isn't to produce a robust solution
         | but to achieve notoriety and impact in one's domain. That
         | impact plays into getting the next grant, which creates a
         | feedback loop that makes shortcuts to the latest shiny thing
         | all but obligatory.
         | 
         | Even when failures become apparent, scientists can play the
         | feature-not-a-bug card very frequently to stay on top.
        
         | bachmeier wrote:
         | The journal I edit has a policy that the code and data have to
         | be made available to the reviewers. I was surprised at the
         | number of reviewers that dig into the code.
        
           | jacobolus wrote:
           | Surprised that the reviewers do or don't look at the code?
        
             | bachmeier wrote:
             | Surprised that the reviewers do look at the code. I had
             | assumed they wouldn't want to put the time in. In
             | hindsight, I think it's often the case that it's faster to
             | look at the code than to read an unclear methodological
             | description.
        
           | semi-extrinsic wrote:
           | TBH, code (even when all you can do is read it and
           | understand how it works) is probably the most useful thing
           | to the reviewers themselves.
           | 
           | Imagine the equivalent for reviewing an experimental paper:
           | being given free, instant travel to the place where they did
           | the experiment, so you could look at their setup, tinker
           | with it, and see how it works. I guarantee a lot of
           | reviewers would go for that.
        
             | bachmeier wrote:
             | I've learned that what you say is true. This is
             | particularly the case for papers that use small datasets
             | and not especially complicated empirical analysis.
        
         | CJefferson wrote:
         | I partially blame software infrastructure here.
         | 
         | I've tried to do reproducible research, but just setting
         | something up where I get the exact versions of GCC, Python and
         | the Python packages I want, in such a way that I can get the
         | same versions 2 years later, is a colossal pain.
         | 
         | Just dumping a pile of code people won't be able to run isn't
         | super useful, and becomes a source of continuous complaints and
         | requests for fixes.
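         | 
         | One low-tech partial mitigation (a sketch, and it does nothing
         | for the compiler/toolchain side of the problem): dump the
         | exact interpreter and package versions next to the results, so
         | a reader at least knows what they would have to recreate.
         | 
         |     # hypothetical helper: record the current environment
         |     import json, platform, sys
         |     from importlib import metadata
         | 
         |     manifest = {
         |         "python": sys.version,
         |         "platform": platform.platform(),
         |         "packages": {d.metadata["Name"]: d.version
         |                      for d in metadata.distributions()},
         |     }
         |     with open("environment.json", "w") as f:
         |         json.dump(manifest, f, indent=2, sort_keys=True)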
        
           | kansface wrote:
           | You could make a container with the pipeline and data set up
           | to go. Also, ideally, you'd publicly host the Dockerfile or
           | w/e build time config, too.
        
             | skissane wrote:
             | > Also, ideally, you'd publicly host the Dockerfile
             | 
             | Docker doesn't make it easy to produce a reproducible
             | config. People commonly write Dockerfiles with stuff like
             | this in them:
             | 
             | RUN apt-get install -y my-favourite-package
             | 
             | And then which version of my-favourite-package you get
             | depends on whatever the latest one on the update servers
             | happens to be at the time. Maybe a new version is released
             | tomorrow with a regression bug that breaks everything.
             | 
             | There are ways around this (pin exact package versions,
             | pin base images by digest), but the default way of doing
             | things in Docker doesn't encourage reproducibility.
        
             | ivan888 wrote:
             | Downvoters: explain? Maybe this suggestion is aimed too
             | much at a specific piece of technology (Docker). But there
             | are plenty of tools out there to create reproducible
             | builds, which I think is the overall point of this response
        
               | afarrell wrote:
               | Speculation: many downvoters are people who have
               | repeatedly had frustrating user experiences and are
               | just exasperated at being repeatedly told that their
               | tools are good enough.
        
               | TheGallopedHigh wrote:
               | There is so much on the plate of the academic that
               | learning yet another piece of software -- like Docker,
               | which can be trying at the best of times -- is yet
               | another straw that can break the camel's back.
        
         | pbalau wrote:
         | > The proper solution would be to publish the code along with
         | the paper so that others could directly review it.
         | 
         | Most ML researchers write code to prove their theory works, not
         | to check if it works.
        
         | conformist wrote:
         | Yes, but there needs to be a more complete solution to this.
         | For this to happen, somebody should start funding one
         | (relatively) professional developer per research group. Or, at
         | least, there should be a way to fund "methodology/development"
         | researchers/PhD students who have some domain knowledge, but
         | mainly focus on aspects of publishing code, data, stats
         | methodology, maintaining it, and ensuring that others can use
         | it. Unfortunately, in many fields there does not seem to be a
         | niche for that. But clearly this is a very important problem,
         | possibly more important than funding a larger number of
         | research ideas?
        
       | osamagirl69 wrote:
       | Bummer that they didn't catch it until years after it was
       | published, but it is heartwarming to hear that the process went
       | smoothly at least.
       | 
       | I had a near miss publishing a paper in grad school, where buried
       | in the ~10k LOC data analysis script there was a bug in my data
       | processing pipeline. In summary, I had meant to evaluate
       | B=sqrt(f(A^2)) but what I actually evaluated was B=sqrt(f(A)^2),
       | which caused the resulting output to be slightly off. In the
       | review process, one of the reviewers looked at the output and
       | said: wait a second, this seems fishy -- can you explain why it
       | has such-and-such artifact? Their comments quickly allowed me to
       | pinpoint what was going wrong and correct the analysis script
       | appropriately -- which actually ended up improving the result
       | significantly!
       | 
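       | To make the difference concrete (f below is just a stand-in,
       | nothing to do with the actual analysis): sqrt(f(A)^2) collapses
       | to |f(A)|, so the two only agree when f(A^2) happens to equal
       | f(A)^2.
       | 
       |     import numpy as np
       | 
       |     def f(x):
       |         # stand-in step; any nonlinear f shows the bug
       |         return x + 1.0
       | 
       |     A = np.array([0.5, 2.0, 3.0])
       |     intended = np.sqrt(f(A**2))  # B = sqrt(f(A^2))
       |     buggy = np.sqrt(f(A)**2)     # B = sqrt(f(A)^2) == |f(A)|
       |     print(np.allclose(intended, buggy))  # False
       | 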
       | What I take away from this all is that for every article about
       | academic misconduct and p hacking there are 100 more where the
       | peer review process (both before and during submission to a
       | journal) caught the issues in time.
       | 
       | But... there are also probably a decent number where the errors
       | are still in the wild to this day...
        
         | typomatic wrote:
         | > What I take away from this all is that for every article
         | about academic misconduct and p hacking there are 100 more
         | where the peer review process (both before and during
         | submission to a journal) caught the issues in time.
         | 
         | p-hacking isn't a mathematical error. It cannot be "caught"
         | because it is not presented to referees -- you slightly modify
         | your hypotheses after the experiment, based on the results, or
         | you throw out "outliers" that blow up your theory. These are
         | things that don't even show up in the paper; they happen
         | during the preparation of the paper.
         | 
         | How you could conclude that p-hacking is rare based on a
         | completely unrelated experience is beyond me.
        
           | osamagirl69 wrote:
           | >How you could conclude that p-hacking is rare based on a
           | completely unrelated experience is beyond me.
           | 
           | I was involved in the submission of >100 papers through peer
           | review processes, none of which involved p-hacking. In fact,
           | they couldn't have been p-hacked, because either their
           | novelty did not rely on any statistical analysis or they
           | were preregistered with the journal.
           | 
           | I did have a run-in with 1 publication that I suspected
           | involved academic misconduct (fabrication of experimental
           | results), but it was a thesis so it did not go through the
           | peer review process.
        
       | [deleted]
        
       | mrandish wrote:
       | Really happy to see this in a major journal. Simultaneous sharing
       | of code and data necessary to replicate a result should be the
       | minimum expectation for journal publication.
        
       | einpoklum wrote:
       | > That [the method of statistical analysis] could be a problem in
       | our particular context didn't dawn on me and my colleagues -- nor
       | on anyone else in the field -- before [whoever]'s discovery.
       | 
       | This is troubling to me. It does not seem like the bias was due
       | to something arcane, some finer point in advanced statistics,
       | something hidden from view etc. The poster says:
       | 
       | > It involved regression towards the mean -- when noisy data are
       | measured repeatedly, values that at first look extreme become
       | less so.
       | 
       | This may not be 100% straightforward when it's buried in the
       | middle of a paper, but if you actually consider the methodology
       | you are likely to notice this happening.
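       | 
       | The quoted effect is easy to see in a toy simulation
       | (illustrative only): take the cases that look most extreme on a
       | first noisy measurement, and on a second, independent
       | measurement they land visibly closer to the mean.
       | 
       |     import numpy as np
       | 
       |     rng = np.random.default_rng(1)
       |     signal = rng.normal(0, 1, 100_000)
       |     first = signal + rng.normal(0, 1, signal.size)
       |     second = signal + rng.normal(0, 1, signal.size)
       |     extreme = first > np.quantile(first, 0.99)
       |     # selected on first; the repeat regresses toward the mean
       |     print(first[extreme].mean(), second[extreme].mean())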
       | 
       | So here's what _I_ learn from this case:
       | 
       | * It is possible that the reviewers at Nature don't properly
       | scrutinize the methodological soundness of some submissions (I
       | say "possible" since this is a single example, not a pattern)
       | 
       | * PhD advisors, like the author's, may not be exercising due
       | diligence on statistical research done with their PhD
       | candidates. The author's advisor had this to say:
       | 
       | > "It's great that we've persisted in attempting to understand
       | our methodology and findings!"
       | 
       | so he says it's "great" that they did not fully understand their
       | methodology before submitting a paper using it. Maybe that's not
       | exactly what he meant, but still, pretty worrying.
        
         | bachmeier wrote:
         | You present a view of journals as an outlet for correct
         | results. Realistically, that's not possible, and I wish more
         | people accepted that it's not possible. If a result is
         | confirmed in many studies by many authors using many datasets
         | and many methodologies, that's how we know we can trust a
         | result. I personally do not put much weight on a single paper's
         | results unless there's something special about it.
         | 
         | The most important thing is that everything about the
         | investigation is open: the methodology (including everything
         | not found in the paper), the programs, the data. It's more like
         | posting code for a large project on GitHub and then having
         | new researchers make PRs to correct bugs or extend the
         | program in useful ways.
        
           | einpoklum wrote:
           | I think you're conflating correctness and degrees of
           | validity.
           | 
           | Yes, I expect journals to have correct results - not in the
           | sense that the generalizations from their findings are
           | universally valid, but in that the statements of fact and of
           | logical implications are valid.
           | 
           | To be more explicit:
           | 
           | * "Events X, Y, Z occurred" <- expect this kind of sentences
           | to be correct
           | 
           | * "We did X, Y, Z" <- expect this kind of sentences to be
           | correct
           | 
           | * "A and A->B, so B" <- expect this kind of sentences to be
           | correct
           | 
           | * "We therefore conclude that X" <- Don't expect X to
           | necessarily be correct.
        
             | kansface wrote:
             | > in that the statements of fact and of logical
             | implications are valid.
             | 
             | See for example
             | https://slatestarcodex.com/2019/05/07/5-httlpr-a-pointed-
             | rev... for a slightly different take.
        
       | smoyer wrote:
       | I listen to a weekly podcast titled "Everything Hertz", which
       | includes a large amount of discussion related to methodology as
       | well as to problematic citation of previous research. It's pretty
       | fascinating (as is this article) - https://everythinghertz.com/.
        
         | jarenmf wrote:
         | Thank you, this looks interesting indeed
        
       ___________________________________________________________________
       (page generated 2021-01-21 23:01 UTC)