[HN Gopher] What my retraction taught me
___________________________________________________________________
What my retraction taught me
Author : mellosouls
Score : 81 points
Date : 2021-01-21 10:00 UTC (13 hours ago)
(HTM) web link (www.nature.com)
(TXT) w3m dump (www.nature.com)
| sxp wrote:
| The workflow for modern scientific research always seems so
| amateurish from the perspective of basic software development.
|
| > We set up a video meeting, and decided that Susanne would go
| through simulations, and I would go through my old data, if I
| could dig them up. That was a challenge. Only months before, my
| current university had suffered a cyberattack, and access to my
| back-up drive was prohibited at first. ... I spent a week piecing
| together the necessary files and coding a pipeline to reproduce
| the original findings. To my horror, I also reproduced the
| problem that Susanne had found. The main issue was that I had
| used the same data for selection and comparison, a circularity
| that crops up again and again.
|
| The proper solution would be to publish the code along with the
| paper so that others could directly review it. Ideally, the raw
| data would also be published, but there might be privacy issues
| with that. Peer review for journals is supposed to catch this
| type of workflow problem, but just throwing the code & paper up
| onto GitHub would probably be more effective than publishing just
| the paper and waiting for others to catch the problem.
| justinpombrio wrote:
| > The proper solution would be to publish the code along with
| the paper so that others could directly review it.
|
| This is taking root! More CS conferences should get onboard.
|
| https://www.artifact-eval.org/about.html
|
| (certificate seems to be expired...)
| robotresearcher wrote:
| My lab practice is to publish the repo path and the full hash
| of the commit that gave the results in the paper. The repo
| might have subsequent bug fixes, new code or data, etc. But you
| can get and run the exact code & data that the claims are based
| on if you wish.
|
| The published paper contains the hash printed in the text.
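|
| A minimal sketch of the stamping itself (hypothetical helper;
| assumes git is on PATH and the analysis runs from inside the
| repo) might look like:
|
|         # Record the exact commit alongside each result file.
|         import json
|         import subprocess
|
|         def current_commit() -> str:
|             # Full hash of HEAD; fails loudly outside a repo.
|             return subprocess.check_output(
|                 ["git", "rev-parse", "HEAD"], text=True
|             ).strip()
|
|         def worktree_is_dirty() -> bool:
|             # Uncommitted changes mean the hash alone is not
|             # enough to reproduce the run.
|             return bool(subprocess.check_output(
|                 ["git", "status", "--porcelain"], text=True
|             ).strip())
|
|         def save_results(results: dict,
|                          path: str = "results.json") -> None:
|             results["commit"] = current_commit()
|             results["dirty_worktree"] = worktree_is_dirty()
|             with open(path, "w") as handle:
|                 json.dump(results, handle, indent=2)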
| marmaduke wrote:
| > workflow for modern scientific research
|
| Careful with the word modern: the latest generation of students
| and postdocs is using Git, CI/CD etc., sometimes militantly. The
| amateurish approach reflects the incentive structure, where the
| goal isn't to produce a robust solution but to achieve notoriety
| and impact in one's domain. That impact factor plays into
| getting the next grant, which creates a feedback loop that makes
| shortcuts to the latest shiny thing obligatory.
|
| Even when failures become apparent, scientists can play the
| feature-not-a-bug card very frequently to stay on top.
| bachmeier wrote:
| The journal I edit has a policy that the code and data have to
| be made available to the reviewers. I was surprised at the
| number of reviewers who dig into the code.
| jacobolus wrote:
| Surprised that the reviewers do or don't look at the code?
| bachmeier wrote:
| Surprised that the reviewers do look at the code. I had
| assumed they wouldn't want to put the time in. In
| hindsight, I think it's often the case that it's faster to
| look at the code than to read an unclear methodological
| description.
| semi-extrinsic wrote:
| TBH, code (even when all you can do is look at it and understand
| how it works) is probably the most useful thing to the reviewers
| themselves.
|
| Imagine the equivalent for reviewing an experimental paper: if
| you were given free, instant travel to the place where they did
| the experiment and could look at their setup, tinker with it and
| see how it works. I guarantee you a lot of reviewers would go
| for that.
| bachmeier wrote:
| I've learned that what you say is true. This is
| particularly the case for papers that use small datasets
| and not especially complicated empirical analysis.
| CJefferson wrote:
| I partially blame software infrastructure here.
|
| I've tried to do reproducible research, but just setting
| something up where I get the exact versions of GCC, Python and
| Python packages I want, in such a way that I can get the same
| versions 2 years later, is a colossal pain.
|
| Just dumping a pile of code people won't be able to run isn't
| super useful, and becomes a source of continuous complaints and
| requests for fixes.
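|
| One low-effort mitigation (not a full solution; assumes Python
| 3.8+ so importlib.metadata is available) is to at least snapshot
| what was actually installed when the results were produced, so
| readers know which versions to chase down:
|
|         # Dump the interpreter, OS and package versions that
|         # were actually used for this run.
|         import json
|         import platform
|         import sys
|         from importlib import metadata
|
|         def snapshot_environment(path="environment.json"):
|             env = {
|                 "python": sys.version,
|                 "platform": platform.platform(),
|                 "packages": {
|                     d.metadata["Name"]: d.version
|                     for d in metadata.distributions()
|                 },
|             }
|             with open(path, "w") as handle:
|                 json.dump(env, handle, indent=2, sort_keys=True)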
| kansface wrote:
| You could make a container with the pipeline and data set up
| to go. Also, ideally, you'd publicly host the Dockerfile or
| w/e build time config, too.
| skissane wrote:
| > Also, ideally, you'd publicly host the Dockerfile
|
| Docker doesn't make it easy to produce a reproducible
| config. People commonly write Dockerfiles with stuff like
| this in them:
|
| RUN apt-get install -y my-favourite-package
|
| And then which version of my-favourite-package you get
| depends on what the latest one on the update servers is at
| the time. Maybe a new version is released tomorrow with a
| regression bug that breaks everything.
|
| There are ways to get around this problem, but the default
| ways of doing things in Docker don't encourage
| reproducibility.
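|
| For what it's worth, the more reproducible variant is only a
| little longer (the version number and digest below are made
| up):
|
|         # Pin the base image by digest rather than by tag.
|         FROM debian@sha256:<digest-of-the-exact-image>
|
|         # Pin the package version explicitly.
|         RUN apt-get update && \
|             apt-get install -y my-favourite-package=1.2.3-1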
| ivan888 wrote:
| Downvoters: explain? Maybe this suggestion is aimed too
| much at a specific piece of technology (Docker). But there
| are plenty of tools out there to create reproducible
| builds, which I think is the overall point of this response.
| afarrell wrote:
| Speculation: many downvoters are people who have
| repeatedly had frustrating user experiences and are just
| exasperated at being told how their tools are good
| enough.
| TheGallopedHigh wrote:
| There is so much on the plate of the academic that learning
| yet another piece of software -- like Docker, which can be
| trying at the best of times -- is yet another straw that can
| break the camel's back.
| pbalau wrote:
| > The proper solution would be to publish the code along with
| the paper so that others could directly review it.
|
| Most ML researchers write code to prove their theory works, not
| to check if it works.
| conformist wrote:
| Yes, but there needs to be a more complete solution to this.
| For this to happen, somebody should start funding one
| (relatively) professional developer per research group. Or, at
| least, there should be a way to fund "methodology/development"
| researchers/PhD students that have some domain knowledge, but
| mainly focus on aspects of publishing code, data, stats
| methodology, maintaining it, and ensuring that others can use
| it. Unfortunately, in many fields there does not seem to be a
| niche for that. But clearly this is a very important problem,
| possibly more important than funding a larger number of
| research ideas?
| osamagirl69 wrote:
| Bummer that they didn't catch it until years after it was
| published, but it is heartwarming to hear that the process went
| smoothly at least.
|
| I had a near miss publishing a paper in grad school, where buried
| in the ~10k LOC data analysis script there was a bug in my data
| processing pipeline. In summary, I had meant to evaluate
| B=sqrt(f(A^2)) but what I actually evaluated was B=sqrt(f(A)^2),
| which caused the resulting output to be slightly off. In the
| review process, one of the reviewers looked at the output and
| said: wait a second, this seems fishy, can you explain why it
| has such-and-such artifact? Their comments quickly allowed me to
| pinpoint what was going wrong and correct the analysis script
| appropriately -- which actually ended up improving the result
| significantly!
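|
| (For the curious, a toy version of that operator-order bug --
| with a made-up smoothing step standing in for the real f --
| looks like this:)
|
|         import numpy as np
|
|         def f(x):
|             # Stand-in nonlinear step: a 3-point moving average.
|             return np.convolve(x, np.ones(3) / 3, mode="same")
|
|         A = np.array([1.0, -2.0, 3.0, -4.0, 5.0])
|         intended = np.sqrt(f(A ** 2))  # B = sqrt(f(A^2))
|         actual = np.sqrt(f(A) ** 2)    # B = sqrt(f(A)^2)
|
|         # False: the two orderings give visibly different output.
|         print(np.allclose(intended, actual))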
|
| What I take away from this all is that for every article about
| academic misconduct and p hacking there are 100 more where the
| peer review process (both before and during submission to a
| journal) caught the issues in time.
|
| But... also probably a decent number where the errors are still
| in the wild to this day...
| typomatic wrote:
| > What I take away from this all is that for every article
| about academic misconduct and p hacking there are 100 more
| where the peer review process (both before and during
| submission to a journal) caught the issues in time.
|
| p-hacking isn't a mathematical error. It cannot be "caught"
| because it is not presented to referees -- you slightly modify
| your hypotheses after the experiment based on the results, or
| you throw out "outliers" that blow up your theory. These are
| things that don't even show up in a paper; they happen during
| the compilation of the paper.
|
| How you could conclude that p-hacking is rare based on a
| completely unrelated experience is beyond me.
| osamagirl69 wrote:
| >How you could conclude that p-hacking is rare based on a
| completely unrelated experience is beyond me.
|
| I was involved in the submission of >100 papers through peer
| review processes, none of which involved p-hacking. In fact,
| they couldn't have been p-hacked because their novelty did
| not rely on any statistical analysis, or they were
| preregistered with the journal.
|
| I did have a run-in with 1 publication that I suspected
| involved academic misconduct (fabrication of experimental
| results), but it was a thesis so it did not go through the
| peer review process.
| [deleted]
| mrandish wrote:
| Really happy to see this in a major journal. Simultaneous sharing
| of code and data necessary to replicate a result should be the
| minimum expectation for journal publication.
| einpoklum wrote:
| > That [the method of statistical analysis] could be a problem in
| our particular context didn't dawn on me and my colleagues -- nor
| on anyone else in the field -- before [whoever]'s discovery.
|
| This is troubling to me. It does not seem like the bias was due
| to something arcane, some finer point in advanced statistics,
| something hidden from view etc. The poster says:
|
| > It involved regression towards the mean -- when noisy data are
| measured repeatedly, values that at first look extreme become
| less so.
|
| This may not be 100% straightforward when it's buried in the
| middle of a paper, but if you actually consider the methodology
| you are likely to notice this happening.
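|
| A toy simulation (nothing to do with the paper's actual data,
| just the generic trap of selecting and comparing on the same
| noisy measurements) shows how strong the effect can be:
|
|         import numpy as np
|
|         rng = np.random.default_rng(0)
|         trait = rng.normal(0, 1, 10_000)   # stable true values
|         test1 = trait + rng.normal(0, 1, 10_000)  # noisy measure
|         test2 = trait + rng.normal(0, 1, 10_000)  # repeat measure
|
|         # Select "extremes" with test1, then compare on test2.
|         selected = test1 > 2.0
|         print(test1[selected].mean())  # ~2.6: looks extreme
|         print(test2[selected].mean())  # ~1.3: regressed, with no
|                                        # real change in the trait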
|
| So here's what _I_ learn from this case:
|
| * It is possible that the reviewers at Nature don't properly
| scrutinize the methodological soundness of some submissions (I
| say "possible" since this is a single example, not a pattern)
|
| * PhD advisors, like the author's, may not be exercising due
| diligence on statistical research done with their PhD
| candidates. The author's advisor had this to say:
|
| > "It's great that we've persisted in attempting to understand
| our methodology and findings!"
|
| so he says it's "great" that they did not fully understand their
| methodology before submitting a paper using it. Maybe that's not
| exactly what he meant, but still, pretty worrying.
| bachmeier wrote:
| You present a view of journals as an outlet for correct
| results. Realistically, that's not possible, and I wish more
| people accepted that it's not possible. If a result is
| confirmed in many studies by many authors using many datasets
| and many methodologies, that's how we know we can trust a
| result. I personally do not put much weight on a single paper's
| results unless there's something special about it.
|
| The most important thing is that everything about the
| investigation is open: the methodology (including everything
| not found in the paper), the programs, the data. It's more like
| posting code for a large project on GitHub and then having new
| researchers make PRs to correct bugs or extend the program in
| useful ways.
| einpoklum wrote:
| I think you're conflating correctness and degrees of
| validity.
|
| Yes, I expect journals to have correct results - not in the
| sense that the generalizations from their findings are
| universally valid, but in that the statements of fact and of
| logical implications are valid.
|
| To be more explicit:
|
| * "Events X, Y, Z occurred" <- expect this kind of sentences
| to be correct
|
| * "We did X, Y, Z" <- expect this kind of sentences to be
| correct
|
| * "A and A->B, so B" <- expect this kind of sentences to be
| correct
|
| * "We therefore conclude that X" <- Don't expect X to
| necessarily be correct.
| kansface wrote:
| > in that the statements of fact and of logical
| implications are valid.
|
| See for example
| https://slatestarcodex.com/2019/05/07/5-httlpr-a-pointed-
| rev... for a slightly different take.
| smoyer wrote:
| I listen to a weekly podcast titled "Everything Hertz" which
| includes a large amount of discussion related to methodology as
| well as problematic citation of previous research, etc. It's
| pretty fascinating (as is this article) -
| https://everythinghertz.com/.
| jarenmf wrote:
| Thank you, this looks interesting indeed
___________________________________________________________________
(page generated 2021-01-21 23:01 UTC)