[HN Gopher] How Should We Critique Research?
       ___________________________________________________________________
        
       How Should We Critique Research?
        
       Author : SubiculumCode
       Score  : 57 points
       Date   : 2021-04-16 14:44 UTC (7 hours ago)
        
 (HTM) web link (www.gwern.net)
 (TXT) w3m dump (www.gwern.net)
        
       | choeger wrote:
       | Aren't we forced to require replication for a measure of truth in
       | empirical studies? If that's universally accepted, shouldn't we
       | demand that researchers come up with predictions of the form "do
       | X and see result Y"? And if we accept that, then should we not
        | discuss who is actually going to "do X" _before_ research gets
       | published?
       | 
       | From the outside perspective, it appears that a lot of people do
       | "open-ended" experiments, write up some findings and pretty much
       | no one tries to confirm the results with the notable exception of
       | actual industrial applications (e.g., pharmacy or aircraft
       | design), where the validation happens out of necessity.
        
         | seesawtron wrote:
          | Substantial, solid research takes years to complete. Each new
          | finding opens up possibilities for ten new research directions
          | that one "should" investigate. The timelines needed for these
          | follow-up studies far exceed the scope of a PhD or postdoc
          | researcher and of any single scientific paper.
          | 
          | It is not true that no one tries to confirm the results. Labs
          | do further studies internally to follow up. Also, those working
          | closely in the field in other labs monitor these "findings" and
          | try to align them with their own research, whether by directly
          | comparing them or by indirectly using those ideas to support or
          | reject their own observations.
         | 
         | It is never one study but years of past research that lays the
         | foundation for a scientific discovery that finally creates a
         | paradigm shift in our understanding of the field.
        
         | BeetleB wrote:
         | > Aren't we forced to require replication for a measure of
         | truth in empirical studies?
         | 
          | Almost no scientific discipline requires it, sadly. There are
          | usually no career prospects in doing replication studies.
        
         | pjc50 wrote:
         | It's important to notice that the funding, which determines
         | what research gets done, is conditional on "impact factor" and
         | publication rates. The classic case of letting a single metric
         | drive a business function without regard for what it actually
         | _means_. This has been incredibly bad for research, which now
         | requires very large amounts of gaming the system to get actual
         | science done.
         | 
         | Reproducibility and other forms of validity are not currently
         | factored in very well.
        
           | asplake wrote:
            | A recipe for dysfunction, not just in science but in any field.
        
         | ketzu wrote:
          | Preregistering studies [1] is common in some fields and a nice
          | idea. Unfortunately, my personal experience is that
          | reproducibility, or even providing the sources for your
          | experiments, is not something many reviewers care about in
          | comp-sci fields. At least no conference review I received ever
          | mentioned reproducibility, no matter whether it was available
          | or not (and I am not sure anyone ever mentioned source code
          | availability either).
         | 
         | [1] https://en.wikipedia.org/wiki/Preregistration_(science)
        
           | rscho wrote:
            | Pre-registration is very common in medicine. I don't see it
            | changing anything, because there are no real consequences for
           | not doing what you registered for.
        
             | analog31 wrote:
             | Unless registration is an employment contract, it will be
             | as binding and useful as project plans in business.
        
       | fatsdomino001 wrote:
       | I didn't realize how incredible gwern's website is. Wow. Very
       | information dense and well thought out.
        
       | huonib wrote:
        | I think it should simply be two parts: 1) a list of what steps
        | were taken, and 2) what conclusions can be drawn from them.
        | 
        | Instead we get something that combines both, so it is hard to
        | tell where the researchers' beliefs come into play.
        
       | troelsSteegin wrote:
       | I interpreted the "difference which makes a difference" as the
       | sensitivity of our understanding of the status quo relative to
       | the research instance. In that sense, the critique would be about
       | estimating impact on some dimension of understanding conditioned
       | on a finding of credible difference. That would say that big bets
       | are better research.
       | 
       | In other terms of critique, Gelman talks about two components
       | (which may appear to be in tension), one referencing the craft of
       | research, and the other in terms of apparent novelty of effect.
       | My interpretation is that a critique of research should factor in
       | both the character of the bet and an assessment of the
       | evaluation. I don't think big is better, I think incisive is
        | better, but that broader impact (newsworthiness) is conditioned
       | on credible surprise.
       | 
       | [0]
       | https://statmodeling.stat.columbia.edu/2014/08/01/scientific...
        
       | PicassoCTs wrote:
        | Throw grant money at it until those with weak evidence buckle
        | under the load?
        
       | fooker wrote:
       | Waiting for the article : "How should we critique research
       | without being called anti-science?"
        
       | mike_hearn wrote:
       | Well, that looks at first like a pretty comprehensive list of
        | possible problems in scientific studies, but the correct
       | reaction on reading it should be "wait, what about the really
       | serious problems?". I mean, (ab)using Google Surveys or
       | experimenter demand effects is one kind of problem that indeed
       | probably deserves a lot of nuance about criticizing it. But it's
       | also not really where the biggest problems are or where the
       | attention should be focused.
       | 
       | We're in an environment where people routinely discover flat out
       | academic fraud, as in, scientists just made whole tables of data
       | up out of thin air. Someone notices, informs the journal, and
       | with 95% likelihood nothing happens. Or maybe the researcher is
       | allowed to 'correct' their made up data. We're in an environment
       | where you literally cannot take any number in a COVID-related
       | paper at face value, even when there are multiple citations for
        | it, because on checking, citations routinely turn out to be
        | fraudulent, e.g. the cited paper doesn't actually contain the
       | claimed data anywhere in it, or explicitly states the opposite of
       | what's being claimed, or the number turns out to have been a
       | hypothetical scenario but is being cited as a "widely believed"
       | empirical fact, etc. We're in an environment where literature
       | reviews argue that whilst very few public health models can ever
       | be validated against reality, that's not a big deal and _they
       | should be used anyway_.
       | 
       | Gwern criticises people who learn about logical fallacies and
       | then go around over-criticising people for engaging in them.
       | Yeah, sure, if you claim someone is taking bribes and they
       | actually are then it's technically an ad hominem but still
       | correct to say. Granted. But we are not suffering from an over-
       | abundance of nitpicky fallacy-criticizers. Where are these people
        | when you need them? COVID-related research frequently contains or
       | is completely built on circular logic! That's a pretty basic
       | fallacy yet papers that engage in it manage to be written by
       | teams of 20, sail through multiple peer reviews and appear in
       | Nature or the BMJ. As far as I can tell the scientific
       | institutions cannot reliably detect logical fallacies even when
       | papers are dealing with things that should be entirely logical
       | like data, maths, code, study design.
       | 
       | The notion that scientific criticism should be refined to focus
       | on what really matters sounds completely reasonable in the
       | abstract, and is probably an important discussion to have in some
       | very specific fields (maybe genetics is one). But I'd worry that
       | if people are arguing about p-hacking or lack of negative
       | results, that means they're not arguing about the apparently
       | legion researchers making "mistakes" that cannot plausibly be
       | actual mistakes. Stuff that should be actually criminal needs to
       | be fixed first, before worrying about mere low standards.
        
         | exmadscientist wrote:
         | In practice, there's a _massive_ bikeshedding problem in
         | scientific review. It is very frustrating to see proposals or
         | preprints criticized for missing a dot in paragraph 37 when the
         | bigger problem is that the experiment is overall worthless
          | because it can never change a decision... This is bad enough
          | when it's an honest mistake, but then there are all the
         | dishonest or plainly incompetent people to worry about.
         | 
         | So, no, in fact I think we _are_ suffering from an abundance of
         | nitpickers. Science _desperately_ needs more reviewers who can
          | see the bigger picture.
        
           | mike_hearn wrote:
            | That's fair enough. I meant nitpickers about logical
           | fallacies specifically, not stuff like grammar or minor
           | details of an experiment.
           | 
           | That said, is peer review really the place to dunk on a study
           | because the whole goal is pointless? The work is done by that
           | point, it's too late. It's the granting bodies that should be
           | getting peer reviewed in that regard, but one of the root
           | causes of the malaise in research is that the granting bodies
           | appear to be entirely blind buyers. They care about
            | disbursing money, not what they get out of the spending. If
           | they didn't spend the money they'd be fired, so that's
           | understandable. The core problem IMHO is the huge level of
           | state funding of research. The buck stops at politicians but
           | they are in no position to evaluate the quality of academic
           | studies.
        
       | guscost wrote:
       | Have you heard about the "grounded theory" methodology? It is
       | astonishing - some academic circles have _formally standardized_
       | the practice of fitting a model to data, and then immediately
       | drawing conclusions from it.
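
        Grounded theory itself is a qualitative methodology, but the worry
        voiced above has a simple quantitative caricature: a model fit to
        pure noise will always "find" something in the data it was fit to,
        and only a fresh sample reveals that nothing was there. A minimal
        Python/NumPy sketch (purely illustrative, not from the comment):

            import numpy as np

            def r_squared(X, y, beta):
                # Fraction of the variance in y "explained" by the fit.
                resid = y - X @ beta
                return 1 - (resid @ resid) / np.sum((y - y.mean()) ** 2)

            rng = np.random.default_rng(0)

            # Pure noise: "predictors" with no real relation to the outcome.
            X = rng.normal(size=(50, 20))
            y = rng.normal(size=50)

            # Fit on the very data we then "interpret".
            beta, *_ = np.linalg.lstsq(X, y, rcond=None)

            # A fresh draw from the same structureless process.
            X_new = rng.normal(size=(50, 20))
            y_new = rng.normal(size=50)

            print(f"in-sample R^2:     {r_squared(X, y, beta):.2f}")
            print(f"out-of-sample R^2: {r_squared(X_new, y_new, beta):.2f}")

        The in-sample R^2 typically comes out comfortably positive, while
        the out-of-sample R^2 sits near zero or below; only the second
        number says anything about the world.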
        
       | rscho wrote:
       | I have a particularly funny recent anecdote: in the past year, a
       | professor at my dept had two papers of his attacked by aggressive
       | letters to the editor pointing out flaws in both methodology and
       | interpretation. The end result has been that we published two
       | additional papers titled "a reanalysis of [insert title of 1st
       | paper]". So a critique of bad research ended up producing twice
       | the amount of bad research!
       | 
        | It's so absurd that it has a certain Monty Python flavour of
        | comedy.
        
         | SubiculumCode wrote:
         | It does not automatically follow that the reanalysis is also
         | bad...and by saying "we" you presumably were an author on this
         | paper. If you thought it was bad, why did you put your name on
         | it?
        
           | rscho wrote:
            | I didn't ask for it. I just found out about it after the fact
            | and, not being a professor, I have no say in the matter. Also,
            | I'd prefer to keep my job since I have a family to feed.
           | 
           | Yes, the reanalysis is terrible and it follows from the fact
           | that the base methodology is unsound.
        
             | caddemon wrote:
             | Maybe it's a field specific thing, or maybe one of the
             | other authors pretended to be you, but I am surprised
             | something could be published with your name on the author
             | list without you knowing about it. Every paper I've ever
             | been listed on, sometimes 10+ authors deep, has required me
             | to confirm that I want my name on the paper/attest I
             | contributed.
        
             | SubiculumCode wrote:
              | You absolutely can ask to have your name removed. To be
             | diplomatic, you can always claim that you didn't do enough
             | to deserve credit. Even now, you could contact the journal
             | to request the change.
             | 
             | In your particular case, I will not contest that the
             | reanalysis was terrible. My point was that reanalysis of a
             | paper after an issue is raised is not automatically also
             | crap. It may indeed be correcting the issue.
        
               | rscho wrote:
               | Yes, I could ask to lose my job. But again, I prefer not
               | to. Research in certain (most?) fields is a dictatorship
                | and even suggesting that something is fishy puts a target
               | on your back.
               | 
               | In this particular case, there were multiple outcomes
                | tested (read: tens of them) and alpha was .05.
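
        On the arithmetic behind that last line: with tens of outcomes each
        tested at alpha = .05 and no correction, at least one spurious
        "significant" result is close to guaranteed. A minimal sketch in
        Python (illustrative only; it assumes the outcomes are independent,
        which real clinical endpoints rarely are):

            alpha = 0.05
            for n_outcomes in (1, 10, 20, 50):
                # Chance that at least one test comes up "significant" by
                # luck alone when every null hypothesis is true.
                fwer = 1 - (1 - alpha) ** n_outcomes
                print(f"{n_outcomes:2d} outcomes -> FWER = {fwer:.2f}")

        which prints roughly 0.05, 0.40, 0.64 and 0.92 respectively.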
        
               | SubiculumCode wrote:
               | Man, all I can say is you are working for a bad prof
               | then. I can't imagine any of my former
               | advisors/colleagues/etc punishing me for declining to be
               | a co-author.
        
               | rscho wrote:
               | I am guessing that you don't do clinical research, then
               | ;-)
        
               | samatman wrote:
               | I'm reminded of a bit of dialogue from Fight Club:
               | 
               | "Which car company do you work for?"
               | 
               | "...a major one."
        
               | SubiculumCode wrote:
                | I'm on the edge of clinical work. I work with some
               | clinical people, but am not clinical myself. I would say
               | that they are a bit more willing to be instrumental and
                | throw a bunch of comparisons out and see what sticks. It's
               | largely because of a lack of awareness about statistical
               | issues though, and they respond well to my objections.
        
               | rscho wrote:
                | Well, most professors around here don't respond well at
                | all to objections. All (MD) professors I've met are
                | dangerous Excel spreadsheet warriors. Note that we have a
                | statistician in our unit whom no one listens to, because
                | "we always did it that way, right?".
                | 
                | However, taking a wider view, the issue is far bigger
                | than that: for a clinician nowadays it has become
                | basically impossible to be a really honest researcher
                | while also acquiring sufficient publication velocity to
                | rise above. In a word, our system selects for bad actors.
        
         | nerdponx wrote:
         | Out of curiosity: why were these issues being flagged by
         | readers after publication, and not during peer review?
        
       | pmichaud wrote:
       | This is a really great article, and one thing in particular that
       | struck me was something super obvious in retrospect but that I
       | didn't think of before:
       | 
       | Replicability doesn't mean the study was right. If a study
       | doesn't replicate then it's almost certainly nonsense. If it does
          | replicate that just means that it's self-consistent--but if it
       | was garbage in, then it will invariably be garbage out.
        
         | jasonhong wrote:
         | > If a study doesn't replicate then it's almost certainly
         | nonsense.
         | 
         | For some fields, especially natural sciences, this statement
         | makes sense. However, for sciences of the artificial (to use
         | Herb Simon's term) I would respectfully disagree.
         | 
         | For example, sometimes the underlying context has changed. I
          | remember a luminary in my field (HCI) once arguing that we
         | should consider re-doing many studies every decade, because
         | each cohort uses a different set of tools and has a different
         | set of experiences, and because the underlying demographics of
         | those cohorts change.
        
         | randcraw wrote:
         | Replicability shows that your method leads to consistent
         | results, not that your hypothesis correctly explains the cause
         | of those results. Yes, your intervention did provoke the causal
          | chain into action, but you may not necessarily have correctly
          | identified or thoroughly characterized the component you
          | singled out as the trigger. Your method may work great but only
         | under the perverse set of conditions you happened to explore.
         | 
         | Conversely, if your method fails to drive the desired outcome,
         | your hypothesis could still be correct, just incomplete. Maybe
         | your perturbation of the chemical reaction didn't _quite_ reach
         | the activation energy. Or maybe other essential components in
         | the mechanism were overlooked, given the set of conditions you
         | happened to explore.
         | 
         | Complex black box systems like the brain are notoriously
         | perverse in reproducibly giving up their secrets, even when
         | your hypothesis is correct and your method robust.
        
         | splithalf wrote:
         | "doesn't replicate then it's almost certainly nonsense"
         | 
         | Disagree. A significant finding is expected to occur due to
         | chance in direct proportion to the surface area for such
         | possibilities. More studies, more forking paths within those
         | studies, more models, all increase the frequency of spurious
         | findings. So one can do everything perfectly and still get
         | "garbage out". It is certain.
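
        A quick simulation of that point (Python with NumPy and SciPy; the
        sample sizes and counts below are arbitrary, illustrative
        assumptions): run many perfectly conducted comparisons in which the
        null hypothesis is true every single time, and a steady trickle of
        "significant" findings appears anyway.

            import numpy as np
            from scipy import stats

            rng = np.random.default_rng(0)
            n_tests = 1000   # "surface area": studies, forking paths, models
            alpha = 0.05

            false_hits = 0
            for _ in range(n_tests):
                # Both groups come from the same distribution, so any
                # "significant" difference is spurious by construction.
                a = rng.normal(size=30)
                b = rng.normal(size=30)
                _, p = stats.ttest_ind(a, b)
                false_hits += int(p < alpha)

            print(f"{false_hits} of {n_tests} flawless tests were "
                  f"'significant' anyway (expect about {alpha * n_tests:.0f})")

        The more tests, models and forking paths in play, the larger that
        count of spurious findings becomes, exactly as described above.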
        
           | pmichaud wrote:
           | I'm not super sure I understand your point, but I think
           | you're saying that it's possible to run a good replication
           | attempt on a good study and still have it not replicate. I
           | agree with that. I'm not super sure how to correctly estimate
           | the chance of that happening, but one dumb way I can think of
            | is just using the p-value, so if it was .05 then you have a 1/20
           | chance of failing to replicate a study even if everything was
           | done correctly.
           | 
           | However, when I said "doesn't replicate" I didn't have a
           | single attempt with a 5% chance of failure in mind. I had a
           | field's aggregate attempt to confirm the result in mind,
           | which would include multiple attempts and checking for
           | statistical bullshit and all that.
           | 
           | Under those conditions the chances are vanishingly small of a
           | whole field getting massively unlucky when trying to
           | replicate a well-done study that theoretically should
           | replicate.
           | 
           | That's what I had in mind, and I still think it's right.
           | 
           | --
           | 
           | Rereading what you wrote, a different interpretation of what
           | you said is that the original investigators might have done
           | everything perfectly, and nevertheless found a significant
           | result that was spurious just because that stuff can happen
           | by chance. If that's what you meant, I don't understand the
           | disagreement, except maybe semantically. I would call a
           | perfectly done study that shows a spurious result "nonsense,"
           | and I would expect replication attempts to show the result is
           | nonsense, even if the process that generated the nonsense was
           | perfect. Maybe you're just saying you wouldn't call a
           | perfectly done study "nonsense," regardless of the outcome?
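
        To put rough numbers on "vanishingly small" (the power and alpha
        figures below are assumptions for illustration, not from the
        thread): if each independent, well-run replication attempt has
        decent power, the chance that a real effect fails all of them, or
        that a spurious one survives all of them, shrinks geometrically.

            # Illustrative assumptions, not from the thread.
            power = 0.80  # chance one replication detects a real effect
            alpha = 0.05  # chance one replication "confirms" a null effect

            for k in (1, 3, 5):
                p_real_all_missed = (1 - power) ** k
                p_fake_all_passed = alpha ** k
                print(f"{k} replications: "
                      f"P(all miss a real effect) = {p_real_all_missed:.4f}, "
                      f"P(all confirm a spurious one) = {p_fake_all_passed:.1e}")

        So a field's aggregate judgement across several honest attempts is
        far more informative than any single study's p-value.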
        
       | kleer001 wrote:
        | The problem with something being wrong is that there is a nearly
        | endless number of ways for it to be wrong.
       | 
       | However, with something correct, there's very little to say about
       | it.
        
         | SubiculumCode wrote:
         | The point of the article I think is that while many criticisms
         | of a piece of research are valid, they are not necessarily
          | meaningful. Because of their validity (despite not being
         | meaningful), such criticisms can be weaponized to undermine any
         | piece of research that does not fit your worldview...indeed, I
         | see this regularly on HN...which is why I posted the article.
        
       ___________________________________________________________________
       (page generated 2021-04-16 22:01 UTC)