[HN Gopher] How Should We Critique Research?
___________________________________________________________________
How Should We Critique Research?
Author : SubiculumCode
Score : 57 points
Date : 2021-04-16 14:44 UTC (7 hours ago)
(HTM) web link (www.gwern.net)
(TXT) w3m dump (www.gwern.net)
| choeger wrote:
| Aren't we forced to require replication for a measure of truth in
| empirical studies? If that's universally accepted, shouldn't we
| demand that researchers come up with predictions of the form "do
| X and see result Y"? And if we accept that, then should we not
| discuss who is actually going to "do X" _before_ research gets
| published?
|
| From the outside perspective, it appears that a lot of people do
| "open-ended" experiments, write up some findings and pretty much
| no one tries to confirm the results with the notable exception of
| actual industrial applications (e.g., pharmacy or aircraft
| design), where the validation happens out of necessity.
| seesawtron wrote:
| Genuinely solid research takes years to complete. Each new
| finding opens up possibilities for 10 new research directions
| that one "should" investigate. The timelines necessary for these
| follow-up studies far exceed the scope of a PhD or postdoc
| position, let alone a single scientific paper.
|
| It is not true that no one tries to confirm the results. The
| labs do further studies internally to follow up. Also, those
| working closely in the field in other labs monitor these
| "findings" and try to align them with their own research,
| whether directly comparing them or indirectly using those ideas
| to support or reject their own observations.
|
| It is never one study but years of past research that lays the
| foundation for a scientific discovery that finally creates a
| paradigm shift in our understanding of the field.
| BeetleB wrote:
| > Aren't we forced to require replication for a measure of
| truth in empirical studies?
|
| Almost no scientific discipline requires it, sadly. There are
| usually no career prospects in doing replication studies.
| pjc50 wrote:
| It's important to notice that the funding, which determines
| what research gets done, is conditional on "impact factor" and
| publication rates. The classic case of letting a single metric
| drive a business function without regard for what it actually
| _means_. This has been incredibly bad for research, which now
| requires very large amounts of gaming the system to get actual
| science done.
|
| Reproducibility and other forms of validity are not currently
| factored in very well.
| asplake wrote:
| A recipe for dysfunction, not just in science but in any field.
| ketzu wrote:
| Preregistering studies [1] is common in some fields and a nice
| idea. Unfortunately, my personal experience is that
| reproducibility, or even providing the sources for your
| experiments, is not something many reviewers care about in
| comp-sci fields. At least no conference review I received ever
| mentioned reproducibility, whether or not it was available
| (and I am not sure anyone ever mentioned source code
| availability either).
|
| [1] https://en.wikipedia.org/wiki/Preregistration_(science)
| rscho wrote:
| Pre-registration is very common in medicine. I don't see it
| change anything, because there are no real consequences for
| not doing what you registered for.
| analog31 wrote:
| Unless registration is an employment contract, it will be
| as binding and useful as project plans in business.
| fatsdomino001 wrote:
| I didn't realize how incredible gwern's website is. Wow. Very
| information dense and well thought out.
| huonib wrote:
| I think it should simply be two parts: 1) a list of what steps
| were taken, and 2) what conclusions can be drawn from them.
|
| Instead we get something that combines both, so it is hard to
| tell where the researchers' beliefs come into play.
| troelsSteegin wrote:
| I interpreted the "difference which makes a difference" as the
| sensitivity of our understanding of the status quo relative to
| the research instance. In that sense, the critique would be about
| estimating impact on some dimension of understanding conditioned
| on a finding of credible difference. That would say that big bets
| are better research.
|
| In other terms of critique, Gelman [0] talks about two
| components (which may appear to be in tension), one referencing
| the craft of research, and the other the apparent novelty of the
| effect. My interpretation is that a critique of research should
| factor in both the character of the bet and an assessment of the
| evaluation. I don't think big is better, I think incisive is
| better, but that broader impact (newsworthiness) is conditioned
| on credible surprise.
|
| [0]
| https://statmodeling.stat.columbia.edu/2014/08/01/scientific...
| PicassoCTs wrote:
| Throw grant money at it until those with weak evidence buckle
| under the load?
| fooker wrote:
| Waiting for the article : "How should we critique research
| without being called anti-science?"
| mike_hearn wrote:
| Well, that looks at first like a pretty comprehensive list of
| possible problems in scientific studies, but the correct
| reaction on reading it should be "wait, what about the really
| serious problems?". I mean, (ab)using Google Surveys or
| experimenter demand effects is one kind of problem that indeed
| probably deserves a lot of nuance about criticizing it. But it's
| also not really where the biggest problems are or where the
| attention should be focused.
|
| We're in an environment where people routinely discover flat out
| academic fraud, as in, scientists just made whole tables of data
| up out of thin air. Someone notices, informs the journal, and
| with 95% likelihood nothing happens. Or maybe the researcher is
| allowed to 'correct' their made up data. We're in an environment
| where you literally cannot take any number in a COVID-related
| paper at face value, even when there are multiple citations for
| it, because on checking, citations routinely turn out to be
| fraudulent: e.g. the cited paper doesn't actually contain the
| claimed data anywhere in it, or explicitly states the opposite of
| what's being claimed, or the number turns out to have been a
| hypothetical scenario but is being cited as a "widely believed"
| empirical fact, etc. We're in an environment where literature
| reviews argue that whilst very few public health models can ever
| be validated against reality, that's not a big deal and _they
| should be used anyway_.
|
| Gwern criticises people who learn about logical fallacies and
| then go around over-criticising people for engaging in them.
| Yeah, sure, if you claim someone is taking bribes and they
| actually are then it's technically an ad hominem but still
| correct to say. Granted. But we are not suffering from an over-
| abundance of nitpicky fallacy-criticizers. Where are these people
| when you need them? COVID related research frequently contains or
| is completely built on circular logic! That's a pretty basic
| fallacy yet papers that engage in it manage to be written by
| teams of 20, sail through multiple peer reviews and appear in
| Nature or the BMJ. As far as I can tell the scientific
| institutions cannot reliably detect logical fallacies even when
| papers are dealing with things that should be entirely logical
| like data, maths, code, study design.
|
| The notion that scientific criticism should be refined to focus
| on what really matters sounds completely reasonable in the
| abstract, and is probably an important discussion to have in some
| very specific fields (maybe genetics is one). But I'd worry that
| if people are arguing about p-hacking or lack of negative
| results, that means they're not arguing about the apparently
| legion researchers making "mistakes" that cannot plausibly be
| actual mistakes. Stuff that should be actually criminal needs to
| be fixed first, before worrying about mere low standards.
| exmadscientist wrote:
| In practice, there's a _massive_ bikeshedding problem in
| scientific review. It is very frustrating to see proposals or
| preprints criticized for missing a dot in paragraph 37 when the
| bigger problem is that the experiment is overall worthless
| because it can never change a decision.... This is bad enough
| when it's an honest mistake, but then there are all the
| dishonest or plainly incompetent people to worry about.
|
| So, no, in fact I think we _are_ suffering from an abundance of
| nitpickers. Science _desperately_ needs more reviewers who can
| see the bigger picture.
| mike_hearn wrote:
| That's fair enough. I meant nitpickers about logical
| fallacies specifically, not stuff like grammar or minor
| details of an experiment.
|
| That said, is peer review really the place to dunk on a study
| because the whole goal is pointless? The work is done by that
| point, it's too late. It's the granting bodies that should be
| getting peer reviewed in that regard, but one of the root
| causes of the malaise in research is that the granting bodies
| appear to be entirely blind buyers. They care about
| disbursing money, not what they get out of the spending. If
| they didn't spend the money they'd be fired, so that's
| understandable. The core problem IMHO is the huge level of
| state funding of research. The buck stops at politicians but
| they are in no position to evaluate the quality of academic
| studies.
| guscost wrote:
| Have you heard about the "grounded theory" methodology? It is
| astonishing - some academic circles have _formally standardized_
| the practice of fitting a model to data, and then immediately
| drawing conclusions from it.
| rscho wrote:
| I have a particularly funny recent anecdote: in the past year, a
| professor at my dept had two papers of his attacked by aggressive
| letters to the editor pointing out flaws in both methodology and
| interpretation. The end result has been that we published two
| additional papers titled "a reanalysis of [insert title of 1st
| paper]". So a critique of bad research ended up producing twice
| the amount of bad research!
|
| It's so absurd that it has a certain Monty Python flavour of
| comedy to it.
| SubiculumCode wrote:
| It does not automatically follow that the reanalysis is also
| bad...and by saying "we" you presumably were an author on this
| paper. If you thought it was bad, why did you put your name on
| it?
| rscho wrote:
| I didn't ask for it. I just found out about it after the fact,
| and not being a professor, I have no say in the matter. Also, I'd
| prefer to keep my job since I have a family to feed.
|
| Yes, the reanalysis is terrible and it follows from the fact
| that the base methodology is unsound.
| caddemon wrote:
| Maybe it's a field specific thing, or maybe one of the
| other authors pretended to be you, but I am surprised
| something could be published with your name on the author
| list without you knowing about it. Every paper I've ever
| been listed on, sometimes 10+ authors deep, has required me
| to confirm that I want my name on the paper/attest I
| contributed.
| SubiculumCode wrote:
| You absolutely can ask to have your name removed. To be
| diplomatic, you can always claim that you didn't do enough
| to deserve credit. Even now, you could contact the journal
| to request the change.
|
| In your particular case, I will not contest that the
| reanalysis was terrible. My point was that reanalysis of a
| paper after an issue is raised is not automatically also
| crap. It may indeed be correcting the issue.
| rscho wrote:
| Yes, I could ask to lose my job. But again, I prefer not
| to. Research in certain (most?) fields is a dictatorship
| and even suggesting something to be fishy puts a target
| on your back.
|
| In this particular case, there were multiple outcomes tested
| (read: tens of them) and alpha was .05.
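|
| As a rough illustration (treating the outcomes as roughly
| independent and taking 20 as a stand-in, since only "tens" is
| stated above), the familywise false-positive rate is already well
| past a coin flip:
|
|     # Back-of-the-envelope sketch in Python; 20 outcomes and
|     # alpha = .05 are the assumed, illustrative numbers.
|     alpha, k = 0.05, 20
|     p_any_spurious = 1 - (1 - alpha) ** k
|     print(f"P(at least one spurious hit) = {p_any_spurious:.2f}")
|     # prints ~0.64: a false positive is more likely than not,
|     # even if every null hypothesis is true.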
| SubiculumCode wrote:
| Man, all I can say is you are working for a bad prof
| then. I can't imagine any of my former
| advisors/colleagues/etc. punishing me for declining to be
| a co-author.
| rscho wrote:
| I am guessing that you don't do clinical research, then
| ;-)
| samatman wrote:
| I'm reminded of a bit of dialogue from Fight Club:
|
| "Which car company do you work for?"
|
| "...a major one."
| SubiculumCode wrote:
| I'm on the edge of clinical work. I work with some
| clinical people, but am not clinical myself. I would say
| that they are a bit more willing to be instrumental and
| throw a bunch of comparisons out and see what sticks. It's
| largely because of a lack of awareness about statistical
| issues, though, and they respond well to my objections.
| rscho wrote:
| Well, most professors around here don't respond well at
| all to objections. All (MD) professors I've met are
| dangerous Excel spreadsheet warriors. Note that we have a
| statistician in our unit whom no one listens to, because
| "we always did it that way, right?".
|
| However, in a wider view the issue is far bigger than
| that: for a clinician nowadays, it's become basically
| impossible to be a really honest researcher while also
| acquiring sufficient publication velocity to get ahead.
| In a word, our system selects for bad actors.
| nerdponx wrote:
| Out of curiosity: why were these issues being flagged by
| readers after publication, and not during peer review?
| pmichaud wrote:
| This is a really great article, and one thing in particular that
| struck me was something super obvious in retrospect but that I
| didn't think of before:
|
| Replicability doesn't mean the study was right. If a study
| doesn't replicate then it's almost certainly nonsense. If it does
| replicate, that just means that it's self-consistent--but if it
| was garbage in, then it will invariably be garbage out.
| jasonhong wrote:
| > If a study doesn't replicate then it's almost certainly
| nonsense.
|
| For some fields, especially natural sciences, this statement
| makes sense. However, for sciences of the artificial (to use
| Herb Simon's term) I would respectfully disagree.
|
| For example, sometimes the underlying context has changed. I
| remember a luminary in my field (HCI) once argued that we
| should consider re-doing many studies every decade, because
| each cohort uses a different set of tools and has a different
| set of experiences, and because the underlying demographics of
| those cohorts change.
| randcraw wrote:
| Replicability shows that your method leads to consistent
| results, not that your hypothesis correctly explains the cause
| of those results. Yes, your intervention did provoke the causal
| chain into action, but it may not necessarily have correctly
| identified or thoroughly characterized the component you
| identified as the trigger. Your method may work great but only
| under the perverse set of conditions you happened to explore.
|
| Conversely, if your method fails to drive the desired outcome,
| your hypothesis could still be correct, just incomplete. Maybe
| your perturbation of the chemical reaction didn't _quite_ reach
| the activation energy. Or maybe other essential components in
| the mechanism were overlooked, given the set of conditions you
| happened to explore.
|
| Complex black box systems like the brain are notoriously
| perverse in reproducibly giving up their secrets, even when
| your hypothesis is correct and your method robust.
| splithalf wrote:
| "doesn't replicate then it's almost certainly nonsense"
|
| Disagree. A significant finding is expected to occur due to
| chance in direct proportion to the surface area for such
| possibilities. More studies, more forking paths within those
| studies, more models, all increase the frequency of spurious
| findings. So one can do everything perfectly and still get
| "garbage out". It is certain.
| pmichaud wrote:
| I'm not super sure I understand your point, but I think
| you're saying that it's possible to run a good replication
| attempt on a good study and still have it not replicate. I
| agree with that. I'm not super sure how to correctly estimate
| the chance of that happening, but one dumb way I can think of
| is just using the p value, so if it was .05 then you have a 1/20
| chance of failing to replicate a study even if everything was
| done correctly.
|
| However, when I said "doesn't replicate" I didn't have a
| single attempt with a 5% chance of failure in mind. I had a
| field's aggregate attempt to confirm the result in mind,
| which would include multiple attempts and checking for
| statistical bullshit and all that.
|
| Under those conditions the chances are vanishingly small of a
| whole field getting massively unlucky when trying to
| replicate a well-done study that theoretically should
| replicate.
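|
| A rough sketch of that intuition, assuming each independent,
| well-run attempt has something like 80% power to detect a true
| effect (both the power figure and the number of attempts are
| made-up, illustrative values):
|
|     # Chance that every one of several well-powered, independent
|     # replication attempts fails purely by bad luck.
|     power, attempts = 0.8, 5        # assumed values
|     p_all_fail = (1 - power) ** attempts
|     print(f"{p_all_fail:.5f}")      # 0.00032: vanishingly small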
|
| That's what I had in mind, and I still think it's right.
|
| --
|
| Rereading what you wrote, a different interpretation of what
| you said is that the original investigators might have done
| everything perfectly, and nevertheless found a significant
| result that was spurious just because that stuff can happen
| by chance. If that's what you meant, I don't understand the
| disagreement, except maybe semantically. I would call a
| perfectly done study that shows a spurious result "nonsense,"
| and I would expect replication attempts to show the result is
| nonsense, even if the process that generated the nonsense was
| perfect. Maybe you're just saying you wouldn't call a
| perfectly done study "nonsense," regardless of the outcome?
| kleer001 wrote:
| The problem with something being wrong is that there's nearly an
| endless number of ways for something to be wrong.
|
| However, with something correct, there's very little to say about
| it.
| SubiculumCode wrote:
| The point of the article I think is that while many criticisms
| of a piece of research are valid, they are not necessarily
| meaningful. Because of their validity (despite not being
| meaningful), such criticisms can be weaponized to undermine any
| piece of research that does not fit your worldview...indeed, I
| see this regularly on HN...which is why I posted the article.
___________________________________________________________________
(page generated 2021-04-16 22:01 UTC)