[HN Gopher] Researchers discover a new form of scientific fraud:...
___________________________________________________________________
Researchers discover a new form of scientific fraud: 'sneaked
references'
Author : toss1
Score : 64 points
Date : 2024-07-10 16:41 UTC (6 hours ago)
(HTM) web link (phys.org)
(TXT) w3m dump (phys.org)
| Isamu wrote:
| They mean references cited in the metadata but not in the actual
| paper. So it's "invisible" and can be gamed because some citation
| trackers rely on the metadata rather than having to parse the
| paper.
| lnwlebjel wrote:
| "The post caught the attention of several sleuths who are now
| the authors of the JASIST article. We used a scientific search
| engine to look for articles citing the initial article. Google
| Scholar found none, but Crossref and Dimensions did find
| references. The difference? Google Scholar is likely to mostly
| rely on the article's main text to extract the references
| appearing in the bibliography section, whereas Crossref and
| Dimensions use metadata provided by publishers."
|
| So Google Scholar uses the text, which is good. Then the obvious
| solution is to go and look up where it has been cited, which
| would be easy to do with Google Scholar.
|
| I don't know why anyone would jeopardize their career like
| this.
| pphysch wrote:
| > I don't know why anyone would jeopardize their career like
| this.
|
| Publish or perish.
| generationP wrote:
| This looks like someone on the editorial side snuck the
| references in. Don't think authors have a way to do it.
| sseagull wrote:
| That's weird. With every paper I've published (in chemistry)
| the authors don't handle the metadata. We upload our article
| sources (in latex or word) and the publisher handles the rest.
| I've never done anything more.
|
| Is this different in other fields? Or in sketchy journals?
| urspx wrote:
| It seems like it's being done entirely on the publisher end,
| with them - or friends - benefiting:
|
| > For example, a single researcher who was associated with
| Technoscience Academy benefited from more than 3,000
| additional illegitimate citations. Some journals from the
| same publisher benefited from a couple hundred additional
| sneaked citations.
|
| Perhaps this publisher or others also offer this as some kind
| of backroom deal / service.
| saulrh wrote:
| Extracting the key definition: These additional references were
| only in the metadata, distorting citation counts and giving
| certain authors an unfair advantage.
|
| Papers with metadata that doesn't match the contents of the
| paper. The article notes that Google Scholar is unaffected, as it
| extracts citations from the paper itself by parsing the text of
| the printed bibliography.
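|
| A minimal sketch of that mismatch check, assuming you already have
| two lists of DOIs for one paper: one from the publisher-supplied
| metadata and one parsed from the printed bibliography. The function
| names and the example DOIs are hypothetical, and real reference
| lists often lack DOIs entirely.

```python
def normalize_doi(doi: str) -> str:
    """Lower-case a DOI and strip common prefixes so equal DOIs compare equal."""
    doi = doi.strip().lower()
    for prefix in ("https://doi.org/", "http://doi.org/", "doi:"):
        if doi.startswith(prefix):
            doi = doi[len(prefix):]
    return doi


def compare_references(metadata_dois, bibliography_dois):
    """Return (sneaked, lost) as sorted lists of normalized DOIs.

    sneaked: present in the metadata but not in the printed bibliography.
    lost:    present in the printed bibliography but not in the metadata.
    """
    meta = {normalize_doi(d) for d in metadata_dois}
    bib = {normalize_doi(d) for d in bibliography_dois}
    return sorted(meta - bib), sorted(bib - meta)


if __name__ == "__main__":
    # Made-up DOIs, for illustration only.
    metadata = ["10.1000/a", "https://doi.org/10.1000/B", "10.1000/x"]
    bibliography = ["doi:10.1000/a", "10.1000/b", "10.1000/c"]
    sneaked, lost = compare_references(metadata, bibliography)
    print("sneaked:", sneaked)
    print("lost:", lost)
```

| Anything in the first list is a candidate sneaked reference; the
| second list covers references that went missing from the metadata.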
| ilamont wrote:
| Article doesn't really address _why_ this is happening. My guess
| would be financial incentives for researchers and professors to
| publish in international journals, a common practice in some
| countries. For instance, according to "Analysis of Chinese
| universities' financial incentives for academic publications":
|
| _In recent years payments based directly on the number of
| citations a paper receives have become more popular, but are
| still much less common than those based on the journal's impact
| factor._
|
| https://opportunities-insight.britishcouncil.org/insights-bl...
| beambot wrote:
| H-index is still a frequently cited measure for an academic
| researcher's "impact" when comparing individuals across fields
| -- the idea being that authors with more papers and more
| citations are "better" (all other things being equal). Papers
| with higher citation counts also appear more prominently among
| search results (e.g. Google Scholar).
|
| If you make the analogy to the www: H-index is pagerank,
| citations are back links, and authors (researchers) are the
| domain names. Gaming h-index is akin to SEO hacking for
| academic authors.
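|
| For reference, an author's h-index is the largest h such that h of
| their papers have at least h citations each. A short sketch of the
| computation; the citation counts below are made up to show how a few
| extra citations per paper can move the number.

```python
def h_index(citation_counts):
    """Largest h such that at least h papers have >= h citations each."""
    counts = sorted(citation_counts, reverse=True)
    h = 0
    for rank, count in enumerate(counts, start=1):
        if count >= rank:
            h = rank
        else:
            break
    return h


if __name__ == "__main__":
    honest = [6, 5, 3, 1, 0]            # hypothetical per-paper counts
    inflated = [c + 3 for c in honest]  # three extra citations per paper
    print(h_index(honest), h_index(inflated))
```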
| Y_Y wrote:
| How would a paper have a h-index?
| beambot wrote:
| Ah, my original note was ambiguous - fixed. Link for
| convenience:
|
| https://en.m.wikipedia.org/wiki/H-index
| gred wrote:
| Something similar hit the news in Spain recently:
|
| https://cadenaser.com/castillayleon/2024/03/15/el-candidato-...
| spullara wrote:
| so is the implication those cited are paying the publishers to
| add them to the metadata of the papers they publish? what is the
| actual mechanism?
| smegsicle wrote:
| left as an exercise for the reader
| rdtsc wrote:
| So how does the fraud work? Researcher wants to boost his
| citation count so they can get more funding, respect, etc. They
| ask their friends to cite their paper in a metadata-only
| reference in their other papers, even though the papers didn't
| really reference anything from the original paper.
|
| They should be able to find citation "rings" then, whole groups
| which regularly do this, probably associated with specific
| institutions or journals.
|
| The linked study did part of this:
| https://asistdl.onlinelibrary.wiley.com/doi/10.1002/asi.2489...
|
| > An analysis of the 10 sneaked references in Dimensions reveals
| that they benefit mainly two authors (Initials JNR & BK)
|
| Now, it would be interesting to see if JNR and BK's publications
| used this trick and in turn benefited some other group.
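|
| One rough way to start looking for such "rings", assuming you have a
| list of (citing author, cited author) pairs extracted from reference
| metadata, is to flag pairs of authors who cite each other. This is
| only a sketch: the function name, the threshold parameter and the
| author labels are hypothetical, and mutual citation is common in
| legitimate work, so a hit is a lead to inspect rather than proof.

```python
from collections import Counter


def reciprocal_pairs(citations, min_each_way=1):
    """Return (a, b, a_cites_b, b_cites_a) for author pairs citing each other.

    citations: iterable of (citing_author, cited_author) tuples.
    min_each_way: minimum number of citations required in both directions.
    """
    counts = Counter(citations)
    pairs = []
    for (a, b), n_ab in counts.items():
        if a < b:  # visit each unordered pair once; also skips self-citations
            n_ba = counts.get((b, a), 0)
            if n_ab >= min_each_way and n_ba >= min_each_way:
                pairs.append((a, b, n_ab, n_ba))
    return sorted(pairs)


if __name__ == "__main__":
    # Made-up edges between hypothetical authors A, B and C.
    edges = [("A", "B"), ("B", "A"), ("B", "A"), ("C", "A"), ("A", "A")]
    print(reciprocal_pairs(edges))
```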
| orochimaaru wrote:
| Does LaTeX allow this to happen? Maybe a simple typesetting
| change to exclude references that are not mentioned in the
| text?
|
| This is a problem with the journal review and editors. Also,
| typesetting tools that create the final version can and should
| be setup to protect things like these. I know folks may want to
| go hunt for sexy genai tooling to solve this - but I think the
| solution is much simpler.
| resoluteteeth wrote:
| > Does LaTeX allow this to happen? Maybe a simple typesetting
| change to exclude references that are not mentioned in the
| text?
|
| > This is a problem with the journal review and editors.
| Also, typesetting tools that create the final version can and
| should be setup to protect things like these. I know folks
| may want to go hunt for sexy genai tooling to solve this -
| but I think the solution is much simpler.
|
| The issue in the article isn't a paper being listed in the
| references but not actually cited elsewhere in the paper;
| it's not something within the actual paper at all. It's
| metadata created by the publisher.
|
| So it presumably doesn't have anything to do with what latex
| allows or doesn't allow.
| resoluteteeth wrote:
| > So how does the fraud work? Researcher wants to boost his
| citation count so they can get more funding, respect, etc. They
| ask their friends to cite their paper in a metadata-only
| reference in their other papers, even though the papers didn't
| really reference anything from the original paper.
|
| This is probably the publisher's doing rather than the author
| of the paper.
| generationP wrote:
| More info at
| https://retractionwatch.com/2023/10/09/how-thousands-of-invi...
| and https://arxiv.org/abs/2310.02192 . As
| the latter makes clear, this type of fraud is most likely done by
| _journal editors_ (or their assistants), not by the authors:
|
| > When registering a new publication and its references at
| Crossref, a publisher may sneak extra undue references in the
| metadata sent in addition to the ones originally present. Then,
| digital libraries (e.g., SpringerLink) and bibliometric platforms
| (e.g., Dimensions) harvest these metadata, undue citations
| included. These sneaked references are processed and counted even
| if they are not present in the original publication.
|
| The three journals in this particular case are all published by
| _Technoscience Academy_, an OA publisher operating out of India
| (not one of the well-known ones). I would think twice as an
| author before I submitted to any journal from this publisher,
| lest my paper be abused for manipulations like this (although I'm
| not sure if it has any journals worth submitting to anyway).
|
| NB (because I got confused first): This is not really about
| Hindawi. Hindawi published the (trash) article that these fake
| citations were pumping up, but the pumping-up happened using
| Technoscience Academy journals.
| jszymborski wrote:
| Not to say that Hindawi is an uncontroversial party
| https://retractionwatch.com/2024/03/14/up-to-one-in-seven-of...
| motohagiography wrote:
| feels like a neural net could detect these by scoring and ranking
| the relevance of the content of a reference paper to the content
| of the paper citing it. maybe it's time for a citizen science
| project to dismantle these academic fraud rings? they form
| networks that capture academic administrations and have
| significant downstream effects on policy and education. just by
| identifying the worst and most egregious offenders, leaving the
| merely dodgy alone, it could break up the hold they have on
| institutions.
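|
| As a much simpler stand-in for the neural net suggested here, the
| sketch below scores relevance with bag-of-words cosine similarity
| between two texts (say, the abstracts of the citing and the cited
| paper). This is a swapped-in technique, not a neural model; the
| function names and sample texts are hypothetical, and a low score
| only marks a reference as worth a human look.

```python
import math
import re
from collections import Counter


def term_counts(text):
    """Count lower-cased alphabetic tokens in a text."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine_similarity(text_a, text_b):
    """Cosine similarity of the two texts' term-count vectors, in [0, 1]."""
    a, b = term_counts(text_a), term_counts(text_b)
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


if __name__ == "__main__":
    citing = "citation metadata manipulation in bibliometric platforms"
    related = "manipulation of citation counts in bibliometric metadata"
    unrelated = "soil chemistry of alpine meadows"
    print(cosine_similarity(citing, related) > cosine_similarity(citing, unrelated))
```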
| droopyEyelids wrote:
| Wouldn't a text search for each metadata reference in the
| publication take care of this problem?
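|
| A sketch of that text search, assuming the metadata gives a title
| for each reference and the paper's full text has been extracted to a
| plain string. The names and sample data are hypothetical. Matching
| is done on normalized text so case, punctuation and line breaks do
| not matter; words hyphenated across lines in a PDF extract would
| still defeat it.

```python
import re


def normalize(text):
    """Lower-case and collapse every run of non-alphanumerics to one space."""
    return re.sub(r"[^a-z0-9]+", " ", text.lower()).strip()


def references_missing_from_text(metadata_titles, full_text):
    """Return the metadata reference titles that never appear in the text."""
    haystack = f" {normalize(full_text)} "
    return [
        title
        for title in metadata_titles
        if f" {normalize(title)} " not in haystack
    ]


if __name__ == "__main__":
    # Made-up paper text and reference titles.
    paper = (
        "References\n"
        "[1] A. Author. Graph Methods for Citation\nAnalysis. 2020.\n"
        "[2] B. Writer. On Metadata Quality. 2021.\n"
    )
    titles = [
        "Graph methods for citation analysis",
        "On metadata quality",
        "Unrelated Paper on Soil Chemistry",
    ]
    print(references_missing_from_text(titles, paper))
```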
| probably_wrong wrote:
| I'm unclear on whether to pin this on the publisher or on the
| authors.
|
| In the first example shown in the linked pre-print [1] there's a
| paper with 62 downloads that's been cited 107 times within two
| months. The pre-print looks deeper into a paper with 7 "real"
| references whose metadata has an extra 40 references not found in
| the PDF. This leaves us with three options:
|
| * the author of a paper with 62 downloads (not an amazing number)
|   was convinced into joining a citation ring along with 40 other
|   authors,
| * the publisher has been sneaking references onto unsuspecting
|   papers, or
| * the publisher has a vulnerability on their metadata system
|   that's being actively exploited by the two scholars identified
|   in the pre-print.
|
| Whatever the case, I'm glad the solution is as simple as "you
| should parse the references yourself". I do however wonder: is
| someone checking whether all of the references are actually
| referenced within the paper?
|
| [1] https://arxiv.org/pdf/2310.02192
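|
| On the closing question, a rough sketch of such a check for papers
| that use numeric citation markers like [3], [1, 2] or [4-6]. It
| assumes the body text (without the bibliography) is available as a
| plain string and that the bibliography is numbered 1..N; author-year
| styles would need a different parser. All names here are
| hypothetical.

```python
import re


def cited_numbers(body_text):
    """Collect every reference number cited via [n], [a, b] or [a-b] markers."""
    cited = set()
    for group in re.findall(r"\[([\d,\s\-–]+)\]", body_text):
        for part in group.split(","):
            part = part.strip()
            bounds = re.split(r"\s*[-–]\s*", part)
            if len(bounds) == 2 and all(b.isdigit() for b in bounds):
                # A range such as 6-8 cites every entry from 6 to 8.
                cited.update(range(int(bounds[0]), int(bounds[1]) + 1))
            elif part.isdigit():
                cited.add(int(part))
    return cited


def uncited_entries(body_text, bibliography_size):
    """Return bibliography numbers (1..N) that the body never cites."""
    all_entries = set(range(1, bibliography_size + 1))
    return sorted(all_entries - cited_numbers(body_text))


if __name__ == "__main__":
    body = "As shown in [1] and [2, 4], earlier work [6-8] disagrees."
    print(uncited_entries(body, bibliography_size=9))
```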
| taeric wrote:
| Hilarious to me how this is basically, "researchers discover SEO
| techniques."
| alan-hn wrote:
| Editors and publishers, not researchers. Researchers aren't the
| ones handling metadata
| skyechurch wrote:
| >Some legitimate references were also lost, meaning they were not
| present in the metadata.
|
| It's possible that some of the inconsistency between metadata and
| text could just be due to incompetence - it's harder to find a
| profit motive for dropping legitimate citations. Why wouldn't
| this sort of metadata be auto-generated from the text (aside
| from enabling fraud, of course)?
| feoren wrote:
| Which is harder to detect: replacing reference 17 with the one
| you're trying to pump, or adding reference 35 when the
| bibliography in the original paper clearly stops at 34?
| neilv wrote:
| > _it's harder to find a profit motive for dropping legitimate
| citations_
|
| Competitiveness for citation points, especially with someone in
| or adjacent to your niche?
|
| Also, the non-profit: pettiness.
___________________________________________________________________
(page generated 2024-07-10 23:01 UTC)