[HN Gopher] Researchers discover a new form of scientific fraud:...
       ___________________________________________________________________
        
       Researchers discover a new form of scientific fraud: 'sneaked
       references'
        
       Author : toss1
       Score  : 64 points
       Date   : 2024-07-10 16:41 UTC (6 hours ago)
        
 (HTM) web link (phys.org)
 (TXT) w3m dump (phys.org)
        
       | Isamu wrote:
       | They mean references cited in the metadata but not in the actual
       | paper. So it's "invisible" and can be gamed because some citation
       | trackers rely on the metadata rather than having to parse the
       | paper.
        
         | lnwlebjel wrote:
         | "The post caught the attention of several sleuths who are now
         | the authors of the JASIST article. We used a scientific search
         | engine to look for articles citing the initial article. Google
         | Scholar found none, but Crossref and Dimensions did find
         | references. The difference? Google Scholar is likely to mostly
         | rely on the article's main text to extract the references
         | appearing in the bibliography section, whereas Crossref and
         | Dimensions use metadata provided by publishers."
         | 
         | So Google Scholar uses the text, which is good. Then the
         | obvious solution is to go and look up where it has been cited,
         | which would be easy to do with Google Scholar.
         | 
         | I don't know why anyone would jeopardize their career like
         | this.
        
           | pphysch wrote:
           | > I don't know why anyone would jeopardize their career like
           | this.
           | 
           | Publish or perish.
        
           | generationP wrote:
           | This looks like someone on the editorial side snuck the
           | references in. Don't think authors have a way to do it.
        
         | sseagull wrote:
         | That's weird. With every paper I've published (in chemistry)
         | the authors don't handle the metadata. We upload our article
         | sources (in latex or word) and the publisher handles the rest.
         | I've never done anything more.
         | 
         | Is this different in other fields? Or in sketchy journals?
        
           | urspx wrote:
           | It seems like it's being done entirely on the publisher end,
           | with them - or friends - benefiting:
           | 
           | > For example, a single researcher who was associated with
           | Technoscience Academy benefited from more than 3,000
           | additional illegitimate citations. Some journals from the
           | same publisher benefited from a couple hundred additional
           | sneaked citations.
           | 
           | Perhaps this publisher or others also offer this as some kind
           | of backroom deal / service.
        
       | saulrh wrote:
       | Extracting the key definition:
       | 
       |     These additional references were only in the metadata,
       |     distorting citation counts and giving certain authors an
       |     unfair advantage.
       | 
       | Papers with metadata that doesn't match the contents of the
       | paper. The article notes that Google Scholar is unaffected, as it
       | extracts citations from the paper itself by parsing the text of
       | the printed bibliography.
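A minimal sketch of the mismatch check described above. It assumes you have already reduced both the publisher's metadata references and the references parsed from the printed bibliography to DOIs. The helper name and the example DOIs are hypothetical. Anything that appears only in the metadata is a candidate sneaked reference. Anything that appears only in the bibliography is a candidate lost reference.

```python
def compare_references(metadata_dois, bibliography_dois):
    """Compare reference DOIs from publisher metadata with those parsed from the paper.

    Hypothetical helper: both inputs are iterables of DOI strings. DOIs are
    case-insensitive, so they are stripped and lowercased before comparing.
    """
    meta = {d.strip().lower() for d in metadata_dois}
    text = {d.strip().lower() for d in bibliography_dois}
    return {
        "sneaked": sorted(meta - text),  # present in metadata only
        "lost": sorted(text - meta),     # present in the printed bibliography only
    }


# Illustrative DOIs only (10.1000 is the DOI Foundation's example prefix).
report = compare_references(
    ["10.1000/a", "10.1000/B", "10.1000/x"],
    ["10.1000/A", "10.1000/b", "10.1000/c"],
)
print(report)  # {'sneaked': ['10.1000/x'], 'lost': ['10.1000/c']}
```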
        
       | ilamont wrote:
       | Article doesn't really address _why_ this is happening. My guess
       | would be financial incentives for researchers and professors to
       | publish in international journals, a common practice in some
       | countries. For instance, according to  "Analysis of Chinese
       | universities' financial incentives for academic publications":
       | 
       |  _In recent years payments based directly on the number of
       | citations a paper receives have become more popular, but are
       | still much less common than those based on the journal's impact
       | factor._
       | 
       | https://opportunities-insight.britishcouncil.org/insights-bl...
        
         | beambot wrote:
         | H-index is still a frequently cited measure for an academic
         | researcher's "impact" when comparing individuals across fields
         | -- the idea being that authors with more papers and more
         | citations are "better" (all other things being equal). Papers
         | with higher citation counts also appear more prominently among
         | search results (e.g. Google Scholar).
         | 
         | If you draw the analogy to the web: the h-index is PageRank,
         | citations are backlinks, and authors (researchers) are the
         | domain names. Gaming the h-index is akin to SEO hacking for
         | academic authors.
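For reference, here is a minimal sketch of the standard h-index definition: the largest h such that h of an author's papers have at least h citations each. It also shows why a few padded citations on an author's weaker papers can move the number. The citation counts are made up for illustration.

```python
def h_index(citation_counts):
    """Largest h such that h papers have at least h citations each."""
    ranked = sorted(citation_counts, reverse=True)
    h = 0
    for rank, cites in enumerate(ranked, start=1):
        if cites >= rank:
            h = rank
        else:
            break  # counts are sorted, so no later paper can qualify
    return h


print(h_index([10, 8, 5, 4, 3]))  # 4
# One extra citation on the 4th paper and two on the 5th:
print(h_index([10, 8, 5, 5, 5]))  # 5
```

Because the index depends only on the papers near the threshold, a handful of sneaked citations placed on the right papers can raise it by a full point.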
        
           | Y_Y wrote:
           | How would a paper have an h-index?
        
             | beambot wrote:
             | Ah, my original note was ambiguous - fixed. Link for
             | convenience:
             | 
             | https://en.m.wikipedia.org/wiki/H-index
        
       | gred wrote:
       | Something similar hit the news in Spain recently:
       | 
       | https://cadenaser.com/castillayleon/2024/03/15/el-candidato-...
        
       | spullara wrote:
       | so is the implication those cited are paying the publishers to
       | add them to the metadata of the papers they publish? what is the
       | actual mechanism?
        
         | smegsicle wrote:
         | left as an exercise for the reader
        
       | rdtsc wrote:
       | So how does the fraud work? A researcher wants to boost their
       | citation count so they can get more funding, respect, etc. They
       | ask their friends to cite their paper in a metadata-only
       | reference in their other papers, even though the papers didn't
       | really reference anything from the original paper.
       | 
       | They should be able to find citation "rings" then, whole groups
       | which regularly do this, probably associated with specific
       | institutions or journals.
       | 
       | The linked study did part of this:
       | https://asistdl.onlinelibrary.wiley.com/doi/10.1002/asi.2489...
       | 
       | > An analysis of the 10 sneaked references in Dimensions reveals
       | that they benefit mainly two authors (Initials JNR & BK)
       | 
       | Now, it would be interesting to see whether JNR and BK's
       | publications used this trick in turn to benefit some other group.
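One crude way to start looking for the "rings" described above is to flag pairs of authors who cite each other heavily in both directions. This sketch assumes you already have a list of (citing author, cited author) pairs, for example harvested from citation metadata. The threshold is an arbitrary illustrative value, and the author labels are hypothetical. Reciprocal citation is also common in legitimate small fields, so this only surfaces candidates for manual review.

```python
from collections import Counter


def reciprocal_pairs(citations, min_each_way=3):
    """Flag author pairs that cite each other at least `min_each_way` times both ways.

    `citations` is an iterable of (citing_author, cited_author) tuples.
    Self-citations are ignored. This is a heuristic, not proof of a ring.
    """
    counts = Counter((a, b) for a, b in citations if a != b)
    flagged = []
    for (a, b), ab in counts.items():
        ba = counts.get((b, a), 0)
        # a < b visits each unordered pair once
        if a < b and ab >= min_each_way and ba >= min_each_way:
            flagged.append((a, b, ab, ba))
    return sorted(flagged)


edges = [("A", "B")] * 4 + [("B", "A")] * 3 + [("A", "C")] * 5 + [("C", "A")]
print(reciprocal_pairs(edges))  # [('A', 'B', 4, 3)]
```

Larger rings, where A cites B, B cites C and C cites A, would need a graph method such as finding strongly connected groups. The pairwise count is only the simplest first pass.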
        
         | orochimaaru wrote:
         | Does LaTeX allow this to happen? Maybe a simple typesetting
         | change to exclude references that are not mentioned in the
         | text?
         | 
         | This is a problem with the journal review and editors. Also,
         | typesetting tools that create the final version can and should
         | be set up to protect against things like these. I know folks
         | may want to go hunt for sexy genai tooling to solve this - but
         | I think the solution is much simpler.
        
           | resoluteteeth wrote:
           | > Does LaTeX allow this to happen? Maybe a simple typesetting
           | change to exclude references that are not mentioned in the
           | text?
           | 
           | > This is a problem with the journal review and editors.
           | Also, typesetting tools that create the final version can and
           | should be set up to protect against things like these. I know folks
           | may want to go hunt for sexy genai tooling to solve this -
           | but I think the solution is much simpler.
           | 
           | The issue in the article isn't a paper being listed in the
           | references but not actually cited elsewhere in the paper;
           | it's not something within the actual paper at all. It's
           | metadata created by the publisher.
           | 
           | So it presumably doesn't have anything to do with what LaTeX
           | allows or doesn't allow.
        
         | resoluteteeth wrote:
         | > So how does the fraud work? A researcher wants to boost their
         | > citation count so they can get more funding, respect, etc. They
         | ask their friends to cite their paper in a metadata-only
         | reference in their other papers, even though the papers didn't
         | really reference anything from the original paper.
         | 
         | This is probably the publisher's doing rather than the author
         | of the paper.
        
       | generationP wrote:
       | More info at
       | https://retractionwatch.com/2023/10/09/how-thousands-of-invi...
       | and https://arxiv.org/abs/2310.02192 . As the latter makes
       | clear, this type of fraud is most likely done by
       | _journal editors_ (or their assistants), not by the authors:
       | 
       | > When registering a new publication and its references at
       | Crossref, a publisher may sneak extra undue references in the
       | metadata sent in addition to the ones originally present. Then,
       | digital libraries (e.g., SpringerLink) and bibliometric platforms
       | (e.g., Dimensions) harvest these metadata, undue citations
       | included. These sneaked references are processed and counted even
       | if they are not present in the original publication.
       | 
       | The three journals in this particular case are all published by
       | _Technoscience Academy_, an OA publisher operating out of India
       | (not one of the well-known ones). I would think twice as an
       | author before I submitted to any journal from this publisher,
       | lest my paper be abused for manipulations like this (although I'm
       | not sure if it has any journals worth submitting to anyway).
       | 
       | NB (because I got confused first): This is not really about
       | Hindawi. Hindawi published the (trash) article that these fake
       | citations were pumping up, but the pumping-up happened using
       | Technoscience Academy journals.
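Since the sneaked references live in the metadata deposited at Crossref, they can be inspected directly. This minimal sketch assumes the public Crossref REST API `works/{DOI}` endpoint and its `message.reference` list, where an entry carries a `DOI` field only when one was deposited. The helper names are hypothetical. The parsing is demonstrated on a hand-made record, so the example runs offline.

```python
import json
import urllib.parse
import urllib.request

# Public Crossref REST API endpoint for a single work (assumed usage).
CROSSREF_WORKS = "https://api.crossref.org/works/"


def fetch_work(doi):
    """Download one work's metadata record (JSON) from Crossref. Needs network access."""
    url = CROSSREF_WORKS + urllib.parse.quote(doi)
    with urllib.request.urlopen(url, timeout=30) as resp:
        return json.load(resp)


def metadata_reference_dois(record):
    """Return the sorted, lowercased DOIs in a record's deposited reference list.

    Entries deposited without a DOI (e.g. only an "unstructured" string) are skipped.
    """
    refs = record.get("message", {}).get("reference", [])
    return sorted({ref["DOI"].lower() for ref in refs if "DOI" in ref})


# Offline example: a hand-made record shaped like a Crossref response.
sample = {
    "status": "ok",
    "message": {
        "reference": [
            {"key": "ref1", "DOI": "10.1000/ABC"},
            {"key": "ref2", "unstructured": "Some paper, 2020."},
        ]
    },
}
print(metadata_reference_dois(sample))  # ['10.1000/abc']
```

In practice you would compare this list with the references parsed from the paper's own bibliography. Anything present only in the metadata is a candidate sneaked reference.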
        
         | jszymborski wrote:
         | Not to say that Hindawi is an uncontroversial party:
         | https://retractionwatch.com/2024/03/14/up-to-one-in-seven-of...
        
       | motohagiography wrote:
       | feels like a neural net could detect these by scoring and ranking
       | the relevance of the content of a reference paper to the content
       | of the paper citing it. maybe it's time for a citizen science
       | project to dismantle these academic fraud rings? they form
       | networks that capture academic administrations and have
       | significant downstream effects on policy and education. just by
       | identifying the worst and most egregious offenders, leaving the
       | merely dodgy alone, it could break up the hold they have on
       | institutions.
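The comment suggests a neural net. As a much simpler stand-in, this sketch swaps in bag-of-words TF-IDF cosine similarity, which is a deliberate substitution and not what the commenter proposed. It can already rank how closely each cited paper's text relates to the citing paper. It assumes you have abstracts or full text for both, and the example texts are invented. A low score is only a hint, because legitimate citations can be tangential.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Lowercase alphabetic tokens only; a real system would also drop stopwords.
    return re.findall(r"[a-z]+", text.lower())


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (term -> weight dict) per document."""
    counts = [Counter(tokenize(d)) for d in docs]
    n = len(docs)
    df = Counter(term for c in counts for term in c)  # document frequency
    return [
        {t: tf * (math.log((1 + n) / (1 + df[t])) + 1) for t, tf in c.items()}
        for c in counts
    ]


def cosine(u, v):
    dot = sum(w * v[t] for t, w in u.items() if t in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def relevance_scores(citing_text, reference_texts):
    """Cosine similarity between the citing paper and each referenced paper."""
    vecs = tfidf_vectors([citing_text] + list(reference_texts))
    return [cosine(vecs[0], ref) for ref in vecs[1:]]


citing = "graph neural networks for molecule property prediction"
refs = [
    "message passing neural networks on molecular graphs",
    "medieval poetry and rhyme in old french manuscripts",
]
scores = relevance_scores(citing, refs)
print(scores[0] > scores[1], scores[1])  # True 0.0
```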
        
         | droopyEyelids wrote:
         | Wouldn't a text search for each metadata reference in the
         | publication take care of this problem?
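Here is roughly what the question proposes, as a sketch. It assumes you have each metadata reference's title and the paper's extracted text, both of which are hypothetical inputs here, and flags titles that never appear in the text. Exact matching on normalized titles will miss references whose titles were abbreviated or mangled during PDF extraction. Treat the result as a starting point for manual checking, not a verdict.

```python
import re


def normalize(text):
    # Lowercase and collapse punctuation/whitespace so line breaks and punctuation
    # differences in extracted PDF text matter less. Words hyphenated across lines
    # would still need extra handling.
    return " ".join(re.findall(r"[a-z0-9]+", text.lower()))


def titles_missing_from_text(metadata_titles, paper_text):
    """Return metadata reference titles that never appear in the paper's text."""
    body = f" {normalize(paper_text)} "  # pad so matches respect word boundaries
    return [t for t in metadata_titles if f" {normalize(t)} " not in body]


text = (
    "References\n[1] Smith, J. A Study of Things. J. Ex. 2020.\n"
    "[2] Doe, K. Another Study, of Stuff. 2021."
)
titles = ["A study of things", "Another study of stuff", "An unrelated paper"]
print(titles_missing_from_text(titles, text))  # ['An unrelated paper']
```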
        
       | probably_wrong wrote:
       | I'm unclear on whether to pin this on the publisher or on the
       | authors.
       | 
       | In the first example shown in the linked pre-print [1] there's a
       | paper with 62 downloads that's been cited 107 times within two
       | months. The pre-print looks deeper into a paper with 7 "real"
       | references whose metadata has an extra 40 references not found in
       | the PDF. This leaves us with three options:
       | 
       | * the author of a paper with 62 downloads (not an amazing number)
       |   was convinced to join a citation ring along with 40 other
       |   authors,
       | * the publisher has been sneaking references onto unsuspecting
       |   papers, or
       | * the publisher has a vulnerability in its metadata system
       |   that's being actively exploited by the two scholars
       |   identified in the pre-print.
       | 
       | Whatever the case, I'm glad the solution is as simple as "you
       | should parse the references yourself". I do however wonder: is
       | someone checking whether all of the references are actually
       | referenced within the paper?
       | 
       | [1] https://arxiv.org/pdf/2310.02192
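On the last question: for papers that use numeric citation markers, checking that every bibliography entry is cited somewhere in the body is simple. This minimal sketch assumes bracketed numeric markers like [3], [1, 4] or [5-7]. It also assumes the body text has already been separated from the reference list. Author-year styles would need a different parser. Note that this catches padding inside the paper itself, whereas the sneaked references discussed above exist only in the metadata.

```python
import re

# Numeric citation markers such as [3], [1, 4] or [5-7] (assumed citation style).
MARKER = re.compile(r"\[(\d+(?:\s*[-,]\s*\d+)*)\]")


def cited_numbers(body_text):
    """Collect every reference number cited in the body, expanding ranges like 5-7.

    `body_text` must exclude the reference list itself, whose "[n]" labels would
    otherwise count as citations.
    """
    cited = set()
    for group in MARKER.findall(body_text):
        for part in group.split(","):
            bounds = [int(x) for x in part.split("-")]  # int() tolerates spaces
            cited.update(range(bounds[0], bounds[-1] + 1))
    return cited


def uncited_entries(body_text, bibliography_size):
    """Bibliography entry numbers 1..N that the body never cites."""
    return sorted(set(range(1, bibliography_size + 1)) - cited_numbers(body_text))


body = "Prior work [1, 2] and later studies [4-5] showed this."
print(uncited_entries(body, 6))  # [3, 6]
```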
        
       | taeric wrote:
       | Hilarious to me how this is basically, "researchers discover SEO
       | techniques."
        
         | alan-hn wrote:
         | Editors and publishers, not researchers. Researchers aren't the
         | ones handling metadata.
        
       | skyechurch wrote:
       | >Some legitimate references were also lost, meaning they were not
       | present in the metadata.
       | 
       | It's possible that some of the inconsistency between metadata and
       | text could just be due to incompetence - it's harder to find a
       | profit motive for dropping legitimate citations. Why wouldn't
       | this sort of metadata be auto-generated from the text (aside from
       | enabling fraud, of course)?
        
         | feoren wrote:
         | Which is harder to detect: replacing reference 17 with the one
         | you're trying to pump, or adding reference 35 when the
         | bibliography in the original paper clearly stops at 34?
        
         | neilv wrote:
         | > _it's harder to find a profit motive for dropping legitimate
         | citations_
         | 
         | Competitiveness for citation points, especially with someone in
         | or adjacent to your niche?
         | 
         | Also, the non-profit: pettiness.
        
       ___________________________________________________________________
       (page generated 2024-07-10 23:01 UTC)