https://retractionwatch.com/2024/02/05/no-data-no-problem-undisclosed-tinkering-in-excel-behind-economics-paper/

Retraction Watch

Tracking retractions as a window into the scientific process

No data? No problem! Undisclosed tinkering in Excel behind economics
paper

[Photo: Almas Heshmati]

Last year, a new study on green innovations and patents in 27
countries left one reader slack-jawed. The findings were no surprise.
What was baffling was how the authors, two professors of economics in
Europe, had pulled off the research in the first place. 

The reader, a PhD student in economics, was working with the same
data described in the paper. He knew they were riddled with holes -
sometimes big ones: For several countries, observations for some of
the variables the study tracked were completely absent. The authors
made no mention of how they dealt with this problem. On the contrary,
they wrote they had "balanced panel data," which in economic parlance
means a dataset with no gaps.
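
To make the term concrete, here is a minimal Python sketch of a balance check: a panel is balanced when every country has a value for every period. The country codes, years, and numbers below are invented for illustration and are not taken from the paper's dataset.

```python
from itertools import product


def missing_cells(panel, countries, years):
    """Return the (country, year) pairs with no observed value.

    `panel` maps (country, year) to a number, or to None when the
    observation is absent. The panel is balanced when the list is empty.
    """
    return [
        (c, y)
        for c, y in product(countries, years)
        if panel.get((c, y)) is None
    ]


# Hypothetical toy data: NZL has no value for 2001.
countries = ["NLD", "NZL"]
years = [2000, 2001, 2002]
panel = {
    ("NLD", 2000): 4.1, ("NLD", 2001): 4.3, ("NLD", 2002): 4.6,
    ("NZL", 2000): 1.2, ("NZL", 2001): None, ("NZL", 2002): 1.5,
}

gaps = missing_cells(panel, countries, years)
is_balanced = not gaps
print(gaps, is_balanced)
```

A check like this on the raw data would report gaps. It would report none on the filled-in spreadsheet, which is the sense in which Heshmati later called "the final data" balanced.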

"I was dumbstruck for a week," said the student, who requested
anonymity for fear of harming his career. (His identity is known to
Retraction Watch.)

The student wrote a polite email to the paper's first author, Almas
Heshmati, a professor of economics at Jönköping University in Sweden,
asking how he dealt with the missing data. 

In email correspondence seen by Retraction Watch and a follow-up Zoom
call, Heshmati told the student he had used Excel's autofill function
to mend the data. He had marked anywhere from two to four
observations before or after the missing values and dragged the
selected cells down or up, depending on the case. The program then
filled in the blanks. If the new numbers turned negative, Heshmati
replaced them with the last positive value Excel had spit out. 
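
The procedure described above can be approximated in a few lines of Python. This is a sketch under stated assumptions, not a reconstruction of Heshmati's spreadsheet: it assumes that dragging Excel's fill handle from a selection of several numeric cells extends a straight line fitted to those cells, which is our understanding of the default behaviour, and the article does not say which Excel version or settings were used. The function names and seed values are hypothetical.

```python
def linear_fill(seed, n_missing):
    """Extend `seed` by `n_missing` points along its least-squares line.

    Assumption: this mimics dragging a fill handle downward from a
    selection of numeric cells.
    """
    k = len(seed)
    if k < 2:
        raise ValueError("need at least two seed values")
    x_mean = (k - 1) / 2
    y_mean = sum(seed) / k
    sxx = sum((x - x_mean) ** 2 for x in range(k))
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in enumerate(seed))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    return [intercept + slope * x for x in range(k, k + n_missing)]


def replace_negatives(values, fallback):
    """Replace each negative value with the most recent non-negative one.

    `fallback` is used if the very first value is already negative.
    """
    out = []
    last_ok = fallback
    for v in values:
        if v < 0:
            out.append(last_ok)
        else:
            out.append(v)
            last_ok = v
    return out


# Hypothetical seed: three observed values on a falling trend.
seed = [7.0, 5.0, 3.0]
extrapolated = linear_fill(seed, 3)          # continues the line below zero
filled = replace_negatives(extrapolated, fallback=seed[-1])
print(extrapolated, filled)
```

With this seed the fitted line crosses zero after one step, so the remaining gaps are all filled with the same carried-forward number. The result depends entirely on which two to four cells happen to be selected.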

The student was shocked. Replacing missing observations with
substitute values - an operation known in statistics as imputation -
is a common but controversial technique in economics that allows
certain types of analyses to be carried out on incomplete data.
Researchers have established methods for the practice; each comes
with its own drawbacks that affect how the results are interpreted.
As far as the student knew, Excel's autofill function was not among
these methods, especially not when applied in a haphazard way without
clear justification.

But it got worse. Heshmati's data, which the student convinced him to
share, showed that in several instances where there were no
observations to use for the autofill operation, the professor had
taken the values from an adjacent country in the spreadsheet. New
Zealand's data had been copied from the Netherlands, for example, and
the United States' data from the United Kingdom. 

This way, Heshmati had filled in thousands of empty cells in the
dataset - well over one in 10 - including missing values for the
study's outcome variables. A table listing descriptive statistics for
the study's 25 variables referred to "783 observations" of each
variable, but did not mention that many of these "observations" were
in fact imputations.
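
The scale can be sketched with simple arithmetic. The inputs are the figures reported above (27 countries, 25 variables, 783 observations per variable). The number of periods per country and the total cell count are our inferences, assuming the 25 variables make up the whole dataset the "one in 10" refers to. The article does not state either figure.

```python
countries = 27
variables = 25
observations_per_variable = 783

# In a balanced panel every country has the same number of periods.
periods, remainder = divmod(observations_per_variable, countries)

# Total data cells if each of the 25 variables has 783 entries (assumption).
total_cells = observations_per_variable * variables

# "Well over one in 10" implies more than this many filled-in cells.
one_in_ten = total_cells / 10

print(periods, remainder, total_cells, one_in_ten)
```

Under those assumptions the panel would cover 29 periods per country, and one in 10 of its 19,575 cells would be about 1,958, which is consistent with "thousands" of filled-in values.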

"This fellow, he imputed everything," the student said. "He is a
professor, he should know that if you do so much imputation then your
data will be entirely fabricated."

Other experts echoed the student's concerns when told of the Excel
operations underlying the paper.

"That sounds rather horrendous," said Andrew Harvey, a professor of
econometrics at the University of Cambridge, in England. "If you fill
in lots of data points in this way it will invalidate a lot of the
statistics and associated tests. There are ways of dealing with these
problems correctly but they do require some effort.

"Interpolating data is bad practice but lots of people do it and it's
not dishonest so long as it's mentioned," Harvey added. "The other
point about copying data from one country to another sounds much
worse."

Søren Johansen, an econometrician and professor emeritus at the
University of Copenhagen, in Denmark, characterized what Heshmati did
as "cheating." 

"The reason it's cheating isn't that he's done it, but that he hasn't
written it down," Johansen said. "It's pretty egregious." 

The paper, "Green innovations and patents in OECD countries," was
published in the Journal of Cleaner Production, a highly ranked title
from Elsevier. It has been cited just once, according to Clarivate's
Web of Science.

Neither the publisher nor the journal's editors, whom the student
said he alerted to his concerns, have responded to our requests for
comment.

Heshmati's coauthor, Mike Tsionas, a professor of economics at
Lancaster University in the UK, died recently. In a eulogy posted on
LinkedIn in January, the International Finance and Banking Society
hailed Tsionas as "a true luminary in the field of econometrics." 

In a series of emails to Retraction Watch, Heshmati, who, according
to the paper, was responsible for data curation, first said Tsionas
had been aware of how Heshmati dealt with the missing data.

"If we do not use imputation, such data is almost useless," Heshmati
said. He added that the description of the data in the paper as
"balanced" referred to "the final data" - that is, the mended
dataset.

Referring to the imputation, Heshmati wrote in a subsequent email:

    Of course, the procedure must be acknowledged and explained. I
    have missed to explain the imputation procedure in the data
    section unintentionally in the writing stage of the paper. I am
    fully responsible for imputations and missing to acknowledge it.

He added that when he was approached by the PhD student: 

    I offered him a zoom meeting to explain to him the procedure and
    even gave him the data. If I had other intensions [sic] and did
    not believe in my imputation approach, I would not share the data
    with him. If I had to start over again, I would have managed the
    data in the same way as the alternative would mean dropping
    several countries and years.

Gary Smith, a professor of economics at Pomona College in Claremont,
California, said the copying of data between countries was "beyond
concerning." He reviewed Heshmati's spreadsheet for Retraction Watch
and found five cases where more than two dozen data points had been
copied from one country to another. 
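
The article does not say how Smith went about his review. As an illustration only, one simple way a reader could screen a spreadsheet for copied blocks is to look for long runs of identical values at the same positions in two countries' series. The function names, threshold, and toy numbers below are hypothetical.

```python
def longest_shared_run(a, b):
    """Length of the longest run of identical, non-missing values found
    at the same positions in two equally long series."""
    best = current = 0
    for x, y in zip(a, b):
        if x is not None and x == y:
            current += 1
            best = max(best, current)
        else:
            current = 0
    return best


def flag_copies(series_by_country, min_run):
    """Return (country_a, country_b, run_length) for every pair of
    countries sharing a run of at least `min_run` identical values."""
    names = sorted(series_by_country)
    flags = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            run = longest_shared_run(series_by_country[a], series_by_country[b])
            if run >= min_run:
                flags.append((a, b, run))
    return flags


# Hypothetical toy series for a single variable.
data = {
    "NLD": [3.1, 3.4, 3.9, 4.2, 4.8],
    "NZL": [3.1, 3.4, 3.9, 4.2, 4.8],   # identical to NLD
    "SWE": [2.0, 2.2, 3.9, 2.9, 3.3],   # one coincidental match
}
flags = flag_copies(data, min_run=4)
print(flags)
```

A single matching value between two countries can be coincidence. A long unbroken run of matches in measured economic data is much harder to explain that way, which is why the threshold matters.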

Marco Hafner, a senior economist at the RAND Corporation, a nonprofit
think tank, said "using the autofill function may not be the best of
ideas in the first place as I can imagine it is not directly evident
to what conditions missing values have been determined/imputed."

Hafner, who is research leader at RAND Europe, added that "under
reasonable assumptions and if it's really necessary for analytical
reasons, one could fill in data gaps for one country with data from
another country." But, he said, the impact of those assumptions would
need to be reported in a sensitivity analysis - something Heshmati
said he had not done. 

"At the bare minimum," Hafner said, the paper should have stated the
assumptions underlying the imputation and how it was done - something
that, he added, would have reduced the chances of the work getting
published should the reviewers find the methods inappropriate.

Posted on February 5, 2024. Author: Frederik Joelving
Categories: economics

28 thoughts on "No data? No problem! Undisclosed tinkering in Excel
behind economics paper"

 1. Concerned says:
    February 5, 2024 at 12:17 pm

    Imputation has been around for many decades. It permits the
    researcher to study, say, five predictor variables when one of
    the predictor variables shows incomplete sampling. As long as its
    use is described in methods, it is a reasonable procedure.

     1. stewart says:
        February 5, 2024 at 4:39 pm

        As long as. Pulling in numbers from adjacent cells (countries
        next or following in the alphabet) without acknowledgement is
        not reasonable in any meaning.

         1. Concerned says:
            February 6, 2024 at 6:05 am

            Correct.
            If you take two census bureau data sets, for example, one
            will likely not have all the counties of another. R's
            MICE program imputes the missing data cells.
            This is a failure to use competent statistical
            assistance, not plagiarism or outright fraud. Even
            competent statisticians will make errors. Statisticians
            can disagree about which statistic to calculate.

     2. Fred PhD says:
        February 6, 2024 at 6:41 pm

        Some imputation is reasonable and defensible and some is not.
        Depends how you did it. This professor needs to read a book
        on imputation since the methods described are terrible on
        their face. The lack of transparency in the article makes it
        much worse.

 2. humbry says:
    February 6, 2024 at 6:39 am

    "If I had other intensions [sic] and did not believe in my
    imputation approach, I would not share the data with him." The
    student was lucky he got any data whatsoever. Despite data
    sharing statements in publications most authors do not share raw
    data, nor are there any practical mechanisms for compelling them
    to do so when, for example, wishing to collect raw data to
    perform a meta-analysis.

 3. Anurag N Banerjee says:
    February 6, 2024 at 6:58 am

    Why not use EM algorithms!

 4. Albert Schram says:
    February 6, 2024 at 9:21 am

    These procedures seem highly questionable, especially copying
    data for one country to another one without disclosure. As an
    economic historian I have often worked with country data sets,
    but never even dreamed about this kind of "imputations". I am
    sure this is not the first time the professor has "massaged" the
    data beyond the breaking point. Full check on all his work,
    including plagiarism, please.

 5. Dropna Everheardofit says:
    February 7, 2024 at 4:32 am

    I think it's embarrassing this article passed peer review. This,
    to me, is evidence of why we need methodological pre-registration
    and peer review *before* results are sought. It would at the very
    least invalidate this shoddy analysis and/or might have forced
    the author to rethink their approach.
    It's true that once you start inventing data you did not measure,
    you enter a certain realm of incredulity. If you explain in your
    paper that you chained a monkey to a typewriter and did analysis
    on the resulting gibberish, then your analysis could be
    technically correct but it would still be utterly useless (GIGO).
    If you're going to make up observations, at least attempt to make
    them defensible, or at least grounded in a sensible model. That
    way your results are only a partial departure from reality. 

 6. M Warshaw says:
    February 7, 2024 at 8:06 am

    As a statistician, I find this appalling. Yes, there are times
    where imputation is useful. However, what this professor did
    bears no resemblance to any reputable imputation method and it's
    *never* acceptable to do imputation without describing what
    you've done in the methods section of the paper. There's no way a
    valid, reliable analysis could have been done on a dataset with
    that much fake data.
    Kudos to the student for being so attentive, proactive and
    investigating what had been done to the dataset. It's very
    intimidating for someone junior to question the work of senior
    established researchers, but it's important for such sloppy work
    to be retracted/corrected.

 7. stewart says:
    February 8, 2024 at 5:30 am

    Thank you for removing nonsensical content from this discussion.
    Yes, the lack of transparency and lack of algorithm makes this
    paper nonsense. The student did what is right and important.
    Trash needs to be removed more often.

 8. Hedvig says:
    February 10, 2024 at 7:19 am

    what percentage of the dataset was imputed?

     1. glc says:
        February 10, 2024 at 10:55 am

        Roughly 133%. (This figure is also imputed. Details on
        request.)

 9. Falafel Pizza says:
    February 10, 2024 at 4:31 pm

    Questions from a layperson. If the method is bad, then why is
    documenting it with "I used this method" any better than the
    alternative? Both situations use the bad method, so the only way
    to improve is to use a better method, correct? Wouldn't it have
    been better for the author to use modern GPT for this procedure
    instead of Excel?

     1. Jiri Baum says:
        February 10, 2024 at 9:59 pm

        If it's documented, everyone can judge the results based on
        the method being bad... starting with the reviewers, who might
        well have rejected the paper.

     2. Robin Adams says:
        February 11, 2024 at 12:52 am

        Documentation lets the peer reviewers and readers decide if
        the method is good or bad. We need to know where the numbers
        came from to decide how much trust to place in the paper's
        conclusions.
        If he'd documented it then this would have been just a bad
        piece of research (and hopefully not pass peer review) but
        not academic dishonesty. By generating the numbers but
        claiming they are measured values, he crosses the line to
        dishonesty.
        Using a GPT would be worse. At least with Excel the numbers
        have some relation to the measured values. A GPT would just
        make up data that looks plausible.

     3. Br Drnda says:
        February 11, 2024 at 1:22 am

        I guess using GPT is the only thing worse than using Excel
        without mentioning. There's no chance to know how GPT has
        come to its values.

     4. FFT says:
        February 11, 2024 at 4:20 am

        Disclosing it is better than the alternative because it can
        then at least be questioned. Reviewers can ask for a
        justification or perhaps recommend rejection, readers can
        take the results with a pinch of salt or possibly make their
        own conclusion that "come on, this is bullshit", etc, while
        without disclosure the assumption is that it's not been done
        and the dataset is clean.

     5. Michael E says:
        February 11, 2024 at 4:21 am

        Using GPT would be far worse. With the Excel method, the
        researcher mostly knows (or can know) how the values were
        imputed, and can describe the procedure; there is no way to
        know with GPT, and it would change from version to version as
        GPT is updated (or even due to randomness in response to
        prompts).
