https://retractionwatch.com/2024/02/05/no-data-no-problem-undisclosed-tinkering-in-excel-behind-economics-paper/

Retraction Watch
Tracking retractions as a window into the scientific process

No data? No problem! Undisclosed tinkering in Excel behind economics paper

Last year, a new study on green innovations and patents in 27 countries left one reader slack-jawed. The findings were no surprise. What was baffling was how the authors, two professors of economics in Europe, had pulled off the research in the first place.

The reader, a PhD student in economics, was working with the same data described in the paper.
He knew they were riddled with holes - sometimes big ones: For several countries, observations for some of the variables the study tracked were completely absent. The authors made no mention of how they dealt with this problem. On the contrary, they wrote they had "balanced panel data," which in economic parlance means a dataset with no gaps.

"I was dumbstruck for a week," said the student, who requested anonymity for fear of harming his career. (His identity is known to Retraction Watch.)

The student wrote a polite email to the paper's first author, Almas Heshmati, a professor of economics at Jönköping University in Sweden, asking how he dealt with the missing data. In email correspondence seen by Retraction Watch and a follow-up Zoom call, Heshmati told the student he had used Excel's autofill function to mend the data. He had marked anywhere from two to four observations before or after the missing values and dragged the selected cells down or up, depending on the case. The program then filled in the blanks. If the new numbers turned negative, Heshmati replaced them with the last positive value Excel had spit out.

The student was shocked. Replacing missing observations with substitute values - an operation known in statistics as imputation - is a common but controversial technique in economics that allows certain types of analyses to be carried out on incomplete data. Researchers have established methods for the practice; each comes with its own drawbacks that affect how the results are interpreted. As far as the student knew, Excel's autofill function was not among these methods, especially not when applied in a haphazard way without clear justification.

But it got worse. Heshmati's data, which the student convinced him to share, showed that in several instances where there were no observations to use for the autofill operation, the professor had taken the values from an adjacent country in the spreadsheet.
New Zealand's data had been copied from the Netherlands, for example, and the United States' data from the United Kingdom.

This way, Heshmati had filled in thousands of empty cells in the dataset - well over one in 10 - including missing values for the study's outcome variables. A table listing descriptive statistics for the study's 25 variables referred to "783 observations" of each variable, but did not mention that many of these "observations" were in fact imputations.

"This fellow, he imputed everything," the student said. "He is a professor, he should know that if you do so much imputation then your data will be entirely fabricated."

Other experts echoed the student's concerns when told of the Excel operations underlying the paper.

"That sounds rather horrendous," said Andrew Harvey, a professor of econometrics at the University of Cambridge, in England. "If you fill in lots of data points in this way it will invalidate a lot of the statistics and associated tests. There are ways of dealing with these problems correctly but they do require some effort.

"Interpolating data is bad practice but lots of people do it and it's not dishonest so long as it's mentioned," Harvey added. "The other point about copying data from one country to another sounds much worse."

Søren Johansen, an econometrician and professor emeritus at the University of Copenhagen, in Denmark, characterized what Heshmati did as "cheating."

"The reason it's cheating isn't that he's done it, but that he hasn't written it down," Johansen said. "It's pretty egregious."

The paper, "Green innovations and patents in OECD countries," was published in the Journal of Cleaner Production, a highly ranked title from Elsevier. It has been cited just once, according to Clarivate's Web of Science.

Neither the publisher nor the journal's editors, whom the student said he alerted to his concerns, have responded to our requests for comment.

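For readers who want to see what a fill of this kind does to a series, the procedure described above can be sketched in a few lines of Python. This is an illustration only, not Heshmati's spreadsheet. It assumes that dragging a selection of numeric cells extends them along a least-squares straight line, and the function names `linear_fill` and `replace_negatives`, the sample numbers, and the fallback to the last real observation when the very first filled value is negative are all hypothetical.

```python
from typing import List


def linear_fill(seed: List[float], n_missing: int) -> List[float]:
    """Extend `seed` by `n_missing` points along its least-squares line.

    Assumption: this mimics a drag-to-fill over several numeric cells.
    """
    k = len(seed)
    if k < 2:
        raise ValueError("need at least two seed observations")
    xs = range(k)
    x_mean = sum(xs) / k
    y_mean = sum(seed) / k
    slope = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, seed)) / sum(
        (x - x_mean) ** 2 for x in xs
    )
    intercept = y_mean - slope * x_mean
    return [intercept + slope * x for x in range(k, k + n_missing)]


def replace_negatives(seed: List[float], filled: List[float]) -> List[float]:
    """Swap each negative filled value for the last non-negative value seen.

    Assumption: if the first filled value is already negative, fall back
    to the last real observation in `seed`.
    """
    last_ok = seed[-1]
    out = []
    for value in filled:
        if value < 0:
            out.append(last_ok)
        else:
            out.append(value)
            last_ok = value
    return out


if __name__ == "__main__":
    observed = [10.0, 7.0, 4.0]          # made-up observations
    filled = linear_fill(observed, 3)     # [1.0, -2.0, -5.0]
    print(replace_negatives(observed, filled))  # [1.0, 1.0, 1.0]
```

In this made-up series, three real observations turn into six "observations," half of them invented, and the invented values depend entirely on which cells happened to be selected before dragging.
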
Heshmati's coauthor, Mike Tsionas, a professor of economics at Lancaster University in the UK, died recently. In a eulogy posted on LinkedIn in January, the International Finance and Banking Society hailed Tsionas as "a true luminary in the field of econometrics."

In a series of emails to Retraction Watch, Heshmati, who, according to the paper, was responsible for data curation, first said Tsionas had been aware of how Heshmati dealt with the missing data.

"If we do not use imputation, such data is almost useless," Heshmati said. He added that the description of the data in the paper as "balanced" referred to "the final data" - that is, the mended dataset.

Referring to the imputation, Heshmati wrote in a subsequent email:

    Of course, the procedure must be acknowledged and explained. I have missed to explain the imputation procedure in the data section unintentionally in the writing stage of the paper. I am fully responsible for imputations and missing to acknowledge it.

He added that when he was approached by the PhD student:

    I offered him a zoom meeting to explain to him the procedure and even gave him the data. If I had other intensions [sic] and did not believe in my imputation approach, I would not share the data with him. If I had to start over again, I would have managed the data in the same way as the alternative would mean dropping several countries and years.

Gary Smith, a professor of economics at Pomona College in Claremont, California, said the copying of data between countries was "beyond concerning." He reviewed Heshmati's spreadsheet for Retraction Watch and found five cases where more than two dozen data points had been copied from one country to another.

Marco Hafner, a senior economist at the RAND Corporation, a nonprofit think tank, said "using the autofill function may not be the best of ideas in the first place as I can imagine it is not directly evident to what conditions missing values have been determined/imputed."

Hafner, who is research leader at RAND Europe, added that "under reasonable assumptions and if it's really necessary for analytical reasons, one could fill in data gaps for one country with data from another country." But, he said, the impact of those assumptions would need to be reported in a sensitivity analysis - something Heshmati said he had not done.

"At the bare minimum," Hafner said, the paper should have stated the assumptions underlying the imputation and how it was done - something that, he added, would have reduced the chances of the work getting published should the reviewers find the methods inappropriate.

Posted on February 5, 2024. Author: Frederik Joelving. Categories: economics.

28 thoughts on "No data? No problem! Undisclosed tinkering in Excel behind economics paper"

1. Concerned says: February 5, 2024 at 12:17 pm
   Imputation has been around for many decades. It permits the researcher to study, say, five predictor variables when one of the predictor variables shows incomplete sampling. As long as its use is described in methods, it is a reasonable procedure.

   1. stewart says: February 5, 2024 at 4:39 pm
      As long as. Pulling in numbers from adjacent cells (countries next or following in the alphabet) without acknowledgement is not reasonable in any meaning.

      1. Concerned says: February 6, 2024 at 6:05 am
         Correct. If you take two census bureau data sets, for example, one will likely not have all the counties of another. R's MICE program imputes the missing data cells.
         This is a failure to use competent statistical assistance, not plagiarism or outright fraud. Even competent statisticians will make errors. Statisticians can disagree about which statistic to calculate.

   2. Fred PhD says: February 6, 2024 at 6:41 pm
      Some imputation is reasonable and defensible and some is not. Depends how you did it. This professor needs to read a book on imputation since the methods described are terrible on their face. The lack of transparency in the article makes it much worse.

2. humbry says: February 6, 2024 at 6:39 am
   "If I had other intensions [sic] and did not believe in my imputation approach, I would not share the data with him."
   The student was lucky he got any data whatsoever. Despite data sharing statements in publications most authors do not share raw data, nor are there any practical mechanisms for compelling them to do so when, for example, wishing to collect raw data to perform a meta-analysis.

3. Anurag N Banerjee says: February 6, 2024 at 6:58 am
   Why not use EM algorithms!

4. Albert Schram says: February 6, 2024 at 9:21 am
   These procedures seem highly questionable, especially copying data for one country to another one without disclosure. As an economic historian I have often worked with country data sets, but never even dreamed about this kind of "imputations". I am sure this is not the first time the professor has "massaged" the data beyond the breaking point. Full check on all his work, including plagiarism, please.

5. Dropna Everheardofit says: February 7, 2024 at 4:32 am
   I think it's embarrassing this article passed peer review. This, to me, is evidence of why we need methodological pre-registration and peer review *before* results are sought. It would at the very least invalidate this shoddy analysis and/or might have forced the author to rethink their approach.
   It's true that once you start inventing data you did not measure, you enter a certain realm of incredulity. If you explain in your paper that you chained a monkey to a typewriter and did analysis on the resulting gibberish, then your analysis could be technically correct but it would still be utterly useless (GIGO). If you're going to make up observations, at least attempt to make them defensible, or at least grounded in a sensible model. That way your results are only a partial departure from reality.

6. M Warshaw says: February 7, 2024 at 8:06 am
   As a statistician, I find this appalling. Yes, there are times where imputation is useful. However, what this professor did bears no resemblance to any reputable imputation method and it's *never* acceptable to do imputation without describing what you've done in the methods section of the paper. There's no way a valid, reliable analysis could have been done on a dataset with that much fake data. Kudos to the student for being so attentive, proactive and investigating what had been done to the dataset. It's very intimidating for someone junior to question the work of senior established researchers, but it's important for such sloppy work to be retracted/corrected.

7. stewart says: February 8, 2024 at 5:30 am
   Thank you for removing nonsensical content from this discussion. Yes, the lack of transparency and lack of algorithm makes this paper nonsense. The student did what is right and important. Trash needs to be removed more often.

8. Hedvig says: February 10, 2024 at 7:19 am
   what percentage of the dataset was imputed?

   1. glc says: February 10, 2024 at 10:55 am
      Roughly 133%. (This figure is also imputed. Details on request.)

9. Falafel Pizza says: February 10, 2024 at 4:31 pm
   Questions from a layperson. If the method is bad, then why is documenting it with "I used this method" any better than the alternative?
   Both situations use the bad method, so the only way to improve is to use a better method, correct? Wouldn't it have been better for the author to use modern GPT for this procedure instead of Excel?

   1. Jiri Baum says: February 10, 2024 at 9:59 pm
      If it's documented, everyone can judge the results based on the method being bad... starting with the reviewers, who might well have rejected the paper.

   2. Robin Adams says: February 11, 2024 at 12:52 am
      Documentation lets the peer reviewers and readers decide if the method is good or bad. We need to know where the numbers came from to decide how much trust to place in the paper's conclusions. If he'd documented it then this would have been just a bad piece of research (and hopefully not pass peer review) but not academic dishonesty. By generating the numbers but claiming they are measured values, he crosses the line to dishonesty.
      Using a GPT would be worse. At least with Excel the numbers have some relation to the measured values. A GPT would just make up data that looks plausible.

   3. Br Drnda says: February 11, 2024 at 1:22 am
      I guess using GPT is the only thing worse than using Excel without mentioning. There's no chance to know how GPT has come to its values.

   4. FFT says: February 11, 2024 at 4:20 am
      Disclosing it is better than the alternative because it can then at least be questioned. Reviewers can ask for a justification or perhaps recommend rejection, readers can take the results with a pinch of salt or possibly make their own conclusion that "come on, this is bullshit", etc, while without disclosure the assumption is that it's not been done and the dataset is clean.

   5. Michael E says: February 11, 2024 at 4:21 am
      Using GPT would be far worse.
      With the Excel method, the researcher mostly knows (or can know) how the values were imputed, and can describe the procedure; there is no way to know with GPT, and it would change from version to version as GPT is updated (or even due to randomness in response to prompts).