Here’s Why Dozens of Autism Publications Were Retracted



Springer Nature has retracted more than three dozen publications that relied on a problematic dataset, the publisher confirmed to MedPage Today.

All 38 of the papers, conference proceedings, and book chapters involved a dataset that purported to offer images of the faces of children with and without autism. However, there were major problems with how it was put together.

“From what we’ve managed to piece together, the original dataset was collected by a retired computer scientist who scraped images of children on websites related to autism, alongside a control dataset of images of children that had been scraped from across the internet,” Tim Kersjes, head of research integrity at Springer Nature, told MedPage Today.

The news was first reported in The Transmitter.

In all, the dataset contained more than 2,900 photos of kids’ faces that were labeled as autistic or not autistic. It was initially available on Kaggle — a data science and machine-learning platform owned by Google — and later on Google Drive.

Last fall, Springer Nature flagged a paper for investigation. At the same time, an independent sleuth brought another paper to the publisher’s attention because it contained phrasing that suggested it had been written by artificial intelligence (AI). Both papers had used the same questionable autism dataset.

Kersjes said the dataset raised obvious ethical issues. First, there was no proof that the kids in the photos were or were not autistic. Second, there was no way of knowing whether the kids’ guardians had consented to the photos being used this way.

“This significant methodological issue undermined the results and conclusions of the publications,” Kersjes said.

On top of that, The Transmitter reported, the photos all had different lighting and angles, which would have made identifying any possible differences in facial features more difficult.

Springer Nature then searched for other publications that used the dataset, identifying a total of 38 papers, conference proceedings, and book chapters for retraction.

Nearly all of the retractions are complete at this point, and the few remaining are expected to be finished shortly, as retracting papers from conference proceedings takes a bit more time, a spokesperson for Springer Nature told MedPage Today.

The publisher also took the unusual step of not just retracting the flagged research, but removing it entirely due to the sensitive nature of the dataset, the spokesperson said. And it contacted other publishers to alert them about the problematic dataset as well.

The Institute of Electrical and Electronics Engineers placed expressions of concern on 25 articles that used the Kaggle dataset, noting ethical issues and “potentially questionable data.” Other publishers, including Elsevier and PLOS, retracted articles that used the dataset. Three articles published by Wiley used the Kaggle dataset and two of those have been retracted.

However, dozens of papers from other publications remain available, with no retraction notice or expression of concern.

Kersjes said he hopes the retractions and removals will deter further use of the dataset. While the original dataset was removed from Kaggle by its author in 2022, the files were later uploaded to Google Drive, according to The Transmitter. Springer Nature also found two datasets posted by other Kaggle users that appear to be replications of the original.

The Springer Nature spokesperson said the company has its own quality filters for concerns like plagiarism, conflicts of interest, and missing clinical trial numbers, but this sham dataset is not the type of issue those filters can catch. Plus, Kaggle as a platform has plenty of reliable datasets.

“This concern is one that would be picked up during the editorial evaluation process by editors and peer reviewers,” Kersjes told MedPage Today.

Since many of the papers were published in computer science journals or through computer science conferences, Kersjes said there may be a different level of awareness around ethical and privacy concerns than there would be for psychiatry or pediatrics research.

“Careful consideration should always be given to the source and legitimacy of any data and whether consent has been appropriately obtained,” he cautioned.

Source link : https://www.medpagetoday.com/special-reports/features/120378

Publish date : 2026-03-19 12:42:00

Copyright for syndicated content belongs to the linked Source.