Deepfake X-Rays Sneak Past Radiologists and AI, Underscoring Abuse Potential

March 24, 2026



  • A majority of radiologists, as well as four artificial intelligence (AI) programs, did not consistently distinguish between authentic x-rays and AI-generated deepfakes.
  • Deepfake medical images pose a threat to research integrity and patient care.
  • Better training in recognition of deepfake images is needed for clinicians and AI programs.

A majority of radiologists could not distinguish artificial x-rays — deepfakes — from real ones when they evaluated a mix of real and fake images, according to a study published today.

Initially, only seven of 17 radiologists recognized deepfakes generated by an artificial intelligence (AI) program. Given two more chances, this time knowing that fakes were in the mix, the radiologists still misclassified images 25% to 30% of the time. Performance did not improve with greater radiology experience. Offering further proof of the technology's sophistication, four different AI programs, including the one used to create the fakes, failed to identify all of the deepfake images.

The findings underscore the potential threat to medical research and healthcare and the need for better training in deepfake recognition for radiologists and AI systems, reported Mickael Tordjman, MD, of the Icahn School of Medicine at Mount Sinai in New York City, and colleagues in Radiology.

“The moderate performance of radiologists and current multimodal LLMs [large language models] … in identifying synthetic radiographs, combined with the broad public availability of these tools, highlights the potential for malicious exploitation,” the authors concluded. “Future studies should evaluate a broader range of models and use a more representative sample with a lower prevalence of AI images, which would probably lower the sensitivity of the readers.”

“A multilayered response, including clinician education, automated deepfake detection systems, mandatory watermarking, and rigorous dataset governance, is essential to prevent this emerging novelty from evolving into a systemic threat,” they added.

The study reflects the “democratization of the deepfake,” according to the authors of an accompanying editorial. LLMs have made the capability to generate realistic radiographs available to anyone with an internet connection.

“This fundamentally alters the threat landscape,” asserted Rajesh Bhayana, MD, and Satheesh Krishna, MD, both of the University of Toronto. “It mirrors a pattern observed outside of medicine: Although image manipulation has long been technically feasible, widespread access to generative models has fueled the disturbing proliferation of nonconsensual deepfake imagery on social media. This issue, therefore, is not one of novelty but of scale; when fabrication becomes easy and accessible, risks shift from theoretical to systemic.”

Seconding the authors’ call for better education and more security checks, Bhayana and Krishna added, “AI has lowered the cost of fabricating medical truth to nearly nothing. During the past decade, we have focused on the potential of AI to help us see better. We must now grapple with its potential to make us see things that simply do not exist.”

Not a New Risk

That potential came to light early in the evolution of AI technology and has concerned radiologist Elliot Fishman, MD, of Johns Hopkins Medicine in Baltimore. Six years ago he co-authored an expression of concern about the potential dangers of AI in radiology. The authors described an existing “generative adversarial network” (GAN) that simultaneously trained a computerized system to generate images and discriminate between real and artificial images.
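In code, the adversarial setup the authors describe is a two-player training loop. The sketch below is a deliberately tiny PyTorch illustration of that idea, with toy image sizes and random data standing in for radiographs; it is not the network from the cited work.

```python
# Minimal GAN sketch (illustrative only; not the network from the cited paper).
# A generator learns to produce fake "radiograph-like" images while a
# discriminator learns to tell them from real ones -- each improves the other.
import torch
import torch.nn as nn

LATENT, IMG = 64, 28 * 28  # toy sizes; real radiographs are far larger

generator = nn.Sequential(
    nn.Linear(LATENT, 256), nn.ReLU(),
    nn.Linear(256, IMG), nn.Tanh(),           # fake image in [-1, 1]
)
discriminator = nn.Sequential(
    nn.Linear(IMG, 256), nn.LeakyReLU(0.2),
    nn.Linear(256, 1),                        # real/fake logit
)

loss_fn = nn.BCEWithLogitsLoss()
g_opt = torch.optim.Adam(generator.parameters(), lr=2e-4)
d_opt = torch.optim.Adam(discriminator.parameters(), lr=2e-4)

for step in range(1000):
    real = torch.randn(32, IMG)               # stand-in for a real image batch
    fake = generator(torch.randn(32, LATENT))

    # Discriminator step: label real images 1, generated images 0.
    d_loss = loss_fn(discriminator(real), torch.ones(32, 1)) + \
             loss_fn(discriminator(fake.detach()), torch.zeros(32, 1))
    d_opt.zero_grad(); d_loss.backward(); d_opt.step()

    # Generator step: try to make the discriminator call its output real.
    g_loss = loss_fn(discriminator(fake), torch.ones(32, 1))
    g_opt.zero_grad(); g_loss.backward(); g_opt.step()
```

The key dynamic is in the two loss terms: the discriminator is rewarded for separating real from generated images, while the generator is rewarded for fooling it, so each network's improvement pressures the other.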

“GANs have the potential to improve image quality, reduce radiation dose, augment data for training algorithms, and perform automated image segmentation,” the authors noted. “However, there is also the potential for harm if these artificial images infiltrate our healthcare system by hackers with malicious intent… . These hacks can be targeted against specific patients or can be used as a more general attack on our radiologic data.”

In the realm of AI-generated images, cancers can appear and disappear, enlarge or shrink, and change locations and appearance. Fishman and colleagues referenced a previous article that described how malicious hackers might use artificial images “to stop a political candidate, sabotage research, commit insurance fraud, perform an act of terrorism, or even commit murder.”

“Conceptually, you might think, ‘Who would do something so stupid,'” Fishman told MedPage Today. “All you have to do is look at pathology, where people have made up pathology slides and things like that. People have done that for years and have gotten caught. Millions of articles about people going to court. They are looking at cells and they create extra data to make their data prove a certain point.”

Compromising research integrity to improve the chances of getting a grant or a journal acceptance is probably the most common type of abuse, he added. However, the leap to malicious acts against medical records or patient care is not so great.

“We know people do all sorts of things just because they can,” said Fishman. “They’re just bad actors.”

Study Design, Objective, Results

Tordjman and colleagues reported findings from a retrospective study of radiologists’ accuracy for identifying fake x-rays generated by the LLM ChatGPT. The study had three components. During the first part, 17 radiologists with clinical experience ranging from 0-40 years reviewed 154 x-rays, half authentic and half produced by ChatGPT, and rated image quality and determined diagnoses, without knowing the study’s objective. After being informed of the study objective, the radiologists were asked to determine whether each image was real or AI-generated. Investigators also evaluated four LLMs’ ability to distinguish the authentic x-rays from fakes. Finally, 110 new images, half real and half generated by the RoentGen program, were evaluated by the radiologists and the LLMs.

The primary outcome was per-reader accuracy during the second and third phases of the study. At the outset, 10 of the radiologists were familiar with AI-generated x-rays, but 13 of the 17 did not know that ChatGPT could generate images.

During the first exercise, 10 of the 17 radiologists accepted the deepfake images as real. When the readers repeated the exercise, they had an overall accuracy of 74.8% for distinguishing fake from real x-rays. Individual reader accuracy ranged from 58.4% to 91.6% (pooled sensitivity of 69.1% and specificity of 80.4%).
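For readers unfamiliar with how these figures relate, the short sketch below recomputes accuracy, sensitivity, and specificity from a confusion matrix. The per-cell counts are invented to land near the pooled percentages and are not taken from the paper.

```python
# How the reported reader metrics fit together (illustrative counts only;
# the numbers below are made up to roughly match the pooled percentages).
def reader_metrics(tp, fn, tn, fp):
    """tp/fn: fakes called fake / called real; tn/fp: real called real / called fake."""
    sensitivity = tp / (tp + fn)               # share of fakes correctly flagged
    specificity = tn / (tn + fp)               # share of real x-rays correctly accepted
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    return sensitivity, specificity, accuracy

# Example: 77 fake and 77 real images, as in the study's first image set.
sens, spec, acc = reader_metrics(tp=53, fn=24, tn=62, fp=15)
print(f"sensitivity {sens:.1%}, specificity {spec:.1%}, accuracy {acc:.1%}")
# -> sensitivity 68.8%, specificity 80.5%, accuracy 74.7%
```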

The investigators observed no evidence of an association between a radiologist’s years of clinical experience and accuracy. Familiarity with AI-generated images also did not influence results. The radiologists’ accuracy also did not differ significantly for chest x-rays versus other types, pathologic versus normal images, fracture versus nonfracture images, or different anatomical structures. Musculoskeletal radiologists performed better than other radiologists (83.0% vs 70.3%, P=0.04).

During the third phase of the study, radiologists had an overall accuracy of 70% for distinguishing between authentic and RoentGen-produced images. Reader- or image-specific characteristics did not significantly alter the results.

The LLMs were far from perfect. GPT-4o had 85.1% accuracy for GPT-4o images and 75.5% accuracy for the RoentGen images. GPT-5's accuracy for the two image sets was 82.5% and 89.1%, respectively. Accuracy declined substantially with Gemini 2.5 Pro (56.5% and 60.9%) and Llama 4 Maverick (59.1% and 51.8%).

The quality of AI-generated images is both a strength and a potential giveaway. Images that are consistently excellent, almost too good to be true, make Fishman suspicious.

“With a typical slide, it’s not perfectly stained. The edges look kind of funny,” he said. “When you have an AI-created image, it’s like a Picasso. Every image is perfect. You can have one good image, but when it starts looking too good … I think that’s what [developers] are doing with AI, training it to spot [that type of perfection]. AI will look at 100 cases and say ‘there’s too little variability.’ If you have 100 images of feet or ankles or CTs of the abdomen or chest, the patients are different — some are fatter, some are skinnier, so there’s more noise in the images. If the noise is the same in every image, if the edges are perfect, something is wrong.”
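Fishman's "too little variability" intuition can be made concrete: estimate each image's noise level, then ask how much that level varies across the set. The following is a hypothetical toy illustration of that heuristic, not a validated detector and not a method from the study.

```python
# Toy version of the "too little variability" tell (hypothetical illustration).
import numpy as np

def noise_spread(images: np.ndarray) -> float:
    """Estimate per-image noise as the std of a simple high-pass residual,
    then return how much that noise level varies across the batch."""
    residuals = images - np.stack([
        0.25 * (np.roll(im, 1, 0) + np.roll(im, -1, 0) +
                np.roll(im, 1, 1) + np.roll(im, -1, 1))
        for im in images
    ])
    per_image_noise = residuals.std(axis=(1, 2))
    # Coefficient of variation: small values mean suspiciously uniform noise.
    return per_image_noise.std() / per_image_noise.mean()

rng = np.random.default_rng(0)
# 100 "patients" with different noise levels vs. 100 images with identical noise.
varied = rng.normal(0, rng.uniform(0.5, 2.0, (100, 1, 1)), (100, 64, 64))
uniform = rng.normal(0, 1.0, (100, 64, 64))
print(noise_spread(varied) > noise_spread(uniform))  # True: real-world sets vary more
```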

The potential risk created by AI has become so great that Fishman wants proof before accepting an image as authentic.

“I do AI of pancreatic cancer, and we just look at proven consecutive cases,” he said. “When I look at a case as pancreatic cancer, I promise you that if I don’t have a biopsy, I am not using that case, even though I would bet my last dollar — or nearly my last dollar — that it’s pancreatic cancer. Every once in a while you get a strange diagnosis, but to make things really accurate, I always have proof.”

To support physician training in recognition of AI-generated medical images, Tordjman and colleagues have developed a free curated deepfake dataset.



Source link: https://www.medpagetoday.com/radiology/diagnosticradiology/120469

Publish date: 2026-03-24 20:09:00

Copyright for syndicated content belongs to the linked Source.
