Deepfake Medical Images: Why Even Radiologists and AI Models Can’t Reliably Detect Fake X-Rays

Science & Technology (Commonwealth Union) – Both radiologists and advanced multimodal large language models (LLMs) struggle to reliably distinguish authentic X-ray images from AI-generated "deepfake" versions, according to a study published in Radiology. The research highlights the growing risks posed by synthetic medical images. It also underscores the need for better detection tools and training, both to safeguard the reliability of medical imaging and to help healthcare professionals identify manipulated scans.

A “deepfake” refers to a video, photograph, image, or audio recording that looks genuine but has been created or altered using artificial intelligence.

The study's lead author, Mickael Tordjman, MD, a postdoctoral fellow at the Icahn School of Medicine at Mount Sinai in New York, said the findings show that these deepfake X-rays are convincing enough to mislead radiologists, the specialists most highly trained to interpret medical images, even when they know AI-generated images may be included.

He noted that this raises serious concerns about fraudulent legal claims, since a fabricated fracture could be indistinguishable from a real injury. Tordjman also flagged a major cybersecurity risk: if hackers infiltrated hospital networks and inserted synthetic images, they could influence diagnoses or disrupt clinical care by eroding trust in digital medical records.

Seventeen radiologists from 12 medical centres across six countries—the United States, France, Germany, Turkey, the United Kingdom and the United Arab Emirates—took part in the retrospective study.

Their professional experience ranged from newly qualified to 40 years in practice. The study analysed 264 X-ray images, with half being genuine scans and the other half produced using artificial intelligence. The radiologists reviewed two separate image datasets, with no overlap between them.

The first dataset contained both real images and AI-generated images created using ChatGPT, covering several anatomical regions. The second dataset focused on chest X-rays, again split evenly between authentic scans and images generated by RoentGen, an open-source generative diffusion AI model developed by researchers at Stanford Medicine.

When the radiologists were not told the study's true aim and were simply asked, after rating the technical quality of each image in the ChatGPT dataset, whether they had noticed anything unusual, only 41% independently recognised that some images were AI-generated. Once they were informed that the dataset included synthetic images, however, their average accuracy in distinguishing real X-rays from AI-generated ones rose to 75%.

Radiologists' ability to correctly identify images generated by ChatGPT varied widely, with individual accuracy ranging from 58% to 92%. The four multimodal LLMs tested (GPT-4o and GPT-5 from OpenAI, Gemini 2.5 Pro from Google, and Llama 4 Maverick from Meta) performed in a similar range, with detection accuracy between 57% and 85%. Even GPT-4o, the same model used to produce the deepfake images, failed to recognise all of them, although it outperformed the Google and Meta models by a clear margin.

When identifying synthetic chest X-rays produced by the RoentGen system, radiologists achieved accuracy levels between 62% and 78%, while the LLMs recorded results ranging from 52% to 89%.

The study also found no relationship between a radiologist’s years of professional experience and their success in spotting AI-generated X-ray images. However, specialists in musculoskeletal radiology were significantly more accurate than radiologists working in other subspecialties.

“Deepfake medical images often look too perfect,” explained Dr. Tordjman. “Bones are overly smooth, spines unnaturally straight, lungs overly symmetrical, blood vessel patterns excessively uniform, and fractures appear unusually clean and consistent, often limited to one side of the bone.”

As the technology continues to advance, its benefits are increasingly accompanied by new challenges.
