Why can't AI serve as qualified doctors despite high test scores?

Source: Xinhua

Editor: huaxia

2026-04-23 01:41:15

A man takes photos of his hand and a robotic hand at the Hannover Messe 2026 in Hannover, Germany, April 21, 2026. Hannover Messe 2026, Germany's flagship industrial trade fair, opened on Monday with industrial artificial intelligence (AI) and humanoid robots taking center stage for the first time. (Xinhua/Zhang Haofu)

BERLIN, April 22 (Xinhua) -- Is a headache a warning sign of stroke? Does a cough require an X-ray? What do abnormal test results really mean?

With just a few taps to describe symptoms and upload medical reports, people can receive a polished, seemingly professional assessment from artificial intelligence (AI) in seconds. A growing number of people now turn to AI for medical advice before seeing a doctor.

But does that mean AI can truly diagnose and treat patients?


STRAIGHT-A STUDENT IN STANDARDIZED TESTS

A study published in early April by researchers at Germany's Marburg University and University Hospital Giessen and Marburg found that in a standardized knowledge test on acute kidney injury (AKI), several large language models (LLMs) outperformed the medical professionals who took part in the assessment.

Researchers compared 13 publicly available LLMs with 123 volunteer participants, including medical students and internal medicine physicians, recruited at the 131st Annual Congress of the German Society of Internal Medicine. Both groups completed the same AKI knowledge assessment, which consisted of two case vignettes and 15 single-best-answer multiple-choice questions.

A man looks at a humanoid robot at the exhibition area of Dassault Systemes at the Hannover Messe 2026 in Hannover, Germany, April 21, 2026. (Xinhua/Zhang Haofu)

The LLMs achieved a mean score of 13.5 out of 15, or 90 percent, with several models reaching a perfect score, while the human participants averaged 7.3 out of 15, or 48.7 percent. The models also completed the test far more quickly.

"These findings show that LLMs can provide factual medical knowledge very quickly. That creates opportunities for everyday clinical practice," said Philipp Russ, the study's corresponding author.


WEAK LINK IN CLINICAL REASONING

High scores on standardized tests, however, do not necessarily mean AI has the judgment required for real-world clinical care.

A study published in JAMA Network Open on April 13 found that LLMs still fall short in clinical reasoning, especially in the early stages of a case, when limited information often prevents them from generating an appropriate differential diagnosis.

To better reflect how diagnosis unfolds in practice, researchers at Mass General Brigham and other institutions evaluated 21 frontier LLMs using 29 standardized clinical vignettes. The models were fed information step by step, beginning with basic details such as a patient's age, gender and symptoms, followed by physical examination findings and laboratory results. Their performance at each stage was assessed by medical student evaluators.

The results showed that all of the models failed to produce an appropriate differential diagnosis more than 80 percent of the time. In other words, they often could not reliably determine the most likely cause, rule out serious disease or offer sound guidance on what should be investigated next.

A man looks at a humanoid robot at the exhibition area of Agile Robots at Hannover Messe 2026 in Hannover, Germany, April 20, 2026. (Xinhua/Zhang Haofu)

"Differential diagnoses are central to clinical reasoning and underlie the 'art of medicine' that AI cannot currently replicate," said corresponding author Marc Succi, adding that the promise of AI in clinical medicine continues to lie in its potential to augment, not replace, physician reasoning.


DOCTOR-LED COLLABORATION

If AI is not ready to practice medicine on its own, what role should it play in healthcare?

Jens Kleesiek, director of the Institute for Artificial Intelligence in Medicine at Essen University Hospital and the University of Duisburg-Essen, said that AI is steadily improving collaboration between doctors and computers.

"We are at a point where digital systems no longer just provide support, but actively intervene in processes. For example, by taking over documentation or coordinating procedures," Kleesiek said at the opening of the 2026 Annual Congress of the German Society of Internal Medicine on April 18. "This will fundamentally change medical care."

Even so, the doctor's primary responsibility remains unchanged. Kleesiek emphasized that the human factor is still crucial and that AI must be deployed under the guidance of physicians with the expertise to understand the technology and use it properly.

A man shakes hands with a humanoid robot from PaXini at the Hannover Messe 2026 in Hannover, Germany, April 21, 2026. (Xinhua/Zhang Haofu)

Succi made a similar point, saying that "LLMs in healthcare continue to require a 'human in the loop' and very close oversight."

As AI is pushed further into clinical practice, the risks that come with it also require close attention. Fares Alahdab, an associate professor at the University of Missouri School of Medicine, warned that experienced clinicians are often better able to spot flawed AI-generated suggestions, while medical students may lack the judgment needed to detect subtle but potentially dangerous errors.

"A more insidious risk is the outsourcing of reasoning, a process that tends to occur gradually and almost imperceptibly," he said, adding that AI models produce fluent, polished responses that can lead users to abandon independent information-seeking, critical appraisal and knowledge synthesis. Over time, this may erode skills that should be continuously reinforced.