"Nature" publishes research: AI in providing diagnostic and treatment decisions can rival or even surpass doctors

Two specialized medical AI tools outperformed human doctors in controlled, simulated tests, but researchers and independent experts caution that the results do not yet mean the tools are ready for real clinical settings.

According to a Financial Times report on Wednesday, the findings were published the same day in the academic journal Nature. Mira, developed by German researchers, outperformed doctors in analyzing a range of diseases, including pancreatic cancer and pneumonia, while Google's Amie was more precise than human doctors in devising treatment plans and test schedules. The results mark the latest step in demonstrating the clinical value of specialized medical large language models.

The results send an important market signal for medical AI: in certain scenarios, specialized medical AI tools can already provide better medical advice than general-purpose consumer AI models. However, researchers and independent experts stress that the tests were run under controlled, simulated conditions and that neither tool is ready for direct use in real clinical settings.

Mira: Diagnostic Accuracy of 87%, Surpassing Six Specialist Doctors

Mira was jointly developed by academic teams at Dresden University of Technology and Heidelberg University. It can access patient data from electronic health records and chooses from more than 85,000 options covering diagnostic tests, drug prescriptions, and surgical arrangements.

The research team tested Mira using information from over 500 emergency department clinical cases, delivered to the system in dialogue form by simulated patient AI agents. According to the Nature paper, Mira achieved an overall diagnostic accuracy of 87.1% across eight conditions such as appendicitis and pulmonary embolism, while the accuracy of a review panel of six specialists was 78.1%.

Jakob Kather, who helped develop Mira, said, "We are previewing how AI will change medicine." He likened AI agents to airplane autopilot systems, saying they can handle routine tasks and ease the burden on medical professionals, but "ultimate responsibility always rests with the doctor."

The researchers also acknowledged Mira's limitations. The paper notes that, for "a small but significant number" of patients, the tool still gives treatment advice containing "departures from best practice." Moreover, the case information supplied by the AI agents may be "more structured" than what real emergency room patients report, with fewer omissions and contradictions.

Amie: Treatment Plans Closer to Clinical Guidelines, but Potential Reasoning Errors Exist

Google's Amie is built on the company's Gemini AI model and generates responses from information provided by actors playing patients. Researchers compared Amie with 21 general practitioners across 100 multi-visit case scenarios, benchmarking both against current UK clinical practice guidelines and drug recommendations.

The results showed that Amie matched real doctors in patient management reasoning and that its treatment plans aligned more closely with clinical guidelines than those of the human doctors. It also outperformed them in complex medication reasoning.

Amie’s development team called the result a "milestone," but also noted the cases used for testing and the text-based patient scenarios do not represent real clinical environments. They stated that Amie has shown "exciting capabilities," but is "not yet ready for real-world use," with further work needed to resolve potential reasoning errors and other issues.

Independent Experts: Significant Gap Remains Between Simulated and Real Clinical Settings

Independent experts who were not involved in the studies affirmed the rigor of the research but also highlighted its limitations.

Catherine Pope, a professor of medical sociology at Oxford University, said, "There is still quite a distance from the messy, complex human world of everyday healthcare."

Julie Jacko, Chair Professor of Health Informatics and Data Science at the University of Edinburgh, pointed out that the advantages shown by the AI models mostly concern their "precision and completeness" rather than "clear differences in clinical correctness." She called the work "a strong experimental study and meaningful progress, but it demonstrates performance under structured standards rather than fully presenting the complexity of real clinical decision-making."

Wei Xing, Assistant Professor at the College of Mathematics and Physical Sciences, University of Sheffield, questioned the source of Amie’s advantage. He noted that, in one benchmark, the scores of general AI models were similar to Amie’s, "suggesting Amie’s advantage may reflect the overall rapid progress of AI models more than any special characteristic of its custom-built system."
