ChatGPT Paves Way for AI in Medical Diagnosis
Written by Shaveta Arora, Arushi Sharma
Tokyo Medical and Dental University's research shows that AI chatbot ChatGPT's diagnostic accuracy for orthopedic conditions varies widely, with implications for patient self-diagnosis.
While AI holds promise, its benefits to patients may be undermined if its diagnoses and medical recommendations are imprecise. A collaborative research effort by teams from Japan and the United States has found that both ChatGPT's diagnostic accuracy and the quality of its medical consultation recommendations need improvement.
ChatGPT's Diagnostic Confusion
A study by Tokyo Medical and Dental University (TMDU) assessed the accuracy of ChatGPT's responses regarding five prevalent orthopedic conditions, including carpal tunnel syndrome, cervical myelopathy, and hip osteoarthritis. The study was motivated by the high frequency of orthopedic complaints in clinical practice, which account for up to 26% of cases.
Over a 5-day study period, each participating researcher posed the same set of questions to ChatGPT. The team then analyzed the consistency of ChatGPT's responses across different days and among researchers, and assessed the strength of its recommendations on whether patients should seek medical attention.
"We found that the accuracy and reproducibility of ChatGPT's diagnosis are not consistent over the five conditions. ChatGPT's diagnosis was 100% accurate for carpal tunnel syndrome, but only 4% for cervical myelopathy," says lead author Tomoyuki Kuroiwa.
Furthermore, the study observed varying levels of reproducibility between days and researchers when assessing the five orthopedic conditions, ranging from "poor" to "almost perfect" consistency, despite researchers inputting identical questions on each occasion.
ChatGPT's recommendations regarding medical consultation also displayed inconsistency. While nearly 80% of its responses suggested seeking medical advice, only 12.8% met the study's criteria for a strong recommendation.
"Without direct language, the patient may be left confused after self-diagnosis, or worse, experience harm from a misdiagnosis," says Kuroiwa.
How Does ChatGPT Affect Patients' Self-Diagnosis?
This study is the first to assess both the reproducibility of ChatGPT's self-diagnoses and the strength of its medical consultation recommendations.
"In its current form, ChatGPT is inconsistent in both accuracy and precision to help patients diagnose their disease," explains senior author Koji Fujita.
"Given the risk of error and potential harm from misdiagnosis, it is important for any diagnostic tool to include clear language alerting patients to seek expert medical opinions for confirmation of a disease."
The researchers also acknowledge several limitations: the questions were generated by the research team rather than derived from patients, only five orthopedic conditions were examined, and only ChatGPT was tested.
Though using AI for self-diagnosis remains premature, there is potential for improvement by training ChatGPT on specific diseases of interest. Future research can further illuminate the evolving role of AI as a diagnostic tool.