Abstract
This study evaluated the performance of four large language model (LLM)-based chatbots (ChatGPT-4.0, ChatGPT o1-preview, Gemini, and Meta AI) as decision-support systems for interpreting histopathologic descriptions of oral lesions, assessing agreement between their outputs and reference diagnoses. Each chatbot generated a suggested primary interpretation and three differential diagnoses. Outputs were categorized as Different, Similar, or Correct relative to the consensus reference diagnosis established by two board-certified pathologists. Statistical analyses included the Friedman test to compare model performance, Wilcoxon signed-rank tests for pairwise comparisons, Cohen’s κ to assess agreement, and regression analyses to evaluate the influence of age and sex. Differential diagnosis performance was also analyzed. ChatGPT o1-preview demonstrated the highest proportion of outputs concordant with the reference diagnosis (68.6%), followed by Meta AI (65.7%), ChatGPT-4.0 (59.8%), and Gemini (27.5%). In terms of agreement with oral pathologists, ChatGPT o1-preview (κ = 0.66) and Meta AI (κ = 0.63) showed substantial agreement, ChatGPT-4.0 demonstrated moderate agreement (κ = 0.57), and Gemini showed poor agreement (κ = 0.24). Increasing patient age was associated with a mild but statistically significant reduction in model performance for ChatGPT-4.0, Meta AI, and Gemini, while no significant age effect was observed for ChatGPT o1-preview; patient sex had no significant impact. Among the evaluated chatbots, ChatGPT o1-preview showed the highest alignment with oral pathologists’ reference diagnoses. These findings support the potential role of LLMs as complementary decision-support tools for interpreting oral histopathology descriptions, while highlighting substantial inter-model variability and the need for cautious implementation with continued human oversight.
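As an illustration of the agreement statistic reported above, the following is a minimal sketch of unweighted Cohen’s κ for two raters. The function name `cohens_kappa` and the toy labels are hypothetical and are not taken from the study; the abstract does not state whether a weighted or unweighted κ was used, so the unweighted form here is an assumption.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa for two equal-length label sequences."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("label sequences must be non-empty and of equal length")
    n = len(rater_a)

    # Observed agreement: fraction of cases where both raters give the same label.
    p_observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n

    # Chance agreement: sum over labels of the product of the marginal frequencies.
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    p_expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)

    if p_expected == 1.0:
        # Both raters used one identical label throughout; treat as full agreement.
        return 1.0
    return (p_observed - p_expected) / (1 - p_expected)


# Hypothetical toy data, NOT from the study: reference diagnoses vs. one chatbot.
reference = ["fibroma", "lichen planus", "fibroma", "mucocele", "lichen planus", "mucocele"]
chatbot = ["fibroma", "lichen planus", "mucocele", "mucocele", "fibroma", "mucocele"]

kappa = cohens_kappa(reference, chatbot)
print(f"kappa = {kappa:.2f}")
```

In this toy example the raters agree on 4 of 6 cases (observed agreement 0.67) against a chance agreement of 0.33, which gives κ = 0.50.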
| Original language | English |
|---|---|
| Article number | 11272 |
| Journal | Scientific Reports |
| Volume | 16 |
| Issue number | 1 |
| State | Published - 27 Feb 2026 |
Bibliographical note
© 2026. The Author(s).
Keywords
- Artificial intelligence
- ChatGPT
- Chatbot
- Gemini
- Large language models
- Meta AI
- Oral and maxillofacial pathology
- Diagnosis, Differential
- Humans
- Middle Aged
- Decision Support Techniques
- Male
- Decision Support Systems, Clinical
- Young Adult
- Language
- Adolescent
- Female
- Adult
- Aged
- Pathology, Oral/methods
Fingerprint
Dive into the research topics of 'Comparative analysis of large language models as decision support tools in oral pathology'. Together they form a unique fingerprint.