Brief description (abstract):
Objective — to assess the potential influence of artificial intelligence-generated responses on decision-
making and care pathways in patients with respiratory and non-respiratory allergic conditions.
Materials and methods. Twelve questions were submitted to two of the most widely used artificial
intelligence-based chatbots: ChatGPT-4o and Gemini 2.0 Flash. Half of these questions were developed
through an analysis of Google Trends data from the past year in Ukraine (October 2023—September 2024).
The other half were compiled from an online survey of practising physicians, who identified the questions
patients ask most frequently during clinical consultations. Five experts independently assessed each chatbot’s
responses based on three parameters: accuracy, correctness, and comprehensiveness, using a 0—3 scale
(0 = completely inaccurate/incorrect/non-comprehensive, 3 = fully accurate/correct/comprehensive). Subsequently,
all questions were grouped into two blocks: a Decision-Making block and an Awareness block.
Results and discussion. The mean scores were as follows: ChatGPT achieved 2.08 ± 0.46 (accuracy),
2.07 ± 0.52 (correctness) and 2.10 ± 0.57 (comprehensiveness) points, while Gemini scored 1.97 ± 0.71
(accuracy), 2.00 ± 0.69 (correctness) and 2.05 ± 0.67 (comprehensiveness) points. These results indicate a
slight overall advantage for ChatGPT, with the largest difference observed in accuracy (0.11 points).
Statistical analysis indicated moderate to strong agreement between experts, which is generally sufficient to
validate the results.
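The abstract does not state which agreement statistic was used. Purely as an illustrative sketch, inter-rater agreement among five experts rating on a 0—3 scale could be quantified with Fleiss’ kappa; the rating counts below are hypothetical, not the study’s data:

```python
# Illustrative sketch: Fleiss' kappa for agreement among multiple raters.
# The statistic actually used in the study is not stated in the abstract;
# the counts below are hypothetical placeholders, not the study's data.

def fleiss_kappa(counts):
    """counts[i][j] = number of raters assigning item i to category j."""
    n_items = len(counts)
    n_raters = sum(counts[0])  # raters per item (assumed constant)
    # Mean per-item observed agreement P_i
    p_bar = sum(
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ) / n_items
    # Chance agreement P_e from overall category proportions
    totals = [sum(row[j] for row in counts) for j in range(len(counts[0]))]
    p_e = sum((t / (n_items * n_raters)) ** 2 for t in totals)
    return (p_bar - p_e) / (1 - p_e)

# Hypothetical: 4 questions, 5 experts, categories = scores 0..3
ratings = [
    [0, 0, 4, 1],  # four experts scored 2, one scored 3
    [0, 1, 3, 1],
    [0, 0, 2, 3],
    [0, 0, 5, 0],
]
print(round(fleiss_kappa(ratings), 3))  # prints 0.045
```

With perfect agreement on every item the statistic equals 1; values near 0 indicate agreement no better than chance.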
The analysis revealed that Decision-Making block questions were answered by ChatGPT with accuracy =
2.20 ± 0.52, correctness = 2.17 ± 0.50, comprehensiveness = 2.26 ± 0.48 points, and by Gemini with accuracy
= 2.00 ± 0.58, correctness = 1.91 ± 0.60, comprehensiveness = 2.06 ± 0.55 points. Awareness block questions
were answered by ChatGPT with accuracy = 1.92 ± 0.60, correctness = 1.92 ± 0.58 and comprehensiveness =
1.88 ± 0.62 points, and by Gemini with accuracy = 1.88 ± 0.65, correctness = 2.12 ± 0.57 and
comprehensiveness = 2.04 ± 0.59 points.
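The mean ± SD summaries above can be derived as in the following minimal sketch; the individual expert ratings here are hypothetical, since the study’s raw data are not given in the abstract:

```python
# Minimal sketch: mean +/- SD summary of expert ratings on the 0-3 scale.
# The ratings below are hypothetical placeholders, not the study's data.
from statistics import mean, stdev

# Hypothetical: five experts' accuracy scores for each of three questions
scores = [2, 2, 3, 2, 1,
          3, 2, 2, 2, 2,
          1, 2, 2, 3, 2]

print(f"{mean(scores):.2f} \u00b1 {stdev(scores):.2f}")  # prints 2.07 ± 0.59
```

Note that `stdev` computes the sample standard deviation; whether the study reported sample or population SD is not specified.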
Conclusions. The experts in our study rated the answers on a four-point scale (from 0 to 3 points), and both
chatbots averaged about 2 of the 3 possible points for all parameters — accuracy, correctness and
completeness — which is a reasonable result. However, the analysis clearly showed, as is evident from the
standard deviations, that the spread of Gemini’s scores was wider than ChatGPT’s: that chatbot produced
both more high-quality and more low-quality answers. This increases the questioner’s chance of receiving
either a poor or a good answer, which indicates the chatbot’s lower predictability.
In the Decision-Making block, ChatGPT was statistically significantly better, whereas in the Awareness block
ChatGPT led only in the accuracy of answers, while Gemini’s answers were statistically significantly more
complete and correct according to expert assessments. ChatGPT consistently outperformed Gemini in the
Decision-Making block, indicating its suitability for tasks requiring structured decision-making. In contrast,
Gemini outperformed ChatGPT in the Awareness block, especially in correctness and completeness, indicating
its effectiveness for informational queries.