Brief description (abstract):
Objective — to assess the potential influence of artificial intelligence-generated responses on decision-
making and care pathways in patients with respiratory and non-respiratory allergic conditions.
Materials and methods. Twelve questions were submitted to two of the most widely used artificial
intelligence-based chatbots: ChatGPT-4o and Gemini 2.0 Flash. Half of these questions were developed
through an analysis of Google Trends data from the past year in Ukraine (October 2023—September 2024).
The other half were compiled from an online survey of practising physicians, who identified the questions
patients ask most frequently during clinical consultations. Five experts independently assessed each chatbot’s
responses based on three parameters: accuracy, correctness, and comprehensiveness, using a 0—3 scale
(0 = completely inaccurate/incorrect/non-comprehensive, 3 = fully accurate/correct/comprehensive). Subsequently,
all questions were grouped into two blocks: a Decision-Making block and an Awareness block.
Results and discussion. The mean scores were as follows: ChatGPT achieved 2.08 ± 0.46 (accuracy),
2.07 ± 0.52 (correctness) and 2.10 ± 0.57 (comprehensiveness) points, while Gemini scored 1.97 ± 0.71
(accuracy), 2.00 ± 0.69 (correctness) and 2.05 ± 0.67 (comprehensiveness) points. These results indicate a
slight overall advantage for ChatGPT, with the largest difference observed in accuracy (0.11 points).
Statistical analysis indicated moderate to strong agreement between experts, which is generally sufficient to
validate the results.
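The abstract does not state which agreement statistic was used. Purely as an illustrative sketch, inter-rater agreement among five experts rating on a 0—3 scale could be quantified with Fleiss’ kappa; the rating counts below are hypothetical, not the study’s data:

```python
# Illustrative sketch: Fleiss' kappa for agreement among multiple raters.
# The statistic actually used in the study is not stated in the abstract;
# the counts below are hypothetical placeholders, not the study's data.

def fleiss_kappa(counts):
    """counts[i][j] = number of raters assigning item i to category j."""
    n_items = len(counts)
    n_raters = sum(counts[0])  # raters per item (assumed constant)
    # Mean per-item observed agreement P_i
    p_bar = sum(
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ) / n_items
    # Chance agreement P_e from overall category proportions
    totals = [sum(row[j] for row in counts) for j in range(len(counts[0]))]
    p_e = sum((t / (n_items * n_raters)) ** 2 for t in totals)
    return (p_bar - p_e) / (1 - p_e)

# Hypothetical: 4 questions, 5 experts, categories = scores 0..3
ratings = [
    [0, 0, 4, 1],  # four experts scored 2, one scored 3
    [0, 1, 3, 1],
    [0, 0, 2, 3],
    [0, 0, 5, 0],
]
print(round(fleiss_kappa(ratings), 3))  # prints 0.045
```

With perfect agreement on every item the statistic equals 1; values near 0 indicate agreement no better than chance.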
The analysis revealed that Decision-Making block questions were answered by ChatGPT with accuracy =
2.20 ± 0.52, correctness = 2.17 ± 0.50, comprehensiveness = 2.26 ± 0.48 points, and by Gemini with accuracy
= 2.00 ± 0.58, correctness = 1.91 ± 0.60, comprehensiveness = 2.06 ± 0.55 points. Awareness block questions
were answered by ChatGPT with accuracy = 1.92 ± 0.60, correctness = 1.92 ± 0.58 and comprehensiveness =
1.88 ± 0.62 points, and by Gemini with accuracy = 1.88 ± 0.65, correctness = 2.12 ± 0.57 and
comprehensiveness = 2.04 ± 0.59 points.
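The mean ± SD summaries above can be derived as in the following minimal sketch; the individual expert ratings here are hypothetical, since the study’s raw data are not given in the abstract:

```python
# Minimal sketch: mean +/- SD summary of expert ratings on the 0-3 scale.
# The ratings below are hypothetical placeholders, not the study's data.
from statistics import mean, stdev

# Hypothetical: five experts' accuracy scores for each of three questions
scores = [2, 2, 3, 2, 1,
          3, 2, 2, 2, 2,
          1, 2, 2, 3, 2]

print(f"{mean(scores):.2f} \u00b1 {stdev(scores):.2f}")  # prints 2.07 ± 0.59
```

Note that `stdev` computes the sample standard deviation; whether the study reported sample or population SD is not specified.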
Conclusions. The experts in our study rated the answers on a four-point scale (from 0 to 3 points), and both
chatbots averaged about 2 of the 3 possible points for all parameters — accuracy, correctness and
completeness — which is a reasonable result. However, the analysis clearly showed, as is evident from the
standard deviations, that the spread of Gemini’s scores was wider than ChatGPT’s: that chatbot produced
both more high-quality and more low-quality answers. This increases the questioner’s chance of receiving
either a poor or a good answer, which indicates the chatbot’s lower predictability.
In the Decision-Making block, ChatGPT was statistically significantly better, whereas in the Awareness block
ChatGPT led only in the accuracy of answers, while Gemini’s answers were statistically significantly more
complete and correct according to expert assessments. ChatGPT consistently outperformed Gemini in the
Decision-Making block, indicating its suitability for tasks requiring structured decision-making. In contrast,
Gemini outperformed ChatGPT in the Awareness block, especially in correctness and completeness, indicating
its effectiveness for informational queries.