Evaluation of artificial intelligence in the detection and characterization of pulmonary nodules

姚威; 李培秀; 霍英杰; 梁建利; 张新成; 冯长明; 王红辉; 张翔辰

doi:10.3760/cma.j.cn121381-202304006-00359

Abstract:
Objective To evaluate the performance of artificial intelligence (AI) in the detection and qualitative diagnosis of pulmonary nodules.

Methods In this retrospective study, 355 patients [205 females and 150 males; age (55.1±12.2) years] were selected by simple random sampling from the pulmonary nodule case database of Hebei PetroChina Central Hospital for 2020 to 2021, and their lung CT images were imported into the AI system. The diagnostic results of AI were compared with those of three junior physicians. Two intermediate-grade physicians reviewed the CT images under double-blind conditions, and their consensus served as the reference standard for true nodules. The sensitivities of AI and the junior physicians in detecting pulmonary nodules were then compared. A total of 105 patients underwent histopathological examination, either of a preoperative CT-guided needle biopsy specimen or of lung tissue after surgical resection. With the histopathological results as the "gold standard", the sensitivity, specificity, positive predictive value, negative predictive value, and accuracy of AI and of a deputy chief physician in the qualitative diagnosis of pulmonary nodules were compared. Count data were compared between groups using the chi-square test or Fisher's exact test.
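As an illustration of the two tests named in the Methods, the following is a minimal Python sketch for a 2×2 table of counts. The function names `chi_square_2x2` and `fisher_exact_two_sided` and the `yates` flag are assumptions introduced here, not part of the study; the abstract does not say which software was used or whether a continuity correction was applied. The example counts are the detected and missed true nodules reported in the Results.

```python
from math import comb


def chi_square_2x2(a, b, c, d, yates=False):
    """Pearson chi-square statistic for the 2x2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    diff = abs(a * d - b * c)
    if yates:
        # Optional continuity correction (an assumption, not stated in the
        # abstract): shrink |ad - bc| by n/2, never below zero.
        diff = max(diff - n / 2, 0.0)
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    return n * diff ** 2 / denom


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    total = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / total

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum every table that is no more likely than the observed one.
    return sum(p for p in map(prob, range(lo, hi + 1)) if p <= p_obs * (1 + 1e-9))


# Detected vs. missed true nodules: AI (1063, 9), junior physicians (1009, 63).
detection_chi2 = chi_square_2x2(1063, 9, 1009, 63)
print(f"chi-square = {detection_chi2:.3f}")
```

With these counts the uncorrected statistic comes to about 41.907, in line with the value reported for nodule detection. The `yates` flag is offered only as an option, since the abstract does not specify how each chi-square value was computed.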

Results A total of 1 072 true nodules were identified in the CT images of the 355 patients. AI detected 1 063 of these nodules and missed 9, for a sensitivity of 99.16% (1 063/1 072); the junior physicians detected 1 009 and missed 63, for a sensitivity of 94.12% (1 009/1 072). The sensitivity of AI for nodule detection was significantly higher than that of the junior physicians (χ²=41.907, P<0.05). Of the 105 patients with histopathological results, 88 had malignant nodules and 17 had benign nodules. In qualitative diagnosis, AI yielded 86 true positives, 15 false positives, 2 true negatives, and 2 false negatives, whereas the deputy chief physician yielded 83 true positives, 1 false positive, 16 true negatives, and 5 false negatives. The specificity and positive predictive value of the deputy chief physician were significantly higher than those of AI [specificity: 94.12% (16/17) vs. 11.76% (2/17), Fisher's exact test, P<0.05; positive predictive value: 98.81% (83/84) vs. 85.15% (86/101), χ²=9.172, P<0.05]. The sensitivity of the deputy chief physician was lower than that of AI, but the difference was not statistically significant [94.32% (83/88) vs. 97.73% (86/88), χ²=0.595, P>0.05]. The negative predictive value of the deputy chief physician was higher than that of AI, but the difference was likewise not statistically significant [76.19% (16/21) vs. 50.00% (2/4), Fisher's exact test, P>0.05]. Overall, the deputy chief physician achieved a higher accuracy than AI in the qualitative diagnosis of pulmonary nodules [94.29% (99/105) vs. 83.81% (88/105), χ²=8.796, P<0.05].
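The five performance measures above follow directly from the true/false positive and negative counts. The short Python sketch below recomputes them from the reported counts; the helper name `diagnostic_metrics` is an assumption made for illustration and is not taken from the study.

```python
def diagnostic_metrics(tp, fp, tn, fn):
    """Return five diagnostic performance measures as percentages."""
    return {
        "sensitivity": 100 * tp / (tp + fn),   # malignant cases called malignant
        "specificity": 100 * tn / (tn + fp),   # benign cases called benign
        "ppv": 100 * tp / (tp + fp),           # positive predictive value
        "npv": 100 * tn / (tn + fn),           # negative predictive value
        "accuracy": 100 * (tp + tn) / (tp + fp + tn + fn),
    }


# Counts reported for the 105 patients with histopathological results.
ai = diagnostic_metrics(tp=86, fp=15, tn=2, fn=2)
physician = diagnostic_metrics(tp=83, fp=1, tn=16, fn=5)

for name in ai:
    print(f"{name:12s} AI {ai[name]:6.2f}%   physician {physician[name]:6.2f}%")
```

Run on these counts, the sketch reproduces the percentages quoted in the Results, for example a specificity of 11.76% for AI against 94.12% for the deputy chief physician.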

Conclusions AI shows high sensitivity in the detection and qualitative diagnosis of pulmonary nodules, but its specificity, positive predictive value, negative predictive value, and accuracy are relatively low. In clinical practice, physicians can exploit the high sensitivity of AI to improve work efficiency, but AI cannot replace physician analysis as the standard for the qualitative diagnosis of pulmonary nodules.
