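According to statistics, lung cancer accounts for approximately one quarter of all cancer-related deaths worldwide [1]. Most lung cancers initially present as nodules, which form the basis for early diagnosis. A pulmonary nodule is defined as a focal opacity of abnormal density with a maximum diameter of <3 cm [2]. Because of the limits of human vision, screening for small nodules on CT is a challenging and time-consuming task. In addition, distinguishing benign from malignant nodules still depends on the radiologist's visual assessment combined with measurements of nodule size and density, which is also challenging.
Artificial intelligence (AI) was first proposed by John McCarthy at the Dartmouth Conference in 1956. It refers to the use of computers and related technologies to simulate intelligent behavior and critical thinking comparable to those of humans, and it can analyze and interpret complex medical data to support the diagnosis, management, and prediction of treatment outcomes in patients with different clinical presentations [3-4]. In recent years, many AI algorithms for the automatic detection and classification of pulmonary nodules have emerged, and they can assist radiologists in the routine evaluation of chest CT images. However, these algorithms have certain limitations, so their use in clinical practice remains limited. This study compares the performance of AI with that of radiologists in detecting and characterizing pulmonary nodules in order to assess the value of AI in clinical application.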
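A total of 1072 true nodules were identified on the CT images of the 355 patients. AI detected 1063 true nodules and missed 9, for a sensitivity of 99.16% (1063/1072); the junior physicians detected 1009 true nodules and missed 63, for a sensitivity of 94.12% (1009/1072). The difference between the two was statistically significant (χ²=41.907, P<0.05). As Table 1 shows, among the three groups classified by nodule size, the difference in detection sensitivity between AI and the junior physicians was statistically significant in the maximum diameter <5 mm group (P<0.05), but not in the 5 mm ≤ maximum diameter <10 mm group or the 10 mm ≤ maximum diameter <30 mm group (χ²=0.000 and Fisher's exact probability test, respectively; both P>0.05). Among the three groups classified by nodule density, the difference was statistically significant in the solid nodule group and the ground-glass nodule group (both P<0.05), but not in the part-solid nodule group (Fisher's exact probability test, P>0.05). Images of a typical case are shown in Figure 1.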
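Table 1. Comparison of sensitivity between artificial intelligence and imaging physicians in detecting 1072 pulmonary nodules

| Group | Sensitivity of AI detection | Sensitivity of junior physicians' detection |
| --- | --- | --- |
| Nodule size | | |
| Maximum diameter <5 mm | 99.01% (703/710) | 91.55% (650/710)a |
| 5 mm ≤ maximum diameter <10 mm | 98.99% (196/198) | 98.48% (195/198) |
| 10 mm ≤ maximum diameter <30 mm | 100% (164/164) | 100% (164/164) |
| Nodule density | | |
| Solid nodules | 99.81% (520/521) | 97.50% (508/521)a |
| Part-solid nodules | 100% (72/72) | 100% (72/72) |
| Ground-glass nodules | 98.33% (471/479) | 89.35% (428/479)a |

Note: a indicates a statistically significant difference compared with the sensitivity of AI detection (χ²=44.002, 8.761, and 33.396, respectively; all P<0.05). AI, artificial intelligence.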
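The reported statistic for the overall detection comparison can be checked by hand. The sketch below is a minimal illustration, not the authors' analysis code: it assumes the detected and missed counts of the two readers are treated as two independent samples in a 2×2 table, and the function name `chi_square_2x2` and its `yates` option are hypothetical. Under that assumption, the uncorrected Pearson statistic comes out at about 41.907 for the overall comparison and about 44.002 for the <5 mm group, in line with the values given above.

```python
def chi_square_2x2(a, b, c, d, yates=False):
    """Pearson chi-square statistic for the 2x2 table [[a, b], [c, d]].

    Rows are the two readers; columns are detected / missed counts.
    Set yates=True to apply the continuity correction (an optional
    variant; whether the paper used it for a given test is not stated).
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    if 0 in (row1, row2, col1, col2):
        raise ValueError("every row and column total must be positive")
    diff = abs(a * d - b * c)
    if yates:
        # Shrink |ad - bc| by n/2, never below zero.
        diff = max(diff - n / 2, 0.0)
    return n * diff ** 2 / (row1 * row2 * col1 * col2)


# Overall detection: AI 1063 detected / 9 missed, junior physicians 1009 / 63.
overall = chi_square_2x2(1063, 9, 1009, 63)
# Maximum diameter <5 mm group: AI 703 / 7, junior physicians 650 / 60.
small = chi_square_2x2(703, 7, 650, 60)

print(f"overall: chi2 = {overall:.3f}")
print(f"<5 mm:   chi2 = {small:.3f}")
```

Because both readers examined the same nodules, a paired test such as McNemar's would be another reasonable choice; the independent-samples form is used here only because it reproduces the figures in the text.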
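Among the 105 patients with histopathological results, 88 had malignant nodules and 17 had benign nodules. In the qualitative diagnosis of pulmonary nodules, AI yielded 86 true positives, 15 false positives, 2 true negatives, and 2 false negatives; the deputy chief physician yielded 83 true positives, 1 false positive, 16 true negatives, and 5 false negatives. The specificity and positive predictive value of the deputy chief physician were significantly higher than those of AI [94.12% (16/17) vs. 11.76% (2/17), Fisher's exact probability test, P<0.05; 98.81% (83/84) vs. 85.15% (86/101), χ²=9.172, P<0.05]. The sensitivity of the deputy chief physician was lower than that of AI, but the difference was not statistically significant [94.32% (83/88) vs. 97.73% (86/88), χ²=0.595, P>0.05]. The negative predictive value of the deputy chief physician was higher than that of AI, but the difference was not statistically significant [76.19% (16/21) vs. 50.00% (2/4), Fisher's exact probability test, P>0.05]. Overall, the accuracy of the deputy chief physician was higher than that of AI [94.29% (99/105) vs. 83.81% (88/105), χ²=8.796, P<0.05]. CT images and histopathological images of a typical case are shown in Figure 2.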
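The five performance measures above all follow from the four confusion-matrix counts. As a minimal sketch (the function name `diagnostic_metrics` is hypothetical, and this is an illustration rather than the authors' code), the standard definitions can be applied to the counts reported in this paragraph:

```python
def diagnostic_metrics(tp, fp, tn, fn):
    """Standard diagnostic measures from confusion-matrix counts.

    Malignant is the positive class; histopathology is the reference.
    """
    return {
        "sensitivity": tp / (tp + fn),   # malignant nodules called malignant
        "specificity": tn / (tn + fp),   # benign nodules called benign
        "ppv": tp / (tp + fp),           # positive calls that are malignant
        "npv": tn / (tn + fn),           # negative calls that are benign
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
    }


# Counts taken from the text above.
ai = diagnostic_metrics(tp=86, fp=15, tn=2, fn=2)
physician = diagnostic_metrics(tp=83, fp=1, tn=16, fn=5)

for name in ai:
    print(f"{name:12s} AI {ai[name]:.2%}   physician {physician[name]:.2%}")
```

The counts make the trade-off visible: AI labels 101 of the 105 cases as malignant, which keeps its sensitivity high but leaves only 2 of the 17 benign nodules correctly classified.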
Evaluation of artificial intelligence in the detection and characterization of pulmonary nodules
Abstract: Objective To evaluate the diagnostic efficacy of artificial intelligence (AI) in the detection and qualitative diagnosis of pulmonary nodules. Methods In this retrospective study, the lung CT images of 355 patients (205 females and 150 males, aged 55.1±12.2 years) were selected by simple random sampling from the pulmonary nodule case database of Hebei Petro China Central Hospital from 2020 to 2021 and imported into the AI system. The diagnostic results of AI were compared with those of three junior physicians. Two intermediate-level physicians reviewed the CT images under the double-blind principle, and their consensus was used as the reference standard for true nodules. The sensitivities of AI and the junior physicians in detecting pulmonary nodules were compared. A total of 105 patients underwent histopathological examination, either by preoperative CT-guided puncture or after lung tissue resection. With the histopathological results as the "gold standard", the sensitivity, specificity, positive predictive value, negative predictive value, and accuracy of AI and the deputy chief physician in the qualitative diagnosis of pulmonary nodules were compared. Count data were compared between groups using the chi-square test or Fisher's exact probability test. Results A total of 1072 true nodules were identified on the CT images of the 355 patients. AI detected 1063 of them and missed 9, for a sensitivity of 99.16% (1063/1072). The junior physicians detected 1009 and missed 63, for a sensitivity of 94.12% (1009/1072). In pulmonary nodule detection, the sensitivity of AI was significantly higher than that of the junior physicians (χ²=41.907, P<0.05). Histopathological examination of the 105 patients confirmed 88 malignant nodules and 17 benign nodules. In the qualitative diagnosis of pulmonary nodules, AI yielded 86 true positives, 15 false positives, 2 true negatives, and 2 false negatives; the deputy chief physician yielded 83 true positives, 1 false positive, 16 true negatives, and 5 false negatives. The specificity and positive predictive value of the deputy chief physician were significantly higher than those of AI (94.12% (16/17) vs. 11.76% (2/17), Fisher's exact probability test, P<0.05; 98.81% (83/84) vs. 85.15% (86/101), χ²=9.172, P<0.05). The sensitivity of the deputy chief physician was lower than that of AI, but the difference was not statistically significant (94.32% (83/88) vs. 97.73% (86/88), χ²=0.595, P>0.05). The negative predictive value of the deputy chief physician was higher than that of AI, but the difference was not statistically significant (76.19% (16/21) vs. 50.00% (2/4), Fisher's exact probability test, P>0.05). Overall, the accuracy of the deputy chief physician was higher than that of AI (94.29% (99/105) vs. 83.81% (88/105), χ²=8.796, P<0.05). Conclusions AI shows high sensitivity in the detection and qualitative diagnosis of pulmonary nodules, but its specificity, positive predictive value, negative predictive value, and accuracy are low. In clinical work, physicians can use the good sensitivity of AI to improve work efficiency, but AI cannot replace manual analysis as the standard for the qualitative diagnosis of pulmonary nodules.
Key words:
- Artificial intelligence
- Pulmonary nodule
- Tomography, X-ray computed
[1] Siegel RL, Miller KD, Fuchs HE, et al. Cancer statistics, 2021[J]. CA Cancer J Clin, 2021, 71(1): 7-33. DOI: 10.3322/caac.21654.
[2] Ather S, Kadir T, Gleeson F. Artificial intelligence and radiomics in pulmonary nodule management: current status and future applications[J]. Clin Radiol, 2020, 75(1): 13-19. DOI: 10.1016/j.crad.2019.04.017.
[3] Ampavathi A, Saradhi TV. Multi disease-prediction framework using hybrid deep learning: an optimal prediction model[J]. Comput Methods Biomech Biomed Engin, 2021, 24(10): 1146-1168. DOI: 10.1080/10255842.2020.1869726.
[4] Nishio M, Nishizawa M, Sugiyama O, et al. Computer-aided diagnosis of lung nodule using gradient tree boosting and Bayesian optimization[J/OL]. PLoS One, 2018, 13(4): e0195875[2023-04-05]. https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0195875. DOI: 10.1371/journal.pone.0195875.
[5] Lung Cancer Study Group, Chinese Thoracic Society, Chinese Medical Association, Expert Group of Chinese Alliance Against Lung Cancer. Chinese Expert Consensus on the Diagnosis and Treatment of Pulmonary Nodules (2018 Edition)[J]. Chin J Tuberc Respir Dis, 2018, 41(10): 763-771. DOI: 10.3760/cma.j.issn.1001-0939.2018.10.004. (in Chinese)
[6] The International Early Lung Cancer Action Program Investigators. Survival of patients with stage I lung cancer detected on CT screening[J]. N Engl J Med, 2006, 355(17): 1763-1771. DOI: 10.1056/NEJMoa060476.
[7] Birring SS, Peake MD. Symptoms and the early diagnosis of lung cancer[J]. Thorax, 2005, 60(4): 268-269. DOI: 10.1136/thx.2004.032698.
[8] Wang XJ, Liu HL, Shen YB, et al. Low-dose computed tomography (LDCT) versus other cancer screenings in early diagnosis of lung cancer: a meta-analysis[J]. Medicine (Baltimore), 2018, 97(27): e11233. DOI: 10.1097/MD.0000000000011233.
[9] Li PX, Xu XL, Zhang Q, et al. Clinical research of low dose chest scanning by 64-slice spiral CT[J]. Chin Med Equip J, 2014, 35(10): 82-84, 96. DOI: 10.7687/J.ISSN.1003-8868.2014.10.082. (in Chinese)
[10] Obuchowski NA, Bullen JA. Statistical considerations for testing an AI algorithm used for prescreening lung CT images[J/OL]. Contemp Clin Trials Commun, 2019, 16: 100434[2023-04-05]. https://www.sciencedirect.com/science/article/pii/S2451865419301966?via%3Dihub. DOI: 10.1016/j.conctc.2019.100434.
[11] Hwang EJ, Park CM. Clinical implementation of deep learning in thoracic radiology: potential applications and challenges[J]. Korean J Radiol, 2020, 21(5): 511-525. DOI: 10.3348/kjr.2019.0821.
[12] Yuan R, Vos PM, Cooperberg PL. Computer-aided detection in screening CT for pulmonary nodules[J]. AJR Am J Roentgenol, 2006, 186(5): 1280-1287. DOI: 10.2214/AJR.04.1969.
[13] Sung H, Ferlay J, Siegel RL, et al. Global cancer statistics 2020: GLOBOCAN estimates of incidence and mortality worldwide for 36 cancers in 185 countries[J]. CA Cancer J Clin, 2021, 71(3): 209-249. DOI: 10.3322/caac.21660.
[14] Murthy SC, Rice TW. The solitary pulmonary nodule: a primer on differential diagnosis[J]. Semin Thorac Cardiovasc Surg, 2002, 14(3): 239-249. DOI: 10.1053/stcs.2002.34450.
[15] Lachance CC, Walter M. Artificial intelligence for classification of lung nodules: a review of clinical utility, diagnostic accuracy, cost-effectiveness, and guidelines[R]. Ottawa, ON: Canadian Agency for Drugs and Technologies in Health, 2020.
[16] Loverdos K, Fotiadis A, Kontogianni C, et al. Lung nodules: a comprehensive review on current approach and management[J]. Ann Thorac Med, 2019, 14(4): 226-238. DOI: 10.4103/atm.ATM_110_19.
[17] Swensen SJ, Jett JR, Hartman TE, et al. CT screening for lung cancer: five-year prospective experience[J]. Radiology, 2005, 235(1): 259-265. DOI: 10.1148/radiol.2351041662.
[18] McWilliams A, Tammemagi MC, Mayo JR, et al. Probability of cancer in pulmonary nodules detected on first screening CT[J]. N Engl J Med, 2013, 369(10): 910-919. DOI: 10.1056/NEJMoa1214726.
[19] Kang GX, Liu K, Hou BB, et al. 3D multi-view convolutional neural networks for lung nodule classification[J/OL]. PLoS One, 2017, 12(11): e0188290[2023-04-05]. https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0188290. DOI: 10.1371/journal.pone.0188290.
[20] Lyu J, Ling SH. Using multi-level convolutional neural network for classification of lung nodules on CT images[C]//Proceedings of the 40th Annual International Conference of the IEEE Engineering in Medicine and Biology Society. Honolulu: IEEE, 2018: 686-689. DOI: 10.1109/EMBC.2018.8512376.
[21] Shaffie A, Soliman A, Fraiwan L, et al. A generalized deep learning-based diagnostic system for early diagnosis of various types of pulmonary nodules[J]. Technol Cancer Res Treat, 2018, 17: 1533033818798800. DOI: 10.1177/1533033818798800.
[22] Litjens G, Kooi T, Bejnordi BE, et al. A survey on deep learning in medical image analysis[J]. Med Image Anal, 2017, 42: 60-88. DOI: 10.1016/j.media.2017.07.005.
[23] Fukushima K. Neocognitron: a self-organizing neural network model for a mechanism of pattern recognition unaffected by shift in position[J]. Biol Cybern, 1980, 36(4): 193-202. DOI: 10.1007/BF00344251.