Construction of a clinical-CT radiomics model for predicting STAS in lung adenocarcinoma using the XGBoost machine learning algorithm, with visualization of the model's constituent features

    Abstract:
    Objective To explore the clinical value of a combined model based on clinical signs, CT signs, and a CT radiomics score for predicting spread through air spaces (STAS) in lung adenocarcinoma, and to analyze the model visually using the Shapley additive explanations (SHAP) method.
    Methods Clinical data, non-contrast CT images, and surgical pathology data of 176 patients with lung adenocarcinoma treated at Huainan Yangguang Xinkang Hospital from November 2020 to March 2024 were analyzed retrospectively. Of these patients, 96 were male and 80 were female, with a mean age of (62.1±10.8) years. Patients were divided into STAS-positive and STAS-negative groups and were randomly assigned to training and validation groups at a ratio of 7:3 using a random number table. CT radiomics features were extracted from the tumor and from the 3 mm and 5 mm peritumoral regions, and the radiomics score was calculated using Elastic-Logistic regression analysis. Normally distributed measurement data were compared between groups using the independent-samples t-test, non-normally distributed measurement data using the Mann-Whitney U test, and count data using the chi-square test. Univariate and multivariate logistic regression analyses were used to identify clinical and imaging features associated with STAS. Clinical-CT radiomics models were constructed using logistic regression and the extreme gradient boosting (XGBoost) algorithm, and their predictive performance was evaluated using the area under the receiver operating characteristic (ROC) curve (AUC). The SHAP method was used to visualize the weight that each constituent feature carried in the model's evaluation performance.
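
The 7:3 random assignment described in the Methods can be sketched as follows. This is a minimal illustration, not the authors' procedure: a seeded pseudo-random shuffle stands in for the random number table, and the function name `split_7_3`, the seed, and the patient identifiers 1 to 176 are assumptions made for the example. The abstract does not report the actual group sizes; 123 and 53 are simply what this rounding rule yields for 176 patients.

```python
import random


def split_7_3(patient_ids, train_fraction=0.7, seed=0):
    """Randomly assign patients to training and validation groups.

    A seeded shuffle is used here in place of a random number table
    (an assumption made for reproducibility of the sketch).
    """
    ids = list(patient_ids)
    rng = random.Random(seed)
    rng.shuffle(ids)
    n_train = round(len(ids) * train_fraction)
    return ids[:n_train], ids[n_train:]


# Hypothetical identifiers 1..176, matching the cohort size in the abstract.
train_ids, valid_ids = split_7_3(range(1, 177))
print(len(train_ids), len(valid_ids))  # 123 53
```

Because the split ignores STAS status, the proportion of STAS-positive cases may differ between the two groups; a stratified split would be one way to avoid that, but the abstract does not say whether stratification was used.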
    Results Compared with the single-region radiomics models, the radiomics model combining the tumor and the 5 mm peritumoral region had better diagnostic performance for STAS in lung adenocarcinoma, with AUCs of 0.831 (95% CI: 0.751–0.897) in the training group and 0.842 (95% CI: 0.719–0.941) in the validation group. Univariate and multivariate logistic regression analyses indicated that the lobulation sign and the air cyst sign were independent risk factors for STAS in lung adenocarcinoma. The AUCs of the clinical model were 0.851 (95% CI: 0.784–0.911) in the training group and 0.821 (95% CI: 0.703–0.922) in the validation group. The clinical-CT radiomics model (Combined_XGboost.model), which combines the radiomics score, lobulation sign, and air cyst sign, performed well in evaluating STAS in lung adenocarcinoma, with AUCs of 0.902 (95% CI: 0.842–0.949) in the training group and 0.896 (95% CI: 0.802–0.968) in the validation group. The SHAP method visually displayed the interactions among the features of Combined_XGboost.model, and case analysis showed that the model's evaluation results were consistent with the histopathological results.
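
For readers unfamiliar with how an AUC and its 95% CI are obtained, the sketch below computes the AUC as the probability that a STAS-positive case receives a higher score than a STAS-negative case, and derives an interval by percentile bootstrap. The abstract does not state how the reported intervals were calculated, so the bootstrap is an assumption, and the labels and scores are toy values rather than study data.

```python
import random


def auc(labels, scores):
    """AUC as P(score of a positive > score of a negative); ties count 1/2."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def bootstrap_ci(labels, scores, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the AUC (assumed method)."""
    if len(set(labels)) < 2:
        raise ValueError("both classes must be present")
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if len(set(ys)) < 2:
            continue  # skip resamples that contain only one class
        stats.append(auc(ys, [scores[i] for i in idx]))
    stats.sort()
    lower = stats[round(alpha / 2 * n_boot)]
    upper = stats[round((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


# Toy data: 1 = STAS positive, 0 = STAS negative (not taken from the study).
labels = [0, 0, 0, 0, 1, 0, 1, 1, 0, 1]
scores = [0.10, 0.20, 0.30, 0.35, 0.40, 0.50, 0.60, 0.70, 0.45, 0.90]

point = auc(labels, scores)
lower, upper = bootstrap_ci(labels, scores)
print(f"AUC = {point:.3f}")  # AUC = 0.917
print(f"95% CI: {lower:.3f} to {upper:.3f}")
```

With only ten toy cases the interval is very wide, which also illustrates why the validation-group intervals in the Results are wider than the training-group intervals.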
    Conclusion The clinical-CT radiomics model constructed with the XGBoost machine learning algorithm, together with SHAP-based visualization of its constituent features, can help clinicians evaluate STAS in lung adenocarcinoma precisely and intuitively before surgery.
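
The idea behind SHAP can be illustrated with an exact Shapley value computation over the three features named in the abstract. This is only a sketch of the underlying principle: the risk function `toy_model` and its coefficients are invented for illustration and are not the fitted Combined_XGboost.model, and a single all-zero baseline replaces the background dataset that the SHAP library would average over.

```python
from itertools import combinations
from math import exp, factorial

FEATURES = ["rad_score", "lobulation", "air_cyst"]


def toy_model(x):
    """Hypothetical STAS risk function; all coefficients are made up."""
    z = (-2.0 + 1.5 * x["rad_score"] + 1.0 * x["lobulation"]
         + 0.8 * x["air_cyst"] + 0.5 * x["lobulation"] * x["air_cyst"])
    return 1.0 / (1.0 + exp(-z))


def shapley_values(model, instance, baseline):
    """Exact Shapley values by enumerating every coalition of features."""
    n = len(FEATURES)

    def value(subset):
        # Features in the coalition take the patient's values,
        # all other features stay at the baseline.
        x = {f: (instance[f] if f in subset else baseline[f]) for f in FEATURES}
        return model(x)

    phi = {}
    for f in FEATURES:
        others = [g for g in FEATURES if g != f]
        total = 0.0
        for k in range(n):
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                coalition = set(subset)
                total += weight * (value(coalition | {f}) - value(coalition))
        phi[f] = total
    return phi


# Hypothetical patient with both CT signs present and a positive radiomics score.
patient = {"rad_score": 1.2, "lobulation": 1, "air_cyst": 1}
baseline = {"rad_score": 0.0, "lobulation": 0, "air_cyst": 0}

phi = shapley_values(toy_model, patient, baseline)
for name, contribution in phi.items():
    print(f"{name}: {contribution:+.3f}")
```

The contributions sum to the difference between the patient's predicted risk and the baseline risk, which is the additivity property that lets SHAP plots decompose a single prediction feature by feature. Exact enumeration is feasible here only because there are three features; for tree ensembles the SHAP library uses a specialized algorithm instead.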

     
