Application of a radiomics analysis pipeline based on 18F-FDG PET/CT images in predicting epidermal growth factor receptor mutation status in patients with lung adenocarcinoma

    Abstract:
    Objective  To establish radiomics analysis pipelines based on 18F-fluorodeoxyglucose (FDG) PET/CT images, evaluate how well each pipeline predicts epidermal growth factor receptor (EGFR) mutation status in lung adenocarcinoma, and identify the best-performing pipeline.
    Methods  Medical records and 18F-FDG PET/CT imaging data were retrospectively analyzed for 115 patients with lung adenocarcinoma seen at Tianjin Medical University General Hospital and other hospitals affiliated with Tianjin Medical University (a multicenter cohort) between June 2016 and September 2017. The cohort comprised 53 men and 62 women, aged (60.6±8.6) years; 51 had wild-type EGFR and 64 had mutant EGFR. Tumor regions of interest were delineated on the CT and PET images, and radiomics features were extracted. Pipelines were built by combining a data scaling method (Min-max (A), Max-abs (B), Z-score (C), or Z-score scaling without centering (D)), a feature selection method (variance threshold (a), t-test (b), logistic regression embedded selection (c), decision tree embedded selection (d), random forest embedded selection (e), mutual information (f), or the least absolute shrinkage and selection operator (LASSO, g)), and a model construction method (logistic regression (Ⅰ), decision tree (Ⅱ), random forest (Ⅲ), or support vector machine (Ⅳ)). Predictive performance was evaluated with accuracy, the area under the receiver operating characteristic curve (AUC), and the F1 score. A weighted average of these three metrics, the weighted average index (AVE), was introduced as a new measure of each pipeline's overall predictive performance.
    Results  Among all CT-based radiomics analysis pipelines, the CT+A+g+Ⅱ pipeline had the highest accuracy, 0.905 (95%CI: 0.850–0.959), and the highest AVE, 0.875; the CT+C+e+Ⅰ pipeline had the highest AUC, 0.916 (95%CI: 0.856–0.977); and the CT+B+g+Ⅱ pipeline had the highest F1 score, 0.869 (95%CI: 0.798–0.941). Among all PET-based radiomics analysis pipelines, the PET+C+e+Ⅳ pipeline had the highest accuracy, 0.888 (95%CI: 0.822–0.954), the highest AUC, 0.962 (95%CI: 0.924–1.000), and the highest AVE, 0.899; the PET+C+e+Ⅰ pipeline had the highest F1 score, 0.874 (95%CI: 0.804–0.945).
    Conclusions  Among the CT-based radiomics analysis pipelines, those built with LASSO (g) and a decision tree (Ⅱ) showed better predictive performance. Among the PET-based radiomics analysis pipelines, those built with Z-score scaling (C) and random forest embedded feature selection (e) showed better predictive performance.
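
    The pipeline labels used above (for example CT+A+g+Ⅱ) combine an image modality, a scaling method, a feature selection method, and a model. The following is a minimal sketch of that labelling scheme and of a weighted-average index, not the authors' implementation. The function names are hypothetical, and the equal default weights are an assumption, since the abstract does not state the weights used for AVE.

```python
from itertools import product

# Method codes exactly as listed in the Methods paragraph.
SCALERS = {
    "A": "Min-max",
    "B": "Max-abs",
    "C": "Z-score",
    "D": "Z-score without centering",
}
SELECTORS = {
    "a": "variance threshold",
    "b": "t-test",
    "c": "logistic regression (embedded)",
    "d": "decision tree (embedded)",
    "e": "random forest (embedded)",
    "f": "mutual information",
    "g": "LASSO",
}
MODELS = {
    "Ⅰ": "logistic regression",
    "Ⅱ": "decision tree",
    "Ⅲ": "random forest",
    "Ⅳ": "support vector machine",
}


def enumerate_pipelines(modality):
    """Return every label of the form modality+scaler+selector+model."""
    return [
        "+".join((modality, scaler, selector, model))
        for scaler, selector, model in product(SCALERS, SELECTORS, MODELS)
    ]


def weighted_average_index(accuracy, auc, f1, weights=(1.0, 1.0, 1.0)):
    """Weighted average of accuracy, AUC, and F1 score.

    ASSUMPTION: equal weights by default. The abstract does not give the
    weights behind AVE, so this may not reproduce the reported values.
    """
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    w_acc, w_auc, w_f1 = weights
    return (w_acc * accuracy + w_auc * auc + w_f1 * f1) / total


ct_pipelines = enumerate_pipelines("CT")
pet_pipelines = enumerate_pipelines("PET")
print(len(ct_pipelines), len(pet_pipelines))  # 112 112

# Placeholder metric values for illustration only, not results from the study.
print(round(weighted_average_index(0.9, 0.8, 0.7), 3))  # 0.8
```

    With 4 scaling methods, 7 feature selection methods, and 4 models, this enumeration gives 4 × 7 × 4 = 112 candidate pipelines per image modality.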

     
