Abstract:
Objective To establish radiomics analysis pipelines based on 18F-fluorodeoxyglucose (FDG) PET/CT images, to evaluate their value in predicting epidermal growth factor receptor (EGFR) mutation status in lung adenocarcinoma, and to identify the optimal pipeline.
Methods Medical records and 18F-FDG PET/CT imaging data were retrospectively analyzed for 115 patients with lung adenocarcinoma seen at Tianjin Medical University General Hospital and other affiliated hospitals of Tianjin Medical University (a multicenter cohort) from June 2016 to September 2017. The cohort comprised 53 males and 62 females aged (60.6±8.6) years; 51 had wild-type EGFR and 64 had mutant EGFR. Tumor regions of interest were delineated on the CT and PET images, and radiomics features were extracted. Radiomics analysis pipelines were constructed by combining data scaling methods (Min-max algorithm (A), Max-abs algorithm (B), Z-score algorithm (C), decentralized scaling Z-score algorithm (D)), feature selection methods (variance threshold (a), t-test (b), embedded selection based on logistic regression (c), embedded selection based on decision tree (d), embedded selection based on random forest (e), mutual information (f), the least absolute shrinkage and selection operator (LASSO, g)), and model construction methods (logistic regression (Ⅰ), decision tree (Ⅱ), random forest (Ⅲ), and support vector machine (Ⅳ)). The predictive performance of each pipeline was evaluated using accuracy, area under the receiver operating characteristic curve (AUC), and F1 score. A weighted average of these three metrics was defined as a new composite index, the weighted average index (AVE), and used to assess the overall predictive performance of each pipeline.
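The pipeline naming scheme and the composite index described above can be sketched as follows. This is a minimal illustration, not the authors' implementation. It assumes that every scaling method is crossed with every feature selection method and every model (a full factorial design, which the abstract does not state explicitly). It also assumes equal weights for the AVE, because the abstract does not report the weights actually used. The names `pipeline_labels` and `weighted_average_index` are hypothetical.

```python
from itertools import product

# Codes taken from the Methods text; the descriptions are for readability only.
MODALITIES = ["CT", "PET"]
SCALERS = {"A": "min-max", "B": "max-abs", "C": "z-score",
           "D": "decentralized scaling z-score"}
SELECTORS = {"a": "variance threshold", "b": "t-test",
             "c": "embedded (logistic regression)",
             "d": "embedded (decision tree)",
             "e": "embedded (random forest)",
             "f": "mutual information", "g": "LASSO"}
MODELS = {"Ⅰ": "logistic regression", "Ⅱ": "decision tree",
          "Ⅲ": "random forest", "Ⅳ": "support vector machine"}


def pipeline_labels():
    """Return labels such as 'CT+A+g+Ⅱ' for every combination.

    Assumption: a full factorial design (all scalers x selectors x models).
    """
    combos = product(MODALITIES, SCALERS, SELECTORS, MODELS)
    return ["+".join(parts) for parts in combos]


def weighted_average_index(accuracy, auc, f1, weights=(1 / 3, 1 / 3, 1 / 3)):
    """Weighted average of accuracy, AUC, and F1 score.

    Assumption: equal weights by default; the source does not give them.
    Weights are normalized so they need not sum to 1.
    """
    w_acc, w_auc, w_f1 = weights
    total = w_acc + w_auc + w_f1
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return (w_acc * accuracy + w_auc * auc + w_f1 * f1) / total


labels = pipeline_labels()
print(len(labels))                    # number of candidate pipelines
print("CT+A+g+Ⅱ" in labels)           # a label reported in the Results
print(weighted_average_index(0.90, 0.92, 0.85))  # illustrative inputs only
```

Under the full factorial assumption, each imaging modality yields 4 × 7 × 4 = 112 candidate pipelines. The metric values passed to `weighted_average_index` in the last line are made up for illustration and are not results from the study.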
Results Among the radiomics analysis pipelines based on CT images, the CT+A+g+Ⅱ pipeline had the highest accuracy, 0.905 (95% CI: 0.850−0.959), and the highest AVE, 0.875; the CT+C+e+Ⅰ pipeline had the highest AUC, 0.916 (95% CI: 0.856−0.977); and the CT+B+g+Ⅱ pipeline had the highest F1 score, 0.869 (95% CI: 0.798−0.941). Among the radiomics analysis pipelines based on PET images, the PET+C+e+Ⅳ pipeline had the highest accuracy, 0.888 (95% CI: 0.822−0.954), the highest AUC, 0.962 (95% CI: 0.924−1.000), and the highest AVE, 0.899; the PET+C+e+Ⅰ pipeline had the highest F1 score, 0.874 (95% CI: 0.804−0.945).
Conclusions Among the radiomics analysis pipelines based on CT images, those built with LASSO (g) feature selection and a decision tree (Ⅱ) model showed better predictive performance. Among the pipelines based on PET images, those built with Z-score scaling (C) and embedded feature selection based on random forest (e) performed better.