Development of a prediction model for distant metastasis in thyroid cancer to screen suitable patients for ¹⁸F-FDG PET/CT based on the SEER database

Tan Kun; Xi Hailin; Liu Yuepeng; Wang Xiuli; Li Rui

doi:10.3760/cma.j.cn121381-202502029-00582

Abstract:
Objective To develop a prediction model for distant metastasis in thyroid cancer using data from the Surveillance, Epidemiology, and End Results (SEER) database of the U.S. National Cancer Institute, giving clinicians a quantitative tool for precisely selecting patients suitable for ¹⁸F-fluorodeoxyglucose (FDG) PET/CT.
Methods This was a retrospective case-control study. Data on 90 934 patients with thyroid cancer recorded between 2000 and 2019 (69 716 females and 21 218 males; age range 1–84 years; median age 50 years) were downloaded from the SEER database. Using stratified random sampling, the SEER data were split into a training set and a validation set at a ratio of 3∶1. In the training set, two prediction models for distant metastasis were built with a logistic regression algorithm and a stacking ensemble algorithm (the logistic regression model and the stacking ensemble model, respectively). Internal validation was performed in the SEER training and validation sets, with the area under the receiver operating characteristic curve (AUROC), the area under the precision-recall curve (AUPRC), calibration curves, and the Brier score as the primary evaluation metrics. External testing used data from 235 patients with thyroid cancer in The Cancer Genome Atlas (TCGA) database and from 205 patients with thyroid cancer admitted to the Department of Thyroid Surgery, Xuzhou Central Hospital, between 2022 and 2024 (local data). The DeLong test was used to compare the AUROCs of the two models. Decision curve analysis (DCA) was used to determine the candidate decision threshold range for clinical application of the models. The permutation feature importance algorithm in the DALEX package for R 4.3.2 was used to interpret the models globally by quantifying each variable's contribution to the predictions. A web application (APP) was developed with the Shiny package in R 4.3.2.
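The abstract does not include the authors' code, and the study itself used R 4.3.2 with DALEX. As a rough orientation only, the following is a minimal Python sketch (standard library only) of three building blocks named in the Methods: a 3∶1 stratified random split, AUROC and Brier score, and permutation feature importance. All function names and the `Row` dictionary layout are hypothetical, and the choice of 1 − AUROC as the permutation loss with 10 repetitions is an assumption of this sketch, not a statement about the DALEX defaults used in the study.

```python
import random
from typing import Callable, Dict, List, Sequence, Tuple

Row = Dict[str, float]  # hypothetical: one patient, variable name -> encoded value


def stratified_split(labels: Sequence[int], train_frac: float = 0.75,
                     seed: int = 42) -> Tuple[List[int], List[int]]:
    """Split row indices 3:1 (by default) within each outcome class,
    so training and validation sets keep the same metastasis rate."""
    rng = random.Random(seed)
    train: List[int] = []
    valid: List[int] = []
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        cut = round(len(idx) * train_frac)
        train.extend(idx[:cut])
        valid.extend(idx[cut:])
    return sorted(train), sorted(valid)


def auroc(y: Sequence[int], p: Sequence[float]) -> float:
    """Rank-based (Mann-Whitney) AUROC; tied scores get average ranks."""
    order = sorted(range(len(p)), key=lambda i: p[i])
    ranks = [0.0] * len(p)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and p[order[j + 1]] == p[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1  # 1-based average rank of the tie block
        i = j + 1
    n_pos = sum(y)
    n_neg = len(y) - n_pos
    rank_sum = sum(r for r, t in zip(ranks, y) if t == 1)
    return (rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def brier(y: Sequence[int], p: Sequence[float]) -> float:
    """Mean squared difference between predicted risk and observed outcome (0/1)."""
    return sum((pi - yi) ** 2 for yi, pi in zip(y, p)) / len(y)


def permutation_importance(predict: Callable[[Row], float], rows: List[Row],
                           y: Sequence[int], feature: str,
                           n_rep: int = 10, seed: int = 0) -> float:
    """Average increase in loss (1 - AUROC) after shuffling one feature's column."""
    rng = random.Random(seed)
    base_loss = 1 - auroc(y, [predict(r) for r in rows])
    increases = []
    for _ in range(n_rep):
        col = [r[feature] for r in rows]
        rng.shuffle(col)
        shuffled = [{**r, feature: v} for r, v in zip(rows, col)]
        increases.append(1 - auroc(y, [predict(r) for r in shuffled]) - base_loss)
    return sum(increases) / n_rep
```

The rank-based AUROC runs in O(n log n), which matters at SEER scale, where an all-pairs comparison of positive and negative cases would be slow. A feature the model ignores gets an importance of exactly zero, while a feature that drives the predictions (in the study, AJCC T stage) shows a positive loss increase.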
Results In the training set, the logistic regression model and the stacking ensemble model had AUROCs of 0.894 and 0.907, respectively (Z=0.163, P=0.103), AUPRCs of 0.153 and 0.190, and Brier scores of 0.011 and 0.013. In the validation set, the AUROCs were 0.881 and 0.883 (Z=0.147, P=0.251), the AUPRCs 0.155 and 0.153, and the Brier score 0.010 for both models. In external testing on TCGA data, the AUROCs were 0.838 and 0.833 (Z=0.749, P=0.520), the AUPRCs 0.103 and 0.113, and the Brier scores 0.019 and 0.020. In external testing on local data, the AUROCs were 0.866 and 0.907 (Z=2.513, P=0.012), the AUPRCs 0.372 and 0.356, and the Brier scores 0.026 and 0.029. DCA indicated a candidate decision threshold range of 0.001 to 0.240. Permutation feature importance analysis showed that the American Joint Committee on Cancer (AJCC) T stage had the greatest influence on the predictions in both models. A multifunctional web-based APP was successfully developed.
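Two quantities in these results can be recomputed from their definitions. The sketch below (Python, standard library; function names hypothetical, not the authors' code) converts a DeLong Z statistic into a two-sided P value under the usual standard-normal reference distribution, and computes decision-curve net benefit at a risk threshold pt for the rule "refer for ¹⁸F-FDG PET/CT if predicted risk ≥ pt". How the 0.001 to 0.240 range was read off the decision curves is assumed here: a common convention is the span of thresholds where the model's net benefit exceeds both referring everyone and referring no one (net benefit 0).

```python
import math
from typing import Sequence


def two_sided_p(z: float) -> float:
    """Two-sided P value for a standard-normal statistic such as the DeLong Z:
    P = 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2))."""
    return math.erfc(abs(z) / math.sqrt(2))


def net_benefit(y: Sequence[int], p: Sequence[float], pt: float) -> float:
    """Net benefit of referring every patient whose predicted risk is >= pt:
    NB = TP/n - FP/n * pt / (1 - pt)."""
    n = len(y)
    tp = sum(1 for yi, pi in zip(y, p) if pi >= pt and yi == 1)
    fp = sum(1 for yi, pi in zip(y, p) if pi >= pt and yi == 0)
    return tp / n - fp / n * pt / (1 - pt)


def net_benefit_refer_all(y: Sequence[int], pt: float) -> float:
    """Reference strategy 'refer everyone'; 'refer no one' has NB = 0."""
    prevalence = sum(y) / len(y)
    return prevalence - (1 - prevalence) * pt / (1 - pt)


if __name__ == "__main__":
    print(round(two_sided_p(2.513), 3))  # prints 0.012
```

Using `math.erfc` directly avoids the precision loss of computing 1 − Φ(|z|) when |z| is large. The weight pt / (1 − pt) encodes how many false-positive referrals a clinician would accept per true positive at that threshold.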
Conclusions A prediction model for distant metastasis in thyroid cancer, intended to identify patients suitable for ¹⁸F-FDG PET/CT, was developed from the SEER database. Internal validation and external testing confirmed that the model's performance meets clinical requirements. The model is expected to improve the efficiency of selecting patients for ¹⁸F-FDG PET/CT and to support individualized diagnosis and treatment of patients with thyroid cancer.
