Study on the construction of a prediction model for radiodermatitis associated with breast cancer radiotherapy based on the FP-Growth algorithm

    Abstract:
    Objective To construct a prediction model for radiodermatitis (RD) associated with breast cancer radiotherapy based on the frequent pattern growth (FP-Growth) algorithm, and to evaluate its predictive performance for acute RD and late RD, providing a reference for clinical RD risk warning.
    Methods The clinical data of 1000 female patients (aged 50.6 ± 18.4 years) who received radiotherapy after modified radical mastectomy for breast cancer at the First Affiliated Hospital of Hebei North University from January 2010 to January 2024 were retrospectively analyzed. The patients were divided into a modeling group and a validation group by simple random sampling. The FP-Growth algorithm was used to perform association rule analysis on the baseline data of the modeling group, and prediction models for acute RD and late RD were constructed. The models were internally validated using the concordance index (C-index) and calibration curves, and externally validated using the area under the receiver operating characteristic (ROC) curve (AUC). Count data were compared between groups using the chi-square test, and ordinal data were compared using the Wilcoxon rank-sum test. The DeLong test was used to compare the AUC between the modeling and validation groups, and the Hosmer-Lemeshow test was used to evaluate the goodness of fit of the prediction models.
    Results The modeling and validation groups comprised 512 and 488 patients, respectively. The RD prediction model ultimately retained two effective strong association rules. (1) The rule combination for acute RD was body mass index (≥35 kg/m²), tumor size (4–5 cm), history of chemotherapy, and albumin level (<35 g/L), with an acute RD incidence of 79%. (2) The rule combination for late RD was age (60–69 years), Karnofsky Performance Status score (<70 points), history of chemotherapy, history of diabetes, and albumin level (<35 g/L), with a late RD incidence of 69%. Internal validation showed that the C-index of the model in the modeling group was 0.863 (95% CI: 0.646–0.932, P=0.011) for acute RD and 0.812 (95% CI: 0.669–0.892, P=0.023) for late RD. The calibration curves showed good agreement between the predicted and observed probabilities. External validation showed that the AUC for predicting acute RD was 0.882 in the modeling group and 0.876 in the validation group, with no statistically significant difference (Z=0.334, P=0.205). The AUC for predicting late RD was 0.673 in the modeling group and 0.668 in the validation group, with no statistically significant difference (Z=0.982, P=0.092). The Hosmer-Lemeshow test showed that the prediction models for acute RD and late RD were both well fitted (χ²=4.921 and 5.039; P=0.125 and 0.327, respectively).
    Conclusion The predictive performance of the FP-Growth-based RD prediction model for both acute RD and late RD can meet clinical requirements, and the model can serve as a reference for risk warning and clinical intervention for RD associated with breast cancer radiotherapy.
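The abstract does not describe how the association rules were mined, so the following is only a minimal sketch of the general technique, not the authors' implementation. It builds an FP-tree, mines frequent itemsets by recursing over conditional pattern bases, and then keeps rules of the form "factor combination -> acute_RD" whose confidence passes a threshold. The eight patient records are invented toy data, the item labels only borrow factor names from the abstract, and `mine_rules`, `min_count`, and `min_confidence` are hypothetical names and thresholds chosen for illustration. The resulting numbers do not reproduce the reported 79% or 69%.

```python
from collections import defaultdict


class Node:
    """One FP-tree node: an item, its parent, and how many transactions pass through it."""

    def __init__(self, item, parent):
        self.item, self.parent = item, parent
        self.count = 0
        self.children = {}


def build_tree(weighted, min_count):
    """Build an FP-tree from (items, weight) pairs; return (header table, item counts)."""
    freq = defaultdict(int)
    for items, weight in weighted:
        for item in items:
            freq[item] += weight
    freq = {item: c for item, c in freq.items() if c >= min_count}

    root = Node(None, None)
    header = defaultdict(list)  # item -> every tree node carrying that item
    for items, weight in weighted:
        # Keep frequent items only, most frequent first (ties broken by name).
        ordered = sorted((i for i in items if i in freq), key=lambda i: (-freq[i], i))
        node = root
        for item in ordered:
            child = node.children.get(item)
            if child is None:
                child = Node(item, node)
                node.children[item] = child
                header[item].append(child)
            child.count += weight
            node = child
    return header, freq


def fp_growth(weighted, min_count, suffix=frozenset()):
    """Yield (itemset, count) for every itemset found in at least min_count transactions."""
    header, freq = build_tree(weighted, min_count)
    for item, nodes in header.items():
        itemset = suffix | {item}
        yield itemset, freq[item]
        # Conditional pattern base: the prefix path above each node of `item`.
        base = []
        for node in nodes:
            path, parent = [], node.parent
            while parent.item is not None:
                path.append(parent.item)
                parent = parent.parent
            if path:
                base.append((path, node.count))
        yield from fp_growth(base, min_count, itemset)


def mine_rules(transactions, outcome, min_count, min_confidence):
    """Return (antecedent, support, confidence) for rules of the form antecedent -> outcome."""
    n = len(transactions)
    counts = dict(fp_growth([(frozenset(t), 1) for t in transactions], min_count))
    rules = []
    for itemset, count in counts.items():
        if outcome in itemset and len(itemset) > 1:
            antecedent = itemset - {outcome}
            confidence = count / counts[antecedent]
            if confidence >= min_confidence:
                rules.append((antecedent, count / n, confidence))
    return sorted(rules, key=lambda r: (-r[2], -r[1], sorted(r[0])))


# Invented toy records (NOT study data): each set lists one patient's binarized factors.
TOY_PATIENTS = [
    {"BMI>=35", "chemo", "albumin<35", "acute_RD"},
    {"BMI>=35", "chemo", "albumin<35", "acute_RD"},
    {"BMI>=35", "chemo", "albumin<35"},
    {"chemo", "acute_RD"},
    {"chemo"},
    {"albumin<35"},
    {"BMI>=35", "chemo", "albumin<35", "acute_RD"},
    {"chemo", "albumin<35"},
]

for antecedent, support, confidence in mine_rules(TOY_PATIENTS, "acute_RD", 2, 0.7):
    print(sorted(antecedent), f"support={support:.3f}", f"confidence={confidence:.3f}")
```

In this framing, the confidence of a rule is the observed outcome rate among patients matching the whole factor combination, which is one plausible reading of the per-rule "incidence" figures in the Results. Continuous variables such as age or albumin would need to be binned into categories before mining, as the reported rule items suggest.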

     
