Machine learning to improve prognosis prediction of metastatic clear-cell renal cell carcinoma treated with cytoreductive nephrectomy and systemic therapy

Cytoreductive nephrectomy (CN) combined with systemic therapy is commonly used to treat metastatic clear-cell renal cell carcinoma (mccRCC). However, prognostic models for these patients are limited. In the present study, the clinical data of 782 mccRCC patients who received both CN and systemic therapy were obtained from the Surveillance, Epidemiology, and End Results (SEER) database (2010–2016), and patients were divided into training and internal test cohorts. A total of 144 patients from our center (Peking Union Medical College Hospital) who met the same criteria formed the external test cohort. Cancer-specific survival (CSS) at 1, 3, and 5 years was set as the research outcome. Four machine learning (ML) models, i.e., a gradient boosting machine (GBM), support vector machine (SVM), random forest (RF), and logistic regression (LR), were established. Fifteen potential independent features were included in this study. Model performance was evaluated using the area under the receiver operating characteristic curve (AUC), calibration plots, and decision curve analysis (DCA). Seven clinical features, namely, pathological grade, T stage, N stage, number of metastatic sites, brain or liver metastases, and metastasectomy, were selected for subsequent analysis via the recursive feature elimination (RFE) algorithm. The GBM model performed best at 1-, 3-, and 5-year CSS prediction (AUC: 0.836, 0.819, and 0.808, respectively, in the internal test cohort and 0.819, 0.805, and 0.786, respectively, in the external cohort). Furthermore, we divided the patients into three strata (high-, intermediate-, and low-risk) via X-tile analysis and concluded that clinically individualized treatment can be aided by these practical prognostic models.


Introduction
Renal cell carcinoma (RCC) is one of the most common genitourinary cancers, with an increasing incidence and morbidity rate worldwide. Clear-cell RCC (ccRCC) remains the most prevalent histological subtype of renal cancer (accounting for 80%-85%). Although an ever-growing number of renal cancers are detected at an early stage, 25%-30% of patients already have metastases at the time of diagnosis, and over 20% of patients will develop metastases after curative surgery, which contributes to a poor prognosis [1][2][3].
Standard treatment strategies for metastatic ccRCC (mccRCC) have progressed substantially over the past decades. The majority of studies have shown that cytoreductive nephrectomy (CN), also termed radical nephrectomy for primary lesions, provides a significant survival benefit for patients with mccRCC. Theoretically, removing the primary lesion can effectively reduce the tumor burden and create favorable conditions for subsequent systemic therapy [4][5][6][7]. In the era when cytokine therapy was the only systemic therapy option for patients with mccRCC, a combined regimen of CN and interferon-based adjunctive therapy was shown to be more effective than interferon therapy alone [8,9]. In the targeted therapy era, several rigorous randomized clinical trials have also demonstrated the promising effectiveness of CN combined with systemic therapy in patients with mccRCC. Owing to its clear advantages in relieving local symptoms, such as chronic pain and hematuria, a substantial number of mccRCC patients have received both CN and systemic therapy in the real world [10,11].
Building a prognostic prediction model is an effective method for identifying the patients who will benefit the most from these treatments. However, a model for predicting the survival of mccRCC patients treated with CN and systemic therapy is still lacking.
As an important subfield of artificial intelligence, machine learning (ML) draws on multiple disciplines and offers advantages over traditional statistical tools. Mounting evidence has revealed that ML models can provide a more accurate prognosis prediction by comprehensively integrating and analyzing the complex connections between clinical features and outcomes [12][13][14]. For example, one of the most prevalent ML algorithms, the gradient boosting machine (GBM), has shown excellent performance in terms of speed and accuracy in both classification and regression models [15]. Nevertheless, the benefit of an ML model in predicting the prognosis of patients with mccRCC receiving both CN and systemic therapy has yet to be fully explored.
In this study, we developed several clinical ML models to predict cancer-specific survival (CSS) for patients in this cohort using data available from the Surveillance, Epidemiology, and End Results (SEER) database. Although the specific drug regimens were not available in the SEER database, considering the widespread use of targeted drugs since 2005, systemic therapy in this study mainly refers to anti-angiogenic therapies and mammalian target of rapamycin (mTOR) inhibitors [16]. Furthermore, we used an external test cohort from the Peking Union Medical College Hospital (PUMCH) to test the validity of the models we developed.

Study population
Data concerning mccRCC patients treated with CN combined with systemic therapy were retrospectively extracted from two sources: (I) the SEER database between 2010 and 2016 (https://seer.cancer.gov/) and (II) the PUMCH medical records between 2008 and 2018. The inclusion criteria were as follows: (I) age ≥18 years; (II) confirmation of ccRCC by histology; (III) distant metastases based on the 8th American Joint Committee on Cancer staging system; (IV) CN treatment; and (V) a history of receiving systemic therapy.
The exclusion criteria were as follows: (I) incomplete information, including unknown age, sex, race, laterality, pathological grade, T stage, N stage, tumor size, metastatic organ sites (bone, brain, liver, and lung), number of metastatic sites, metastasectomy, or radiotherapy; (II) diagnosis of malignant tumors other than ccRCC. The study population selection process is illustrated in Figure 1.
Research involving human participants was reviewed and approved by Peking Union Medical College's Ethical Committee and Institutional Review Board. All patients signed an informed consent form.

Outcome and data collection
Considering the relatively high mortality rates of mccRCC patients, we selected CSS as the primary endpoint in this study to avoid discrepancies in deaths. CSS is the interval between the date of treatment and the date of death caused by the tumor. Deaths caused by any factors unrelated to cancer or intervention were identified as non-cancer-specific and censored at the date of death. Patients in the external test (PUMCH) cohort underwent regular physical examinations, laboratory tests, urological ultrasound scans, bone scintigraphy, enhanced computerized tomography (CT) or magnetic resonance imaging every six months. The follow-up was terminated on April 30, 2021.
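The censoring rule described above can be illustrated with a minimal sketch. The status labels, the `Patient` record, and the toy cohort are hypothetical, and the Kaplan-Meier estimator is a standard survival estimator used here only for illustration; the text does not state how the survival curves in this study were computed.

```python
from dataclasses import dataclass

@dataclass
class Patient:
    months: float  # follow-up time from treatment, in months
    status: str    # hypothetical labels: "alive", "cancer_death", "other_death"

def css_event(p):
    """Return (time, event); event is 1 only for a cancer-specific death.
    Non-cancer deaths and living patients are treated as censored (event 0)."""
    return p.months, 1 if p.status == "cancer_death" else 0

def kaplan_meier(pairs):
    """Kaplan-Meier estimate as a list of (time, survival) at each event time."""
    at_risk = len(pairs)
    surv = 1.0
    curve = []
    for t in sorted({time for time, _ in pairs}):
        deaths = sum(1 for time, e in pairs if time == t and e == 1)
        leaving = sum(1 for time, _ in pairs if time == t)
        if deaths:
            surv *= 1 - deaths / at_risk
            curve.append((t, surv))
        at_risk -= leaving  # events and censored patients both leave the risk set
    return curve

# Toy cohort (made-up values, not study data)
cohort = [
    Patient(6, "cancer_death"),
    Patient(10, "other_death"),  # censored at the date of death
    Patient(12, "cancer_death"),
    Patient(20, "alive"),        # censored at last follow-up
]
pairs = [css_event(p) for p in cohort]
curve = kaplan_meier(pairs)
```

In this toy cohort the non-cancer death at 10 months does not lower the estimated CSS; it only shrinks the number of patients at risk for later time points.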

ML model establishment and performance evaluation
We employed a recursive feature elimination (RFE) algorithm to select important features for model building. In short, RFE starts with a model that covers all features and gradually removes the features that have the least impact on model performance until the retained features exceed a set performance threshold. The feature subset with the highest accuracy is then selected as the optimal feature combination.
Four ML algorithms, i.e., the GBM, support vector machine (SVM), random forest (RF), and logistic regression (LR) algorithms, were employed to construct ML models. To determine the optimal hyperparameters, all the models were trained using 10-fold cross-validation in the training cohort. The data were split into ten parts; in each round, one part served as the validation set and the other nine parts were used for training. This process was repeated ten times so that each part was used for validation once, and the accuracy averaged over the ten rounds was taken as the final accuracy.
Receiver operating characteristic (ROC) curves were employed to estimate the accuracy of the ML models by calculating the area under the curve (AUC). The ROC curve is calculated over all possible cut points (thresholds) and shows the trade-off between sensitivity and specificity, thus providing a dynamic and objective summary of the model's performance. Higher AUC values indicate better accuracy of the predictive model. The model fit was evaluated using calibration plots. The decision curve analysis (DCA) method was used to visualize the net benefits and usefulness of the prediction models.
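As an illustration of the 10-fold cross-validation and AUC described above, the following is a minimal standard-library sketch. The function names, the fixed random seed, and the toy labels and scores are assumptions for illustration; the study itself was performed in R, and this is not the authors' code.

```python
import random

def k_fold_indices(n, k=10, seed=0):
    """Yield (train, validation) index lists; each sample is validated exactly once."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]  # k roughly equal parts
    for i in range(k):
        val = folds[i]
        train = [j for f in folds[:i] + folds[i + 1:] for j in f]
        yield train, val

def auc(labels, scores):
    """AUC as the probability that a random positive case scores higher
    than a random negative case (ties count as one half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# 550 is the training cohort size reported in the Results
splits = list(k_fold_indices(550, k=10))

# Toy labels and predicted scores (made-up values)
toy_auc = auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
```

In practice a model would be fitted on each `train` list and scored on the matching `val` list, and the ten validation scores would be averaged.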

Statistical analysis
To estimate the differences between groups, categorical variables were expressed as numbers and percentages, and comparisons were made using Chi-square tests or Fisher's exact tests. The patients were divided into three risk groups: low, intermediate, and high, using the X-tile software (version 3.6.1). All statistical analyses in this study were performed using R software. Statistical significance was defined as P < 0.05.

Baseline characteristics
This study examined clinical data from 782 patients with mccRCC treated with CN combined with systemic therapy from the SEER database. Overall, 70% (n = 550) of the patients were randomly assigned to the training cohort, while the remaining 30% (n = 232) were assigned to the internal test cohort. The median follow-up was 25 (17-31) months in the SEER database. A total of 144 patients were included in the external test (PUMCH) cohort, and the median follow-up for this cohort was 37 (24-52) months. In the SEER cohort, 567 patients (72.5%) had single organ metastases (lung, bone, brain, or liver), 184 (23.5%) had double organ metastases, and 31 (4.0%) had three or more organ metastases. In the PUMCH cohort, 110 patients (76.4%) had single organ metastases, 30 (20.8%) had double organ metastases, and 4 (2.8%) had three or more organ metastases. The lungs were the most common site of metastases at a total of 589 SEER cases (75.3%) and 119 PUMCH cases (82.6%). This was followed by 283 SEER (36.2%) and 43 PUMCH cases (29.9%) of bone metastases, 83 SEER (10.6%) and 13 PUMCH (9.0%) cases of liver metastases, and 74 SEER (9.5%) and 8 PUMCH (5.5%) cases of brain metastases. In addition, 154 SEER (19.7%) and 35 PUMCH (24.3%) patients underwent metastasectomy. Other characteristics of the clinical population and demographics are summarized in Table 1.

Feature selection
Incorporating redundant features may degrade the performance of an ML model [17]. The RFE algorithm was employed as a feature selection method to identify the optimal feature subset among all features. After RFE screening, seven important features, namely, pathological grade, T stage, N stage, the number of metastatic sites, brain or liver metastases, and metastasectomy, were determined. These features were then included in all our ML models in both the training and testing cohorts (Figure S1).
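The elimination loop can be sketched as follows, under stated assumptions. This sketch follows the performance-based description given in the Methods (drop the feature whose removal hurts the score least, then keep the best-scoring subset), whereas library implementations of RFE typically rank features by model-derived importance. The scoring function and its per-feature gains are purely hypothetical stand-ins for a cross-validated accuracy and are not study results.

```python
def backward_elimination(features, score_fn):
    """Repeatedly drop the feature whose removal leaves the highest score,
    and return the best-scoring subset seen along the way."""
    current = list(features)
    best_subset, best_score = list(current), score_fn(current)
    while len(current) > 1:
        # One candidate subset per feature that could be dropped
        candidates = [[f for f in current if f != drop] for drop in current]
        current = max(candidates, key=score_fn)
        score = score_fn(current)
        if score >= best_score:  # on ties, prefer the smaller subset
            best_subset, best_score = list(current), score
    return best_subset, best_score

# Hypothetical per-feature gains in accuracy (per mille); invented for illustration
TOY_GAIN = {"grade": 50, "t_stage": 40, "n_stage": 30, "age": -10, "laterality": -20}

def toy_score(subset):
    """Stand-in for cross-validated accuracy of a model trained on `subset`."""
    return 600 + sum(TOY_GAIN[f] for f in subset)

selected, score = backward_elimination(list(TOY_GAIN), toy_score)
```

With these made-up gains, the two features that lower the score are removed first, and the loop returns the three remaining features as the optimal subset.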

ML models accurately predicted patient prognosis
We established prognosis prediction models for mccRCC patients treated with CN and systemic therapy using four ML algorithms (GBM, SVM, RF, and LR). To evaluate the discriminatory abilities of these models, ROC curves for 1-, 3-, and 5-year CSS were constructed. In the training cohort, the AUC values of these ML models for the prediction of 1-

Clinical value of the ML models
DCA is a novel method for visualizing whether the use of a prediction model in clinical practice will benefit decision-making. The threshold probability (as a percentage) is displayed on the X-axis, and the net benefit is indicated on the Y-axis [18,19]. In our study, we hypothesized that patients whose predicted probability exceeds a set threshold would benefit from CN combined with systemic therapy. DCA indicated that all ML models achieved a net benefit. The GBM model had a higher net benefit in the majority of the cohort subgroups, indicating greater clinical utility (Figure 4). Finally, the varying importance of the features for predicting CSS in each ML model is shown in Figure S2.
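The net benefit plotted on the Y-axis of a decision curve is conventionally computed, at threshold probability p_t, as TP/n − (FP/n) × p_t/(1 − p_t). The sketch below assumes this standard definition, since the text does not spell out the formula; the labels and predicted probabilities are made-up toy values.

```python
def net_benefit(labels, probs, threshold):
    """Net benefit of acting on patients whose predicted probability >= threshold."""
    n = len(labels)
    tp = sum(1 for y, p in zip(labels, probs) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(labels, probs) if p >= threshold and y == 0)
    # False positives are weighted by the odds of the threshold probability
    return tp / n - fp / n * threshold / (1 - threshold)

def treat_all_net_benefit(labels, threshold):
    """Reference strategy: act on every patient regardless of the model."""
    prevalence = sum(labels) / len(labels)
    return prevalence - (1 - prevalence) * threshold / (1 - threshold)

# Toy data (not study data)
labels = [1, 1, 0, 0, 1, 0, 0, 0]
probs = [0.9, 0.7, 0.6, 0.2, 0.4, 0.1, 0.3, 0.05]

model_nb = net_benefit(labels, probs, threshold=0.5)
all_nb = treat_all_net_benefit(labels, threshold=0.5)
```

A decision curve repeats this calculation over a range of thresholds and compares the model with the treat-all and treat-none (net benefit of zero) strategies.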

Risk stratification
As GBM was the optimal ML model based on the performance evaluation above, we set two optimal cut-off values (−5.4 and −4.8) depending on the GBM prediction score and divided patients into high-, intermediate-, and low-risk groups via X-tile analysis (Figure S3). In the training cohort, the long-term survival ( stratifications also showed that the GBM model provided excellent prognostic stratification (Figure 5A-5C).
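Applying the two cut-off values to a prediction score can be sketched as below. The cut-offs are those reported in the text; the direction of the score (a higher score meaning higher risk) and the handling of scores that fall exactly on a cut-off are assumptions, and the example scores are invented.

```python
from collections import Counter

LOW_CUT, HIGH_CUT = -5.4, -4.8  # cut-off values reported in the text

def risk_group(score):
    """Map a GBM prediction score to a risk stratum.
    Assumption: higher scores indicate higher risk; boundary handling is a guess."""
    if score < LOW_CUT:
        return "low"
    if score < HIGH_CUT:
        return "intermediate"
    return "high"

# Hypothetical scores, for illustration only
scores = [-6.0, -5.5, -5.0, -4.9, -4.0]
groups = [risk_group(s) for s in scores]
counts = Counter(groups)
```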

Discussion
At present, positive associations between cytoreductive resection of primary tumors and superior overall survival rates have been demonstrated in a variety of solid metastatic tumors, such as advanced-stage endometrial cancer, cohesive gastric cancer, and ovarian cancer [20][21][22]. As for mccRCC, a significant survival benefit associated with CN and systemic therapy was found in a study by Chakiryan et al., who analyzed 5005 mccRCC patients using the National Cancer Database (NCDB) [23][24][25]. However, given the potential surgical complications and toxic effects of systemic therapy, a corresponding prognosis prediction model needs to be developed to improve patient selection and outcome prediction. Several prognostic prediction models have been used for mccRCC patients, such as the International Metastatic Renal Cell Carcinoma Database Consortium model and the Memorial Sloan-Kettering Cancer Center model. However, models that specialize in predicting the prognosis of mccRCC patients receiving both CN and systemic therapy, especially ML models, have yet to be developed [26,27].
Therefore, we aimed to develop a practical survival prediction model to accurately predict the individualized survival of mccRCC patients using ML algorithms. In this study, we established four ML models to predict the prognosis of these patients. Among them, GBM was the best model in terms of accuracy, fitness, and clinical application. To our knowledge, this is the first study to apply ML algorithms to predict mccRCC patient survival in such a large patient cohort.
One of the most significant advantages of ML models over traditional predictive models is their ability to analyze feature importance and provide optimal feature subsets without requiring manual processing, resulting in models with high accuracy and stability. To date, multiple ML models have been developed and validated for predicting the prognosis of ccRCC patients based on medical imaging, gene expression data, or clinical information. For instance, Nazari et al. constructed a radiomics-based predictor using several ML algorithms to analyze CT images, which could accurately predict the 5-year survival of ccRCC patients [28]. However, a specific ML model for predicting the prognosis of patients with mccRCC remains undeveloped.
In our study, the clinical data of over 900 mccRCC patients treated with CN and systemic therapy were included. We then constructed four ML models (GBM, SVM, RF, and LR) for 1-, 3-, and 5-year CSS prediction. Model performance was evaluated using ROC curves, calibration plots, and DCA. In the training, internal validation, and external validation cohorts, the GBM model demonstrated the highest level of prediction accuracy and more favorable correlations. Furthermore, according to the DCA results, the GBM model can effectively assess the advantages and disadvantages of clinical decisions. In clinical research, GBM models are gaining increasing traction. We are the first to apply the GBM model to mccRCC survival prediction.
Selecting the most effective features from the original variables to reduce the dimensionality of the datasets is a key step in improving the performance of an ML model [17]. Using RFE methods, seven features, including pathological grade, T stage, N stage, number of metastatic sites, brain or liver metastases, and metastasectomy, were selected, and the optimal feature subset was chosen for further analysis. In the next step, the relative importance of each input feature was ranked using ML models. Despite the slight differences in feature importance ranking, histological grade, tumor stage, lymph node stage, number of metastatic sites, and metastasectomy were ranked in the top five in all ML models. These results revealed that mccRCC patients with lower histological grade, earlier tumor stage, no indication of lymph node metastases, fewer metastatic sites, and who received metastasectomy may have better prognoses after treatment with CN and systemic therapy.
Previous studies have also indicated that radiotherapy is significantly associated with better prognosis in patients with mccRCC [29,30]. However, in this study, radiotherapy was not selected by the RFE algorithm for the optimal feature subset. Further studies on the influence of radiotherapy on the prognosis of patients with mccRCC treated with CN and systemic therapy are needed.
There have been dramatic changes in the treatment of mccRCC over the past few decades. The role of CN has become increasingly unclear with the advent of new treatment options. According to the results of the CARMENA and SURTIME trials, sunitinib alone did not prove inferior to treatment that included CN. However, all of the above studies were deemed to be underpowered and were limited by an insufficient number of subjects, slow accrual, and a lack of homogeneity in patient selection. Given these limitations, it is important to select patients appropriately and to conduct prospective studies that provide high-level evidence [31].
After the era of cytokine and targeted therapies, mccRCC treatment has gradually entered the era of immunotherapy. Nirmish et al. compared the prognosis of mccRCC patients treated with immune checkpoint inhibitors (ICIs) alone or CN combined with ICIs. Based on the NCDB datasets, the results showed that patients receiving CN plus immunotherapy had longer overall survival (OS) than those receiving immunotherapy alone [32]. Thus, despite the uncertainty of efficacy, CN remains an important treatment option, especially for alleviating hematuria or pain. However, optimal candidates for CN need to be carefully screened. Recent perspectives support performing CN in patients whose affected kidney is still in place and whose disease is of favorable or intermediate risk [33].
In this study, the GBM model identified approximately 15% of patients with mccRCC in the high-risk group who experienced an extremely poor 5-year survival rate after receiving CN and systemic therapy. Conversely, more than half of the mccRCC patients were classified as low-risk patients. The low-risk subset displayed relatively satisfactory long-term CSS, indicating that CN combined with systemic therapy is particularly suitable for this population. With the above risk stratification, overtreatment can be largely avoided.
The present study has several limitations. First, because the training set data used for building the models were from the SEER database, the treatment regimen and information on systemic therapy were not available. The latest studies have reported that deferred CN may be more beneficial than upfront CN for patients who respond favorably to systemic therapy [6,34,35]. Second, the SEER database lacks data on some vital clinical characteristics, such as underlying diseases, surgical complications, and biochemical indicators, which may influence the accuracy of the model prediction. Third, there is inter-group heterogeneity between the external validation cohort and the data from the SEER database, which may be related to the relatively small amount of data in the external validation cohort. Therefore, a multi-center study must be conducted to further validate the model performance.
Despite these limitations, these ML models, built on a large population database and validated with data from an external cohort, provide the first targeted and practical survival prediction tools for patients with mccRCC receiving both CN and systemic therapy and have a high potential for use in clinical practice.

Conclusion
We developed and validated four ML models based on significant clinicopathological characteristics for predicting CSS in patients with mccRCC treated with CN and systemic therapy. These models may not only be used in CSS prediction, patient risk stratification, and clinical decision making but may also encourage further research on the use of ML algorithms to improve personalized prognostic prediction.

Conflicts of interest:
The authors declare no conflicts of interest.