Development and validation of a SEER-based prognostic nomogram for cervical cancer patients below the age of 45 years

In this study, we established a nomogram for the prognostic prediction of patients with early-onset cervical cancer (EOCC) for both overall survival (OS) and cancer-specific survival (CSS). The Surveillance, Epidemiology, and End Results (SEER) database was used to identify 10,079 patients diagnosed with EOCC between 2004 and 2015; these cases were then randomly divided into training and validation sets. The independent prognostic factors were identified in a retrospective study of 7,055 patients from the training set. A prognostic nomogram was developed using R software according to the results of multivariable Cox regression analysis. Furthermore, the model was externally validated using the data from the remaining 3,024 patients diagnosed at different times and enrolled in the SEER database. For the training set, the C-indexes for OS and CSS prediction were determined to be 0.831 (95 % confidence interval [CI]: 0.815–0.847) and 0.855 (95 % CI: 0.839–0.871), respectively. Receiver operating characteristic (ROC) analysis has revealed that the nomograms were a superior predictor compared with TNM stage and SEER stage. The areas under the curve (AUC) of the nomogram for OS and CSS prediction in the ROC analysis were 0.855 (95 % CI: 0.847–0.864) and 0.782 (95 % CI: 0.760–0.804), respectively. In addition, calibration curves indicated a perfect agreement between the nomogram-predicted and the actual 1-, 3-, and 5-year OS and CSS rates in the validation cohort. Thus, in this study, we established and validated a prognostic nomogram that provides an accurate prediction for 3-, 5-, and 10-year OS and CSS of EOCC patients. This will be useful for clinicians in guiding counseling and clinical trial design for cervical cancer patients.


INTRODUCTION
Cervical cancer remains the fourth most common female malignancy in the world [1,2], ranking second as the leading cause of cancer-related deaths in young women aged 20 to 39 years in 2020 [1]. The global incidence of cervical cancer is approximately 500,000 cases annually [3], and the number of new cases and deaths in China accounts for more than onefourth that of the entire world [4]. At present, surgery, radiotherapy, chemotherapy, and immunotherapy serve as the primary treatments for this type of cancer [5]. Clinically, patients younger than 45 years old are defined as having early-onset cervical cancer. Although significant progress has been made in surgery, radiotherapy, and chemotherapy for treating cervical cancer, there remain significant differences in clinical prognosis, especially with elderly patients. Therefore, accurate prognostic markers and improved individualized treatment are needed.
At present, the tumor lymph node metastasis (TNM) staging system proposed by the American Joint Commission on Cancer (AJCC) has been widely used to predict the prognosis of various cancers. This system considers tumor invasion (T), regional lymph node (N), and distant metastasis (M) as predictors [6]. However, prognosis based on the TNM staging system remains limited and does not accurately predict prognosis. To establish an individualized treatment plan, it is necessary to consider all the risk factors related to cancer, especially for the treatment of EOCC patients.
In recent years, nomograms based on the regression coefficient of each variable integrate multiple prognostic factors and may better predict survival rate [7]. It has been used to predict the prognosis of various cancers such as gastric cancer [8] and breast cancer [9]. As a prognostic tool, nomograms can accurately predict the overall survival rate (OS) and cancer-specific survival rate (CSS) of patients, which is based on multiple clinical variables included in the calculation. In this study, we established nomograms to predict the 3-, 5-, and 10-year OS and CSS of EOCC patients, which may be deemed useful for establishing individualized treatments and improving patient outcome. www.bjbms.org unknown), radiotherapy (no or yes), and chemotherapy (no or yes). Tumor grades I-IV were categorized as well differentiated, moderately differentiated, poorly differentiated, and undifferentiated. OS time refers to the survival time of the patient from diagnosis to any cause of death or the date at which data were deleted. The CSS time refers to the cancer-related survival time from diagnosis to death, excluding other factors. The study end point was survival (OS and CSS).

Ethical statement
Since the clinical data in this study were collected from a publicly available database, there were no local or state ethical issues. In addition, because this retrospective study was based on public data from the SEER database, informed consent was not required.

Statistical analysis
Kaplan-Meier curve and logrank tests were used to examine the OS and CSS of the EOCC patients. The objective was to identify the predictive clinical factors for OS and CSS in patients with EOCC by univariate and multivariate regression analyses. The Cox proportional hazard results were used as the basis of nomogram construction and validation. R software version 3.5.1 (http://www.R-project.org) was used for creating nomograms. A consistency index (C-index) and calibration curve were used to evaluate the performance and accuracy of the nomogram. The C-index value ranged from 0.50 to 1.00 and was positively correlated with the prediction performance of the model. This shows that the model is accompanied with a perfect discrimination ability when the value is 1.00. When the calibration curve is applied to a fully calibrated model, the prediction will fall on the 45° diagonal in the figure.
In addition, receiver operating characteristic (ROC) and curves were used to evaluate the predictive performance of nanograms, TNM stage and SEER stage. Statistical analyses were conducted using Statistical Package for the Social Sciences software (version 20.0; SPSS Inc, Chicago, IL, USA). When p < 0.05, the results were statistically significant.
In addition, receiver operating characteristic (ROC) curves were used to evaluate the predictive performance of nomograms, TNM stage, and SEER stage. Statistical analyses were conducted using the Statistical Package for the Social Sciences software (version 20.0; SPSS Inc., Chicago, IL, USA). P-values < 0.05 were considered statistically significant.

Patient baseline characteristics
In total, 10,079 eligible patients, who were diagnosed as having EOCC from 2004 to 2015, were identified in the SEER

Data source and patients
From 2004 to 2015, we adopted SEER * stat software [version 8.3.5; SEER 18 Regs Custom Data (including additional treatment fields), November 2018 sub (1975-2016 varying) database] in order to identify 10,079 eligible patients who were diagnosed with EOCC from the SEER database of the National Cancer Institute, which includes clinicopathological and individualized prognosis data. The exclusion criteria were as follows: (I) patients over 45 years old; (II) patients with multiple primary tumors; (III) unknown survival time; (IV) non-histological studies; (V) unknown AJCC stage; (VI) unknown TNM stage; and (VII) patients with no surgery. The screening scheme for the subjects is provided in Figure 1. All eligible EOCC patients were randomly assigned to a training and validation set.

Study variables
Clinical variables extracted from the SEER database included age at diagnosis, race, marital status, histological type of origin, primary tumor site, histologic type, tumor academic stage, NM stage, SEER stage, tumor size (cm), chemotherapy, and radiotherapy. The EOCC patients were then divided into three groups based on age (<37, 37-40, and >40; Fig. S1) according to the optimal cut-off value calculated by X-tile software (version 3.6.1, Yale University School of Medicine, USA). The clinical characteristics included race (white, black, and others) and marital status (married, unmarried, unknown). The tumor variables included the histological type (squamous cell carcinoma, adenocarcinoma, and others), SEER stage (localized, regional, and distant), tumor size (cm) (≤4 cm, >4 cm, and www.bjbms.org on the "Total points" scale, thus providing each patient with 3-, 5-, and 10-year OS and CSS probabilities.

Validation and calibration of the nomogram for OS and CSS
The time-dependent ROC curves for OS and CSS were used to evaluate the prediction performance of the nomogram in different sets. An AUC value of 0.5 indicates that the nomogram has no predictive effect, whereas an AUC value of 1 indicates that the nomogram can completely distinguish patients with different survival rates. The higher the value between 0.5 and 1, the stronger the resolution of the nomogram. The area under the curve (AUC) values of the nomogram for OS ( Figure 3A) and CSS ( Figure 3B Next, the C-index was then used to verify the nomogram. Significant differences were noted in OS and CSS among the nomogram, TNM stage, and SEER stage (Table 4). In the training set, the C-index for OS predicted by the nomogram was 0.831 (95 % CI: 0.815-0.847), whereas the C-index for CSS was 0.855 (95 % CI: 0.839-0.871), which was higher compared with that of the TNM and SEER stages (P < 0.05). The same conclusion was drawn from the results of the validation dataset. The C-index for OS predicted by the nomogram was 0.832 (95 % CI: 0.807-0.857) and that of the CSS was 0.863 (95 % CI: 0.839-0.887) ( Table 4). We have also generated a calibration curve to compare the nomogram with a perfect curve. As per the results, the 3-, 5-, and 10-year OS (Figs. 4 A,B,C) and CSS (Figs.4 D,E,F) nomograms for the training set exhibited good consistency with the actual observation, and this consistency was also evident in the validation set (Fig. S3). The calibration curves were very close to a perfect curve. The above results indicate that the predicted values of the nomogram were in good agreement with the observed values in the training and verification sets.

DISCUSSION
Cervical cancer remains to be the fourth most common malignant tumor among women threatens the lives of most women across the globe [11]. In practice, there is a need to database; these cases were then randomly divided into a training set (n = 7,055) and a validation set (n = 3,024

Identification of independent prognostic factors of OS and CSS in training set
Univariate and multivariate Cox regression analyses were performed to identify the independent prognostic factors for OS and CSS. For the univariate analysis, these included age, race, marital status, histological type, grade, AJCC stage, T stage, M stage, N stage, SEER stage, tumor size (cm), chemotherapy, and radiotherapy as prognostic factors for OS and CSS. Meanwhile, for multivariate Cox analysis, four variables (age, marital status, N stage, and M stage) were excluded from the independent prognostic factors for OS ( Table 2). The multivariate analysis also indicated that race, histological type, grade, AJCC stage, T stage, SEER stage, tumor size (cm), chemotherapy, and radiotherapy were independent prognostic factors affecting the CSS of EOCC patients ( Table 3).

Development of a prognostic nomogram for OS and CSS
The prognostic nomograms were based on the multivariate Cox regression results. The prognostic nomogram for 3-, 5-, and 10-year OS and CSS (Figs. 2A and 2B) consisted of the following independent prognostic factors: race, histological type, grade, AJCC stage, T stage, SEER stage, tumor size (cm), chemotherapy, and radiotherapy. The length of the line corresponding to each variable in the nomogram represents the contribution of the predictors to survival outcome.
Each subtype of the variables that made up the nomogram corresponds to a point on the "Points" scale. We then calculated the total score of a specific EOCC patient by adding the scores of each subtype corresponding to each variable. Then, a straight line was drawn from the position of these total scores www.bjbms.org improve survival rates, accurately predict EOCC patient prognosis, and formulaworldwide [1,2], and its age of onset tends to be younger [10]. In addition, the mortality rate for cervical cancer ranks first among female malignant tumors, being a major disease that seriously te individualized treatment plans. Our goal is thus to develop a robust system to comprehensively consider multiple prognostic factors to accurately predict the survival time of EOCC patients.

www.bjbms.org
A nomogram is a graphical representation of a multivariable prognostic model that integrates multiple prognostic factors and can be used to accurately evaluate individual probabilities of survival at certain times. This present study focused on prognosis prediction for EOCC patients based on the construction of a nomogram. First, univariate    The ROC curve, DCA curve, and C-index were used to verify the clinical practicability and predictive performance of the nomogram, which revealed its superiority compared with TNM stage and SEER stage. At the same time, the prediction accuracy for OS and CSS at 3-, 5-, and 10-years was evaluated using a calibration curve, which was in good agreement with the actual observational results. In practice, the area under the curve (AUC) values of the nomogram for OS ( Figure 3A (Table 4). This confirms the good predictive ability of the nomogram. The results of the DCA curve also provided proof for the robust clinical value of the nomogram.
As a prognostic tool to graphically display clinical results, a nomogram can accurately predict the overall survival (OS) and cancer-specific survival (CSS) rates of patients, which is attributed to the multiple clinical variables included in its calculation. Recently, nomograms consisting of various clinical variables have been used to predict the prognosis of different CC patients [12][13][14][15]. Wang et al. [14] analyzed cervical cancer patient data recorded in the SEER database and established a nomogram for postoperative survival prediction. It provided patients with resected CC with an accurate individualized prediction of OS, thus assisting clinicians in decision-making. Similarly, Xie et al. [15] developed a nomogram for predicting the prognosis of cervical cancer patients aged 65 years or older and comprehensively analyzed the independent prognostic factors including race, marriage, histological types, www.bjbms.org grade, FIGO, regional lymph node, surgery, radiotherapy, and chemotherapy.
In this present study, clinical variables including race, histological type, grade, AJCC stage, T stage, SEER stage, tumor size (cm), chemotherapy, and radiotherapy were independent risk factors affecting the prognosis of EOCC patients. Other studies reported that age and race are risk factors for the prognosis of various cancers [12,13,14]. Genetic differences among different races are also a significant risk factor for tumor prognosis that has been widely recognized [14,15].
The grade, tumor size, and histological type of the tumor also significantly influence patient prognosis [16,17]. In this study, these results were supported by a statistical analysis. At present, TNM stage is the most common tumor staging system in the world. It is determined by laboratory and postoperative pathological examination. However, TNM stage has limitations and does not provide an individualized prognosis prediction for a patient. In addition to TNM stage, the prognosis of patients is closely related to a variety of clinical variables. Accurate prediction depends on the consideration of all independent risk factors. We thus have successfully established an effective nomogram based on the following factors: race, histological type, grade, AJCC stage, T stage, SEER stage, tumor size (cm), chemotherapy, and radiotherapy. It proved to be a better predictive algorithm compared with TNM stage and SEER stage. The establishment of this nomogram will be useful for designing individualized treatments for EOCC patients.
Our study had some limitations. First, the SEER database did not contain detailed information regarding chemotherapy, such as the use of targeted drugs, which are important in the prognosis of CC. Because of lacking information on living environment, lifestyle, adjuvant therapy, and commodities, it was not possible to consider all prognostic factors comprehensively, which was also an intrinsic limitation of SEER research. Secondly, we did not conduct an external validation to further assess this nomogram.

CONCLUSION
Our study is the first to construct a precise nomogram for predicting the 3-, 5-, and 10-year OS and CSS for EOCC patients, which exhibited a better predictive performance compared with TNM and SEER stages. This model may assist clinicians in designing personalized treatments for EOCC patients.