Incorporating ultrasound-based lymph node staging significantly improves the performance of a clinical nomogram for predicting preoperative axillary lymph node metastasis in breast cancer

Noninvasive, clinically applicable models for predicting axillary lymph node metastasis (ALNM) in breast cancer patients are lacking. We aimed to develop an efficient model to accurately predict ALNM. Three hundred fifty-five breast cancer patients were recruited and randomly divided into training and validation sets. Univariate and multivariate logistic regression analyses were applied to identify predictors of ALNM, and nomograms based on these variables were developed to predict ALNM. The performance of the nomograms was assessed using receiver operating characteristic curves and calibration curves, and decision curve analysis was performed to assess the clinical utility of the prediction models. The nomogram that included clinical N stage (cN), pathological grade (pathGrade), and hemoglobin accurately predicted ALNM in both the training and validation sets (area under the curve [AUC] 0.80 in each). We then explored the importance of cN and pathGrade in this integrated model by developing new nomograms from which each of the two variables was removed in turn. The nomogram without pathGrade (combine-pathGrade) still accurately predicted ALNM in the training and validation sets (AUC 0.78 in each), whereas the nomogram without cN (combine-cN) did not (AUC 0.64 and 0.60 in the training and validation sets, respectively). We describe a cN-based ALNM prediction model for breast cancer patients, presenting a novel, efficient clinical decision nomogram for predicting ALNM.


Introduction
Breast cancer is one of the most commonly diagnosed cancers worldwide, with over 2 million new cases in 2020 according to GLOBOCAN estimates [1]. Axillary lymph node (ALN) status is an important indicator for clinical staging in patients with breast cancer and one of the most crucial prognostic factors, and it therefore influences clinical decision making [2,3]. Axillary lymph node dissection (ALND) is the gold standard for evaluating axillary lymph node metastasis (ALNM). However, ALND is an invasive procedure that can cause operative complications [4,5]. Sentinel lymph node biopsy (SLNB) is the current standard method for ALN staging; it determines whether ALND should be performed and guides the surgeon's decisions on subsequent treatment [6,7]. Unfortunately, both ALND and SLNB are invasive and may lead to complications that can substantially reduce patients' quality of life [8,9]. Moreover, waiting for SLNB results during surgery inevitably prolongs the operation and reduces efficiency. Therefore, there is an urgent need for a noninvasive and efficient diagnostic tool for estimating ALNM preoperatively.
Traditional noninvasive methods for determining ALN status are mainly preoperative imaging examinations, such as ultrasound, computed tomography (CT), and magnetic resonance imaging (MRI). However, because of their low sensitivity, these approaches may miss some patients with ALNM [10,11]. Previous studies have identified several risk factors for ALNM in breast cancer, but tools for individual patient assessment are lacking [12,13]. With the advent of artificial intelligence, precision medicine for breast cancer has advanced considerably, and new noninvasive methods for evaluating ALNM have emerged [14]. Machine learning-based approaches for diagnosing ALNM in breast cancer patients show certain advantages for personalized prognosis [15]. Although several machine learning-based approaches have been used to predict ALN status, they have offered little value for clinical application [16]. In addition, some imaging procedures, such as MRI and CT, are too expensive for some patients to afford, so these methods are not suitable for all patients. Hence, an accurate, efficient, clinically applicable, and widely accessible diagnostic method for estimating ALNM preoperatively is needed.
In the current study, we aimed to build and validate a noninvasive model to accurately predict ALNM in breast cancer patients. We collected patients' demographic data and laboratory test results and extracted the clinical N stage (cN), which was assessed by the breast surgery multidisciplinary team. We then used clinical factors and cN to generate nomograms for the preoperative prediction of ALN status in breast cancer patients, with the aim of optimizing decision making for personalized cancer treatment.

Study design
After the inclusion and exclusion criteria were applied, the final cohort comprised 355 patients. We established and validated nomograms based on patients' preoperative clinical characteristics and pathological features to predict ALNM in breast cancer patients. Figure 1 shows the workflow of our study.

Study population
A total of 397 consecutive breast cancer patients treated at the Department of Breast Surgery at Xiangya Hospital (Changsha, Hunan Province, China) from January 2011 to December 2012 were retrospectively screened. The inclusion criteria were: 1) female adults (age >18 years); 2) newly diagnosed, histologically confirmed breast cancer; 3) ALND or SLNB performed to determine ALN status; 4) no distant metastasis at initial diagnosis; 5) no history of breast surgery or irradiation; and 6) no other concomitant malignancy. Forty-two patients were excluded because of incomplete medical records, leaving 355 patients in the final cohort. The outcome was ALN status determined at surgery.
The included patients were randomly divided into two sets in a 7:3 ratio, with 249 patients in the training set and 106 patients in the validation set. The training set was used to select significant variables and develop the nomograms, and the validation set was used to test the results obtained from the training set.
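A 7:3 random split like the one described above can be sketched in a few lines. The Python example below is an illustration under stated assumptions, not the authors' R code: the function name `split_cohort`, the fixed seed, and rounding the training-set size up are all assumptions, the last chosen so that 355 patients yield the 249 training and 106 validation patients reported here.

```python
import random


def split_cohort(patient_ids, train_share=(7, 10), seed=2022):
    """Randomly split patient IDs into training and validation sets.

    train_share is a (numerator, denominator) pair, so a 7:3 split is (7, 10).
    Assumption: the training size is rounded up with integer arithmetic,
    so 7/10 of 355 (248.5) becomes 249.
    """
    num, den = train_share
    ids = list(patient_ids)
    rng = random.Random(seed)  # hypothetical seed, fixed for reproducibility
    rng.shuffle(ids)
    n_train = (num * len(ids) + den - 1) // den  # ceiling of num * n / den
    return ids[:n_train], ids[n_train:]


train_ids, valid_ids = split_cohort(range(1, 356))
print(len(train_ids), len(valid_ids))  # 249 106
```

Integer arithmetic avoids floating-point surprises when the training fraction falls exactly on a half patient, as it does for 355 patients.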

Clinicopathologic data collection
In this study, all enrolled patients underwent palpation and breast ultrasound at diagnosis, and breast cancer was diagnosed by pathologic biopsy. ALN status determined by SLNB or ALND was extracted from the medical records.
Data collection and analysis were conducted from December 2021 to June 2022. Patient privacy was protected throughout the research process: no personally identifying information was collected, patient identity was used only for sample coding, and the investigators had no access to private patient information during data analysis. We extracted demographic data and laboratory parameters at diagnosis. Demographic data included sex, age, weight, height, and body mass index (BMI), calculated as weight in kilograms divided by the square of height in meters. Laboratory parameters at diagnosis included routine blood tests, electrolytes, liver function, and coagulation indices. We also collected data on TNM classification, hormone receptor status, human epidermal growth factor receptor 2 (HER2) status, Ki-67, clinical and pathological stage, and tumor pathological type. Ultrasound evaluation of ALN status was performed by experienced breast radiologists, and classification of nodes as normal or abnormal was at the discretion of the evaluating radiologist based on ultrasound diagnostic criteria [17,18].
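For reference, the standard BMI definition (weight in kilograms divided by height in meters squared) can be written as a small function. This Python sketch is illustrative only; the function name `bmi` and the example values are hypothetical.

```python
def bmi(weight_kg: float, height_m: float) -> float:
    """Body mass index in kg/m^2: weight (kg) divided by height (m) squared."""
    if height_m <= 0:
        raise ValueError("height must be positive")
    return weight_kg / height_m ** 2


# Hypothetical patient: 60 kg, 1.60 m
print(round(bmi(60, 1.60), 2))  # 23.44
```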
The TNM stage of each patient was assessed by the breast surgery multidisciplinary team according to American Joint Committee on Cancer criteria, and the clinical T and N stages were extracted from the patient's preoperative TNM stage. All patients underwent SLNB or ALND, and ALN status was assessed according to previously reported criteria [19]. We used these variables to identify potential independent risk factors for ALNM.

Nomogram development
We used a three-step approach to develop the nomograms. First, univariate logistic regression was applied in the training set to identify variables associated with ALNM. Second, variables with P < 0.05 in the univariate analysis were entered into a multivariate logistic regression analysis to determine which factors were independent predictors of ALNM [20]. Third, the independent predictors of ALNM were used to construct nomograms, and we investigated the predictive ability of each model with and without cN or pathological grade (pathGrade).
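The first two steps (univariate screening at P < 0.05, then a multivariate model on the retained variables) can be illustrated with a minimal, dependency-free Python sketch. It fits logistic regression by Newton-Raphson, screens with two-sided Wald P values, and reports odds ratios as exp(β). This is an assumption about how such a pipeline might look, not the authors' implementation (they used R, and their fitting options are not stated); the variable names `marker_a` and `marker_b` and the toy counts are hypothetical.

```python
import math


def _invert(m):
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(m)
    aug = [list(row) + [float(i == j) for j in range(n)] for i, row in enumerate(m)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[piv][col]) < 1e-12:
            raise ValueError("information matrix is singular")
        aug[col], aug[piv] = aug[piv], aug[col]
        pv = aug[col][col]
        aug[col] = [v / pv for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def _score_and_information(rows, y, beta):
    """Gradient of the log-likelihood and the Fisher information matrix."""
    k = len(beta)
    score = [0.0] * k
    info = [[0.0] * k for _ in range(k)]
    for x, yi in zip(rows, y):
        p = 1.0 / (1.0 + math.exp(-sum(b * v for b, v in zip(beta, x))))
        for a in range(k):
            score[a] += (yi - p) * x[a]
            for c in range(k):
                info[a][c] += p * (1.0 - p) * x[a] * x[c]
    return score, info


def fit_logistic(X, y, max_iter=50, tol=1e-10):
    """Logistic regression by Newton-Raphson; an intercept column is prepended.

    Returns (coefficients, standard errors, two-sided Wald P values).
    """
    rows = [[1.0] + [float(v) for v in x] for x in X]
    beta = [0.0] * len(rows[0])
    for _ in range(max_iter):
        score, info = _score_and_information(rows, y, beta)
        step = [sum(c * s for c, s in zip(row, score)) for row in _invert(info)]
        beta = [b + d for b, d in zip(beta, step)]
        if max(abs(d) for d in step) < tol:
            break
    cov = _invert(_score_and_information(rows, y, beta)[1])
    se = [math.sqrt(cov[i][i]) for i in range(len(beta))]
    pvals = [math.erfc(abs(b / s) / math.sqrt(2.0)) for b, s in zip(beta, se)]
    return beta, se, pvals


def univariate_screen(candidates, y, alpha=0.05):
    """Step 1: one model per variable; keep variables whose Wald P < alpha."""
    kept = {}
    for name, values in candidates.items():
        beta, _, pvals = fit_logistic([[v] for v in values], y)
        if pvals[1] < alpha:
            kept[name] = math.exp(beta[1])  # odds ratio per one-unit increase
    return kept


# Hypothetical toy data: marker_a is associated with the outcome, marker_b is not.
y, marker_a, marker_b = [], [], []
for a_value, n_pos, n_neg in [(0, 30, 70), (1, 70, 30)]:
    for outcome, count in [(1, n_pos), (0, n_neg)]:
        for i in range(count):
            y.append(outcome)
            marker_a.append(a_value)
            marker_b.append(i % 2)

candidates = {"marker_a": marker_a, "marker_b": marker_b}
selected = univariate_screen(candidates, y)
print(selected)

# Step 2: multivariate model restricted to the screened variables.
X_multi = [list(row) for row in zip(*(candidates[name] for name in selected))]
multi_beta, multi_se, multi_p = fit_logistic(X_multi, y)
print([round(math.exp(b), 3) for b in multi_beta[1:]])
```

In this toy data only `marker_a` passes the screen, and its odds ratio equals the cross-product ratio (70/30)/(30/70) = 49/9 of the underlying 2x2 table. The third step, drawing the nomogram, would then use the coefficients of the multivariate model.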

Ethical statement
This study was performed in line with the principles of the Declaration of Helsinki. Approval was granted by the Ethics Committee of Xiangya Hospital of Central South University (No. 202112189). Because this was a retrospective study, the Ethics Committee waived the requirement to obtain informed consent from the patients.

Statistical analysis
We used R software to randomly divide the cohort into a training set and a validation set in a 7:3 ratio, and the chi-squared test was used to compare categorical variables between the two sets. Receiver operating characteristic (ROC) curves were used to evaluate the discriminative ability of the nomograms, and calibration plots were used to evaluate their calibration. Additionally, decision curve analysis (DCA) was used to quantify the clinical benefit of the cN-based nomograms. P < 0.05 was considered statistically significant.
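The two performance measures named above can be sketched without any statistics library. The Python example below computes the AUC as the Mann-Whitney statistic (the share of positive-negative pairs in which the positive case receives the higher predicted probability, ties counting one half), which equals the area under the empirical ROC curve, and builds a simple grouped calibration table. It is an illustration, not the authors' R code; the function names, the equal-size grouping, and the six-patient data are assumptions.

```python
def roc_auc(labels, scores):
    """Area under the empirical ROC curve via the Mann-Whitney statistic:
    the share of (positive, negative) pairs in which the positive case has
    the higher predicted probability, with ties counted as one half."""
    pos = [s for s, lab in zip(scores, labels) if lab == 1]
    neg = [s for s, lab in zip(scores, labels) if lab == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def calibration_table(labels, probs, n_bins=3):
    """Sort cases by predicted probability, split them into n_bins groups of
    (nearly) equal size, and return (mean predicted, observed rate) per group."""
    n = len(probs)
    if not 1 <= n_bins <= n:
        raise ValueError("n_bins must be between 1 and the number of cases")
    order = sorted(range(n), key=lambda i: probs[i])
    table = []
    for b in range(n_bins):
        idx = order[b * n // n_bins:(b + 1) * n // n_bins]
        mean_pred = sum(probs[i] for i in idx) / len(idx)
        observed = sum(labels[i] for i in idx) / len(idx)
        table.append((mean_pred, observed))
    return table


# Hypothetical outcomes (1 = ALNM) and predicted probabilities for six patients.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2]
print(round(roc_auc(labels, scores), 3))  # 0.889
for predicted, observed in calibration_table(labels, scores):
    print(f"predicted {predicted:.2f}  observed {observed:.2f}")
```

Points close to the diagonal (mean predicted close to observed rate) indicate good calibration. The pairwise AUC loop is quadratic, which is unproblematic at this cohort's size.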

Patient characteristics
A flowchart of the study is shown in Figure 1. A total of 355 breast cancer patients treated between January 2011 and December 2012 who met the inclusion criteria were enrolled in this study. Tables 1 and 2 show the clinical characteristics of the whole set (355 patients), the training set (249 patients), and the validation set (106 patients). According to the SLNB or ALND results, ALNM was confirmed in 157 (44.23%) patients: 110 (44.18%) in the training set and 47 (44.34%) in the validation set. According to the American Joint Committee on Cancer TNM staging system, clinical stages I, II, and III accounted for 11.3%, 67.0%, and 21.7%, respectively, of the entire set; 20.48%, 60.64%, and 18.87%, respectively, of the training set; and 13.20%, 66.98%, and 17.92%, respectively, of the validation set.
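As an example of the chi-squared comparison between the two sets described in the statistical analysis, the ALNM counts reported above (110 of 249 training patients and 47 of 106 validation patients) form a 2x2 table. The Python sketch below computes Pearson's chi-squared statistic for such a table with 1 degree of freedom; whether the authors applied a continuity correction is not stated, so omitting it here is an assumption, and the function name `chi2_2x2` is hypothetical.

```python
import math


def chi2_2x2(a, b, c, d):
    """Pearson chi-squared test (no continuity correction) for the 2x2 table
    [[a, b], [c, d]]; returns (statistic, P value) with 1 degree of freedom."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        raise ValueError("a row or column total is zero")
    stat = n * (a * d - b * c) ** 2 / denom
    # Upper tail of the chi-squared distribution with 1 df: erfc(sqrt(x / 2)).
    return stat, math.erfc(math.sqrt(stat / 2))


# ALNM-positive / ALNM-negative counts: training 110 / 139, validation 47 / 59.
stat, p = chi2_2x2(110, 139, 47, 59)
print(f"chi2 = {stat:.4f}, P = {p:.2f}")  # chi2 = 0.0008, P = 0.98
```

The near-zero statistic reflects the almost identical ALNM proportions in the two sets (44.18% vs 44.34%).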

Signature screening and nomogram construction
The training set contained 249 patients, of whom 110 (44.18%) had ALNM. Univariate and multivariate logistic regression analyses were used to identify independent risk factors for ALNM in breast cancer patients (Tables 3 and S1). Hemoglobin (odds ratio [OR] 1.03, 95% confidence interval [CI] 1.00-1.06), cN (OR 5.32, 95% CI 3.05-9.26), and pathGrade (OR 1.93, 95% CI 1.13-3.30) were independent predictors of ALNM (Table 3). We then constructed a nomogram using all independent predictors of ALNM with P < 0.05 (Figure 2A). Finally, to explore the importance of cN and pathGrade in the integrated model, we developed a new nomogram by removing the pathGrade variable (Figure 2B).
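To show how odds ratios such as those in Table 3 translate into nomogram points, the sketch below converts each OR into a logistic-regression coefficient (β = ln OR) and rescales each predictor's contribution so that the predictor with the widest range of β·x spans 0 to 100 points, a convention commonly used when drawing nomograms. The predictor ranges and codings are hypothetical (hemoglobin in g/L from 90 to 160, cN coded 0 to 3, pathGrade coded 1 to 3) because the text does not state them. The model intercept is not reported, so the sketch stops at points and does not convert total points into probabilities. This is an illustration, not a reproduction of Figure 2A.

```python
import math

# Odds ratios from the multivariate model reported in the text (Table 3).
ODDS_RATIOS = {"hemoglobin": 1.03, "cN": 5.32, "pathGrade": 1.93}

# Hypothetical predictor ranges (min, max); the real codings are not given.
RANGES = {"hemoglobin": (90.0, 160.0), "cN": (0.0, 3.0), "pathGrade": (1.0, 3.0)}


def nomogram_points(odds_ratios, ranges, max_points=100.0):
    """Return a function mapping a patient's values to per-variable points.

    Each coefficient is beta = ln(OR). The variable whose beta * (max - min)
    is largest spans 0..max_points; the others are scaled proportionally.
    Assumes every OR > 1, so points increase with the predictor value.
    """
    betas = {k: math.log(v) for k, v in odds_ratios.items()}
    spans = {k: betas[k] * (hi - lo) for k, (lo, hi) in ranges.items()}
    scale = max_points / max(spans.values())

    def points(patient):
        return {k: scale * betas[k] * (patient[k] - ranges[k][0]) for k in betas}

    return points


score = nomogram_points(ODDS_RATIOS, RANGES)
pts = score({"hemoglobin": 125.0, "cN": 1, "pathGrade": 2})  # hypothetical patient
print({k: round(v, 1) for k, v in pts.items()}, round(sum(pts.values()), 1))
```

Under these assumed ranges cN has by far the widest span of log-odds, which is consistent with the dominant role of cN reported in this study.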

Nomogram validation
We performed ROC analysis of the different models in both sets. Both models showed satisfactory accuracy in predicting the probability of ALNM. The ROC curves (Figure 3A) showed good discriminative ability of the combined model for predicting ALNM in the training set (AUC 0.80, 95% CI 0.74-0.86) and in the validation set (AUC 0.80, 95% CI 0.71-0.88). The calibration curve (Figure 3B) showed good agreement between the predicted and observed probabilities of ALNM. Meanwhile, DCA indicated a clear clinical benefit for predicting ALNM: when the threshold probability in the training set was > 15%, using the combined model or the combine-pathGrade model provided greater net benefit than either the treat-all or the treat-none strategy (Figure 4). Moreover, the combined and combine-pathGrade models performed better than the combine-cN model in predicting ALNM. Therefore, the nomograms that include cN show good ability to estimate the probability of ALNM.
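Decision curve analysis rests on the net benefit at a threshold probability p_t, NB = TP/n - (FP/n) x p_t/(1 - p_t), which is compared with treating all patients and with treating none (NB = 0). The Python sketch below computes these quantities; it illustrates the standard formula rather than the authors' R implementation, and the function names are hypothetical. As an example, it uses the training-set prevalence reported above (110 of 249 patients) to compute the treat-all reference at the 15% threshold.

```python
def net_benefit(labels, probs, threshold):
    """Net benefit of treating patients whose predicted risk >= threshold:
    NB = TP/n - FP/n * threshold / (1 - threshold)."""
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must lie strictly between 0 and 1")
    n = len(labels)
    tp = sum(1 for y, p in zip(labels, probs) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(labels, probs) if p >= threshold and y == 0)
    return tp / n - fp / n * threshold / (1.0 - threshold)


def treat_all_net_benefit(labels, threshold):
    """Reference strategy that treats everyone (treat-none always has NB = 0)."""
    return net_benefit(labels, [1.0] * len(labels), threshold)


# Training-set outcomes: 110 ALNM-positive and 139 ALNM-negative patients.
train_labels = [1] * 110 + [0] * 139
print(round(treat_all_net_benefit(train_labels, 0.15), 3))  # 0.343
```

A model is clinically useful at a given threshold when its net benefit exceeds both the treat-all curve and zero; evaluating `net_benefit` over a grid of thresholds traces the full decision curve.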

Discussion
The global incidence of breast cancer is increasing rapidly, and accurate diagnostic prediction in breast cancer is of great clinical significance [21]. The ALN status of breast cancer patients affects both their prognosis and doctors' choice of treatment, and misjudging ALN status may lead to inappropriate treatment [22]. ALNM at the initial diagnosis of breast cancer predicts poor treatment outcomes. Studies have reported a worse prognosis in ALN-positive than in ALN-negative patients, so an accurate pretreatment assessment of ALN status can optimize treatment strategies and improve outcomes [23,24]. Therefore, it is of great interest to clarify the ALN status of breast cancer patients at initial diagnosis.
Most previous research on ALNM in breast cancer patients has focused only on single independent risk factors, such as tumor size and grade [25,26]. Recently, a number of researchers have also developed multivariate models to predict ALNM based on patients' clinical information [25,27]. The main advantage of our model over these studies is that it is noninvasive and convenient. There have also been attempts to investigate the relationship between the tumor immune microenvironment and ALNM. One study used a tumor-infiltrating lymphocyte signature to predict ALN status in breast cancer patients, but few patients with HER2-positive or triple-negative breast cancer were included, and only T1 breast cancer was studied [28]. Other researchers have used MRI radiomic signatures to develop noninvasive preoperative models for predicting ALNM [15,29]. Liu et al. [30] used an MRI radiomic signature to predict ALN status; however, the sample size of this single-center study was small. In the present study, we enrolled a larger cohort of 355 breast cancer patients to develop the nomograms. Our approach can be extended to a variety of clinical and experimental applications. Yao et al. used an MRI radiomic signature to develop and validate a model for predicting ALNM and disease-free survival in patients with early-stage breast cancer, and showed that a model combining radiological and clinical information predicted ALNM better than a model using either alone [15]. However, this multicenter retrospective study was subject to heterogeneity in the MRI equipment used. Similarly, another retrospective study demonstrated that internal enhancement on preoperative dynamic contrast-enhanced MRI might help predict sentinel lymph node metastasis in patients with invasive breast cancer [29].
However, this indicator was subjective because it was determined by radiologists, and because little is known about the reproducibility of the measurements, the reproducibility of the method is uncertain. Moreover, MRI is relatively expensive, and MRI units remain scarce in primary hospitals. Compared with studies that used MRI radiomic signatures to predict ALN status, our nomograms are simpler to apply, and the data they require are easier to obtain.
Unlike previous studies that predicted ALNM, our models incorporated patients' clinical lymph node stage together with other clinical information, resulting in a more accurate preoperative assessment of ALN status [30,31]. In this study, we developed and validated models that can simply and accurately predict ALN status from patients' clinical information at baseline. Furthermore, we found that cN is a crucial factor for predicting ALNM in patients with breast cancer. Our models showed good ability to predict ALNM, with validation-set AUCs of 0.80 (95% CI 0.71-0.88) for the combined model and 0.78 (95% CI 0.73-0.84) for the combine-pathGrade model. In addition, we constructed different nomograms to determine which signature plays the dominant role in the predictive model. When we removed the cN variable, the discriminative ability for predicting ALNM decreased markedly, with AUCs of 0.64 (95% CI 0.57-0.71) in the training set and 0.60 (95% CI 0.49-0.71) in the validation set. In contrast, after the pathGrade variable was removed, the combine-pathGrade model still showed good discriminative ability for predicting ALNM. These results suggest that cN plays a dominant role in predicting ALNM and that nomograms including cN predict ALNM well in breast cancer patients. In line with our study, previous studies showed that ALN status on breast ultrasound predicted lymph node burden [32]. However, in another study, a model using axillary ultrasound achieved only modest accuracy, with AUCs of 0.585-0.719 [33]. Compared with these previous studies, our nomograms showed good predictive ability.
A previous study demonstrated that tumor lesion boundary, tumor size, and tumor quadrant location were the most important factors affecting ALNM in cT1-2N0M0 breast cancer [34], and we plan to focus on these risk factors in further studies. In addition, previous studies have focused on the predictive role of imaging features in ALNM, whereas little attention has been given to laboratory test indicators. In the present study, our results suggest that hemoglobin is a risk factor for ALNM, which, to our knowledge, has not been reported previously; this suggests that patients' laboratory parameters may help predict ALN status. In a follow-up study, we plan to explore whether biochemical indicators, such as blood lipids and blood glucose, can predict lymph node metastasis in breast cancer patients.
Admittedly, this study has several limitations. First, our models were built on data from a single center, and because this was a retrospective study, their clinical applicability may be limited. Multicenter evidence will be needed to validate the models before they can be put into clinical use. Second, we did not conduct subgroup analyses because of the small sample size, although the likelihood of ALNM may differ among breast cancer molecular subtypes. In future studies, we will assess the ability of the models to predict ALNM in each molecular subtype. Third, cN relies heavily on ultrasound findings and physicians' expertise, which is subjective, and little is known about the reproducibility of these assessments. Fourth, because of medical limitations at the time of diagnosis approximately 10 years ago and the lack of genetic testing data, we considered only the potential impact of clinical and pathological features on ALNM and not genetic features. Future prospective studies could combine transcriptome and gene mutation data to predict ALNM. Finally, we validated our nomograms only in an internal validation set, which may overestimate their performance, and prospective external validation is lacking.

Conclusion
This study described a cN-based prediction model for ALNM in breast cancer patients, presenting a novel personalized clinical decision nomogram that can be used to predict ALN status. The integrated nomogram is valuable for determining ALNM preoperatively. Removing the pathGrade signature, which is based on needle biopsy pathology, from the integrated model did not substantially reduce its predictive accuracy, whereas removing the cN signature markedly reduced it. Our results suggest that cN staging based on preoperative ultrasound is valuable for the preoperative assessment of ALNM. The cN-based nomograms are useful clinical tools that can provide a preoperative prediction of ALNM.