Early machine learning prediction of hospitalized patients at low risk of respiratory deterioration or mortality in community-acquired pneumonia: Derivation and validation of a multivariable model

Current prognostic tools for pneumonia predominantly focus on mortality, often neglecting other crucial outcomes such as the need for advanced respiratory support. The objective of this study was to develop and validate a tool that predicts the early risk of non-occurrence of respiratory deterioration or mortality. We conducted a single-center, retrospective cohort study involving hospitalized adult patients with community-acquired pneumonia (CAP) and acute hypoxic respiratory failure from January 2009 to December 2019 (n = 4379). We employed gradient boosting machine (GBM) learning to create a model that estimates the likelihood of patients requiring advanced respiratory support (high-flow nasal cannula [HFNC], non-invasive mechanical ventilation [NIMV], and invasive mechanical ventilation [IMV]) or mortality during hospitalization. This model utilized readily available data, including demographic, physiologic, and laboratory data, sourced from electronic health records and obtained within the first 6 h of admission. Out of the cohort, 890 patients (25.2%) either required advanced respiratory support or died during their hospital stay. Our predictive model displayed superior discrimination and higher sensitivity (cross-validation C-statistic = 0.71; specificity = 0.56; sensitivity = 0.72) compared to the pneumonia severity index (PSI) (C-statistic = 0.65; specificity = 0.91; sensitivity = 0.24; P value < 0.001), while maintaining a negative predictive value (NPV) of approximately 0.85. These data demonstrate that our machine-learning model predicted the non-occurrence of respiratory deterioration or mortality among hospitalized CAP patients more accurately than the PSI. The enhanced sensitivity of this model holds the potential for reliably excluding low-risk patients from pneumonia clinical trials.


Introduction
Pneumonia remains a common cause of acute hypoxemic respiratory failure that requires hospitalization, with significant morbidity and mortality when intensive care unit (ICU) transfer is delayed [1]. Current prognostic and risk stratification tools for community-acquired pneumonia (CAP) primarily focus on mortality prediction, aiming to inform on illness severity and the initial site of care. However, there is limited evidence regarding the disease-specific prediction of deterioration during a patient's hospital stay [2]. The pneumonia severity index (PSI) identifies patients with low risk of mortality more accurately than other simple prognostic tools, such as the confusion, urea, respiratory rate, blood pressure, and 65 years of age or older (CURB-65) score, the confusion, respiratory rate, blood pressure, and 65 years of age or older (CRB-65) score, and the age, dehydration, respiratory failure, orientation disturbance, and low blood pressure (A-DROP) score. Therefore, it is effective and safe in guiding the initial site of care (whether outpatient or inpatient) with broad generalizability and reproducibility [3][4][5][6][7][8]. Prediction of mortality, however, does not provide accurate identification of patients who would benefit from intensified management strategies once they are hospitalized [9]. This subset of high-risk patients has been defined as having severe CAP when ICU admission is the sole clinical surrogate [10]. Other prognostic scoring tools, including the 2001 American Thoracic Society (ATS), 2007 Infectious Disease Society of America (IDSA)/ATS, and the systolic blood pressure, multilobar chest radiography, albumin level, respiratory rate, tachycardia, confusion, oxygen level, and pH level (SMART-COP) score, have performed better than the PSI in predicting ICU admission. However, these tools either directly or indirectly consider criteria that inherently reflect critical disease, disregarding the trajectory of the need for advanced respiratory support such as high-flow nasal cannula (HFNC) or non-invasive mechanical ventilation (NIMV), whether inside or outside the ICU setting [10][11][12][13]. The objective of this study was to develop a tool using machine learning methods for the early risk prediction of non-occurrence of respiratory deterioration or mortality in hospitalized CAP patients. Such a tool could be useful for prognostic enrichment in clinical trials of CAP interventions by excluding low-risk patients [14].

Source of data and participants
A retrospective cohort from a single center was analyzed, comprising hospitalized adult patients (aged ≥ 18 years) with CAP from January 2009 to December 2019. The cohort was used to develop the model for respiratory deterioration, using routinely and readily available information from electronic health records. This encompassed demographic data, clinical features, and laboratory data obtained within the initial 6 h post-admission. Patients who denied the utilization of their medical records for research purposes were excluded (10%). The design and reporting of this observational study adhered to the guidelines specified by the Transparent Reporting of a multivariable prediction model for Individual Prognosis Or Diagnosis (TRIPOD).
CAP was defined as an acute infection of the lung parenchyma that is associated with clinical symptoms (cough, fever, pleuritic chest pain, and dyspnea) and a new radiographic infiltrate, not acquired in the hospital or healthcare setting, identified by the International Classification of Diseases (ICD) 9 (481-486) and 10 (J13, J15, and J18) codes and note search. The exclusion criteria were similar to other studies [15] and they included: lack of research authorization, prior hospitalization within the 15 days leading up to the admission, aspiration pneumonia, hospital/ventilator-acquired pneumonia (if diagnosed after 48 h post-admission), interstitial lung disease, leukopenia, neutropenia, acquired immunodeficiency syndrome (AIDS), and human immunodeficiency virus (HIV) infection.

Outcomes
The primary outcome encompassed a combination of the requirement for advanced respiratory support (which includes the use of HFNC, NIMV, and invasive mechanical ventilation [IMV]) or mortality during hospitalization. Given that this study focused on in-hospital decompensation, we excluded patients who were not hospitalized and those without a need for supplemental oxygen. The secondary outcomes of interest were hospital mortality, the need for IMV, and the need for NIMV.
For each of these outcomes, patients who had already met that outcome within the first 6 h of admission were excluded from further analysis.

Predictors
Predictor variables included age, sex, race, height, weight, blood pressure, heart rate, respiratory rate, temperature, medical comorbidities (such as congestive heart failure, chronic obstructive pulmonary disease [COPD], asthma, liver disease, neoplastic disease, and renal disease), laboratory data (white blood cell count with differentials [neutrophils, eosinophils, lymphocytes], bicarbonate, sodium, blood urea nitrogen [BUN], and blood gases), and clinical scores (PSI, Sequential Organ Failure Assessment [SOFA] score, and Acute Physiology and Chronic Health Evaluation [APACHE] III score), obtained within the initial 6 h of admission. For predictors measured repeatedly or longitudinally within these 6 h, only the first observation was used. For analysis purposes, race was grouped as White or other/unknown.
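The rule of keeping only the first observation within the 6-h window can be illustrated with a minimal Python sketch. The record layout (tuples of predictor name, timestamp, and value), the function name, and the example values are hypothetical and are not taken from the study's data pipeline.

```python
from datetime import datetime, timedelta

WINDOW = timedelta(hours=6)  # prediction window described in the text


def first_observations(admitted_at, measurements):
    """Return {predictor: value}, keeping the earliest value per predictor
    that was recorded within WINDOW of admission.

    `measurements` is a list of (predictor_name, timestamp, value) tuples;
    this layout is an assumption made for illustration.
    """
    earliest = {}
    for name, taken_at, value in measurements:
        if not (admitted_at <= taken_at <= admitted_at + WINDOW):
            continue  # recorded outside the 6-h window
        if name not in earliest or taken_at < earliest[name][0]:
            earliest[name] = (taken_at, value)
    return {name: value for name, (_, value) in earliest.items()}


admit = datetime(2019, 1, 1, 8, 0)
rows = [
    ("respiratory_rate", datetime(2019, 1, 1, 9, 30), 24),
    ("respiratory_rate", datetime(2019, 1, 1, 8, 15), 28),  # earliest, kept
    ("bun", datetime(2019, 1, 1, 15, 0), 31),  # 7 h after admission, ignored
]
first = first_observations(admit, rows)
print(first)
```

In this toy input, the later respiratory rate is discarded and the BUN value falls outside the window, so BUN would be passed to the model as missing.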

Sample size
All eligible patients (n = 4379) meeting the criteria were included (Figure 1). For our primary outcome, which was the need for advanced respiratory support or mortality, 3528 patients had not reached that status at 6 h post-admission.
To develop a predictive model for the need of advanced respiratory support or mortality, we established that a sample size of n = 3528 would be expected to produce a model with a mean absolute prediction error of 0.028 in the predicted outcome probabilities. This was based on an outcome rate of 25.2% and the inclusion of 32 predictor variables in the model [16]. Consequently, our sample was expected to produce a model with predicted values that would exhibit a small mean error when applied to new individuals.
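The text does not state the formula behind the 0.028 figure. One plausible reading, offered here as an assumption, is the simulation-based approximation from the prediction-model sample-size literature, ln(MAPE) = -0.508 + 0.259 ln(outcome rate) + 0.504 ln(number of predictors) - 0.544 ln(n). The sketch below evaluates that expression with the values reported above; the function name is illustrative.

```python
import math


def expected_mape(n, outcome_rate, n_predictors):
    """Approximate mean absolute prediction error (MAPE) of predicted
    probabilities. The coefficients are assumed to correspond to the
    approach cited as [16]; the paper itself does not list them.
    """
    log_mape = (
        -0.508
        + 0.259 * math.log(outcome_rate)
        + 0.504 * math.log(n_predictors)
        - 0.544 * math.log(n)
    )
    return math.exp(log_mape)


mape = expected_mape(n=3528, outcome_rate=0.252, n_predictors=32)
print(round(mape, 3))  # 0.028
```

Under this assumption, the reported sample size, outcome rate, and predictor count reproduce the stated error of 0.028.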

Ethical statement
The Mayo Clinic Institutional Review Board (IRB) approved this study prior to its initiation (IRB number: 17-011140, modification approval date: June 2021, Title: Concordant versus discordant corticosteroid use with markers of inflammation in critically ill patients with pneumonia and ARDS). Informed consent was waived, and all procedures conformed to the ethical standards set by the Mayo Clinic IRB and the Helsinki Declaration of 1975.

Statistical analysis
Patient demographics, physiological parameters, and clinical and laboratory data are presented using the median (IQR) for continuous variables and frequency (percentage) for categorical variables.
To test the hypothesis that a combination of patient characteristics will accurately predict the need for advanced respiratory support or mortality, stochastic gradient boosting machine (GBM) learning was employed. A 5-fold repeated cross-validation (ten repeats) was used with a grid-search approach to select tuning parameters [17]: shrinkage, interaction depth, minimum number of observations in the terminal nodes, bag fraction, and number of trees. A threshold level for classification to advanced respiratory support or mortality was selected to maximize sensitivity while maintaining a negative predictive value (NPV) of at least 0.85. This approach was chosen over using Youden's Index or other ad hoc methods because the anticipated successful model would aim to screen out those at low risk (of needing advanced respiratory support or death) as a prognostic enrichment strategy for enrollment in clinical trials. Primary metrics for model development and validation included the area under the receiver operating characteristic curve (C-statistic). Metrics for threshold classification included sensitivity, specificity, positive predictive value (PPV), and NPV.
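The threshold rule described above can be read in more than one way. The following Python sketch shows one interpretation, stated as an assumption: among candidate thresholds whose NPV is at least 0.85, pick the one with the highest sensitivity, breaking ties in favor of higher specificity. The scores and labels are made-up toy data, and the function names are illustrative rather than the authors' code.

```python
def confusion(scores, labels, threshold):
    """Count TP, FP, TN, FN when score >= threshold is called high risk."""
    tp = fp = tn = fn = 0
    for score, event in zip(scores, labels):
        if score >= threshold:
            if event:
                tp += 1
            else:
                fp += 1
        elif event:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def pick_threshold(scores, labels, min_npv=0.85):
    """Return the threshold with the highest sensitivity among those whose
    NPV is at least min_npv (ties broken by specificity), or None."""
    best = None
    for t in sorted(set(scores)):
        tp, fp, tn, fn = confusion(scores, labels, t)
        if tn + fn == 0 or tp + fn == 0 or tn + fp == 0:
            continue  # a metric would be undefined at this threshold
        npv = tn / (tn + fn)
        if npv < min_npv:
            continue
        key = (tp / (tp + fn), tn / (tn + fp))  # (sensitivity, specificity)
        if best is None or key > best[0]:
            best = (key, t, npv)
    if best is None:
        return None
    (sens, spec), t, npv = best
    return {"threshold": t, "sensitivity": sens, "specificity": spec, "npv": npv}


# Toy predicted probabilities and observed outcomes (1 = event).
scores = [0.05, 0.10, 0.15, 0.20, 0.30, 0.40, 0.55, 0.60, 0.75, 0.90]
labels = [0, 0, 0, 0, 1, 0, 0, 1, 1, 1]
result = pick_threshold(scores, labels)
print(result)
```

Unlike Youden's Index, which weighs sensitivity and specificity equally, this rule accepts lower specificity so that patients labeled low risk rarely go on to have an event.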
GBM models were also evaluated for secondary outcomes, which were in-hospital mortality, and the need for IMV and NIMV, using the same methods applied for the primary outcome. Those prediction models were developed using the subset of patients who were event-free at the prediction time (within 6 h of admission).
Our model was additionally tuned to exclude variables with less influence for the primary and secondary outcome (Model 1: all variables, Model 2: parsimonious model). DeLong's test was used to compare the area under the curve (AUC) scores for our parsimonious models against PSI and CURB-65 for the outcomes of advanced respiratory support or mortality, and solely for mortality.
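As a reminder of what is being compared, the AUC (C-statistic) equals the probability that a randomly chosen patient with the event receives a higher score than a randomly chosen patient without it, with ties counted as one half. The Python sketch below computes that quantity for two hypothetical scores on toy data. It does not implement DeLong's test, which additionally estimates the variance of the difference between two correlated AUCs.

```python
def c_statistic(scores, labels):
    """Probability that a random event case outscores a random non-event
    case, counting ties as one half."""
    events = [s for s, y in zip(scores, labels) if y]
    non_events = [s for s, y in zip(scores, labels) if not y]
    if not events or not non_events:
        raise ValueError("need at least one event and one non-event")
    wins = 0.0
    for e in events:
        for n in non_events:
            if e > n:
                wins += 1.0
            elif e == n:
                wins += 0.5
    return wins / (len(events) * len(non_events))


# Toy data: a continuous score and a coarser integer score on the same patients.
labels = [0, 0, 1, 0, 1, 1]
model_scores = [0.1, 0.3, 0.35, 0.4, 0.7, 0.9]
coarse_scores = [1, 2, 2, 3, 3, 4]

auc_model = c_statistic(model_scores, labels)
auc_coarse = c_statistic(coarse_scores, labels)
print(round(auc_model, 3), round(auc_coarse, 3))
```

The pairwise loop is quadratic in the number of patients, which is acceptable for illustration; rank-based formulas give the same value more efficiently.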
Any missing predictors were treated as a distinct and possibly informative segment of the data, reflecting actual practices where such omissions are common. This allows the resulting GBM prediction model to handle missing (unmeasured) inputs and still produce the predicted probability of an event. There were no missing data for the primary and secondary outcomes or for the following predictors: age, gender, race, comorbidities, altered mental status, PSI, or CURB-65. Data management and analysis were conducted using SAS version 9.4 (SAS Institute Inc., Cary, NC, USA) and R version 4.1.2 (RStudio Team 2021, Boston, MA, USA). We used the R packages "gbm" and "caret" for model training.
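Treating missingness as its own segment can be pictured as a tree split with three branches instead of two. The Python sketch below is a conceptual illustration only: the class, the cutoff, and the branch values are hypothetical, and it is not the internal implementation of the "gbm" package.

```python
import math


def is_missing(x):
    """True for None or NaN, the two ways a value is absent in this sketch."""
    return x is None or (isinstance(x, float) and math.isnan(x))


class ThreeWaySplit:
    """A single tree split with a dedicated branch for missing values."""

    def __init__(self, feature, cutoff, left_value, right_value, missing_value):
        self.feature = feature
        self.cutoff = cutoff
        self.left_value = left_value        # returned when value < cutoff
        self.right_value = right_value      # returned when value >= cutoff
        self.missing_value = missing_value  # returned when value is absent

    def predict(self, patient):
        x = patient.get(self.feature)
        if is_missing(x):
            return self.missing_value
        return self.left_value if x < self.cutoff else self.right_value


# Hypothetical numbers chosen only to show the three branches.
split = ThreeWaySplit("bicarbonate", cutoff=22.0,
                      left_value=0.40, right_value=0.20, missing_value=0.15)

print(split.predict({"bicarbonate": 18.0}))  # measured, below cutoff
print(split.predict({"bicarbonate": 26.0}))  # measured, at or above cutoff
print(split.predict({}))                     # not measured
```

Because the missing branch carries its own fitted value, a test that was never ordered can itself be informative, and no imputation step is needed at prediction time.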

Results
Demographics and clinical characteristics are described in Table S1. Patient outcomes within the cohort are detailed in Table 1. Of the 3528 patients who were event-free at 6 h post-admission, 890 (25.2%) needed advanced respiratory support or died in the hospital.
Variables with the highest importance in the final model were respiratory rate, weight, BUN, and systolic blood pressure (Figure 2).
The CURB-65 prediction of in-hospital mortality was similar to our GBM model, with a cross-validation AUC score of 0.695 (accuracy rate = 96.5%, 95% CI 95.9%-97.0%; bicarbonate, respiratory rate, and systolic blood pressure (Figure 3 and Table S3).

Discussion
Although the PSI is effective and safe in guiding the initial site of care (whether inpatient or outpatient treatment), it poorly predicts which hospitalized patients might require intensified management [2,9]. Other prognostic tools, such as the IDSA/ATS guidelines, SMART-COP, and early warning scores, have shown better performance in predicting ICU admissions based on clinical endpoints of IMV and/or vasopressor support [10][11][12][13]. However, these tools focus on a critical late stage of the disease course, where the endpoint of ICU admission identifies only a specific subgroup of high-risk patients. Moreover, ICU admission decisions are prone to multiple biases, including limited resources, advance directives, and hospital policies.
Importantly, these prognostic tools do not consider the need for advanced respiratory support methods, such as HFNC and NIMV, which are increasingly being used in both the ICU and non-ICU settings.
Therefore, it is important to explore clinical endpoints other than ICU admission and mortality when aiming for early prognostication of hospitalized patients with CAP.
Predicting the need, or the lack of need, for advanced respiratory support (HFNC/NIMV/IMV) early in the disease's course can provide valuable insights. Such predictions, unlike the clinical endpoints of HFNC/NIMV failure described in studies like those involving the ratio of oxygen saturation/FiO2 to respiratory rate (ROX) index and the heart rate, acidosis, consciousness, oxygenation, and respiratory rate (HACOR) score [19][20][21][22], could potentially inform important research enrichment strategies. Such prediction tools could facilitate prognostic enrichment in clinical trials by excluding low-risk patients who are unlikely to benefit from an intervention.
In our large single-center cohort study of patients with CAP, we found that early prediction (within the first 6 h of admission) of hospitalized patients at low risk of respiratory deterioration or mortality was better using a machine-learning model compared to the PSI and the CURB-65. The PSI's higher specificity would classify more patients as low risk, while distinguishing poorly those at high risk due to its lower sensitivity. Nevertheless, with the same NPV, our model's higher sensitivity, compared to the PSI, assures a reduction in misclassification of high-risk patients as low-risk, thereby facilitating the absolute exclusion of those at low risk. While these findings may have limited clinical relevance, an R Shiny application for the model is in development for use as a prognostic enrichment tool. This will help exclude low-risk patients in time-sensitive pneumonia clinical trials.
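The link between sensitivity and misclassified high-risk patients can be checked with back-of-envelope arithmetic. The Python sketch below uses only figures reported in this paper (sensitivity, specificity, the 25.2% outcome rate, and 890 events). The derived values are rough approximations, not reported results, since the published metrics are cross-validated estimates; the function names are illustrative.

```python
def npv(sensitivity, specificity, prevalence):
    """Negative predictive value implied by sensitivity, specificity,
    and outcome prevalence."""
    true_neg = specificity * (1 - prevalence)
    false_neg = (1 - sensitivity) * prevalence
    return true_neg / (true_neg + false_neg)


def missed_events(sensitivity, n_events):
    """Approximate number of event patients labeled low risk."""
    return round((1 - sensitivity) * n_events)


# Reported values: GBM sensitivity 0.72 and specificity 0.56,
# PSI sensitivity 0.24, outcome rate 25.2%, 890 events.
gbm_npv = npv(sensitivity=0.72, specificity=0.56, prevalence=0.252)
gbm_missed = missed_events(0.72, 890)
psi_missed = missed_events(0.24, 890)
print(round(gbm_npv, 3), gbm_missed, psi_missed)
```

Under these assumptions, the GBM operating point implies an NPV close to the 0.85 target, and far fewer patients with events would be labeled low risk than with the PSI at its reported sensitivity.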
In our study, the machine-learning models for secondary outcomes showed a fair discriminatory performance when compared to the primary outcome. The model's prediction of in-hospital mortality was not statistically different compared to the PSI in this cohort, and the AUC was consistent with prior studies [23][24][25]. Interestingly, the most important variables in the model included the bicarbonate and the absolute lymphocyte count, both of which are readily available but not included in the PSI. When comparing our findings with a recent machine-learning model developed to predict 30-day mortality in CAP patients, our model showed a lesser discriminatory performance compared to the causal probabilistic network (CPN) (AUC = 0.80) [26]. The CPN model utilized data collected within the first 24 h of admission, in contrast to our model which utilized data within the first 6 h of admission. For similar reasons, the model's prediction of the need for IMV was acceptable but weaker when compared to the SMART-COP (AUC = 0.87) [13]. The model's prediction of the need for NIMV also showed acceptable discriminatory capacity. To our understanding, no other study has reported similar findings utilizing data obtained within the first 6 h of admission to predict the need for NIMV during hospitalization.
The use of continuous variables rather than dichotomous variables, which can sometimes oversimplify variable interpretation, and the application of a 5-fold cross-validation are notable strengths of our study. However, several limitations also need to be highlighted, including the potential bias existing within the dataset, inherent to its single-center nature. As a result, the findings presented in this paper are limited to the characteristics as seen in a large academic referral center. In this study, we excluded patients with mild ambulatory diseases and those who did not require oxygen in the first 6 h of admission. This exclusion was related to our specific cohort of interest. Additionally, the routinely measured clinical variables within the first 6 h of admission, which were less likely to be missing, could be insufficient for optimal model discrimination, while unmeasured parameters, including but not limited to treatment interactions, genetic predisposition, and pathogen characteristics unknown at the time of admission, may have an important role in the risk of respiratory deterioration or mortality. Lastly, our study did not account for the potential occurrence of a second, independent pneumonia event or deaths unrelated to pneumonia.
Additional research is needed to evaluate the prediction of pneumonia-specific clinical endpoints in CAP, beyond just mortality, intubation needs, and ICU admissions, in order to better identify patients who are more or less likely to deteriorate shortly after being admitted.

Conclusion
Our findings demonstrate that a machine-learning model more accurately predicted the absence of respiratory deterioration or mortality among hospitalized CAP patients compared to the PSI. The model's higher sensitivity could help in effectively excluding low-risk patients from pneumonia clinical trials.

Figure 2. Relative importance of the top 15 variables in the advanced respiratory support or mortality model. Resp: Respiratory; BUN: Blood urea nitrogen; BP: Blood pressure; NLR: Neutrophil to lymphocyte ratio.

Figure S1. ROC plot comparing the validation model with the PSI model for advanced respiratory support or mortality. ROC: Receiver operating characteristic; PSI: Pneumonia severity index; AUC: Area under the curve.

Table 1. Patient outcomes of interest.

Table 2. Prediction of advanced respiratory support or mortality. AUC: Area under the curve; NPV: Negative predictive value; PPV: Positive predictive value; PSI: Pneumonia severity index; CURB-65: Confusion, urea, respiratory rate, blood pressure, and 65 years of age or older score.

Table S2. Variables used in the predictive models for advanced respiratory support or mortality.
Odeyemi et al. Machine learning predicts low-risk pneumonia outcomes. www.biomolbiomed.com

Table S3. Variables used in the predictive models for mortality. PSI: Pneumonia severity index; COPD: Chronic obstructive pulmonary disease; BUN: Blood urea nitrogen; WBC: White blood cells; NLR: Neutrophil to lymphocyte ratio.