ABSTRACTS OF ARTICLES OF THE JOURNAL "INFORMATION TECHNOLOGIES".
No. 2. Vol. 28. 2022
DOI: 10.17587/it.28.92-102
Ya. E. Lvovich, Professor, Voronezh Institute of High Technologies, Voronezh, Russian Federation, I. L. Kashirina, Professor, M. A. Firyulina, Postgraduate Student, Voronezh State University, Voronezh, Russian Federation
Using Machine Learning Methods to Predict Mortality after Myocardial Infarction
This article presents the results of a study on predicting the risk of mortality after myocardial infarction and on the tools used to interpret those predictions. Several methods for predicting mortality risk were applied: the Kaplan-Meier estimator, the Cox proportional hazards model, logistic regression, and CatBoost gradient boosting. Prediction quality was improved by applying data balancing methods, both oversampling and undersampling. The most accurate model was gradient boosting after applying an undersampling balancing strategy with random resampling of examples; its accuracy (proportion of correct predictions) was 0.85 and its ROC AUC was 0.828. The study showed that the statistical methods identify a larger number of significant features than the gradient boosting method does. The most significant factors determining a patient's condition after the onset of myocardial infarction are severity according to the Killip scale, the patient's age, whether percutaneous coronary intervention was performed, and a history of arterial hypertension. Modern visualization tools made the gradient boosting model transparent and interpretable. The SHAP method was used to stratify the risk for each patient, showing which features made the main contribution to the predicted mortality risk value and assessing the impact of each feature across the set of possible predictions.
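The abstract does not specify the study's dataset, preprocessing, or hyperparameters, so the following is only a minimal sketch of the kind of pipeline it describes: balancing an imbalanced training set, fitting a CatBoost classifier, evaluating accuracy and ROC AUC, and explaining individual predictions with SHAP. The feature names (killip_class, age, pci_performed, hypertension), the synthetic data, and all parameter values are illustrative assumptions, not the authors' actual setup.

```python
# Hypothetical sketch: class balancing + CatBoost + accuracy/ROC AUC + SHAP.
# Synthetic data and feature names are placeholders, not the study's dataset.
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, roc_auc_score
from imblearn.under_sampling import RandomUnderSampler
from catboost import CatBoostClassifier
import shap

rng = np.random.default_rng(0)
n = 2000
X = pd.DataFrame({
    "killip_class": rng.integers(1, 5, n),   # severity on the Killip scale
    "age": rng.normal(65, 12, n).round(),
    "pci_performed": rng.integers(0, 2, n),  # percutaneous coronary intervention
    "hypertension": rng.integers(0, 2, n),   # history of arterial hypertension
})
# Synthetic, imbalanced outcome (1 = death) so the example runs end to end
logit = 0.8 * (X["killip_class"] - 1) + 0.04 * (X["age"] - 65) \
        - 1.0 * X["pci_performed"] - 3.0
y = (rng.random(n) < 1 / (1 + np.exp(-logit))).astype(int)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0)

# Balance the training set by randomly undersampling the majority class
X_bal, y_bal = RandomUnderSampler(random_state=0).fit_resample(X_train, y_train)

model = CatBoostClassifier(iterations=300, depth=4, learning_rate=0.1, verbose=False)
model.fit(X_bal, y_bal)

proba = model.predict_proba(X_test)[:, 1]
print("accuracy:", accuracy_score(y_test, proba > 0.5))
print("ROC AUC :", roc_auc_score(y_test, proba))

# SHAP values: each feature's contribution to an individual patient's prediction
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X_test)
print("contributions for the first test patient:")
print(dict(zip(X.columns, shap_values[0].round(3))))
```

In the same spirit, the per-patient SHAP contributions can be aggregated (e.g. with summary plots) to assess the impact of each feature over many predictions, which is how the abstract describes the risk stratification step.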
Keywords: survival analysis, gradient boosting, logistic regression, Cox model, Kaplan-Meier method, feature importance, data balancing, oversampling, undersampling, SHAP method
P. 92–102
Acknowledgments: The reported study was funded by RFBR, project number 20-37-90029