DOI: 10.17587/prin.15.578-588
Possibility of Using the Attention Mechanism in Multimodal Recognition of Cardiovascular Diseases
M. R. Bogdanov, Ph.D., Associate Professor, bogdanov_marat@mail.ru,
G. R. Shakhmametova, Ph.D., Professor, shakhgouzel@mail.ru,
Ufa University of Science and Technology, Ufa, 450005, Republic of Bashkortostan, Russian Federation,
N. N. Oskin, CEO, nonik2@mail.ru,
Siberian Telemetry Company, Penza Branch, 440000, Penza, Russian Federation
Corresponding author: Marat R. Bogdanov, Ph.D., Associate Professor, Ufa University of Science and Technology, Ufa, 450005, Republic of Bashkortostan, Russian Federation, E-mail: bogdanov_marat@mail.ru
Received on June 29, 2024
Accepted on September 18, 2024
The paper investigates the possibility of using the attention mechanism in diagnosing various cardiovascular diseases. Biomedical data were presented in different modalities (text, images, and time series). Five attention-based transformers (the Dosovitskiy vision transformer, a compact convolutional transformer, a transformer with external attention, a transformer based on shifted patch tokenization and locality self-attention, and a transformer based on deep multiple instance attention) were compared with the Xception convolutional neural network, three fully connected neural networks (MLP-Mixer, FNet, and gMLP), and the YOLO architecture on a multi-class classification problem (16 classes of dangerous arrhythmias). The transformers proved highly efficient in diagnosing cardiac diseases, with the transformer based on shifted patch tokenization and locality self-attention performing best.
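For context, the core operation shared by the compared transformers is the scaled dot-product attention of Vaswani et al. (see References), which maps query, key, and value matrices $Q$, $K$, $V$ with key dimension $d_k$ to

\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left( \frac{Q K^{\top}}{\sqrt{d_k}} \right) V,

where the division by $\sqrt{d_k}$ keeps the softmax from saturating as the key dimension grows.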
Keywords: attention mechanism, medical diagnostics, electrocardiogram, convolutional neural networks, transformers, fully connected neural networks, recurrent neural networks
pp. 578—588
For citation:
Bogdanov M. R., Shakhmametova G. R., Oskin N. N. On the Possibility of Using the Attention Mechanism in Multimodal Recognition of Cardiovascular Diseases, Programmnaya Ingeneria, 2024, vol. 15, no. 11, pp. 578—588. DOI: 10.17587/prin.15.578-588. (in Russian).
References:
- Dosovitskiy A., Beyer L., Kolesnikov A. et al. An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale. arXiv:2010.11929v2. 3 Jun 2021.
- Iuliano A. D., Roguski K. M., Chang H. et al. Estimates of global seasonal influenza-associated respiratory mortality: a modelling study, Lancet, 2018, vol. 391, no. 10127, pp. 1285—1300. DOI: 10.1016/S0140-6736(17)33293-2.
- ML: Attention — Attention mechanism, available at: https://qudata.com/ml/ru/NN_Attention.html (date of access 01.09.2024) (in Russian).
- Gizlyuk D. Neural networks are simple (Part 8): Attention mechanisms, available at: https://www.mql5.com/ru/articles/8765 (date of access 01.09.2024) (in Russian).
- Bahdanau D., Cho K., Bengio Y. Neural machine translation by jointly learning to align and translate. arXiv:1409.0473v7. 19 May 2016.
- Vaswani A., Shazeer N., Parmar N. et al. Attention Is All You Need. arXiv:1706.03762v7. 2 Aug 2023.
- Gizlyuk D. Neural Networks Are Simple (Part 10): Multi-Head Attention, available at: https://www.mql5.com/ru/articles/8909 (date of access 01.09.2024) (in Russian).
- Cameron R. How to Fine-Tune LLM with Supervised Fine-Tuning, available at: https://habr.com/ru/articles/830396/ (date of access 01.09.2024) (in Russian).
- Lee H., Phatale S., Mansoor H. et al. RLAIF vs. RLHF: Scaling Reinforcement Learning from Human Feedback with AI Feedback, Proceedings of the 41st International Conference on Machine Learning, Vienna, Austria, 2024. arXiv:2309.00267v3. 3 Sep 2024.
- Bai Y., Jones A., Ndousse K. et al. Training a helpful and harmless assistant with reinforcement learning from human feedback. arXiv preprint arXiv:2204.05862. 2022.
- Zheng Y., Gan W., Chen Z. et al. Large Language Models for Medicine: A Survey. arXiv:2405.13055v1. 20 May 2024.
- Wu N., Green B., Ben X. et al. Deep Transformer Models for Time Series Forecasting: The Influenza Prevalence Case. arXiv:2001.08317v1. 23 Jan 2020.
- Wang H., Liu C., Xi N. et al. HuaTuo: Tuning LLaMA model with Chinese medical knowledge, arXiv preprint arXiv:2304.06975. 2023.
- Zeng A., Liu X., Du Z. et al. GLM-130B: An open bilingual pre-trained model, arXiv preprint arXiv:2210.02414. 2022.
- Singhal K., Azizi S., Tu T. et al. Large language models encode clinical knowledge, arXiv preprint arXiv:2212.13138. 2022.
- Lin X., Xu C., Xiong Z. et al. PanGu drug model: Learn a molecule like a human, Science China Life Sciences, 2023, vol. 66, pp. 879—882.
- Fang X., Wang F., Liu L. et al. A method for multiple-sequence-alignment-free protein structure prediction using a protein language model, Nature Machine Intelligence, 2023, vol. 5, pp. 1087—1096.
- Zhu M., Chen Z., Yuan Y. DSI-Net: Deep synergistic interaction network for joint classification and segmentation with endoscope images, IEEE Transactions on Medical Imaging, 2021, vol. 40, pp. 3315—3325.
- Venigalla A., Frankle J., Carbin M. PubMed GPT: A domain-specific large language model for biomedical text, 2022, available at: https://www.mosaicml.com/blog/introducing-pubmed-gpt (date of access 01.09.2024).
- Li Y., Li Z., Zhang K. et al. ChatDoctor: A medical chat model fine-tuned on LLaMA model using medical domain knowledge, arXiv preprint arXiv:2303.14070. 2023.
- Liu F., Zhu T., Wu X. et al. A medical multimodal large language model for future pandemics, NPJ Digital Medicine, 2023, vol. 6, article 226. DOI: 10.1038/s41746-023-00952-2.
- Greenwald S., Albrecht P., Moody G. et al. Estimating confidence limits for arrhythmia detector performance, Computers in Cardiology, 1985, vol. 12, pp. 383—386.
- Manilo L., Nemirko A., Evdakova E., Tatarinova A. ECG Database for Evaluating the Efficiency of Recognizing Dangerous Arrhythmias, 2021 IEEE Ural-Siberian Conference on Computational Technologies in Cognitive Science, Genomics and Biomedicine (CSGB), 2021, pp. 120—123. DOI: 10.1109/CSGB53040.2021.9496029.
- Chollet F. Xception: Deep Learning with Depthwise Separable Convolutions. arXiv:1610.02357v3. 4 Apr 2017.
- Tolstikhin I., Houlsby N., Kolesnikov A. et al. MLP-Mixer: An all-MLP Architecture for Vision. arXiv:2105.01601v4. 11 Jun 2021.
- Lee-Thorp J., Ainslie J., Eckstein I. et al. FNet: Mixing Tokens with Fourier Transforms. arXiv:2105.03824v4. 26 May 2022.
- Liu H., Dai Z., So D. et al. Pay Attention to MLPs. arXiv:2105.08050v2. 1 Jun 2021.
- Ultralytics YOLOv8, available at: https://github.com/ultralytics/ultralytics (date of access 01.09.2024).
- Hassani A., Walton S., Shah N. et al. Escaping the Big Data Paradigm with Compact Transformers. arXiv:2104.05704v4. 7 Jun 2022.
- Guo M.-H., Liu Z.-N., Mu T.-J. et al. Beyond Self-attention: External Attention using Two Linear Layers for Visual Tasks. arXiv:2105.02358v2. 31 May 2021.
- Lee S., Lee S., Song B. Vision Transformer for Small-Size Datasets. arXiv:2112.13492v1. 27 Dec 2021.
- Ilse M., Tomczak J., Welling M. Attention-based Deep Multiple Instance Learning. arXiv:1802.04712v4. 28 Jun 2018.
- Dietterich T., Lathrop R., Lozano-Perez T. Solving the multiple instance problem with axis-parallel rectangles, Artificial Intelligence, 1997, vol. 89, no. 1—2, pp. 31—71.
- Maron O., Lozano-Perez T. A framework for multiple-instance learning, NIPS, 1998, pp. 570—576.
- Oquab M., Bottou L., Laptev I. et al. Weakly supervised object recognition with convolutional neural networks, NIPS, 2014, available at: https://inria.hal.science/hal-01015140v1 (date of access 01.09.2024).
- Quellec G., Cazuguel G., Cochener B. et al. Multiple-instance learning for medical image and video analysis, IEEE Reviews in Biomedical Engineering, 2017, vol. 10, pp. 213—234. DOI: 10.1109/RBME.2017.2651164.
- Liu G., Wu J., Zhou Z. Key instance detection in multi-instance learning, Proceedings of the Asian Conference on Machine Learning, 2012, vol. 25, pp. 253—268.
- Xu K., Ba J., Kiros R. et al. Show, Attend and Tell: Neural image caption generation with visual attention, arXiv:1502.03044. 2015.
- Astorino A., Fuduli A., Gaudioso M., Vocaturo E. Multiple Instance Learning algorithm for medical image, available at: https://ceur-ws.org/Vol-2400/paper-46.pdf (date of access 01.09.2024).
- Sirinukunwattana K., Raza S., Ahmed E. et al. Locality sensitive deep learning for detection and classification of nuclei in routine colon cancer histology images, IEEE Transactions on Medical Imaging, 2016, vol. 35, no. 5, pp. 1196—1206. DOI: 10.1109/TMI.2016.2525803.
- Manchev N. ML internals: Synthetic Minority Oversampling (SMOTE) Technique, available at: https://domino.ai/blog/smote-oversampling-technique (date of access 01.09.2024).
- Wang Z., Yan W., Oates T. Time Series Classification from Scratch with Deep Neural Networks: A Strong Baseline. arXiv:1611.06455v4. 14 Dec 2016.