Attention-Driven Multi-Class Abnormality Detection in Video Capsule Endoscopy (VCE) Using Enhanced InceptionResNetV2
This paper focuses on improving the detection of gastrointestinal abnormalities in video capsule endoscopy (VCE) images. The approach integrates a modified InceptionResNetV2 model with attention mechanisms, sharpening the model's focus on diagnostically relevant features. The study targets ten classes of abnormalities, including bleeding, polyps, and ulcers, and aims to assist clinicians by automating the analysis of VCE images, reducing the manual burden of interpreting vast amounts of visual data.
Key aspects of the research include data augmentation to address class imbalance and curb overfitting, the strategic placement of attention layers to emphasize critical image regions, and the use of dropout and batch normalization for stable training. The model achieves a validation accuracy of 92.2%, surpassing baselines such as a custom CNN, ResNet50, and VGG16, which underscores the effectiveness of the attention-based approach in this context.
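The summary does not specify which attention variant the authors use. As a minimal sketch only, the snippet below implements one common choice, a squeeze-and-excitation style channel-attention block, in plain NumPy; the shapes, weight matrices, and reduction ratio are invented for illustration and are not taken from the paper.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def channel_attention(feature_map, w1, w2):
    """Illustrative squeeze-and-excitation style channel attention.

    feature_map: (H, W, C) activations from a backbone block.
    w1: (C, C // r) and w2: (C // r, C) are hypothetical learned weights
    of a bottleneck MLP with reduction ratio r.
    """
    # Squeeze: global average-pool each channel to a single descriptor.
    squeezed = feature_map.mean(axis=(0, 1))              # shape (C,)
    # Excite: bottleneck MLP (ReLU then sigmoid) yields per-channel
    # gating weights in (0, 1).
    gates = sigmoid(np.maximum(squeezed @ w1, 0.0) @ w2)  # shape (C,)
    # Reweight: scale each channel so informative features dominate.
    return feature_map * gates

# Toy example with random weights (for shape-checking only).
rng = np.random.default_rng(0)
fmap = rng.standard_normal((8, 8, 32))   # pretend backbone output
w1 = rng.standard_normal((32, 8)) * 0.1  # reduction ratio r = 4
w2 = rng.standard_normal((8, 32)) * 0.1
out = channel_attention(fmap, w1, w2)
assert out.shape == fmap.shape
```

In an end-to-end model, a block like this would typically sit between the backbone's final feature maps and the classification head, with `w1` and `w2` trained jointly with the rest of the network.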
The paper's results demonstrate significant improvements in metrics such as balanced accuracy, AUC-ROC, and sensitivity, making the model a promising tool for clinical applications in gastrointestinal diagnostics. It offers a scalable solution for automating abnormality detection in VCE, aiding in early diagnosis and better patient management. Future work involves validating the model in clinical settings and expanding its capabilities to cover more gastrointestinal conditions.