Predicting Stroke Risk Dataset

Download (314.53 kB)
Version 4 2025-04-10, 05:09
Version 3 2025-04-06, 05:48
Version 2 2025-04-06, 05:45
Version 1 2025-03-26, 10:39
dataset
posted on 2025-04-10, 05:09 authored by Khaled Mohamad Almustafa

This study addresses stroke as a critical global health issue, using a data-driven approach to improve early risk prediction and intervention. Working from a dataset of 5,110 records, the research combines statistical analysis, machine learning (ML) classification, clustering techniques, and survival modeling to identify key predictors of stroke. Descriptive analysis highlights age, average glucose level, BMI, hypertension, and heart disease as the most significant risk factors, with stroke prevalence reaching 13.25% among hypertensive individuals and 17.03% among those with heart disease. Former and current smokers also show elevated stroke risk. Clustering using PCA and t-SNE reveals high-risk groups characterized by older age and high glucose levels. In the ML evaluation, XGBoost provides the best precision-recall balance, while Naïve Bayes achieves the highest recall (0.404) and is therefore the most sensitive to stroke cases. Feature importance analysis consistently ranks glucose, BMI, and age as dominant predictors, with XGBoost also assigning high weight to cardiovascular conditions. Survival analysis using Kaplan-Meier and Cox regression models shows that stroke risk increases sharply after age 60, with hypertension linked to a 31.9% higher risk. The results emphasize the value of early screening and targeted intervention, and suggest future improvements through class-balancing techniques and real-time clinical tools.
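
As an illustration of two quantities quoted in the description, group-wise stroke prevalence and classifier recall, the following is a minimal sketch in plain Python. It is not the author's code. The field names `hypertension` and `stroke`, the helper names, and the toy records are all assumptions made for the example; the toy values are invented and do not come from the dataset.

```python
from collections import defaultdict


def prevalence_by_group(records, group_key, outcome_key="stroke"):
    """Return {group value: share of records whose outcome equals 1}.

    The key names are hypothetical; adjust them to the actual column names.
    """
    totals = defaultdict(int)
    positives = defaultdict(int)
    for row in records:
        group = row[group_key]
        totals[group] += 1
        positives[group] += row[outcome_key]
    return {group: positives[group] / totals[group] for group in totals}


def recall(y_true, y_pred):
    """Recall (sensitivity): true positives / (true positives + false negatives)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp / (tp + fn) if (tp + fn) else 0.0


# Invented toy records, for illustration only (not taken from the dataset).
toy_records = [
    {"hypertension": 1, "stroke": 1},
    {"hypertension": 1, "stroke": 0},
    {"hypertension": 1, "stroke": 0},
    {"hypertension": 1, "stroke": 0},
    {"hypertension": 0, "stroke": 1},
    {"hypertension": 0, "stroke": 0},
    {"hypertension": 0, "stroke": 0},
    {"hypertension": 0, "stroke": 0},
    {"hypertension": 0, "stroke": 0},
    {"hypertension": 0, "stroke": 0},
]

toy_prevalence = prevalence_by_group(toy_records, "hypertension")
toy_recall = recall([1, 1, 0, 0, 1], [1, 0, 0, 1, 0])
```

Read this way, a figure such as 13.25% among hypertensive individuals is the share of stroke cases within that group, and a recall of 0.404 means that roughly 40% of actual stroke cases were flagged by the classifier.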

Funding

This study was conducted under Internal Seed Grant “ISG- Case 137”, awarded by the Graduate Studies and Research Office (GSR) at the Gulf University for Science and Technology (GUST).
