<p dir="ltr">Fall detection in smart cities faces three critical challenges: limited real-time performance, constrained accuracy, and difficult deployment at the edge. To address them, this study proposes an interpretable, lightweight fall detection and alert system built on the YOLOv11-SEFA architecture. By integrating P2 feature enhancement and SimAM attention into the YOLOv11n backbone, the model improves accuracy (F1-score: 83.99; mAP@50: 88.6%) while keeping computational cost low (6.6 GFLOPs, 2.67 MB). The system uses a four-layer sensing-to-cloud pipeline and a random forest classifier over six-dimensional image structure features to predict fall risk at four levels (Level 0–3). SHAP analysis identifies aspect ratio, distance to camera, and crowd presence as the most influential features, which makes the classifier's decisions interpretable for deployment in complex environments. Confusion matrix analysis, PR curves, ROC-AUC, and learning curves confirm the system's robustness, generalizability, and suitability for edge deployment. Practical tests show latency below 270 ms, low power and bandwidth demands, and seamless integration into existing weak-current infrastructure. Future enhancements may include NAS-based adaptive model scaling, temporal behavior modeling, and privacy-preserving multimodal sensing, extending the system to elderly behavior monitoring and health-risk perception in resilient smart city habitats.</p>
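<p dir="ltr">SimAM is a parameter-free attention module, which is why it adds almost no model size. The abstract does not specify how it is wired into YOLOv11n, so the following is only a minimal pure-Python sketch of the SimAM reweighting applied to a single feature map. It assumes the energy formulation from the original SimAM paper with λ = 1e-4. The function name <code>simam</code> and the nested-list <code>[C][H][W]</code> layout are hypothetical illustrations, not the authors' implementation. Each activation is scaled by a sigmoid of its inverse energy, so neurons that stand out from their channel mean receive larger weights.</p>

```python
import math


def simam(feature_map, e_lambda=1e-4):
    """Hypothetical sketch of SimAM on a feature map shaped [C][H][W].

    For each channel: mu is the channel mean, var = sum((x - mu)^2) / (H*W - 1),
    inverse energy = (x - mu)^2 / (4 * (var + e_lambda)) + 0.5,
    and each activation x is multiplied by sigmoid(inverse energy).
    """
    out = []
    for channel in feature_map:
        values = [v for row in channel for v in row]
        # H*W - 1 as in SimAM; guarded so a 1x1 map does not divide by zero.
        n = max(len(values) - 1, 1)
        mu = sum(values) / len(values)
        var = sum((v - mu) ** 2 for v in values) / n
        new_channel = []
        for row in channel:
            new_row = []
            for x in row:
                inv_energy = (x - mu) ** 2 / (4 * (var + e_lambda)) + 0.5
                weight = 1.0 / (1.0 + math.exp(-inv_energy))  # sigmoid
                new_row.append(x * weight)
            new_channel.append(new_row)
        out.append(new_channel)
    return out


# Usage: one channel whose bottom-right activation stands out from the rest.
fmap = [[[0.0, 0.0], [0.0, 4.0]]]
print(simam(fmap))
```

<p dir="ltr">In a constant channel every activation equals the mean, so each one is scaled by the same factor, sigmoid(0.5). A distinctive activation, such as the 4.0 above, is scaled by a larger factor. This is the kind of emphasis on salient body-shape responses that the attention module is intended to provide, without adding learnable parameters.</p>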