REAL-colon dataset
The REAL-Colon (Real-world multi-center Endoscopy Annotated video Library) dataset comprises 60 recordings of real-world colonoscopies. These recordings come from four different clinical studies (001 to 004), with each study contributing 15 videos. Compressed folders titled SSS-VVV_frames contain the video frames, where SSS indicates the clinical study (001 to 004) and VVV the video number within that study (001 to 015).
For each patient/video, several clinical variables have been collected, including the endoscope brand, bowel cleanliness score (BBPS), number of surgically removed colon lesions, and more. This data is stored in the video_info.csv file. Trained image annotation specialists, supervised by expert gastroenterologists, annotated each removed lesion with a bounding box in every video frame where it appeared. These annotations are available in 60 compressed folders titled SSS-VVV_annotations, each containing the annotations for its respective video. Polyp information, including histology, size, and anatomical site, is recorded in the lesion_info.csv file.
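The folder naming scheme described above can be parsed mechanically. The following is a minimal sketch, not part of the official tooling; the function name `parse_folder_name` and the regular expression are illustrative assumptions based only on the SSS-VVV_frames and SSS-VVV_annotations patterns given in this description.

```python
import re

# Matches names such as "001-001_frames" or "004-015_annotations".
# The two suffixes are the ones named in the description above.
NAME_RE = re.compile(
    r"^(?P<study>\d{3})-(?P<video>\d{3})_(?P<kind>frames|annotations)$"
)

def parse_folder_name(name):
    """Return (study, video, kind) for a valid folder name, else None."""
    match = NAME_RE.match(name)
    if match is None:
        return None
    study = int(match.group("study"))
    video = int(match.group("video"))
    # Reject identifiers outside the ranges stated in the description:
    # studies 001 to 004, videos 001 to 015.
    if not (1 <= study <= 4 and 1 <= video <= 15):
        return None
    return study, video, match.group("kind")
```

For example, `parse_folder_name("003-012_frames")` yields study 3, video 12, and the kind `"frames"`, while a name outside the documented ranges is rejected.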
For full details on the dataset and to cite this work, please refer to:
Biffi, C., Antonelli, G., Bernhofer, S. et al. REAL-Colon: A dataset for developing real-world AI applications in colonoscopy. Sci Data 11, 539 (2024). Available at: https://doi.org/10.1038/s41597-024-03359-0
A GitHub repository with Python code for downloading and exploring the dataset is available at https://github.com/cosmoimd/real-colon-dataset
Key stats:
- 60 recordings, 15 from each of the 4 clinical studies
- 2757723 total frames
- 132 removed colorectal polyps
- 351264 bounding box annotations
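To give a sense of scale, the key stats can be turned into per-video and per-polyp averages. This is a small illustrative calculation using only the four figures listed above; the function name `summarize` is hypothetical, and the averages say nothing about how frames or boxes are distributed across individual videos.

```python
def summarize(n_videos, n_frames, n_polyps, n_boxes):
    """Derive simple averages from the dataset-level totals."""
    return {
        "frames_per_video": n_frames / n_videos,
        "polyps_per_video": n_polyps / n_videos,
        "boxes_per_polyp": n_boxes / n_polyps,
        # Boxes per frame is not the share of annotated frames,
        # because a single frame may contain more than one box.
        "boxes_per_frame": n_boxes / n_frames,
    }

# Figures copied from the key stats list.
stats = summarize(n_videos=60, n_frames=2_757_723, n_polyps=132, n_boxes=351_264)
for name, value in stats.items():
    print(f"{name}: {value:,.3f}")
```

On average this works out to roughly 45,962 frames and 2.2 removed polyps per video, about 2,661 bounding boxes per polyp, and about 0.127 boxes per frame.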
The dataset is composed of the following files:
- 60 compressed folders named `{SSS}-{VVV}_frames` with the frames from each recording
- 60 compressed folders named `{SSS}-{VVV}_annotations` with the annotations from each recording
- video_info.csv, a file with the metadata for each video
- lesion_info.csv, a file with the metadata for each lesion
- dataset_description.md, a readme file with information about the dataset
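After downloading, it can be useful to check that nothing from the file list is missing. The sketch below assumes all items sit in one local directory and that annotation archives use the `_annotations` suffix; archives are matched by name prefix because the archive extension is not stated here. The helper names `expected_stems` and `missing_items` are hypothetical and are not taken from the GitHub repository.

```python
from pathlib import Path

STUDIES = range(1, 5)    # clinical studies 001 to 004
VIDEOS = range(1, 16)    # videos 001 to 015 within each study
METADATA_FILES = ["video_info.csv", "lesion_info.csv", "dataset_description.md"]

def expected_stems():
    """Return the 120 expected archive name stems (frames and annotations)."""
    return [
        f"{study:03d}-{video:03d}_{kind}"
        for study in STUDIES
        for video in VIDEOS
        for kind in ("frames", "annotations")
    ]

def missing_items(root):
    """List expected items that have no matching entry in `root`.

    Archives are matched by prefix so that any archive extension is
    accepted; the three metadata files are matched by exact name.
    """
    names = [path.name for path in Path(root).iterdir()]
    missing = [
        stem for stem in expected_stems()
        if not any(name.startswith(stem) for name in names)
    ]
    missing += [f for f in METADATA_FILES if f not in names]
    return missing
```

Calling `missing_items("path/to/download")` on a complete download should return an empty list; any returned names indicate items that still need to be fetched.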
Research Institution(s)
Cosmo Intelligent Medical Devices
Contact email
acherubini@cosmoimd.com
I confirm there is no human personally identifiable information in the files or description shared
- Yes
I confirm the files and description shared may be publicly distributed under the license selected
- Yes