Training set of Clinical meta-data- HPV challenge.csv (15.32 kB)

Training set of clinical meta-data for HPV Challenge

Version 3 2016-12-22, 21:41
Version 2 2016-12-02, 18:53
Version 1 2016-12-02, 00:59
dataset
posted on 2016-12-22, 21:41 authored by Clifton D. Fuller, Abdallah Mohamed, Hesham Elhalawani

This MD Anderson Cancer Center set of anonymized, high-quality, contrast-enhanced computed tomography (CT) scans represents a comparatively homogeneous, uniform cohort of 288 oropharynx cancer patients with detailed clinical history, consistent follow-up of > 2 years, and known etiological/biological correlates (specifically, human papillomavirus status). Our major aim is to assess and validate the radiomics workflow and the predictive capacity of radiomics signatures from challenge participants.

We imported the CT scans from the patients’ electronic medical records; all scans were performed before the initiation of the radiation treatment course. All patients were treated with intensity-modulated radiation therapy (IMRT). Some patients were simultaneously prescribed chemotherapy. We intended the CT scans to be as representative as possible of the original simulation CT scans used for treatment planning, in which no contrast was injected, according to our institutional policy.

Specifically, we posted around one-half of the CT scans from the dataset (136 patients), in DICOM-RT format, on the Kaggle in Class server system as a “training set”. The DICOM-RT files were fully anonymized, and an expert physician segmented the primary tumor and lymph nodes as regions of interest to eliminate segmentation-related uncertainty for challengers.

The primary oropharyngeal tumor was segmented in red, whereas the metastatic cervical lymph nodes were segmented individually rather than on the basis of the nodal level classification system.

Both training and test sets include the following data for each DICOM-RT case:

  • age
  • gender
  • race
  • tumor side and subsite
  • T-category
  • N-category
  • AJCC stage
  • Pathologic grade
  • smoking status (in pack-years)
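As a minimal sketch of how the clinical meta-data file might be read and checked against the fields listed above, the Python below uses only the standard library. The column names in EXPECTED_FIELDS are hypothetical, since the actual CSV header is not shown here, and the helper names load_clinical_rows and parse_pack_years are illustrative assumptions rather than part of the challenge materials.

```python
import csv
import io

# Hypothetical column names: the real CSV header is not shown in this
# description, so adjust these to match the downloaded file.
EXPECTED_FIELDS = [
    "age", "gender", "race", "tumor_side", "tumor_subsite",
    "t_category", "n_category", "ajcc_stage", "pathologic_grade",
    "smoking_pack_years",
]


def load_clinical_rows(handle, expected=EXPECTED_FIELDS):
    """Read all rows from an open CSV handle.

    Returns (rows, missing), where rows is a list of dicts keyed by the
    header and missing lists the expected columns absent from the header.
    """
    reader = csv.DictReader(handle)
    header = reader.fieldnames or []
    missing = [name for name in expected if name not in header]
    rows = list(reader)
    return rows, missing


def parse_pack_years(value):
    """Convert a pack-years cell to float; blank or non-numeric cells give None."""
    try:
        return float(value)
    except (TypeError, ValueError):
        return None


if __name__ == "__main__":
    # Tiny in-memory stand-in for the real file (made-up values, not patient data).
    sample = io.StringIO("age,gender,smoking_pack_years\n61,Male,20\n55,Female,\n")
    rows, missing = load_clinical_rows(sample)
    print(len(rows), "rows; missing columns:", missing)
```

For the real file, the same function could be called on a handle opened with open(path, newline=""), after replacing the assumed column names with those found in the downloaded CSV.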

Challenge participants will also be able to download a “test” dataset, which includes the remaining randomly selected half of the dataset (152 patients' DICOM files and relevant clinical meta-data), with local control status and HPV status blinded.
